Methodology Series Module 5: Sampling Strategies
Setia, Maninder Singh
2016-01-01
Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the ‘sampling method’. There are essentially two types of sampling methods: 1) probability sampling – based on chance events (such as random numbers, flipping a coin, etc.); and 2) non-probability sampling – based on the researcher's choice and on populations that are accessible and available. Some of the non-probability sampling methods are purposive sampling, convenience sampling, and quota sampling. Random sampling methods (such as the simple random sample or the stratified random sample) are forms of probability sampling. It is important to understand the different sampling methods used in clinical studies and to state the method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term ‘random sample’ when a convenience sample was actually used). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the ‘generalizability’ of the results. In such a scenario, the researcher may want to use ‘purposive sampling’ for the study. PMID:27688438
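As a minimal illustration of the distinction the abstract draws between probability and non-probability sampling (the patient list below is hypothetical, not from the article), a probability method selects by chance while a convenience method simply takes whoever is at hand:

```python
import random

# Hypothetical sampling frame: 1,000 patient IDs (toy data, not from the study above)
population = [f"patient_{i:04d}" for i in range(1000)]

def simple_random_sample(frame, n, seed=42):
    """Probability sampling: every unit has an equal, chance-based probability of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def convenience_sample(frame, n):
    """Non-probability sampling: take whoever is easiest to reach (here, simply the first n)."""
    return frame[:n]

print(simple_random_sample(population, 5))
print(convenience_sample(population, 5))
```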
An, Zhao; Wen-Xin, Zhang; Zhong, Yao; Yu-Kuan, Ma; Qing, Liu; Hou-Lang, Duan; Yi-di, Shang
2016-06-29
The aim was to optimize and simplify the survey method for Oncomelania hupensis snails in marshland endemic regions of schistosomiasis, and to increase the precision, efficiency and economy of the snail survey. A 50 m × 50 m experimental quadrat was selected in Chayegang marshland near Henghu farm in the Poyang Lake region, and a whole-covered method was adopted to survey the snails. The simple random sampling, systematic sampling and stratified random sampling methods were applied to calculate the minimum sample size, relative sampling error and absolute sampling error. The minimum sample sizes of the simple random sampling, systematic sampling and stratified random sampling methods were 300, 300 and 225, respectively. The relative sampling errors of the three methods were all less than 15%. The absolute sampling errors were 0.221 7, 0.302 4 and 0.047 8, respectively. Spatial stratified sampling with altitude as the stratum variable is an efficient approach of lower cost and higher precision for the snail survey.
Makowski, David; Bancal, Rémi; Bensadoun, Arnaud; Monod, Hervé; Messéan, Antoine
2017-09-01
According to E.U. regulations, the maximum allowable rate of adventitious transgene presence in non-genetically modified (GM) crops is 0.9%. We compared four sampling methods for the detection of transgenic material in agricultural non-GM maize fields: random sampling, stratified sampling, random sampling + ratio reweighting, random sampling + regression reweighting. Random sampling involves simply sampling maize grains from different locations selected at random from the field concerned. The stratified and reweighting sampling methods make use of an auxiliary variable corresponding to the output of a gene-flow model (a zero-inflated Poisson model) simulating cross-pollination as a function of wind speed, wind direction, and distance to the closest GM maize field. With the stratified sampling method, an auxiliary variable is used to define several strata with contrasting transgene presence rates, and grains are then sampled at random from each stratum. With the two methods involving reweighting, grains are first sampled at random from various locations within the field, and the observations are then reweighted according to the auxiliary variable. Data collected from three maize fields were used to compare the four sampling methods, and the results were used to determine the extent to which transgene presence rate estimation was improved by the use of stratified and reweighting sampling methods. We found that transgene rate estimates were more accurate and that substantially smaller samples could be used with sampling strategies based on an auxiliary variable derived from a gene-flow model. © 2017 Society for Risk Analysis.
Che, W W; Frey, H Christopher; Lau, Alexis K H
2014-12-01
Population and diary sampling methods are employed in exposure models to sample simulated individuals and their daily activity on each simulation day. Different sampling methods may lead to variations in estimated human exposure. In this study, two population sampling methods (stratified-random and random-random) and three diary sampling methods (random resampling, diversity and autocorrelation, and Markov-chain cluster [MCC]) are evaluated. Their impacts on estimated children's exposure to ambient fine particulate matter (PM2.5) are quantified via case studies for children in Wake County, NC for July 2002. The estimated mean daily average exposure is 12.9 μg/m³ for simulated children using the stratified population sampling method, and 12.2 μg/m³ using the random sampling method. These minor differences are caused by the random sampling among ages within census tracts. Among the three diary sampling methods, there are differences in the estimated number of individuals with multiple days of exposures exceeding a benchmark of concern of 25 μg/m³ due to differences in how multiday longitudinal diaries are estimated. The MCC method is relatively more conservative. In the case studies evaluated here, the MCC method led to a 10% higher estimate of the number of individuals with repeated exposures exceeding the benchmark. The comparisons help to identify and contrast the capabilities of each method and to offer insight regarding implications of method choice. Exposure simulation results are robust to the two population sampling methods evaluated, and are sensitive to the choice of method for simulating longitudinal diaries, particularly when analyzing results for specific microenvironments or for exposures exceeding a benchmark of concern. © 2014 Society for Risk Analysis.
Methods for sample size determination in cluster randomized trials
Rutterford, Clare; Copas, Andrew; Eldridge, Sandra
2015-01-01
Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
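The 'simplest approach' described above can be made concrete with the common design effect DE = 1 + (m - 1) × ICC; the sketch below uses hypothetical numbers and is not drawn from the review itself:

```python
import math

def cluster_trial_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size by the design effect
    DE = 1 + (m - 1) * ICC, then round up to whole clusters."""
    design_effect = 1 + (cluster_size - 1) * icc
    n_total = n_individual * design_effect
    clusters = math.ceil(n_total / cluster_size)
    return design_effect, math.ceil(n_total), clusters

# Hypothetical example: 400 participants needed under individual randomization,
# clusters of 20, intracluster correlation coefficient 0.05
print(cluster_trial_sample_size(400, 20, 0.05))   # DE = 1.95 -> 780 participants, 39 clusters
```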
A random spatial sampling method in a rural developing nation
Michelle C. Kondo; Kent D.W. Bream; Frances K. Barg; Charles C. Branas
2014-01-01
Nonrandom sampling of populations in developing nations has limitations and can inaccurately estimate health phenomena, especially among hard-to-reach populations such as rural residents. However, random sampling of rural populations in developing nations can be challenged by incomplete enumeration of the base population. We describe a stratified random sampling method...
Serang, Oliver
2012-01-01
Linear programming (LP) problems are commonly used in analysis and resource allocation, frequently surfacing as approximations to more difficult problems. Existing approaches to LP have been dominated by a small group of methods, and randomized algorithms have not enjoyed popularity in practice. This paper introduces a novel randomized method of solving LP problems by moving along the facets and within the interior of the polytope along rays randomly sampled from the polyhedral cones defined by the bounding constraints. This conic sampling method is then applied to randomly sampled LPs, and its runtime performance is shown to compare favorably to the simplex and primal affine-scaling algorithms, especially on polytopes with certain characteristics. The conic sampling method is then adapted and applied to solve a certain quadratic program, which computes a projection onto a polytope; the proposed method is shown to outperform the proprietary software Mathematica on large, sparse QP problems constructed from mass spectrometry-based proteomics. PMID:22952741
Methods and analysis of realizing randomized grouping.
Hu, Liang-Ping; Bao, Xiao-Lei; Wang, Qi
2011-07-01
Randomization is one of the four basic principles of research design. The meaning of randomization includes two aspects: one is to randomly select samples from the population, which is known as random sampling; the other is to randomly group all the samples, which is called randomized grouping. Randomized grouping can be subdivided into three categories: completely, stratified and dynamically randomized grouping. This article mainly introduces the steps of complete randomization, the definition of dynamic randomization and the realization of random sampling and grouping by SAS software.
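As a language-neutral sketch of the complete (simple) randomization step described above, one might shuffle the subjects and deal them into groups; the subject labels are hypothetical and this is not the article's SAS code:

```python
import random

def complete_randomization(subjects, n_groups=2, seed=2011):
    """Completely randomized grouping: shuffle all subjects, then deal them
    into groups in round-robin fashion so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    groups = {g: [] for g in range(n_groups)}
    for i, subject in enumerate(shuffled):
        groups[i % n_groups].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subjects
for g, members in complete_randomization(subjects).items():
    print(f"group {g}: {members}")
```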
Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling
Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David
2016-01-01
Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464
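Going only by the description above (randomly select a pixel, then measure nearby pixels with probability depending on distance), a single localized measurement could be sketched as follows; the Gaussian fall-off and its width are assumptions, not the authors' exact parameterization:

```python
import numpy as np

def localized_measurement_mask(shape, sigma=3.0, rng=None):
    """Return a boolean mask for one localized set of measurements:
    pick a random center pixel, then include each other pixel with
    probability exp(-d^2 / (2 sigma^2)), where d is the distance to the center."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = shape
    cy, cx = rng.integers(rows), rng.integers(cols)
    yy, xx = np.mgrid[0:rows, 0:cols]
    dist2 = (yy - cy) ** 2 + (xx - cx) ** 2
    prob = np.exp(-dist2 / (2 * sigma ** 2))
    mask = rng.random(shape) < prob
    mask[cy, cx] = True  # the initially selected pixel is always measured
    return mask

mask = localized_measurement_mask((32, 32), sigma=2.5, rng=np.random.default_rng(0))
print(mask.sum(), "pixels measured in this localized sample")
```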
NASA Astrophysics Data System (ADS)
WANG, P. T.
2015-12-01
Groundwater modeling requires assigning hydrogeological properties to every numerical grid cell. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. The hydrogeological property is assumed to follow a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid cell, a random field for modeling can be completed. Therefore, statistical sampling plays an important role in the efficiency of the modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the stratified random procedure from LHS and simulation using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were developed. The simulation efficiency and spatial correlation of LULHS are compared to three other simulation methods. The results show that for both conditional and unconditional simulation, the LULHS method is more efficient in terms of computational effort. Fewer realizations are required to achieve the required statistical accuracy and spatial correlation.
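A minimal sketch of the stratified random step that LHS contributes is given below; it omits the LU-decomposition step used by LULHS to impose spatial correlation and uses hypothetical sizes:

```python
import numpy as np

def latin_hypercube(n_samples, n_vars, rng=None):
    """Basic Latin Hypercube Sampling on [0, 1]^n_vars:
    one point per equal-probability stratum for every variable,
    with strata randomly permuted across samples."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random((n_samples, n_vars))                  # position within each stratum
    strata = np.array([rng.permutation(n_samples) for _ in range(n_vars)]).T
    return (strata + u) / n_samples                      # stratified, space-filling samples

samples = latin_hypercube(10, 3, rng=np.random.default_rng(1))
print(samples.min(axis=0), samples.max(axis=0))  # each column covers [0, 1) evenly
```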
Generating Random Samples of a Given Size Using Social Security Numbers.
ERIC Educational Resources Information Center
Erickson, Richard C.; Brauchle, Paul E.
1984-01-01
The purposes of this article are (1) to present a method by which social security numbers may be used to draw cluster samples of a predetermined size and (2) to describe procedures used to validate this method of drawing random samples. (JOW)
A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.
Yu, Qingzhao; Zhu, Lin; Zhu, Han
2017-11-01
Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potential to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently allocate newly recruited patients to the different treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate chosen to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence on the design of changing the prior distributions. Simulation studies are applied to compare the proposed method and traditional methods in terms of power and actual sample sizes. Simulations show that, when the total sample size is fixed, the proposed design can achieve greater power and/or require a smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample sizes. The proposed method can further reduce the required sample size. Copyright © 2017 John Wiley & Sons, Ltd.
Observational studies of patients in the emergency department: a comparison of 4 sampling methods.
Valley, Morgan A; Heard, Kennon J; Ginde, Adit A; Lezotte, Dennis C; Lowenstein, Steven R
2012-08-01
We evaluate the ability of 4 sampling methods to generate representative samples of the emergency department (ED) population. We analyzed the electronic records of 21,662 consecutive patient visits at an urban, academic ED. From this population, we simulated different models of study recruitment in the ED by using 2 sample sizes (n=200 and n=400) and 4 sampling methods: true random, random 4-hour time blocks by exact sample size, random 4-hour time blocks by a predetermined number of blocks, and convenience or "business hours." For each method and sample size, we obtained 1,000 samples from the population. Using χ² tests, we measured the number of statistically significant differences between the sample and the population for 8 variables (age, sex, race/ethnicity, language, triage acuity, arrival mode, disposition, and payer source). Then, for each variable, method, and sample size, we compared the proportion of the 1,000 samples that differed from the overall ED population to the expected proportion (5%). Only the true random samples represented the population with respect to sex, race/ethnicity, triage acuity, mode of arrival, language, and payer source in at least 95% of the samples. Patient samples obtained using random 4-hour time blocks and business hours sampling systematically differed from the overall ED patient population for several important demographic and clinical variables. However, the magnitude of these differences was not large. Common sampling strategies selected for ED-based studies may affect parameter estimates for several representative population variables. However, the potential for bias for these variables appears small. Copyright © 2012. Published by Mosby, Inc.
An improved sampling method of complex network
NASA Astrophysics Data System (ADS)
Gao, Qi; Ding, Xintong; Pan, Feng; Li, Weixing
2014-12-01
Sampling subnets is an important topic in complex network research. Sampling methods influence the structure and characteristics of the subnet. Random multiple snowball with Cohen (RMSC) process sampling, which combines the advantages of random sampling and snowball sampling, is proposed in this paper. It has the ability to explore global information and discover the local structure at the same time. The experiments indicate that this novel sampling method can preserve the similarity between the sampled subnet and the original network in terms of degree distribution, connectivity rate and average shortest path. This method is applicable to situations where prior knowledge about the degree distribution of the original network is not sufficient.
Sampling Large Graphs for Anticipatory Analytics
Edwards, Lauren; Johnson, Luke; Milosavljevic, Maja; Gadepally, Vijay; Miller, Benjamin A.
2015-05-15
Random area sampling [8] is a "snowball" sampling method in which a set of random seed vertices are selected and areas ... systems, greater human-in-the-loop involvement, or through complex algorithms. We are investigating the use of sampling to mitigate these challenges.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high-dimensional genome-wide association (GWA) case-control data of complex disease, there is usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and discard the vast number of non-informative SNPs. However, this is too time-consuming and is not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it can make sure each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprising 408 803 SNPs and Alzheimer case-control data comprising 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate a better random forest with higher accuracy and lower error bound than those produced by Breiman's random forest generation method. For the Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders and warrant further biological investigation.
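Based on the subspace-selection step described above (equal-width discretization of informativeness, then drawing the same number of SNPs from each group), a simplified sketch might be as follows; the informativeness scores, group count and per-group size are placeholders, not the paper's settings:

```python
import numpy as np

def stratified_feature_subspace(informativeness, n_groups=5, per_group=10, rng=None):
    """Divide features into equal-width informativeness groups and draw the same
    number of features at random from each group to form one tree's subspace."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(informativeness)
    edges = np.linspace(scores.min(), scores.max(), n_groups + 1)
    group_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, n_groups - 1)
    subspace = []
    for g in range(n_groups):
        members = np.flatnonzero(group_ids == g)
        if members.size == 0:
            continue
        k = min(per_group, members.size)
        subspace.extend(rng.choice(members, size=k, replace=False))
    return np.array(subspace)

scores = np.random.default_rng(7).random(1000)   # placeholder informativeness scores
print(stratified_feature_subspace(scores, rng=np.random.default_rng(7))[:10])
```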
Sampling Methods in Cardiovascular Nursing Research: An Overview.
Kandola, Damanpreet; Banner, Davina; O'Keefe-McCarthy, Sheila; Jassal, Debbie
2014-01-01
Cardiovascular nursing research covers a wide array of topics from health services to psychosocial patient experiences. The selection of specific participant samples is an important part of the research design and process. The sampling strategy employed is of utmost importance to ensure that a representative sample of participants is chosen. There are two main categories of sampling methods: probability and non-probability. Probability sampling is the random selection of elements from the population, where each element of the population has an equal and independent chance of being included in the sample. There are five main types of probability sampling including simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. Non-probability sampling methods are those in which elements are chosen through non-random methods for inclusion into the research study and include convenience sampling, purposive sampling, and snowball sampling. Each approach offers distinct advantages and disadvantages and must be considered critically. In this research column, we provide an introduction to these key sampling techniques and draw on examples from cardiovascular research. Understanding the differences in sampling techniques may aid nurses in effective appraisal of research literature and provide a reference point for nurses who engage in cardiovascular research.
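Of the probability methods listed above, systematic sampling is the easiest to show concretely: a random start is chosen in the first interval and every k-th unit is taken thereafter (toy sampling frame below):

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: random start within the first interval, then every k-th unit."""
    k = len(frame) // n                      # sampling interval
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))                  # hypothetical sampling frame of 100 units
print(systematic_sample(frame, 10, seed=3))  # 10 evenly spaced units after a random start
```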
A Multilevel, Hierarchical Sampling Technique for Spatially Correlated Random Fields
Osborn, Sarah; Vassilevski, Panayot S.; Villa, Umberto
2017-10-26
In this paper, we propose an alternative method to generate samples of a spatially correlated random field with applications to large-scale problems for forward propagation of uncertainty. A classical approach for generating these samples is the Karhunen-Loève (KL) decomposition. However, the KL expansion requires solving a dense eigenvalue problem and is therefore computationally infeasible for large-scale problems. Sampling methods based on stochastic partial differential equations provide a highly scalable way to sample Gaussian fields, but the resulting parametrization is mesh dependent. We propose a multilevel decomposition of the stochastic field to allow for scalable, hierarchical sampling based on solving a mixed finite element formulation of a stochastic reaction-diffusion equation with a random, white noise source function. Lastly, numerical experiments are presented to demonstrate the scalability of the sampling method as well as numerical results of multilevel Monte Carlo simulations for a subsurface porous media flow application using the proposed sampling method.
A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials
ERIC Educational Resources Information Center
Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn
2006-01-01
A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…
Measuring larval nematode contamination on cattle pastures: Comparing two herbage sampling methods.
Verschave, S H; Levecke, B; Duchateau, L; Vercruysse, J; Charlier, J
2015-06-15
Assessing levels of pasture larval contamination is frequently used to study the population dynamics of the free-living stages of parasitic nematodes of livestock. Direct quantification of infective larvae (L3) on herbage is the most applied method to measure pasture larval contamination. However, herbage collection remains labour intensive and there is a lack of studies addressing the variation induced by the sampling method and the required sample size. The aim of this study was (1) to compare two different sampling methods in terms of pasture larval count results and time required to sample, (2) to assess the amount of variation in larval counts at the level of sample plot, pasture and season, respectively and (3) to calculate the required sample size to assess pasture larval contamination with a predefined precision using random plots across pasture. Eight young stock pastures of different commercial dairy herds were sampled in three consecutive seasons during the grazing season (spring, summer and autumn). On each pasture, herbage samples were collected both along a double-crossed W-transect with samples taken every 10 steps (method 1) and from four randomly located plots of 0.16 m² with collection of all herbage within each plot (method 2). The average (± standard deviation, SD) pasture larval contamination using sampling methods 1 and 2 was 325 (± 479) and 305 (± 444) L3/kg dry herbage (DH), respectively. Large discrepancies in pasture larval counts of the same pasture and season were often seen between methods, but no significant difference (P = 0.38) in larval counts between methods was found. Less time was required to collect samples with method 2. This difference in collection time between methods was most pronounced for pastures with a surface area larger than 1 ha. The variation in pasture larval counts from samples generated by random plot sampling was mainly due to the repeated measurements on the same pasture in the same season (residual variance component = 6.2), rather than due to pasture (variance component = 0.55) or season (variance component = 0.15). Using the observed distribution of L3, the required sample size (i.e. number of plots per pasture) for sampling a pasture through random plots with a particular precision was simulated. A higher relative precision was acquired when estimating pasture larval contamination on pastures with a high larval contamination and a low level of aggregation compared to pastures with a low larval contamination when the same sample size was applied. In the future, herbage sampling through random plots across pasture (method 2) seems a promising method to develop further, as no significant difference in counts between the methods was found and this method was less time consuming. Copyright © 2015 Elsevier B.V. All rights reserved.
Sample Selection in Randomized Experiments: A New Method Using Propensity Score Stratified Sampling
ERIC Educational Resources Information Center
Tipton, Elizabeth; Hedges, Larry; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Caverly, Sarah
2014-01-01
Randomized experiments are often seen as the "gold standard" for causal research. Despite the fact that experiments use random assignment to treatment conditions, units are seldom selected into the experiment using probability sampling. Very little research on experimental design has focused on how to make generalizations to well-defined…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romero, Vicente; Bonney, Matthew; Schroeder, Benjamin
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics used to assess method performance for conservative estimation of two representative quantities: the central 95% of response; and the 10^-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depend on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large database and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.
Improved sampling and analysis of images in corneal confocal microscopy.
Schaldemose, E L; Fontain, F I; Karlsson, P; Nyengaard, J R
2017-10-01
Corneal confocal microscopy (CCM) is a noninvasive clinical method to analyse and quantify corneal nerve fibres in vivo. Although the CCM technique is in constant progress, there are methodological limitations in terms of sampling of images and objectivity of the nerve quantification. The aim of this study was to present a randomized sampling method of the CCM images and to develop an adjusted area-dependent image analysis. Furthermore, a manual nerve fibre analysis method was compared to a fully automated method. 23 idiopathic small-fibre neuropathy patients were investigated using CCM. Corneal nerve fibre length density (CNFL) and corneal nerve fibre branch density (CNBD) were determined in both a manual and automatic manner. Differences in CNFL and CNBD between (1) the randomized and the most common sampling method, (2) the adjusted and the unadjusted area and (3) the manual and automated quantification method were investigated. The CNFL values were significantly lower when using the randomized sampling method compared to the most common method (p = 0.01). There was not a statistical significant difference in the CNBD values between the randomized and the most common sampling method (p = 0.85). CNFL and CNBD values were increased when using the adjusted area compared to the standard area. Additionally, the study found a significant increase in the CNFL and CNBD values when using the manual method compared to the automatic method (p ≤ 0.001). The study demonstrated a significant difference in the CNFL values between the randomized and common sampling method indicating the importance of clear guidelines for the image sampling. The increase in CNFL and CNBD values when using the adjusted cornea area is not surprising. The observed increases in both CNFL and CNBD values when using the manual method of nerve quantification compared to the automatic method are consistent with earlier findings. This study underlines the importance of improving the analysis of the CCM images in order to obtain more objective corneal nerve fibre measurements. © 2017 The Authors Journal of Microscopy © 2017 Royal Microscopical Society.
NASA Astrophysics Data System (ADS)
Zhang, Jingdong; Zhu, Tao; Zheng, Hua; Kuang, Yang; Liu, Min; Huang, Wei
2017-04-01
The round-trip time of the light pulse limits the maximum detectable frequency response range of vibration in phase-sensitive optical time domain reflectometry (φ-OTDR). We propose a method to break the frequency response range restriction of the φ-OTDR system by modulating the light pulse interval randomly, which enables random sampling for every vibration point along a long sensing fiber. This sub-Nyquist randomized sampling method is suited to detecting sparse wideband-frequency vibration signals. A resonance vibration signal of up to MHz frequency with dozens of frequency components and a 1.153 MHz single-frequency vibration signal are clearly identified for a sensing range of 9.6 km with a 10 kHz maximum sampling rate.
Recording 2-D Nutation NQR Spectra by Random Sampling Method
Sinyavsky, Nikolaj; Jadzyn, Maciej; Ostafin, Michal; Nogaj, Boleslaw
2010-01-01
The method of random sampling was introduced for the first time in the nutation nuclear quadrupole resonance (NQR) spectroscopy where the nutation spectra show characteristic singularities in the form of shoulders. The analytic formulae for complex two-dimensional (2-D) nutation NQR spectra (I = 3/2) were obtained and the condition for resolving the spectral singularities for small values of an asymmetry parameter η was determined. Our results show that the method of random sampling of a nutation interferogram allows significant reduction of time required to perform a 2-D nutation experiment and does not worsen the spectral resolution. PMID:20949121
Jannink, I; Bennen, J N; Blaauw, J; van Diest, P J; Baak, J P
1995-01-01
This study compares the influence of two different nuclear sampling methods on the prognostic value of assessments of mean and standard deviation of nuclear area (MNA, SDNA) in 191 consecutive invasive breast cancer patients with long term follow up. The first sampling method used was 'at convenience' sampling (ACS); the second, systematic random sampling (SRS). Both sampling methods were tested with a sample size of 50 nuclei (ACS-50 and SRS-50). To determine whether, besides the sampling methods, sample size had impact on prognostic value as well, the SRS method was also tested using a sample size of 100 nuclei (SRS-100). SDNA values were systematically lower for ACS, obviously due to (unconsciously) not including small and large nuclei. Testing prognostic value of a series of cut off points, MNA and SDNA values assessed by the SRS method were prognostically significantly stronger than the values obtained by the ACS method. This was confirmed in Cox regression analysis. For the MNA, the Mantel-Cox p-values from SRS-50 and SRS-100 measurements were not significantly different. However, for the SDNA, SRS-100 yielded significantly lower p-values than SRS-50. In conclusion, compared with the 'at convenience' nuclear sampling method, systematic random sampling of nuclei is not only superior with respect to reproducibility of results, but also provides a better prognostic value in patients with invasive breast cancer.
Deterministic multidimensional nonuniform gap sampling.
Worley, Bradley; Powers, Robert
2015-12-01
Born from empirical observations in nonuniformly sampled multidimensional NMR data relating to gaps between sampled points, the Poisson-gap sampling method has enjoyed widespread use in biomolecular NMR. While the majority of nonuniform sampling schemes are fully randomly drawn from probability densities that vary over a Nyquist grid, the Poisson-gap scheme employs constrained random deviates to minimize the gaps between sampled grid points. We describe a deterministic gap sampling method, based on the average behavior of Poisson-gap sampling, which performs comparably to its random counterpart with the additional benefit of completely deterministic behavior. We also introduce a general algorithm for multidimensional nonuniform sampling based on a gap equation, and apply it to yield a deterministic sampling scheme that combines burst-mode sampling features with those of Poisson-gap schemes. Finally, we derive a relationship between stochastic gap equations and the expectation value of their sampling probability densities. Copyright © 2015 Elsevier Inc. All rights reserved.
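As a rough sketch of the gap-sampling idea discussed above (a stochastic stand-in, not the authors' deterministic scheme), a 1-D nonuniform schedule can be built by drawing successive gaps from a Poisson distribution and rescaling the mean gap until the target number of points fits on the grid:

```python
import numpy as np

def gap_schedule(grid_size, n_samples, rng=None):
    """Build a 1-D nonuniform sampling schedule by inserting a Poisson-distributed
    gap after each sampled grid point, adjusting the mean gap until exactly
    n_samples points fall within grid_size (a loose, stochastic approximation)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = max(grid_size / n_samples - 1.0, 0.0)   # initial guess for the mean gap
    points = []
    for _ in range(500):
        points, pos = [], 0
        while pos < grid_size:
            points.append(pos)
            pos += 1 + rng.poisson(lam)
        if len(points) == n_samples:
            return np.array(points)
        lam *= 1.05 if len(points) > n_samples else 0.95   # too many points -> larger gaps
    return np.array(points[:n_samples])   # fallback: closest attempt

print(gap_schedule(128, 32, rng=np.random.default_rng(0)))
```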
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustments to the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
Binomial leap methods for simulating stochastic chemical kinetics.
Tian, Tianhai; Burrage, Kevin
2004-12-01
This paper discusses efficient simulation methods for stochastic chemical kinetics. Based on the tau-leap and midpoint tau-leap methods of Gillespie [D. T. Gillespie, J. Chem. Phys. 115, 1716 (2001)], binomial random variables are used in these leap methods rather than Poisson random variables. The motivation for this approach is to improve the efficiency of the Poisson leap methods by using larger stepsizes. Unlike Poisson random variables whose range of sample values is from zero to infinity, binomial random variables have a finite range of sample values. This probabilistic property has been used to restrict possible reaction numbers and to avoid negative molecular numbers in stochastic simulations when larger stepsize is used. In this approach a binomial random variable is defined for a single reaction channel in order to keep the reaction number of this channel below the numbers of molecules that undergo this reaction channel. A sampling technique is also designed for the total reaction number of a reactant species that undergoes two or more reaction channels. Samples for the total reaction number are not greater than the molecular number of this species. In addition, probability properties of the binomial random variables provide stepsize conditions for restricting reaction numbers in a chosen time interval. These stepsize conditions are important properties of robust leap control strategies. Numerical results indicate that the proposed binomial leap methods can be applied to a wide range of chemical reaction systems with very good accuracy and significant improvement on efficiency over existing approaches. (c) 2004 American Institute of Physics.
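A highly simplified sketch of a binomial leap for a single irreversible channel (A → B with propensity a = c·N_A) is given below; it illustrates only the bounded-firing idea, not the paper's full multi-channel method or its stepsize conditions:

```python
import numpy as np

def binomial_leap_single_channel(n_a, c, tau, t_end, rng=None):
    """Binomial tau-leaping for one channel A -> B with propensity a = c * n_a.
    The number of firings per leap is Binomial(n_a, p) with p = min(1, a*tau/n_a),
    so it can never exceed the number of available A molecules."""
    rng = np.random.default_rng() if rng is None else rng
    t, history = 0.0, [(0.0, n_a)]
    while t < t_end and n_a > 0:
        a = c * n_a                             # propensity of the channel
        p = min(1.0, a * tau / n_a)             # success probability per molecule
        k = rng.binomial(n_a, p)                # firings in this leap, bounded by n_a
        n_a -= k
        t += tau
        history.append((t, n_a))
    return history

print(binomial_leap_single_channel(n_a=1000, c=0.1, tau=0.5, t_end=5.0,
                                   rng=np.random.default_rng(0))[-1])
```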
Toward cost-efficient sampling methods
NASA Astrophysics Data System (ADS)
Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie
2015-09-01
Sampling methods have received much attention in the field of complex networks in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small part of the vertices with high node degree can possess most of the structural information of a complex network. The two proposed sampling methods are efficient in sampling high-degree nodes, so they remain useful even when the sampling rate is low, which makes them cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method, and the second one improves the well-known snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of the two new sampling methods, we compare them with the existing sampling methods in three commonly used simulation networks, namely a scale-free network, a random network and a small-world network, and also in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of recovering the true network structure characteristics, as reflected by the clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
Sampling large random knots in a confined space
NASA Astrophysics Data System (ADS)
Arsuaga, J.; Blackstone, T.; Diao, Y.; Hinson, K.; Karadayi, E.; Saito, M.
2007-09-01
DNA knots formed under extreme conditions of condensation, as in bacteriophage P4, are difficult to analyze experimentally and theoretically. In this paper, we propose to use the uniform random polygon model as a supplementary method to the existing methods for generating random knots in confinement. The uniform random polygon model allows us to sample knots with large crossing numbers and also to generate large diagrammatically prime knot diagrams. We show numerically that uniform random polygons sample knots with large minimum crossing numbers and certain complicated knot invariants (as those observed experimentally). We do this in terms of the knot determinants or colorings. Our numerical results suggest that the average determinant of a uniform random polygon of n vertices grows faster than O(e^{n^2}). We also investigate the complexity of prime knot diagrams. We show rigorously that the probability that a randomly selected 2D uniform random polygon of n vertices is almost diagrammatically prime goes to 1 as n goes to infinity. Furthermore, the average number of crossings in such a diagram is of the order of O(n²). Therefore, two-dimensional uniform random polygons offer an effective way of sampling large (prime) knots, which can be useful in various applications.
Multilattice sampling strategies for region of interest dynamic MRI.
Rilling, Gabriel; Tao, Yuehui; Marshall, Ian; Davies, Mike E
2013-08-01
A multilattice sampling approach is proposed for dynamic MRI with Cartesian trajectories. It relies on the use of sampling patterns composed of several different lattices and exploits an image model where only some parts of the image are dynamic, whereas the rest is assumed static. Given the parameters of such an image model, the methodology followed for the design of a multilattice sampling pattern adapted to the model is described. The multi-lattice approach is compared to single-lattice sampling, as used by traditional acceleration methods such as UNFOLD (UNaliasing by Fourier-Encoding the Overlaps using the temporal Dimension) or k-t BLAST, and random sampling used by modern compressed sensing-based methods. On the considered image model, it allows more flexibility and higher accelerations than lattice sampling and better performance than random sampling. The method is illustrated on a phase-contrast carotid blood velocity mapping MR experiment. Combining the multilattice approach with the KEYHOLE technique allows up to 12× acceleration factors. Simulation and in vivo undersampling results validate the method. Compared to lattice and random sampling, multilattice sampling provides significant gains at high acceleration factors. © 2012 Wiley Periodicals, Inc.
Williamson, Graham R
2003-11-01
This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. Review limitations: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by the author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.
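As a concrete illustration of the randomization-test alternative proposed above, a two-group permutation test computes the P-value by re-randomizing group labels rather than by appealing to random sampling from a population; the data below are hypothetical:

```python
import numpy as np

def randomization_test(group_a, group_b, n_permutations=10000, rng=None):
    """Two-sample randomization (permutation) test for a difference in means.
    The P-value is the proportion of label re-randomizations whose absolute
    mean difference is at least as extreme as the observed one."""
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)
        diff = abs(perm[:n_a].mean() - perm[n_a:].mean())
        count += diff >= observed
    return (count + 1) / (n_permutations + 1)

a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])   # hypothetical scores, group A
b = np.array([4.2, 4.9, 4.4, 4.7, 5.0])   # hypothetical scores, group B
print(randomization_test(a, b, rng=np.random.default_rng(0)))
```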
Electromagnetic Scattering by Fully Ordered and Quasi-Random Rigid Particulate Samples
NASA Technical Reports Server (NTRS)
Mishchenko, Michael I.; Dlugach, Janna M.; Mackowski, Daniel W.
2016-01-01
In this paper we have analyzed circumstances under which a rigid particulate sample can behave optically as a true discrete random medium consisting of particles randomly moving relative to each other during measurement. To this end, we applied the numerically exact superposition T-matrix method to model far-field scattering characteristics of fully ordered and quasi-randomly arranged rigid multiparticle groups in fixed and random orientations. We have shown that, in and of itself, averaging optical observables over movements of a rigid sample as a whole is insufficient unless it is combined with a quasi-random arrangement of the constituent particles in the sample. Otherwise, certain scattering effects typical of discrete random media (including some manifestations of coherent backscattering) may not be accurately replicated.
Harry T. Valentine
2002-01-01
Randomized branch sampling (RBS) is a special application of multistage probability sampling (see Sampling, environmental), which was developed originally by Jessen [3] to estimate fruit counts on individual orchard trees. In general, the method can be used to obtain estimates of many different attributes of trees or other branched plants. The usual objective of RBS is...
Quantum Random Number Generation Using a Quanta Image Sensor
Amri, Emna; Felk, Yacine; Stucki, Damien; Ma, Jiaju; Fossum, Eric R.
2016-01-01
A new quantum random number generation method is proposed. The method is based on the randomness of the photon emission process and the single photon counting capability of the Quanta Image Sensor (QIS). It has the potential to generate high-quality random numbers with remarkable data output rate. In this paper, the principle of photon statistics and theory of entropy are discussed. Sample data were collected with QIS jot device, and its randomness quality was analyzed. The randomness assessment method and results are discussed. PMID:27367698
Point-Sampling and Line-Sampling Probability Theory, Geometric Implications, Synthesis
L.R. Grosenbaugh
1958-01-01
Foresters concerned with measuring tree populations on definite areas have long employed two well-known methods of representative sampling. In list or enumerative sampling the entire tree population is tallied with a known proportion being randomly selected and measured for volume or other variables. In area sampling all trees on randomly located plots or strips...
Practical quantum random number generator based on measuring the shot noise of vacuum states
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen Yong; Zou Hongxin; Tian Liang
2010-06-15
The shot noise of vacuum states is a kind of quantum noise and is totally random. In this paper a nondeterministic random number generation scheme based on measuring the shot noise of vacuum states is presented and experimentally demonstrated. We use a homodyne detector to measure the shot noise of vacuum states. Considering that the frequency bandwidth of our detector is limited, we derive the optimal sampling rate so that sampling points have the least correlation with each other. We also choose a method to extract random numbers from sampling values, and prove that the influence of classical noise can be avoided with this method, so that the detector does not have to be shot-noise limited. The random numbers generated with this scheme have passed the ENT and Diehard tests.
Random Numbers and Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Scherer, Philipp O. J.
Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. For the calculation of thermodynamic averages, Monte Carlo methods, which sample the integration volume at randomly chosen points, are very useful. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with a given probability distribution, which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by sampling preferentially the important configurations. Finally the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.
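A minimal sketch of the Metropolis algorithm mentioned above, sampling from an unnormalized one-dimensional density (a standard normal chosen purely for illustration), is:

```python
import numpy as np

def metropolis(log_density, n_steps, x0=0.0, step=1.0, rng=None):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and accept with
    probability min(1, p(x')/p(x)); otherwise keep the current sample."""
    rng = np.random.default_rng() if rng is None else rng
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, step)
        if np.log(rng.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

def log_standard_normal(x):
    return -0.5 * x * x        # unnormalized log-density of a standard normal

chain = metropolis(log_standard_normal, 50000, rng=np.random.default_rng(0))
print(chain.mean(), chain.var())   # should be close to 0 and 1
```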
Spline methods for approximating quantile functions and generating random samples
NASA Technical Reports Server (NTRS)
Schiess, J. R.; Matthews, C. G.
1985-01-01
Two cubic spline formulations are presented for representing the quantile function (inverse cumulative distribution function) of a random sample of data. Both B-spline and rational spline approximations are compared with analytic representations of the quantile function. It is also shown how these representations can be used to generate random samples for use in simulation studies. Comparisons are made on samples generated from known distributions and a sample of experimental data. The spline representations are more accurate for multimodal and skewed samples and require much less time to generate samples than the analytic representation.
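The generation principle described above can be sketched with a simplified linear-interpolation stand-in for the cubic-spline quantile representations (not the B-spline or rational-spline forms of the report): evaluate the approximate quantile function at uniform random deviates.

```python
import numpy as np

def empirical_quantile_sampler(data, n_new, rng=None):
    """Approximate the quantile function by interpolating the sorted sample
    against its plotting positions, then push uniform deviates through it."""
    rng = np.random.default_rng() if rng is None else rng
    sorted_data = np.sort(data)
    probs = (np.arange(1, len(data) + 1) - 0.5) / len(data)   # plotting positions
    u = rng.random(n_new)
    return np.interp(u, probs, sorted_data)                   # inverse-CDF sampling

original = np.random.default_rng(5).gamma(shape=2.0, scale=1.5, size=500)  # toy skewed sample
generated = empirical_quantile_sampler(original, 1000, rng=np.random.default_rng(6))
print(original.mean(), generated.mean())
```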
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
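A rough sketch of the inverse-probability resampling idea behind the proposed corrections is shown below; it is a plain weighted bootstrap under assumed selection probabilities, not the authors' stochastic inverse-probability oversampling or parametric inverse-probability bagging procedures:

```python
import numpy as np

def inverse_probability_resample(X, y, selection_prob, n_out=None, rng=None):
    """Resample a selection-biased training set with weights proportional to
    1 / P(selected), so the resampled data better resembles the target population."""
    rng = np.random.default_rng() if rng is None else rng
    n_out = len(y) if n_out is None else n_out
    weights = 1.0 / np.asarray(selection_prob)
    weights /= weights.sum()
    idx = rng.choice(len(y), size=n_out, replace=True, p=weights)
    return X[idx], y[idx]

# Hypothetical two-phase design: cases sampled with probability 1.0, controls with 0.1
X = np.random.default_rng(0).normal(size=(200, 3))
y = np.array([1] * 100 + [0] * 100)
p_sel = np.where(y == 1, 1.0, 0.1)
X_rs, y_rs = inverse_probability_resample(X, y, p_sel, rng=np.random.default_rng(1))
print(y_rs.mean())   # case fraction shrinks toward the population rate
```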
Melvin, Neal R; Poda, Daniel; Sutherland, Robert J
2007-10-01
When properly applied, stereology is a very robust and efficient method to quantify a variety of parameters from biological material. A common sampling strategy in stereology is systematic random sampling, which involves choosing a random start point outside the structure of interest and sampling relevant objects at sites placed at pre-determined, equidistant intervals. This has proven to be a very efficient sampling strategy and is used widely in stereological designs. At the microscopic level, this is most often achieved through the use of a motorized stage that facilitates the systematic random stepping across the structure of interest. Here, we report a simple, precise and cost-effective software-based alternative for accomplishing systematic random sampling under the microscope. We believe that this approach will facilitate the use of stereological designs that employ systematic random sampling in laboratories that lack the resources to acquire costly, fully automated systems.
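A hedged sketch of the sampling scheme itself (not of the reported software): a random start point inside the first interval followed by sites at equidistant intervals across the region of interest. The extents and step size are illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

def systematic_random_sites(extent_x, extent_y, step):
    """Sampling sites on a regular grid with a random offset (systematic random sampling)."""
    x0 = rng.uniform(0, step)            # random start within the first interval
    y0 = rng.uniform(0, step)
    xs = np.arange(x0, extent_x, step)   # equidistant sites from the random start
    ys = np.arange(y0, extent_y, step)
    return [(x, y) for x in xs for y in ys]

# E.g. a 2000 x 1500 micrometre region stepped every 250 micrometres (illustrative values).
sites = systematic_random_sites(extent_x=2000.0, extent_y=1500.0, step=250.0)
print(len(sites), sites[:3])
```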
Ng, Ding-Quan; Liu, Shu-Wei; Lin, Yi-Pin
2018-09-15
In this study, a sampling campaign comprising nine sampling events investigating lead in drinking water was conducted at 7 sampling locations in an old building on the National Taiwan University campus with lead pipes still in service in part of the building. This study aims to assess the effectiveness of four different sampling methods, namely first draw sampling, sequential sampling, random daytime sampling and flush sampling, in detecting lead contamination. In 3 out of the 7 sampling locations without lead pipes, lead could not be detected (<1.1 μg/L) in most samples regardless of the sampling method. On the other hand, in the 4 sampling locations where lead pipes still existed, total lead concentrations >10 μg/L were consistently observed in 3 locations using any of the four sampling methods, while the remaining location was identified as contaminated using sequential sampling. High lead levels were consistently measured by the four sampling methods in the 3 locations in which particulate lead was either predominant or comparable to soluble lead. Compared with first draw and random daytime sampling, flush sampling tended to reduce total lead in samples from lead-contaminated sites, but the extent of lead reduction was location-dependent and did not depend on flush durations between 5 and 10 min. Overall, first draw sampling and random daytime sampling were reliable and effective in determining lead contamination in this study. Flush sampling could reveal the contamination if its extent is severe but tends to underestimate lead exposure risk. Copyright © 2018 Elsevier B.V. All rights reserved.
A new mosaic method for three-dimensional surface
NASA Astrophysics Data System (ADS)
Yuan, Yun; Zhu, Zhaokun; Ding, Yongjun
2011-08-01
Three-dimensional (3-D) data mosaicking is an indispensable step in surface measurement and digital terrain map generation. To address the problem of mosaicking local unorganized point clouds with only coarse registration and many mismatched points, a new RANSAC-based mosaic method for 3-D surfaces is proposed. Each iteration of the method proceeds sequentially through random sampling with an additional shape constraint, data normalization of the point clouds, absolute orientation, data denormalization, inlier counting, etc. After N random sampling trials the largest consensus set is selected, and finally the model is re-estimated using all the points in the selected subset. The minimal subset consists of three non-collinear points forming a triangle, and the shape of the triangle is taken into account during random sample selection to keep the selection reasonable. A new coordinate system transformation algorithm presented in this paper is used to avoid singularity: the whole rotation between the two coordinate systems is solved by two successive rotations expressed by Euler angle vectors, each with an explicit physical meaning. Both simulated and real data are used to verify the correctness and validity of the mosaic method. The method has better noise immunity owing to its robust estimation property, and high accuracy because the shape constraint is added to the random sampling and data normalization is added to the absolute orientation. It is applicable to high-precision measurement of three-dimensional surfaces and to 3-D terrain mosaicking.
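The sketch below is a generic RANSAC loop in the spirit of the method described, assuming a Kabsch-style absolute orientation for minimal three-point subsets; the triangle shape constraint and the data normalization/denormalization steps of the paper are only noted in comments, not implemented, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def rigid_fit(src, dst):
    """Least-squares rotation R and translation t such that dst is close to R @ src + t (Kabsch)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst.mean(0) - R @ src.mean(0)

def ransac_rigid(src, dst, n_trials=500, tol=0.05):
    """RANSAC over minimal 3-point subsets; refit on the largest consensus set.

    A fuller implementation would reject near-collinear (badly shaped) triangles
    and normalize the point clouds before the absolute orientation step."""
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_trials):
        idx = rng.choice(len(src), size=3, replace=False)          # minimal subset
        R, t = rigid_fit(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return rigid_fit(src[best_inliers], dst[best_inliers])          # re-estimate on all inliers
```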
[Theory, method and application of method R on estimation of (co)variance components].
Liu, Wen-Zhong
2004-07-01
The theory, method and application of Method R for the estimation of (co)variance components were reviewed so that the method can be used appropriately. Estimation requires R values, which are regressions of predicted random effects calculated from the complete dataset on predicted random effects calculated from random subsets of the same data. By using a multivariate iteration algorithm based on a transformation matrix, combined with the preconditioned conjugate gradient method to solve the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and the sampling errors and approximate credible intervals of the estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data and biased estimates in small datasets. As an alternative method, Method R can be used on larger datasets. It is necessary to study its theoretical properties and broaden its application range further.
Random sampling and validation of covariance matrices of resonance parameters
NASA Astrophysics Data System (ADS)
Plevnik, Lucijan; Zerovnik, Gašper
2017-09-01
Analytically exact methods for random sampling of arbitrary correlated parameters are presented. Emphasis is given, on one hand, to possible inconsistencies in the covariance data, concentrating on positive semi-definiteness and consistent sampling of correlated inherently positive parameters, and, on the other hand, to optimization of the implementation of the methods themselves. The methods have been applied in the program ENDSAM, written in Fortran, which, from a nuclear data library file for a chosen isotope in ENDF-6 format, produces an arbitrary number of new ENDF-6 files in which the original resonance parameter values are replaced by random samples drawn in accordance with the corresponding covariance matrices. The source code for the program ENDSAM is available from the OECD/NEA Data Bank. The program works in the following steps: it reads resonance parameters and their covariance data from the nuclear data library, checks whether the covariance data are consistent, and produces random samples of the resonance parameters. The code has been validated with both realistic and artificial data to show that the produced samples are statistically consistent. Additionally, the code was used to validate covariance data in existing nuclear data libraries. A list of inconsistencies observed in covariance data of resonance parameters in ENDF-VII.1, JEFF-3.2 and JENDL-4.0 is presented. For now, the work has been limited to resonance parameters; however, the methods presented are general and can in principle be extended to the sampling and validation of any nuclear data.
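A hedged sketch of the core sampling step only, assuming a plain multivariate normal model: symmetry is enforced, positive semi-definiteness is checked, slightly negative eigenvalues are clipped, and correlated samples are drawn from the eigendecomposition. ENDF-6 handling and the treatment of inherently positive parameters are not shown; the numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_correlated(mean, cov, n_samples, tol=1e-10):
    """Draw multivariate normal samples after checking the covariance matrix."""
    cov = 0.5 * (cov + cov.T)                       # enforce symmetry
    vals, vecs = np.linalg.eigh(cov)
    if vals.min() < -tol:
        raise ValueError("covariance matrix is not positive semi-definite")
    vals = np.clip(vals, 0.0, None)                 # clip tiny negative eigenvalues
    L = vecs * np.sqrt(vals)                        # factor with L @ L.T == cov
    z = rng.standard_normal((n_samples, len(mean)))
    return mean + z @ L.T

# Two positively correlated resonance-like parameters (illustrative values only).
mean = np.array([1.0, 250.0])
cov = np.array([[0.01, 0.4],
                [0.4, 25.0]])
samples = sample_correlated(mean, cov, n_samples=10_000)
print(np.corrcoef(samples.T))                       # off-diagonal should be near 0.8
```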
NASA Astrophysics Data System (ADS)
Li, Hongzhi; Min, Donghong; Liu, Yusong; Yang, Wei
2007-09-01
To overcome possible pseudoergodicity problems, molecular dynamics simulation can be accelerated via the realization of an energy space random walk. To achieve this, a biased free energy function (BFEF) needs to be obtained a priori. Although the quality of the BFEF is essential for sampling efficiency, its generation is usually tedious and nontrivial. In this work, we present an energy space metadynamics algorithm to efficiently and robustly obtain BFEFs. Moreover, in order to deal with the associated diffusion sampling problem caused by the random walk in the total energy space, the idea in the original umbrella sampling method is generalized to a random walk in the essential energy space, which only includes the energy terms determining the conformation of a region of interest. This essential energy space generalization allows the realization of efficient localized enhanced sampling and also offers the possibility of further improving sampling efficiency when high-frequency energy terms irrelevant to the target events are free of activation. The energy space metadynamics method and its generalization to the essential energy space for molecular dynamics acceleration are demonstrated in the simulation of a pentane-like system, the blocked alanine dipeptide model, and the leucine model.
Flexible sampling large-scale social networks by self-adjustable random walk
NASA Astrophysics Data System (ADS)
Xu, Xiao-Ke; Zhu, Jonathan J. H.
2016-12-01
Online social networks (OSNs) have become an increasingly attractive gold mine for academic and commercial researchers. However, research on OSNs faces a number of difficult challenges. One bottleneck lies in the massive quantity and frequent unavailability of OSN population data, so sampling perhaps becomes the only feasible solution. How to draw samples that represent the underlying OSNs has remained a formidable task for a number of conceptual and methodological reasons. In particular, most empirically driven studies on network sampling are confined to simulated data or sub-graph data, which are fundamentally different from real and complete-graph OSNs. In the current study, we propose a flexible sampling method, called Self-Adjustable Random Walk (SARW), and test it against the population data of a real large-scale OSN. We evaluate the strengths of the sampling method in comparison with four prevailing methods, including uniform, breadth-first search (BFS), random walk (RW), and revised RW (i.e., MHRW) sampling. We mix both induced-edge and external-edge information of sampled nodes together in the same sampling process. Our results show that the SARW sampling method is able to generate unbiased samples of OSNs with maximal precision and minimal cost. The study is helpful for the practice of OSN research by providing a highly needed sampling tool, for the methodological development of large-scale network sampling through comparative evaluation of existing sampling methods, and for the theoretical understanding of human networks by highlighting discrepancies and contradictions between existing knowledge/assumptions and large-scale real OSN data.
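For orientation only, the sketch below implements a plain random-walk node sampler on an adjacency-list graph; it does not implement the self-adjustment of SARW, and the toy ring-with-chords graph is an assumption for demonstration.

```python
import random

random.seed(5)

def random_walk_sample(adj, n_nodes, start=None):
    """Collect n_nodes distinct nodes by moving to a uniformly random neighbour each step."""
    current = start if start is not None else random.choice(list(adj))
    visited = {current}
    while len(visited) < n_nodes:
        current = random.choice(adj[current])
        visited.add(current)
    return visited

# Toy undirected graph: a ring of 100 nodes plus a few random chords.
adj = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
for _ in range(50):
    a, b = random.randrange(100), random.randrange(100)
    if a != b and b not in adj[a]:
        adj[a].append(b)
        adj[b].append(a)

print(sorted(random_walk_sample(adj, n_nodes=20)))
```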
A study of active learning methods for named entity recognition in clinical text.
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
2015-12-01
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because domain experts are needed for annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying concepts of medical problems, treatments, and lab tests in clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. Learning curves that plot the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to evaluate the different active learning methods and the passive learning method, and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC, and most diversity-based methods also performed better than random sampling. To achieve an F-measure of 0.80, the best uncertainty-sampling method could save 66% of the sentence annotations compared with random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison with random sampling, the best uncertainty-based method saved 42% of the word annotations, whereas the best diversity-based method reduced the annotation effort by only 7%. In the simulated setting, AL methods, particularly uncertainty-sampling approaches, appeared to save substantial annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting. Copyright © 2015 Elsevier Inc. All rights reserved.
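A hedged sketch of the uncertainty-sampling loop on a generic classification pool, using scikit-learn logistic regression and a least-confidence criterion; it is not the sequence-labeling NER setup of the study, and the synthetic dataset, seed-set size, and batch size are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(rng.choice(len(X), size=20, replace=False))      # small seed set
pool = [i for i in range(len(X)) if i not in set(labeled)]      # unlabeled pool

for _ in range(10):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    # Least confidence: query the instances whose top class probability is smallest.
    uncertainty = 1.0 - proba.max(axis=1)
    query = np.argsort(uncertainty)[-10:]                       # 10 most uncertain
    for q in sorted(query, reverse=True):                       # pop from the back first
        labeled.append(pool.pop(q))                             # "annotate" and add to labeled set

print("labeled examples after 10 rounds:", len(labeled))
```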
Accelerated 1 H MRSI using randomly undersampled spiral-based k-space trajectories.
Chatnuntawech, Itthi; Gagoski, Borjan; Bilgic, Berkin; Cauley, Stephen F; Setsompop, Kawin; Adalsteinsson, Elfar
2014-07-30
To develop and evaluate the performance of an acquisition and reconstruction method for accelerated MR spectroscopic imaging (MRSI) through undersampling of spiral trajectories. A randomly undersampled spiral acquisition and sensitivity encoding (SENSE) with total variation (TV) regularization, random SENSE+TV, is developed and evaluated on single-slice numerical phantom, in vivo single-slice MRSI, and in vivo three-dimensional (3D)-MRSI at 3 Tesla. Random SENSE+TV was compared with five alternative methods for accelerated MRSI. For the in vivo single-slice MRSI, random SENSE+TV yields up to 2.7 and 2 times reduction in root-mean-square error (RMSE) of reconstructed N-acetyl aspartate (NAA), creatine, and choline maps, compared with the denoised fully sampled and uniformly undersampled SENSE+TV methods with the same acquisition time, respectively. For the in vivo 3D-MRSI, random SENSE+TV yields up to 1.6 times reduction in RMSE, compared with uniform SENSE+TV. Furthermore, by using random SENSE+TV, we have demonstrated on the in vivo single-slice and 3D-MRSI that acceleration factors of 4.5 and 4 are achievable with the same quality as the fully sampled data, as measured by RMSE of reconstructed NAA map, respectively. With the same scan time, random SENSE+TV yields lower RMSEs of metabolite maps than other methods evaluated. Random SENSE+TV achieves up to 4.5-fold acceleration with comparable data quality as the fully sampled acquisition. Magn Reson Med, 2014. © 2014 Wiley Periodicals, Inc.
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types, but owing to the large number of uncharacterized protein sequences in databases, these methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced and large datasets are often handled well by decision tree classifiers. Since the datasets used here are imbalanced, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, whereas classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
Performance of Random Effects Model Estimators under Complex Sampling Designs
ERIC Educational Resources Information Center
Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan
2011-01-01
In this article, we consider estimation of parameters of random effects models from samples collected via complex multistage designs. Incorporation of sampling weights is one way to reduce estimation bias due to unequal probabilities of selection. Several weighting methods have been proposed in the literature for estimating the parameters of…
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
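A minimal sketch of the equivalent-ensemble idea on a synthetic signal: one long time history is segmented into equal, contiguous sample records, and the record-wise means and variances are inspected for time invariance. The signal model and record count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def equivalent_ensemble_stats(signal, n_records):
    """Mean and variance of each of n_records equal, contiguous segments.

    For a weakly stationary process these statistics should show only sampling
    scatter, with no systematic trend from record to record."""
    records = np.array_split(signal, n_records)
    means = np.array([r.mean() for r in records])
    variances = np.array([r.var(ddof=1) for r in records])
    return means, variances

# Illustrative stand-in for a turbulent velocity signal with a stationary mean.
signal = rng.standard_normal(200_000) + 5.0
means, variances = equivalent_ensemble_stats(signal, n_records=20)
print(means.round(3))
print(variances.round(3))
```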
Restricted random search method based on taboo search in the multiple minima problem
NASA Astrophysics Data System (ADS)
Hong, Seung Do; Jhon, Mu Shik
1997-03-01
The restricted random search method is proposed as a simple Monte Carlo sampling method to rapidly search for minima in the multiple-minima problem. The method is based on taboo search, recently applied to continuous test functions. The concept of a taboo region, instead of a taboo list, is used, so sampling of regions near previously visited configurations is restricted. The method is applied to two-dimensional test functions and to argon clusters, and it is found to be a practical and efficient way to find near-global configurations of both.
Random phase detection in multidimensional NMR.
Maciejewski, Mark W; Fenwick, Matthew; Schuyler, Adam D; Stern, Alan S; Gorbatyuk, Vitaliy; Hoch, Jeffrey C
2011-10-04
Despite advances in resolution accompanying the development of high-field superconducting magnets, biomolecular applications of NMR require multiple dimensions in order to resolve individual resonances, and the achievable resolution is typically limited by practical constraints on measuring time. In addition to the need for measuring long evolution times to obtain high resolution, the need to distinguish the sign of the frequency constrains the ability to shorten measuring times. Sign discrimination is typically accomplished by sampling the signal with two different receiver phases or by selecting a reference frequency outside the range of frequencies spanned by the signal and then sampling at a higher rate. In the parametrically sampled (indirect) time dimensions of multidimensional NMR experiments, either method imposes an additional factor of 2 sampling burden for each dimension. We demonstrate that by using a single detector phase at each time sample point, but randomly altering the phase for different points, the sign ambiguity that attends fixed single-phase detection is resolved. Random phase detection enables a reduction in experiment time by a factor of 2 for each indirect dimension, amounting to a factor of 8 for a four-dimensional experiment, albeit at the cost of introducing sampling artifacts. Alternatively, for fixed measuring time, random phase detection can be used to double resolution in each indirect dimension. Random phase detection is complementary to nonuniform sampling methods, and their combination offers the potential for additional benefits. In addition to applications in biomolecular NMR, random phase detection could be useful in magnetic resonance imaging and other signal processing contexts.
Huang, Lei
2015-01-01
To address the problem that conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments, and unknown time-varying estimators of the observation noise are used to obtain its estimated mean and variance. Using robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy, so the required sample size is reduced. It can be applied to gyro random noise modeling applications in which a fast and accurate ARMA modeling method is required. PMID:26437409
Sparse sampling and reconstruction for electron and scanning probe microscope imaging
Anderson, Hyrum; Helms, Jovana; Wheeler, Jason W.; Larson, Kurt W.; Rohrer, Brandon R.
2015-07-28
Systems and methods for conducting electron or scanning probe microscopy are provided herein. In a general embodiment, the systems and methods for conducting electron or scanning probe microscopy with an undersampled data set include: driving an electron beam or probe to scan across a sample and visit a subset of pixel locations of the sample that are randomly or pseudo-randomly designated; determining actual pixel locations on the sample that are visited by the electron beam or probe; and processing data collected by detectors from the visits of the electron beam or probe at the actual pixel locations and recovering a reconstructed image of the sample.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yi; Jakeman, John; Gittelson, Claude
2015-01-08
In this paper we present a localized polynomial chaos expansion for partial differential equations (PDE) with random inputs. In particular, we focus on time-independent linear stochastic problems with high-dimensional random inputs, where the traditional polynomial chaos methods, and most of the existing methods, incur prohibitively high simulation cost. The local polynomial chaos method employs a domain decomposition technique to approximate the stochastic solution locally. In each subdomain, a subdomain problem is solved independently and, more importantly, in a much lower dimensional random space. In a postprocessing stage, accurate samples of the original stochastic problems are obtained from the samples of the local solutions by enforcing the correct stochastic structure of the random inputs and the coupling conditions at the interfaces of the subdomains. Overall, the method is able to solve stochastic PDEs in very large dimensions by solving a collection of low-dimensional local problems and can be highly efficient. We present the general mathematical framework of the methodology and use numerical examples to demonstrate the properties of the method.
Gender Differentiation in the New York "Times": 1885 and 1985.
ERIC Educational Resources Information Center
Jolliffe, Lee
A study examined the descriptive language and sex-linked roles ascribed to women and men in articles of the New York "Times" from 1885 and 1985. Seven content analysis methods were applied to four random samples from the "Times"; one sample each for women and men from both years. Samples were drawn using randomly constructed…
ERIC Educational Resources Information Center
Felce, David; Perry, Jonathan
2004-01-01
Background: The aims were to: (i) explore the association between age and size of setting and staffing per resident; and (ii) report resident and setting characteristics, and indicators of service process and resident activity for a national random sample of staffed housing provision. Methods: Sixty settings were selected randomly from those…
The Bootstrap, the Jackknife, and the Randomization Test: A Sampling Taxonomy.
Rodgers, J L
1999-10-01
A simple sampling taxonomy is defined that shows the differences between and relationships among the bootstrap, the jackknife, and the randomization test. Each method has as its goal the creation of an empirical sampling distribution that can be used to test statistical hypotheses, estimate standard errors, and/or create confidence intervals. Distinctions between the methods can be made based on the sampling approach (with replacement versus without replacement) and the sample size (replacing the whole original sample versus replacing a subset of the original sample). The taxonomy is useful for teaching the goals and purposes of resampling schemes. An extension of the taxonomy implies other possible resampling approaches that have not previously been considered. Univariate and multivariate examples are presented.
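A hedged sketch contrasting the three resampling schemes of the taxonomy on synthetic data: bootstrap resampling with replacement for a standard error, leave-one-out jackknife, and a label-permutation randomization test for a two-group mean difference. Sample sizes and distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

x = rng.normal(10.0, 2.0, size=30)
y = rng.normal(11.0, 2.0, size=30)

# Bootstrap: resample the whole sample WITH replacement to estimate a standard error.
boot_means = np.array([rng.choice(x, size=len(x), replace=True).mean()
                       for _ in range(2000)])
boot_se = boot_means.std(ddof=1)

# Jackknife: leave one observation out at a time (subsets taken WITHOUT replacement).
jack_means = np.array([np.delete(x, i).mean() for i in range(len(x))])
jack_se = np.sqrt((len(x) - 1) / len(x) *
                  ((jack_means - jack_means.mean()) ** 2).sum())

# Randomization test: permute the group labels to build the null distribution
# of the mean difference and compute a two-sided p-value.
observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])
null = []
for _ in range(2000):
    perm = rng.permutation(pooled)
    null.append(perm[len(x):].mean() - perm[:len(x)].mean())
p_value = (np.abs(np.array(null)) >= abs(observed)).mean()

print(boot_se, jack_se, p_value)
```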
Crampin, A C; Mwinuka, V; Malema, S S; Glynn, J R; Fine, P E
2001-01-01
Selection bias, particularly of controls, is common in case-control studies and may materially affect the results. Methods of control selection should be tailored both for the risk factors and disease under investigation and for the population being studied. We present here a control selection method devised for a case-control study of tuberculosis in rural Africa (Karonga, northern Malawi) that selects an age/sex frequency-matched random sample of the population, with a geographical distribution in proportion to the population density. We also present an audit of the selection process, and discuss the potential of this method in other settings.
Randomized subspace-based robust principal component analysis for hyperspectral anomaly detection
NASA Astrophysics Data System (ADS)
Sun, Weiwei; Yang, Gang; Li, Jialin; Zhang, Dianfa
2018-01-01
A randomized subspace-based robust principal component analysis (RSRPCA) method for anomaly detection in hyperspectral imagery (HSI) is proposed. RSRPCA combines the advantages of randomized column subspace selection and robust principal component analysis (RPCA). It assumes that the background has low-rank properties and that the anomalies are sparse and do not lie in the column subspace of the background. First, RSRPCA implements random sampling to sketch the original HSI dataset from its columns and to construct a randomized column subspace of the background. Structured random projections are also adopted to sketch the HSI dataset from its rows. Sketching from columns and rows greatly reduces the computational requirements of RSRPCA. Second, RSRPCA adopts columnwise RPCA (CWRPCA) to eliminate the negative effects of sampled anomaly pixels and to purify the previously constructed randomized column subspace by removing sampled anomaly columns. The CWRPCA decomposes the submatrix of the HSI data into a low-rank matrix (the background component), a noisy matrix (the noise component), and a sparse anomaly matrix (the anomaly component) with only a small proportion of nonzero columns. The inexact augmented Lagrange multiplier algorithm is utilized to optimize the CWRPCA problem and estimate the sparse matrix. Nonzero columns of the sparse anomaly matrix point to sampled anomaly columns in the submatrix. Third, all pixels are projected onto the complementary subspace of the purified randomized column subspace of the background, and the anomaly pixels in the original HSI data are finally located exactly. Several experiments on three real hyperspectral images are carefully designed to investigate the detection performance of RSRPCA, and the results are compared with four state-of-the-art methods. Experimental results show that the proposed RSRPCA outperforms the four comparison methods both in detection performance and in computational time.
Detection and monitoring of invasive exotic plants: a comparison of four sampling methods
Cynthia D. Huebner
2007-01-01
The ability to detect and monitor exotic invasive plants is likely to vary depending on the sampling method employed. Methods with strong qualitative thoroughness for species detection often lack the intensity necessary to monitor vegetation change. Four sampling methods (systematic plot, stratified-random plot, modified Whittaker, and timed meander) in hemlock and red...
Balancing a U-Shaped Assembly Line by Applying Nested Partitions Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhagwat, Nikhil V.
2005-01-01
In this study, we applied the Nested Partitions method to a U-line balancing problem and conducted experiments to evaluate the application. From the results, it is quite evident that the Nested Partitions method provided near-optimal solutions (optimal in some cases), and the execution time is quite short compared to the Branch and Bound algorithm. However, for larger data sets, the algorithm took significantly longer to execute. One of the reasons could be the way in which the random samples are generated. In the present study, a random sample is a solution in itself, which requires assignment of tasks to various stations. The time taken to assign tasks to stations is directly proportional to the number of tasks; thus, if the number of tasks increases, the time taken to generate random samples for the different regions also increases. The performance index for the Nested Partitions method in the present study was the number of stations in the random solutions (samples) generated. The total idle time for the samples can be used as another performance index. The ULINO method is known to have used a combination of bounds to come up with good solutions, and this approach of combining different performance indices can be used to evaluate the random samples and obtain even better solutions. Here, we used deterministic time values for the tasks; in industries where the majority of tasks are performed manually, the stochastic version of the problem could be of vital importance. Experimenting with different objective functions (the number of stations was used in this study) could be significant for industries in which the cost associated with creating a new station is not the same, since for such industries the results obtained using the present approach will not be of much value. Labor costs, task incompletion costs, or a combination of these can be effectively used as alternative objective functions.
An evaluation of flow-stratified sampling for estimating suspended sediment loads
Robert B. Thomas; Jack Lewis
1995-01-01
Flow-stratified sampling is a new method for sampling water quality constituents such as suspended sediment to estimate loads. As with selection-at-list-time (SALT) and time-stratified sampling, flow-stratified sampling is a statistical method requiring random sampling, and yielding unbiased estimates of load and variance. It can be used to estimate event...
True Randomness from Big Data.
Papakonstantinou, Periklis A; Woodruff, David P; Yang, Guang
2016-09-26
Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.
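The provable extractor of the paper is not reproduced here; as a hedged illustration of the simpler classical idea of turning biased bits into unbiased ones, the sketch below applies von Neumann debiasing to a synthetic biased bit stream (the bias value is illustrative).

```python
import numpy as np

rng = np.random.default_rng(9)

def von_neumann_extract(bits):
    """Classical von Neumann debiasing on non-overlapping bit pairs.

    Emit 0 for the pair (0, 1), 1 for (1, 0), and discard (0, 0) and (1, 1).
    The output is unbiased whenever the input bits are independent, even if
    each bit is strongly biased."""
    out = [int(a) for a, b in zip(bits[0::2], bits[1::2]) if a != b]
    return np.array(out, dtype=np.uint8)

biased = (rng.random(100_000) < 0.8).astype(np.uint8)   # roughly 80% ones
unbiased = von_neumann_extract(biased)
print(biased.mean(), unbiased.mean(), len(unbiased))    # output mean near 0.5
```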
Latin Hypercube Sampling (LHS) UNIX Library/Standalone
DOE Office of Scientific and Technical Information (OSTI.GOV)
2004-05-13
The LHS UNIX Library/Standalone software provides the capability to draw random samples from over 30 distribution types. It performs the sampling by a stratified sampling method called Latin Hypercube Sampling (LHS). Multiple distributions can be sampled simultaneously, with user-specified correlations amongst the input distributions, so LHS UNIX Library/Standalone provides a way to generate multivariate samples. The LHS samples can be generated either from a callable library (e.g., from within the DAKOTA software framework) or as a standalone capability. LHS is a constrained Monte Carlo sampling scheme. In LHS, the range of each variable is divided into non-overlapping intervals on the basis of equal probability, and a sample is selected at random with respect to the probability density in each interval. If multiple variables are sampled simultaneously, the values obtained for each are paired in a random manner with the n values of the other variables. In some cases, the pairing is restricted to obtain specified correlations amongst the input variables. Many simulation codes have input parameters that are uncertain and can be specified by a distribution. To perform uncertainty analysis and sensitivity analysis, random values are drawn from the input parameter distributions, and the simulation is run with these values to obtain output values. If this is done repeatedly, with many input samples drawn, one can build up a distribution of the output as well as examine correlations between input and output variables.
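A minimal LHS sketch, assuming equal-probability strata, one uniform draw per stratum, and independent random permutations to pair variables; the correlation-restricted pairing described above is not implemented, and the normal inverse-CDF mapping at the end is just one example of transforming the uniform sample to a target distribution.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)

def latin_hypercube(n_samples, n_vars):
    """Basic Latin Hypercube Sample on the unit hypercube, shape (n_samples, n_vars).

    Each variable's [0, 1] range is split into n_samples equal-probability
    intervals, one value is drawn uniformly inside each interval, and values
    are paired across variables by independent random permutations."""
    cut = (np.arange(n_samples) + rng.random((n_vars, n_samples))) / n_samples
    for row in cut:
        rng.shuffle(row)              # random pairing between variables
    return cut.T

u = latin_hypercube(n_samples=10, n_vars=3)
x = norm.ppf(u, loc=0.0, scale=1.0)   # map onto standard normal inputs via the inverse CDF
print(u.round(3))
```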
Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.
Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer
2017-01-01
An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.
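As a hedged numerical example of the sample-size inflation mentioned above, the helper below applies the standard design effect 1 + (m - 1) * ICC, assuming equal cluster sizes and a known intraclass correlation; the function name and the input numbers are illustrative.

```python
import math

def cluster_randomized_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size by the design effect.

    design_effect = 1 + (m - 1) * ICC, with m the (assumed equal) cluster size.
    Returns the required number of participants per arm and the implied number
    of clusters per arm."""
    design_effect = 1.0 + (cluster_size - 1) * icc
    n_total = n_individual * design_effect
    return math.ceil(n_total), math.ceil(n_total / cluster_size)

# 128 students per arm would suffice with individual randomization; with classes
# of 25 students and ICC = 0.05 the requirement grows to 282 students (12 classes).
print(cluster_randomized_sample_size(n_individual=128, cluster_size=25, icc=0.05))
```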
Investigating Test Equating Methods in Small Samples through Various Factors
ERIC Educational Resources Information Center
Asiret, Semih; Sünbül, Seçil Ömür
2016-01-01
In this study, the aim was to compare equating methods for the random groups design in small samples across factors such as sample size, the difference in difficulty between forms, and the guessing parameter. Moreover, which method gives better results under which conditions was also investigated. In this study, 5,000 dichotomous simulated data…
Macrostructure from Microstructure: Generating Whole Systems from Ego Networks
Smith, Jeffrey A.
2014-01-01
This paper presents a new simulation method for making global network inference from sampled data. The proposed method takes sampled ego network data and uses Exponential Random Graph Models (ERGM) to reconstruct the features of the true, unknown network. After describing the method, the paper presents two validity checks of the approach: the first uses the 20 largest Add Health networks, while the second uses the Sociology Coauthorship network in the 1990s. For each test, I take random ego network samples from the known networks and use my method to make global network inference. I find that my method successfully reproduces the properties of the networks, such as distance and main component size. The results also suggest that simpler, baseline models provide considerably worse estimates for most network properties. I end the paper by discussing the bounds/limitations of ego network sampling. I also discuss possible extensions to the proposed approach. PMID:25339783
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for the associated sample design decisions and are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or, in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, the number of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Applying Active Learning to Assertion Classification of Concepts in Clinical Text
Chen, Yukun; Mani, Subramani; Xu, Hua
2012-01-01
Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC – 0.7715) than the passive learning method (random sampling) (ALC – 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105
Model's sparse representation based on reduced mixed GMsFE basis methods
NASA Astrophysics Data System (ADS)
Jiang, Lijian; Li, Qiuqi
2017-06-01
In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application of such elliptic PDEs is flow in heterogeneous random porous media. The mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem on a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number of degrees of freedom for the mixed GMsFEM and substantially impacts the computational efficiency. To overcome this difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than that of the original full-order model, the online computation still depends on the number of coarse degrees of freedom. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of the parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.
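A hedged sketch of one ingredient only, the proper orthogonal decomposition of a snapshot matrix via the SVD; the mixed GMsFE basis construction, the greedy sampling, and the sparse tensor approximation are not shown, and the snapshot model below is an artificial stand-in.

```python
import numpy as np

rng = np.random.default_rng(11)

def pod_basis(snapshots, energy=0.999):
    """POD/reduced basis from a snapshot matrix (one solution sample per column).

    Keeps the smallest number of left singular vectors capturing the requested
    fraction of the total squared singular-value energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return U[:, :rank], s

# Artificial snapshots: smooth 1-D fields driven by three random parameters.
grid = np.linspace(0.0, 1.0, 200)
params = rng.random((3, 50))                       # 50 parameter samples
snapshots = np.stack([np.sin(np.pi * grid * (1.0 + p[0])) * p[1] + p[2]
                      for p in params.T], axis=1)  # shape (200, 50)
basis, s = pod_basis(snapshots)
print(basis.shape)                                 # (200, small rank)
```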
GEOSTATISTICAL SAMPLING DESIGNS FOR HAZARDOUS WASTE SITES
This chapter discusses field sampling design for environmental sites and hazardous waste sites with respect to random variable sampling theory, Gy's sampling theory, and geostatistical (kriging) sampling theory. The literature often presents these sampling methods as an adversari...
Sampling Strategies and Processing of Biobank Tissue Samples from Porcine Biomedical Models.
Blutke, Andreas; Wanke, Rüdiger
2018-03-06
In translational medical research, porcine models have steadily become more popular. Considering the high value of individual animals, particularly of genetically modified pig models, and the often-limited number of available animals of these models, establishment of (biobank) collections of adequately processed tissue samples suited for a broad spectrum of subsequent analysis methods, including analyses not specified at the time of sampling, represents a meaningful approach to take full advantage of the translational value of the model. With respect to the peculiarities of porcine anatomy, comprehensive guidelines have recently been established for standardized generation of representative, high-quality samples from different porcine organs and tissues. These guidelines are essential prerequisites for the reproducibility of results and their comparability between different studies and investigators. The recording of basic data, such as organ weights and volumes, the determination of the sampling locations and of the numbers of tissue samples to be generated, as well as their orientation, size, processing and trimming directions, are relevant factors determining the generalizability and usability of the specimens for molecular, qualitative, and quantitative morphological analyses. Here, an illustrative, practical, step-by-step demonstration of the most important techniques for generation of representative, multi-purpose biobank specimens from porcine tissues is presented. The methods described here include determination of organ/tissue volumes and densities, the application of a volume-weighted systematic random sampling procedure for parenchymal organs by point-counting, determination of the extent of tissue shrinkage related to histological embedding of samples, and generation of randomly oriented samples for quantitative stereological analyses, such as isotropic uniform random (IUR) sections generated by the "Orientator" and "Isector" methods, and vertical uniform random (VUR) sections.
Computational methods for efficient structural reliability and reliability sensitivity analysis
NASA Technical Reports Server (NTRS)
Wu, Y.-T.
1993-01-01
This paper presents recent developments in efficient structural reliability analysis methods. The paper proposes an efficient, adaptive importance sampling (AIS) method that can be used to compute reliability and reliability sensitivities. The AIS approach uses a sampling density that is proportional to the joint PDF of the random variables. Starting from an initial approximate failure domain, sampling proceeds adaptively and incrementally with the goal of reaching a sampling domain that is slightly greater than the failure domain to minimize over-sampling in the safe region. Several reliability sensitivity coefficients are proposed that can be computed directly and easily from the above AIS-based failure points. These probability sensitivities can be used for identifying key random variables and for adjusting design to achieve reliability-based objectives. The proposed AIS methodology is demonstrated using a turbine blade reliability analysis problem.
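As a minimal, non-adaptive illustration of the importance-sampling idea, the sketch below estimates a small failure probability for a one-dimensional linear limit state by drawing from a density shifted toward the failure region and reweighting; the adaptive, incremental updating of the sampling domain described in the paper is not reproduced, and the limit state is an assumption for demonstration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(12)

# Limit state g(x) < 0 defines failure, with x ~ N(0, 1);
# the exact failure probability is 1 - Phi(4), about 3.2e-5.
def g(x):
    return 4.0 - x

n = 20_000

# Crude Monte Carlo: very few (often zero) samples hit the failure region.
x_mc = rng.standard_normal(n)
p_mc = np.mean(g(x_mc) < 0.0)

# Importance sampling: draw from N(4, 1), centred near the failure boundary,
# and reweight each sample by the ratio of the true PDF to the sampling PDF.
x_is = rng.normal(loc=4.0, scale=1.0, size=n)
weights = norm.pdf(x_is, loc=0.0, scale=1.0) / norm.pdf(x_is, loc=4.0, scale=1.0)
p_is = np.mean((g(x_is) < 0.0) * weights)

print(p_mc, p_is, 1.0 - norm.cdf(4.0))
```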
Sequential time interleaved random equivalent sampling for repetitive signal.
Zhao, Yijiu; Liu, Jingjing
2016-12-01
Compressed sensing (CS) based sampling techniques exhibit many advantages over other existing approaches for sparse signal spectrum sensing; they have also been incorporated into non-uniform sampling signal reconstruction to improve efficiency, as in random equivalent sampling (RES). However, in CS-based RES, only one sample from each acquisition run is considered in the signal reconstruction stage, which results in more acquisition runs and longer sampling time. In this paper, a sampling sequence is taken in each RES acquisition run, and the corresponding block measurement matrix is constructed using the Whittaker-Shannon interpolation formula. All the block matrices are combined into an equivalent measurement matrix with respect to all sampling sequences. We implemented the proposed approach with a multi-core analog-to-digital converter (ADC) whose cores are time-interleaved. A prototype of this proposed CS-based sequential random equivalent sampling method has been developed. It is able to capture an analog waveform at an equivalent sampling rate of 40 GHz while physically sampling at 1 GHz. Experiments indicate that, for a sparse signal, the proposed CS-based sequential random equivalent sampling exhibits high efficiency.
Osborn, Sarah; Zulian, Patrick; Benson, Thomas; ...
2018-01-30
This work describes a domain embedding technique between two nonmatching meshes used for generating realizations of spatially correlated random fields, with applications to large-scale sampling-based uncertainty quantification. The goal is to apply the multilevel Monte Carlo (MLMC) method for the quantification of output uncertainties of PDEs with random input coefficients on general and unstructured computational domains. We propose a highly scalable, hierarchical sampling method to generate realizations of a Gaussian random field on a given unstructured mesh by solving a reaction–diffusion PDE with a stochastic right-hand side. The stochastic PDE is discretized using the mixed finite element method on an embedded domain with a structured mesh, and then the solution is projected onto the unstructured mesh. This work describes implementation details on how to efficiently transfer data between the structured and unstructured meshes at coarse levels, assuming that this can be done efficiently on the finest level. We investigate the efficiency and parallel scalability of the technique for the scalable generation of Gaussian random fields in three dimensions. An application of the MLMC method is presented for quantifying uncertainties of subsurface flow problems. Here, we demonstrate the scalability of the sampling method with nonmatching mesh embedding, coupled with a parallel forward model solver, for large-scale 3D MLMC simulations with up to 1.9×10^9 unknowns.
Exploratory and Confirmatory Analysis of the Trauma Practices Questionnaire
ERIC Educational Resources Information Center
Craig, Carlton D.; Sprang, Ginny
2009-01-01
Objective: The present study provides psychometric data for the Trauma Practices Questionnaire (TPQ). Method: A nationally randomized sample of 2,400 surveys was sent to self-identified trauma treatment specialists, and 711 (29.6%) were returned. Results: An exploratory factor analysis (N = 319) conducted on a randomly split sample (RSS) revealed…
A weighted ℓ₁-minimization approach for sparse polynomial chaos expansions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peng, Ji; Hampton, Jerrad; Doostan, Alireza, E-mail: alireza.doostan@colorado.edu
2014-06-15
This work proposes a method for sparse polynomial chaos (PC) approximation of high-dimensional stochastic functions based on non-adapted random sampling. We modify the standard ℓ₁-minimization algorithm, originally proposed in the context of compressive sampling, using a priori information about the decay of the PC coefficients, when available, and refer to the resulting algorithm as weighted ℓ₁-minimization. We provide conditions under which we may guarantee recovery using this weighted scheme. Numerical tests are used to compare the weighted and non-weighted methods for the recovery of solutions to two differential equations with high-dimensional random inputs: a boundary value problem with a random elliptic operator and a 2-D thermally driven cavity flow with random boundary condition.
Piecewise SALT sampling for estimating suspended sediment yields
Robert B. Thomas
1989-01-01
A probability sampling method called SALT (Selection At List Time) has been developed for collecting and summarizing data on delivery of suspended sediment in rivers. It is based on sampling and estimating yield using a suspended-sediment rating curve for high discharges and simple random sampling for low flows. The method gives unbiased estimates of total yield and...
Sampling estimators of total mill receipts for use in timber product output studies
John P. Brown; Richard G. Oderwald
2012-01-01
Data from the 2001 timber product output study for Georgia was explored to determine new methods for stratifying mills and finding suitable sampling estimators. Estimators for roundwood receipts totals comprised several types: simple random sample, ratio, stratified sample, and combined ratio. Two stratification methods were examined: the Dalenius-Hodges (DH) square...
Bergh, Daniel
2015-01-01
Chi-square statistics are commonly used for tests of fit of measurement models. Chi-square is also sensitive to sample size, which is why several approaches to handle large samples in test of fit analysis have been developed. One strategy to handle the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and to compare these two strategies using simulated data. Given an original sample size of 21,000, for reductions of sample sizes down to the order of 5,000, the adjusted sample size function works as well as the random sample approach. In contrast, when applying adjustments to sample sizes of lower order, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, the fit is exaggerated and the misfit under-estimated using the adjusted sample size function. Although there are big differences in chi-square values between the two approaches at lower sample sizes, the inferences based on the p-values may be the same.
Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella
2016-12-09
Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models to predict the toxicity effects of the potential drugs. In this study, we used a dataset, which consists of four toxicity effects: mutagenic, tumorigenic, irritant and reproductive effects. The proposed model consists of three phases. In the first phase, rough set-based methods are used to select the most discriminative features for reducing the classification time and improving the classification performance. Due to the imbalanced class distribution, in the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and the Synthetic Minority Oversampling Technique are used to solve the problem of imbalanced datasets. The ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method is used to remove the overlapping that is produced from the first step. In the third phase, a Bagging classifier is used to classify an unknown drug into toxic or non-toxic. The experimental results proved that the proposed model performed well in classifying the unknown samples according to all toxic effects in the imbalanced datasets.
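For readers unfamiliar with the class-balancing baselines named in the abstract, here is a minimal numpy sketch of Random Under-Sampling and Random Over-Sampling; the proposed ITS method itself is not reproduced, and the toy dataset, sizes and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_under_sample(X, y):
    """Drop majority-class rows at random until all classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.hstack([rng.choice(np.where(y == c)[0], n_min, replace=False)
                      for c in classes])
    return X[keep], y[keep]

def random_over_sample(X, y):
    """Duplicate minority-class rows at random until all classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = np.hstack([rng.choice(np.where(y == c)[0], n_max, replace=True)
                      for c in classes])
    return X[keep], y[keep]

# Toy imbalanced dataset: 900 non-toxic vs 100 toxic samples.
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)
Xu, yu = random_under_sample(X, y)
Xo, yo = random_over_sample(X, y)
print(np.bincount(yu), np.bincount(yo))  # [100 100] [900 900]
```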
Multistage point relascope and randomized branch sampling for downed coarse woody debris estimation
Jeffrey H. Gove; Mark J. Ducey; Harry T. Valentine
2002-01-01
New sampling methods have recently been introduced that allow estimation of downed coarse woody debris using an angle gauge, or relascope. The theory behind these methods is based on sampling straight pieces of downed coarse woody debris. When pieces deviate from this ideal situation, auxiliary methods must be employed. We describe a two-stage procedure where the...
Method of multiplexed analysis using ion mobility spectrometer
Belov, Mikhail E [Richland, WA]; Smith, Richard D [Richland, WA]
2009-06-02
A method for analyzing analytes from a sample introduced into a Spectrometer by generating a pseudo random sequence of modulation bins, organizing each modulation bin as a series of submodulation bins, thereby forming an extended pseudo random sequence of submodulation bins, releasing the analytes in a series of analyte packets into a Spectrometer, thereby generating an unknown original ion signal vector, detecting the analytes at a detector, and characterizing the sample using the plurality of analyte signal subvectors. The method is advantageously applied to an Ion Mobility Spectrometer, and an Ion Mobility Spectrometer interfaced with a Time of Flight Mass Spectrometer.
Toward a Principled Sampling Theory for Quasi-Orders
Ünlü, Ali; Schrepp, Martin
2016-01-01
Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets. PMID:27965601
NASA Astrophysics Data System (ADS)
Sung, S.; Kim, H. G.; Lee, D. K.; Park, J. H.; Mo, Y.; Kil, S.; Park, C.
2016-12-01
The impact of climate change has been observed throughout the globe, and ecosystems are experiencing rapid changes such as vegetation shifts and species extinctions. In this context, the Species Distribution Model (SDM) is a popular method for projecting the impact of climate change on ecosystems. An SDM is based on the niche of a given species, which means that presence point data are essential for identifying that species' biological niche. Running an SDM for plants requires particular attention to the characteristics of vegetation. Vegetation data over large areas are normally produced with remote sensing techniques, so the exact location of presence points carries high uncertainty when the presence data set is selected from polygon and raster data. The sampling method used to build the vegetation presence data should therefore be chosen carefully. In this study, we used three different sampling methods to select vegetation presence data: random sampling, stratified sampling and site-index-based sampling. We used the R package BIOMOD2 to assess uncertainty from modelling, and we included BioCLIM variables and other environmental variables as input data. Despite differences among the 10 SDMs, the sampling methods produced different ROC values: random sampling gave the lowest ROC value, while site-index-based sampling gave the highest. This study shows that the uncertainties arising from presence data sampling methods and from the SDM can be quantified.
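As an illustration of the difference between two of the presence-data selection strategies compared in the study, the following short sketch draws a simple random sample and a proportionally allocated stratified sample of presence cells; the raster, strata and sample sizes are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical vegetation presence raster: candidate cells carry a stratum
# label (for example an elevation or site-index class).
n_cells = 5000
strata = rng.integers(0, 4, size=n_cells)  # 4 assumed strata
n_sample = 200

# Simple random sampling of presence cells.
random_idx = rng.choice(n_cells, n_sample, replace=False)

# Stratified sampling: allocate the sample proportionally to stratum size.
stratified_idx = []
for s in np.unique(strata):
    pool = np.where(strata == s)[0]
    k = int(round(n_sample * len(pool) / n_cells))
    stratified_idx.append(rng.choice(pool, k, replace=False))
stratified_idx = np.concatenate(stratified_idx)

print(len(random_idx), len(stratified_idx))
```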
Reducing seed dependent variability of non-uniformly sampled multidimensional NMR data
NASA Astrophysics Data System (ADS)
Mobli, Mehdi
2015-07-01
The application of NMR spectroscopy to study the structure, dynamics and function of macromolecules requires the acquisition of several multidimensional spectra. The one-dimensional NMR time-response from the spectrometer is extended to additional dimensions by introducing incremented delays in the experiment that cause oscillation of the signal along "indirect" dimensions. For a given dimension the delay is incremented at twice the rate of the maximum frequency (Nyquist rate). To achieve high-resolution requires acquisition of long data records sampled at the Nyquist rate. This is typically a prohibitive step due to time constraints, resulting in sub-optimal data records to the detriment of subsequent analyses. The multidimensional NMR spectrum itself is typically sparse, and it has been shown that in such cases it is possible to use non-Fourier methods to reconstruct a high-resolution multidimensional spectrum from a random subset of non-uniformly sampled (NUS) data. For a given acquisition time, NUS has the potential to improve the sensitivity and resolution of a multidimensional spectrum, compared to traditional uniform sampling. The improvements in sensitivity and/or resolution achieved by NUS are heavily dependent on the distribution of points in the random subset acquired. Typically, random points are selected from a probability density function (PDF) weighted according to the NMR signal envelope. In extreme cases as little as 1% of the data is subsampled. The heavy under-sampling can result in poor reproducibility, i.e. when two experiments are carried out where the same number of random samples is selected from the same PDF but using different random seeds. Here, a jittered sampling approach is introduced that is shown to improve random seed dependent reproducibility of multidimensional spectra generated from NUS data, compared to commonly applied NUS methods. It is shown that this is achieved due to the low variability of the inherent sensitivity of the random subset chosen from a given PDF. Finally, it is demonstrated that metrics used to find optimal NUS distributions are heavily dependent on the inherent sensitivity of the random subset, and such optimisation is therefore less critical when using the proposed sampling scheme.
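A minimal sketch of the idea, assuming an exponential signal envelope as the sampling PDF: conventional NUS draws points directly from the weighted PDF, while a jittered scheme (one draw per equal-mass stratum of the PDF) keeps every random seed spread similarly across the envelope. The schedule size and decay constant below are illustrative, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

n_grid = 512                  # full Nyquist grid of indirect-dimension increments
n_pick = 64                   # heavy undersampling (12.5%)
t = np.arange(n_grid)
pdf = np.exp(-t / 150.0)      # assumed exponential signal envelope as the PDF
pdf /= pdf.sum()

# Conventional NUS: draw points directly from the weighted PDF.
plain = np.sort(rng.choice(n_grid, n_pick, replace=False, p=pdf))

# Jittered NUS (sketch): split the cumulative PDF into n_pick equal-mass bins
# and draw one point per bin, so every seed covers the envelope similarly.
cdf = np.cumsum(pdf)
edges = np.searchsorted(cdf, np.linspace(0, 1, n_pick + 1)[1:-1])
bins = np.split(t, edges)
jittered = np.sort([rng.choice(b) for b in bins if len(b)])

print(len(plain), len(jittered))
```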
Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu
2016-01-14
Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices, worn on both arms, consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR system.
Random sampling of elementary flux modes in large-scale metabolic networks.
Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel
2012-09-15
The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.
ERIC Educational Resources Information Center
Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan
2011-01-01
Estimation of parameters of random effects models from samples collected via complex multistage designs is considered. One way to reduce estimation bias due to unequal probabilities of selection is to incorporate sampling weights. Many researchers have proposed various weighting methods (Korn & Graubard, 2003; Pfeffermann, Skinner,…
ERIC Educational Resources Information Center
Bibiso, Abyot; Olango, Menna; Bibiso, Mesfin
2017-01-01
The purpose of this study was to investigate the relationship between teachers' commitment and female students' academic achievement in selected secondary schools of Wolaita zone, Southern Ethiopia. The research method employed was a survey study, and the sampling techniques were purposive, simple random and stratified random sampling. Questionnaire…
The Analysis of Organizational Diagnosis Based on the Six Box Model in Universities
ERIC Educational Resources Information Center
Hamid, Rahimi; Siadat, Sayyed Ali; Reza, Hoveida; Arash, Shahin; Ali, Nasrabadi Hasan; Azizollah, Arbabisarjou
2011-01-01
Purpose: The analysis of organizational diagnosis based on the six box model at universities. Research method: The research method was descriptive-survey. The statistical population consisted of 1544 faculty members of universities, from which 218 persons were chosen as the sample through a stratified random sampling method. Research instruments were organizational…
Response Rates in Random-Digit-Dialed Telephone Surveys: Estimation vs. Measurement.
ERIC Educational Resources Information Center
Franz, Jennifer D.
The efficacy of the random digit dialing method in telephone surveys was examined. Random digit dialing (RDD) generates a pure random sample and provides the advantage of including unlisted phone numbers, as well as numbers which are too new to be listed. Its disadvantage is that it generates a major proportion of nonworking and business…
MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark
Qin, Li-Xuan; Zhou, Qin
2014-01-01
MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456
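A blocked randomization of arrays to samples, as used for the benchmark dataset described above, can be sketched as follows; the group sizes and batch size are assumed for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2014)

# Hypothetical setup: 40 tumor samples from two groups, hybridised in
# processing batches ("blocks") of 8 arrays each.
groups = np.array(["A"] * 20 + ["B"] * 20)
block_size = 8
per_block = block_size // 2
order = []

# Blocked randomization: each batch receives an equal, randomly ordered mix
# of the two groups, so tumor group is not confounded with array batch.
a_idx = list(rng.permutation(np.where(groups == "A")[0]))
b_idx = list(rng.permutation(np.where(groups == "B")[0]))
while a_idx or b_idx:
    block = [a_idx.pop() for _ in range(min(per_block, len(a_idx)))] + \
            [b_idx.pop() for _ in range(min(per_block, len(b_idx)))]
    order.extend(rng.permutation(block))

print(order[:block_size])  # sample indices assigned to the first batch
```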
Convenience Samples and Caregiving Research: How Generalizable Are the Findings?
ERIC Educational Resources Information Center
Pruchno, Rachel A.; Brill, Jonathan E.; Shands, Yvonne; Gordon, Judith R.; Genderson, Maureen Wilson; Rose, Miriam; Cartwright, Francine
2008-01-01
Purpose: We contrast characteristics of respondents recruited using convenience strategies with those of respondents recruited by random digit dial (RDD) methods. We compare sample variances, means, and interrelationships among variables generated from the convenience and RDD samples. Design and Methods: Women aged 50 to 64 who work full time and…
Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space
Bustos-Korts, Daniela; Malosetti, Marcos; Chapman, Scott; Biddulph, Ben; van Eeuwijk, Fred
2016-01-01
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets alongside with the identification of corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel. PMID:27672112
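The study's own training set construction is not reproduced here, but one common way to cover a genetic space uniformly is greedy farthest-point (maximin) selection on a low-dimensional representation of the marker data. The sketch below contrasts such a selection with simple random sampling on an artificial two-subpopulation calibration set; all names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def maximin_training_set(X, n_train):
    """Greedy farthest-point selection: start from a random genotype and
    repeatedly add the candidate farthest from the current training set,
    so the selected genotypes spread evenly over the genetic space."""
    n = X.shape[0]
    chosen = [rng.integers(n)]
    d_min = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(n_train - 1):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

# Toy calibration set: 500 genotypes scored on 2 principal components of the
# marker matrix, with an artificial two-subpopulation structure.
X = np.vstack([rng.normal(0, 1, (250, 2)), rng.normal(5, 1, (250, 2))])
train = maximin_training_set(X, 50)
random_train = rng.choice(len(X), 50, replace=False)
print(len(set(train)), len(random_train))
```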
Simulation of the Effects of Random Measurement Errors
ERIC Educational Resources Information Center
Kinsella, I. A.; Hannaidh, P. B. O.
1978-01-01
Describes a simulation method for measurement of errors that requires calculators and tables of random digits. Each student simulates the random behaviour of the component variables in the function and by combining the results of all students, the outline of the sampling distribution of the function can be obtained. (GA)
Tassie, Jean-Michel; Malateste, Karen; Pujades-Rodríguez, Mar; Poulet, Elisabeth; Bennett, Diane; Harries, Anthony; Mahy, Mary; Schechter, Mauro; Souteyrand, Yves; Dabis, François
2010-11-10
Retention of patients on antiretroviral therapy (ART) over time is a proxy for quality of care and an outcome indicator to monitor ART programs. Using existing databases (Antiretroviral in Lower Income Countries of the International Databases to Evaluate AIDS and Médecins Sans Frontières), we evaluated three sampling approaches to simplify the generation of outcome indicators. We used individual patient data from 27 ART sites and included 27,201 ART-naive adults (≥15 years) who initiated ART in 2005. For each site, we generated two outcome indicators at 12 months, retention on ART and proportion of patients lost to follow-up (LFU), first using all patient data and then within a smaller group of patients selected using three sampling methods (random, systematic and consecutive sampling). For each method and each site, 500 samples were generated, and the average result was compared with the unsampled value. The 95% sampling distribution (SD) was expressed as the 2.5th and 97.5th percentile values from the 500 samples. Overall, retention on ART was 76.5% (range 58.9-88.6) and the proportion of patients LFU, 13.5% (range 0.8-31.9). Estimates of retention from sampling (n = 5696) were 76.5% (SD 75.4-77.7) for random, 76.5% (75.3-77.5) for systematic and 76.0% (74.1-78.2) for the consecutive method. Estimates for the proportion of patients LFU were 13.5% (12.6-14.5), 13.5% (12.6-14.3) and 14.0% (12.5-15.5), respectively. With consecutive sampling, 50% of sites had SD within ±5% of the unsampled site value. Our results suggest that random, systematic or consecutive sampling methods are feasible for monitoring ART indicators at national level. However, sampling may not produce precise estimates in some sites.
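The three site-level sampling schemes evaluated in the study can be illustrated in a few lines of Python; the register size and sample size below are assumptions, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(2005)

patient_ids = np.arange(1, 1001)   # hypothetical site register of 1000 patients
n = 200                            # sample size per site

# Random sampling: simple random sample without replacement.
random_sample = rng.choice(patient_ids, n, replace=False)

# Systematic sampling: random start, then every k-th record in register order.
k = len(patient_ids) // n
start = rng.integers(k)
systematic_sample = patient_ids[start::k][:n]

# Consecutive sampling: the first n patients who started ART in the period.
consecutive_sample = patient_ids[:n]

print(len(random_sample), len(systematic_sample), len(consecutive_sample))
```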
Xu, Chonggang; Gertner, George
2013-01-01
Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
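To make the distinction between the sampling schemes concrete, the sketch below generates a small design with one common form of the FAST search curve alongside a simple random sample of the same parameter space; the frequencies and run count are illustrative assumptions, and the variance decomposition itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

n_params, n_runs = 3, 257
omegas = np.array([11, 35, 61])  # assumed interference-free frequencies
s = np.pi * (2 * np.arange(n_runs) + 1) / n_runs - np.pi  # points along the curve

# Search-curve based sampling: all parameters driven by one scalar s through
# a common form of the FAST search curve (parameters scaled to [0, 1]).
X_curve = 0.5 + np.arcsin(np.sin(omegas[None, :] * s[:, None])) / np.pi

# Simple random sampling of the same parameter space for comparison.
X_random = rng.uniform(size=(n_runs, n_params))

print(X_curve.shape, X_curve.min().round(2), X_curve.max().round(2))
```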
Lensless digital holography with diffuse illumination through a pseudo-random phase mask.
Bernet, Stefan; Harm, Walter; Jesacher, Alexander; Ritsch-Marte, Monika
2011-12-05
Microscopic imaging with a setup consisting of a pseudo-random phase mask, and an open CMOS camera, without an imaging objective, is demonstrated. The pseudo random phase mask acts as a diffuser for an incoming laser beam, scattering a speckle pattern to a CMOS chip, which is recorded once as a reference. A sample which is afterwards inserted somewhere in the optical beam path changes the speckle pattern. A single (non-iterative) image processing step, comparing the modified speckle pattern with the previously recorded one, generates a sharp image of the sample. After a first calibration the method works in real-time and allows quantitative imaging of complex (amplitude and phase) samples in an extended three-dimensional volume. Since no lenses are used, the method is free from lens aberrations. Compared to standard inline holography the diffuse sample illumination improves the axial sectioning capability by increasing the effective numerical aperture in the illumination path, and it suppresses the undesired so-called twin images. For demonstration, a high resolution spatial light modulator (SLM) is programmed to act as the pseudo-random phase mask. We show experimental results, imaging microscopic biological samples, e.g. insects, within an extended volume at a distance of 15 cm with a transverse and longitudinal resolution of about 60 μm and 400 μm, respectively.
ERIC Educational Resources Information Center
Sebro, Negusse Yohannes; Goshu, Ayele Taye
2017-01-01
This study aims to explore Bayesian multilevel modeling to investigate variations of average academic achievement of grade eight school students. A sample of 636 students is randomly selected from 26 private and government schools by a two-stage stratified sampling design. Bayesian method is used to estimate the fixed and random effects. Input and…
A Numerical Theory for Impedance Eduction in Three-Dimensional Normal Incidence Tubes
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Jones, Michael G.
2016-01-01
A method for educing the locally-reacting acoustic impedance of a test sample mounted in a 3-D normal incidence impedance tube is presented and validated. The unique feature of the method is that the excitation frequency (or duct geometry) may be such that high-order duct modes may exist. The method educes the impedance, iteratively, by minimizing an objective function consisting of the difference between the measured and numerically computed acoustic pressure at preselected measurement points in the duct. The method is validated on planar and high-order mode sources with data synthesized from exact mode theory. These data are then subjected to random jitter to simulate the effects of measurement uncertainties on the educed impedance spectrum. The primary conclusions of the study are: 1) without random jitter, the educed impedance is in excellent agreement with that of the known impedance samples, and 2) random jitter comparable to that found in a typical experiment has minimal impact on the accuracy of the educed impedance.
Shear wave speed estimation by adaptive random sample consensus method.
Lin, Haoming; Wang, Tianfu; Chen, Siping
2014-01-01
This paper describes a new method for shear wave velocity estimation that is capable of excluding outliers automatically without a preset threshold. The proposed method is an adaptive random sample consensus (ARANDSAC), and the metric used here is the percentage of inliers identified according to a closest-distance criterion. To evaluate the method, the simulation and phantom experiment results were compared with the linear regression with all points (LRWAP) and Radon sum transform (RS) methods. The assessment reveals that the relative biases of mean estimation are 20.00%, 4.67% and 5.33% for LRWAP, ARANDSAC and RS, respectively, for the simulation, and 23.53%, 4.08% and 1.08% for the phantom experiment. The results suggest that the proposed ARANDSAC algorithm is accurate in shear wave speed estimation.
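The ARANDSAC adaptation itself is not reproduced here, but the random sample consensus idea it builds on can be sketched with a plain RANSAC line fit of arrival time against lateral position, from which shear wave speed is the inverse of the fitted slope; all data and thresholds below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def ransac_line(x, y, n_iter=500, thresh=0.1):
    """Plain RANSAC straight-line fit: repeatedly fit two random points and
    keep the model with the largest inlier consensus set, then refit."""
    best_inliers = np.zeros_like(x, dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), 2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (slope * x + intercept)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return np.polyfit(x[best_inliers], y[best_inliers], 1)

# Synthetic example: wave arrival time vs lateral position, true speed 2 m/s,
# with a handful of gross outliers that would ruin an ordinary least-squares fit.
x = np.linspace(0, 0.03, 40)               # lateral position (m)
t = x / 2.0 + rng.normal(0, 2e-4, x.size)  # arrival time (s)
t[::9] += 0.01                             # outliers
slope, _ = ransac_line(x, t, thresh=1e-3)
print("estimated speed (m/s):", round(1 / slope, 2))
```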
VNIR hyperspectral background characterization methods in adverse weather conditions
NASA Astrophysics Data System (ADS)
Romano, João M.; Rosario, Dalton; Roth, Luz
2009-05-01
Hyperspectral technology is currently being used by the military to detect regions of interest where potential targets may be located. Weather variability, however, may affect the ability of an algorithm to discriminate possible targets from background clutter. Nonetheless, different background characterization approaches may improve the ability of an algorithm to discriminate potential targets over a variety of weather conditions. In a previous paper, we introduced a new autonomous, target-size-invariant background characterization process, the Autonomous Background Characterization (ABC), also known as the Parallel Random Sampling (PRS) method, which features a random sampling stage; a parallel process to mitigate the inclusion by chance of target samples in clutter background classes during random sampling; and a fusion of results at the end. In this paper, we demonstrate how different background characterization approaches are able to improve the performance of algorithms over a variety of challenging weather conditions. Using the Mahalanobis distance as the standard algorithm for this study, we compare the performance of different characterization methods, such as the global information method, the 2-stage global information method, and our proposed method, ABC, using data collected under a variety of adverse weather conditions. For this study, we used ARDEC's Hyperspectral VNIR Adverse Weather data collection, comprising heavy, light, and transitional fog, light and heavy rain, and low light conditions.
Quantifying errors without random sampling.
Phillips, Carl V; LaPole, Luwanna M
2003-06-12
All quantifications of mortality, morbidity, and other health measures involve numerous sources of error. The routine quantification of random sampling error makes it easy to forget that other sources of error can and should be quantified. When a quantification does not involve sampling, error is almost never quantified and results are often reported in ways that dramatically overstate their precision. We argue that the precision implicit in typical reporting is problematic and sketch methods for quantifying the various sources of error, building up from simple examples that can be solved analytically to more complex cases. There are straightforward ways to partially quantify the uncertainty surrounding a parameter that is not characterized by random sampling, such as limiting reported significant figures. We present simple methods for doing such quantifications, and for incorporating them into calculations. More complicated methods become necessary when multiple sources of uncertainty must be combined. We demonstrate that Monte Carlo simulation, using available software, can estimate the uncertainty resulting from complicated calculations with many sources of uncertainty. We apply the method to the current estimate of the annual incidence of foodborne illness in the United States. Quantifying uncertainty from systematic errors is practical. Reporting this uncertainty would more honestly represent study results, help show the probability that estimated values fall within some critical range, and facilitate better targeting of further research.
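A minimal sketch of the Monte Carlo approach described above, propagating non-sampling uncertainty through a simple multiplicative calculation; the distributions and figures are invented placeholders, not the paper's foodborne-illness inputs.

```python
import numpy as np

rng = np.random.default_rng(2003)
n_draws = 100_000

# Hypothetical calculation: annual cases = population * incidence proportion *
# under-reporting multiplier, each known only with non-sampling uncertainty.
population = rng.normal(2.8e8, 0.1e8, n_draws)          # assumed uncertainty
proportion = rng.triangular(0.15, 0.20, 0.30, n_draws)  # expert range, mode 0.20
multiplier = rng.uniform(1.5, 3.0, n_draws)             # under-reporting correction

cases = population * proportion * multiplier
lo, mid, hi = np.percentile(cases, [2.5, 50, 97.5])
print(f"median {mid:.2e}, 95% uncertainty interval ({lo:.2e}, {hi:.2e})")
```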
Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard
2007-01-01
Background: Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods: We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application: We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. Conclusion: This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-27
... permissible methods of taking, other means of effecting the least practicable impact on the species or stock... non-destructive sampling methods to monitor rocky intertidal algal and invertebrate species abundances... and random quadrat are sampled, using methods described by Foster et al. (1991) and Dethier et al...
The mean and variance of phylogenetic diversity under rarefaction
Matsen, Frederick A.
2013-01-01
Phylogenetic diversity (PD) depends on sampling depth, which complicates the comparison of PD between samples of different depth. One approach to dealing with differing sample depth for a given diversity statistic is to rarefy, which means to take a random subset of a given size of the original sample. Exact analytical formulae for the mean and variance of species richness under rarefaction have existed for some time but no such solution exists for PD. We have derived exact formulae for the mean and variance of PD under rarefaction. We confirm that these formulae are correct by comparing exact solution mean and variance to that calculated by repeated random (Monte Carlo) subsampling of a dataset of stem counts of woody shrubs of Toohey Forest, Queensland, Australia. We also demonstrate the application of the method using two examples: identifying hotspots of mammalian diversity in Australasian ecoregions, and characterising the human vaginal microbiome. There is a very high degree of correspondence between the analytical and random subsampling methods for calculating mean and variance of PD under rarefaction, although the Monte Carlo method requires a large number of random draws to converge on the exact solution for the variance. Rarefaction of mammalian PD of ecoregions in Australasia to a common standard of 25 species reveals very different rank orderings of ecoregions, indicating quite different hotspots of diversity than those obtained for unrarefied PD. The application of these methods to the vaginal microbiome shows that a classical score used to quantify bacterial vaginosis is correlated with the shape of the rarefaction curve. The analytical formulae for the mean and variance of PD under rarefaction are both exact and more efficient than repeated subsampling. Rarefaction of PD allows for many applications where comparisons of samples of different depth is required. PMID:23833701
The mean and variance of phylogenetic diversity under rarefaction.
Nipperess, David A; Matsen, Frederick A
2013-06-01
Phylogenetic diversity (PD) depends on sampling depth, which complicates the comparison of PD between samples of different depth. One approach to dealing with differing sample depth for a given diversity statistic is to rarefy, which means to take a random subset of a given size of the original sample. Exact analytical formulae for the mean and variance of species richness under rarefaction have existed for some time but no such solution exists for PD. We have derived exact formulae for the mean and variance of PD under rarefaction. We confirm that these formulae are correct by comparing exact solution mean and variance to that calculated by repeated random (Monte Carlo) subsampling of a dataset of stem counts of woody shrubs of Toohey Forest, Queensland, Australia. We also demonstrate the application of the method using two examples: identifying hotspots of mammalian diversity in Australasian ecoregions, and characterising the human vaginal microbiome. There is a very high degree of correspondence between the analytical and random subsampling methods for calculating mean and variance of PD under rarefaction, although the Monte Carlo method requires a large number of random draws to converge on the exact solution for the variance. Rarefaction of mammalian PD of ecoregions in Australasia to a common standard of 25 species reveals very different rank orderings of ecoregions, indicating quite different hotspots of diversity than those obtained for unrarefied PD. The application of these methods to the vaginal microbiome shows that a classical score used to quantify bacterial vaginosis is correlated with the shape of the rarefaction curve. The analytical formulae for the mean and variance of PD under rarefaction are both exact and more efficient than repeated subsampling. Rarefaction of PD allows for many applications where comparisons of samples of different depth is required.
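The exact-versus-Monte-Carlo comparison is easy to illustrate for species richness, for which the classical rarefaction formula is shown below next to repeated random subsampling; the stem counts are invented, and the paper's PD formulae (which weight branch lengths rather than species counts) are not reproduced.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(13)

counts = np.array([50, 20, 10, 5, 3, 1, 1])  # hypothetical stem counts per species
N, m = counts.sum(), 30                      # rarefy to m individuals

# Exact expected species richness under rarefaction (Hurlbert's formula).
exact_mean = sum(1 - comb(N - n_i, m) / comb(N, m) for n_i in counts)

# Monte Carlo rarefaction: repeatedly draw m individuals without replacement
# and count how many species appear in each draw.
pool = np.repeat(np.arange(len(counts)), counts)
draws = [len(np.unique(rng.choice(pool, m, replace=False))) for _ in range(5000)]
print(round(exact_mean, 3), round(np.mean(draws), 3), round(np.var(draws), 3))
```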
Salter, Robert; Holmes, Steven; Legg, David; Coble, Joel; George, Bruce
2012-02-01
Pork tissue samples that tested positive and negative by the Charm II tetracycline test screening method in the slaughter plant laboratory were tested with the modified AOAC International liquid chromatography tandem mass spectrometry (LC-MS-MS) method 995.09 to determine the predictive value of the screening method at detecting total tetracyclines at 10 μg/kg of tissue, in compliance with Russian import regulations. There were 218 presumptive-positive tetracycline samples of 4,195 randomly tested hogs. Of these screening test positive samples, 83% (182) were positive, >10 μg/kg by LC-MS-MS; 12.8% (28) were false violative, greater than limit of detection (LOD) but <10 μg/kg; and 4.2% (8) were not detected at the LC-MS-MS LOD. The 36 false-violative and not-detected samples represent 1% of the total samples screened. Twenty-seven of 30 randomly selected tetracycline screening negative samples tested below the LC-MS-MS LOD, and 3 samples tested <3 μg/kg chlortetracycline. Results indicate that the Charm II tetracycline test is effective at predicting hogs containing >10 μg/kg total tetracyclines in compliance with Russian import regulations.
Thomas B. Lynch; Jeffrey H. Gove
2014-01-01
The typical "double counting" application of the mirage method of boundary correction cannot be applied to sampling systems such as critical height sampling (CHS) that are based on a Monte Carlo sample of a tree (or debris) attribute because the critical height (or other random attribute) sampled from a mirage point is generally not equal to the critical...
A distance limited method for sampling downed coarse woody debris
Jeffrey H. Gove; Mark J. Ducey; Harry T. Valentine; Michael S. Williams
2012-01-01
A new sampling method for down coarse woody debris is proposed based on limiting the perpendicular distance from individual pieces to a randomly chosen sample point. Two approaches are presented that allow different protocols to be used to determine field measurements; estimators for each protocol are also developed. Both protocols are compared via simulation against...
Adaptive cluster sampling: An efficient method for assessing inconspicuous species
Andrea M. Silletti; Joan Walker
2003-01-01
Restorationists typically evaluate the success of a project by estimating the population sizes of species that have been planted or seeded. Because total census is rarely feasible, they must rely on sampling methods for population estimates. However, traditional random sampling designs may be inefficient for species that, for one reason or another, are challenging to...
Linear combinations come alive in crossover designs.
Shuster, Jonathan J
2017-10-30
Before learning anything about statistical inference in beginning service courses in biostatistics, students learn how to calculate the mean and variance of linear combinations of random variables. Practical precalculus examples of the importance of these exercises can be helpful for instructors, the target audience of this paper. We shall present applications to the "1-sample" and "2-sample" methods for randomized short-term 2-treatment crossover studies, where patients experience both treatments in random order with a "washout" between the active treatment periods. First, we show that the 2-sample method is preferred as it eliminates "conditional bias" when sample sizes by order differ and produces a smaller variance. We also demonstrate that it is usually advisable to use the differences in posttests (ignoring baseline and post washout values) rather than the differences between the changes in treatment from the start of the period to the end of the period ("delta of delta"). Although the intent is not to provide a definitive discussion of crossover designs, we provide a section and references to excellent alternative methods, where instructors can provide motivation to students to explore the topic in greater detail in future readings or courses. Copyright © 2017 John Wiley & Sons, Ltd.
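A minimal simulation of the 2-sample crossover analysis described above, assuming invented effect sizes and group sizes: within-patient period differences are compared between the two order groups, and half their difference estimates the treatment effect free of period effects.

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated 2x2 crossover: treatment effect 1.0, period effect 0.5,
# patient-specific baselines, unequal group sizes by order.
n_ab, n_ba = 30, 22
subj_ab = rng.normal(10, 2, n_ab)  # patients treated A then B
subj_ba = rng.normal(10, 2, n_ba)  # patients treated B then A
y1_ab = subj_ab + 1.0 + rng.normal(0, 1, n_ab)        # period 1, drug A
y2_ab = subj_ab + 0.5 + rng.normal(0, 1, n_ab)        # period 2, drug B
y1_ba = subj_ba + rng.normal(0, 1, n_ba)              # period 1, drug B
y2_ba = subj_ba + 0.5 + 1.0 + rng.normal(0, 1, n_ba)  # period 2, drug A

# "2-sample" analysis: within-patient period differences, compared between
# the two order groups; half the group difference estimates the drug effect.
d_ab = y1_ab - y2_ab
d_ba = y1_ba - y2_ba
effect = (d_ab.mean() - d_ba.mean()) / 2
print("estimated treatment effect:", round(effect, 2))  # close to 1.0
```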
Using random telephone sampling to recruit generalizable samples for family violence studies.
Slep, Amy M Smith; Heyman, Richard E; Williams, Mathew C; Van Dyke, Cheryl E; O'Leary, Susan G
2006-12-01
Convenience sampling methods predominate in recruiting for laboratory-based studies within clinical and family psychology. The authors used random digit dialing (RDD) to determine whether they could feasibly recruit generalizable samples for 2 studies (a parenting study and an intimate partner violence study). RDD screen response rate was 42-45%; demographics matched those in the 2000 U.S. Census, with small- to medium-sized differences on race, age, and income variables. RDD respondents who qualified for, but did not participate in, the laboratory study of parents showed small differences on income, couple conflicts, and corporal punishment. Time and cost are detailed, suggesting that RDD may be a feasible, effective method by which to recruit more generalizable samples for in-laboratory studies of family violence when those studies have sufficient resources. (c) 2006 APA, all rights reserved.
High-speed imaging using CMOS image sensor with quasi pixel-wise exposure
NASA Astrophysics Data System (ADS)
Sonoda, T.; Nagahara, H.; Endo, K.; Sugiyama, Y.; Taniguchi, R.
2017-02-01
Several recent studies in compressive video sensing have realized scene capture beyond the fundamental trade-off limit between spatial resolution and temporal resolution using random space-time sampling. However, most of these studies showed results for higher frame rate video that were produced by simulation experiments or using an optically simulated random sampling camera, because there are currently no commercially available image sensors with random exposure or sampling capabilities. We fabricated a prototype complementary metal oxide semiconductor (CMOS) image sensor with quasi pixel-wise exposure timing that can realize nonuniform space-time sampling. The prototype sensor can reset exposures independently by columns and fix the amount of exposure by rows for each 8×8 pixel block. This CMOS sensor is not fully controllable via the pixels, and has line-dependent controls, but it offers flexibility when compared with regular CMOS or charge-coupled device sensors with global or rolling shutters. We propose a method to realize pseudo-random sampling for high-speed video acquisition that uses the flexibility of the CMOS sensor. We reconstruct the high-speed video sequence from the images produced by pseudo-random sampling using an over-complete dictionary.
Mapping ecological systems with a random forest model: tradeoffs between errors and bias
Emilie Grossmann; Janet Ohmann; James Kagan; Heather May; Matthew Gregory
2010-01-01
New methods for predictive vegetation mapping allow improved estimations of plant community composition across large regions. Random Forest (RF) models limit over-fitting problems of other methods, and are known for making accurate classification predictions from noisy, nonnormal data, but can be biased when plot samples are unbalanced. We developed two contrasting...
Economic Intervention and Parenting: A Randomized Experiment of Statewide Child Development Accounts
ERIC Educational Resources Information Center
Nam, Yunju; Wikoff, Nora; Sherraden, Michael
2016-01-01
Objective: We examine the effects of Child Development Accounts (CDAs) on parenting stress and practices. Methods: We use data from the SEED for Oklahoma Kids (SEED OK) experiment. SEED OK selected caregivers of infants from Oklahoma birth certificates using a probability sampling method, randomly assigned caregivers to the treatment (n = 1,132)…
ERIC Educational Resources Information Center
Modebelu, M. N.; Ogbonna, C. C.
2014-01-01
This study aimed at determining the effect of reform-based instructional method learning styles on students' achievement and retention in mathematics. A sample size of 119 students was randomly selected. A quasi-experimental design comprising pre-test, post-test, and randomized control group was employed. The Collin Rose learning styles…
ERIC Educational Resources Information Center
Le, Huynh-Nhu; Perry, Deborah F.; Stuart, Elizabeth A.
2011-01-01
Objective: A randomized controlled trial was conducted to evaluate the efficacy of a cognitive-behavioral (CBT) intervention to prevent perinatal depression in high-risk Latinas. Method: A sample of 217 participants, predominantly low-income Central American immigrants who met demographic and depression risk criteria, were randomized into usual…
Self-balanced real-time photonic scheme for ultrafast random number generation
NASA Astrophysics Data System (ADS)
Li, Pu; Guo, Ya; Guo, Yanqiang; Fan, Yuanlong; Guo, Xiaomin; Liu, Xianglian; Shore, K. Alan; Dubrova, Elena; Xu, Bingjie; Wang, Yuncai; Wang, Anbang
2018-06-01
We propose a real-time self-balanced photonic method for extracting ultrafast random numbers from broadband randomness sources. In place of electronic analog-to-digital converters (ADCs), the balanced photo-detection technology is used to directly quantize optically sampled chaotic pulses into a continuous random number stream. Benefitting from ultrafast photo-detection, our method can efficiently eliminate the generation rate bottleneck from electronic ADCs which are required in nearly all the available fast physical random number generators. A proof-of-principle experiment demonstrates that using our approach 10 Gb/s real-time and statistically unbiased random numbers are successfully extracted from a bandwidth-enhanced chaotic source. The generation rate achieved experimentally here is being limited by the bandwidth of the chaotic source. The method described has the potential to attain a real-time rate of 100 Gb/s.
Systematic random sampling of the comet assay.
McArt, Darragh G; Wasson, Gillian R; McKerr, George; Saetzler, Kurt; Reed, Matt; Howard, C Vyvyan
2009-07-01
The comet assay is a technique used to quantify DNA damage and repair at a cellular level. In the assay, cells are embedded in agarose and the cellular content is stripped away, leaving only the DNA trapped in an agarose cavity which can then be electrophoresed. The damaged DNA can enter the agarose and migrate while the undamaged DNA cannot and is retained. DNA damage is measured as the proportion of the migratory 'tail' DNA compared to the total DNA in the cell. The fundamental basis of these arbitrary values is obtained in the comet acquisition phase using fluorescence microscopy with a stoichiometric stain in tandem with image analysis software. Comets acquired with current methods are expected to be selected both objectively and at random. In this paper we examine the 'randomness' of the acquisition phase and suggest an alternative method that offers both objective and unbiased comet selection. In order to achieve this, we have adopted a survey sampling approach widely used in stereology, which offers a method of systematic random sampling (SRS). This is desirable as it offers an impartial and reproducible method of comet analysis that can be used either manually or in automated form. By making use of an unbiased sampling frame and using microscope verniers, we are able to increase the precision of estimates of DNA damage. Results obtained from a multiple-user pooled variation experiment showed that the SRS technique attained a lower variability than that of the traditional approach. The single-user repetition experiment showed greater individual variances while not being detrimental to overall averages. This would suggest that the SRS method offers a better reflection of DNA damage for a given slide and also offers better user reproducibility.
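Systematic random sampling of microscope fields can be sketched in a few lines: a random start inside the first interval followed by a fixed step in each direction; the vernier ranges and step size below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical stage positions (in vernier units) covering one comet slide.
x_range, y_range = (0, 24_000), (0, 24_000)
step = 3_000  # sampling interval chosen to give roughly 64 fields

# Systematic random sampling: one random start inside the first interval,
# then a fixed step in both directions, giving an unbiased, reproducible
# sampling frame over the slide.
x0 = rng.integers(step)
y0 = rng.integers(step)
xs = np.arange(x_range[0] + x0, x_range[1], step)
ys = np.arange(y_range[0] + y0, y_range[1], step)
fields = [(x, y) for x in xs for y in ys]
print(len(fields), fields[:3])
```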
ERIC Educational Resources Information Center
Bawaneh, Ali Khalid Ali; Zain, Ahmad Nurulazam Md; Ghazali, Munirah
2010-01-01
The purpose of the present study is to investigate the effectiveness of Conflict Maps and the V-Shape method as teaching methods in bringing about conceptual change in science among primary eighth-grade students in Jordan. A randomly selected sample (N = 63) from the Bani Kenana region North of Jordan was randomly assigned to the two teaching…
Novel application of the MSSCP method in biodiversity studies.
Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula
2012-02-01
Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of triggers and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0s (non-occurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately addressed. To tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood; a randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to NNS. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for model performance.
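As a rough illustration of the proportional random sampling (PRS) strategy described above, here is a minimal Python sketch assuming numpy arrays of cell features and 0/1 landslide labels; the function name and the choice of positive fraction are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def proportional_random_sample(X, y, n_samples, pos_fraction, rng=None):
        """Draw a training set with a chosen fraction of landslide (1) cells.

        X, y         -- feature matrix and 0/1 labels for all mapped cells
        n_samples    -- size of the training set to draw
        pos_fraction -- desired share of 1s (e.g. the inventory proportion)
        """
        rng = np.random.default_rng(rng)
        pos_idx = np.flatnonzero(y == 1)
        neg_idx = np.flatnonzero(y == 0)
        n_pos = min(int(round(n_samples * pos_fraction)), pos_idx.size)
        n_neg = min(n_samples - n_pos, neg_idx.size)
        chosen = np.concatenate([rng.choice(pos_idx, n_pos, replace=False),
                                 rng.choice(neg_idx, n_neg, replace=False)])
        rng.shuffle(chosen)
        return X[chosen], y[chosen]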
Methodological reporting of randomized trials in five leading Chinese nursing journals.
Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu
2014-01-01
Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.
Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard
2007-06-01
Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.
Gradient-free MCMC methods for dynamic causal modelling
Sengupta, Biswa; Friston, Karl J.; Penny, Will D.
2015-03-14
Here, we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density, albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler).
2013-01-01
Background: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors, deliver poor statistical power for detecting genuine associations, and also have a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in the populations under study, and confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results: We implemented this novel method, together with several popular methods from the GWAS literature, to re-analyze recently published Parkinson's disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemming from non-random sampling and genetic structure when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by the trend-test-based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates FGF20 and PARK8, without incurring false positive risk. Conclusions: We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters using case and control samples, allows easy integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful GWAS analyses. On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most widely used in the GWAS literature. PMID:23394771
NASA Astrophysics Data System (ADS)
Wu, J.; Yao, W.; Zhang, J.; Li, Y.
2018-04-01
Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, the collection of which is costly and time-consuming. This work adopts a domain adaptation concept to transfer existing trained random forest classifiers (based on a source domain) to new data scenes (target domain), which aims at reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. First, two random forest classifiers were trained with existing samples previously collected for other data; they differed in the decision tree construction algorithm used: C4.5 with information gain ratio and CART with the Gini index. Second, four random forest classifiers adapted to the target domain are derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain are labelled by fusing the four newly derived random forest classifiers using a weights-of-evidence based fusion model. To validate the method, experimental analysis was conducted using three datasets: one is used as the source domain data (Vaihingen data for 3D Semantic Labelling); the other two are used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5% and 83.3% for 3D labelling were achieved for the Jinmen city and Dunhuang city data respectively, with only 1/3 of the newly labelled samples required compared to the cases without domain adaptation.
NASA Astrophysics Data System (ADS)
Wu, Jinglai; Luo, Zhen; Zhang, Nong; Zhang, Yunqing; Walker, Paul D.
2017-02-01
This paper proposes an uncertainty modelling and computational method to analyze dynamic responses of rigid-flexible multibody systems (or mechanisms) with random geometry and material properties. Firstly, the deterministic model for the rigid-flexible multibody system is built with the absolute nodal coordinate formulation (ANCF), in which the flexible parts are modeled by using ANCF elements, while the rigid parts are described by ANCF reference nodes (ANCF-RNs). Secondly, uncertainty in the geometry of the rigid parts is expressed as uniform random variables, while the uncertainty in the material properties of the flexible parts is modeled as a continuous random field, which is further discretized into Gaussian random variables using a series expansion method. Finally, a non-intrusive numerical method is developed to solve the dynamic equations of systems involving both types of random variables, which systematically integrates the deterministic generalized-α solver with Latin Hypercube sampling (LHS) and Polynomial Chaos (PC) expansion. The benchmark slider-crank mechanism is used as a numerical example to demonstrate the characteristics of the proposed method.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
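The core idea of evaluating a candidate split on a random sample of the data, rather than on every row, can be sketched as follows; this is a generic Python illustration using a Gini criterion, not the patented system, and the subsample size and threshold grid are arbitrary assumptions.

    import numpy as np

    def gini(labels):
        """Gini impurity of a 0/1 label array."""
        if labels.size == 0:
            return 0.0
        p = labels.mean()
        return 2.0 * p * (1.0 - p)

    def evaluate_split(x, y, threshold, sample_size, rng):
        """Score a candidate split on a random subsample instead of all rows."""
        idx = rng.choice(x.size, size=min(sample_size, x.size), replace=False)
        xs, ys = x[idx], y[idx]
        left, right = ys[xs <= threshold], ys[xs > threshold]
        w_left = left.size / ys.size
        return w_left * gini(left) + (1.0 - w_left) * gini(right)  # weighted impurity

    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)
    y = (x + rng.normal(scale=0.5, size=x.size) > 0).astype(int)
    # Cheap split search: impurity is estimated from 500 sampled rows per candidate.
    best = min(np.linspace(-2, 2, 41), key=lambda t: evaluate_split(x, y, t, 500, rng))
    print("best threshold (approx.):", best)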
Sample Size Calculations for Micro-randomized Trials in mHealth
Liao, Peng; Klasnja, Predrag; Tewari, Ambuj; Murphy, Susan A.
2015-01-01
The use and development of mobile interventions are experiencing rapid growth. In “just-in-time” mobile interventions, treatments are provided via a mobile device and they are intended to help an individual make healthy decisions “in the moment,” and thus have a proximal, near future impact. Currently the development of mobile interventions is proceeding at a much faster pace than that of associated data science methods. A first step toward developing data-based methods is to provide an experimental design for testing the proximal effects of these just-in-time treatments. In this paper, we propose a “micro-randomized” trial design for this purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at the 100s or 1000s of occasions at which a treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator in various settings. Rules of thumb that might be used in designing a micro-randomized trial are discussed. This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical activity. PMID:26707831
Sampling in epidemiological research: issues, hazards and pitfalls.
Tyrer, Stephen; Heyman, Bob
2016-04-01
Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research.
Sampling in epidemiological research: issues, hazards and pitfalls
Tyrer, Stephen; Heyman, Bob
2016-01-01
Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research. PMID:27087985
Mishra, S; Chawla, D; Agarwal, R; Deorari, A K; Paul, V K; Bhutani, V K
2009-12-01
We determined the usefulness of transcutaneous bilirubinometry to decrease the need for blood sampling to assay serum total bilirubin (STB) in the management of jaundiced healthy Indian neonates. Newborns, ≥35 weeks' gestation, with clinical evidence of jaundice were enrolled in an institutionally approved randomized clinical trial. The severity of hyperbilirubinaemia was determined by two non-invasive methods: i) protocol-based visual assessment of bilirubin (VaB) and ii) transcutaneous bilirubin (TcB) determination (BiliCheck). By random allocation, either method was used to decide the need for blood sampling, which was defined to be present if the STB assessed by the allocated method exceeded 80% of hour-specific threshold values for phototherapy (2004 AAP Guidelines). A total of 617 neonates were randomized to either TcB (n = 314) or VaB (n = 303) groups with comparable gestation, birth weight and postnatal age. The need for blood sampling to assay STB was 34% lower (95% CI: 10% to 51%) in the TcB group compared with the VaB group (17.5% vs 26.4% of assessments; risk difference: -8.9%, 95% CI: -2.4% to -15.4%; p = 0.008). Routine use of transcutaneous bilirubinometry compared with systematic visual assessment of bilirubin significantly reduced the need for blood sampling to assay STB in jaundiced term and late-preterm neonates. (ClinicalTrials.gov number, NCT00653874).
Random vs. systematic sampling from administrative databases involving human subjects.
Hagino, C; Lo, R J
1998-09-01
Two sampling techniques, simple random sampling (SRS) and systematic sampling (SS), were compared to determine whether they yield similar and accurate distributions for the following four factors: age, gender, geographic location and years in practice. Any point estimate within 7 yr or 7 percentage points of its reference standard (SRS or the entire data set, i.e., the target population) was considered "acceptably similar" to the reference standard. The sampling frame was the entire membership database of the Canadian Chiropractic Association. The two sampling methods were tested using eight different sample sizes (n = 50, 100, 150, 200, 250, 300, 500, 800). From the profile summaries of four known factors [gender, average age, number (%) of chiropractors in each province and years in practice], between- and within-method χ2 tests and unpaired t tests were performed to determine whether any of the differences [descriptively greater than 7% or 7 yr] were also statistically significant. The strengths of the agreements between the provincial distributions were quantified by calculating the percent agreement for each (provincial pairwise-comparison method). Any percent agreement less than 70% was judged to be unacceptable. Our assessments of the two sampling methods (SRS and SS) for the different sample sizes tested suggest that SRS and SS yielded acceptably similar results. Both methods started to yield "correct" sample profiles at approximately the same sample size (n > 200). SS is not only convenient, it can be recommended for sampling from large databases in which the data are listed without any inherent order biases other than alphabetical listing by surname.
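A minimal Python sketch of the two techniques compared above, drawing a simple random sample and a systematic sample from a membership list and checking one factor (mean age) against the full-database reference, is given below; the database, sample size and factor are illustrative assumptions.

    import random

    def simple_random_sample(records, n):
        return random.sample(records, n)

    def systematic_sample(records, n):
        """Every k-th record after a random start (list assumed in alphabetical order)."""
        k = len(records) // n
        start = random.randrange(k)
        return records[start::k][:n]

    # Illustrative check of the kind reported above: compare the mean age of each
    # sample with the full-database ("reference standard") mean.
    random.seed(1)
    database = [{"age": random.gauss(45, 10)} for _ in range(5000)]
    ref_mean = sum(r["age"] for r in database) / len(database)
    for label, sample in (("SRS", simple_random_sample(database, 200)),
                          ("SS", systematic_sample(database, 200))):
        mean = sum(r["age"] for r in sample) / len(sample)
        print(label, "difference from reference:", round(mean - ref_mean, 2))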
Sefa, Eunice; Adimazoya, Edward Akolgo; Yartey, Emmanuel; Lenzi, Rachel; Tarpo, Cindy; Heward-Mills, Nii Lante; Lew, Katherine; Ampeh, Yvonne
2018-01-01
Introduction: Generating a nationally representative sample in low and middle income countries typically requires resource-intensive household level sampling with door-to-door data collection. High mobile phone penetration rates in developing countries provide new opportunities for alternative sampling and data collection methods, but there is limited information about response rates and sample biases in coverage and nonresponse using these methods. We utilized data from an interactive voice response, random-digit dial, national mobile phone survey in Ghana to calculate standardized response rates and assess representativeness of the obtained sample. Materials and methods: The survey methodology was piloted in two rounds of data collection. The final survey included 18 demographic, media exposure, and health behavior questions. Call outcomes and response rates were calculated according to the American Association of Public Opinion Research guidelines. Sample characteristics, productivity, and costs per interview were calculated. Representativeness was assessed by comparing data to the Ghana Demographic and Health Survey and the National Population and Housing Census. Results: The survey was fielded during a 27-day period in February-March 2017. There were 9,469 completed interviews and 3,547 partial interviews. Response, cooperation, refusal, and contact rates were 31%, 81%, 7%, and 39% respectively. Twenty-three calls were dialed to produce an eligible contact: nonresponse was substantial due to the automated calling system and dialing of many unassigned or non-working numbers. Younger, urban, better educated, and male respondents were overrepresented in the sample. Conclusions: The innovative mobile phone data collection methodology yielded a large sample in a relatively short period. Response rates were comparable to other surveys, although substantial coverage bias resulted from fewer women, rural, and older residents completing the mobile phone survey in comparison to household surveys. Random digit dialing of mobile phones offers promise for future data collection in Ghana and may be suitable for other developing countries. PMID:29351349
Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian
2016-07-22
Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.
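A toy version of one such scenario, complete-case estimation of a risk difference under MCAR missingness, can be written in a few lines of Python; the effect size, missingness rate and sample sizes below are illustrative and do not reproduce the authors' full simulation design.

    import numpy as np

    rng = np.random.default_rng(2024)
    n_per_arm, true_rd, p_control, p_miss = 300, 0.10, 0.20, 0.20
    reps = 5000
    estimates = []
    for _ in range(reps):
        y_c = rng.binomial(1, p_control, n_per_arm)
        y_t = rng.binomial(1, p_control + true_rd, n_per_arm)
        keep_c = rng.random(n_per_arm) > p_miss      # MCAR: outcomes missing at random
        keep_t = rng.random(n_per_arm) > p_miss
        estimates.append(y_t[keep_t].mean() - y_c[keep_c].mean())  # complete-case RD
    estimates = np.array(estimates)
    print("mean CC risk difference:", estimates.mean().round(3), "(true 0.10)")

Under MCAR the complete-case estimate stays centred on the true risk difference, which is the behaviour the abstract reports.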
Monte Carlo Sampling in Fractal Landscapes
NASA Astrophysics Data System (ADS)
Leitão, Jorge C.; Lopes, J. M. Viana Parente; Altmann, Eduardo G.
2013-05-01
We design a random walk to explore fractal landscapes such as those describing chaotic transients in dynamical systems. We show that the random walk moves efficiently only when its step length depends on the height of the landscape via the largest Lyapunov exponent of the chaotic system. We propose a generalization of the Wang-Landau algorithm which constructs not only the density of states (transient time distribution) but also the correct step length. As a result, we obtain a flat-histogram Monte Carlo method which samples fractal landscapes in polynomial time, a dramatic improvement over the exponential scaling of traditional uniform-sampling methods. Our results are not limited by the dimensionality of the landscape and are confirmed numerically in chaotic systems with up to 30 dimensions.
NASA Astrophysics Data System (ADS)
Lv, Chao; Zheng, Lianqing; Yang, Wei
2012-01-01
Molecular dynamics sampling can be enhanced by promoting potential energy fluctuations, for instance, based on a Hamiltonian modified with the addition of a potential-energy-dependent biasing term. To overcome the diffusion sampling issue, namely that enlarging event-irrelevant energy fluctuations may destroy sampling efficiency, the essential energy space random walk (EESRW) approach was proposed earlier. To more effectively accelerate the sampling of solute conformations in an aqueous environment, in the current work we generalized the EESRW method to a two-dimensional EESRW (2D-EESRW) strategy. Specifically, the essential internal energy component of a focused region and the essential interaction energy component between the focused region and the environmental region are employed to define the two-dimensional essential energy space. This proposal is motivated by the general observation that in different conformational events, the two essential energy components have distinctive interplays. Model studies on the alanine dipeptide and the aspartate-arginine peptide demonstrate sampling improvement over the original one-dimensional EESRW strategy; with the same biasing level, the present generalization allows more effective acceleration of the sampling of conformational transitions in aqueous solution. The 2D-EESRW generalization is readily extended to higher-dimensional schemes and can be employed in more advanced enhanced-sampling schemes, such as the recent orthogonal space random walk method.
Sampling Methods for Detection and Monitoring of the Asian Citrus Psyllid (Hemiptera: Psyllidae).
Monzo, C; Arevalo, H A; Jones, M M; Vanaclocha, P; Croxton, S D; Qureshi, J A; Stansly, P A
2015-06-01
The Asian citrus psyllid (ACP), Diaphorina citri Kuwayama is a key pest of citrus due to its role as vector of citrus greening disease or "huanglongbing." ACP monitoring is considered an indispensable tool for management of vector and disease. In the present study, datasets collected between 2009 and 2013 from 245 citrus blocks were used to evaluate precision, sensitivity for detection, and efficiency of five sampling methods. The number of samples needed to reach a 0.25 standard error-mean ratio was estimated using Taylor's power law and used to compare precision among sampling methods. Comparison of detection sensitivity and time expenditure (cost) between stem-tap and other sampling methodologies conducted consecutively at the same location were also assessed. Stem-tap sampling was the most efficient sampling method when ACP densities were moderate to high and served as the basis for comparison with all other methods. Protocols that grouped trees near randomly selected locations across the block were more efficient than sampling trees at random across the block. Sweep net sampling was similar to stem-taps in number of captures per sampled unit, but less precise at any ACP density. Yellow sticky traps were 14 times more sensitive than stem-taps but much more time consuming and thus less efficient except at very low population densities. Visual sampling was efficient for detecting and monitoring ACP at low densities. Suction sampling was time consuming and taxing but the most sensitive of all methods for detection of sparse populations. This information can be used to optimize ACP monitoring efforts. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
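The sample-size comparison referred to above follows from Taylor's power law (variance = a * mean^b): for a target SE/mean ratio D, the required number of samples is n = a * mean^(b-2) / D^2. A small Python sketch with made-up coefficients illustrates the dependence on density; the coefficients a and b must be fitted from survey data for each sampling method.

    def taylor_sample_size(mean_density, a, b, precision=0.25):
        """Samples needed for a given SE/mean ratio under Taylor's power law.

        Taylor's power law: variance = a * mean**b, so
        n = (a * mean**(b - 2)) / precision**2.
        a and b are method-specific coefficients (illustrative values here).
        """
        return (a * mean_density ** (b - 2)) / precision ** 2

    # Example with made-up coefficients for a stem-tap-like method:
    for m in (0.1, 1.0, 5.0):
        print(m, round(taylor_sample_size(m, a=2.0, b=1.4)))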
The fast algorithm of spark in compressive sensing
NASA Astrophysics Data System (ADS)
Xie, Meihua; Yan, Fengxia
2017-01-01
Compressed Sensing (CS) is an advanced theory of signal sampling and reconstruction. In CS theory, the reconstruction condition of a signal is an important theoretical problem, and the spark is a good index for studying this problem. However, the computation of the spark is NP-hard. In this paper, we study the problem of computing the spark. For some special matrices, such as Gaussian random matrices and 0-1 random matrices, we obtain several conclusions. In particular, for a Gaussian random matrix with fewer rows than columns, we prove that its spark equals the number of its rows plus one with probability 1. For a general matrix, two methods are given to compute its spark: one is direct search and the other is dual-tree search. By simulating 24 Gaussian random matrices and 18 0-1 random matrices, we tested the computation time of these two methods. Numerical results showed that the dual-tree search method was more efficient than direct search, especially for matrices with comparable numbers of rows and columns.
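Because the spark is the size of the smallest set of linearly dependent columns, the direct-search approach can be illustrated by brute force over column subsets; the Python sketch below is only practical for small matrices and is not the dual-tree algorithm of the paper.

    import itertools
    import numpy as np

    def spark(A, tol=1e-10):
        """Smallest number of linearly dependent columns of A (brute-force search).

        Exponential in the number of columns; only practical for small matrices.
        """
        m, n = A.shape
        for k in range(1, n + 1):
            for cols in itertools.combinations(range(n), k):
                if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                    return k
        return n + 1  # all columns independent (possible only when n <= m)

    # Gaussian random matrix with fewer rows than columns: spark should be m + 1.
    rng = np.random.default_rng(0)
    G = rng.normal(size=(4, 8))
    print(spark(G))   # expected: 5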
Freeman, Lindsay M; Pang, Lin; Fainman, Yeshaiahu
2018-05-09
The analysis of DNA has led to revolutionary advancements in the fields of medical diagnostics, genomics, prenatal screening, and forensic science, with the global DNA testing market expected to reach revenues of USD 10.04 billion per year by 2020. However, the current methods for DNA analysis remain dependent on the necessity for fluorophores or conjugated proteins, leading to high costs associated with consumable materials and manual labor. Here, we demonstrate a potential label-free DNA composition detection method using surface-enhanced Raman spectroscopy (SERS) in which we identify the composition of cytosine and adenine within single strands of DNA. This approach depends on the fact that there is one phosphate backbone per nucleotide, which we use as a reference to compensate for systematic measurement variations. We utilize plasmonic nanomaterials with random Raman sampling to perform label-free detection of the nucleotide composition within DNA strands, generating a calibration curve from standard samples of DNA and demonstrating the capability of resolving the nucleotide composition. The work represents an innovative way for detection of the DNA composition within DNA strands without the necessity of attached labels, offering a highly sensitive and reproducible method that factors in random sampling to minimize error.
DOE Office of Scientific and Technical Information (OSTI.GOV)
V Yashchuk; R Conley; E Anderson
Verification of the reliability of metrology data from high quality X-ray optics requires that adequate methods for test and calibration of the instruments be developed. For such verification for optical surface profilometers in the spatial frequency domain, a modulation transfer function (MTF) calibration method based on binary pseudo-random (BPR) gratings and arrays has been suggested [1,2] and proven to be an effective calibration method for a number of interferometric microscopes, a phase shifting Fizeau interferometer, and a scatterometer [5]. Here we describe the details of development of binary pseudo-random multilayer (BPRML) test samples suitable for characterization of scanning (SEM) and transmission (TEM) electron microscopes. We discuss the results of TEM measurements with the BPRML test samples fabricated from a WiSi2/Si multilayer coating with pseudo-randomly distributed layers. In particular, we demonstrate that significant information about the metrological reliability of the TEM measurements can be extracted even when the fundamental frequency of the BPRML sample is smaller than the Nyquist frequency of the measurements. The measurements demonstrate a number of problems related to the interpretation of the SEM and TEM data. Note that similar BPRML test samples can be used to characterize X-ray microscopes. Corresponding work with X-ray microscopes is in progress.
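Binary pseudo-random test patterns of this kind are commonly built from maximum-length sequences; the following Python sketch generates such a sequence from a degree-7 primitive polynomial purely as an illustration, and says nothing about the multilayer fabrication itself.

    def bpr_sequence(length, seed=(1, 0, 0, 0, 0, 0, 0)):
        """Maximal-length binary pseudo-random sequence (m-sequence) of degree 7.

        Uses the linear recurrence s[n] = s[n-1] XOR s[n-7], i.e. the primitive
        polynomial x^7 + x^6 + 1, so the sequence repeats with period 2**7 - 1 = 127.
        A non-zero 7-bit seed is required; the polynomial choice is illustrative.
        """
        s = list(seed)
        while len(s) < length:
            s.append(s[-1] ^ s[-7])
        return s[:length]

    seq = bpr_sequence(127)
    print(sum(seq), "ones out of", len(seq), "bits")   # an m-sequence is nearly balanced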
Alderson-Day, Ben; Fernyhough, Charles
2015-01-01
Inner speech is often reported to be a common and central part of inner experience, but its true prevalence is unclear. Many questionnaire-based measures appear to lack convergent validity and it has been claimed that they overestimate inner speech in comparison to experience sampling methods (which involve collecting data at random timepoints). The present study compared self-reporting of inner speech collected via a general questionnaire and experience sampling, using data from a custom-made smartphone app (Inner Life). Fifty-one university students completed a generalized self-report measure of inner speech (the Varieties of Inner Speech Questionnaire, VISQ) and responded to at least seven random alerts to report on incidences of inner speech over a 2-week period. Correlations and pairwise comparisons were used to compare generalized endorsements and randomly sampled scores for each VISQ subscale. Significant correlations were observed between general and randomly sampled measures for only two of the four VISQ subscales, and endorsements of inner speech with evaluative or motivational characteristics did not correlate at all across different measures. Endorsement of inner speech items was significantly lower for random sampling compared to generalized self-report, for all VISQ subscales. Exploratory analysis indicated that specific inner speech characteristics were also related to anxiety and future-oriented thinking. PMID:25964773
A fast ergodic algorithm for generating ensembles of equilateral random polygons
NASA Astrophysics Data System (ADS)
Varela, R.; Hinson, K.; Arsuaga, J.; Diao, Y.
2009-03-01
Knotted structures are commonly found in circular DNA and along the backbone of certain proteins. In order to properly estimate properties of these three-dimensional structures it is often necessary to generate large ensembles of simulated closed chains (i.e. polygons) of equal edge lengths (such polygons are called equilateral random polygons). However, finding efficient algorithms that properly sample the space of equilateral random polygons is a difficult problem. Currently there are no proven algorithms that generate equilateral random polygons according to their theoretical distribution. In this paper we propose a method that generates equilateral random polygons in a 'step-wise uniform' way. We prove that this method is ergodic in the sense that any given equilateral random polygon can be generated by this method and we show that the time needed to generate an equilateral random polygon of length n is linear in terms of n. These two properties make this algorithm a significant improvement over the existing generating methods. Detailed numerical comparisons of our algorithm with other widely used algorithms are provided.
ERIC Educational Resources Information Center
Bockenholt, Ulf; Van Der Heijden, Peter G. M.
2007-01-01
Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because: (i) of its lower efficiency and the resulting need for larger sample sizes which make applications of RR costly; (ii) despite its privacy-protection mechanism the RR design may not be followed by every respondent; and…
Community of Inquiry Method and Language Skills Acquisition: Empirical Evidence
ERIC Educational Resources Information Center
Preece, Abdul Shakhour Duncan
2015-01-01
The study investigates the effectiveness of community of inquiry method in preparing students to develop listening and speaking skills in a sample of junior secondary school students in Borno state, Nigeria. A sample of 100 students in standard classes was drawn in one secondary school in Maiduguri metropolis through stratified random sampling…
Gradient-free MCMC methods for dynamic causal modelling.
Sengupta, Biswa; Friston, Karl J; Penny, Will D
2015-05-15
In this technical note we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density - albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
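For reference, the first of the four samplers, random walk Metropolis, can be written in a few lines; the sketch below uses a toy Gaussian log-target in place of the neural mass model posterior, and the step size is an arbitrary illustrative choice.

    import numpy as np

    def random_walk_metropolis(log_target, x0, n_steps, step_size, rng=None):
        """Generic random walk Metropolis sampler with Gaussian proposals."""
        rng = np.random.default_rng(rng)
        x = np.asarray(x0, dtype=float)
        logp = log_target(x)
        samples, accepted = [], 0
        for _ in range(n_steps):
            prop = x + step_size * rng.normal(size=x.shape)
            logp_prop = log_target(prop)
            if np.log(rng.uniform()) < logp_prop - logp:   # Metropolis acceptance
                x, logp = prop, logp_prop
                accepted += 1
            samples.append(x.copy())
        return np.array(samples), accepted / n_steps

    # Toy target: standard 2-D Gaussian (stands in for a model's log-posterior).
    draws, acc = random_walk_metropolis(lambda x: -0.5 * np.sum(x**2),
                                        x0=[0.0, 0.0], n_steps=5000, step_size=0.8)
    print("acceptance rate:", round(acc, 2))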
ERIC Educational Resources Information Center
Pillemer, Karl; Meador, Rhoda; Henderson, Charles, Jr.; Robison, Julie; Hegeman, Carol; Graham, Edwin; Schultz, Leslie
2008-01-01
Purpose: This article reports on a randomized, controlled intervention study designed to reduce employee turnover by creating a retention specialist position in nursing homes. Design and Methods: We collected data three times over a 1-year period in 30 nursing homes, sampled in stratified random manner from facilities in New York State and…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong
The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao
2016-04-01
The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Unsupervised Bayesian linear unmixing of gene expression microarrays.
Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O
2013-03-19
This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much.
He, Bryan; De Sa, Christopher; Mitliagkas, Ioannis; Ré, Christopher
2016-01-01
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much
He, Bryan; De Sa, Christopher; Mitliagkas, Ioannis; Ré, Christopher
2016-01-01
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance. PMID:28344429
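The difference between the two scan orders is easiest to see in a toy Gibbs sampler; the following Python sketch for a bivariate normal target is illustrative only and is unrelated to the models analyzed in the paper.

    import numpy as np

    def gibbs_bivariate_normal(rho, n_iters, scan="systematic", rng=None):
        """Gibbs sampler for a bivariate normal with correlation rho.

        scan = "systematic": update x then y in every iteration.
        scan = "random":     update one randomly chosen coordinate per iteration.
        """
        rng = np.random.default_rng(rng)
        x = y = 0.0
        sd = np.sqrt(1.0 - rho**2)
        draws = []
        for _ in range(n_iters):
            coords = (0, 1) if scan == "systematic" else (rng.integers(2),)
            for c in coords:
                if c == 0:
                    x = rng.normal(rho * y, sd)   # x | y
                else:
                    y = rng.normal(rho * x, sd)   # y | x
            draws.append((x, y))
        return np.array(draws)

    print(gibbs_bivariate_normal(0.9, 5, scan="random", rng=0))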
Geostatistical Sampling Methods for Efficient Uncertainty Analysis in Flow and Transport Problems
NASA Astrophysics Data System (ADS)
Liodakis, Stylianos; Kyriakidis, Phaedon; Gaganis, Petros
2015-04-01
In hydrogeological applications involving flow and transport in heterogeneous porous media the spatial distribution of hydraulic conductivity is often parameterized in terms of a lognormal random field based on a histogram and variogram model inferred from data and/or synthesized from relevant knowledge. Realizations of simulated conductivity fields are then generated using geostatistical simulation involving simple random (SR) sampling and are subsequently used as inputs to physically-based simulators of flow and transport in a Monte Carlo framework for evaluating the uncertainty in the spatial distribution of solute concentration due to the uncertainty in the spatial distribution of hydraulic conductivity [1]. Realistic uncertainty analysis, however, calls for a large number of simulated concentration fields; hence, it can become expensive in terms of both time and computer resources. A more efficient alternative to SR sampling is Latin hypercube (LH) sampling, a special case of stratified random sampling, which yields a more representative distribution of simulated attribute values with fewer realizations [2]. Here, the term representative implies realizations spanning efficiently the range of possible conductivity values corresponding to the lognormal random field. In this work we investigate the efficiency of alternative methods to classical LH sampling within the context of simulation of flow and transport in a heterogeneous porous medium. More precisely, we consider the stratified likelihood (SL) sampling method of [3], in which attribute realizations are generated using the polar simulation method by exploring the geometrical properties of the multivariate Gaussian distribution function. In addition, we propose a more efficient version of the above method, here termed minimum energy (ME) sampling, whereby a set of N representative conductivity realizations at M locations is constructed by: (i) generating a representative set of N points distributed on the surface of an M-dimensional, unit radius hyper-sphere, (ii) relocating the N points on a representative set of N hyper-spheres of different radii, and (iii) transforming the coordinates of those points to lie on N different hyper-ellipsoids spanning the multivariate Gaussian distribution. The above method is applied in a dimensionality reduction context by defining flow-controlling points over which representative sampling of hydraulic conductivity is performed, thus also accounting for the sensitivity of the flow and transport model to the input hydraulic conductivity field. The performance of the various stratified sampling methods, LH, SL, and ME, is compared to that of SR sampling in terms of reproduction of ensemble statistics of hydraulic conductivity and solute concentration for different sample sizes N (numbers of realizations). The results indicate that ME sampling constitutes an equally if not more efficient simulation method than LH and SL sampling, as it can reproduce to a similar extent statistics of the conductivity and concentration fields, yet with smaller sampling variability than SR sampling. References [1] Gutjahr A.L. and Bras R.L. Spatial variability in subsurface flow and transport: A review. Reliability Engineering & System Safety, 42, 293-316, (1993). [2] Helton J.C. and Davis F.J. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety, 81, 23-69, (2003). [3] Switzer P. Multiple simulation of spatial fields.
In: Heuvelink G, Lemmens M (eds) Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Coronet Books Inc., pp 629-635 (2000).
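A one-dimensional comparison of simple random (SR) and Latin hypercube (LH) sampling of a lognormal conductivity value conveys the basic idea; the Python sketch below assumes numpy and scipy are available, uses illustrative log-space parameters, and does not implement the SL or ME methods described above.

    import numpy as np
    from scipy import stats

    def latin_hypercube(n, rng):
        """One-dimensional Latin hypercube sample of U(0,1): one point per stratum."""
        strata = (np.arange(n) + rng.random(n)) / n
        return rng.permutation(strata)

    rng = np.random.default_rng(42)
    n = 50
    mu, sigma = 0.0, 1.0                      # log-space parameters (illustrative)
    u_sr = rng.random(n)                      # simple random sampling
    u_lh = latin_hypercube(n, rng)            # Latin hypercube sampling
    k_sr = np.exp(mu + sigma * stats.norm.ppf(u_sr))   # lognormal conductivities
    k_lh = np.exp(mu + sigma * stats.norm.ppf(u_lh))
    print("SR mean:", k_sr.mean().round(3), " LH mean:", k_lh.mean().round(3))

LH places exactly one draw in each probability stratum, which is what reduces the sampling variability of ensemble statistics for a given number of realizations.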
An integrate-over-temperature approach for enhanced sampling.
Gao, Yi Qin
2008-02-14
A simple method is introduced to achieve efficient random walking in the energy space in molecular dynamics simulations which thus enhances the sampling over a large energy range. The approach is closely related to multicanonical and replica exchange simulation methods in that it allows configurations of the system to be sampled in a wide energy range by making use of Boltzmann distribution functions at multiple temperatures. A biased potential is quickly generated using this method and is then used in accelerated molecular dynamics simulations.
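The spirit of the approach, a biasing term assembled from Boltzmann factors at several temperatures, can be sketched as follows; the temperature ladder, weights and units are illustrative assumptions, and the sketch does not reproduce the published method's normalization or weight-update scheme.

    import numpy as np

    kB = 0.0019872            # kcal/(mol*K); units are illustrative
    T0 = 300.0
    temps = np.linspace(300.0, 600.0, 8)        # temperature ladder (illustrative)
    betas = 1.0 / (kB * temps)
    weights = np.ones_like(betas) / betas.size  # n_k; tuned iteratively in practice

    def effective_energy(E):
        """Biased energy built from a sum of Boltzmann factors at several temperatures."""
        beta0 = 1.0 / (kB * T0)
        expo = np.log(weights) - betas * E      # log of each weighted Boltzmann factor
        # log-sum-exp for numerical stability
        return -(1.0 / beta0) * (expo.max() + np.log(np.sum(np.exp(expo - expo.max()))))

    # High potential energies are compressed, flattening the effective energy landscape.
    for E in (0.0, 20.0, 50.0):
        print(E, round(effective_energy(E), 2), round(effective_energy(E) - E, 2))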
NASA Astrophysics Data System (ADS)
Kawakami, Shun; Sasaki, Toshihiko; Koashi, Masato
2017-07-01
An essential step in quantum key distribution is the estimation of parameters related to the leaked amount of information, which is usually done by sampling of the communication data. When the data size is finite, the final key rate depends on how the estimation process handles statistical fluctuations. Many of the present security analyses are based on the method with simple random sampling, where hypergeometric distribution or its known bounds are used for the estimation. Here we propose a concise method based on Bernoulli sampling, which is related to binomial distribution. Our method is suitable for the Bennett-Brassard 1984 (BB84) protocol with weak coherent pulses [C. H. Bennett and G. Brassard, Proceedings of the IEEE Conference on Computers, Systems and Signal Processing (IEEE, New York, 1984), Vol. 175], reducing the number of estimated parameters to achieve a higher key generation rate compared to the method with simple random sampling. We also apply the method to prove the security of the differential-quadrature-phase-shift (DQPS) protocol in the finite-key regime. The result indicates that the advantage of the DQPS protocol over the phase-encoding BB84 protocol in terms of the key rate, which was previously confirmed in the asymptotic regime, persists in the finite-key regime.
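The contrast between the two sampling models can be illustrated with a toy simulation: simple random sampling of a fixed number of test rounds gives hypergeometric error counts, while Bernoulli sampling tests each signal independently and gives binomial counts. The Python sketch below uses illustrative numbers and is not the security analysis itself.

    import numpy as np

    rng = np.random.default_rng(7)
    N, n_err = 10_000, 300          # total signals and (unknown) erroneous ones
    k, p = 1_000, 0.1               # SRS test-set size vs Bernoulli test probability
    trials = 20_000

    # Simple random sampling: exactly k test rounds -> hypergeometric error counts.
    srs_err = rng.hypergeometric(n_err, N - n_err, k, size=trials)
    srs_rate = srs_err / k

    # Bernoulli sampling: every signal is independently tested with probability p.
    bern_err = rng.binomial(n_err, p, size=trials)
    bern_ok = rng.binomial(N - n_err, p, size=trials)
    bern_rate = bern_err / np.maximum(bern_err + bern_ok, 1)

    print("true error rate:", n_err / N)
    print("SRS estimate std:      ", srs_rate.std().round(4))
    print("Bernoulli estimate std:", bern_rate.std().round(4))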
Distributed fiber sparse-wideband vibration sensing by sub-Nyquist additive random sampling
NASA Astrophysics Data System (ADS)
Zhang, Jingdong; Zheng, Hua; Zhu, Tao; Yin, Guolu; Liu, Min; Bai, Yongzhong; Qu, Dingrong; Qiu, Feng; Huang, Xianbing
2018-05-01
The round trip time of the light pulse limits the maximum detectable vibration frequency response range of phase-sensitive optical time domain reflectometry (φ-OTDR). Unlike the uniform laser pulse interval in conventional φ-OTDR, we randomly modulate the pulse interval, so that an equivalent sub-Nyquist additive random sampling (sNARS) is realized for every sensing point of the long interrogation fiber. For a φ-OTDR system with 10 km sensing length, the sNARS method is optimized by theoretical analysis and Monte Carlo simulation, and the experimental results verify that a wide-band sparse signal can be identified and reconstructed. Such a method can broaden the vibration frequency response range of φ-OTDR, which is of great significance in sparse-wideband-frequency vibration signal detection, such as rail track monitoring and metal defect detection.
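A toy numerical illustration of sub-Nyquist additive random sampling is given below: a tone above the mean Nyquist frequency is sampled at randomly jittered intervals and recovered by a nonuniform DFT. All frequencies and parameters are illustrative assumptions with no connection to the actual φ-OTDR signal processing.

    import numpy as np

    rng = np.random.default_rng(3)
    f_sig = 80e3                                   # sparse vibration tone (illustrative)
    mean_dt, jitter = 1 / 25e3, 0.5                # mean rate 25 kHz, well below 2*f_sig
    dt = mean_dt * (1 + jitter * (rng.random(2000) - 0.5))   # additive random intervals
    t = np.cumsum(dt)
    x = np.sin(2 * np.pi * f_sig * t)

    freqs = np.linspace(0, 200e3, 1001)
    # Nonuniform DFT: with random intervals the tone is not aliased onto a false line.
    spectrum = np.abs(np.exp(-2j * np.pi * freqs[:, None] * t[None, :]) @ x) / t.size
    print("peak found at", freqs[np.argmax(spectrum)] / 1e3, "kHz")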
A new sampling scheme for developing metamodels with the zeros of Chebyshev polynomials
NASA Astrophysics Data System (ADS)
Wu, Jinglai; Luo, Zhen; Zhang, Nong; Zhang, Yunqing
2015-09-01
The accuracy of metamodelling is determined by both the sampling and approximation. This article proposes a new sampling method based on the zeros of Chebyshev polynomials to capture the sampling information effectively. First, the zeros of one-dimensional Chebyshev polynomials are applied to construct Chebyshev tensor product (CTP) sampling, and the CTP is then used to construct high-order multi-dimensional metamodels using the 'hypercube' polynomials. Secondly, the CTP sampling is further enhanced to develop Chebyshev collocation method (CCM) sampling, to construct the 'simplex' polynomials. The samples of CCM are randomly and directly chosen from the CTP samples. Two widely studied sampling methods, namely the Smolyak sparse grid and Hammersley, are used to demonstrate the effectiveness of the proposed sampling method. Several numerical examples are utilized to validate the approximation accuracy of the proposed metamodel under different dimensions.
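The construction of the sampling points is straightforward to sketch: the zeros of the degree-n Chebyshev polynomial are cos((2k-1)π/(2n)), the CTP design is their tensor-product grid, and a CCM-style design draws a random subset of those grid points. The Python fragment below illustrates only that construction, not the metamodel fitting.

    import numpy as np

    def chebyshev_zeros(n):
        """Zeros of the degree-n Chebyshev polynomial of the first kind on [-1, 1]."""
        k = np.arange(1, n + 1)
        return np.cos((2 * k - 1) * np.pi / (2 * n))

    def ctp_samples(n_per_dim, dims):
        """Chebyshev tensor product (CTP) sampling: full grid of 1-D zeros."""
        axes = [chebyshev_zeros(n_per_dim)] * dims
        grid = np.meshgrid(*axes, indexing="ij")
        return np.stack([g.ravel() for g in grid], axis=1)

    pts = ctp_samples(n_per_dim=5, dims=2)       # 25 sample points in [-1, 1]^2
    print(pts.shape)
    # A CCM-style subset is then drawn at random from the CTP points:
    rng = np.random.default_rng(0)
    subset = pts[rng.choice(pts.shape[0], size=10, replace=False)]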
[Exploration of the concept of genetic drift in genetics teaching of undergraduates].
Wang, Chun-ming
2016-01-01
Genetic drift is one of the difficult topics in teaching genetics, because its randomness and probabilistic nature can easily cause conceptual misunderstanding. The "sampling error" in its definition is often misunderstood as an artifact of the research method of "sampling" that disturbs the results, rather than as the random sampling process that causes the random changes in allele frequency. I analyzed and compared the definitions of genetic drift in domestic and international genetics textbooks, and found that definitions containing "sampling error" are widely adopted but are interpreted correctly in only a few textbooks. Here, the history of research on genetic drift, i.e., the contributions of Wright, Fisher and Kimura, is introduced. Moreover, I describe two representative articles recently published on the teaching of genetic drift to undergraduates, which point out that misconceptions are inevitable for undergraduates during the learning process and also provide a preliminary solution. Combined with my own teaching practice, I suggest that the definition of genetic drift containing "sampling error" can be adopted with further interpretation: the "sampling error" refers to random sampling among gametes when the alleles of the next generation are generated, which is equivalent to a random sampling of all gametes participating in mating from the gamete pool, and has no relationship to the artificial sampling used in general genetics studies. This article may provide some help in genetics teaching.
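The gamete-sampling interpretation can be made concrete with a standard Wright-Fisher simulation, in which the 2N allele copies of each generation are a binomial sample from the current allele frequency; the Python sketch below is a generic illustration, not taken from the article.

    import numpy as np

    def wright_fisher_drift(p0, pop_size, generations, rng=None):
        """Allele frequency under pure drift: binomial sampling of 2N gametes per generation."""
        rng = np.random.default_rng(rng)
        p = p0
        freqs = [p]
        for _ in range(generations):
            # "Sampling error": the next generation's 2N allele copies are a random
            # sample from the gamete pool, not an artifact of the researcher's sampling.
            p = rng.binomial(2 * pop_size, p) / (2 * pop_size)
            freqs.append(p)
        return freqs

    # Smaller populations drift further from the starting frequency of 0.5.
    for n in (10, 100, 1000):
        print("N =", n, "final frequency:",
              round(wright_fisher_drift(0.5, n, 100, rng=1)[-1], 3))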
Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B
2017-04-01
Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo and logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models in that (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, thereby providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
Clagett, Bartholt; Nathanson, Katherine L.; Ciosek, Stephanie L.; McDermoth, Monique; Vaughn, David J.; Mitra, Nandita; Weiss, Andrew; Martonik, Rachel; Kanetsky, Peter A.
2013-01-01
Random-digit dialing (RDD) using landline telephone numbers is the historical gold standard for control recruitment in population-based epidemiologic research. However, increasing cell-phone usage and diminishing response rates suggest that the effectiveness of RDD in recruiting a random sample of the general population, particularly for younger target populations, is decreasing. In this study, we compared landline RDD with alternative methods of control recruitment, including RDD using cell-phone numbers and address-based sampling (ABS), to recruit primarily white men aged 18–55 years into a study of testicular cancer susceptibility conducted in the Philadelphia, Pennsylvania, metropolitan area between 2009 and 2012. With few exceptions, eligible and enrolled controls recruited by means of RDD and ABS were similar with regard to characteristics for which data were collected on the screening survey. While we find ABS to be a comparably effective method of recruiting young males compared with landline RDD, we acknowledge the potential impact that selection bias may have had on our results because of poor overall response rates, which ranged from 11.4% for landline RDD to 1.7% for ABS. PMID:24008901
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yashchuk, V.V.; Conley, R.; Anderson, E.H.
Verification of the reliability of metrology data from high quality X-ray optics requires that adequate methods for test and calibration of the instruments be developed. For such verification for optical surface profilometers in the spatial frequency domain, a modulation transfer function (MTF) calibration method based on binary pseudo-random (BPR) gratings and arrays has been suggested and proven to be an effective calibration method for a number of interferometric microscopes, a phase shifting Fizeau interferometer, and a scatterometer. Here we describe the details of development of binary pseudo-random multilayer (BPRML) test samples suitable for characterization of scanning (SEM) and transmission (TEM) electron microscopes. We discuss the results of TEM measurements with the BPRML test samples fabricated from a WiSi2/Si multilayer coating with pseudo-randomly distributed layers. In particular, we demonstrate that significant information about the metrological reliability of the TEM measurements can be extracted even when the fundamental frequency of the BPRML sample is smaller than the Nyquist frequency of the measurements. The measurements demonstrate a number of problems related to the interpretation of the SEM and TEM data. Note that similar BPRML test samples can be used to characterize X-ray microscopes. Corresponding work with X-ray microscopes is in progress.
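A rough sketch of the kind of binary pseudo-random sequence such test structures are built from: a maximal-length sequence, whose nearly flat spectrum is what makes BPR patterns useful for MTF-style calibration. The register length and the mapping to layer thicknesses are illustrative assumptions, not the fabrication parameters of the BPRML samples.

```python
import numpy as np
from scipy.signal import max_len_seq

seq, _ = max_len_seq(10)                              # 2**10 - 1 = 1023 pseudo-random bits
layer_thickness_nm = np.where(seq == 1, 4.0, 2.0)     # hypothetical two-thickness layer stack

spectrum = np.abs(np.fft.rfft(2.0 * seq - 1.0))       # map to +/-1; magnitude spectrum is nearly flat
flatness = spectrum[1:].std() / spectrum[1:].mean()
print("sequence length:", seq.size, "| relative spectral ripple:", round(flatness, 3))
```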
Investigation of spectral analysis techniques for randomly sampled velocimetry data
NASA Technical Reports Server (NTRS)
Sree, Dave
1993-01-01
It is well known that laser velocimetry (LV) generates individual realization velocity data that are randomly or unevenly sampled in time. Spectral analysis of such data to obtain the turbulence spectra, and hence turbulence scales information, requires special techniques. The 'slotting' technique of Mayo et al., also described by Roberts and Ajmani, and the 'Direct Transform' method of Gaster and Roberts are well known in the LV community. The slotting technique is computationally faster than the direct transform method. There are practical limitations, however, as to how high a frequency can be estimated accurately for a given mean sampling rate. These high frequency estimates are important in obtaining the microscale information of turbulence structure. It was found from previous studies that reliable spectral estimates can be made up to about the mean sampling frequency (mean data rate) or less. If the data were evenly sampled, the frequency range would be half the sampling frequency (i.e., up to the Nyquist frequency); otherwise, aliasing problems would occur. The mean data rate and the sample size (total number of points) basically limit the frequency range. Also, there are large variabilities or errors associated with the high frequency estimates from randomly sampled signals. Roberts and Ajmani proposed certain pre-filtering techniques to reduce these variabilities, but at the cost of low frequency estimates. The prefiltering acts as a high-pass filter. Further, Shapiro and Silverman showed theoretically that, for Poisson sampled signals, it is possible to obtain alias-free spectral estimates far beyond the mean sampling frequency. But the question is, how far? During his tenure under the 1993 NASA-ASEE Summer Faculty Fellowship Program, the author found from his studies of spectral analysis techniques for randomly sampled signals that the spectral estimates can be enhanced or improved up to about 4-5 times the mean sampling frequency by using a suitable prefiltering technique. But this increased bandwidth comes at the cost of the lower frequency estimates. The studies further showed that large data sets of the order of 100,000 points or more, high data rates, and Poisson sampling are very crucial for obtaining reliable spectral estimates from randomly sampled data, such as LV data. Some of the results of the current study are presented.
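A compact sketch of the "slotting" idea for randomly sampled data: products of sample pairs are accumulated into discrete lag slots to form an autocorrelation estimate, which can then be Fourier transformed into a spectrum. The signal, Poisson data rate, slot width, and lag range below are illustrative, and none of the prefiltering or variance-reduction refinements discussed above are included.

```python
import numpy as np

rng = np.random.default_rng(2)
mean_rate = 1000.0                                     # mean data rate (Hz), Poisson sampling
t = np.cumsum(rng.exponential(1.0 / mean_rate, 4000))
u = np.sin(2 * np.pi * 80.0 * t) + 0.5 * rng.normal(size=t.size)   # "velocity" samples

dtau, n_slots = 1e-3, 200                              # slot width and number of lag slots
num, cnt = np.zeros(n_slots), np.zeros(n_slots)
for i in range(t.size - 1):
    lags = t[i + 1:] - t[i]
    k = (lags / dtau).astype(int)
    keep = k < n_slots
    np.add.at(num, k[keep], u[i] * u[i + 1:][keep])    # accumulate pair products per lag slot
    np.add.at(cnt, k[keep], 1)
acf = num / np.maximum(cnt, 1)                         # slotted autocorrelation estimate
spectrum = np.abs(np.fft.rfft(acf * np.hanning(n_slots)))
freqs = np.fft.rfftfreq(n_slots, dtau)
print("spectral peak near", freqs[np.argmax(spectrum[1:]) + 1], "Hz")
```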
Sales, Jessica M.; Latham, Teaniese P.; DiClemente, Ralph J.; Rose, Eve
2013-01-01
Objectives To characterize dual-method protection users and report the prevalence of dual-method use among young adult African American women residing in the Southeastern United States. Design Analysis of baseline data from a randomized controlled trial. Setting A clinic-based sample of young women enrolled in a randomized trial of a human immunodeficiency virus (HIV)–prevention program in Atlanta, Georgia, from June 2005 to June 2007. Participants African American women aged 14 to 20 years who reported unprotected sexual activity in the past 6 months. Of the eligible adolescents, 94% (N=701) were enrolled in the study and completed baseline assessments. Outcome Measures Dual-method protection use as well as sociodemographic, individual-level, interpersonal-level, and community-level factors and interpersonal communication skills. Only data from the baseline assessment, before randomization, were used for the analysis. Results A total of 102 participants (14.6%) were classified as dual-method protection users. After controlling for age and clinic, significant differences between dual-method users and non–dual-method users were found for impulsivity, self-esteem, social support, relationship style, partner communication self-efficacy, and fear of condom negotiation. Conclusions Dual-method protection use is low. Identification of factors that differentiate dual-method users from non–dual-method users at the individual, interpersonal, and community levels in this young African American sample suggests that HIV, sexually transmitted disease, and unintended pregnancy risk–reduction programs should address factors at each level, not simply the individual level, and that this may involve structural and/or clinical counseling practice changes in clinics that serve young women, to optimally facilitate dual-method protection use among young African American women in the Southeastern United States. PMID:21135341
ERIC Educational Resources Information Center
Yalcin, Sinan
2016-01-01
This study aimed to determine the relationship between teachers' positive psychological capital levels and organisational commitment. The study was conducted as a correlational survey, one of the quantitative methods. The sample group consists of 244 teachers selected using a random sampling method from among 1270 teachers working in…
Auxiliary Parameter MCMC for Exponential Random Graph Models
NASA Astrophysics Data System (ADS)
Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro
2016-11-01
Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the development of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
Dynamic laser speckle analyzed considering inhomogeneities in the biological sample
NASA Astrophysics Data System (ADS)
Braga, Roberto A.; González-Peña, Rolando J.; Viana, Dimitri Campos; Rivera, Fernando Pujaico
2017-04-01
The dynamic laser speckle phenomenon provides a contactless and nondestructive way to monitor biological changes, quantified by second-order statistics applied to the images over time using a secondary matrix known as the time history of the speckle pattern (THSP). To reduce computation time, the traditional way to build the THSP restricts the data to a single line or column. Our hypothesis is that this spatial restriction of the information could compromise the results, particularly when undesirable and unexpected optical inhomogeneities occur, such as in cell culture media. We tested a spatially random approach to collecting the points that form a THSP. Cells in a culture medium and drying paint, representing samples with different levels of homogeneity, were tested, and a comparison with the traditional method was carried out. An alternative random selection based on a Gaussian distribution around a desired position was also presented. The results showed that the traditional protocol presented higher variation than the outcomes using the random method. The higher the inhomogeneity of the activity map, the higher the efficiency of the proposed method using random points. The Gaussian distribution proved to be useful when there was a well-defined area to monitor.
Single-Phase Mail Survey Design for Rare Population Subgroups
ERIC Educational Resources Information Center
Brick, J. Michael; Andrews, William R.; Mathiowetz, Nancy A.
2016-01-01
Although using random digit dialing (RDD) telephone samples was the preferred method for conducting surveys of households for many years, declining response and coverage rates have led researchers to explore alternative approaches. The use of address-based sampling (ABS) has been examined for sampling the general population and subgroups, most…
Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R
2017-09-14
While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. Our aim was to guide the design of multiplier method population size estimation studies using respondent-driven sampling surveys so as to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multiplier methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small, so balancing against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey. ©Elizabeth Fearon, Sungai T Chabata, Jennifer A Thompson, Frances M Cowan, James R Hargreaves. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 14.09.2017.
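To make the estimator concrete, the sketch below computes N̂ = M/P with a delta-method confidence interval, inflating the variance of P by an assumed design effect for the respondent-driven sampling survey. All numbers (M, sample size, P, design effect) are invented for illustration and are not from the Harare application.

```python
import math

M = 5000          # unique objects distributed / service count (illustrative)
n = 400           # RDS survey sample size (illustrative)
p_hat = 0.25      # proportion of respondents reporting receipt (illustrative)
deff = 2.0        # assumed RDS design effect

N_hat = M / p_hat
var_p = deff * p_hat * (1 - p_hat) / n
se_N = M * math.sqrt(var_p) / p_hat ** 2          # delta method: Var(M/P) ~ M^2 Var(P) / P^4
lo, hi = N_hat - 1.96 * se_N, N_hat + 1.96 * se_N
print(f"N_hat = {N_hat:.0f}, approx. 95% CI ({lo:.0f}, {hi:.0f})")
```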
2013-01-01
Background Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. Results To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. Conclusions We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs. PMID:24160725
Hedt-Gauthier, Bethany L; Mitsunaga, Tisha; Hund, Lauren; Olives, Casey; Pagano, Marcello
2013-10-26
Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.
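The paper supplies its own theory and code; the sketch below is only a generic illustration of the underlying idea, approximating the misclassification risks of a hypothetical 33x6 cluster design by simulating beta-binomial cluster totals. The thresholds, intra-cluster correlation, and decision rule are invented for illustration.

```python
import numpy as np
from scipy.stats import betabinom

n_clusters, m = 33, 6                  # 33 clusters of 6 observations
decision_rule = 135                    # classify "acceptable" if total positives > 135 (of 198)
p_hi, p_lo, icc = 0.80, 0.50, 0.10     # upper/lower thresholds and intra-cluster correlation

def simulated_cluster_totals(p, reps=50_000):
    theta = (1 - icc) / icc            # beta-binomial with mean p and correlation icc
    draws = betabinom.rvs(m, p * theta, (1 - p) * theta, size=(reps, n_clusters))
    return draws.sum(axis=1)

alpha = np.mean(simulated_cluster_totals(p_hi) <= decision_rule)   # misclassify a good lot
beta = np.mean(simulated_cluster_totals(p_lo) > decision_rule)     # misclassify a poor lot
print(f"alpha ~ {alpha:.3f}, beta ~ {beta:.3f}")
```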
NASA Technical Reports Server (NTRS)
Poole, L. R.
1974-01-01
A study was conducted of an alternate method for storage and use of bathymetry data in the Langley Research Center and Virginia Institute of Marine Science mid-Atlantic continental-shelf wave-refraction computer program. The regional bathymetry array was divided into 105 indexed modules which can be read individually into memory in a nonsequential manner from a peripheral file using special random-access subroutines. In running a sample refraction case, a 75-percent decrease in program field length was achieved by using the random-access storage method in comparison with the conventional method of total regional array storage. This field-length decrease was accompanied by a comparative 5-percent increase in central processing time and a 477-percent increase in the number of operating-system calls. A comparative Langley Research Center computer system cost savings of 68 percent was achieved by using the random-access storage method.
Direct generation of all-optical random numbers from optical pulse amplitude chaos.
Li, Pu; Wang, Yun-Cai; Wang, An-Bang; Yang, Ling-Zhen; Zhang, Ming-Jiang; Zhang, Jian-Zhong
2012-02-13
We propose and theoretically demonstrate an all-optical method for directly generating all-optical random numbers from pulse amplitude chaos produced by a mode-locked fiber ring laser. Under an appropriate pump intensity, the mode-locked laser can experience a quasi-periodic route to chaos. Such chaos consists of a stream of pulses with a fixed repetition frequency but random intensities. In this method, we do not require a sampling procedure or externally triggered clocks but directly quantize the chaotic pulse stream into a random number sequence via an all-optical flip-flop. Moreover, our simulation results show that the pulse amplitude chaos has no periodicity and possesses a highly symmetric distribution of amplitude. Thus, in theory, the obtained random number sequence has high-quality randomness, verified by industry-standard statistical tests, without post-processing.
Sparsely sampling the sky: Regular vs. random sampling
NASA Astrophysics Data System (ADS)
Paykari, P.; Pires, S.; Starck, J.-L.; Jaffe, A. H.
2015-09-01
Aims: The next generation of galaxy surveys, aiming to observe millions of galaxies, are expensive both in time and money. This raises questions regarding the optimal investment of this time and money for future surveys. In a previous work, we have shown that a sparse sampling strategy could be a powerful substitute for the - usually favoured - contiguous observation of the sky. In our previous paper, regular sparse sampling was investigated, where the sparse observed patches were regularly distributed on the sky. The regularity of the mask introduces a periodic pattern in the window function, which induces periodic correlations at specific scales. Methods: In this paper, we use a Bayesian experimental design to investigate a "random" sparse sampling approach, where the observed patches are randomly distributed over the total sparsely sampled area. Results: We find that in this setting, the induced correlation is evenly distributed amongst all scales as there is no preferred scale in the window function. Conclusions: This is desirable when we are interested in any specific scale in the galaxy power spectrum, such as the matter-radiation equality scale. As the figure of merit shows, however, there is no preference between regular and random sampling to constrain the overall galaxy power spectrum and the cosmological parameters.
Methodological Reporting of Randomized Trials in Five Leading Chinese Nursing Journals
Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu
2014-01-01
Background Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. Methods In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. Results In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34±0.97 (Mean ± SD). No RCT reported descriptions and changes in “trial design,” changes in “outcomes” and “implementation,” or descriptions of the similarity of interventions for “blinding.” Poor reporting was found in detailing the “settings of participants” (13.1%), “type of randomization sequence generation” (1.8%), calculation methods of “sample size” (0.4%), explanation of any interim analyses and stopping guidelines for “sample size” (0.3%), “allocation concealment mechanism” (0.3%), additional analyses in “statistical methods” (2.1%), and targeted subjects and methods of “blinding” (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of “participants,” “interventions,” and definitions of the “outcomes” and “statistical methods.” The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. Conclusions The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods. PMID:25415382
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yashchuk, Valeriy V; Conley, Raymond; Anderson, Erik H
Verification of the reliability of metrology data from high quality x-ray optics requires that adequate methods for test and calibration of the instruments be developed. For such verification for optical surface profilometers in the spatial frequency domain, a modulation transfer function (MTF) calibration method based on binary pseudo-random (BPR) gratings and arrays has been suggested [Proc. SPIE 7077-7 (2007), Opt. Eng. 47(7), 073602-1-5 (2008)] and proven to be an effective calibration method for a number of interferometric microscopes, a phase shifting Fizeau interferometer, and a scatterometer [Nucl. Instr. and Meth. A 616, 172-82 (2010)]. Here we describe the details of development of binary pseudo-random multilayer (BPRML) test samples suitable for characterization of scanning (SEM) and transmission (TEM) electron microscopes. We discuss the results of TEM measurements with the BPRML test samples fabricated from a WiSi2/Si multilayer coating with pseudo-randomly distributed layers. In particular, we demonstrate that significant information about the metrological reliability of the TEM measurements can be extracted even when the fundamental frequency of the BPRML sample is smaller than the Nyquist frequency of the measurements. The measurements demonstrate a number of problems related to the interpretation of the SEM and TEM data. Note that similar BPRML test samples can be used to characterize x-ray microscopes. Corresponding work with x-ray microscopes is in progress.
NASA Astrophysics Data System (ADS)
Liao, Kaihua; Zhou, Zhiwen; Lai, Xiaoming; Zhu, Qing; Feng, Huihui
2017-04-01
The identification of representative soil moisture sampling sites is important for the validation of remotely sensed mean soil moisture in a certain area and ground-based soil moisture measurements in catchment or hillslope hydrological studies. Numerous approaches have been developed to identify optimal sites for predicting mean soil moisture. Each method has certain advantages and disadvantages, but they have rarely been evaluated and compared. In our study, surface (0-20 cm) soil moisture data from January 2013 to March 2016 (a total of 43 sampling days) were collected at 77 sampling sites on a mixed land-use (tea and bamboo) hillslope in the hilly area of Taihu Lake Basin, China. A total of 10 methods (temporal stability (TS) analyses based on 2 indices, K-means clustering based on 6 kinds of inputs and 2 random sampling strategies) were evaluated for determining optimal sampling sites for mean soil moisture estimation. They were TS analyses based on the smallest index of temporal stability (ITS, a combination of the mean relative difference and standard deviation of relative difference (SDRD)) and based on the smallest SDRD, K-means clustering based on soil properties and terrain indices (EFs), repeated soil moisture measurements (Theta), EFs plus one-time soil moisture data (EFsTheta), and the principal components derived from EFs (EFs-PCA), Theta (Theta-PCA), and EFsTheta (EFsTheta-PCA), and global and stratified random sampling strategies. Results showed that the TS based on the smallest ITS was better (RMSE = 0.023 m3 m-3) than that based on the smallest SDRD (RMSE = 0.034 m3 m-3). The K-means clustering based on EFsTheta (-PCA) was better (RMSE <0.020 m3 m-3) than those based on EFs (-PCA) and Theta (-PCA). The sampling design stratified by land use was more efficient than the global random method. Forty and 60 sampling sites are needed for stratified sampling and global sampling, respectively, to make their performances comparable to the best K-means method (EFsTheta-PCA). Overall, TS required only one site, but its accuracy was limited. The best K-means method required <8 sites and yielded high accuracy, but extra soil and terrain information is necessary when using this method. The stratified sampling strategy can only be used if no pre-knowledge about soil moisture variation is available. This information will help in selecting the optimal methods for estimating the area mean soil moisture.
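A minimal sketch of the K-means step used by site-selection methods of this kind: cluster candidate sites on standardized covariates and keep the site closest to each cluster centre as a monitoring location. The 77-site covariate matrix below is random placeholder data, not the Taihu Lake hillslope measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
covariates = rng.normal(size=(77, 10))        # 77 candidate sites x 10 covariates (placeholder)
X = StandardScaler().fit_transform(covariates)

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
selected = [int(np.argmin(np.linalg.norm(X - c, axis=1))) for c in km.cluster_centers_]
print("representative site indices:", sorted(selected))
```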
Using Non-experimental Data to Estimate Treatment Effects
Stuart, Elizabeth A.; Marcus, Sue M.; Horvitz-Lennon, Marcela V.; Gibbons, Robert D.; Normand, Sharon-Lise T.
2009-01-01
While much psychiatric research is based on randomized controlled trials (RCTs), where patients are randomly assigned to treatments, sometimes RCTs are not feasible. This paper describes propensity score approaches, which are increasingly used for estimating treatment effects in non-experimental settings. The primary goal of propensity score methods is to create sets of treated and comparison subjects who look as similar as possible, in essence replicating a randomized experiment, at least with respect to observed patient characteristics. A study to estimate the metabolic effects of antipsychotic medication in a sample of Florida Medicaid beneficiaries with schizophrenia illustrates methods. PMID:20563313
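One common propensity score approach is inverse-probability-of-treatment weighting; the sketch below fits a logistic propensity model on simulated covariates and compares weighted outcome means. The data-generating values (including the true effect of 1.0) are invented for illustration and are unrelated to the Medicaid study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 2000
X = rng.normal(size=(n, 4))                                     # observed characteristics
treat = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))
y = 1.0 * treat + X[:, 0] + rng.normal(size=n)                  # outcome; true effect = 1.0

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]  # estimated propensity scores
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))                  # inverse-probability weights
effect = (np.average(y[treat == 1], weights=w[treat == 1])
          - np.average(y[treat == 0], weights=w[treat == 0]))
print(f"weighted estimate of treatment effect: {effect:.2f}")
```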
R. A. Fisher and his advocacy of randomization.
Hall, Nancy S
2007-01-01
The requirement of randomization in experimental design was first stated by R. A. Fisher, statistician and geneticist, in 1925 in his book Statistical Methods for Research Workers. Earlier designs were systematic and involved the judgment of the experimenter; this led to possible bias and inaccurate interpretation of the data. Fisher's dictum was that randomization eliminates bias and permits a valid test of significance. Randomization in experimenting had been used by Charles Sanders Peirce in 1885 but the practice was not continued. Fisher developed his concepts of randomizing as he considered the mathematics of small samples, in discussions with "Student," William Sealy Gosset. Fisher published extensively. His principles of experimental design were spread worldwide by the many "voluntary workers" who came from other institutions to Rothamsted Agricultural Station in England to learn Fisher's methods.
ERIC Educational Resources Information Center
Livingston, Samuel A.; Kim, Sooyeon
2010-01-01
A series of resampling studies investigated the accuracy of equating by four different methods in a random groups equating design with samples of 400, 200, 100, and 50 test takers taking each form. Six pairs of forms were constructed. Each pair was constructed by assigning items from an existing test taken by 9,000 or more test takers. The…
A sub-sampled approach to extremely low-dose STEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stevens, A.; Luzi, L.; Yang, H.
The inpainting of randomly sub-sampled images acquired by scanning transmission electron microscopy (STEM) is an attractive method for imaging under low-dose conditions (≤ 1 e-/Å2) without changing either the operation of the microscope or the physics of the imaging process. We show that 1) adaptive sub-sampling increases acquisition speed, resolution, and sensitivity; and 2) random (non-adaptive) sub-sampling is equivalent to, but faster than, traditional low-dose techniques. Adaptive sub-sampling opens numerous possibilities for the analysis of beam sensitive materials and in-situ dynamic processes at the resolution limit of the aberration corrected microscope and is demonstrated here for the analysis of the node distribution in metal-organic frameworks (MOFs).
Sampling maternal care behaviour in domestic dogs: What's the best approach?
Czerwinski, Veronika H; Smith, Bradley P; Hynd, Philip I; Hazel, Susan J
2017-07-01
Our understanding of the frequency and duration of maternal care behaviours in the domestic dog during the first two postnatal weeks is limited, largely due to the inconsistencies in the sampling methodologies that have been employed. In order to develop a more concise picture of maternal care behaviour during this period, and to help establish the sampling method that represents these behaviours best, we compared a variety of time sampling methods. Six litters were continuously observed for a total of 96 h over postnatal days 3, 6, 9 and 12 (24 h per day). Frequent (dam presence, nursing duration, contact duration) and infrequent maternal behaviours (anogenital licking duration and frequency) were coded using five different time sampling methods that included: 12-h night (1800-0600 h), 12-h day (0600-1800 h), a one-hour period during the night (1800-0600 h), a one-hour period during the day (0600-1800 h) and a one-hour period at any time. Each of the one-hour time sampling methods consisted of four randomly chosen 15-min periods. Two random sets of four 15-min periods were also analysed to ensure reliability. We then determined which of the time sampling methods, averaged over the three 24-h periods, best represented the frequency and duration of behaviours. As might be expected, frequently occurring behaviours were adequately represented by short (one-hour) sampling periods; however, this was not the case with the infrequent behaviour. Thus, we argue that the time sampling methodology employed must match the behaviour of interest. This caution applies to maternal behaviour in altricial species, such as canids, as well as all systematic behavioural observations utilising time sampling methodology. Copyright © 2017. Published by Elsevier B.V.
Long, Ju
2016-05-01
In China, -(SEA), -α(3.7) and -α(4.2) are common deletional α-thalassemia alleles. Gap-PCR is the currently used detection method for these alleles, whose disadvantages include a time-consuming procedure and increased potential for PCR product contamination. Therefore, this detection method needs to be improved. Based on identical-primer homologous fragments, a qPCR system was developed for deletional α-thalassemia genotyping, which was composed of a group of quantitatively-related primers and their corresponding probes plus two groups of qualitatively-related primers and their corresponding probes. In order to verify the accuracy of the qPCR system, known-genotype samples and random samples were employed. The standard curve results demonstrated that the designed primers and probes all yielded good amplification efficiency. In the tests of known-genotype samples and random samples, sample detection results were consistent with verification results. The method accurately detected the αα, -(SEA), -α(3.7) and -α(4.2) alleles of deletional α-thalassemia. In addition, this method offers a wider detection range, greater speed and a reduced risk of PCR product contamination compared with current common gap-PCR detection reagents. Copyright © 2016 Elsevier B.V. All rights reserved.
The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.
ERIC Educational Resources Information Center
Wang, Lin; Fan, Xitao
Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…
L'Engle, Kelly; Sefa, Eunice; Adimazoya, Edward Akolgo; Yartey, Emmanuel; Lenzi, Rachel; Tarpo, Cindy; Heward-Mills, Nii Lante; Lew, Katherine; Ampeh, Yvonne
2018-01-01
Generating a nationally representative sample in low and middle income countries typically requires resource-intensive household level sampling with door-to-door data collection. High mobile phone penetration rates in developing countries provide new opportunities for alternative sampling and data collection methods, but there is limited information about response rates and sample biases in coverage and nonresponse using these methods. We utilized data from an interactive voice response, random-digit dial, national mobile phone survey in Ghana to calculate standardized response rates and assess representativeness of the obtained sample. The survey methodology was piloted in two rounds of data collection. The final survey included 18 demographic, media exposure, and health behavior questions. Call outcomes and response rates were calculated according to the American Association of Public Opinion Research guidelines. Sample characteristics, productivity, and costs per interview were calculated. Representativeness was assessed by comparing data to the Ghana Demographic and Health Survey and the National Population and Housing Census. The survey was fielded during a 27-day period in February-March 2017. There were 9,469 completed interviews and 3,547 partial interviews. Response, cooperation, refusal, and contact rates were 31%, 81%, 7%, and 39% respectively. Twenty-three calls were dialed to produce an eligible contact: nonresponse was substantial due to the automated calling system and dialing of many unassigned or non-working numbers. Younger, urban, better educated, and male respondents were overrepresented in the sample. The innovative mobile phone data collection methodology yielded a large sample in a relatively short period. Response rates were comparable to other surveys, although substantial coverage bias resulted from fewer women, rural, and older residents completing the mobile phone survey in comparison to household surveys. Random digit dialing of mobile phones offers promise for future data collection in Ghana and may be suitable for other developing countries.
Assessing the Generalizability of Randomized Trial Results to Target Populations
Stuart, Elizabeth A.; Bradshaw, Catherine P.; Leaf, Philip J.
2014-01-01
Recent years have seen increasing interest in and attention to evidence-based practices, where the “evidence” generally comes from well-conducted randomized trials. However, while those trials yield accurate estimates of the effect of the intervention for the participants in the trial (known as “internal validity”), they do not always yield relevant information about the effects in a particular target population (known as “external validity”). This may be due to a lack of specification of a target population when designing the trial, difficulties recruiting a sample that is representative of a pre-specified target population, or to interest in considering a target population somewhat different from the population directly targeted by the trial. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then provides a case study using one particular method, which weights the subjects in a randomized trial to match the population on a set of observed characteristics. The case study uses data from a randomized trial of School-wide Positive Behavioral Interventions and Supports (PBIS); our interest is in generalizing the results to the state of Maryland. In the case of PBIS, after weighting, estimated effects in the target population were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, there are also many open research questions; future research should focus on questions of treatment effect heterogeneity and further developing these methods for enhancing external validity. Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to specific populations unless they are confident of the similarity between the trial sample and that target population. PMID:25307417
Assessing the generalizability of randomized trial results to target populations.
Stuart, Elizabeth A; Bradshaw, Catherine P; Leaf, Philip J
2015-04-01
Recent years have seen increasing interest in and attention to evidence-based practices, where the "evidence" generally comes from well-conducted randomized trials. However, while those trials yield accurate estimates of the effect of the intervention for the participants in the trial (known as "internal validity"), they do not always yield relevant information about the effects in a particular target population (known as "external validity"). This may be due to a lack of specification of a target population when designing the trial, difficulties recruiting a sample that is representative of a prespecified target population, or to interest in considering a target population somewhat different from the population directly targeted by the trial. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then provides a case study using one particular method, which weights the subjects in a randomized trial to match the population on a set of observed characteristics. The case study uses data from a randomized trial of school-wide positive behavioral interventions and supports (PBIS); our interest is in generalizing the results to the state of Maryland. In the case of PBIS, after weighting, estimated effects in the target population were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, there are also many open research questions; future research should focus on questions of treatment effect heterogeneity and further developing these methods for enhancing external validity. Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to specific populations unless they are confident of the similarity between the trial sample and that target population.
Cuevas, Erik; Díaz, Margarita
2015-01-01
In this paper, a new method for robustly estimating multiple view relations from point correspondences is presented. The approach combines the popular random sampling consensus (RANSAC) algorithm and the evolutionary method harmony search (HS). With this combination, the proposed method adopts a different sampling strategy than RANSAC to generate putative solutions. Under the new mechanism, at each iteration, new candidate solutions are built taking into account the quality of the models generated by previous candidate solutions, rather than purely at random as is the case with RANSAC. The rules for the generation of candidate solutions (samples) are motivated by the improvisation process that occurs when a musician searches for a better state of harmony. As a result, the proposed approach can substantially reduce the number of iterations while still preserving the robust capabilities of RANSAC. The method is generic and its use is illustrated by the estimation of homographies, considering synthetic and real images. Additionally, in order to demonstrate the performance of the proposed approach within a real engineering application, it is employed to solve the problem of position estimation in a humanoid robot. Experimental results validate the efficiency of the proposed method in terms of accuracy, speed, and robustness.
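For reference, a bare-bones version of the plain RANSAC baseline that the harmony-search variant modifies, here fitting a 2-D line to points contaminated with outliers; the inlier threshold, iteration count, and data are all illustrative, and the HS-guided sampling itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 80)
inliers = np.column_stack([x, 2.0 * x + 1.0 + rng.normal(0, 0.1, 80)])
outliers = rng.uniform(0, 20, (40, 2))
data = np.vstack([inliers, outliers])

best_count, best_model = 0, None
for _ in range(500):
    i, j = rng.choice(len(data), size=2, replace=False)      # minimal random sample
    (x1, y1), (x2, y2) = data[i], data[j]
    if abs(x2 - x1) < 1e-9:
        continue
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    residuals = np.abs(data[:, 1] - (a * data[:, 0] + b))
    count = int(np.sum(residuals < 0.5))                     # consensus set size
    if count > best_count:
        best_count, best_model = count, (a, b)
print("slope, intercept:", np.round(best_model, 2), "| inliers:", best_count)
```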
NASA Astrophysics Data System (ADS)
Witteveen, Jeroen A. S.; Bijl, Hester
2009-10-01
The Unsteady Adaptive Stochastic Finite Elements (UASFE) method resolves the effect of randomness in numerical simulations of single-mode aeroelastic responses with a constant accuracy in time for a constant number of samples. In this paper, the UASFE framework is extended to multi-frequency responses and continuous structures by employing a wavelet decomposition pre-processing step to decompose the sampled multi-frequency signals into single-frequency components. The effect of the randomness on the multi-frequency response is then obtained by summing the results of the UASFE interpolation at constant phase for the different frequency components. Results for multi-frequency responses and continuous structures show a three orders of magnitude reduction of computational costs compared to crude Monte Carlo simulations in a harmonically forced oscillator, a flutter panel problem, and the three-dimensional transonic AGARD 445.6 wing aeroelastic benchmark subject to random fields and random parameters with various probability distributions.
Influence of item distribution pattern and abundance on efficiency of benthic core sampling
Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.
2014-01-01
Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-04-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67x3 (67 clusters of three observations) and a 33x6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67x3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis.
[Environmental Education Units].
ERIC Educational Resources Information Center
Minneapolis Independent School District 275, Minn.
Two of these three pamphlets describe methods of teaching young elementary school children the principles of sampling. Tiles of five colors are added to a tub and children sample these randomly; using the tiles as units for a graph, they draw a representation of the population. Pooling results leads to a more reliable sample. Practice is given in…
Tran, Viet-Thi; Porcher, Raphael; Falissard, Bruno; Ravaud, Philippe
2016-12-01
To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys. We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited. In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% themes (IQR = 91-93%) identified in the original study. In our study, data saturation was most certainly reached for samples >150 participants. Our method may be used to determine when to continue the study to find new themes or stop because of futility. Copyright © 2016 Elsevier Inc. All rights reserved.
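The resampling logic can be illustrated with a toy simulation: given assumed theme prevalences, estimate the probability that one more participant would contribute a theme not yet observed after n interviews. The prevalences below are invented and are not the burden-of-treatment data.

```python
import numpy as np

rng = np.random.default_rng(13)
theme_prev = np.concatenate([np.full(10, 0.30), np.full(20, 0.05), np.full(30, 0.01)])

def prob_new_theme(n, reps=2000):
    hits = 0
    for _ in range(reps):
        seen = (rng.random((n, theme_prev.size)) < theme_prev).any(axis=0)   # themes found so far
        next_participant = rng.random(theme_prev.size) < theme_prev
        hits += bool(np.any(next_participant & ~seen))
    return hits / reps

for n in (30, 50, 100, 150):
    print(n, "participants -> P(next adds a new theme) ~", round(prob_new_theme(n), 2))
```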
Interval sampling methods and measurement error: a computer simulation.
Wirth, Oliver; Slaven, James; Taylor, Matthew A
2014-01-01
A simulation study was conducted to provide a more thorough account of measurement error associated with interval sampling methods. A computer program simulated the application of momentary time sampling, partial-interval recording, and whole-interval recording methods on target events randomly distributed across an observation period. The simulation yielded measures of error for multiple combinations of observation period, interval duration, event duration, and cumulative event duration. The simulations were conducted up to 100 times to yield measures of error variability. Although the present simulation confirmed some previously reported characteristics of interval sampling methods, it also revealed many new findings that pertain to each method's inherent strengths and weaknesses. The analysis and resulting error tables can help guide the selection of the most appropriate sampling method for observation-based behavioral assessments. © Society for the Experimental Analysis of Behavior.
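A stripped-down version of this kind of simulation: place non-overlapping target events on a timeline, score the session with momentary time sampling (MTS), partial-interval recording (PIR), and whole-interval recording (WIR), and compare each against the true proportion of time occupied. Session length, interval duration, and event parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(17)
session, interval, event_dur = 600.0, 10.0, 4.0            # seconds (illustrative)
starts = np.sort(rng.choice(np.arange(0.0, session - event_dur, 5.0), 20, replace=False))
ends = starts + event_dur                                   # 20 non-overlapping events

edges = np.arange(0.0, session + interval, interval)
def active_time(a, b):                                      # event time inside window [a, b)
    return np.clip(np.minimum(ends, b) - np.maximum(starts, a), 0.0, None).sum()

occupancy = np.array([active_time(a, b) for a, b in zip(edges[:-1], edges[1:])])
true_prop = occupancy.sum() / session
mts = np.mean([active_time(b - 0.01, b) > 0 for b in edges[1:]])   # check only at interval end
pir = np.mean(occupancy > 0)                                        # any occurrence in interval
wir = np.mean(occupancy >= interval)                                # event must fill whole interval
print(f"true {true_prop:.2f} | MTS {mts:.2f} | PIR {pir:.2f} | WIR {wir:.2f}")
```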
Method and apparatus for in-situ characterization of energy storage and energy conversion devices
Christophersen, Jon P [Idaho Falls, ID; Motloch, Chester G [Idaho Falls, ID; Morrison, John L [Butte, MT; Albrecht, Weston [Layton, UT
2010-03-09
Disclosed are methods and apparatuses for determining an impedance of an energy-output device using a random noise stimulus applied to the energy-output device. A random noise signal is generated and converted to a random noise stimulus as a current source correlated to the random noise signal. A bias-reduced response of the energy-output device to the random noise stimulus is generated by comparing a voltage at the energy-output device terminal to an average voltage signal. The random noise stimulus and bias-reduced response may be periodically sampled to generate a time-varying current stimulus and a time-varying voltage response, which may be correlated to generate an autocorrelated stimulus, an autocorrelated response, and a cross-correlated response. Finally, the autocorrelated stimulus, the autocorrelated response, and the cross-correlated response may be combined to determine at least one of impedance amplitude, impedance phase, and complex impedance.
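A rough illustration of the correlation idea (not the patented apparatus or its signal chain): drive a simulated parallel-RC "cell" with a random noise current and recover the complex impedance as the current-voltage cross-spectrum divided by the current auto-spectrum. Component values and sampling settings are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import csd, welch

fs, n = 1000.0, 200_000
dt = 1.0 / fs
rng = np.random.default_rng(19)
i_stim = rng.normal(size=n)                           # zero-mean random noise current stimulus

R, C = 10.0, 1e-3                                     # parallel RC: Z(f) = R / (1 + j*2*pi*f*R*C)
v = np.zeros(n)
for k in range(1, n):                                 # backward-Euler step of C dv/dt = i - v/R
    v[k] = (v[k - 1] + dt * i_stim[k] / C) / (1 + dt / (R * C))

f, Pii = welch(i_stim, fs=fs, nperseg=4096)           # auto-spectrum of the stimulus
_, Piv = csd(i_stim, v, fs=fs, nperseg=4096)          # cross-spectrum of stimulus and response
Z = Piv / Pii                                         # complex impedance estimate
idx = 40                                              # a frequency bin near 10 Hz
model = R / (1 + 1j * 2 * np.pi * f[idx] * R * C)
print(f"|Z| at {f[idx]:.1f} Hz: estimated {abs(Z[idx]):.2f}, model {abs(model):.2f}")
```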
Design and analysis of group-randomized trials in cancer: A review of current practices.
Murray, David M; Pals, Sherri L; George, Stephanie M; Kuzmichev, Andrey; Lai, Gabriel Y; Lee, Jocelyn A; Myles, Ranell L; Nelson, Shakira M
2018-06-01
The purpose of this paper is to summarize current practices for the design and analysis of group-randomized trials involving cancer-related risk factors or outcomes and to offer recommendations to improve future trials. We searched for group-randomized trials involving cancer-related risk factors or outcomes that were published or online in peer-reviewed journals in 2011-15. During 2016-17, in Bethesda MD, we reviewed 123 articles from 76 journals to characterize their design and their methods for sample size estimation and data analysis. Only 66 (53.7%) of the articles reported appropriate methods for sample size estimation. Only 63 (51.2%) reported exclusively appropriate methods for analysis. These findings suggest that many investigators do not adequately attend to the methodological challenges inherent in group-randomized trials. These practices can lead to underpowered studies, to an inflated type 1 error rate, and to inferences that mislead readers. Investigators should work with biostatisticians or other methodologists familiar with these issues. Funders and editors should ensure careful methodological review of applications and manuscripts. Reviewers should ensure that studies are properly planned and analyzed. These steps are needed to improve the rigor and reproducibility of group-randomized trials. The Office of Disease Prevention (ODP) at the National Institutes of Health (NIH) has taken several steps to address these issues. ODP offers an online course on the design and analysis of group-randomized trials. ODP is working to increase the number of methodologists who serve on grant review panels. ODP has developed standard language for the Application Guide and the Review Criteria to draw investigators' attention to these issues. Finally, ODP has created a new Research Methods Resources website to help investigators, reviewers, and NIH staff better understand these issues. Published by Elsevier Inc.
Wampler, Peter J; Rediske, Richard R; Molla, Azizur R
2013-01-18
A remote sensing technique was developed which combines a Geographic Information System (GIS); Google Earth, and Microsoft Excel to identify home locations for a random sample of households in rural Haiti. The method was used to select homes for ethnographic and water quality research in a region of rural Haiti located within 9 km of a local hospital and source of health education in Deschapelles, Haiti. The technique does not require access to governmental records or ground based surveys to collect household location data and can be performed in a rapid, cost-effective manner. The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth, Microsoft Excel, and handheld Garmin GPSmap 76CSx GPS units. Homes were identified and mapped in Google Earth, exported to ArcMap 10.0, and a random list of homes was generated using Microsoft Excel which was then loaded onto handheld GPS units for field location. The development and use of a remote sensing method was essential to the selection and location of random households. A total of 537 homes initially were mapped and a randomized subset of 96 was identified as potential survey locations. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings. Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes visited were not occupied at the time of the visit due to work away from the home or market days. A total of 55 households were located using this method during the 10 days of fieldwork in May and June of 2012. The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only rarely was local knowledge required to identify and locate households. This method provides an important technique that can be applied to other developing countries where a randomized study design is needed but infrastructure is lacking to implement more traditional participant selection methods.
Adiposity and Quality of Life: A Case Study from an Urban Center in Nigeria
ERIC Educational Resources Information Center
Akinpelu, Aderonke O.; Akinola, Odunayo T.; Gbiri, Caleb A.
2009-01-01
Objective: To determine relationship between adiposity indices and quality of life (QOL) of residents of a housing estate in Lagos, Nigeria. Design: Cross-sectional survey employing multistep random sampling method. Setting: Urban residential estate. Participants: This study involved 900 randomly selected residents of Abesan Housing Estate, Lagos,…
ERIC Educational Resources Information Center
Amir, Nader; Taylor, Charles T.
2012-01-01
Objective: To examine the efficacy of a multisession computerized interpretation modification program (IMP) in the treatment of generalized social anxiety disorder (GSAD). Method: The sample comprised 49 individuals meeting diagnostic criteria for GSAD who were enrolled in a randomized, double-blind placebo-controlled trial comparing IMP (n = 23)…
Estimation of reference intervals from small samples: an example using canine plasma creatinine.
Geffré, A; Braun, J P; Trumel, C; Concordet, D
2009-12-01
According to international recommendations, reference intervals should be determined from at least 120 reference individuals, which often are impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of this data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
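The comparison can be sketched with simulated skewed data standing in for canine creatinine: parametric limits on raw values, parametric limits after a Box-Cox transformation (back-transformed), and simple percentiles, all from a small sample of 27. The distribution parameters are invented.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(23)
sample = rng.lognormal(mean=4.4, sigma=0.25, size=27)        # 27 skewed "reference" values

raw_lo = sample.mean() - 2 * sample.std(ddof=1)
raw_hi = sample.mean() + 2 * sample.std(ddof=1)

transformed, lam = stats.boxcox(sample)                      # normalize, then back-transform limits
bc_lo = inv_boxcox(transformed.mean() - 2 * transformed.std(ddof=1), lam)
bc_hi = inv_boxcox(transformed.mean() + 2 * transformed.std(ddof=1), lam)

pct_lo, pct_hi = np.percentile(sample, [2.5, 97.5])
print(f"raw mean±2SD:             ({raw_lo:.0f}, {raw_hi:.0f})")
print(f"Box-Cox mean±2SD (back):  ({bc_lo:.0f}, {bc_hi:.0f})")
print(f"2.5th/97.5th percentiles: ({pct_lo:.0f}, {pct_hi:.0f})")
```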
Neither fixed nor random: weighted least squares meta-analysis.
Stanley, T D; Doucouliagos, Hristos
2015-06-15
This study challenges two core conventional meta-analysis methods: fixed effect and random effects. We show how and explain why an unrestricted weighted least squares estimator is superior to conventional random-effects meta-analysis when there is publication (or small-sample) bias and better than a fixed-effect weighted average if there is heterogeneity. Statistical theory and simulations of effect sizes, log odds ratios and regression coefficients demonstrate that this unrestricted weighted least squares estimator provides satisfactory estimates and confidence intervals that are comparable to random effects when there is no publication (or small-sample) bias and identical to fixed-effect meta-analysis when there is no heterogeneity. When there is publication selection bias, the unrestricted weighted least squares approach dominates random effects; when there is excess heterogeneity, it is clearly superior to fixed-effect meta-analysis. In practical applications, an unrestricted weighted least squares weighted average will often provide superior estimates to both conventional fixed and random effects. Copyright © 2015 John Wiley & Sons, Ltd.
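The unrestricted weighted least squares average described above can be sketched as follows: the point estimate matches the fixed-effect inverse-variance average, while the standard error uses an unrestricted multiplicative heterogeneity term. The effect sizes and standard errors below are invented, and this is only a reading of the basic idea, not the authors' implementation.

```python
import numpy as np

def wls_meta(effects, ses):
    """Unrestricted weighted least squares (WLS) weighted average, sketched.

    The point estimate equals the fixed-effect inverse-variance average; the
    standard error is scaled by the multiplicative heterogeneity estimated
    from the weighted residuals (not forced to 1 as in fixed effect)."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    est = np.sum(w * y) / np.sum(w)
    phi = np.sum(w * (y - est) ** 2) / (len(y) - 1)   # WLS regression mean squared error
    se = np.sqrt(phi / np.sum(w))
    return est, se

# Hypothetical log odds ratios and their standard errors from six studies.
est, se = wls_meta([0.10, 0.25, 0.05, 0.40, 0.15, 0.30],
                   [0.10, 0.15, 0.08, 0.20, 0.12, 0.18])
print(f"WLS weighted average: {est:.3f} (SE {se:.3f})")
```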
Redding, David W; Lucas, Tim C D; Blackburn, Tim M; Jones, Kate E
2017-01-01
Statistical approaches for inferring the spatial distribution of taxa (Species Distribution Models, SDMs) commonly rely on available occurrence data, which is often clumped and geographically restricted. Although available SDM methods address some of these factors, they could be more directly and accurately modelled using a spatially-explicit approach. Software to fit models with spatial autocorrelation parameters in SDMs are now widely available, but whether such approaches for inferring SDMs aid predictions compared to other methodologies is unknown. Here, within a simulated environment using 1000 generated species' ranges, we compared the performance of two commonly used non-spatial SDM methods (Maximum Entropy Modelling, MAXENT and boosted regression trees, BRT), to a spatial Bayesian SDM method (fitted using R-INLA), when the underlying data exhibit varying combinations of clumping and geographic restriction. Finally, we tested how any recommended methodological settings designed to account for spatially non-random patterns in the data impact inference. Spatial Bayesian SDM method was the most consistently accurate method, being in the top 2 most accurate methods in 7 out of 8 data sampling scenarios. Within high-coverage sample datasets, all methods performed fairly similarly. When sampling points were randomly spread, BRT had a 1-3% greater accuracy over the other methods and when samples were clumped, the spatial Bayesian SDM method had a 4%-8% better AUC score. Alternatively, when sampling points were restricted to a small section of the true range all methods were on average 10-12% less accurate, with greater variation among the methods. Model inference under the recommended settings to account for autocorrelation was not impacted by clumping or restriction of data, except for the complexity of the spatial regression term in the spatial Bayesian model. Methods, such as those made available by R-INLA, can be successfully used to account for spatial autocorrelation in an SDM context and, by taking account of random effects, produce outputs that can better elucidate the role of covariates in predicting species occurrence. Given that it is often unclear what the drivers are behind data clumping in an empirical occurrence dataset, or indeed how geographically restricted these data are, spatially-explicit Bayesian SDMs may be the better choice when modelling the spatial distribution of target species.
Sampling considerations for disease surveillance in wildlife populations
Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.
2008-01-01
Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
NASA Astrophysics Data System (ADS)
Zhang, Hua; Yang, Hui; Li, Hongxing; Huang, Guangnan; Ding, Zheyi
2018-04-01
The attenuation of random noise is important for improving the signal to noise ratio (SNR). However, the precondition for most conventional denoising methods is that the noisy data must be sampled on a uniform grid, making the conventional methods unsuitable for non-uniformly sampled data. In this paper, a denoising method capable of regularizing the noisy data from a non-uniform grid to a specified uniform grid is proposed. Firstly, the denoising method is performed for every time slice extracted from the 3D noisy data along the source and receiver directions, then the 2D non-equispaced fast Fourier transform (NFFT) is introduced in the conventional fast discrete curvelet transform (FDCT). The non-equispaced fast discrete curvelet transform (NFDCT) can be achieved based on the regularized inversion of an operator that links the uniformly sampled curvelet coefficients to the non-uniformly sampled noisy data. The uniform curvelet coefficients can be calculated by using the inversion algorithm of the spectral projected-gradient for ℓ1-norm problems. Then local threshold factors are chosen for the uniform curvelet coefficients for each decomposition scale, and effective curvelet coefficients are obtained respectively for each scale. Finally, the conventional inverse FDCT is applied to the effective curvelet coefficients. This completes the proposed 3D denoising method using the non-equispaced curvelet transform in the source-receiver domain. The examples for synthetic data and real data reveal the effectiveness of the proposed approach in applications to noise attenuation for non-uniformly sampled data compared with the conventional FDCT method and wavelet transformation.
ERIC Educational Resources Information Center
Oyewole, Olawale; Adetimirin, Airen
2015-01-01
Lecturers and postgraduates are among the users of the university libraries and their perception of the libraries has influence on utilization of the information resources, hence the need for this study. Survey method was adopted for the study and simple random sampling method was used to select sample size of 38 lecturers and 233 postgraduates.…
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, R.O.; Eberhardt, L.L.; Fowler, E.B.
This paper is centered around the use of stratified random sampling for estimating the total amount (inventory) of ²³⁹⁻²⁴⁰Pu and uranium in surface soil at ten 'safety-shot' sites on the Nevada Test Site (NTS) and Tonopah Test Range (TTR) that are currently being studied by the Nevada Applied Ecology Group (NAEG). The use of stratified random sampling has resulted in estimates of inventory at these desert study sites that have smaller standard errors than would have been the case had simple random sampling (no stratification) been used. Estimates of inventory are given for ²³⁵U, ²³⁸U, and ²³⁹⁻²⁴⁰Pu in soil at A Site of Area 11 on the NTS. Other results presented include average concentrations of one or more of these isotopes in soil and vegetation and in soil profile samples at depths to 25 cm. The regression relationship between soil and vegetation concentrations of ²³⁵U and ²³⁸U at adjacent sampling locations is also examined using three different models. The applicability of stratified random sampling to the estimation of concentration contours of ²³⁹⁻²⁴⁰Pu in surface soil using computer algorithms is also investigated. Estimates of such contours are obtained using several different methods. The planning of field sampling plans for estimating inventory and distribution is discussed. (auth)
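A generic sketch of the stratified-random-sampling estimator of a total (inventory) and its standard error; the strata, unit counts, and measurements below are purely illustrative and unrelated to the NTS/TTR data.

```python
import numpy as np

def stratified_total(strata):
    """Stratified random sampling estimate of an inventory (total) and its standard error.

    `strata` is a list of dicts giving the number of potential sampling units N_h
    and the measured values y for the units actually sampled in stratum h."""
    total, var = 0.0, 0.0
    for s in strata:
        y = np.asarray(s["y"], dtype=float)
        N, n = s["N"], len(y)
        total += N * y.mean()
        var += N**2 * (1 - n / N) * y.var(ddof=1) / n   # with finite population correction
    return total, np.sqrt(var)

# Hypothetical strata defined, say, by distance from ground zero (values are illustrative only).
strata = [
    {"N": 200, "y": [12.1, 9.8, 14.3, 11.0, 10.5]},   # inner, high-activity stratum
    {"N": 800, "y": [2.3, 1.9, 2.8, 2.1, 2.5, 1.7]},  # outer, low-activity stratum
]
est, se = stratified_total(strata)
print(f"Estimated inventory: {est:.0f} (SE {se:.0f})")
```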
On the predictivity of pore-scale simulations: Estimating uncertainties with multilevel Monte Carlo
NASA Astrophysics Data System (ADS)
Icardi, Matteo; Boccardo, Gianluca; Tempone, Raúl
2016-09-01
A fast method with tunable accuracy is proposed to estimate errors and uncertainties in pore-scale and Digital Rock Physics (DRP) problems. The overall predictivity of these studies can be, in fact, hindered by many factors including sample heterogeneity, computational and imaging limitations, model inadequacy and not perfectly known physical parameters. The typical objective of pore-scale studies is the estimation of macroscopic effective parameters such as permeability, effective diffusivity and hydrodynamic dispersion. However, these are often non-deterministic quantities (i.e., results obtained for a specific pore-scale sample and setup are not totally reproducible by another 'equivalent' sample and setup). The stochastic nature can arise due to the multi-scale heterogeneity, the computational and experimental limitations in considering large samples, and the complexity of the physical models. These approximations, in fact, introduce an error that, being dependent on a large number of complex factors, can be modeled as random. We propose a general simulation tool, based on multilevel Monte Carlo, that can drastically reduce the computational cost needed for computing accurate statistics of effective parameters and other quantities of interest, under any of these random errors. This is, to our knowledge, the first attempt to include Uncertainty Quantification (UQ) in pore-scale physics and simulation. The method can also provide estimates of the discretization error and it is tested on three-dimensional transport problems in heterogeneous materials, where the sampling procedure is done by generation algorithms able to reproduce realistic consolidated and unconsolidated random sphere and ellipsoid packings and arrangements. A totally automatic workflow is developed in an open-source code [1] that includes rigid body physics and random packing algorithms, unstructured mesh discretization, finite volume solvers, extrapolation and post-processing techniques. The proposed method can be efficiently used in many porous media applications for problems such as stochastic homogenization/upscaling, propagation of uncertainty from microscopic fluid and rock properties to macro-scale parameters, robust estimation of Representative Elementary Volume size for arbitrary physics.
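A toy sketch of the multilevel Monte Carlo telescoping estimator, with a midpoint-rule integral standing in for a pore-scale solver; the integrand, levels, and sample counts are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def approx_quantity(omega, level):
    """Toy 'simulation': midpoint-rule approximation of integral_0^1 exp(omega * x) dx
    with 2**level cells. The level plays the role of grid resolution; omega is the
    random input (e.g., a randomly generated geometry)."""
    n = 2 ** level
    x = (np.arange(n) + 0.5) / n
    return np.mean(np.exp(omega * x))

def mlmc_estimate(levels, samples_per_level):
    """Multilevel Monte Carlo: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}].
    `levels` must be in ascending order, coarsest first."""
    estimate = 0.0
    for level, n_samples in zip(levels, samples_per_level):
        omegas = rng.normal(size=n_samples)
        if level == levels[0]:
            diffs = [approx_quantity(w, level) for w in omegas]
        else:
            # Couple the two discretization levels by reusing the same random input.
            diffs = [approx_quantity(w, level) - approx_quantity(w, level - 1)
                     for w in omegas]
        estimate += np.mean(diffs)
    return estimate

# Many cheap coarse samples, few expensive fine samples.
print(mlmc_estimate(levels=[2, 3, 4, 5], samples_per_level=[4000, 1000, 250, 60]))
```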
Confidence intervals for a difference between lognormal means in cluster randomization trials.
Poirier, Julia; Zou, G Y; Koval, John
2017-04-01
Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials in many cases are positively skewed, following approximately lognormal distributions. When inference is focused on the difference between treatment arm arithmetic means, existent confidence interval procedures either make restricting assumptions or are complex to implement. We approach this problem by assuming log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well in small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths as compared to existing methods. The methods are illustrated using data arising from a cluster randomization trial investigating a critical pathway for the treatment of community acquired pneumonia.
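The method of variance estimates recovery combines separate confidence intervals for two components into an interval for their difference. The sketch below shows only that generic combination step (not the paper's one-way random-effects formulation for lognormal arm means); the arm-level estimates and intervals are invented.

```python
import math

def mover_difference(theta1, l1, u1, theta2, l2, u2):
    """Method of variance estimates recovery (MOVER) confidence interval for
    theta1 - theta2, given separate confidence intervals (l, u) for each component."""
    diff = theta1 - theta2
    lower = diff - math.sqrt((theta1 - l1) ** 2 + (u2 - theta2) ** 2)
    upper = diff + math.sqrt((u1 - theta1) ** 2 + (theta2 - l2) ** 2)
    return lower, upper

# Hypothetical arm-level estimates of a (log-scale) mean outcome with their 95% CIs.
print(mover_difference(2.30, 2.05, 2.55, 1.90, 1.70, 2.10))
```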
NASA Technical Reports Server (NTRS)
Johnson, R. A.; Wehrly, T.
1976-01-01
Population models for dependence between two angular measurements and for dependence between an angular and a linear observation are proposed. The method of canonical correlations first leads to new population and sample measures of dependence in this latter situation. An example relating wind direction to the level of a pollutant is given. Next, applied to pairs of angular measurements, the method yields previously proposed sample measures in some special cases and a new sample measure in general.
Reference-free error estimation for multiple measurement methods.
Madan, Hennadii; Pernuš, Franjo; Špiclin, Žiga
2018-01-01
We present a computational framework to select the most accurate and precise method of measurement of a certain quantity, when there is no access to the true value of the measurand. A typical use case is when several image analysis methods are applied to measure the value of a particular quantitative imaging biomarker from the same images. The accuracy of each measurement method is characterized by systematic error (bias), which is modeled as a polynomial in true values of measurand, and the precision as random error modeled with a Gaussian random variable. In contrast to previous works, the random errors are modeled jointly across all methods, thereby enabling the framework to analyze measurement methods based on similar principles, which may have correlated random errors. Furthermore, the posterior distribution of the error model parameters is estimated from samples obtained by Markov chain Monte-Carlo and analyzed to estimate the parameter values and the unknown true values of the measurand. The framework was validated on six synthetic and one clinical dataset containing measurements of total lesion load, a biomarker of neurodegenerative diseases, which was obtained with four automatic methods by analyzing brain magnetic resonance images. The estimates of bias and random error were in a good agreement with the corresponding least squares regression estimates against a reference.
NASA Astrophysics Data System (ADS)
Mueanploy, Wannapa
2015-06-01
The objective of this research was to offer a way to improve engineering students' learning of the physics topic of vector products. The sample consisted of engineering students at Pathumwan Institute of Technology during the first semester of the 2013 academic year. 1) 120 students selected by random sampling completed a satisfaction questionnaire used to choose the size of the three-dimensional vector card to be applied in the classroom. 2) 60 students selected by random sampling took the achievement test intended for classroom use; the test was analyzed by the Kuder-Richardson method (KR-20). The results showed that 12 items of the achievement test were appropriate for classroom use, with difficulty (P) = 0.40-0.67, discrimination = 0.33-0.73, and reliability (r) = 0.70. 3) For the classroom experiment, 60 students selected by random sampling were divided into two groups: group one (the control group, 30 students) studied the vector product lesson by the regular teaching method, and group two (the experimental group, 30 students) learned the vector product lesson with the three-dimensional vector card. 4) Comparison of the control and experimental groups showed that the experimental group obtained higher achievement test scores than the control group, significant at the .01 level.
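For reference, a KR-20 reliability calculation can be sketched as below; the response matrix is made up, and conventions differ on whether the total-score variance uses the sample formula (used here) or the population formula.

```python
import numpy as np

def kr20(item_scores):
    """Kuder-Richardson Formula 20 reliability for dichotomous (0/1) items.

    `item_scores` is an examinees-by-items array of 0/1 responses."""
    X = np.asarray(item_scores, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                       # proportion correct per item
    q = 1.0 - p
    total_var = X.sum(axis=1).var(ddof=1)    # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * q) / total_var)

# Hypothetical response matrix: 8 examinees x 12 items.
rng = np.random.default_rng(3)
responses = (rng.random((8, 12)) < 0.6).astype(int)
print(f"KR-20 = {kr20(responses):.2f}")
```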
Designing Studies That Would Address the Multilayered Nature of Health Care
Pennell, Michael; Rhoda, Dale; Hade, Erinn M.; Paskett, Electra D.
2010-01-01
We review design and analytic methods available for multilevel interventions in cancer research with particular attention to study design, sample size requirements, and potential to provide statistical evidence for causal inference. The most appropriate methods will depend on the stage of development of the research and whether randomization is possible. Early on, fractional factorial designs may be used to screen intervention components, particularly when randomization of individuals is possible. Quasi-experimental designs, including time-series and multiple baseline designs, can be useful once the intervention is designed because they require few sites and can provide the preliminary evidence to plan efficacy studies. In efficacy and effectiveness studies, group-randomized trials are preferred when randomization is possible and regression discontinuity designs are preferred otherwise if assignment based on a quantitative score is possible. Quasi-experimental designs may be used, especially when combined with recent developments in analytic methods to reduce bias in effect estimates. PMID:20386057
ERIC Educational Resources Information Center
Price, James H.; Thompson, Amy; Khubchandani, Jagdish; Dake, Joseph; Payton, Erica; Teeple, Karen
2014-01-01
Objective: To assess the perceptions and practices of a national sample of college and university presidents regarding their support for concealed handguns being carried on college campuses. Participants: The sample for this study consisted of a national random sample of 900 college or university presidents. Methods: In the spring of 2013, a…
Catarino, Rosa; Vassilakos, Pierre; Bilancioni, Aline; Vanden Eynde, Mathieu; Meyer-Hamme, Ulrike; Menoud, Pierre-Alain; Guerry, Frédéric; Petignat, Patrick
2015-01-01
Background Human papillomavirus (HPV) self-sampling (self-HPV) is valuable in cervical cancer screening. HPV testing is usually performed on physician-collected cervical smears stored in liquid-based medium. Dry filters and swabs are an alternative. We evaluated the adequacy of self-HPV using two dry storage and transport devices, the FTA cartridge and swab. Methods A total of 130 women performed two consecutive self-HPV samples. Randomization determined which of the two tests was performed first: self-HPV using dry swabs (s-DRY) or vaginal specimen collection using a cytobrush applied to an FTA cartridge (s-FTA). After self-HPV, a physician collected a cervical sample using liquid-based medium (Dr-WET). HPV types were identified by real-time PCR. Agreement between collection methods was measured using the kappa statistic. Results HPV prevalence for high-risk types was 62.3% (95%CI: 53.7–70.2) detected by s-DRY, 56.2% (95%CI: 47.6–64.4) by Dr-WET, and 54.6% (95%CI: 46.1–62.9) by s-FTA. There was overall agreement of 70.8% between s-FTA and s-DRY samples (kappa = 0.34), and of 82.3% between self-HPV and Dr-WET samples (kappa = 0.56). Detection sensitivities for low-grade squamous intraepithelial lesion or worse (LSIL+) were: 64.0% (95%CI: 44.5–79.8) for s-FTA, 84.6% (95%CI: 66.5–93.9) for s-DRY, and 76.9% (95%CI: 58.0–89.0) for Dr-WET. The preferred self-collection method among patients was s-DRY (40.8% vs. 15.4%). Regarding costs, FTA card was five times more expensive than the swab (~5 US dollars (USD)/per card vs. ~1 USD/per swab). Conclusion Self-HPV using dry swabs is sensitive for detecting LSIL+ and less expensive than s-FTA. Trial Registration International Standard Randomized Controlled Trial Number (ISRCTN): 43310942 PMID:26630353
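Agreement between two collection methods, as quantified above, can be sketched with a plain Cohen's kappa for paired binary results; the positive/negative calls below are invented, not trial data.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for agreement between two binary (0/1) test results."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    p_expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical paired positive/negative HPV calls from two self-sampling devices.
s_dry = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
s_fta = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
print(f"kappa = {cohen_kappa(s_dry, s_fta):.2f}")
```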
Learning Bayesian Networks from Correlated Data
NASA Astrophysics Data System (ADS)
Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola
2016-05-01
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
Zhao, Wenle; Weng, Yanqiu; Wu, Qi; Palesch, Yuko
2012-01-01
To evaluate the performance of randomization designs under various parameter settings and trial sample sizes, and identify optimal designs with respect to both treatment imbalance and allocation randomness, we evaluate 260 design scenarios from 14 randomization designs under 15 sample sizes ranging from 10 to 300, using three measures for imbalance and three measures for randomness. The maximum absolute imbalance and the correct guess (CG) probability are selected to assess the trade-off performance of each randomization design. As measured by the maximum absolute imbalance and the CG probability, we found that the performances of the 14 randomization designs lie in a closed region with the upper boundary (worst case) given by Efron's biased coin design (BCD) and the lower boundary (best case) by Soares and Wu's big stick design (BSD). Designs close to the lower boundary provide a smaller imbalance and a higher randomness than designs close to the upper boundary. Our research suggested that optimization of randomization design is possible based on quantified evaluation of imbalance and randomness. Based on the maximum imbalance and CG probability, the BSD, Chen's biased coin design with imbalance tolerance method, and Chen's Ehrenfest urn design perform better than the popularly used permuted block design, EBCD, and Wei's urn design. Copyright © 2011 John Wiley & Sons, Ltd.
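A simulation sketch of the big stick design (BSD), reporting an average maximum absolute imbalance and a correct guess probability under a simple convergence guessing strategy; the tolerance, sample size, and number of simulated trials are illustrative, and this is not the authors' evaluation code.

```python
import random

def simulate_bsd(n_subjects, tolerance, n_trials=2000, seed=7):
    """Simulate the big stick design: fair-coin assignment while the absolute
    imbalance is below the tolerance, forced assignment toward balance otherwise."""
    rng = random.Random(seed)
    max_imbalances, correct_guesses, total_guesses = [], 0, 0
    for _ in range(n_trials):
        imbalance, worst = 0, 0                      # imbalance = (#A - #B)
        for _ in range(n_subjects):
            # An observer guesses the currently under-allocated arm (convergence strategy).
            guess = "A" if imbalance < 0 else ("B" if imbalance > 0 else rng.choice("AB"))
            if abs(imbalance) >= tolerance:          # forced assignment toward balance
                arm = "A" if imbalance < 0 else "B"
            else:                                    # otherwise a fair coin
                arm = rng.choice("AB")
            correct_guesses += (guess == arm)
            total_guesses += 1
            imbalance += 1 if arm == "A" else -1
            worst = max(worst, abs(imbalance))
        max_imbalances.append(worst)
    return sum(max_imbalances) / n_trials, correct_guesses / total_guesses

print(simulate_bsd(n_subjects=50, tolerance=3))
```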
Discriminative motif discovery via simulated evolution and random under-sampling.
Song, Tao; Gu, Hong
2014-01-01
Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
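Random under-sampling of the majority (negative) set, as used before HMM training, can be sketched generically; the sequence identifiers and target ratio are assumptions.

```python
import random

def random_under_sample(positives, negatives, ratio=1.0, seed=11):
    """Randomly under-sample the majority (negative) set so that
    len(negatives) is about ratio * len(positives)."""
    rng = random.Random(seed)
    n_keep = min(len(negatives), int(ratio * len(positives)))
    return positives, rng.sample(negatives, n_keep)

# Hypothetical sequence datasets: 40 positive and 400 negative examples.
pos = [f"pos_seq_{i}" for i in range(40)]
neg = [f"neg_seq_{i}" for i in range(400)]
pos_kept, neg_kept = random_under_sample(pos, neg)
print(len(pos_kept), len(neg_kept))
```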
Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E
2016-06-01
Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.
Duchêne, Sebastián; Duchêne, David; Holmes, Edward C; Ho, Simon Y W
2015-07-01
Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
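A schematic of the date-randomization decision rules discussed above, assuming each analysis returns a 95% credible interval for the substitution rate; the interval values are invented.

```python
def intervals_overlap(a, b):
    """True if two (lower, upper) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def date_randomization_test(observed_mean, observed_ci, randomized_cis, conservative=True):
    """Decision rule sketch for the date-randomization test.

    Standard criterion: temporal signal if the observed mean rate falls outside every
    randomized credible interval. Conservative criterion: temporal signal only if the
    observed credible interval overlaps none of the randomized intervals."""
    if conservative:
        return all(not intervals_overlap(observed_ci, ci) for ci in randomized_cis)
    return all(not (ci[0] <= observed_mean <= ci[1]) for ci in randomized_cis)

# Hypothetical substitution-rate estimates (subs/site/year): observed data versus
# three date-randomized replicates.
print(date_randomization_test(2.2e-3, (1.8e-3, 2.6e-3),
                              [(0.2e-3, 1.5e-3), (0.1e-3, 1.2e-3), (0.3e-3, 1.7e-3)]))
```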
Evaluation and optimization of sampling errors for the Monte Carlo Independent Column Approximation
NASA Astrophysics Data System (ADS)
Räisänen, Petri; Barker, W. Howard
2004-07-01
The Monte Carlo Independent Column Approximation (McICA) method for computing domain-average broadband radiative fluxes is unbiased with respect to the full ICA, but its flux estimates contain conditional random noise. McICA's sampling errors are evaluated here using a global climate model (GCM) dataset and a correlated-k distribution (CKD) radiation scheme. Two approaches to reduce McICA's sampling variance are discussed. The first is to simply restrict all of McICA's samples to cloudy regions. This avoids wasting precious few samples on essentially homogeneous clear skies. Clear-sky fluxes need to be computed separately for this approach, but this is usually done in GCMs for diagnostic purposes anyway. Second, accuracy can be improved by repeated sampling, and averaging those CKD terms with large cloud radiative effects. Although this naturally increases computational costs over the standard CKD model, random errors for fluxes and heating rates are reduced by typically 50% to 60%, for the present radiation code, when the total number of samples is increased by 50%. When both variance reduction techniques are applied simultaneously, globally averaged flux and heating rate random errors are reduced by a factor of approximately 3.
ERIC Educational Resources Information Center
Juhasz, Stephen; And Others
Table of contents (TOC) practices of some 120 primary journals were analyzed. The journals were randomly selected. The method of randomization is described. The samples were selected from a university library with a holding of approximately 12,000 titles published worldwide. A questionnaire was designed. Purpose was to find uniformity and…
The Impact of Five Missing Data Treatments on a Cross-Classified Random Effects Model
ERIC Educational Resources Information Center
Hoelzle, Braden R.
2012-01-01
The present study compared the performance of five missing data treatment methods within a Cross-Classified Random Effects Model environment under various levels and patterns of missing data given a specified sample size. Prior research has shown the varying effect of missing data treatment options within the context of numerous statistical…
An assessment of re-randomization methods in bark beetle (Scolytidae) trapping bioassays
Christopher J. Fettig; Christopher P. Dabney; Stepehen R. McKelvey; Robert R. Borys
2006-01-01
Numerous studies have explored the role of semiochemicals in the behavior of bark beetles (Scolytidae). Multiple funnel traps are often used to elucidate these behavioral responses. Sufficient sample sizes are obtained by using large numbers of traps to which treatments are randomly assigned once, or by frequent collection of trap catches and subsequent re-...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
Physical Activity among Rural Older Adults with Diabetes
ERIC Educational Resources Information Center
Arcury, Thomas A.; Snively, Beverly M.; Bell, Ronny A.; Smith, Shannon L.; Stafford, Jeanette M.; Wetmore-Arkader, Lindsay K.; Quandt, Sara A.
2006-01-01
Purpose: This analysis describes physical activity levels and factors associated with physical activity in an ethnically diverse (African American, Native American, white) sample of rural older adults with diabetes. Method: Data were collected using a population-based, cross-sectional stratified random sample survey of 701 community-dwelling…
Generation of kth-order random toposequences
NASA Astrophysics Data System (ADS)
Odgers, Nathan P.; McBratney, Alex. B.; Minasny, Budiman
2008-05-01
The model presented in this paper derives toposequences from a digital elevation model (DEM). It is written in ArcInfo Macro Language (AML). The toposequences are called kth-order random toposequences, because they take a random path uphill to the top of a hill and downhill to a stream or valley bottom from a randomly selected seed point, and they are located in a streamshed of order k according to a particular stream-ordering system. We define a kth-order streamshed as the area of land that drains directly to a stream segment of stream order k. The model attempts to optimise the spatial configuration of a set of derived toposequences iteratively by using simulated annealing to maximise the total sum of distances between each toposequence hilltop in the set. The user is able to select the order, k, of the derived toposequences. Toposequences are useful for determining soil sampling locations for use in collecting soil data for digital soil mapping applications. Sampling locations can be allocated according to equal elevation or equal-distance intervals along the length of the toposequence, for example. We demonstrate the use of this model for a study area in the Hunter Valley of New South Wales, Australia. Of the 64 toposequences derived, 32 were first-order random toposequences according to Strahler's stream-ordering system, and 32 were second-order random toposequences. The model that we present in this paper is an efficient method for sampling soil along soil toposequences. The soils along a toposequence are related to each other by the topography they are found in, so soil data collected by this method is useful for establishing soil-landscape rules for the preparation of digital soil maps.
Naqvi, Shahid A; D'Souza, Warren D
2005-04-01
Current methods to calculate dose distributions with organ motion can be broadly classified as "dose convolution" and "fluence convolution" methods. In the former, a static dose distribution is convolved with the probability distribution function (PDF) that characterizes the motion. However, artifacts are produced near the surface and around inhomogeneities because the method assumes shift invariance. Fluence convolution avoids these artifacts by convolving the PDF with the incident fluence instead of the patient dose. In this paper we present an alternative method that improves the accuracy, generality as well as the speed of dose calculation with organ motion. The algorithm starts by sampling an isocenter point from a parametrically defined space curve corresponding to the patient-specific motion trajectory. Then a photon is sampled in the linac head and propagated through the three-dimensional (3-D) collimator structure corresponding to a particular MLC segment chosen randomly from the planned IMRT leaf sequence. The photon is then made to interact at a point in the CT-based simulation phantom. Randomly sampled monoenergetic kernel rays issued from this point are then made to deposit energy in the voxels. Our method explicitly accounts for MLC-specific effects (spectral hardening, tongue-and-groove, head scatter) as well as changes in SSD with isocentric displacement, assuming that the body moves rigidly with the isocenter. Since the positions are randomly sampled from a continuum, there is no motion discretization, and the computation takes no more time than a static calculation. To validate our method, we obtained ten separate film measurements of an IMRT plan delivered on a phantom moving sinusoidally, with each fraction starting with a random phase. For 2 cm motion amplitude, we found that a ten-fraction average of the film measurements gave an agreement with the calculated infinite fraction average to within 2 mm in the isodose curves. The results also corroborate the existing notion that the interfraction dose variability due to the interplay between the MLC motion and breathing motion averages out over typical multifraction treatments. Simulation with motion waveforms more representative of real breathing indicate that the motion can produce penumbral spreading asymmetric about the static dose distributions. Such calculations can help a clinician decide to use, for example, a larger margin in the superior direction than in the inferior direction. In the paper we demonstrate that a 15 min run on a single CPU can readily illustrate the effect of a patient-specific breathing waveform, and can guide the physician in making informed decisions about margin expansion and dose escalation.
Kang, Leni; Zhang, Shaokai; Zhao, Fanghui; Qiao, Youlin
2014-03-01
To evaluate and adjust for the verification bias that exists in screening or diagnostic tests. The inverse-probability weighting method was used to adjust the sensitivity and specificity of the diagnostic tests, with an example from cervical cancer screening used to introduce the Compare Tests package in R software with which the method can be implemented. Sensitivity and specificity calculated by the traditional method and by the maximum likelihood estimation method were compared with the results from the inverse-probability weighting method in the randomly sampled example. The true sensitivity and specificity of the HPV self-sampling test were 83.53% (95% CI: 74.23-89.93) and 85.86% (95% CI: 84.23-87.36). In the analysis of data with verification by the gold standard missing at random, the sensitivity and specificity calculated by the traditional method were 90.48% (95% CI: 80.74-95.56) and 71.96% (95% CI: 68.71-75.00), respectively. The adjusted sensitivity and specificity under the inverse-probability weighting method were 82.25% (95% CI: 63.11-92.62) and 85.80% (95% CI: 85.09-86.47), respectively, whereas they were 80.13% (95% CI: 66.81-93.46) and 85.80% (95% CI: 84.20-87.41) under the maximum likelihood estimation method. The inverse-probability weighting method can effectively adjust the sensitivity and specificity of a diagnostic test when verification bias exists, especially under complex sampling.
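A sketch of inverse-probability weighting for verification bias (this is not the Compare Tests package); the example assumes screen-positives are always verified and screen-negatives are verified with probability 0.3, and all data are invented.

```python
import numpy as np

def ipw_sens_spec(test_pos, disease, verified, verify_prob):
    """Inverse-probability-weighted sensitivity and specificity under verification bias.

    Only subjects with verified == 1 have a gold-standard disease status; each
    verified subject is weighted by 1 / P(verified | test result, covariates)."""
    t = np.asarray(test_pos, dtype=bool)
    d = np.asarray(disease, dtype=float)
    v = np.asarray(verified, dtype=bool)
    vp = np.asarray(verify_prob, dtype=float)
    w = np.zeros(len(vp))
    w[v] = 1.0 / vp[v]
    sens = np.sum(w[v] * d[v] * t[v]) / np.sum(w[v] * d[v])
    spec = np.sum(w[v] * (1 - d[v]) * ~t[v]) / np.sum(w[v] * (1 - d[v]))
    return sens, spec

# Invented data: screen-positives always verified, screen-negatives verified with probability 0.3.
test_pos    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
disease     = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # meaningful only where verified == 1
verified    = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
verify_prob = [1.0, 1.0, 1.0, 1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(ipw_sens_spec(test_pos, disease, verified, verify_prob))
```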
Laber, Eric B; Zhao, Ying-Qi; Regh, Todd; Davidian, Marie; Tsiatis, Anastasios; Stanford, Joseph B; Zeng, Donglin; Song, Rui; Kosorok, Michael R
2016-04-15
A personalized treatment strategy formalizes evidence-based treatment selection by mapping patient information to a recommended treatment. Personalized treatment strategies can produce better patient outcomes while reducing cost and treatment burden. Thus, among clinical and intervention scientists, there is a growing interest in conducting randomized clinical trials when one of the primary aims is estimation of a personalized treatment strategy. However, at present, there are no appropriate sample size formulae to assist in the design of such a trial. Furthermore, because the sampling distribution of the estimated outcome under an estimated optimal treatment strategy can be highly sensitive to small perturbations in the underlying generative model, sample size calculations based on standard (uncorrected) asymptotic approximations or computer simulations may not be reliable. We offer a simple and robust method for powering a single stage, two-armed randomized clinical trial when the primary aim is estimating the optimal single stage personalized treatment strategy. The proposed method is based on inverting a plugin projection confidence interval and is thereby regular and robust to small perturbations of the underlying generative model. The proposed method requires elicitation of two clinically meaningful parameters from clinical scientists and uses data from a small pilot study to estimate nuisance parameters, which are not easily elicited. The method performs well in simulated experiments and is illustrated using data from a pilot study of time to conception and fertility awareness. Copyright © 2015 John Wiley & Sons, Ltd.
Method for analyzing microbial communities
Zhou, Jizhong [Oak Ridge, TN; Wu, Liyou [Oak Ridge, TN
2010-07-20
The present invention provides a method for quantitatively analyzing microbial genes, species, or strains in a sample that contains at least two species or strains of microorganisms. The method involves using an isothermal DNA polymerase to randomly and representatively amplify genomic DNA of the microorganisms in the sample, hybridizing the resultant polynucleotide amplification product to a polynucleotide microarray that can differentiate different genes, species, or strains of microorganisms of interest, and measuring hybridization signals on the microarray to quantify the genes, species, or strains of interest.
Accelerating IMRT optimization by voxel sampling
NASA Astrophysics Data System (ADS)
Martin, Benjamin C.; Bortfeld, Thomas R.; Castañon, David A.
2007-12-01
This paper presents a new method for accelerating intensity-modulated radiation therapy (IMRT) optimization using voxel sampling. Rather than calculating the dose to the entire patient at each step in the optimization, the dose is only calculated for some randomly selected voxels. Those voxels are then used to calculate estimates of the objective and gradient which are used in a randomized version of a steepest descent algorithm. By selecting different voxels on each step, we are able to find an optimal solution to the full problem. We also present an algorithm to automatically choose the best sampling rate for each structure within the patient during the optimization. Seeking further improvements, we experimented with several other gradient-based optimization algorithms and found that the delta-bar-delta algorithm performs well despite the randomness. Overall, we were able to achieve approximately an order of magnitude speedup on our test case as compared to steepest descent.
McClure, Foster D; Lee, Jung K
2012-01-01
The validation process for an analytical method usually employs an interlaboratory study conducted as a balanced completely randomized model involving a specified number of randomly chosen laboratories, each analyzing a specified number of randomly allocated replicates. For such studies, formulas to obtain approximate unbiased estimates of the variance and uncertainty of the sample laboratory-to-laboratory (lab-to-lab) STD (S(L)) have been developed primarily to account for the uncertainty of S(L) when there is a need to develop an uncertainty budget that includes the uncertainty of S(L). For the sake of completeness on this topic, formulas to estimate the variance and uncertainty of the sample lab-to-lab variance (S(L)²) were also developed. In some cases, it was necessary to derive the formulas based on an approximate distribution for S(L)².
Modelling the light-scattering properties of a planetary-regolith analog sample
NASA Astrophysics Data System (ADS)
Vaisanen, T.; Markkanen, J.; Hadamcik, E.; Levasseur-Regourd, A. C.; Lasue, J.; Blum, J.; Penttila, A.; Muinonen, K.
2017-12-01
Solving the scattering properties of asteroid surfaces can be made cheaper, faster, and more accurate with reliable physics-based electromagnetic scattering programs for large and dense random media. Existing exact methods fail to produce solutions for such large systems and it is essential to develop approximate methods. Radiative transfer (RT) is an approximate method which works for sparse random media such as atmospheres but fails when applied to dense media. In order to make the method applicable to dense media, we have developed a radiative-transfer coherent-backscattering method (RT-CB) with incoherent interactions. To show the current progress with the RT-CB, we have modeled a planetary-regolith analog sample. The analog sample is a low-density agglomerate produced by random ballistic deposition of almost equisized silicate spheres studied using the PROGRA2-surf experiment. The scattering properties were then computed with the RT-CB assuming that the silicate spheres were equisized and that there was a Gaussian particle size distribution. The results were then compared to the measured data and the intensity plot is shown below. The phase functions are normalized to unity at the 40-deg phase angle. The tentative intensity modeling shows a good match with the measured data, whereas the polarization modeling shows discrepancies. In summary, the current RT-CB modeling is promising, but more work needs to be carried out, in particular, for modeling the polarization. Acknowledgments. Research supported by European Research Council with Advanced Grant No. 320773 SAEMPL, Scattering and Absorption of ElectroMagnetic waves in ParticuLate media. Computational resources provided by CSC - IT Centre for Science Ltd, Finland.
Krusche, Adele; Rudolf von Rohr, Isabelle; Muse, Kate; Duggan, Danielle; Crane, Catherine; Williams, J. Mark G.
2014-01-01
Background Randomized controlled trials (RCTs) are widely accepted as being the most efficient way of investigating the efficacy of psychological therapies. However, researchers conducting RCTs commonly report difficulties recruiting an adequate sample within planned timescales. In an effort to overcome recruitment difficulties, researchers often are forced to expand their recruitment criteria or extend the recruitment phase, thus increasing costs and delaying publication of results. Research investigating the effectiveness of recruitment strategies is limited and trials often fail to report sufficient details about the recruitment sources and resources utilised. Purpose We examined the efficacy of strategies implemented during the Staying Well after Depression RCT in Oxford to recruit participants with a history of recurrent depression. Methods We describe eight recruitment methods utilised and two further sources not initiated by the research team and examine their efficacy in terms of (i) the return, including the number of potential participants who contacted the trial and the number who were randomized into the trial, (ii) cost-effectiveness, comprising direct financial cost and manpower for initial contacts and randomized participants, and (iii) comparison of sociodemographic characteristics of individuals recruited from different sources. Results Poster advertising, web-based advertising and mental health worker referrals were the cheapest methods per randomized participant; however, the ratio of randomized participants to initial contacts differed markedly per source. Advertising online, via posters and on a local radio station were the most cost-effective recruitment methods for soliciting participants who subsequently were randomized into the trial. Advertising across many sources (saturation) was found to be important. Limitations It may not be feasible to employ all the recruitment methods used in this trial to obtain participation from other populations, such as those currently unwell, or in other geographical locations. Recruitment source was unavailable for participants who could not be reached after the initial contact. Thus, it is possible that the efficiency of certain methods of recruitment was poorer than estimated. Efficacy and costs of other recruitment initiatives, such as providing travel expenses to the in-person eligibility assessment and making follow-up telephone calls to candidates who contacted the recruitment team but could not be screened promptly, were not analysed. Conclusions Website advertising resulted in the highest number of randomized participants and was the second cheapest method of recruiting. Future research should evaluate the effectiveness of recruitment strategies for other samples to contribute to a comprehensive base of knowledge for future RCTs. PMID:24686105
Yura, Harold T; Hanson, Steen G
2012-04-01
Methods for simulation of two-dimensional signals with arbitrary power spectral densities and signal amplitude probability density functions are disclosed. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In most cases the method provides satisfactory results and can thus be considered an engineering approach. Several illustrative examples with relevance for optics are given.
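A two-step sketch of the approach described: white Gaussian noise is shaped in the Fourier domain to a chosen power spectrum and then mapped through a probability integral transform to a target amplitude PDF. The Lorentzian-like spectrum and exponential target distribution are arbitrary choices for illustration, and the spectral distortion introduced by the second step is ignored here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 256

# Step 1: shape white Gaussian noise in the Fourier domain to a target power spectrum
# (here an isotropic 1 / (1 + (f / f0)^2) spectrum, chosen only for illustration).
white = rng.normal(size=(n, n))
fx = np.fft.fftfreq(n)
f = np.sqrt(fx[:, None] ** 2 + fx[None, :] ** 2)
filt = 1.0 / np.sqrt(1.0 + (f / 0.05) ** 2)
colored = np.real(np.fft.ifft2(np.fft.fft2(white) * filt))
colored = (colored - colored.mean()) / colored.std()

# Step 2: transform the colored Gaussian field to the desired amplitude PDF
# (here negative-exponential, speckle-like statistics) via the probability integral transform.
u = stats.norm.cdf(colored)
field = stats.expon.ppf(u, scale=1.0)

print(field.shape, field.mean(), field.std())
```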
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
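A sketch of deriving endmember spectra by a least squares solution (LSS) from sample pixels with known abundances, followed by unconstrained unmixing of a new pixel; all spectra, abundances, and noise levels are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical training samples: known abundance fractions (impervious, vegetation, soil)
# for 200 coarse-resolution pixels, and their reflectance in 6 spectral bands.
true_endmembers = np.array([[0.30, 0.28, 0.26, 0.24, 0.35, 0.40],   # impervious
                            [0.05, 0.08, 0.06, 0.40, 0.30, 0.15],   # vegetation
                            [0.20, 0.25, 0.30, 0.35, 0.45, 0.50]])  # soil
abund = rng.dirichlet(np.ones(3), size=200)                      # known sample abundances
refl = abund @ true_endmembers + rng.normal(0, 0.01, (200, 6))   # simulated reflectance

# Step 1: least squares solution for the endmember spectra given known abundances.
endmembers, *_ = np.linalg.lstsq(abund, refl, rcond=None)

# Step 2: unconstrained SMA for a new pixel -- solve reflectance = abundances @ endmembers.
new_pixel = np.array([0.4, 0.6, 0.0]) @ true_endmembers
est_abund, *_ = np.linalg.lstsq(endmembers.T, new_pixel, rcond=None)
print(np.round(est_abund, 2))   # estimated impervious, vegetation, soil fractions
```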
Cuevas, Erik; Díaz, Margarita
2015-01-01
In this paper, a new method for robustly estimating multiple view relations from point correspondences is presented. The approach combines the popular random sampling consensus (RANSAC) algorithm and the evolutionary method harmony search (HS). With this combination, the proposed method adopts a different sampling strategy than RANSAC to generate putative solutions. Under the new mechanism, at each iteration, new candidate solutions are built taking into account the quality of the models generated by previous candidate solutions, rather than purely random as it is the case of RANSAC. The rules for the generation of candidate solutions (samples) are motivated by the improvisation process that occurs when a musician searches for a better state of harmony. As a result, the proposed approach can substantially reduce the number of iterations still preserving the robust capabilities of RANSAC. The method is generic and its use is illustrated by the estimation of homographies, considering synthetic and real images. Additionally, in order to demonstrate the performance of the proposed approach within a real engineering application, it is employed to solve the problem of position estimation in a humanoid robot. Experimental results validate the efficiency of the proposed method in terms of accuracy, speed, and robustness. PMID:26339228
Lee, Minjung; Dignam, James J.; Han, Junhee
2014-01-01
We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because growing a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests are the random sampling of observations and the restricted set of input variables eligible for selection. Finally, I introduce R functions to perform model-based recursive partitioning, which incorporates recursive partitioning into conventional parametric model building.
Huang, Jihan; Li, Mengying; Lv, Yinghua; Yang, Juan; Xu, Ling; Wang, Jingjing; Chen, Junchao; Wang, Kun; He, Yingchun; Zheng, Qingshan
2016-09-01
This study was aimed at exploring the accuracy of the population pharmacokinetic method in evaluating the bioequivalence of pidotimod from sparse data profiles, and at determining whether this method is suitable for bioequivalence evaluation in special populations, such as children, with fewer samplings. In this single-dose, two-period crossover study, 20 healthy male Chinese volunteers were randomized 1:1 to receive either the test or reference formulation, with a 1-week washout before receiving the alternative formulation. Noncompartmental and population compartmental pharmacokinetic analyses were conducted. Simulated data were analyzed to graphically evaluate the model and the pharmacokinetic characteristics of the two pidotimod formulations. Various sparse sampling scenarios were generated from the real bioequivalence clinical trial data and evaluated by the population pharmacokinetic method. The 90% confidence intervals (CIs) for AUC0-12h, AUC0-∞, and Cmax were 97.3-118.7%, 96.9-118.7%, and 95.1-109.8%, respectively, all within the 80-125% range for bioequivalence using noncompartmental analysis. The population compartmental pharmacokinetics of pidotimod were described using a one-compartment model with first-order absorption and lag time. In the comparison of estimates across datasets, random three-point and fixed four-point sampling strategies provided results similar to those obtained with rich sampling. The nonlinear mixed-effects model requires fewer data points. Moreover, compared with the noncompartmental analysis method, the pharmacokinetic parameters can be estimated more accurately using the nonlinear mixed-effects model. The population pharmacokinetic modeling method was used to assess the bioequivalence of the two pidotimod formulations with relatively few sampling points and further validated the bioequivalence of the two formulations. This method may provide useful information for regulating bioequivalence evaluation in special populations.
NASA Astrophysics Data System (ADS)
Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander
2016-04-01
In the last three decades, an increasing number of studies analyzed spatial patterns in throughfall to investigate the consequences of rainfall redistribution for biogeochemical and hydrological processes in forests. In the majority of cases, variograms were used to characterize the spatial properties of the throughfall data. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and an appropriate layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation methods on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with heavy outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling), and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the numbers recommended by studies dealing with Gaussian data by up to 100 %. Given that most previous throughfall studies relied on method-of-moments variogram estimation and sample sizes << 200, our current knowledge about throughfall spatial variability stands on shaky ground.
NASA Astrophysics Data System (ADS)
Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander
2016-09-01
In the last decades, an increasing number of studies analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling) and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100 %. Given that most previous throughfall studies relied on method-of-moments variogram estimation and sample sizes ≪200, currently available data are prone to large uncertainties.
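A minimal sketch of the method-of-moments (Matheron) variogram estimator discussed above, on simulated skewed "throughfall-like" data; the plot size, sample size, and distribution are arbitrary, and the robust and residual-maximum-likelihood estimators compared in the study would require dedicated geostatistics packages not shown here.

```r
set.seed(7)
n  <- 150
xy <- cbind(runif(n, 0, 50), runif(n, 0, 50))        # 50 m x 50 m plot
z  <- rlnorm(n, meanlog = 3, sdlog = 0.6)            # skewed, outlier-prone data

emp_variogram <- function(xy, z, n_bins = 12, max_lag = 25) {
  d   <- as.matrix(dist(xy))                         # pairwise distances
  g   <- 0.5 * outer(z, z, function(a, b) (a - b)^2) # semivariance per pair
  iu  <- upper.tri(d)
  lag <- d[iu]; gam <- g[iu]
  keep <- lag <= max_lag
  bins <- cut(lag[keep], breaks = seq(0, max_lag, length.out = n_bins + 1))
  data.frame(lag   = tapply(lag[keep], bins, mean),
             gamma = tapply(gam[keep], bins, mean),
             pairs = as.vector(table(bins)))
}
head(emp_variogram(xy, z))
```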
Sampling western spruce budworm larvae by frequency of occurrence on lower crown branches.
R.R. Mason; R.C. Beckwith
1990-01-01
A sampling method was derived whereby budworm density can be estimated by the frequency of occurrence of larvae over a given threshold number instead of by direct counts on branch samples. The model used for converting frequencies to mean densities is appropriate for nonrandom as well as random distributions and, therefore, is applicable to all population densities of...
Entrepreneurial Intentions of Agricultural Students: Levels and Determinants
ERIC Educational Resources Information Center
Pouratashi, Mahtab
2015-01-01
Purpose: This paper examined levels and determinants of entrepreneurial intentions amongst agricultural students. Methodology: The statistical population comprised students in colleges of agriculture at University of Tehran. By use of a random sampling method, a sample of 120 students participated in the study. The instrument for data collection…
Reaching a Representative Sample of College Students: A Comparative Analysis
ERIC Educational Resources Information Center
Giovenco, Daniel P.; Gundersen, Daniel A.; Delnevo, Cristine D.
2016-01-01
Objective: To explore the feasibility of a random-digit dial (RDD) cellular phone survey in order to reach a national and representative sample of college students. Methods: Demographic distributions from the 2011 National Young Adult Health Survey (NYAHS) were benchmarked against enrollment numbers from the Integrated Postsecondary Education…
Academic Self-Efficacy Perceptions of Teacher Candidates
ERIC Educational Resources Information Center
Yesilyurt, Etem
2013-01-01
This study aims to determine the academic self-efficacy perceptions of teacher candidates. It uses a survey model. The population of the study consists of teacher candidates in the 2010-2011 academic year at the Ahmet Kelesoglu Education Faculty of Education Formation of Selcuk University. A simple random sample was selected as the sampling method and the study was…
Resource Utilisation and Curriculum Implementation in Community Colleges in Kenya
ERIC Educational Resources Information Center
Kigwilu, Peter Changilwa; Akala, Winston Jumba
2017-01-01
The study investigated how Catholic-sponsored community colleges in Nairobi utilise the existing physical facilities and teaching and learning resources for effective implementation of Artisan and Craft curricula. The study adopted a mixed methods research design. Proportional stratified random sampling was used to sample 172 students and 18…
Equilibrium Molecular Thermodynamics from Kirkwood Sampling
2015-01-01
We present two methods for barrierless equilibrium sampling of molecular systems based on the recently proposed Kirkwood method (J. Chem. Phys.2009, 130, 134102). Kirkwood sampling employs low-order correlations among internal coordinates of a molecule for random (or non-Markovian) sampling of the high dimensional conformational space. This is a geometrical sampling method independent of the potential energy surface. The first method is a variant of biased Monte Carlo, where Kirkwood sampling is used for generating trial Monte Carlo moves. Using this method, equilibrium distributions corresponding to different temperatures and potential energy functions can be generated from a given set of low-order correlations. Since Kirkwood samples are generated independently, this method is ideally suited for massively parallel distributed computing. The second approach is a variant of reservoir replica exchange, where Kirkwood sampling is used to construct a reservoir of conformations, which exchanges conformations with the replicas performing equilibrium sampling corresponding to different thermodynamic states. Coupling with the Kirkwood reservoir enhances sampling by facilitating global jumps in the conformational space. The efficiency of both methods depends on the overlap of the Kirkwood distribution with the target equilibrium distribution. We present proof-of-concept results for a model nine-atom linear molecule and alanine dipeptide. PMID:25915525
Resolution of plasma sample mix-ups through comparison of patient antibody patterns to E. coli.
Vetter, Beatrice N; Orlowski, Vanessa; Schüpbach, Jörg; Böni, Jürg; Rühe, Bettina; Huder, Jon B
2015-12-01
Accidental sample mix-ups and the need for their swift resolution are a challenge faced by every analytical laboratory. To this end, we developed a simple immunoblot-based method, making use of a patient's characteristic plasma antibody profile to Escherichia coli (E. coli) proteins. Nitrocellulose strips of size-separated proteins from E. coli whole-cell lysates were incubated with patient plasma and visualised with an enzyme-coupled secondary antibody and substrate. Plasma samples of 20 random patients as well as five longitudinal samples of three patients were analysed for antibody band patterns, to evaluate uniqueness and consistency over time, respectively. For sample mix-ups, antibody band patterns of questionable samples were compared with samples of known identity. Comparison of anti-E. coli antibody patterns of 20 random patients showed a unique antibody profile for each patient. Antibody profiles remained consistent over time, as shown for three patients over several years. Three example cases demonstrate the use of this methodology in mislabelling or mispipetting incidents. Our simple method for resolving plasma sample mix-ups between non-related individuals can be performed with basic laboratory equipment and thus can easily be adopted by analytical laboratories. Copyright © 2015 Elsevier B.V. All rights reserved.
An intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces.
Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying
2013-09-01
Poisson disk sampling has excellent spatial and spectral properties, and plays an important role in a variety of visual computing applications. Although many promising algorithms have been proposed for multidimensional sampling in Euclidean space, very few studies have been reported with regard to the problem of generating Poisson disks on surfaces due to the complicated nature of the surface. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. In sharp contrast to the conventional parallel approaches, our method neither partitions the given surface into small patches nor uses any spatial data structure to maintain the voids in the sampling domain. Instead, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. Our algorithm guarantees that the generated Poisson disks are uniformly and randomly distributed without bias. It is worth noting that our method is intrinsic and independent of the embedding space. This intrinsic feature allows us to generate Poisson disk patterns on arbitrary surfaces in ℝ^n. To our knowledge, this is the first intrinsic, parallel, and accurate algorithm for surface Poisson disk sampling. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.
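A planar dart-throwing sketch of Poisson disk sampling, assuming a unit square domain and a fixed disk radius; the paper's parallel, surface-intrinsic algorithm is considerably more involved, so this only illustrates the basic "accept if no disks overlap" property.

```r
poisson_disk_2d <- function(r = 0.05, n_trials = 5000) {
  pts <- matrix(numeric(0), ncol = 2)
  for (i in seq_len(n_trials)) {
    p <- runif(2)                                   # candidate in the unit square
    if (nrow(pts) == 0 ||
        min(sqrt(rowSums((pts - matrix(p, nrow(pts), 2, byrow = TRUE))^2))) >= r)
      pts <- rbind(pts, p)                          # accept if no disk overlap
  }
  pts
}
set.seed(3)
s <- poisson_disk_2d()
plot(s, asp = 1, pch = 16, cex = 0.5, xlab = "x", ylab = "y")
```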
Recruitment for Occupational Research: Using Injured Workers as the Point of Entry into Workplaces
Koehoorn, Mieke; Trask, Catherine M.; Teschke, Kay
2013-01-01
Objective To investigate the feasibility, costs and sample representativeness of a recruitment method that used workers with back injuries as the point of entry into diverse working environments. Methods Workers' compensation claims were used to randomly sample workers from five heavy industries and to recruit their employers for ergonomic assessments of the injured worker and up to 2 co-workers. Results The final study sample included 54 workers from the workers’ compensation registry and 72 co-workers. This sample of 126 workers was based on an initial random sample of 822 workers with a compensation claim, or a ratio of 1 recruited worker to approximately 7 sampled workers. The average recruitment cost was CND$262/injured worker and CND$240/participating worksite including co-workers. The sample was representative of the heavy industry workforce, and was successful in recruiting the self-employed (8.2%), workers from small employers (<20 workers, 38.7%), and workers from diverse working environments (49 worksites, 29 worksite types, and 51 occupations). Conclusions The recruitment rate was low but the cost per participant reasonable and the sample representative of workers in small worksites. Small worksites represent a significant portion of the workforce but are typically underrepresented in occupational research despite having distinct working conditions, exposures and health risks worthy of investigation. PMID:23826387
ERIC Educational Resources Information Center
Cui, Zhongmin; Kolen, Michael J.
2008-01-01
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Resampling methods in Microsoft Excel® for estimating reference intervals
Theodorsson, Elvar
2015-01-01
Computer-intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes native functions which lend themselves well to this purpose, including recommended interpolation procedures for estimating the 2.5 and 97.5 percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general, and to the use of Microsoft Excel® 2010 for estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distributions of observations in the reference samples are Gaussian or can be transformed to that distribution, even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and when the number of reference individuals and corresponding samples is on the order of 40. At least 500-1000 random samples with replacement should be taken from the results of measurement of the reference samples. PMID:26527366
Resampling methods in Microsoft Excel® for estimating reference intervals.
Theodorsson, Elvar
2015-01-01
Computer-intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes native functions which lend themselves well to this purpose, including recommended interpolation procedures for estimating the 2.5 and 97.5 percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general, and to the use of Microsoft Excel® 2010 for estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distributions of observations in the reference samples are Gaussian or can be transformed to that distribution, even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and when the number of reference individuals and corresponding samples is on the order of 40. At least 500-1000 random samples with replacement should be taken from the results of measurement of the reference samples.
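A minimal base-R analogue of the bootstrap percentile approach described above; the paper implements the same idea with Excel worksheet functions, and the simulated reference sample and quantile type used here are arbitrary illustrative choices.

```r
set.seed(123)
ref <- rlnorm(40, meanlog = 1, sdlog = 0.4)   # 40 skewed reference samples
B <- 1000                                     # 500-1000 resamples recommended
boot_lims <- replicate(B, {
  resample <- sample(ref, replace = TRUE)     # random sample with replacement
  quantile(resample, probs = c(0.025, 0.975), type = 6)
})
rowMeans(boot_lims)                           # bootstrap reference interval
```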
Subrandom methods for multidimensional nonuniform sampling.
Worley, Bradley
2016-08-01
Methods of nonuniform sampling that utilize pseudorandom number sequences to select points from a weighted Nyquist grid are commonplace in biomolecular NMR studies, due to the beneficial incoherence introduced by pseudorandom sampling. However, these methods require the specification of a non-arbitrary seed number in order to initialize a pseudorandom number generator. Because the performance of pseudorandom sampling schedules can substantially vary based on seed number, this can complicate the task of routine data collection. Approaches such as jittered sampling and stochastic gap sampling are effective at reducing random seed dependence of nonuniform sampling schedules, but still require the specification of a seed number. This work formalizes the use of subrandom number sequences in nonuniform sampling as a means of seed-independent sampling, and compares the performance of three subrandom methods to their pseudorandom counterparts using commonly applied schedule performance metrics. Reconstruction results using experimental datasets are also provided to validate claims made using these performance metrics. Copyright © 2016 Elsevier Inc. All rights reserved.
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-01-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis. PMID:20011037
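A rough simulation sketch of the 67x3 cluster design, assuming a beta-binomial model for intracluster correlation; the decision threshold below is hypothetical and not the LQAS rule evaluated in the study.

```r
set.seed(11)
sim_lqas <- function(p = 0.10, n_clusters = 67, m = 3, icc = 0.1, d = 15) {
  # beta-distributed cluster prevalences with mean p and intracluster corr. icc
  a <- p * (1 - icc) / icc
  b <- (1 - p) * (1 - icc) / icc
  p_clust <- rbeta(n_clusters, a, b)
  cases <- sum(rbinom(n_clusters, size = m, prob = p_clust))
  cases > d                                        # classify as "high prevalence"
}
mean(replicate(5000, sim_lqas(p = 0.15)))           # prob. of classifying high
mean(replicate(5000, sim_lqas(p = 0.05)))           # misclassification at low p
```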
Housworth, E A; Martins, E P
2001-01-01
Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.
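A minimal sketch with the ape package of generating a set of random topologies for use in comparative randomization tests. Incorporating partial constraints (e.g., forcing four taxa into a subclade), which is the contribution described above, requires additional machinery not shown here.

```r
library(ape)
set.seed(5)
taxa <- paste0("t", 1:12)
trees <- replicate(100, rtree(length(taxa), tip.label = taxa),
                   simplify = FALSE)               # 100 random topologies
class(trees) <- "multiPhylo"
# e.g., how often do t1 and t2 happen to be sister taxa across the sample?
mean(sapply(trees, function(tr) is.monophyletic(tr, c("t1", "t2"))))
```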
Cauley, Jane A.; LaCroix, Andrea Z.; Robbins, John A.; Larson, Joseph; Wallace, Robert; Wactawski-Wende, Jean; Chen, Zhao; Bauer, Douglas C.; Cummings, Steven R.; Jackson, Rebecca
2009-01-01
Purpose To test the hypothesis that the reduction in fractures with hormone therapy (HT) is greater in women with lower estradiol levels. Methods We conducted a nested case-control study within the Women’s Health Initiative HT Trials. The sample included 231 hip fracture case-control pairs and a random sample of 519 all fracture case-control pairs. Cases and controls were matched for age, ethnicity, randomization date, fracture history and hysterectomy status. Hormones were measured prior to randomization. Incident cases of fracture were identified over an average follow-up of 6.53 years. Results There was no evidence that the effect of HT on fracture differed by baseline estradiol (E2) or sex hormone binding globulin (SHBG). Across all quartiles of E2 and SHBG, women randomized to HT had about a 50% lower risk of fracture including hip fracture, compared to placebo. Conclusion The effect of HT on fracture reduction is independent of estradiol and SHBG levels. PMID:19436934
Model Reduction via Principal Component Analysis and Markov Chain Monte Carlo (MCMC) Methods
NASA Astrophysics Data System (ADS)
Gong, R.; Chen, J.; Hoversten, M. G.; Luo, J.
2011-12-01
Geophysical and hydrogeological inverse problems often include a large number of unknown parameters, ranging from hundreds to millions, depending on the parameterization and the problem undertaken. This makes inverse estimation and uncertainty quantification very challenging, especially for those problems in two- or three-dimensional spatial domains. Model reduction techniques have the potential to mitigate the curse of dimensionality by reducing the total number of unknowns while describing the complex subsurface systems adequately. In this study, we explore the use of principal component analysis (PCA) and Markov chain Monte Carlo (MCMC) sampling methods for model reduction through the use of synthetic datasets. We compare the performances of three different but closely related model reduction approaches: (1) PCA methods with geometric sampling (referred to as 'Method 1'), (2) PCA methods with MCMC sampling (referred to as 'Method 2'), and (3) PCA methods with MCMC sampling and inclusion of random effects (referred to as 'Method 3'). We consider a simple convolution model with five unknown parameters, as our goal is to understand and visualize the advantages and disadvantages of each method by comparing their inversion results with the corresponding analytical solutions. We generated synthetic data with added noise and inverted them under two different situations: (1) the noisy data and the covariance matrix used for the PCA are consistent (referred to as the unbiased case), and (2) the noisy data and the covariance matrix are inconsistent (referred to as the biased case). In the unbiased case, comparison between the analytical solutions and the inversion results shows that all three methods provide good estimates of the true values and Method 1 is computationally more efficient. In terms of uncertainty quantification, Method 1 performs poorly because of the relatively small number of samples obtained, Method 2 performs best, and Method 3 overestimates uncertainty due to the inclusion of random effects. However, in the biased case, only Method 3 correctly estimates all the unknown parameters, and both Methods 1 and 2 provide wrong values for the biased parameters. The synthetic case study demonstrates that if the covariance matrix for the PCA is inconsistent with the true models, the PCA methods with geometric or MCMC sampling will provide incorrect estimates.
Unconstrained Enhanced Sampling for Free Energy Calculations of Biomolecules: A Review
Miao, Yinglong; McCammon, J. Andrew
2016-01-01
Free energy calculations are central to understanding the structure, dynamics and function of biomolecules. Yet insufficient sampling of biomolecular configurations is often regarded as one of the main sources of error. Many enhanced sampling techniques have been developed to address this issue. Notably, enhanced sampling methods based on biasing collective variables (CVs), including the widely used umbrella sampling, adaptive biasing force and metadynamics, have been discussed in a recent excellent review (Abrams and Bussi, Entropy, 2014). Here, we aim to review enhanced sampling methods that do not require predefined system-dependent CVs for biomolecular simulations and as such do not suffer from the hidden energy barrier problem as encountered in the CV-biasing methods. These methods include, but are not limited to, replica exchange/parallel tempering, self-guided molecular/Langevin dynamics, essential energy space random walk and accelerated molecular dynamics. While it is overwhelming to describe all details of each method, we provide a summary of the methods along with the applications and offer our perspectives. We conclude with challenges and prospects of the unconstrained enhanced sampling methods for accurate biomolecular free energy calculations. PMID:27453631
Unconstrained Enhanced Sampling for Free Energy Calculations of Biomolecules: A Review.
Miao, Yinglong; McCammon, J Andrew
Free energy calculations are central to understanding the structure, dynamics and function of biomolecules. Yet insufficient sampling of biomolecular configurations is often regarded as one of the main sources of error. Many enhanced sampling techniques have been developed to address this issue. Notably, enhanced sampling methods based on biasing collective variables (CVs), including the widely used umbrella sampling, adaptive biasing force and metadynamics, have been discussed in a recent excellent review (Abrams and Bussi, Entropy, 2014). Here, we aim to review enhanced sampling methods that do not require predefined system-dependent CVs for biomolecular simulations and as such do not suffer from the hidden energy barrier problem as encountered in the CV-biasing methods. These methods include, but are not limited to, replica exchange/parallel tempering, self-guided molecular/Langevin dynamics, essential energy space random walk and accelerated molecular dynamics. While it is overwhelming to describe all details of each method, we provide a summary of the methods along with the applications and offer our perspectives. We conclude with challenges and prospects of the unconstrained enhanced sampling methods for accurate biomolecular free energy calculations.
ERIC Educational Resources Information Center
Glassman, Jill R.; Potter, Susan C.; Baumler, Elizabeth R.; Coyle, Karin K.
2015-01-01
Introduction: Group-randomized trials (GRTs) are one of the most rigorous methods for evaluating the effectiveness of group-based health risk prevention programs. Efficiently designing GRTs with a sample size that is sufficient for meeting the trial's power and precision goals while not wasting resources exceeding them requires estimates of the…
NASA Astrophysics Data System (ADS)
Bhosale, Parag; Staring, Marius; Al-Ars, Zaid; Berendsen, Floris F.
2018-03-01
Currently, non-rigid image registration algorithms are too computationally intensive to use in time-critical applications. Existing implementations that focus on speed typically address this by either parallelization on GPU-hardware, or by introducing methodically novel techniques into CPU-oriented algorithms. Stochastic gradient descent (SGD) optimization and variations thereof have proven to drastically reduce the computational burden for CPU-based image registration, but have not been successfully applied in GPU hardware due to its stochastic nature. This paper proposes 1) NiftyRegSGD, a SGD optimization for the GPU-based image registration tool NiftyReg, 2) random chunk sampler, a new random sampling strategy that better utilizes the memory bandwidth of GPU hardware. Experiments have been performed on 3D lung CT data of 19 patients, which compared NiftyRegSGD (with and without random chunk sampler) with CPU-based elastix Fast Adaptive SGD (FASGD) and NiftyReg. The registration runtime was 21.5s, 4.4s and 2.8s for elastix-FASGD, NiftyRegSGD without, and NiftyRegSGD with random chunk sampling, respectively, while similar accuracy was obtained. Our method is publicly available at https://github.com/SuperElastix/NiftyRegSGD.
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions: the prior and the posterior. The posterior distribution is influenced by the choice of prior distribution. Jeffreys' prior distribution is a kind of non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys' prior distribution is combined with the sample information, resulting in the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of a multivariate regression model using the Bayesian method with the non-informative Jeffreys' prior distribution. Based on the results and discussion, the parameter estimates of β and Σ were obtained as the expected values of the corresponding marginal posterior distributions, which are multivariate normal and inverse Wishart, respectively. However, calculating these expected values involves integrals of functions whose values are difficult to determine analytically. Therefore, an approach is needed that generates random samples according to the posterior distribution of each parameter, using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
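A minimal Gibbs-sampling sketch for a single-response regression under the non-informative Jeffreys prior p(β, σ²) ∝ 1/σ², assuming simulated data. The multivariate-response case described above replaces the σ² update with an inverse-Wishart draw for Σ; this univariate simplification only illustrates the alternating full-conditional draws.

```r
set.seed(9)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))
beta_true <- c(1, 2, -0.5)
y <- X %*% beta_true + rnorm(n, sd = 1.5)

XtX_inv  <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)

n_iter <- 5000
draws  <- matrix(NA, n_iter, ncol(X) + 1)
sigma2 <- 1
for (i in seq_len(n_iter)) {
  # beta | sigma2, y  ~  N(beta_hat, sigma2 * (X'X)^-1)
  beta <- beta_hat + t(chol(sigma2 * XtX_inv)) %*% rnorm(ncol(X))
  # sigma2 | beta, y  ~  Inverse-Gamma(n/2, RSS/2)
  rss <- sum((y - X %*% beta)^2)
  sigma2 <- 1 / rgamma(1, shape = n / 2, rate = rss / 2)
  draws[i, ] <- c(beta, sigma2)
}
colMeans(draws[-(1:1000), ])   # posterior means after burn-in
```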
New Mathematical Strategy Using Branch and Bound Method
NASA Astrophysics Data System (ADS)
Tarray, Tanveer Ahmad; Bhat, Muzafar Rasool
In this paper, the problem of optimal allocation in stratified random sampling is considered in the presence of nonresponse. The problem is formulated as a nonlinear programming problem (NLPP) and is solved using the Branch and Bound method. The results are also obtained using LINGO.
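A simplified sketch of the allocation problem being optimized: classical Neyman allocation (a continuous relaxation with naive rounding and no nonresponse adjustment), not the integer Branch and Bound solution of the paper. Stratum sizes and standard deviations are invented for illustration.

```r
neyman_alloc <- function(N_h, S_h, n_total) {
  w  <- N_h * S_h                     # allocation proportional to N_h * S_h
  n_h <- n_total * w / sum(w)
  pmax(2, round(n_h))                 # at least 2 units per stratum
}
N_h <- c(400, 300, 300)               # stratum sizes
S_h <- c(12, 8, 20)                   # stratum standard deviations
neyman_alloc(N_h, S_h, n_total = 120)
```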
Two new methods to fit models for network meta-analysis with random inconsistency effects.
Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang
2016-07-28
Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in the Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.
Lafontaine, Sean J V; Sawada, M; Kristjansson, Elizabeth
2017-02-16
With the expansion and growth of research on neighbourhood characteristics, there is an increased need for direct observational field audits. Herein, we introduce a novel direct observational audit method and systematic social observation instrument (SSOI) for efficiently assessing neighbourhood aesthetics over large urban areas. Our audit method uses spatial random sampling stratified by residential zoning and incorporates both mobile geographic information systems technology and virtual environments. The reliability of our method was tested in two ways: first, in 15 Ottawa neighbourhoods, we compared results at audited locations over two subsequent years, and second; we audited every residential block (167 blocks) in one neighbourhood and compared the distribution of SSOI aesthetics index scores with results from the randomly audited locations. Finally, we present interrater reliability and consistency results on all observed items. The observed neighbourhood average aesthetics index score estimated from four or five stratified random audit locations is sufficient to characterize the average neighbourhood aesthetics. The SSOI was internally consistent and demonstrated good to excellent interrater reliability. At the neighbourhood level, aesthetics is positively related to SES and physical activity and negatively correlated with BMI. The proposed approach to direct neighbourhood auditing performs sufficiently and has the advantage of financial and temporal efficiency when auditing a large city.
Krusche, Adele; Rudolf von Rohr, Isabelle; Muse, Kate; Duggan, Danielle; Crane, Catherine; Williams, J Mark G
2014-04-01
Randomized controlled trials (RCTs) are widely accepted as being the most efficient way of investigating the efficacy of psychological therapies. However, researchers conducting RCTs commonly report difficulties in recruiting an adequate sample within planned timescales. In an effort to overcome recruitment difficulties, researchers often are forced to expand their recruitment criteria or extend the recruitment phase, thus increasing costs and delaying publication of results. Research investigating the effectiveness of recruitment strategies is limited, and trials often fail to report sufficient details about the recruitment sources and resources utilized. We examined the efficacy of strategies implemented during the Staying Well after Depression RCT in Oxford to recruit participants with a history of recurrent depression. We describe eight recruitment methods utilized and two further sources not initiated by the research team and examine their efficacy in terms of (1) the return, including the number of potential participants who contacted the trial and the number who were randomized into the trial; (2) cost-effectiveness, comprising direct financial cost and manpower for initial contacts and randomized participants; and (3) comparison of sociodemographic characteristics of individuals recruited from different sources. Poster advertising, web-based advertising, and mental health worker referrals were the cheapest methods per randomized participant; however, the ratio of randomized participants to initial contacts differed markedly per source. Advertising online, via posters, and on a local radio station were the most cost-effective recruitment methods for soliciting participants who subsequently were randomized into the trial. Advertising across many sources (saturation) was found to be important. It may not be feasible to employ all the recruitment methods used in this trial to obtain participation from other populations, such as those currently unwell, or in other geographical locations. Recruitment source was unavailable for participants who could not be reached after the initial contact. Thus, it is possible that the efficiency of certain methods of recruitment was poorer than estimated. Efficacy and costs of other recruitment initiatives, such as providing travel expenses to the in-person eligibility assessment and making follow-up telephone calls to candidates who contacted the recruitment team but could not be screened promptly, were not analysed. Website advertising resulted in the highest number of randomized participants and was the second cheapest method of recruiting. Future research should evaluate the effectiveness of recruitment strategies for other samples to contribute to a comprehensive base of knowledge for future RCTs.
[Krigle estimation and its simulated sampling of Chilo suppressalis population density].
Yuan, Zheming; Bai, Lianyang; Wang, Kuiwu; Hu, Xiangyue
2004-07-01
In order to draw up a rational sampling plan for the larval population of Chilo suppressalis, an original population and its two derivative populations, a random population and a sequence population, were sampled and compared using random sampling, gap-range-random sampling, and a new systematic sampling method that integrates Krigle interpolation with a random origin position. For the original population, whose distribution was aggregated and whose dependence range in the line direction was 115 cm (6.9 units), gap-range-random sampling in the line direction was more precise than random sampling. Distinguishing the population pattern correctly is the key to obtaining better precision. Gap-range-random sampling and random sampling are suited to aggregated populations and random populations, respectively, but both are difficult to apply in practice. Therefore, a new systematic sampling scheme, termed the Krigle sample (n = 441), was developed to estimate the density of a partial sample (partial estimation, n = 441) and of the population (overall estimation, N = 1500). For the original population, the estimation precision of the Krigle sample for both the partial sample and the population was better than that of the investigation sample. With increasing aggregation intensity of the population, the Krigle sample was more effective than the investigation sample in both partial and overall estimation at an appropriate sampling gap according to the dependence range.
Rauscher, Sarah; Neale, Chris; Pomès, Régis
2009-10-13
Generalized-ensemble algorithms in temperature space have become popular tools to enhance conformational sampling in biomolecular simulations. A random walk in temperature leads to a corresponding random walk in potential energy, which can be used to cross over energetic barriers and overcome the problem of quasi-nonergodicity. In this paper, we introduce two novel methods: simulated tempering distributed replica sampling (STDR) and virtual replica exchange (VREX). These methods are designed to address the practical issues inherent in the replica exchange (RE), simulated tempering (ST), and serial replica exchange (SREM) algorithms. RE requires a large, dedicated, and homogeneous cluster of CPUs to function efficiently when applied to complex systems. ST and SREM both have the drawback of requiring extensive initial simulations, possibly adaptive, for the calculation of weight factors or potential energy distribution functions. STDR and VREX alleviate the need for lengthy initial simulations, and for synchronization and extensive communication between replicas. Both methods are therefore suitable for distributed or heterogeneous computing platforms. We perform an objective comparison of all five algorithms in terms of both implementation issues and sampling efficiency. We use disordered peptides in explicit water as test systems, for a total simulation time of over 42 μs. Efficiency is defined in terms of both structural convergence and temperature diffusion, and we show that these definitions of efficiency are in fact correlated. Importantly, we find that ST-based methods exhibit faster temperature diffusion and correspondingly faster convergence of structural properties compared to RE-based methods. Within the RE-based methods, VREX is superior to both SREM and RE. On the basis of our observations, we conclude that ST is ideal for simple systems, while STDR is well-suited for complex systems.
Żebrowska, Magdalena; Posch, Martin; Magirr, Dominic
2016-05-30
Consider a parallel group trial for the comparison of an experimental treatment to a control, where the second-stage sample size may depend on the blinded primary endpoint data as well as on additional blinded data from a secondary endpoint. For the setting of normally distributed endpoints, we demonstrate that this may lead to an inflation of the type I error rate if the null hypothesis holds for the primary but not the secondary endpoint. We derive upper bounds for the inflation of the type I error rate, both for trials that employ random allocation and for those that use block randomization. We illustrate the worst-case sample size reassessment rule in a case study. For both randomization strategies, the maximum type I error rate increases with the effect size in the secondary endpoint and the correlation between endpoints. The maximum inflation increases with smaller block sizes if information on the block size is used in the reassessment rule. Based on our findings, we do not question the well-established use of blinded sample size reassessment methods with nuisance parameter estimates computed from the blinded interim data of the primary endpoint. However, we demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre-planned and fully algorithmic sample size reassessment rules and does not extend to general or unplanned sample size adjustments based on blinded data. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Least squares polynomial chaos expansion: A review of sampling strategies
NASA Astrophysics Data System (ADS)
Hadigol, Mohammad; Doostan, Alireza
2018-04-01
As non-intrusive polynomial chaos expansion (PCE) techniques have gained growing popularity among researchers, we here provide a comprehensive review of major sampling strategies for the least squares based PCE. Traditional sampling methods, such as Monte Carlo, Latin hypercube, quasi-Monte Carlo, optimal design of experiments (ODE), and Gaussian quadratures, as well as more recent techniques, such as coherence-optimal and randomized quadratures, are discussed. We also propose a hybrid sampling method, dubbed alphabetic-coherence-optimal, that employs the so-called alphabetic optimality criteria used in the context of ODE in conjunction with coherence-optimal samples. A comparison of the empirical performance of the selected sampling methods on three numerical examples, including high-order PCEs, high-dimensional problems, and low oversampling ratios, is presented to provide a road map for practitioners seeking the most suitable sampling technique for a problem at hand. We observed that the alphabetic-coherence-optimal technique outperforms other sampling methods, especially when high-order ODE are employed and/or the oversampling ratio is low.
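A small sketch of least squares PCE fitting in one dimension with Legendre polynomials, comparing plain Monte Carlo with a hand-rolled Latin hypercube design; the test function, polynomial order, and oversampling ratio of 2 are arbitrary, and the coherence-optimal and alphabetic-optimal designs studied in the paper are not reproduced.

```r
set.seed(21)
f <- function(x) exp(x) * sin(3 * x)              # model to approximate
legendre <- function(x, p) {                      # Legendre basis up to order p
  P <- matrix(1, length(x), p + 1)
  if (p >= 1) P[, 2] <- x
  if (p >= 2) for (k in 2:p)
    P[, k + 1] <- ((2 * k - 1) * x * P[, k] - (k - 1) * P[, k - 1]) / k
  P
}
fit_pce <- function(x, p = 6) qr.solve(legendre(x, p), f(x))  # least squares

p <- 6; n <- 2 * (p + 1)                          # oversampling ratio of 2
x_mc  <- runif(n, -1, 1)                          # Monte Carlo design
x_lhs <- 2 * ((sample(n) - runif(n)) / n) - 1     # Latin hypercube design
x_test <- seq(-1, 1, length.out = 400)
err <- function(coefs) sqrt(mean((legendre(x_test, p) %*% coefs - f(x_test))^2))
c(MC = err(fit_pce(x_mc, p)), LHS = err(fit_pce(x_lhs, p)))
```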
Multidimensional Normalization to Minimize Plate Effects of Suspension Bead Array Data.
Hong, Mun-Gwan; Lee, Woojoo; Nilsson, Peter; Pawitan, Yudi; Schwenk, Jochen M
2016-10-07
Enhanced by the growing number of biobanks, biomarker studies can now be performed with reasonable statistical power by using large sets of samples. Antibody-based proteomics by means of suspension bead arrays offers one attractive approach to analyze serum, plasma, or CSF samples for such studies in microtiter plates. To expand measurements beyond single batches, with either 96 or 384 samples per plate, suitable normalization methods are required to minimize the variation between plates. Here we propose two normalization approaches utilizing MA coordinates. The multidimensional MA (multi-MA) and MA-loess both consider all samples of a microtiter plate per suspension bead array assay and thus do not require any external reference samples. We demonstrate the performance of the two MA normalization methods with data obtained from the analysis of 384 samples including both serum and plasma. Samples were randomized across 96-well sample plates, processed, and analyzed in assay plates, respectively. Using principal component analysis (PCA), we could show that plate-wise clusters found in the first two components were eliminated by multi-MA normalization as compared with other normalization methods. Furthermore, we studied the correlation profiles between random pairs of antibodies and found that both MA normalization methods substantially reduced the inflated correlation introduced by plate effects. Normalization approaches using multi-MA and MA-loess minimized batch effects arising from the analysis of several assay plates with antibody suspension bead arrays. In a simulated biomarker study, multi-MA restored associations lost due to plate effects. Our normalization approaches, which are available as R package MDimNormn, could also be useful in studies using other types of high-throughput assay data.
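A generic two-sample MA-loess sketch in base R, assuming simulated intensities: M is the log-ratio, A the average log-intensity, and a loess trend of M on A is removed. This is only the underlying idea; the multi-MA and plate-wise procedures implemented in the MDimNormn package differ in detail.

```r
set.seed(8)
ref  <- rlnorm(500, 5, 1)                       # reference intensities
samp <- ref * rlnorm(500, 0.2, 0.15)            # sample with a systematic shift
A <- 0.5 * (log2(samp) + log2(ref))             # average log-intensity
M <- log2(samp) - log2(ref)                     # log-ratio
trend <- loess(M ~ A, span = 0.5)               # intensity-dependent bias
M_norm <- M - fitted(trend)                     # normalized log-ratios
samp_norm <- 2^(A + M_norm / 2)                 # back-transformed intensities
```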
Statistical scaling of geometric characteristics in stochastically generated pore microstructures
Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee
2015-05-21
In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent to which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/or patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, we rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.
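A sketch of order-q structure functions, S_q(lag) = mean(|φ(x + lag) − φ(x)|^q), as described above, computed on a purely synthetic 1D porosity transect; the power-law slopes from the log-log fit play the role of the scaling exponents discussed in the study.

```r
set.seed(4)
phi <- cumsum(rnorm(2048, sd = 0.01)) + 0.3          # synthetic porosity profile
structure_fun <- function(x, q, lags) {
  sapply(lags, function(h)
    mean(abs(x[(1 + h):length(x)] - x[1:(length(x) - h)])^q))
}
lags <- 2^(0:7)
Sq <- sapply(c(1, 2, 3), function(q) structure_fun(phi, q, lags))
# scaling exponents from log-log slopes (one per order q)
apply(Sq, 2, function(s) coef(lm(log(s) ~ log(lags)))[2])
```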
Variance of discharge estimates sampled using acoustic Doppler current profilers from moving boats
Garcia, Carlos M.; Tarrab, Leticia; Oberg, Kevin; Szupiany, Ricardo; Cantero, Mariano I.
2012-01-01
This paper presents a model for quantifying the random errors (i.e., variance) of acoustic Doppler current profiler (ADCP) discharge measurements from moving boats for different sampling times. The model focuses on the random processes in the sampled flow field and has been developed using statistical methods currently available for uncertainty analysis of velocity time series. Analysis of field data collected using ADCP from moving boats from three natural rivers of varying sizes and flow conditions shows that, even though the estimate of the integral time scale of the actual turbulent flow field is larger than the sampling interval, the integral time scale of the sampled flow field is on the order of the sampling interval. Thus, an equation for computing the variance error in discharge measurements associated with different sampling times, assuming uncorrelated flow fields is appropriate. The approach is used to help define optimal sampling strategies by choosing the exposure time required for ADCPs to accurately measure flow discharge.
Kitchener, H; Gittins, M; Cruickshank, M; Moseley, C; Fletcher, S; Albrow, R; Gray, A; Brabin, L; Torgerson, D; Crosbie, E J; Sargent, A; Roberts, C
2018-06-01
Objectives To measure the feasibility and effectiveness of interventions to increase cervical screening uptake amongst young women. Methods A two-phase cluster randomized trial conducted in general practices in the NHS Cervical Screening Programme. In Phase 1, women due for their first invitation to cervical screening in practices randomized to the intervention received a pre-invitation leaflet and, separately, access to online booking. In Phase 2, non-attenders at six months were randomized to one of: vaginal self-sample kits sent unrequested or offered; timed appointments; nurse navigator; or the choice between nurse navigator or self-sample kits. The primary outcome was uplift in intervention vs. control practices, at 3 and 12 months post invitation. Results Phase 1 randomized 20,879 women. Neither pre-invitation leaflet nor online booking increased screening uptake by three months (18.8% pre-invitation leaflet vs. 19.2% control and 17.8% online booking vs. 17.2% control). Uptake was higher amongst human papillomavirus vaccinees at three months (OR 2.07, 95% CI 1.69-2.53, p < 0.001). Phase 2 randomized 10,126 non-attenders, with 32-34 clusters for each intervention and 100 clusters as controls. Sending self-sample kits increased uptake at 12 months (OR 1.51, 95% CI 1.20-1.91, p = 0.001), as did timed appointments (OR 1.41, 95% CI 1.14-1.74, p = 0.001). The offer of a nurse navigator, self-sample kits on request, and the choice between timed appointments and a nurse navigator were ineffective. Conclusions Amongst non-attenders, self-sample kits sent and timed appointments achieved an uplift in screening over the short term; longer term impact is less certain. Prior human papillomavirus vaccination was associated with increased screening uptake.
Unit Costs of Interlibrary Loans and Photocopies at a Regional Medical Library: Preliminary Report *
Spencer, Carol C.
1970-01-01
Unit costs of providing interlibrary loans and photocopies were determined by a method not previously used for library cost studies: random time sampling with self-observation. The working time of all appropriate personnel was sampled using Random Alarm Mechanisms and a structured checklist of tasks. The total lender's unit cost per request received, including direct labor, materials, fringe benefits, and overhead, was $1.526 for originals mailed postpaid by lender, and $1.534 for photocopies mailed. Corresponding unit costs per request filled were: originals, $1.932, and photocopies, $1.763. PMID:5439910
Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong
2015-05-01
A study of medical expenditure and its influencing factors among students enrolled in the Urban Resident Basic Medical Insurance (URBMI) scheme in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing-data mechanism, this study proposes a two-stage method that deals with both mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by students (or their parents) selected from child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, 2.52% of the dependent variable values were not missing at random (NMAR) and 7.14% were missing at random (MAR). First, multiple imputation was conducted for the MAR values using the complete data; then a sample selection model was used to correct for NMAR within the multiple imputation, and a multi-factor analysis model was established. Based on 1 000 resampling iterations, the best scheme for filling the randomly missing values was the predictive mean matching (PMM) method at the observed missing proportion. With this optimal scheme, the two-stage analysis was conducted. Finally, the influencing factors on annual medical expenditure among students enrolled in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for certain reasons, self-medication and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can deal effectively with non-response bias and selection bias in the dependent variable of survey data.
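A minimal sketch of the first stage only (PMM multiple imputation for the MAR portion of the outcome) using the mice package; the second stage, a sample selection model for the NMAR portion, is not shown. All variable names and the toy data are hypothetical, not the survey's actual variables.

```r
library(mice)
set.seed(10)
dat <- data.frame(
  expenditure = c(rnorm(80, 500, 120), rep(NA, 20)),   # 20% MAR for illustration
  income      = rnorm(100, 3000, 600),
  chronic     = rbinom(100, 1, 0.2)
)
imp <- mice(dat, m = 5, method = "pmm", printFlag = FALSE)  # PMM imputations
fit <- with(imp, lm(expenditure ~ income + chronic))        # analysis per imputation
summary(pool(fit))                                          # pooled estimates
```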
Systematic versus random sampling in stereological studies.
West, Mark J
2012-12-01
The sampling that takes place at all levels of an experimental design must be random if the estimate is to be unbiased in a statistical sense. There are two fundamental ways by which one can make a random sample of the sections and positions to be probed on the sections. Using a card-sampling analogy, one can pick any card at all out of a deck of cards. This is referred to as independent random sampling because the sampling of any one card is made without reference to the position of the other cards. The other approach to obtaining a random sample would be to pick a card within a set number of cards and others at equal intervals within the deck. Systematic sampling along one axis of many biological structures is more efficient than random sampling, because most biological structures are not randomly organized. This article discusses the merits of systematic versus random sampling in stereological studies.
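A base-R illustration of the two schemes contrasted above, assuming an ordered structure of 1000 sampling positions: an independent (simple) random sample versus a systematic sample with a random start.

```r
set.seed(2)
N <- 1000; n <- 20
positions <- seq_len(N)
srs <- sort(sample(positions, n))                     # independent random sample
k <- N / n                                            # sampling interval
start <- sample(seq_len(k), 1)                        # random start
sys <- seq(from = start, by = k, length.out = n)      # systematic sample
list(simple_random = srs, systematic = sys)
```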
Sampled-Data Consensus of Linear Multi-agent Systems With Packet Losses.
Zhang, Wenbing; Tang, Yang; Huang, Tingwen; Kurths, Jurgen
In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses is explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.
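A toy discrete-time illustration of the Bernoulli packet-loss model is sketched below; it is not the paper's sampled-data controller design (graph, loss probability and step size are arbitrary), it only shows agents averaging over whichever packets happen to arrive at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, p_loss, eps = 6, 200, 0.3, 0.1
A = np.ones((n, n)) - np.eye(n)                 # complete communication graph (illustrative)
x = rng.normal(size=n)                          # initial agent states

for _ in range(steps):
    delivered = (rng.random((n, n)) < 1 - p_loss) * A   # Bernoulli packet arrivals per link
    L = np.diag(delivered.sum(axis=1)) - delivered      # Laplacian of the links that got through
    x = x - eps * L @ x                                  # discrete-time consensus update

print(np.ptp(x))   # spread of the states; close to zero once consensus is (nearly) reached
```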
Parenthood and the Quality of Experience in Daily Life: A Longitudinal Study
ERIC Educational Resources Information Center
Fave, Antonella Delle; Massimini, Fausto
2004-01-01
This longitudinal study analyzes the time budget and the quality of experience reported by new parents. Five primiparous couples repeatedly completed the Experience Sampling Method: they carried pagers emitting random signals 6-8 times a day and, at each signal, filled out forms sampling current thoughts, activities, and the quality…
Examining Work and Family Conflict among Female Bankers in Accra Metropolis, Ghana
ERIC Educational Resources Information Center
Kissi-Abrokwah, Bernard; Andoh-Robertson, Theophilus; Tutu-Danquah, Cecilia; Agbesi, Catherine Selorm
2015-01-01
This study investigated the effects and solutions of work and family conflict among female bankers in Accra Metropolis. Using a triangulatory mixed-methods design, a structured questionnaire was administered to 300 randomly selected female bankers, and a further 15 female bankers, selected by convenience sampling, were interviewed. The…
The Prevalence and Nature of Intellectual Disability in Norwegian Prisons
ERIC Educational Resources Information Center
Sondenaa, E.; Rasmussen, K.; Palmstierna, T.; Nottestad, J.
2008-01-01
Background: The objective of the study was to estimate the prevalence of inmates with intellectual disabilities (ID) and to identify historical, medical and criminological characteristics of particular relevance. Methods: A cross-sectional random sample of 143 inmates from Norwegian prisons was studied. The Hayes Ability Screening Index (HASI)…
Development of Effective Teacher Program: Teamwork Building Program for Thailand's Municipal Schools
ERIC Educational Resources Information Center
Chantathai, Pimpka; Tesaputa, Kowat; Somprach, Kanokorn
2015-01-01
This research aimed to formulate an effective teacher teamwork program for municipal schools in Thailand. A preliminary survey of the current situation and problems was conducted to inform the design of potential programs. Samples were randomly selected from municipal schools using a multi-stage sampling method in order to investigate their…
ERIC Educational Resources Information Center
Zatzick, Douglas F.; Grossman, David C.; Russo, Joan; Pynoos, Robert; Berliner, Lucy; Jurkovich, Gregory; Sabin, Janice A.; Katon, Wayne; Ghesquiere, Angela; McCauley, Elizabeth; Rivara, Frederick P.
2006-01-01
Objective: Adolescents constitute a high-risk population for traumatic physical injury, yet few longitudinal investigations have assessed the development of posttraumatic stress disorder (PTSD) symptoms over time in representative samples. Method: Between July 2002 and August 2003, 108 randomly selected injured adolescent patients ages 12 to 18 and…
The Impact of Sample Size and Other Factors When Estimating Multilevel Logistic Models
ERIC Educational Resources Information Center
Schoeneberger, Jason A.
2016-01-01
The design of research studies utilizing binary multilevel models must necessarily incorporate knowledge of multiple factors, including estimation method, variance component size, and number of predictors, in addition to sample sizes. This Monte Carlo study examined the performance of random effect binary outcome multilevel models under varying…
Simulation-Extrapolation for Estimating Means and Causal Effects with Mismeasured Covariates
ERIC Educational Resources Information Center
Lockwood, J. R.; McCaffrey, Daniel F.
2015-01-01
Regression, weighting and related approaches to estimating a population mean from a sample with nonrandom missing data often rely on the assumption that conditional on covariates, observed samples can be treated as random. Standard methods using this assumption generally will fail to yield consistent estimators when covariates are measured with…
Baldissera, Sandro; Ferrante, Gianluigi; Quarchioni, Elisa; Minardi, Valentina; Possenti, Valentina; Carrozzi, Giuliano; Masocco, Maria; Salmaso, Stefania
2014-04-01
Field substitution of nonrespondents can be used to maintain the planned sample size and structure in surveys but may introduce additional bias. Sample weighting is suggested as the preferable alternative; however, limited empirical evidence exists comparing the two methods. We wanted to assess the impact of substitution on surveillance results using data from Progressi delle Aziende Sanitarie per la Salute in Italia-Progress by Local Health Units towards a Healthier Italy (PASSI). PASSI is conducted by Local Health Units (LHUs) through telephone interviews of stratified random samples of residents. Nonrespondents are replaced with substitutes randomly preselected in the same LHU stratum. We compared the weighted estimates obtained in the original PASSI sample (used as a reference) and in the substitutes' sample. The differences were evaluated using a Wald test. In 2011, 50,697 units were selected: 37,252 were from the original sample and 13,445 were substitutes; 37,162 persons were interviewed. The initially planned size and demographic composition were restored. No significant differences in the estimates between the original and the substitutes' sample were found. In our experience, field substitution is an acceptable method for dealing with nonresponse, maintaining the characteristics of the original sample without affecting the results. This evidence can support appropriate decisions about planning and implementing a surveillance system. Copyright © 2014 Elsevier Inc. All rights reserved.
Ghasemi, Fahimeh; Fassihi, Afshin; Pérez-Sánchez, Horacio; Mehri Dehnavi, Alireza
2017-02-05
Thousands of molecules and descriptors are available to the medicinal chemist thanks to technological advances in different branches of chemistry. This abundance, as well as the correlation among descriptors, has raised new problems in quantitative structure-activity relationship studies. Proper parameter initialization in statistical modeling has emerged as another challenge in recent years, since random initialization of parameters leads to poor performance of deep neural networks (DNNs). In this research, a deep belief network (DBN) was applied to initialize DNNs. A DBN is composed of stacked restricted Boltzmann machines, energy-based models that require computing the log-likelihood gradient over all samples. Three different sampling approaches were suggested to approximate this gradient. DBNs based on each of these sampling approaches were used to initialize the DNN architecture for predicting the biological activity of all fifteen Kaggle targets, which together contain more than 70k molecules. As in other application areas, the outputs of these models were markedly superior to those of DNNs with random parameter initialization. © 2016 Wiley Periodicals, Inc.
Estimating rare events in biochemical systems using conditional sampling.
Sundar, V S
2017-01-28
The paper focuses on the development of variance reduction strategies to estimate rare events in biochemical systems. Obtaining this probability using brute force Monte Carlo simulations in conjunction with the stochastic simulation algorithm (Gillespie's method) is computationally prohibitive. To circumvent this, importance sampling tools such as the weighted stochastic simulation algorithm and the doubly weighted stochastic simulation algorithm have been proposed. However, these strategies require an additional step of determining the important region to sample from, which is not straightforward for most problems. In this paper, we apply the subset simulation method, developed as a variance reduction tool in the context of structural engineering, to the problem of rare event estimation in biochemical systems. The main idea is that the rare event probability is expressed as a product of more frequent conditional probabilities. These conditional probabilities are estimated with high accuracy using Monte Carlo simulations, specifically the Markov chain Monte Carlo method with the modified Metropolis-Hastings algorithm. Generating sample realizations of the state vector using the stochastic simulation algorithm is viewed as mapping the discrete-state continuous-time random process to the standard normal random variable vector. This viewpoint opens up the possibility of applying more sophisticated and efficient sampling schemes developed elsewhere to problems in stochastic chemical kinetics. The results obtained using the subset simulation method are compared with existing variance reduction strategies for a few benchmark problems, and a satisfactory improvement in computational time is demonstrated.
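The product-of-conditional-probabilities idea can be sketched on a toy rare event, P(Z > 4.5) for a standard normal Z; the level probability, sample size and proposal scale below are illustrative, and the modified Metropolis step is reduced to one dimension rather than the biochemical state vector used in the paper.

```python
import numpy as np
from scipy.stats import norm

def subset_simulation(g, target, n=2000, p0=0.1, seed=0):
    """Estimate P(g(X) >= target) for X ~ N(0,1) as a product of conditional probabilities."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    prob = 1.0
    while True:
        y = g(x)
        b = np.quantile(y, 1 - p0)                  # intermediate threshold for this level
        if b >= target:                             # final level: count direct exceedances
            return prob * np.mean(y >= target)
        prob *= p0
        seeds = x[y >= b]
        reps = int(np.ceil(n / len(seeds)))
        new = []
        for s in seeds:                             # modified Metropolis chains, conditioned on g >= b
            cur = s
            for _ in range(reps):
                prop = cur + rng.normal()
                accept = rng.random() < np.exp(0.5 * (cur**2 - prop**2))
                if accept and g(prop) >= b:
                    cur = prop
                new.append(cur)
        x = np.array(new[:n])

est = subset_simulation(lambda z: z, target=4.5)
print(est, norm.sf(4.5))                            # estimate vs exact tail probability
```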
Adapted random sampling patterns for accelerated MRI.
Knoll, Florian; Clason, Christian; Diwoky, Clemens; Stollberger, Rudolf
2011-02-01
Variable density random sampling patterns have recently become increasingly popular for accelerated imaging strategies, as they lead to incoherent aliasing artifacts. However, the design of these sampling patterns is still an open problem. Current strategies use model assumptions like polynomials of different order to generate a probability density function that is then used to generate the sampling pattern. This approach relies on the optimization of design parameters which is very time consuming and therefore impractical for daily clinical use. This work presents a new approach that generates sampling patterns by making use of power spectra of existing reference data sets and hence requires neither parameter tuning nor an a priori mathematical model of the density of sampling points. The approach is validated with downsampling experiments, as well as with accelerated in vivo measurements. The proposed approach is compared with established sampling patterns, and the generalization potential is tested by using a range of reference images. Quantitative evaluation is performed for the downsampling experiments using RMS differences to the original, fully sampled data set. Our results demonstrate that the image quality of the method presented in this paper is comparable to that of an established model-based strategy when optimization of the model parameter is carried out and yields superior results to non-optimized model parameters. However, no random sampling pattern showed superior performance when compared to conventional Cartesian subsampling for the considered reconstruction strategy.
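The core idea can be sketched in a few lines: take the power spectrum of a reference data set, normalise it into a per-point probability over k-space, and draw a Bernoulli mask for the desired acceleration factor. The normalisation below is an assumption for illustration, not necessarily the authors' exact recipe, and the random "reference image" is a stand-in.

```python
import numpy as np

def pattern_from_reference(ref_image, accel=4, rng=None):
    """Bernoulli k-space sampling mask whose per-point probabilities follow the
    (fftshifted) power spectrum of a fully sampled reference image."""
    rng = np.random.default_rng(rng)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(ref_image))) ** 2
    prob = spectrum / spectrum.sum() * spectrum.size / accel   # mean density ~ 1/accel
    prob = np.clip(prob, 0.0, 1.0)                             # dense centre saturates at 1
    return rng.random(prob.shape) < prob

ref = np.random.default_rng(0).random((128, 128))   # stand-in for a reference image
mask = pattern_from_reference(ref, accel=4)
print(mask.mean())                                   # achieved sampling density
```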
Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption.
Hartwig, Fernando Pires; Davey Smith, George; Bowden, Jack
2017-12-01
Mendelian randomization (MR) is being increasingly used to strengthen causal inference in observational studies. Availability of summary data of genetic associations for a variety of phenotypes from large genome-wide association studies (GWAS) allows straightforward application of MR using summary data methods, typically in a two-sample design. In addition to the conventional inverse variance weighting (IVW) method, recently developed summary data MR methods, such as the MR-Egger and weighted median approaches, allow a relaxation of the instrumental variable assumptions. Here, a new method - the mode-based estimate (MBE) - is proposed to obtain a single causal effect estimate from multiple genetic instruments. The MBE is consistent when the largest number of similar (identical in infinite samples) individual-instrument causal effect estimates comes from valid instruments, even if the majority of instruments are invalid. We evaluate the performance of the method in simulations designed to mimic the two-sample summary data setting, and demonstrate its use by investigating the causal effect of plasma lipid fractions and urate levels on coronary heart disease risk. The MBE presented less bias and lower type-I error rates than other methods under the null in many situations. Its power to detect a causal effect was smaller compared with the IVW and weighted median methods, but was larger than that of MR-Egger regression, with sample size requirements typically smaller than those available from GWAS consortia. The MBE relaxes the instrumental variable assumptions, and should be used in combination with other approaches in sensitivity analyses. © The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association
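A rough sketch of the mode-based idea follows: form the per-instrument Wald ratios and take the mode of their smoothed distribution. The published MBE additionally weights the ratios by their precision and tunes the kernel bandwidth, which is omitted here; the simulated summary statistics are purely illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mode_based_estimate(beta_exposure, beta_outcome, grid_points=2000):
    """Crude MBE: mode of the smoothed distribution of per-instrument ratio estimates."""
    ratios = np.asarray(beta_outcome) / np.asarray(beta_exposure)
    kde = gaussian_kde(ratios)
    grid = np.linspace(ratios.min(), ratios.max(), grid_points)
    return grid[np.argmax(kde(grid))]

# Simulated summary data: 20 valid instruments with causal effect 0.3,
# 10 pleiotropic ones pulled towards 1.0; the mode should sit near 0.3.
rng = np.random.default_rng(3)
bx = rng.normal(0.1, 0.02, 30)
by = np.r_[0.3 * bx[:20], 1.0 * bx[20:]] + rng.normal(0, 0.005, 30)
print(mode_based_estimate(bx, by))
```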
Rochefort, Christian M; Buckeridge, David L; Tanguay, Andréanne; Biron, Alain; D'Aragon, Frédérick; Wang, Shengrui; Gallix, Benoit; Valiquette, Louis; Audet, Li-Anne; Lee, Todd C; Jayaraman, Dev; Petrucci, Bruno; Lefebvre, Patricia
2017-02-16
Adverse events (AEs) in acute care hospitals are frequent and associated with significant morbidity, mortality, and costs. Measuring AEs is necessary for quality improvement and benchmarking purposes, but current detection methods lack accuracy, efficiency, and generalizability. The growing availability of electronic health records (EHR) and the development of natural language processing techniques for encoding narrative data offer an opportunity to develop potentially better methods. The purpose of this study is to determine the accuracy and generalizability of using automated methods for detecting three high-incidence and high-impact AEs from EHR data: a) hospital-acquired pneumonia, b) ventilator-associated event, and c) central line-associated bloodstream infection. This validation study will be conducted among medical, surgical and ICU patients admitted between 2013 and 2016 to the Centre hospitalier universitaire de Sherbrooke (CHUS) and the McGill University Health Centre (MUHC), which has both French and English sites. A random 60% sample of CHUS patients will be used for model development purposes (cohort 1, development set). Using a random sample of these patients, a reference standard assessment of their medical chart will be performed. Multivariate logistic regression and the area under the curve (AUC) will be employed to iteratively develop and optimize three automated AE detection models (i.e., one per AE of interest) using EHR data from the CHUS. These models will then be validated on a random sample of the remaining 40% of CHUS patients (cohort 1, internal validation set) using chart review to assess accuracy. The most accurate models developed and validated at the CHUS will then be applied to EHR data from a random sample of patients admitted to the MUHC French site (cohort 2) and English site (cohort 3), a critical requirement given the use of narrative data, and accuracy will be assessed using chart review. Generalizability will be determined by comparing AUCs from cohorts 2 and 3 to those from cohort 1. This study will likely produce more accurate and efficient measures of AEs. These measures could be used to assess the incidence rates of AEs, evaluate the success of preventive interventions, or benchmark performance across hospitals.
Olsen, Geary W; Logan, Perry W; Hansen, Kristen J; Simpson, Cathy A; Burris, Jean M; Burlew, Michele M; Vorarath, Phanasouk P; Venkateswarlu, Pothapragada; Schumpert, John C; Mandel, Jeffrey H
2003-01-01
This investigation randomly sampled a fluorochemical manufacturing employee population to determine the distribution of serum fluorochemical levels according to employees' jobs and work areas. Previous analyses of medical surveillance data have not shown significant associations between fluorochemical production employees' clinical chemistry and hematology tests and their serum PFOS and perfluorooctanoate (PFOA, C(7)F(15)COO(-)) concentrations, but may have been subject to nonparticipation bias. Serum concentrations were also determined for a random sample of the on-site film plant employee population, where fluorochemicals are not produced. Of the 232 employees randomly selected for serum sampling, 186 (80%) employees participated (n=126 chemical plant; n=60 film plant). Serum samples were extracted using an ion-pairing extraction procedure and were quantitatively analyzed for seven fluorochemicals using high-pressure liquid chromatography electrospray tandem mass spectrometry methods. Geometric means (in parts per million) and 95% confidence intervals (in parentheses) of the random sample of 126 chemical plant employees were: PFOS 0.941 (0.787-1.126); PFOA 0.899 (0.722-1.120); perfluorohexanesulfonate 0.180 (0.145-0.223); N-ethyl perfluorooctanesulfonamidoacetate 0.008 (0.006-0.011); N-methyl perfluorooctanesulfonamidoacetate 0.081 (0.067-0.098); perfluorooctanesulfonamide 0.013 (0.009-0.018); and perfluorooctanesulfonamidoacetate 0.022 (0.018-0.029). These geometric means were approximately one order of magnitude higher than those observed for the film plant employees.
Study of Dynamic Characteristics of Aeroelastic Systems Utilizing Randomdec Signatures
NASA Technical Reports Server (NTRS)
Chang, C. S.
1975-01-01
The feasibility of utilizing the random decrement method in conjunction with a signature analysis procedure to determine the dynamic characteristics of an aeroelastic system for the purpose of on-line prediction of potential on-set of flutter was examined. Digital computer programs were developed to simulate sampled response signals of a two-mode aeroelastic system. Simulated response data were used to test the random decrement method. A special curve-fit approach was developed for analyzing the resulting signatures. A number of numerical 'experiments' were conducted on the combined processes. The method is capable of determining frequency and damping values accurately from randomdec signatures of carefully selected lengths.
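The random decrement signature itself is simple to state: average fixed-length segments of the response that all start where the signal crosses a chosen trigger level, so the randomly forced part averages out and the free-decay part remains. A sketch follows; the trigger rule, integration scheme and system parameters are illustrative, not those of the NASA study.

```python
import numpy as np

def randomdec_signature(x, trigger, seg_len):
    """Average all segments of length seg_len that start where x upcrosses `trigger`."""
    starts = np.flatnonzero((x[:-1] < trigger) & (x[1:] >= trigger))
    starts = starts[starts + seg_len <= len(x)]
    if len(starts) == 0:
        raise ValueError("no trigger crossings found")
    return np.mean([x[s:s + seg_len] for s in starts], axis=0)

# Synthetic test: lightly damped 5 Hz mode driven by white noise (values illustrative).
rng = np.random.default_rng(2)
dt, n = 0.001, 200_000
x = np.zeros(n)
v = 0.0
wn, zeta = 2 * np.pi * 5, 0.02
for i in range(1, n):                       # simple Euler integration of the SDOF system
    a = -2 * zeta * wn * v - wn**2 * x[i - 1] + rng.normal(0, 50)
    v += a * dt
    x[i] = x[i - 1] + v * dt

sig = randomdec_signature(x, trigger=x.std(), seg_len=2000)
# `sig` decays like the free response; frequency and damping can be fitted to it.
```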
NASA Astrophysics Data System (ADS)
Robotham, A. S. G.; Howlett, Cullan
2018-06-01
In this short note we publish the analytic quantile function for the Navarro, Frenk & White (NFW) profile. All known published and coded methods for sampling from the 3D NFW PDF use either accept-reject, or numeric interpolation (sometimes via a lookup table) for projecting random Uniform samples through the quantile distribution function to produce samples of the radius. This is a common requirement in N-body initial condition (IC), halo occupation distribution (HOD), and semi-analytic modelling (SAM) work for correctly assigning particles or galaxies to positions given an assumed concentration for the NFW profile. Using this analytic description allows for much faster and cleaner code to solve a common numeric problem in modern astronomy. We release R and Python versions of simple code that achieves this sampling, which we note is trivial to reproduce in any modern programming language.
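The inversion reduces to a Lambert W evaluation. The sketch below is an independent derivation of the same idea (inverting the normalised NFW enclosed-mass profile), not a transcription of the authors' published R/Python code, and the concentration and sample size are illustrative.

```python
import numpy as np
from scipy.special import lambertw

def sample_nfw_radii(n, c=10.0, rng=None):
    """Draw radii r/R_vir from an NFW profile of concentration c by inverting the
    normalised enclosed-mass profile M(<r): an analytic quantile via Lambert W."""
    rng = np.random.default_rng(rng)
    u = rng.random(n)
    mu_c = np.log(1 + c) - c / (1 + c)           # normalisation of the enclosed mass
    q = u * mu_c
    # Solve ln(1+x) - x/(1+x) = q for x = r/r_s (principal branch; argument lies in (-1/e, 0))
    x = -1.0 / np.real(lambertw(-np.exp(-1.0 - q))) - 1.0
    return x / c                                  # convert r/r_s to r/R_vir

r = sample_nfw_radii(100_000, c=10.0)
print(r.mean(), np.quantile(r, 0.5))
```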
Computer methods for sampling from the gamma distribution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, M.E.; Tadikamalla, P.R.
1978-01-01
Considerable attention has recently been directed at developing ever faster algorithms for generating gamma random variates on digital computers. This paper surveys the current state of the art including the leading algorithms of Ahrens and Dieter, Atkinson, Cheng, Fishman, Marsaglia, Tadikamalla, and Wallace. General random variate generation techniques are explained with reference to these gamma algorithms. Computer simulation experiments on IBM and CDC computers are reported.
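For context, a compact accept-reject generator of the general kind surveyed is sketched below. This particular scheme is Marsaglia and Tsang's squeeze method, which postdates the 1978 survey; it is shown only to illustrate the normal-proposal-plus-rejection structure shared by several of the algorithms named above.

```python
import numpy as np

def gamma_variate(shape, scale=1.0, rng=None):
    """One gamma(shape, scale) variate via Marsaglia-Tsang acceptance-rejection."""
    rng = np.random.default_rng(rng)
    if shape < 1.0:                                    # boost trick for shape < 1
        return gamma_variate(shape + 1.0, scale, rng) * rng.random() ** (1.0 / shape)
    d = shape - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    while True:
        x = rng.standard_normal()
        v = (1.0 + c * x) ** 3
        if v <= 0.0:
            continue
        u = rng.random()
        if u < 1.0 - 0.0331 * x**4:                    # fast squeeze acceptance
            return d * v * scale
        if np.log(u) < 0.5 * x**2 + d - d * v + d * np.log(v):
            return d * v * scale

rng = np.random.default_rng(0)
sample = [gamma_variate(2.5, rng=rng) for _ in range(10_000)]
print(np.mean(sample))   # should be close to shape * scale = 2.5
```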
ERIC Educational Resources Information Center
Lane, Aoife; Murphy, Niamh; Bauman, Adrian; Chey, Tien
2010-01-01
Objective: To assess the impact of a community-based, low-contact intervention on the physical activity habits of insufficiently active women. Design: Randomized controlled trial. Participants: Inactive Irish women. Method: A population sample of women participating in a mass 10 km event were followed up at 2 and 6 months, and those who had…
Caries status in 16 year-olds with varying exposure to water fluoridation in Ireland.
Mullen, J; McGaffin, J; Farvardin, N; Brightman, S; Haire, C; Freeman, R
2012-12-01
Most of the Republic of Ireland's public water supplies have been fluoridated since the mid-1960s while Northern Ireland has never been fluoridated, apart from some small short-lived schemes in east Ulster. This study examines dental caries status in 16-year-olds in a part of Ireland straddling fluoridated and non-fluoridated water supply areas and compares two methods of assessing the effectiveness of water fluoridation. The cross-sectional survey tested differences in caries status by two methods: (1) Estimated Fluoridation Status, as used previously in national and regional studies in the Republic and in the All-Island study of 2002; and (2) Percentage Lifetime Exposure, a modification of a system described by Slade in 1995 and used in Australian caries research. Adolescents were selected for the study by a two-part random sampling process: first, schools in each area were grouped into three tiers based on school size and selected randomly from each tier; then 16-year-olds were randomly sampled from these schools using a pre-set sampling fraction for each tier. With both systems of measurement, significantly lower caries levels were found in the children with the greatest exposure to fluoridated water compared with those with the least exposure. The survey provides further evidence of the effectiveness of water fluoridation in reducing dental caries experience up to 16 years of age. The extra intricacy involved in the Percentage Lifetime Exposure method did not provide much more information than the simpler Estimated Fluoridation Status method.
Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia
2017-04-01
To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses vary significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, the high computational cost of Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
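For reference, plain Hamiltonian Monte Carlo without any surrogate looks like the sketch below (2-D Gaussian target; step size and path length are chosen only for illustration). The paper's contribution is essentially to replace the exact gradient calls in such a loop with a cheap random-basis surrogate.

```python
import numpy as np

def hmc(log_prob, grad_log_prob, x0, n_samples=5000, eps=0.1, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = np.array(x0, dtype=float), []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)                  # resample momentum
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * eps * grad_log_prob(x_new)         # leapfrog integration
        for _ in range(n_leapfrog - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_prob(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_prob(x_new)
        log_accept = (log_prob(x_new) - 0.5 * p_new @ p_new) - (log_prob(x) - 0.5 * p @ p)
        if np.log(rng.random()) < log_accept:             # Metropolis correction
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: zero-mean Gaussian with unit variances and correlation 0.9.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)
logp = lambda x: -0.5 * x @ prec @ x
grad = lambda x: -prec @ x
draws = hmc(logp, grad, x0=np.zeros(2))
print(draws.mean(0), np.cov(draws.T))
```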
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes of small sample sizes, and it is therefore a suitable test case for our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to assess the efficiency of the proposed training strategy. The resulting accuracy of the classifier is 90.0%, which is a high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
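The training strategy amounts to rebalancing class sizes by simple random resampling before fitting the forest; a scikit-learn sketch follows, using a synthetic imbalanced dataset rather than the UCI arrhythmia data, with class proportions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=600, n_classes=3, n_informative=8,
                           weights=[0.8, 0.15, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Simple random over-sampling: resample every minority class up to the majority size.
counts = np.bincount(y_tr)
parts_X, parts_y = [], []
for cls in np.unique(y_tr):
    Xc, yc = X_tr[y_tr == cls], y_tr[y_tr == cls]
    Xc, yc = resample(Xc, yc, n_samples=counts.max(), random_state=0)
    parts_X.append(Xc)
    parts_y.append(yc)
X_bal, y_bal = np.vstack(parts_X), np.concatenate(parts_y)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
print(clf.score(X_te, y_te))
```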
Underestimating extreme events in power-law behavior due to machine-dependent cutoffs
NASA Astrophysics Data System (ADS)
Radicchi, Filippo
2014-11-01
Power-law distributions are typical macroscopic features occurring in almost all complex systems observable in nature. As a result, researchers in quantitative analyses must often generate random synthetic variates obeying power-law distributions. The task is usually performed through standard methods that map uniform random variates into the desired probability space. Whereas all these algorithms are theoretically solid, in this paper we show that they are subject to severe machine-dependent limitations. As a result, two dramatic consequences arise: (i) the sampling in the tail of the distribution is not random but deterministic; (ii) the moments of the sample distribution, which are theoretically expected to diverge as functions of the sample sizes, converge instead to finite values. We provide quantitative indications for the range of distribution parameters that can be safely handled by standard libraries used in computational analyses. Whereas our findings indicate possible reinterpretations of numerical results obtained through flawed sampling methodologies, they also pave the way for the search for a concrete solution to this central issue shared by all quantitative sciences dealing with complexity.
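The standard inverse-transform generator, and the machine-dependent ceiling the paper warns about, can both be seen in a few lines; the exponent and lower cutoff below are arbitrary choices for illustration.

```python
import numpy as np

def power_law_sample(n, alpha=2.5, x_min=1.0, rng=None):
    """Inverse-transform sampling of p(x) ~ x^(-alpha) for x >= x_min."""
    u = np.random.default_rng(rng).random(n)            # uniform variates in [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

x = power_law_sample(1_000_000, alpha=2.5)
# The largest value any double-precision uniform can ever produce is finite:
u_max = 1.0 - np.finfo(float).eps / 2                    # largest representable u < 1
cutoff = 1.0 * (1.0 - u_max) ** (-1.0 / 1.5)
print(x.max(), cutoff)                                   # sample maximum vs machine cutoff
```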
Catarino, Rosa; Vassilakos, Pierre; Bilancioni, Aline; Vanden Eynde, Mathieu; Meyer-Hamme, Ulrike; Menoud, Pierre-Alain; Guerry, Frédéric; Petignat, Patrick
2015-01-01
Human papillomavirus (HPV) self-sampling (self-HPV) is valuable in cervical cancer screening. HPV testing is usually performed on physician-collected cervical smears stored in liquid-based medium. Dry filters and swabs are an alternative. We evaluated the adequacy of self-HPV using two dry storage and transport devices, the FTA cartridge and swab. A total of 130 women collected two consecutive self-HPV samples. Randomization determined which of the two tests was performed first: self-HPV using dry swabs (s-DRY) or vaginal specimen collection using a cytobrush applied to an FTA cartridge (s-FTA). After self-HPV, a physician collected a cervical sample using liquid-based medium (Dr-WET). HPV types were identified by real-time PCR. Agreement between collection methods was measured using the kappa statistic. HPV prevalence for high-risk types was 62.3% (95%CI: 53.7-70.2) detected by s-DRY, 56.2% (95%CI: 47.6-64.4) by Dr-WET, and 54.6% (95%CI: 46.1-62.9) by s-FTA. There was overall agreement of 70.8% between s-FTA and s-DRY samples (kappa = 0.34), and of 82.3% between self-HPV and Dr-WET samples (kappa = 0.56). Detection sensitivities for low-grade squamous intraepithelial lesion or worse (LSIL+) were: 64.0% (95%CI: 44.5-79.8) for s-FTA, 84.6% (95%CI: 66.5-93.9) for s-DRY, and 76.9% (95%CI: 58.0-89.0) for Dr-WET. The preferred self-collection method among patients was s-DRY (40.8% vs. 15.4%). Regarding costs, the FTA card was five times more expensive than the swab (~5 US dollars (USD) per card vs. ~1 USD per swab). Self-HPV using dry swabs is sensitive for detecting LSIL+ and less expensive than s-FTA. International Standard Randomized Controlled Trial Number (ISRCTN): 43310942.
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Capers, Patrice L.; Brown, Andrew W.; Dawson, John A.; Allison, David B.
2015-01-01
Background: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. Creation of high throughput methods (e.g., search heuristics, crowdsourcing) has improved the feasibility of large meta-research questions, but possibly at the cost of accuracy. Objective: To evaluate the use of double sampling combined with multiple imputation (DS + MI) to address meta-research questions, using as an example adherence of PubMed entries to two simple Consolidated Standards of Reporting Trials guidelines for titles and abstracts. Methods: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT, human, abstract available, and English language (n = 322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. Results: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title = 1.00, abstract = 0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS + MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by year: subsample RHITLO 1.050–1.174 vs. DS + MI 1.082–1.151). As evidence of improved accuracy, DS + MI coefficient estimates were closer to RHITLO than the large sample RLOTHI. Conclusion: Our results support our hypothesis that DS + MI would result in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of literature. PMID:25988135
The Analysis of Iranian Students' Persistence in Online Education
ERIC Educational Resources Information Center
Mahmodi, Mahdi; Ebrahimzade, Issa
2015-01-01
In the following research, the relationship between instructional interaction and student persistence in e-learning has been analyzed. In order to conduct a descriptive-analytic survey, 744 undergraduate e-students were selected by stratified random sampling method to examine not only the frequency and the methods of establishing an instructional…
Teacher Perceptions of Principals' Leadership Qualities: A Mixed Methods Study
ERIC Educational Resources Information Center
Hauserman, Cal P.; Ivankova, Nataliya V.; Stick, Sheldon L.
2013-01-01
This mixed methods sequential explanatory study utilized the Multi-factor Leadership Questionnaire, responses to open-ended questions, and in-depth interviews to identify transformational leadership qualities that were present among principals in Alberta, Canada. The first quantitative phase consisted of a random sample of 135 schools (with…
King-Shier, Kathryn M; Hemmelgarn, Brenda R; Musto, Richard; Quan, Hude
2014-01-01
Background: Francophones who live outside the primarily French-speaking province of Quebec, Canada, risk being excluded from research by lack of a sampling frame. We examined the adequacy of random sampling, advertising, and respondent-driven sampling for recruitment of francophones for survey research. Methods: We recruited francophones residing in the city of Calgary, Alberta, through advertising and respondent-driven sampling. These 2 samples were then compared with a random subsample of Calgary francophones derived from the 2006 Canadian Community Health Survey (CCHS). We assessed the effectiveness of advertising and respondent-driven sampling in relation to the CCHS sample by comparing demographic characteristics and selected items from the CCHS (specifically self-reported general health status, perceived weight, and having a family doctor). Results: We recruited 120 francophones through advertising and 145 through respondent-driven sampling; the random sample from the CCHS consisted of 259 records. The samples derived from advertising and respondent-driven sampling differed from the CCHS in terms of age (mean ages 41.0, 37.6, and 42.5 years, respectively), sex (proportion of males 26.1%, 40.6%, and 56.6%, respectively), education (college or higher 86.7%, 77.9%, and 59.1%, respectively), place of birth (immigrants accounting for 45.8%, 55.2%, and 3.7%, respectively), and not having a regular medical doctor (16.7%, 34.5%, and 16.6%, respectively). Differences were not tested statistically because of limitations on the analysis of CCHS data imposed by Statistics Canada. Interpretation: The samples generated exclusively through advertising and respondent-driven sampling were not representative of the gold standard sample from the CCHS. Use of such biased samples for research studies could generate misleading results. PMID:25426180
Different hunting strategies select for different weights in red deer.
Martínez, María; Rodríguez-Vigal, Carlos; Jones, Owen R; Coulson, Tim; San Miguel, Alfonso
2005-09-22
Much insight can be derived from records of shot animals. Most researchers using such data assume that their data represents a random sample of a particular demographic class. However, hunters typically select a non-random subset of the population and hunting is, therefore, not a random process. Here, with red deer (Cervus elaphus) hunting data from a ranch in Toledo, Spain, we demonstrate that data collection methods have a significant influence upon the apparent relationship between age and weight. We argue that a failure to correct for such methodological bias may have significant consequences for the interpretation of analyses involving weight or correlated traits such as breeding success, and urge researchers to explore methods to identify and correct for such bias in their data.
Random sphere packing model of heterogeneous propellants
NASA Astrophysics Data System (ADS)
Kochevets, Sergei Victorovich
It is well recognized that combustion of heterogeneous propellants is strongly dependent on the propellant morphology. Recent developments in computing systems make it possible to start three-dimensional modeling of heterogeneous propellant combustion. A key component of such large scale computations is a realistic model of industrial propellants which retains the true morphology---a goal never achieved before. The research presented develops the Random Sphere Packing Model of heterogeneous propellants and generates numerical samples of actual industrial propellants. This is done by developing a sphere packing algorithm which randomly packs a large number of spheres with a polydisperse size distribution within a rectangular domain. First, the packing code is developed, optimized for performance, and parallelized using the OpenMP shared memory architecture. Second, the morphology and packing fraction of two simple cases of unimodal and bimodal packs are investigated computationally and analytically. It is shown that both the Loose Random Packing and Dense Random Packing limits are not well defined and the growth rate of the spheres is identified as the key parameter controlling the efficiency of the packing. For a properly chosen growth rate, computational results are found to be in excellent agreement with experimental data. Third, two strategies are developed to define numerical samples of polydisperse heterogeneous propellants: the Deterministic Strategy and the Random Selection Strategy. Using these strategies, numerical samples of industrial propellants are generated. The packing fraction is investigated and it is shown that the experimental values of the packing fraction can be achieved computationally. It is strongly believed that this Random Sphere Packing Model of propellants is a major step forward in the realistic computational modeling of heterogeneous propellant of combustion. In addition, a method of analysis of the morphology of heterogeneous propellants is developed which uses the concept of multi-point correlation functions. A set of intrinsic length scales of local density fluctuations in random heterogeneous propellants is identified by performing a Monte-Carlo study of the correlation functions. This method of analysis shows great promise for understanding the origins of the combustion instability of heterogeneous propellants, and is believed to become a valuable tool for the development of safe and reliable rocket engines.
Probabilistic generation of random networks taking into account information on motifs occurrence.
Bois, Frederic Y; Gayraud, Ghislaine
2015-01-01
Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.
Hydration Free Energy from Orthogonal Space Random Walk and Polarizable Force Field.
Abella, Jayvee R; Cheng, Sara Y; Wang, Qiantao; Yang, Wei; Ren, Pengyu
2014-07-08
The orthogonal space random walk (OSRW) method has shown enhanced sampling efficiency in free energy calculations from previous studies. In this study, the implementation of OSRW in accordance with the polarizable AMOEBA force field in TINKER molecular modeling software package is discussed and subsequently applied to the hydration free energy calculation of 20 small organic molecules, among which 15 are positively charged and five are neutral. The calculated hydration free energies of these molecules are compared with the results obtained from the Bennett acceptance ratio method using the same force field, and overall an excellent agreement is obtained. The convergence and the efficiency of the OSRW are also discussed and compared with BAR. Combining enhanced sampling techniques such as OSRW with polarizable force fields is very promising for achieving both accuracy and efficiency in general free energy calculations.
NASA Astrophysics Data System (ADS)
Ander, Louise; Lark, Murray; Smedley, Pauline; Watts, Michael; Hamilton, Elliott; Fletcher, Tony; Crabbe, Helen; Close, Rebecca; Studden, Mike; Leonardi, Giovanni
2015-04-01
Random sampling design is optimal in order to be able to assess outcomes, such as the mean of a given variable across an area. However, this optimal sampling design may be compromised to an unknown extent by unavoidable real-world factors: the extent to which the study design can still be considered random, and the influence this may have on the choice of appropriate statistical data analysis is examined in this work. We take a study which relied on voluntary participation for the sampling of private water tap chemical composition in England, UK. This study was designed and implemented as a categorical, randomised study. The local geological classes were grouped into 10 types, which were considered to be most important in likely effects on groundwater chemistry (the source of all the tap waters sampled). Locations of the users of private water supplies were made available to the study group from the Local Authority in the area. These were then assigned, based on location, to geological groups 1 to 10 and randomised within each group. However, the permission to collect samples then required active, voluntary participation by householders and thus, unlike many environmental studies, could not always follow the initial sample design. Impediments to participation ranged from 'willing but not available' during the designated sampling period, to a lack of response to requests to sample (assumed to be wholly unwilling or unable to participate). Additionally, a small number of unplanned samples were collected via new participants making themselves known to the sampling teams, during the sampling period. Here we examine the impact this has on the 'random' nature of the resulting data distribution, by comparison with the non-participating known supplies. We consider the implications this has on choice of statistical analysis methods to predict values and uncertainty at un-sampled locations.
Miller, Ezer; Huppert, Amit; Novikov, Ilya; Warburg, Alon; Hailu, Asrat; Abbasi, Ibrahim; Freedman, Laurence S
2015-11-10
In this work, we describe a two-stage sampling design to estimate the infection prevalence in a population. In the first stage, an imperfect diagnostic test was performed on a random sample of the population. In the second stage, a different imperfect test was performed on a stratified random sample of the first sample. To estimate infection prevalence, we assumed conditional independence between the diagnostic tests and developed method-of-moments estimators based on the expected proportions of people with positive and negative results on both tests, which are functions of the tests' sensitivity, specificity, and the infection prevalence. A closed-form solution of the estimating equations was obtained assuming a specificity of 100% for both tests. We applied our method to estimate the infection prevalence of visceral leishmaniasis according to two quantitative polymerase chain reaction tests performed on blood samples taken from 4756 patients in northern Ethiopia. The sensitivities of the tests were also estimated, as well as the standard errors of all estimates, using a parametric bootstrap. We also examined the impact of departures from our assumptions of 100% specificity and conditional independence on the estimated prevalence. Copyright © 2015 John Wiley & Sons, Ltd.
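Under 100% specificity and conditional independence the moments take a simple closed form; the sketch below shows that form for the simplest version of the design, treating the counts as if both tests were run on one simple random sample (the actual study used a stratified second stage, which changes the algebra, and the counts shown are invented for illustration).

```python
def prevalence_two_tests(n, n1_pos, n2_pos, n_both_pos):
    """Method-of-moments estimates assuming both tests have 100% specificity
    and are conditionally independent given true infection status."""
    p1, p2, p12 = n1_pos / n, n2_pos / n, n_both_pos / n
    prevalence = p1 * p2 / p12          # p1 = pi*se1, p2 = pi*se2, p12 = pi*se1*se2
    se1, se2 = p12 / p2, p12 / p1       # back out the two sensitivities
    return prevalence, se1, se2

# Illustrative counts (not the Ethiopian study data):
print(prevalence_two_tests(n=4756, n1_pos=520, n2_pos=430, n_both_pos=310))
```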
Adjusting for multiple prognostic factors in the analysis of randomised trials
2013-01-01
Background When multiple prognostic factors are adjusted for in the analysis of a randomised trial, it is unclear (1) whether it is necessary to account for each of the strata, formed by all combinations of the prognostic factors (stratified analysis), when randomisation has been balanced within each stratum (stratified randomisation), or whether adjusting for the main effects alone will suffice, and (2) the best method of adjustment in terms of type I error rate and power, irrespective of the randomisation method. Methods We used simulation to (1) determine if a stratified analysis is necessary after stratified randomisation, and (2) to compare different methods of adjustment in terms of power and type I error rate. We considered the following methods of analysis: adjusting for covariates in a regression model, adjusting for each stratum using either fixed or random effects, and Mantel-Haenszel or a stratified Cox model depending on outcome. Results Stratified analysis is required after stratified randomisation to maintain correct type I error rates when (a) there are strong interactions between prognostic factors, and (b) there are approximately equal number of patients in each stratum. However, simulations based on real trial data found that type I error rates were unaffected by the method of analysis (stratified vs unstratified), indicating these conditions were not met in real datasets. Comparison of different analysis methods found that with small sample sizes and a binary or time-to-event outcome, most analysis methods lead to either inflated type I error rates or a reduction in power; the lone exception was a stratified analysis using random effects for strata, which gave nominal type I error rates and adequate power. Conclusions It is unlikely that a stratified analysis is necessary after stratified randomisation except in extreme scenarios. Therefore, the method of analysis (accounting for the strata, or adjusting only for the covariates) will not generally need to depend on the method of randomisation used. Most methods of analysis work well with large sample sizes, however treating strata as random effects should be the analysis method of choice with binary or time-to-event outcomes and a small sample size. PMID:23898993
Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach
Risso, Davide; Taglioli, Luca; De Iasio, Sergio; Gueresi, Paola; Alfani, Guido; Nelli, Sergio; Rossi, Paolo; Paoli, Giorgio; Tofanelli, Sergio
2015-01-01
This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely-used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with different demographic features across the last six centuries (years 1447–2001). The degree of overlapping between "reference founding core" distributions and the distributions obtained from sampling the present day communities by probabilistic and selective methods was quantified under different conditions and models. When taking into account only one individual per surname (low kinship model), the average discrepancy was 59.5%, with a peak of 84% by random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8–30% at the cost of a larger variance. Criteria aimed at maximizing locally-spread patrilineages and long-term residency appeared to be affected by recent gene flows much more than expected. Selection of the more frequent family names following low kinship criteria proved to be a suitable approach only for historically stable communities. In any other case true random sampling, despite its high variance, did not return more biased estimates than other selective methods. Our results indicate that the sampling of individuals bearing historically documented surnames (founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics. PMID:26452043
Sparse feature learning for instrument identification: Effects of sampling and pooling methods.
Han, Yoonchang; Lee, Subin; Nam, Juhan; Lee, Kyogu
2016-05-01
Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on a sparse feature learning algorithm for musical instrument identification and, in particular, focuses on the effects of the frame sampling techniques for dictionary learning and the pooling methods for feature aggregation. To this end, two frame sampling techniques are examined: fixed and proportional random sampling. Furthermore, the effect of using onset frames was analyzed for both proposed sampling methods. Regarding summarization of the feature activation, a standard deviation pooling method is used and compared with the commonly used max- and average-pooling techniques. Using more than 47 000 recordings of 24 instruments from various performers, playing styles, and dynamics, a number of tuning parameters are explored, including the analysis frame size, the dictionary size, and the type of frequency scaling, as well as the different sampling and pooling methods. The results show that the combination of proportional sampling and standard deviation pooling achieves the best overall performance of 95.62%, while the optimal parameter set varies among the instrument classes.
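The three pooling operators compared each reduce a frames-by-features activation matrix to one clip-level vector; a sketch follows, with stand-in activations and arbitrary dimensions.

```python
import numpy as np

def pool_features(activations, method="std"):
    """Aggregate a (frames x features) activation matrix into one clip-level vector."""
    if method == "max":
        return activations.max(axis=0)
    if method == "avg":
        return activations.mean(axis=0)
    if method == "std":
        return activations.std(axis=0)
    raise ValueError(method)

acts = np.abs(np.random.default_rng(0).normal(size=(500, 64)))   # stand-in activations
clip_vector = np.concatenate([pool_features(acts, m) for m in ("max", "avg", "std")])
print(clip_vector.shape)   # (192,)
```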
Robustness-Based Design Optimization Under Data Uncertainty
NASA Technical Reports Server (NTRS)
Zaman, Kais; McDonald, Mark; Mahadevan, Sankaran; Green, Lawrence
2010-01-01
This paper proposes formulations and algorithms for design optimization under both aleatory (i.e., natural or physical variability) and epistemic uncertainty (i.e., imprecise probabilistic information), from the perspective of system robustness. The proposed formulations deal with epistemic uncertainty arising from both sparse and interval data without any assumption about the probability distributions of the random variables. A decoupled approach is proposed in this paper to un-nest the robustness-based design from the analysis of non-design epistemic variables to achieve computational efficiency. The proposed methods are illustrated for the upper stage design problem of a two-stage-to-orbit (TSTO) vehicle, where the information on the random design inputs is only available as sparse point and/or interval data. As collecting more data reduces uncertainty but increases cost, the effect of sample size on the optimality and robustness of the solution is also studied. A method is developed to determine the optimal sample size for sparse point data that leads to solutions of the design problem that are least sensitive to variations in the input random variables.
Improved high-dimensional prediction with Random Forests by the use of co-data.
Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A
2017-12-28
Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables with co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available on the variables of the primary data but does not use its response labels. Inspired by empirical Bayes, these moderated sampling probabilities are learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
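The core CoRF idea, replacing the uniform draw of candidate split variables with co-data-moderated probabilities, can be sketched in a few lines of numpy; the external p-values used as co-data and the simple weighting rule below are hypothetical placeholders, not the empirical-Bayes estimates of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 1000                                  # number of candidate variables
external_pvals = rng.uniform(size=p)      # hypothetical co-data: one external p-value per variable

# Turn co-data into sampling probabilities: smaller external p-value -> larger weight.
weights = 1.0 - external_pvals
probs = weights / weights.sum()

mtry = 32                                 # candidate variables considered at each split
uniform_draw = rng.choice(p, size=mtry, replace=False)             # standard Random Forest
moderated_draw = rng.choice(p, size=mtry, replace=False, p=probs)  # CoRF-style draw

print(np.sort(moderated_draw))
```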
1986-08-01
materials (2.2 w/o and 3.0 w/o MgO). The other two batches (2.8 w/o and 3.1 w/o MgO), of higher purity, were made using E-10 zirconia powder from... CID) powders. Two methods have been used for the coprecipitation of doped zirconia powders from solutions of chemical precursors. (4) Method I, for... of powder, approximate sample size 3.2 Kg (6.4 Kg for zirconia powder); 3. Random selection of sample; 4. Partial drying of sample to reduce caking
NASA Technical Reports Server (NTRS)
Deepak, A.; Fluellen, A.
1978-01-01
An efficient numerical method of multiple quadratures, the Conroy method, is applied to the problem of computing multiple scattering contributions in the radiative transfer through realistic planetary atmospheres. A brief error analysis of the method is given and comparisons are drawn with the more familiar Monte Carlo method. Both methods are stochastic problem-solving models of a physical or mathematical process and utilize the sampling scheme for points distributed over a definite region. In the Monte Carlo scheme the sample points are distributed randomly over the integration region. In the Conroy method, the sample points are distributed systematically, such that the point distribution forms a unique, closed, symmetrical pattern which effectively fills the region of the multidimensional integration. The methods are illustrated by two simple examples: one, of multidimensional integration involving two independent variables, and the other, of computing the second order scattering contribution to the sky radiance.
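A toy numpy comparison of the two point-placement schemes contrasted above, random (Monte Carlo) versus systematic, for a two-variable integral; the integrand is an arbitrary example and the regular midpoint lattice only stands in loosely for the symmetric Conroy pattern.

```python
import numpy as np
from math import erf, pi

rng = np.random.default_rng(2)
f = lambda x, y: np.exp(-(x ** 2 + y ** 2))      # example integrand on the unit square
exact = (pi / 4) * erf(1) ** 2                   # closed form for this particular integrand

n = 1024
# Random placement of sample points (Monte Carlo).
xr, yr = rng.uniform(size=n), rng.uniform(size=n)
monte_carlo = f(xr, yr).mean()

# Systematic placement: a regular midpoint lattice filling the integration region.
m = int(np.sqrt(n))
g = (np.arange(m) + 0.5) / m
xs, ys = np.meshgrid(g, g)
systematic = f(xs, ys).mean()

print(f"exact {exact:.6f}  monte carlo {monte_carlo:.6f}  systematic {systematic:.6f}")
```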
NASA Technical Reports Server (NTRS)
Lites, B. W.; Skumanich, A.
1985-01-01
A method is presented for recovery of the vector magnetic field and thermodynamic parameters from polarization measurements of photospheric line profiles measured with filtergraphs. The method includes magneto-optic effects and may be utilized on data sampled at arbitrary wavelengths within the line profile. The accuracy of this method is explored through inversion of synthetic Stokes profiles subjected to varying levels of random noise, instrumental wavelength resolution, and line profile sampling. The level of error introduced by the systematic effect of profile sampling over a finite fraction of the 5 minute oscillation cycle is also investigated. The results presented here are intended to guide instrumental design and observational procedure.
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
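A rough Monte Carlo illustration of the paper's central point, that power levels off as the participant sample grows while stimulus variance remains; the design below (stimuli nested in two conditions, analysis on per-stimulus means) and all variance components are arbitrary assumptions, not the authors' defaults.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def power(n_part, n_stim=16, effect=0.4, sd_stim=0.5, sd_part=0.5, sd_err=1.0,
          reps=500, alpha=0.05):
    """Power of a by-stimulus t-test in a design with stimuli nested in two conditions."""
    hits = 0
    for _ in range(reps):
        stim_re = rng.normal(0, sd_stim, size=(2, n_stim))       # stimulus random effects
        part_re = rng.normal(0, sd_part, size=n_part)             # participant random effects
        err = rng.normal(0, sd_err, size=(2, n_stim, n_part))
        y = np.array([0.0, effect])[:, None, None] + stim_re[:, :, None] + part_re + err
        stim_means = y.mean(axis=2)                               # average over participants
        t, p = stats.ttest_ind(stim_means[1], stim_means[0])      # stimuli are the random units
        hits += p < alpha
    return hits / reps

for n in (8, 32, 128, 512):
    print(n, power(n))   # power rises with participants but plateaus below 1
```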
School Principals' Leadership Behaviours and Its Relation with Teachers' Sense of Self-Efficacy
ERIC Educational Resources Information Center
Mehdinezhad, Vali; Mansouri, Masoumeh
2016-01-01
The aim of this study was to investigate the relationship between school principals' leadership behaviours and teachers' sense of self-efficacy. The research method was descriptive and correlational. A sample of 254 teachers was randomly selected using proportional sampling. For data collection, the Teachers' Sense of Efficacy Scale of…
Health Outcomes in Adolescence: Associations with Family, Friends and School Engagement
ERIC Educational Resources Information Center
Carter, Melissa; McGee, Rob; Taylor, Barry; Williams, Sheila
2007-01-01
Aim: To examine the associations between connectedness to family and friends, and school engagement, and selected health compromising and health promoting behaviours in a sample of New Zealand adolescents. Methods: A web-based survey was designed and administered to a random sample of 652 Year 11 students aged 16 years from all Dunedin (NZ) high…
How Generalizable Is Your Experiment? An Index for Comparing Samples and Populations
ERIC Educational Resources Information Center
Tipton, Elizabeth
2013-01-01
Recent research on the design of social experiments has highlighted the effects of different design choices on research findings. Since experiments rarely collect their samples using random selection, in order to address these external validity problems and design choices, recent research has focused on two areas. The first area is on methods for…
ERIC Educational Resources Information Center
Doerann-George, Judith
The Integrated Moving Average (IMA) model of time series, and the analysis of intervention effects based on it, assume random shocks that are normally distributed. To determine the robustness of the analysis to violations of this assumption, empirical sampling methods were employed. Samples were generated from three populations: normal,…
Dimensions of Oppositional Defiant Disorder in 3-Year-Old Preschoolers
ERIC Educational Resources Information Center
Ezpeleta, Lourdes; Granero, Roser; de la Osa, Nuria; Penelo, Eva; Domenech, Josep M.
2012-01-01
Background: To test the factor structure of oppositional defiant disorder (ODD) symptoms and to study the relationships between the proposed dimensions and external variables in a community sample of preschool children. Method: A sample of 1,341 3-year-old preschoolers was randomly selected and screened for a double-phase design. In total, 622…
Eating and Exercising: Nebraska Adolescents' Attitudes and Behaviors. Technical Report 25.
ERIC Educational Resources Information Center
Newman, Ian M.
This report describes selected eating and exercise patterns among a sample of 2,237 Nebraska youth in grades 9-12 selected from a random sample of 24 junior and senior high schools. The eating patterns reported cover food selection, body image, weight management, and weight loss methods. The exercise patterns relate to the frequency of…
The Relationship between Emotional Intelligence and Problem Solving Skills in Prospective Teachers
ERIC Educational Resources Information Center
Deniz, Sabahattin
2013-01-01
This study aims to investigate the relationship between emotional intelligence and problem solving. The sample set of the research was taken from the Faculty of Education of Mugla University by the random sampling method. The participants were 386 students--prospective teachers--(224 females; 182 males) who took part in the study voluntarily.…
Geldsetzer, Pascal; Fink, Günther; Vaikath, Maria; Bärnighausen, Till
2018-02-01
(1) To evaluate the operational efficiency of various sampling methods for patient exit interviews; (2) to discuss under what circumstances each method yields an unbiased sample; and (3) to propose a new, operationally efficient, and unbiased sampling method. Literature review, mathematical derivation, and Monte Carlo simulations. Our simulations show that in patient exit interviews it is most operationally efficient if the interviewer, after completing an interview, selects the next patient exiting the clinical consultation. We demonstrate mathematically that this method yields a biased sample: patients who spend a longer time with the clinician are overrepresented. This bias can be removed by selecting the next patient who enters, rather than exits, the consultation room. We show that this sampling method is operationally more efficient than alternative methods (systematic and simple random sampling) in most primary health care settings. Under the assumption that the order in which patients enter the consultation room is unrelated to the length of time spent with the clinician and the interviewer, selecting the next patient entering the consultation room tends to be the operationally most efficient unbiased sampling method for patient exit interviews. © 2016 The Authors. Health Services Research published by Wiley Periodicals, Inc. on behalf of Health Research and Educational Trust.
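A small discrete-event sketch, not taken from the paper, of why selecting the next patient to exit over-represents long consultations while selecting the next patient to enter does not; the duration distribution and interview length are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
durations = rng.lognormal(mean=2.0, sigma=0.6, size=n)   # consultation lengths (minutes)
exits = np.cumsum(durations)                              # one clinician, back-to-back visits
enters = exits - durations
interview = 12.0                                          # fixed interview length (minutes)

def mean_selected_duration(select_by_entry):
    """Mean consultation length of the patients an exit interviewer would pick."""
    key = enters if select_by_entry else exits
    free, picked = 0.0, []
    while True:
        i = np.searchsorted(key, free, side="left")       # next entry (or exit) after interviewer is free
        if i >= n:
            break
        picked.append(durations[i])
        free = exits[i] + interview                       # interview starts once that patient exits
    return np.mean(picked)

print("true mean         ", durations.mean())
print("next-exit sample  ", mean_selected_duration(select_by_entry=False))  # over-represents long visits
print("next-enter sample ", mean_selected_duration(select_by_entry=True))   # approximately unbiased
```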
Asteroid orbital inversion using uniform phase-space sampling
NASA Astrophysics Data System (ADS)
Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.
2014-07-01
We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. 
On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
ERIC Educational Resources Information Center
Kariuki, Patrick N. K.; Bush, Elizabeth Danielle
2008-01-01
The purpose of this study was to examine the effects of Total Physical Response by Storytelling and the traditional teaching method on a foreign language in a selected high school. The sample consisted of 30 students who were randomly selected and randomly assigned to experimental and control group. The experimental group was taught using Total…
[Use of blood lead data to evaluate and prevent childhood lead poisoning in Latin America].
Romieu, Isabelle
2003-01-01
Exposure to lead is a widespread and serious threat to the health of children in Latin America. Health officials should monitor sources of exposure and health outcomes to design, implement, and evaluate prevention and control activities. To evaluate the magnitude of lead as a public health problem, three key elements must be defined: 1) the potential sources of exposure, 2) the indicators to evaluate health effects and environmental exposure, and 3) the sampling methods for the population at risk. Several strategies can be used to select the study population depending on the study objectives, the time limitations, and the available resources. If the objective is to evaluate the magnitude and sources of the problem, the following sampling methods can be used: 1) population-based random sampling; 2) facility-based random sampling within hospitals, daycare centers, or schools; 3) target sampling of high risk groups; 4) convenience sampling of volunteers; and 5) case reporting (which can lead to the identification of populations at risk and sources of exposures). For all sampling methods, information gathering should include the use of a questionnaire to collect general information on the participants and on potential local sources of exposure, as well as the collection of biological samples. In interpreting data, one should consider the type of sampling used and the non-response rates, as well as factors that might influence blood lead measurements, such as age and seasonal variability. Blood lead measurements should be integrated in an overall strategy to prevent lead toxicity in children. The English version of this paper is available at: http://www.insp.mx/salud/index.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.
2013-04-27
This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating the number of samples required to achieve a specified confidence in characterization and clearance decisions, and the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically, if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0; 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm2) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results, 2. estimating the true surface concentration corresponding to a surface sample, and 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.
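The clearance statement quoted above (X% confidence that at least Y% of the decision area contains no detectable contamination, given all-negative results) can be illustrated with the textbook acceptance-sampling calculation below; this is a generic simple-random-sampling approximation with FNR = 0, not necessarily the exact CJR formula of the report.

```python
from math import ceil, log

def n_required(confidence=0.95, clean_fraction=0.99):
    """Number of all-negative random samples needed to state, with the given confidence,
    that at least `clean_fraction` of the area has no detectable contamination
    (assumes simple random sampling and a false negative rate of 0)."""
    return ceil(log(1.0 - confidence) / log(clean_fraction))

def achieved_confidence(n, clean_fraction=0.99):
    """Confidence achieved by n all-negative samples for the same statement."""
    return 1.0 - clean_fraction ** n

print(n_required(0.95, 0.99))          # about 299 samples
print(achieved_confidence(100, 0.99))  # about 0.63
```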
A comparison of two sampling designs for fish assemblage assessment in a large river
Kiraly, Ian A.; Coghlan, Stephen M.; Zydlewski, Joseph D.; Hayes, Daniel
2014-01-01
We compared the efficiency of stratified random and fixed-station sampling designs to characterize fish assemblages in anticipation of dam removal on the Penobscot River, the largest river in Maine. We used boat electrofishing methods in both sampling designs. Multiple 500-m transects were selected randomly and electrofished in each of nine strata within the stratified random sampling design. Within the fixed-station design, up to 11 transects (1,000 m) were electrofished, all of which had been sampled previously. In total, 88 km of shoreline were electrofished during summer and fall in 2010 and 2011, and 45,874 individuals of 34 fish species were captured. Species-accumulation and dissimilarity curve analyses indicated that all sampling effort, other than fall 2011 under the fixed-station design, provided repeatable estimates of total species richness and proportional abundances. Overall, our sampling designs were similar in precision and efficiency for sampling fish assemblages. The fixed-station design was negatively biased for estimating the abundance of species such as Common Shiner Luxilus cornutus and Fallfish Semotilus corporalis and was positively biased for estimating biomass for species such as White Sucker Catostomus commersonii and Atlantic Salmon Salmo salar. However, we found no significant differences between the designs for proportional catch and biomass per unit effort, except in fall 2011. The difference observed in fall 2011 was due to limitations on the number and location of fixed sites that could be sampled, rather than an inherent bias within the design. Given the results from sampling in the Penobscot River, application of the stratified random design is preferable to the fixed-station design due to less potential for bias caused by varying sampling effort, such as what occurred in the fall 2011 fixed-station sample or due to purposeful site selection.
Erlandsson, Lena; Rosenstierne, Maiken W.; McLoughlin, Kevin; Jaing, Crystal; Fomsgaard, Anders
2011-01-01
A common technique used for sensitive and specific diagnostic virus detection in clinical samples is PCR that can identify one or several viruses in one assay. However, a diagnostic microarray containing probes for all human pathogens could replace hundreds of individual PCR-reactions and remove the need for a clear clinical hypothesis regarding a suspected pathogen. We have established such a diagnostic platform for random amplification and subsequent microarray identification of viral pathogens in clinical samples. We show that Phi29 polymerase-amplification of a diverse set of clinical samples generates enough viral material for successful identification by the Microbial Detection Array, demonstrating the potential of the microarray technique for broad-spectrum pathogen detection. We conclude that this method detects both DNA and RNA virus, present in the same sample, as well as differentiates between different virus subtypes. We propose this assay for diagnostic analysis of viruses in clinical samples. PMID:21853040
NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel
2017-08-01
Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.
Unsupervised learning on scientific ocean drilling datasets from the South China Sea
NASA Astrophysics Data System (ADS)
Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.
2018-06-01
Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
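A condensed scikit-learn sketch of the workflow described, clustering standardized multivariate measurements without labels and then validating the clusters externally against an existing classification; the synthetic blobs stand in for the core-log data and lithologic units.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for multivariate geophysical logs with four "lithologic units".
X, units = make_blobs(n_samples=600, centers=4, n_features=5, cluster_std=2.0, random_state=0)

X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# External validation of the unsupervised clusters against the existing classification.
print("adjusted Rand index:", adjusted_rand_score(units, labels))
```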
A multispectral photon-counting double random phase encoding scheme for image authentication.
Yi, Faliu; Moon, Inkyu; Lee, Yeon H
2014-05-20
In this paper, we propose a new method for color image-based authentication that combines multispectral photon-counting imaging (MPCI) and double random phase encoding (DRPE) schemes. The sparsely distributed information from MPCI and the stationary white noise signal from DRPE make intruder attacks difficult. In this authentication method, the original multispectral RGB color image is down-sampled into a Bayer image. The three types of color samples (red, green and blue color) in the Bayer image are encrypted with DRPE and the amplitude part of the resulting image is photon counted. The corresponding phase information that has nonzero amplitude after photon counting is then kept for decryption. Experimental results show that the retrieved images from the proposed method do not visually resemble their original counterparts. Nevertheless, the original color image can be efficiently verified with statistical nonlinear correlations. Our experimental results also show that different interpolation algorithms applied to Bayer images result in different verification effects for multispectral RGB color images.
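A compact numpy sketch of the classical double random phase encoding step described above, followed by Poisson photon counting of the encrypted amplitude; the single random channel, the photon budget, and the omission of the Bayer down-sampling and interpolation steps are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.uniform(size=(64, 64))                      # stand-in for one Bayer colour channel

# Two statistically independent random phase masks (input plane and Fourier plane).
phase1 = np.exp(2j * np.pi * rng.uniform(size=img.shape))
phase2 = np.exp(2j * np.pi * rng.uniform(size=img.shape))

# Double random phase encoding.
encrypted = np.fft.ifft2(np.fft.fft2(img * phase1) * phase2)

# Photon counting on the normalised amplitude: keep phase only where photons arrive.
n_photons = 2000
prob = np.abs(encrypted) / np.abs(encrypted).sum()
counts = rng.poisson(n_photons * prob)
sparse = np.where(counts > 0, np.exp(1j * np.angle(encrypted)), 0)

# Decryption of the photon-limited field (verification would then use nonlinear correlation).
decrypted = np.fft.ifft2(np.fft.fft2(sparse) * np.conj(phase2)) * np.conj(phase1)
print(np.abs(decrypted).max())
```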
[Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].
Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna
2008-01-01
The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but there is a difficulty deriving from the lack of a sampling frame. However, the sample obtained is not a random representative one in statistical terms of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates to the target population but is not identical to it.
Polynomial chaos representation of databases on manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2017-04-15
Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. The method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.
Comparison of Methods for Estimating Low Flow Characteristics of Streams
Tasker, Gary D.
1987-01-01
Four methods for estimating the 7-day, 10-year and 7-day, 20-year low flows for streams are compared by the bootstrap method. The bootstrap method is a Monte Carlo technique in which random samples are drawn from an unspecified sampling distribution defined from observed data. The nonparametric nature of the bootstrap makes it suitable for comparing methods based on a flow series for which the true distribution is unknown. Results show that the two methods based on hypothetical distribution (Log-Pearson III and Weibull) had lower mean square errors than did the G. E. P. Box-D. R. Cox transformation method or the Log-W. C. Boughton method which is based on a fit of plotting positions.
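A minimal sketch of the bootstrap comparison idea, resampling an observed low-flow series and comparing the sampling variability of two estimators of the 0.1-quantile low flow; the synthetic series and the log-normal fit below stand in for the actual streamflow records and the Log-Pearson III method.

```python
import numpy as np

rng = np.random.default_rng(6)
observed = rng.lognormal(mean=1.0, sigma=0.5, size=40)      # stand-in 7-day annual low flows

def lognormal_q10(x):
    """Parametric estimate of the 0.1 quantile from a log-normal fit (z_0.10 = -1.2816)."""
    logs = np.log(x)
    return np.exp(logs.mean() - 1.2816 * logs.std(ddof=1))

def empirical_q10(x):
    """Non-parametric (plotting-position style) estimate of the 0.1 quantile."""
    return np.quantile(x, 0.1)

# Draw bootstrap samples from the observed series and apply both estimators to each.
boot = np.array([
    [lognormal_q10(s), empirical_q10(s)]
    for s in (rng.choice(observed, size=observed.size, replace=True) for _ in range(2000))
])
print("bootstrap std dev, log-normal fit:", boot[:, 0].std())
print("bootstrap std dev, empirical     :", boot[:, 1].std())
```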
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Multiple Comparisons of Observation Means--Are the Means Significantly Different?
ERIC Educational Resources Information Center
Fahidy, T. Z.
2009-01-01
Several currently popular methods of ascertaining which treatment (population) means are different, via random samples obtained under each treatment, are briefly described and illustrated by evaluating catalyst performance in a chemical reactor.
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
Management of low-grade cervical abnormalities detected at screening: which method do women prefer?
Whynes, D K; Woolley, C; Philips, Z
2008-12-01
To establish whether women with low-grade abnormalities detected during screening for cervical cancer prefer to be managed by cytological surveillance or by immediate colposcopy. TOMBOLA (Trial of Management of Borderline and Other Low-grade Abnormal smears) is a randomized controlled trial comparing alternative management strategies following the screen-detection of low-grade cytological abnormalities. At exit, a sample of TOMBOLA women completed a questionnaire eliciting opinions on their management, contingent valuations (CV) of the management methods and preferences. Within-trial quality of life (EQ-5D) data collected for a sample of TOMBOLA women throughout their follow-up enabled the comparison of self-reported health at various time points, by management method. Once management had been initiated, self-reported health in the colposcopy arm rose relative to that in the surveillance arm, although the effect was short-term only. For the majority of women, the satisfaction ratings and the CV indicated approval of the management method to which they had been randomized. Of the minority manifesting a preference for the method which they had not experienced, relatively more would have preferred colposcopy than would have preferred surveillance. The findings must be interpreted in the light of sample bias with respect to preferences, whereby enthusiasm for colposcopy was probably over-represented amongst trial participants. The study suggests that neither of the management methods is preferred unequivocally; rather, individual women have individual preferences, although many would be indifferent between methods.
Visell, Yon
2015-04-01
This paper proposes a fast, physically accurate method for synthesizing multimodal, acoustic and haptic, signatures of distributed fracture in quasi-brittle heterogeneous materials, such as wood, granular media, or other fiber composites. Fracture processes in these materials are challenging to simulate with existing methods, due to the prevalence of large numbers of disordered, quasi-random spatial degrees of freedom, representing the complex physical state of a sample over the geometric volume of interest. Here, I develop an algorithm for simulating such processes, building on a class of statistical lattice models of fracture that have been widely investigated in the physics literature. This algorithm is enabled through a recently published mathematical construction based on the inverse transform method of random number sampling. It yields a purely time domain stochastic jump process representing stress fluctuations in the medium. The latter can be readily extended by a mean field approximation that captures the averaged constitutive (stress-strain) behavior of the material. Numerical simulations and interactive examples demonstrate the ability of these algorithms to generate physically plausible acoustic and haptic signatures of fracture in complex, natural materials interactively at audio sampling rates.
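The inverse transform method named above can be illustrated in a few lines, drawing jump magnitudes by passing uniform variates through an inverted CDF; the Weibull form and its parameters are placeholders rather than the paper's lattice-model distribution.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_weibull(n, shape=1.5, scale=1.0):
    """Inverse transform sampling: pass uniform variates through the inverse CDF.

    For a Weibull distribution, F(x) = 1 - exp(-(x/scale)**shape), so
    F^-1(u) = scale * (-ln(1 - u))**(1/shape).
    """
    u = rng.uniform(size=n)
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

# E.g. successive stress-drop magnitudes of a stochastic jump process.
jumps = sample_weibull(10000)
print(jumps.mean())   # compare with scale * gamma(1 + 1/shape), roughly 0.90 here
```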
Ngwakongnwi, Emmanuel; King-Shier, Kathryn M; Hemmelgarn, Brenda R; Musto, Richard; Quan, Hude
2014-01-01
Francophones who live outside the primarily French-speaking province of Quebec, Canada, risk being excluded from research by lack of a sampling frame. We examined the adequacy of random sampling, advertising, and respondent-driven sampling for recruitment of francophones for survey research. We recruited francophones residing in the city of Calgary, Alberta, through advertising and respondent-driven sampling. These 2 samples were then compared with a random subsample of Calgary francophones derived from the 2006 Canadian Community Health Survey (CCHS). We assessed the effectiveness of advertising and respondent-driven sampling in relation to the CCHS sample by comparing demographic characteristics and selected items from the CCHS (specifically self-reported general health status, perceived weight, and having a family doctor). We recruited 120 francophones through advertising and 145 through respondent-driven sampling; the random sample from the CCHS consisted of 259 records. The samples derived from advertising and respondent-driven sampling differed from the CCHS in terms of age (mean ages 41.0, 37.6, and 42.5 years, respectively), sex (proportion of males 26.1%, 40.6%, and 56.6%, respectively), education (college or higher 86.7%, 77.9%, and 59.1%, respectively), place of birth (immigrants accounting for 45.8%, 55.2%, and 3.7%, respectively), and not having a regular medical doctor (16.7%, 34.5%, and 16.6%, respectively). Differences were not tested statistically because of limitations on the analysis of CCHS data imposed by Statistics Canada. The samples generated exclusively through advertising and respondent-driven sampling were not representative of the gold standard sample from the CCHS. Use of such biased samples for research studies could generate misleading results.
Knott, V; Rees, D J; Cheng, Z; Brownlee, G G
1988-01-01
Sets of overlapping cosmid clones generated by random sampling and fingerprinting methods complement data at pyrB (96.5') and oriC (84') in the published physical map of E. coli. A new cloning strategy using sheared DNA, and a low copy, inducible cosmid vector were used in order to reduce bias in libraries, in conjunction with micro-methods for preparing cosmid DNA from a large number of clones. Our results are relevant to the design of the best approach to the physical mapping of large genomes. PMID:2834694
A comparison of fitness-case sampling methods for genetic programming
NASA Astrophysics Data System (ADS)
Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel
2017-11-01
Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
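Of the four schemes compared, Lexicase Selection is the easiest to sketch; the implementation below follows the commonly published description (filter candidates case by case in random order), with a random error matrix standing in for real fitness cases rather than the exact variant benchmarked.

```python
import numpy as np

rng = np.random.default_rng(8)
errors = rng.uniform(size=(50, 30))       # 50 candidate programs x 30 fitness cases

def lexicase_select(errors, rng):
    """Return the index of one parent chosen by lexicase selection."""
    candidates = np.arange(errors.shape[0])
    cases = rng.permutation(errors.shape[1])   # consider fitness cases in random order
    for c in cases:
        best = errors[candidates, c].min()
        candidates = candidates[errors[candidates, c] == best]  # keep only the elite on this case
        if candidates.size == 1:
            break
    return rng.choice(candidates)

parents = [lexicase_select(errors, rng) for _ in range(10)]
print(parents)
```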
Xun-Ping, W; An, Z
2017-07-27
Objective To optimize and simplify the survey method of Oncomelania hupensis snails in marshland endemic regions of schistosomiasis, so as to improve the precision, efficiency and economy of the snail survey. Methods A snail sampling strategy (Spatial Sampling Scenario of Oncomelania based on Plant Abundance, SOPA) which took the plant abundance as auxiliary variable was explored and an experimental study in a 50 m×50 m plot in a marshland in the Poyang Lake region was performed. Firstly, the push broom surveyed data was stratified into 5 layers by the plant abundance data; then, the required numbers of optimal sampling points of each layer through Hammond McCullagh equation were calculated; thirdly, every sample point in the line with the Multiple Directional Interpolation (MDI) placement scheme was pinpointed; and finally, the comparison study among the outcomes of the spatial random sampling strategy, the traditional systematic sampling method, the spatial stratified sampling method, Sandwich spatial sampling and inference and SOPA was performed. Results The method (SOPA) proposed in this study had the minimal absolute error of 0.213 8; and the traditional systematic sampling method had the largest estimate, and the absolute error was 0.924 4. Conclusion The snail sampling strategy (SOPA) proposed in this study obtains the higher estimation accuracy than the other four methods.
Leray, Matthieu; Knowlton, Nancy
2017-01-01
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
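A toy multinomial resampling sketch, not the study's pipeline, showing why OTUs backed by only a handful of template molecules appear inconsistently across technical replicates at a fixed sequencing depth; the abundance profile and read depth are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical "true" relative abundances: a few common taxa plus many rare ones.
abund = np.concatenate([np.full(10, 0.098), np.full(200, 0.0001)])
abund /= abund.sum()

depth = 20000                                  # reads per technical replicate
replicates = rng.multinomial(depth, abund, size=4)

rare = replicates[:, 10:]                      # the 200 rare taxa
detected = (rare > 0).sum(axis=1)              # rare taxa observed in each replicate
shared = (rare > 0).all(axis=0).sum()          # rare taxa observed in *all* replicates
print("rare taxa detected per replicate:", detected)
print("rare taxa detected in every replicate:", shared)
```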
Simulation of design-unbiased point-to-particle sampling compared to alternatives on plantation rows
Thomas B. Lynch; David Hamlin; Mark J. Ducey
2016-01-01
Total quantities of tree attributes can be estimated in plantations by sampling on plantation rows using several methods. At random sample points on a row, either fixed row lengths or variable row lengths with a fixed number of sample trees can be assessed. Ratio of means or mean of ratios estimators can be developed for the fixed number of trees option but are not...
Benefits of Multiple Methods for Evaluating HIV Counseling and Testing Sites in Pennsylvania.
ERIC Educational Resources Information Center
Encandela, John A.; Gehl, Mary Beth; Silvestre, Anthony; Schelzel, George
1999-01-01
Examines results from two methods used to evaluate publicly funded human immunodeficiency virus (HIV) counseling and testing in Pennsylvania. Results of written mail surveys of all sites and interviews from a random sample of 30 sites were similar in terms of questions posed and complementary in other ways. (SLD)
Adolescent Sexuality Related Beliefs and Differences by Sexual Experience Status
ERIC Educational Resources Information Center
Tolma, Eleni L.; Oman, Roy F.; Vesely, Sara K.; Aspy, Cheryl B.; Rodine, Sharon; Marshall, LaDonna; Fluhr, Janene
2007-01-01
Purpose: To examine if attitudes toward premarital sex, beliefs about peer influence, and family communication about sexual relationships differ by sexual experience status. Methods: Data were collected from a randomly selected ethnically diverse youth sample (N = 1,318) residing in two Midwestern cities. The primary method used in data analysis…
Assessment of the Implementation of Continuous Assessment: The Case of METTU University
ERIC Educational Resources Information Center
Walde, Getinet Seifu
2016-01-01
This paper examines the status of the implementation of continuous assessment (CA) in Mettu University. A random stratified sampling method was used to select 309 students and 29 instructors and purposive method used to select quality assurance and faculty Deans. Questionnaires, focus group discussion, interview and documents were used for data…
ERIC Educational Resources Information Center
Tipton, Elizabeth; Fellers, Lauren; Caverly, Sarah; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Ruiz de Castilla, Veronica
2016-01-01
Recently, statisticians have begun developing methods to improve the generalizability of results from large-scale experiments in education. This work has included the development of methods for improved site selection when random sampling is infeasible, including the use of stratification and targeted recruitment strategies. This article provides…
TNO/Centaurs grouping tested with asteroid data sets
NASA Astrophysics Data System (ADS)
Fulchignoni, M.; Birlan, M.; Barucci, M. A.
2001-11-01
Recently, we have discussed the possible subdivision into a few groups of a sample of 22 TNOs and Centaurs for which BVRIJ photometry was available (Barucci et al., 2001, A&A, 371, 1150). We obtained these results using the multivariate statistics adopted to define the current asteroid taxonomy, namely the Principal Components Analysis and the G-mode method (Tholen & Barucci, 1989, in ASTEROIDS II). How do these methods work with a very small statistical sample such as the TNO/Centaurs one? Theoretically, the number of degrees of freedom of the sample is correct. In fact, it is 88 in our case and has to be larger than 50 to cope with the requirements of the G-mode. Does the random sampling of the small number of members of a large population contain enough information to reveal some structure in the population? We extracted several samples of 22 asteroids out of a data-base of 86 objects of known taxonomic type for which BVRIJ photometry is available from ECAS (Zellner et al. 1985, ICARUS 61, 355), SMASS II (S.W. Bus, 1999, PhD Thesis, MIT), and the Bell et al. Atlas of the asteroid infrared spectra. The objects constituting the first sample were selected in order to give a good representation of the major asteroid taxonomic classes (at least three samples per class): C, S, D, A, and G. Both methods were able to distinguish all these groups, confirming the validity of the adopted methods. The S class is hard to individuate as a consequence of the choice of the I and J variables, which imply a lack of information on the absorption band at 1 micron. The other samples were obtained by random choice of the objects. Not all the major groups were well represented (less than three samples per group), but the general trend of the asteroid taxonomy was always obtained. We conclude that the quoted grouping of TNO/Centaurs is representative of some physico-chemical structure of the outer solar system small body population.
Stable and efficient retrospective 4D-MRI using non-uniformly distributed quasi-random numbers
NASA Astrophysics Data System (ADS)
Breuer, Kathrin; Meyer, Cord B.; Breuer, Felix A.; Richter, Anne; Exner, Florian; Weng, Andreas M.; Ströhle, Serge; Polat, Bülent; Jakob, Peter M.; Sauer, Otto A.; Flentje, Michael; Weick, Stefan
2018-04-01
The purpose of this work is the development of a robust and reliable three-dimensional (3D) Cartesian imaging technique for fast and flexible retrospective 4D abdominal MRI during free breathing. To this end, a non-uniform quasi random (NU-QR) reordering of the phase encoding (k_y–k_z) lines was incorporated into 3D Cartesian acquisition. The proposed sampling scheme allocates more phase encoding points near the k-space origin while reducing the sampling density in the outer part of the k-space. Respiratory self-gating in combination with SPIRiT-reconstruction is used for the reconstruction of abdominal data sets in different respiratory phases (4D-MRI). Six volunteers and three patients were examined at 1.5 T during free breathing. Additionally, data sets with conventional two-dimensional (2D) linear and 2D quasi random phase encoding order were acquired for the volunteers for comparison. A quantitative evaluation of image quality versus scan times (from 70 s to 626 s) for the given sampling schemes was obtained by calculating the normalized mutual information (NMI) for all volunteers. Motion estimation was accomplished by calculating the maximum derivative of a signal intensity profile of a transition (e.g. tumor or diaphragm). The 2D non-uniform quasi-random distribution of phase encoding lines in Cartesian 3D MRI yields more efficient undersampling patterns for parallel imaging compared to conventional uniform quasi-random and linear sampling. Median NMI values of NU-QR sampling are the highest for all scan times. Therefore, within the same scan time 4D imaging could be performed with improved image quality. The proposed method allows for the reconstruction of motion-artifact-reduced 4D data sets with isotropic spatial resolution of 2.1 × 2.1 × 2.1 mm³ in a short scan time, e.g. 10 respiratory phases in only 3 min. Cranio-caudal tumor displacements between 23 and 46 mm could be observed. NU-QR sampling enables stable 4D-MRI with high temporal and spatial resolution within a short scan time for visualization of organ or tumor motion during free breathing. Further studies, e.g. the application of the method for radiotherapy planning, are needed to investigate the clinical applicability and diagnostic value of the approach.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of the RFE algorithm. The rationale behind this is that building ensemble SVM models using randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the ranking from multiple SVM models instead of choosing one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average of 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over the random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data. PMID:27304923
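A minimal sketch of the ensemble idea described above: SVM-RFE is run on class-balanced bootstrap samples and the per-model rankings are aggregated. The resampling scheme, aggregation rule, and parameter values are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC
from sklearn.utils import resample

def ensemble_svm_rfe(X, y, n_models=25, n_select=50, seed=0):
    """Aggregate SVM-RFE feature rankings over class-balanced bootstrap samples."""
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()                      # nearly balanced bootstrap size
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_models):
        # bootstrap: draw the same number of samples (with replacement) from each class
        idx = np.concatenate([
            resample(np.where(y == c)[0], replace=True,
                     n_samples=n_per_class, random_state=rng)
            for c in classes
        ])
        rfe = RFE(LinearSVC(C=1.0, max_iter=5000),
                  n_features_to_select=1, step=0.1)
        rfe.fit(X[idx], y[idx])
        rank_sum += rfe.ranking_                    # lower rank = eliminated later = more important
    return np.argsort(rank_sum)[:n_select]          # indices of the top-ranked genes
```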
Selby-Harrington, M; Sorenson, J R; Quade, D; Stearns, S C; Tesh, A S; Donat, P L
1995-01-01
OBJECTIVES. A randomized controlled trial was conducted to test the effectiveness and cost effectiveness of three outreach interventions to promote well-child screening for children on Medicaid. METHODS. In rural North Carolina, a random sample of 2053 families with children due or overdue for screening was stratified according to the presence of a home phone. Families were randomly assigned to receive a mailed pamphlet and letter, a phone call, or a home visit outreach intervention, or the usual (control) method of informing at Medicaid intake. RESULTS. All interventions produced more screenings than the control method, but increases were significant only for families with phones. Among families with phones, a home visit was the most effective intervention but a phone call was the most cost-effective. However, absolute rates of effectiveness were low, and incremental costs per effect were high. CONCLUSIONS. Pamphlets, phone calls, and home visits by nurses were minimally effective for increasing well-child screenings. Alternate outreach methods are needed, especially for families without phones. PMID:7573627
Event-triggered synchronization for reaction-diffusion complex networks via random sampling
NASA Astrophysics Data System (ADS)
Dong, Tao; Wang, Aijuan; Zhu, Huiyun; Liao, Xiaofeng
2018-04-01
In this paper, the synchronization problem of reaction-diffusion complex networks (RDCNs) with Dirichlet boundary conditions is considered, where the data are sampled randomly. An event-triggered controller based on the sampled data is proposed, which can reduce the number of controller updates and the communication load. Under this strategy, the synchronization problem of the diffusion complex network is equivalently converted to the stability problem of a reaction-diffusion complex dynamical system with time delay. By using the matrix inequality technique and the Lyapunov method, synchronization conditions for the RDCNs are derived, which are dependent on the diffusion term. Moreover, it is found that the proposed control strategy naturally avoids Zeno behavior. Finally, a numerical example is given to verify the obtained results.
Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia
LI, CHENGLONG; ZHU, BIAO; CHEN, JIAO; HUANG, XIAOBING
2016-01-01
In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML. PMID:27177049
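For context, the evaluation criteria listed above (sensitivity, specificity, PPV, NPV and the area under the ROC curve) can all be derived from a confusion matrix plus predicted scores. The sketch below is a generic illustration under assumed data and classifier settings, not the authors' pipeline.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def binary_classification_metrics(y_true, y_pred, y_score):
    """Sensitivity, specificity, PPV, NPV and AUC for a binary classifier."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "auc": roc_auc_score(y_true, y_score),
    }

# Hypothetical use with a random forest on the selected feature genes:
# from sklearn.ensemble import RandomForestClassifier
# rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
# metrics = binary_classification_metrics(
#     y_test, rf.predict(X_test), rf.predict_proba(X_test)[:, 1])
```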
Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia.
Li, Chenglong; Zhu, Biao; Chen, Jiao; Huang, Xiaobing
2016-07-01
In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML.
Seroprevalence and risk factors associated with bovine brucellosis in the Potohar Plateau, Pakistan.
Ali, Shahzad; Akhter, Shamim; Neubauer, Heinrich; Melzer, Falk; Khan, Iahtasham; Abatih, Emmanuel Nji; El-Adawy, Hosny; Irfan, Muhammad; Muhammad, Ali; Akbar, Muhammad Waqas; Umar, Sajid; Ali, Qurban; Iqbal, Muhammad Naeem; Mahmood, Abid; Ahmed, Haroon
2017-01-28
The seroprevalence and risk factors of bovine brucellosis were studied at animal and herd level using a combination of culture, serological and molecular methods. The study was conducted in 253 randomly selected cattle herds of the Potohar plateau, Pakistan, from which a total of 2709 serum (1462 cattle and 1247 buffaloes) and 2330 milk (1168 cattle and 1162 buffaloes) samples were collected. Data on risk factors associated with seroprevalence of brucellosis were collected through interviews using questionnaires. Univariable and multivariable random effects logistic regression models were used for identifying important risk factors at animal and herd levels. One hundred and seventy (6.3%) samples and 47 (18.6%) herds were seropositive for brucellosis by the Rose Bengal Plate test. Variations in seroprevalence were observed across the different sampling sites. At animal level, sex, species and stock replacement were found to be potential risk factors for brucellosis. At herd level, herd size (≥9 animals) and insemination method used were important risk factors. The presence of Brucella DNA was confirmed with a real-time polymerase chain reaction assay (qRT-PCR) in 52.4% of the 170 serologically positive samples. In total, 156 (6.7%) milk samples were positive by the milk ring test. B. abortus biovar 1 was cultured from 5 positive milk samples. This study shows that the seroprevalence of bovine brucellosis is high in some regions in Pakistan. Prevalence was associated with herd size, abortion history, insemination methods used, age, sex and stock replacement methods. Infected animals may act as a source of infection for other animals and for humans. The development of control strategies for bovine brucellosis through implementation of continuous surveillance and education programs in Pakistan is warranted.
Caperchione, Cristina M; Duncan, Mitch J; Rosenkranz, Richard R; Vandelanotte, Corneel; Van Itallie, Anetta K; Savage, Trevor N; Hooker, Cindy; Maeder, Anthony J; Mummery, W Kerry; Kolt, Gregory S
2016-04-15
To describe in detail the recruitment methods and enrollment rates, the screening methods, and the baseline characteristics of a sample of adults participating in the Walk 2.0 Study, an 18 month, 3-arm randomized controlled trial of a Web 2.0 based physical activity intervention. A two-fold recruitment plan was developed and implemented, including a direct mail-out to an extract from the Australian Electoral Commission electoral roll, and other supplementary methods including email and telephone. Physical activity screening involved two steps: a validated single-item self-report instrument and the follow-up Active Australia Questionnaire. Readiness for physical activity participation was also based on a two-step process of administering the Physical Activity Readiness Questionnaire and, where needed, further clearance from a medical practitioner. Across all recruitment methods, a total of 1244 participants expressed interest in participating, of which 656 were deemed eligible. Of these, 504 were later enrolled in the Walk 2.0 trial (77% enrollment rate) and randomized to the Walk 1.0 group (n = 165), the Walk 2.0 group (n = 168), or the Logbook group (n = 171). Mean age of the total sample was 50.8 years, with 65.2% female and 79.1% born in Australia. The results of this recruitment process demonstrate the successful use of multiple strategies to obtain a diverse sample of adults eligible to take part in a web-based physical activity promotion intervention. The use of dual screening processes ensured safe participation in the intervention. This approach to recruitment and physical activity screening can be used as a model for further trials in this area.
Methods for estimating the amount of vernal pool habitat in the northeastern United States
Van Meter, R.; Bailey, L.L.; Grant, E.H.C.
2008-01-01
The loss of small, seasonal wetlands is a major concern for a variety of state, local, and federal organizations in the northeastern U.S. Identifying and estimating the number of vernal pools within a given region is critical to developing long-term conservation and management strategies for these unique habitats and their faunal communities. We use three probabilistic sampling methods (simple random sampling, adaptive cluster sampling, and the dual frame method) to estimate the number of vernal pools on protected, forested lands. Overall, these methods yielded similar values of vernal pool abundance for each study area, and suggest that photographic interpretation alone may grossly underestimate the number of vernal pools in forested habitats. We compare the relative efficiency of each method and discuss ways of improving precision. Acknowledging that the objectives of a study or monitoring program ultimately determine which sampling designs are most appropriate, we recommend that some type of probabilistic sampling method be applied. We view the dual-frame method as an especially useful way of combining incomplete remote sensing methods, such as aerial photograph interpretation, with a probabilistic sample of the entire area of interest to provide more robust estimates of the number of vernal pools and a more representative sample of existing vernal pool habitats.
Pipes, W O; Minnigh, H A; Moyer, B; Troy, M A
1986-01-01
A total of 2,601 water samples from six different water systems were tested for coliform bacteria by Clark's presence-absence (P-A) test and by the membrane filter (MF) method. There was no significant difference in the fraction of samples positive for coliform bacteria for any of the systems tested. It was concluded that the two tests are equivalent for monitoring purposes. However, 152 samples were positive for coliform bacteria by the MF method but negative by the P-A test, and 132 samples were positive by the P-A test but negative by the MF method. Many of these differences for individual samples can be explained by random dispersion of bacteria in subsamples when the coliform density is low. However, 15 samples had MF counts greater than 3 and gave negative P-A results. The only apparent explanation for most of these results is that coliform bacteria were present in the P-A test bottles but did not produce acid and gas. Two other studies have reported more samples positive by Clark's P-A test than by the MF method. PMID:3532953
Different hunting strategies select for different weights in red deer
Martínez, María; Rodríguez-Vigal, Carlos; Jones, Owen R; Coulson, Tim; Miguel, Alfonso San
2005-01-01
Much insight can be derived from records of shot animals. Most researchers using such data assume that their data represents a random sample of a particular demographic class. However, hunters typically select a non-random subset of the population and hunting is, therefore, not a random process. Here, with red deer (Cervus elaphus) hunting data from a ranch in Toledo, Spain, we demonstrate that data collection methods have a significant influence upon the apparent relationship between age and weight. We argue that a failure to correct for such methodological bias may have significant consequences for the interpretation of analyses involving weight or correlated traits such as breeding success, and urge researchers to explore methods to identify and correct for such bias in their data. PMID:17148205
Sengüven, Burcu; Baris, Emre; Oygur, Tulin; Berktas, Mehmet
2014-01-01
We discuss a protocol involving xylene-ethanol deparaffinization on slides followed by a kit-based extraction that allows for the extraction of high-quality DNA from FFPE tissues. DNA was extracted from the FFPE tissues of 16 randomly selected blocks. Methods involving deparaffinization on slides or in tubes, enzyme digestion overnight or for 72 hours, and isolation using the phenol-chloroform method or a silica-based commercial kit were compared in terms of yield, concentration and amplifiability. The highest yield of DNA was produced from the samples that were deparaffinized on slides, digested for 72 hours and isolated with a commercial kit. Samples isolated with the phenol-chloroform method produced DNA of lower purity than the samples that were purified with the kit. The samples isolated with the commercial kit resulted in better PCR amplification. Silica-based commercial kits and deparaffinization on slides should be considered for DNA extraction from FFPE tissues.
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
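A rough numpy sketch of leverage-score feature sampling in the spirit of the method described above: scores are computed from the top-k right singular vectors of the data matrix and features are sampled (and rescaled) with probabilities proportional to those scores. The rank k, the number of sampled features, and the rescaling convention are assumptions of this sketch, not the authors' exact construction.

```python
import numpy as np

def leverage_score_feature_sampling(X, k, r, seed=0):
    """Sample r features of X with probabilities given by rank-k leverage scores.

    X : (n_samples, n_features) data matrix.
    k : rank used for the leverage scores (top-k right singular vectors).
    r : number of features to sample (with replacement).
    Returns the sampled feature indices and the rescaled sampled feature matrix.
    """
    rng = np.random.default_rng(seed)
    # rows of Vt are the right singular vectors of X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)        # leverage score of each feature
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[1], size=r, replace=True, p=probs)
    # rescale the sampled columns by 1/sqrt(r * p_j), the usual unbiasing convention
    X_sampled = X[:, idx] / np.sqrt(r * probs[idx])
    return idx, X_sampled
```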
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen, C; Chen, L; Jia, X
2016-06-15
Purpose: Reducing x-ray exposure and speeding up data acquisition have motivated studies on projection data undersampling. An important question is, for a given undersampling ratio, what the optimal undersampling approach is. In this study, we propose a new undersampling scheme: random-ray undersampling. We mathematically analyze its projection matrix properties and demonstrate its advantages. We also propose a new reconstruction method that simultaneously performs CT image reconstruction and projection domain data restoration. Methods: By representing the projection operator under the basis of singular vectors of the full projection operator, matrix representations for an undersampling case can be generated and numerical singular value decomposition can be performed. We compared properties of matrices among three undersampling approaches: regular-view undersampling, regular-ray undersampling, and the proposed random-ray undersampling. To accomplish CT reconstruction for random undersampling, we developed a novel method that iteratively performs CT reconstruction and missing projection data restoration via regularization approaches. Results: For a given undersampling ratio, random-ray undersampling preserved the mathematical properties of the full projection operator better than the other two approaches. This translates into the advantage of reconstructing CT images at lower errors. Different types of image artifacts were observed depending on the undersampling strategy, which were ascribed to the unique singular vectors of the sampling operators in the image domain. We tested the proposed reconstruction algorithm on the FORBILD phantom with only 30% of the projection data randomly acquired. Reconstructed image error was reduced from 9.4% with a TV method to 7.6% with the proposed method. Conclusion: The proposed random-ray undersampling is mathematically advantageous over other typical undersampling approaches. It may permit better image reconstruction at the same undersampling ratio. The novel algorithm suitable for this random-ray undersampling was able to reconstruct high-quality images.
Nevo, Daniel; Nishihara, Reiko; Ogino, Shuji; Wang, Molin
2017-08-04
In the analysis of time-to-event data with multiple causes using a competing risks Cox model, often the cause of failure is unknown for some of the cases. The probability of a missing cause is typically assumed to be independent of the cause given the time of the event and covariates measured before the event occurred. In practice, however, the underlying missing-at-random assumption does not necessarily hold. Motivated by colorectal cancer molecular pathological epidemiology analysis, we develop a method to conduct valid analysis when additional auxiliary variables are available for cases only. We consider a weaker missing-at-random assumption, with missing pattern depending on the observed quantities, which include the auxiliary covariates. We use an informative likelihood approach that will yield consistent estimates even when the underlying model for missing cause of failure is misspecified. The superiority of our method over naive methods in finite samples is demonstrated by simulation study results. We illustrate the use of our method in an analysis of colorectal cancer data from the Nurses' Health Study cohort, where, apparently, the traditional missing-at-random assumption fails to hold.
NASA Astrophysics Data System (ADS)
Hubert, Maxime; Pacureanu, Alexandra; Guilloud, Cyril; Yang, Yang; da Silva, Julio C.; Laurencin, Jerome; Lefebvre-Joud, Florence; Cloetens, Peter
2018-05-01
In X-ray tomography, ring-shaped artifacts present in the reconstructed slices are an inherent problem degrading the global image quality and hindering the extraction of quantitative information. To overcome this issue, we propose a strategy for suppression of ring artifacts originating from the coherent mixing of the incident wave and the object. We discuss the limits of validity of the empty beam correction in the framework of a simple formalism. We then deduce a correction method based on two-dimensional random sample displacement, with minimal cost in terms of spatial resolution, acquisition, and processing time. The method is demonstrated on bone tissue and on a hydrogen electrode of a ceramic-metallic solid oxide cell. Compared to the standard empty beam correction, we obtain high quality nanotomography images revealing detailed object features. The resulting absence of artifacts allows straightforward segmentation and posterior quantification of the data.
ERIC Educational Resources Information Center
Mapuranga, Barbra; Dumba, Oswald; Musodza, Blessing
2015-01-01
This study investigated the impact of Inclusive Education (IE) on the rights of children with Intellectual Disabilities in schools around Chegutu. The qualitative case study method was used for the research. Questionnaires and interviews were used to collect data from schools around Chegutu. Random sampling was used to choose the sample group from…
ERIC Educational Resources Information Center
Pope, Lizzy; Wolf, Randi L.
2012-01-01
Objective: This pilot study examined whether informing children of the presence of vegetables in select snack food items alters taste preference. Methods: A random sample of 68 elementary and middle school children tasted identical pairs of 3 snack food items containing vegetables. In each pair, 1 sample's label included the food's vegetable (eg,…
ERIC Educational Resources Information Center
Andrade, Naleen N.; Hishinuma, Earl S.; McDermott, John F., Jr.; Johnson, Ronald C.; Goebert, Deborah A.; Makini, George K., Jr.; Nahulu, Linda B.; Yuen, Noelle Y. C.; McArdle, John J.; Bell, Cathy K.; Carlton, Barry S.; Miyamoto, Robin H.; Nishimura, Stephanie T.; Else, Iwalani R. N.; Guerrero, Anthony P. S.; Darmal, Arsalan; Yates, Alayne; Waldron, Jane A.
2006-01-01
Objectives: The prevalence rates of disorders among a community-based sample of Hawaiian youths were determined and compared to previously published epidemiological studies. Method: Using a two-phase design, 7,317 adolescents were surveyed (60% participation rate), from which 619 were selected in a modified random sample during the 1992-1993 to…
Oral Health among Preschool Children with Autism Spectrum Disorders: A Case-Control Study
ERIC Educational Resources Information Center
Du, Rennan Y; Yiu, Cynthia K. Y.; King, Nigel M.; Wong, Virginia C. N.; McGrath, Colman P. J.
2015-01-01
Aim: To assess and compare the oral health status of preschool children with and without autism spectrum disorders. Methods: A random sample of 347 preschool children with autism spectrum disorder was recruited from 19 Special Child Care Centres in Hong Kong. An age- and gender-matched sample was recruited from mainstream preschools as the control…
A Validation Study of the Revised Personal Safety Decision Scale
ERIC Educational Resources Information Center
Kim, HaeJung; Hopkins, Karen M.
2017-01-01
Objective: The purpose of this study is to examine the reliability and validity of an 11-item Personal Safety Decision Scale (PSDS) in a sample of child welfare workers. Methods: Data were derived from a larger cross-sectional online survey to a random stratified sample of 477 public child welfare workers in a mid-Atlantic State. An exploratory…
ERIC Educational Resources Information Center
Dillard, James Price; Spear, Margaret E.
2010-01-01
Objective: To assess knowledge of human papillomavirus (HPV) and perceived barriers to being vaccinated against the virus. Participants: Three hundred ninety-six undergraduate women enrolled at Penn State University in Fall 2008. Methods: A random sample of students were invited to participate in a Web-based survey. Results: Awareness of HPV and…
ERIC Educational Resources Information Center
Naji Qasem, Mamun Ali; Ahmad Gul, Showkeen Bilal
2014-01-01
The study was conducted to examine the effect of item direction (positive or negative) on the factorial construction and criterion-related validity of a Likert scale. The descriptive survey research method was used for the study, and the sample consisted of 510 undergraduate students selected by using a random sampling technique. A scale developed by…
ERIC Educational Resources Information Center
Green, Samuel B.; Thompson, Marilyn S.; Levy, Roy; Lo, Wen-Juo
2015-01-01
Traditional parallel analysis (T-PA) estimates the number of factors by sequentially comparing sample eigenvalues with eigenvalues for randomly generated data. Revised parallel analysis (R-PA) sequentially compares the "k"th eigenvalue for sample data to the "k"th eigenvalue for generated data sets, conditioned on "k"-…
ERIC Educational Resources Information Center
Jared, Nzabonimpa Buregeya
2011-01-01
The study examined the influence of Secondary School Head Teachers' General and Instructional Supervisory Practices on Teachers' Work Performance. Quantitative and qualitative methods with a descriptive-correlational research approach were used in the study. A purposive sampling technique alongside a random sampling technique was used to select the…
ERIC Educational Resources Information Center
Nam, Yunju; Mason, Lisa Reyes; Kim, Youngmi; Clancy, Margaret; Sherraden, Michael
2013-01-01
This study examined whether and how survey response differs by race and Hispanic origin, using data from birth certificates and survey administrative data for a large-scale statewide experiment. The sample consisted of mothers of infants selected from Oklahoma birth certificates using a stratified random sampling method (N = 7,111). This study…
ERIC Educational Resources Information Center
Shahbazzadeh, Somayeh; Beliad, Mohammad Reza
2017-01-01
This study investigates the mediatory role of exercise self-regulation in the relationship between personality traits and anger management among athletes. The statistical population of this study includes all athlete students of Shar-e Ghods College, among which 260 people were selected as a sample using a random sampling method. In addition, the…
Paul F. Hessburg; Bradley G. Smith; Scott D. Kreiter; Craig A. Miller; Cecilia H. McNicoll; Michele. Wasienko-Holland
2000-01-01
In the interior Columbia River basin midscale ecological assessment, we mapped and characterized historical and current vegetation composition and structure of 337 randomly sampled subwatersheds (9500 ha average size) in 43 subbasins (404 000 ha average size). We compared landscape patterns, vegetation structure and composition, and landscape vulnerability to wildfires...
Visualizing Time-Varying Distribution Data in EOS Application
NASA Technical Reports Server (NTRS)
Shen, Han-Wei
2004-01-01
In this research, we have developed several novel visualization methods for spatial probability density function data. Our focus has been on 2D spatial datasets, where each pixel is a random variable and has multiple samples which are the results of experiments on that random variable. We developed novel clustering algorithms as a means to reduce the information contained in these datasets and investigated different ways of interpreting and clustering the data.
Generalized Ensemble Sampling of Enzyme Reaction Free Energy Pathways
Wu, Dongsheng; Fajer, Mikolai I.; Cao, Liaoran; Cheng, Xiaolin; Yang, Wei
2016-01-01
Free energy path sampling plays an essential role in computational understanding of chemical reactions, particularly those occurring in enzymatic environments. Among a variety of molecular dynamics simulation approaches, the generalized ensemble sampling strategy is uniquely attractive for the fact that it not only can enhance the sampling of rare chemical events but also can naturally ensure consistent exploration of environmental degrees of freedom. In this review, we plan to provide a tutorial-like tour on an emerging topic: generalized ensemble sampling of enzyme reaction free energy path. The discussion is largely focused on our own studies, particularly ones based on the metadynamics free energy sampling method and the on-the-path random walk path sampling method. We hope that this mini presentation will provide interested practitioners some meaningful guidance for future algorithm formulation and application study. PMID:27498634
Communication: Multiple atomistic force fields in a single enhanced sampling simulation
NASA Astrophysics Data System (ADS)
Hoang Viet, Man; Derreumaux, Philippe; Nguyen, Phuong H.
2015-07-01
The main concerns of biomolecular dynamics simulations are the convergence of the conformational sampling and the dependence of the results on the force fields. While the first issue can be addressed by employing enhanced sampling techniques such as simulated tempering or replica exchange molecular dynamics, repeating these simulations with different force fields is very time consuming. Here, we propose an automatic method that includes different force fields into a single advanced sampling simulation. Conformational sampling using three all-atom force fields is enhanced by simulated tempering, and by formulating the weight parameters of the simulated tempering method in terms of the energy fluctuations, the system is able to perform a random walk in both temperature and force field spaces. The method is first demonstrated on a 1D system and then validated by the folding of the 10-residue chignolin peptide in explicit water.
A comparative study of clock rate and drift estimation
NASA Technical Reports Server (NTRS)
Breakiron, Lee A.
1994-01-01
Five different methods of drift determination and four different methods of rate determination were compared using months of hourly phase and frequency data from a sample of cesium clocks and active hydrogen masers. Linear least squares on frequency is selected as the optimal method of determining both drift and rate, more on the basis of parameter parsimony and confidence measures than on random and systematic errors.
The adjusting factor method for weight-scaling truckloads of mixed hardwood sawlogs
Edward L. Adams
1976-01-01
A new method of weight-scaling truckloads of mixed hardwood sawlogs systematically adjusts for changes in the weight/volume ratio of logs coming into a sawmill. It uses a conversion factor based on the running average of weight/volume ratios of randomly selected sample loads. A test of the method indicated that over a period of time the weight-scaled volume should...
Methods of Reverberation Mapping. I. Time-lag Determination by Measures of Randomness
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chelouche, Doron; Pozo-Nuñez, Francisco; Zucker, Shay, E-mail: doron@sci.haifa.ac.il, E-mail: francisco.pozon@gmail.com, E-mail: shayz@post.tau.ac.il
A class of methods for measuring time delays between astronomical time series is introduced in the context of quasar reverberation mapping, which is based on measures of randomness or complexity of the data. Several distinct statistical estimators are considered that do not rely on polynomial interpolations of the light curves nor on their stochastic modeling, and do not require binning in correlation space. Methods based on von Neumann’s mean-square successive-difference estimator are found to be superior to those using other estimators. An optimized von Neumann scheme is formulated, which better handles sparsely sampled data and outperforms current implementations of discrete correlation function methods. This scheme is applied to existing reverberation data of varying quality, and consistency with previously reported time delays is found. In particular, the size–luminosity relation of the broad-line region in quasars is recovered with a scatter comparable to that obtained by other works, yet with fewer assumptions made concerning the process underlying the variability. The proposed method for time-lag determination is particularly relevant for irregularly sampled time series, and in cases where the process underlying the variability cannot be adequately modeled.
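As a rough illustration of a von Neumann (mean-square successive difference) time-lag estimator in the spirit of the method above: for each trial lag the two light curves are merged, ordered in time, and the lag minimizing the estimator is returned. The z-scoring of fluxes and the absence of error weighting are simplifying assumptions of this sketch, not the optimized scheme of the paper.

```python
import numpy as np

def von_neumann_lag(t1, f1, t2, f2, lags):
    """Return the trial lag that minimizes the von Neumann estimator of the
    merged, time-ordered light curve (the second curve is shifted by the lag)."""
    # z-score both curves so they can be merged on a common flux scale (an assumption)
    z1 = (f1 - np.mean(f1)) / np.std(f1)
    z2 = (f2 - np.mean(f2)) / np.std(f2)
    best_lag, best_stat = None, np.inf
    for lag in lags:
        t = np.concatenate([t1, t2 - lag])
        f = np.concatenate([z1, z2])[np.argsort(t)]
        stat = np.mean(np.diff(f) ** 2)       # mean-square successive difference
        if stat < best_stat:
            best_lag, best_stat = lag, stat
    return best_lag, best_stat

# e.g. best_lag, _ = von_neumann_lag(t_cont, f_cont, t_line, f_line, np.arange(0, 100, 0.5))
```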
Ghodrati, Sajjad; Kandi, Saeideh Gorji; Mohseni, Mohsen
2018-06-01
In recent years, various surface roughness measurement methods have been proposed as alternatives to the commonly used stylus profilometry, which is a low-speed, destructive, expensive but precise method. In this study, a novel method, called "image profilometry," has been introduced for nondestructive, fast, and low-cost surface roughness measurement of randomly rough metallic samples based on image processing and machine vision. The impacts of influential parameters such as image resolution and filtering approach for elimination of the long wavelength surface undulations on the accuracy of the image profilometry results have been comprehensively investigated. Ten surface roughness parameters were measured for the samples using both the stylus and image profilometry. Based on the results, the best image resolution was 800 dpi, and the most practical filtering method was Gaussian convolution+cutoff. In these conditions, the best and worst correlation coefficients (R 2 ) between the stylus and image profilometry results were 0.9892 and 0.9313, respectively. Our results indicated that the image profilometry predicted the stylus profilometry results with high accuracy. Consequently, it could be a viable alternative to the stylus profilometry, particularly in online applications.
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize the predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Exact tests using two correlated binomial variables in contemporary cancer clinical trials.
Yu, Jihnhee; Kepner, James L; Iyer, Renuka
2009-12-01
New therapy strategies for the treatment of cancer are rapidly emerging because of recent technology advances in genetics and molecular biology. Although newer targeted therapies can improve survival without measurable changes in tumor size, clinical trial conduct has remained nearly unchanged. When potentially efficacious therapies are tested, current clinical trial design and analysis methods may not be suitable for detecting therapeutic effects. We propose an exact method for testing cytostatic cancer treatments using correlated bivariate binomial random variables to simultaneously assess two primary outcomes. The method is easy to implement. It does not increase the sample size over that of the univariate exact test and in most cases reduces the sample size required. Sample size calculations are provided for selected designs.
Hsu, Jia-Lien; Hung, Ping-Cheng; Lin, Hung-Yen; Hsieh, Chung-Ho
2015-04-01
Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening could significantly reduce mortality from breast cancer. However, most screening methods may consume a large amount of resources. We propose a computational model, which is based solely on personal health information, for breast cancer risk assessment. Our model can serve as a pre-screening program in a low-cost setting. In our study, the data set, consisting of 3976 records, was collected from Taipei City Hospital from 2008.1.1 to 2008.12.31. Based on the dataset, we first apply sampling techniques and a dimension reduction method to preprocess the testing data. Then, we construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier is able to achieve a recall (sensitivity) of 100%. At a recall of 100%, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with the random forest classifier were 2.9% and 14.87%, respectively. In our study, we build a breast cancer risk assessment model using data mining techniques. Our model has the potential to serve as an assisting tool in breast cancer screening.
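One common way to make a random forest cost-sensitive is to weight the positive class so that false negatives are penalized much more heavily than false positives, which pushes the model toward high recall. The sketch below uses scikit-learn's class_weight for this purpose; the weight value, features, and split are assumptions of this illustration, not the authors' model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

def fit_cost_sensitive_rf(X, y, fn_cost=20, seed=0):
    """Random forest where the positive (cancer) class is weighted `fn_cost`
    times the negative class, so missing a positive case is costly."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=seed)
    rf = RandomForestClassifier(
        n_estimators=500,
        class_weight={0: 1, 1: fn_cost},   # misclassifying a positive is far costlier
        random_state=seed,
    )
    rf.fit(X_tr, y_tr)
    pred = rf.predict(X_te)
    return rf, recall_score(y_te, pred), precision_score(y_te, pred)

# rf, recall, precision = fit_cost_sensitive_rf(X, y)   # X, y: hypothetical health records
```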
Sampling pig farms at the abattoir in a cross-sectional study - Evaluation of a sampling method.
Birkegård, Anna Camilla; Halasa, Tariq; Toft, Nils
2017-09-15
A cross-sectional study design is relatively inexpensive, fast and easy to conduct when compared to other study designs. Careful planning is essential to obtaining a representative sample of the population, and the recommended approach is to use simple random sampling from an exhaustive list of units in the target population. This approach is rarely feasible in practice, and other sampling procedures must often be adopted. For example, when slaughter pigs are the target population, sampling the pigs on the slaughter line may be an alternative to on-site sampling at a list of farms. However, it is difficult to sample a large number of farms from an exact predefined list, due to the logistics and workflow of an abattoir. Therefore, it is necessary to have a systematic sampling procedure and to evaluate the obtained sample with respect to the study objective. We propose a method for 1) planning, 2) conducting, and 3) evaluating the representativeness and reproducibility of a cross-sectional study when simple random sampling is not possible. We used an example of a cross-sectional study with the aim of quantifying the association of antimicrobial resistance and antimicrobial consumption in Danish slaughter pigs. It was not possible to visit farms within the designated timeframe. Therefore, it was decided to use convenience sampling at the abattoir. Our approach was carried out in three steps: 1) planning: using data from meat inspection to plan at which abattoirs and how many farms to sample; 2) conducting: sampling was carried out at five abattoirs; 3) evaluation: representativeness was evaluated by comparing sampled and non-sampled farms, and the reproducibility of the study was assessed through simulated sampling based on meat inspection data from the period where the actual data collection was carried out. In the cross-sectional study samples were taken from 681 Danish pig farms, during five weeks from February to March 2015. The evaluation showed that the sampling procedure was reproducible with results comparable to the collected sample. However, the sampling procedure favoured sampling of large farms. Furthermore, both under-sampled and over-sampled areas were found using scan statistics. In conclusion, sampling conducted at abattoirs can provide a spatially representative sample. Hence it is a possible cost-effective alternative to simple random sampling. However, it is important to assess the properties of the resulting sample so that any potential selection bias can be addressed when reporting the findings. Copyright © 2017 Elsevier B.V. All rights reserved.
Nonuniform sampling theorems for random signals in the linear canonical transform domain
NASA Astrophysics Data System (ADS)
Shuiqing, Xu; Congmei, Jiang; Yi, Chai; Youqiang, Hu; Lei, Huang
2018-06-01
Nonuniform sampling can be encountered in various practical processes because of random events or a poor timebase. The analysis and applications of nonuniform sampling for deterministic signals related to the linear canonical transform (LCT) have been well studied, but up to now no papers have been published regarding nonuniform sampling theorems for random signals related to the LCT. The aim of this article is to explore the nonuniform sampling and reconstruction of random signals associated with the LCT. First, some special nonuniform sampling models are briefly introduced. Second, based on these models, reconstruction theorems for random signals from various nonuniform samples associated with the LCT are derived. Finally, simulation results are presented to verify the accuracy of the sampling theorems. In addition, potential practical applications of nonuniform sampling for random signals are also discussed.
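For reference, the linear canonical transform with parameter matrix (a, b; c, d), ad − bc = 1, is commonly defined as below (the standard textbook form, not quoted from the article):

```latex
F_A(u) \;=\;
\begin{cases}
  \sqrt{\dfrac{1}{j\,2\pi b}}\,\displaystyle\int_{-\infty}^{\infty}
    f(t)\,\exp\!\left[\dfrac{j}{2b}\left(a t^{2}-2tu+d u^{2}\right)\right]dt, & b \neq 0,\\[2ex]
  \sqrt{d}\,\exp\!\left(\dfrac{j\,c d}{2}\,u^{2}\right) f(d\,u), & b = 0,
\end{cases}
\qquad ad-bc=1 .
```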
Experiential sampling in the study of multiple personality disorder.
Loewenstein, R J; Hamilton, J; Alagna, S; Reid, N; deVries, M
1987-01-01
The authors describe the application of experiential sampling, a new time-sampling method, to the assessment of rapid state changes in a woman with multiple personality disorder. She was signaled at random intervals during study periods and asked to provide information on alternate personality switches, amnesia, and mood state. The alternates displayed some characteristics that were as different as those occurring between separate individuals studied previously with this method. There were notable discrepancies between the self-report study data and information reported during therapy hours. The authors conclude that the phenomenology of multiple personality disorder is frequently more complex than is suspected early in the course of treatment.
Hoffman, Steven J; Justicz, Victoria
2016-07-01
To develop and validate a method for automatically quantifying the scientific quality and sensationalism of individual news records. After retrieving 163,433 news records mentioning the Severe Acute Respiratory Syndrome (SARS) and H1N1 pandemics, a maximum entropy model for inductive machine learning was used to identify relationships among 500 randomly sampled news records that correlated with systematic human assessments of their scientific quality and sensationalism. These relationships were then computationally applied to automatically classify 10,000 additional randomly sampled news records. The model was validated by randomly sampling 200 records and comparing human assessments of them to the computer assessments. The computer model correctly assessed the relevance of 86% of news records, the quality of 65% of records, and the sensationalism of 73% of records, as compared to human assessments. Overall, the scientific quality of SARS and H1N1 news media coverage had potentially important shortcomings, but coverage was not too sensationalizing. Coverage slightly improved between the two pandemics. Automated methods can evaluate news records faster, cheaper, and possibly better than humans. The specific procedure implemented in this study can at the very least identify subsets of news records that are far more likely to have particular scientific and discursive qualities. Copyright © 2016 Elsevier Inc. All rights reserved.
de Jong, Martijn G; Pieters, Rik; Stremersch, Stefan
2012-09-01
Answers to sensitive questions are prone to social desirability bias. If not properly addressed, the validity of the research can be suspect. This article presents multigroup item randomized response theory (MIRRT) to measure self-reported sensitive topics across cultures. The method was specifically developed to reduce social desirability bias by making an a priori change in the design of the survey. The change involves the use of a randomization device (e.g., a die) that preserves participants' privacy at the item level. In cases where multiple items measure a higher level theoretical construct, the researcher could still make inferences at the individual level. The method can correct for under- and overreporting, even if both occur in a sample of individuals or across nations. We present and illustrate MIRRT in a nontechnical manner, provide WinBugs software code so that researchers can directly implement it, and present 2 cross-national studies in which it was applied. The first study compared nonstudent samples from 2 countries (total n = 927) on permissive sexual attitudes and risky sexual behavior and related these to individual-level characteristics such as the Big Five personality traits. The second study compared nonstudent samples from 17 countries (total n = 6,195) on risky sexual behavior and related these to individual-level characteristics, such as gender and age, and to country-level characteristics, such as sex ratio.
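As background on how a randomization device such as a die can protect privacy at the item level, the sketch below implements the classical forced-response estimator (answer truthfully on some die outcomes, give a forced "yes" or "no" otherwise). This is a generic illustration of randomized response, not the multigroup item randomized response model developed in the paper; the die probabilities are assumptions.

```python
import numpy as np

def forced_response_estimate(yes_responses, p_truth=4/6, p_forced_yes=1/6):
    """Moment estimator of the true prevalence under a forced-response design.

    Each respondent rolls a die: with prob. p_truth they answer truthfully,
    with prob. p_forced_yes they must answer 'yes', otherwise 'no'.
    Observed P(yes) = p_truth * pi + p_forced_yes, so
    pi = (lambda - p_forced_yes) / p_truth.
    """
    lam = np.mean(yes_responses)                       # observed proportion of 'yes'
    pi_hat = (lam - p_forced_yes) / p_truth
    # approximate standard error from binomial sampling of lambda
    se = np.sqrt(lam * (1 - lam) / (len(yes_responses) * p_truth ** 2))
    return float(np.clip(pi_hat, 0, 1)), se

# e.g. forced_response_estimate([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])
```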
NASA Astrophysics Data System (ADS)
Gerwe, David R.; Lee, David J.; Barchers, Jeffrey D.
2000-10-01
A post-processing methodology for reconstructing undersampled image sequences with randomly varying blur is described which can provide image enhancement beyond the sampling resolution of the sensor. This method is demonstrated on simulated imagery and on adaptive optics compensated imagery taken by the Starfire Optical Range 3.5 meter telescope that has been artificially undersampled. Also shown are the results of multiframe blind deconvolution of some of the highest quality optical imagery of low earth orbit satellites collected with a ground based telescope to date. The algorithm used is a generalization of multiframe blind deconvolution techniques which includes a representation of spatial sampling by the focal plane array elements in the forward stochastic model of the imaging system. This generalization enables the random shifts and shape of the adaptive compensated PSF to be used to partially eliminate the aliasing effects associated with sub-Nyquist sampling of the image by the focal plane array. The method could be used to reduce resolution loss which occurs when imaging in wide FOV modes.
NASA Astrophysics Data System (ADS)
Gerwe, David R.; Lee, David J.; Barchers, Jeffrey D.
2002-09-01
We describe a postprocessing methodology for reconstructing undersampled image sequences with randomly varying blur that can provide image enhancement beyond the sampling resolution of the sensor. This method is demonstrated on simulated imagery and on adaptive-optics-(AO)-compensated imagery taken by the Starfire Optical Range 3.5-m telescope that has been artificially undersampled. Also shown are the results of multiframe blind deconvolution of some of the highest quality optical imagery of low earth orbit satellites collected with a ground-based telescope to date. The algorithm used is a generalization of multiframe blind deconvolution techniques that include a representation of spatial sampling by the focal plane array elements based on a forward stochastic model. This generalization enables the random shifts and shape of the AO-compensated point spread function (PSF) to be used to partially eliminate the aliasing effects associated with sub-Nyquist sampling of the image by the focal plane array. The method could be used to reduce resolution loss that occurs when imaging in wide-field-of-view (FOV) modes.
Prevalence of American foulbrood in asymptomatic apiaries of Kurdistan, Iran
Khezri, M.; Moharrami, M.; Modirrousta, H.; Torkaman, M.; Rokhzad, B.; Khanbabaie, H.
2018-01-01
Aim: Paenibacillus larvae subsp. larvae is the etiological agent of American foulbrood (AFB), the most virulent bacterial disease of honey bee brood worldwide. In many countries, AFB is a notifiable disease since it is highly contagious, in most cases incurable, and able to kill affected colonies. The aim of this study was to determine the prevalence of P. larvae subsp. larvae in Kurdistan province apiaries by the polymerase chain reaction (PCR) technique. Materials and Methods: A total of 100 samples were randomly purchased from apiaries in Kurdistan, Iran. Apiaries were randomly sampled in accordance with the instructions of the veterinary organization from different provinces and were tested using the PCR method and a specific 16S rRNA primer for the presence of P. larvae subsp. larvae. Results: The results of this study indicated a low level of contamination with P. larvae subsp. larvae in the Kurdistan province. The proportion of positive samples obtained by PCR was 2%. Conclusion: Therefore, monitoring programs for this honeybee disease in Kurdistan should be developed and implemented to ensure that it is detected early and managed. PMID:29657417
ERIC Educational Resources Information Center
Thompson, Bruce
The relationship between analysis of variance (ANOVA) methods and their analogs (analysis of covariance and multiple analyses of variance and covariance--collectively referred to as OVA methods) and the more general analytic case is explored. A small heuristic data set is used, with a hypothetical sample of 20 subjects, randomly assigned to five…
Condom and Other Contraceptive Use among a Random Sample of Female Adolescents: A Snapshot in Time.
ERIC Educational Resources Information Center
Grimley, Diane M.; Lee, Patricia A.
1997-01-01
Examines the sexual practices of 235 females aged 15 to 19 years and their readiness to use specific contraceptive methods. Results indicate that, despite the availability of newer contraceptive methods, most sexually active adolescents were least resistant to using condoms, perceiving the male condom as an acceptable preventative both for…
Factors Affecting the Unemployment (Rate) of Female Art Graduates in Iran
ERIC Educational Resources Information Center
Hedayat, Mina; Kahn, Sabzali Musa; Hanafi, Jaffri
2013-01-01
The aim of this study is to explore the relationship between the opportunities of female artist graduates in Tehran Province and the current employment market. Mixed method was employed in this study. The population of the current study consisted of 240 female artist graduates selected using a systematic random sampling method from both public and…
Assessing relative abundance and reproductive success of shrubsteppe raptors
Lehman, Robert N.; Carpenter, L.B.; Steenhof, Karen; Kochert, Michael N.
1998-01-01
From 1991-1994, we quantified relative abundance and reproductive success of the Ferruginous Hawk (Buteo regalis), Northern Harrier (Circus cyaneus), Burrowing Owl (Speotyto cunicularia), and Short-eared Owl (Asio flammeus) on the shrubsteppe plateaus (benchlands) in and near the Snake River Birds of Prey National Conservation Area in southwestern Idaho. To assess relative abundance, we searched randomly selected plots using four sampling methods: point counts, line transects, and quadrats of two sizes. On a per-sampling-effort basis, transects were slightly more effective than point counts and quadrats for locating raptor nests (3.4 pairs detected/100 h of effort vs. 2.2-3.1 pairs). Random sampling using quadrats failed to detect a Short-eared Owl population increase from 1993 to 1994. To evaluate nesting success, we tried to determine reproductive outcome for all nesting attempts located during random, historical, and incidental nest searches. We compared nesting success estimates based on all nesting attempts, on attempts found during incubation, and the Mayfield model. Most pairs used to evaluate success were pairs found incidentally. Visits to historical nesting areas yielded the highest number of pairs per sampling effort (14.6/100 h), but reoccupancy rates for most species decreased through time. Estimates based on all attempts had the highest sample sizes but probably overestimated success for all species except the Ferruginous Hawk. Estimates of success based on nesting attempts found during incubation had the lowest sample sizes. All three methods yielded biased nesting success estimates for the Northern Harrier and Short-eared Owl. The estimate based on pairs found during incubation probably provided the least biased estimate for the Burrowing Owl. Assessments of nesting success were hindered by difficulties in confirming egg laying and nesting success for all species except the Ferruginous Hawk.
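For context, the Mayfield estimator mentioned above converts observed nest failures per exposure-day into a daily survival rate and projects it over the length of the nesting period. The sketch below shows that arithmetic only; the example numbers are hypothetical.

```python
def mayfield_success(failures, exposure_days, nest_period_days):
    """Mayfield estimate of nesting success.

    Daily survival rate = 1 - failures / exposure-days;
    overall success = daily survival rate ** nesting period (in days).
    """
    dsr = 1.0 - failures / exposure_days
    return dsr ** nest_period_days

# e.g. 8 failures over 400 exposure-days and a 35-day nesting period:
# mayfield_success(8, 400, 35)  ->  about 0.49
```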
Uechi, Ken; Asakura, Keiko; Ri, Yui; Masayasu, Shizuko; Sasaki, Satoshi
2016-02-01
Several estimation methods for 24-h sodium excretion using spot urine samples have been reported, but accurate estimation at the individual level remains difficult. We aimed to clarify the most accurate method of estimating 24-h sodium excretion with different numbers of available spot urine samples. A total of 370 participants from throughout Japan collected multiple 24-h urine and spot urine samples independently. Participants were allocated randomly into a development and a validation dataset. Two estimation methods were established in the development dataset using the two 24-h sodium excretion samples as reference: the 'simple mean method' estimated sodium excretion by multiplying the sodium-creatinine ratio by the predicted 24-h creatinine excretion, whereas the 'regression method' employed linear regression analysis. The accuracy of the two methods was examined by comparing the estimated means and concordance correlation coefficients (CCC) in the validation dataset. Mean sodium excretion by the simple mean method with three spot urine samples was closest to that by 24-h collection (difference: -1.62 mmol/day). CCC with the simple mean method increased with an increased number of spot urine samples at 0.20, 0.31, and 0.42 using one, two, and three samples, respectively. This method with three spot urine samples yielded a higher CCC than the regression method (0.40). When only one spot urine sample was available for each study participant, CCC was higher with the regression method (0.36). The simple mean method with three spot urine samples yielded the most accurate estimates of sodium excretion. When only one spot urine sample was available, the regression method was preferable.
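In code form, the 'simple mean method' described above multiplies the mean sodium-to-creatinine ratio over the available spot samples by a predicted 24-h creatinine excretion. The sketch below shows the arithmetic only; the predicted creatinine value is a placeholder (published prediction equations typically use sex, age, weight, and height).

```python
def simple_mean_estimate(spot_na_mmol_l, spot_cr_mmol_l, predicted_cr_24h_mmol):
    """Estimated 24-h sodium excretion (mmol/day) from one or more spot samples."""
    ratios = [na / cr for na, cr in zip(spot_na_mmol_l, spot_cr_mmol_l)]
    mean_ratio = sum(ratios) / len(ratios)   # mean sodium-to-creatinine ratio
    return mean_ratio * predicted_cr_24h_mmol

# Example with three spot samples and a hypothetical predicted creatinine of 10 mmol/day:
# simple_mean_estimate([120, 95, 140], [8.0, 6.5, 9.0], 10.0)
```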
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures, etc. As usually observed, in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction and these residual free energy barriers could greatly abolish the sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimension subset of the target system; then the "Hamiltonian lagging" problem, which reveals the fact that necessary structural relaxation falls behind the move of the collective variable, may be likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
Sengüven, Burcu; Baris, Emre; Oygur, Tulin; Berktas, Mehmet
2014-01-01
Aim: To describe a protocol involving xylene-ethanol deparaffinization on slides followed by a kit-based extraction that allows for the extraction of high-quality DNA from FFPE tissues. Methods: DNA was extracted from the FFPE tissues of 16 randomly selected blocks. Methods involving deparaffinization on slides or in tubes, enzyme digestion overnight or for 72 hours, and isolation using the phenol-chloroform method or a silica-based commercial kit were compared in terms of yield, concentration and amplifiability. Results: The highest yield of DNA was produced from the samples that were deparaffinized on slides, digested for 72 hours and isolated with a commercial kit. Samples isolated with the phenol-chloroform method produced DNA of lower purity than the samples that were purified with the kit. The samples isolated with the commercial kit resulted in better PCR amplification. Conclusion: Silica-based commercial kits and deparaffinization on slides should be considered for DNA extraction from FFPE tissues. PMID:24688314
The usefulness of Skylab/EREP S-190 and S-192 imagery in multistage forest surveys
NASA Technical Reports Server (NTRS)
Langley, P. G.; Vanroessel, J. (Principal Investigator)
1976-01-01
The author has identified the following significant results. The RMSE of point location achieved with the annotation system on S190A imagery was 100 m and 90 m in the x and y direction, respectively. Potential gains in sampling precision attributable to space derived imagery ranged from 4.9 to 43.3 percent depending on the image type, interpretation method, time of year, and sampling method applied. Seasonal variation was significant. S190A products obtained in September yielded higher gains than those obtained in June. Using 100 primary sample units as a base under simple random sampling, the revenue made available for incorporating space acquired data into the sample design to estimate timber volume was as high as $39,400.00.
Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks
NASA Astrophysics Data System (ADS)
Sun, Wei; Chang, K. C.
2005-05-01
Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only approximate methods such as stochastic sampling can be used to provide a solution under any time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks), the likelihood weighting algorithm (the most commonly used simulation method because of its simplicity and efficiency), the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called the linear Gaussian importance sampling algorithm for general hybrid models (LGIS). LGIS is aimed at hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses a linear function and Gaussian additive noise to approximate the true conditional probability distribution of a continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal importance function from previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as junction tree (JT) and likelihood weighting (LW) shows that LGIS is very promising.
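As background for the sampling methods reviewed above, the snippet below is a minimal likelihood weighting example on a toy two-node network (Rain -> WetGrass) with made-up probabilities; it is not the LGIS algorithm itself, only the baseline idea of weighting each sample by the likelihood of the evidence.

    import random

    # Toy network: P(Rain=1)=0.2, P(Wet=1|Rain=1)=0.9, P(Wet=1|Rain=0)=0.1.
    # Query: P(Rain=1 | Wet=1) by likelihood weighting.
    P_RAIN = 0.2
    P_WET_GIVEN_RAIN = {1: 0.9, 0: 0.1}

    def likelihood_weighting(n_samples=100_000):
        num = den = 0.0
        for _ in range(n_samples):
            rain = 1 if random.random() < P_RAIN else 0   # sample the non-evidence node from its prior
            w = P_WET_GIVEN_RAIN[rain]                    # weight = likelihood of the evidence Wet=1
            num += w * rain
            den += w
        return num / den

    print(likelihood_weighting())   # close to the exact value 0.18 / 0.26 ~ 0.692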
Reynolds, Maureen D; Tarter, Ralph E; Kirisci, Levent
2004-09-06
Men qualifying for substance use disorder (SUD) consequent to consumption of an illicit drug were compared according to recruitment method. It was hypothesized that volunteers would be more self-disclosing and exhibit more severe disturbances compared to randomly recruited subjects. Personal, demographic, family, social, substance use, psychiatric, and SUD characteristics of volunteers (N = 146) were compared to randomly recruited (N = 102) subjects. Volunteers had lower socioeconomic status, were more likely to be African American, and had lower IQ than randomly recruited subjects. Volunteers also evidenced greater social and family maladjustment and more frequently had received treatment for substance abuse. In addition, lower social desirability response bias was observed in the volunteers. SUD was not more severe in the volunteers; however, they reported a higher lifetime rate of opiate, diet, depressant, and analgesic drug use. Volunteers and randomly recruited subjects qualifying for SUD consequent to illicit drug use are similar in SUD severity but differ in terms of severity of psychosocial disturbance and history of drug involvement. The factors discriminating volunteers and randomly recruited subjects are well known to impact on outcome, hence they need to be considered in research design, especially when selecting a sampling strategy in treatment research.
John, Bindu; Bellipady, Sumanth Shetty; Bhat, Shrinivasa Undaru
2016-01-01
Aims. The purpose of this pilot trial was to determine the efficacy of a sleep promotion program adapted for adolescents studying in various schools of Mangalore, India, and to evaluate feasibility issues before conducting a randomized controlled trial in a larger sample of adolescents. Methods. A randomized controlled trial design with a stratified random sampling method was used. Fifty-eight adolescents were selected (mean age: 14.02 ± 2.15 years; intervention group, n = 34; control group, n = 24). Self-report questionnaires, including a sociodemographic questionnaire with some additional questions on sleep and activities, the Sleep Hygiene Index, the Pittsburgh Sleep Quality Index, The Cleveland Adolescent Sleepiness Questionnaire, and the PedsQL™ Present Functioning Visual Analogue Scale, were used. Results. Insufficient weekday-weekend sleep duration with increasing age of adolescents was observed. The program revealed a significant effect in the intervention group over the control group in overall sleep quality, sleep onset latency, sleep duration, daytime sleepiness, and emotional and overall distress. No significant effect was observed in sleep hygiene and other sleep parameters. All target variables showed significant correlations with each other. Conclusion. The intervention holds promise for improving sleep behaviors in healthy adolescents. However, the effect of the sleep promotion program has yet to be confirmed through future research. This trial is registered with ISRCTN13083118. PMID:27088040
A sub-sampled approach to extremely low-dose STEM
Stevens, A.; Luzi, L.; Yang, H.; ...
2018-01-22
The inpainting of deliberately and randomly sub-sampled images offers a potential means to image specimens at a high resolution and under extremely low-dose conditions (≤1 e-/Å²) using a scanning transmission electron microscope. We show that deliberate sub-sampling acquires images at least an order of magnitude faster than conventional low-dose methods for an equivalent electron dose. More importantly, when adaptive sub-sampling is implemented to acquire the images, there is a significant increase in the resolution and sensitivity which accompanies the increase in imaging speed. Lastly, we demonstrate the potential of this method for beam sensitive materials and in-situ observations by experimentally imaging the node distribution in a metal-organic framework.
ERIC Educational Resources Information Center
Erickson, Frederick
A method of evaluating bilingual-bicultural education programs that has a sociolinguistic basis uses samples of the language spoken by a number of bilingual program students as they go through their school day. A random sample of the child's speech would be continuously recorded for an hour, with a bilingual observer taking running notes on where…
ERIC Educational Resources Information Center
Mathur, Manju; Bhargava, Rachna; Benipal, Ramandeep; Luthra, Neena; Basu, Sabita; Kaur, Jasbinder; Chavan, B. S.
2007-01-01
Objective: To compare the dietary habits and nutritional status of mentally retarded (MR) and normal (NG) subjects and to examine the relationship between the dietary habits and nutritional status and the level of mental retardation in the MR group. Method: A case control design was utilized: 117 MR (random sampling) and 100 NG (quota sampling)…
NASA Astrophysics Data System (ADS)
Schünemann, Adriano Luis; Inácio Fernandes Filho, Elpídio; Rocha Francelino, Marcio; Rodrigues Santos, Gérson; Thomazini, Andre; Batista Pereira, Antônio; Gonçalves Reynaud Schaefer, Carlos Ernesto
2017-04-01
Values of environmental variables at non-sampled sites can be estimated from a minimum data set through interpolation techniques. Kriging and the Random Forest algorithm are examples of predictors used for this purpose. The objective of this work was to compare methods of spatializing soil attributes in a recently deglaciated environment with complex landforms. Prediction of the selected soil attributes (potassium, calcium and magnesium) in ice-free areas was tested using morphometric covariables, and using geostatistical models without these covariables. For this, 106 soil samples were collected at 0-10 cm depth in Keller Peninsula, King George Island, Maritime Antarctica. Soil chemical analysis was performed by the gravimetric method, determining values of potassium, calcium and magnesium for each sampled point. Digital terrain models (DTMs) were obtained using a terrestrial laser scanner. DTMs were generated from a cloud of points at spatial resolutions of 1, 5, 10, 20 and 30 m, and 40 morphometric covariates were derived from them. Simple kriging was performed using R software. The same data set, coupled with the morphometric covariates, was used to predict values of the studied attributes at non-sampled sites with the Random Forest interpolator. Little difference was observed between the maps generated by the simple kriging and Random Forest interpolators. Also, DTMs with better spatial resolution did not improve the quality of the soil attribute predictions. The results revealed that simple kriging can be used as an interpolator when morphometric covariates are not available, with little impact on quality. It is still necessary to go further in soil chemical attribute prediction techniques, especially in periglacial areas with complex landforms.
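A minimal sketch of the Random Forest interpolation step, using scikit-learn and entirely synthetic coordinates, covariates and potassium values (the real study used 40 morphometric covariates derived from the DTMs); it only illustrates the workflow of fitting on sampled points and predicting at unsampled locations.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 106                                                   # number of sampled soil points
    x, y = rng.uniform(0, 400, n), rng.uniform(0, 400, n)     # coordinates (m), synthetic
    elev = 50 + 0.05 * x + rng.normal(0, 2, n)                # stand-ins for morphometric covariates
    slope = rng.uniform(0, 25, n)
    potassium = 80 + 0.3 * elev - 0.5 * slope + rng.normal(0, 5, n)

    X = np.column_stack([x, y, elev, slope])
    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    print("cross-validated R^2:", cross_val_score(rf, X, potassium, cv=5).mean())

    rf.fit(X, potassium)
    # predict at an unsampled site whose coordinates and covariates are read off the DTM
    print(rf.predict([[120.0, 310.0, 57.0, 8.0]]))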
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
Uncertainty in monitoring E. coli concentrations in streams and stormwater runoff
NASA Astrophysics Data System (ADS)
Harmel, R. D.; Hathaway, J. M.; Wagner, K. L.; Wolfe, J. E.; Karthikeyan, R.; Francesconi, W.; McCarthy, D. T.
2016-03-01
Microbial contamination of surface waters, a substantial public health concern throughout the world, is typically identified by fecal indicator bacteria such as Escherichia coli. Thus, monitoring E. coli concentrations is critical to evaluate current conditions, determine restoration effectiveness, and inform model development and calibration. An often overlooked component of these monitoring and modeling activities is understanding the inherent random and systematic uncertainty present in measured data. In this research, a review and subsequent analysis was performed to identify, document, and analyze measurement uncertainty of E. coli data collected in stream flow and stormwater runoff as individual discrete samples or throughout a single runoff event. Data on the uncertainty contributed by sample collection, sample preservation/storage, and laboratory analysis in measured E. coli concentrations were compiled and analyzed, and differences in sampling method and data quality scenarios were compared. The analysis showed that: (1) manual integrated sampling produced the lowest random and systematic uncertainty in individual samples, but automated sampling typically produced the lowest uncertainty when sampling throughout runoff events; (2) sample collection procedures often contributed the highest amount of uncertainty, although laboratory analysis introduced substantial random uncertainty and preservation/storage introduced substantial systematic uncertainty under some scenarios; and (3) the uncertainty in measured E. coli concentrations was greater than that of sediment and nutrients, but the difference was not as great as may be assumed. This comprehensive analysis of uncertainty in E. coli concentrations measured in streamflow and runoff should provide valuable insight for designing E. coli monitoring projects, reducing uncertainty in quality assurance efforts, regulatory and policy decision making, and fate and transport modeling.
Ammari, Michelle Mikhael; Moliterno, Luiz Flávio Martins; Hirata Júnior, Raphael; Séllos, Mariana Canano; Soviero, Vera Mendes; Coutinho Filho, Wagner Pereira
2014-01-01
The aim of this study was to compare the efficacy of chemomechanical methods (Carisolv™ and Papacárie®) versus the manual method (excavators) in reducing the cariogenic microbiota in dentine caries of primary teeth. Forty-six healthy children (5 to 9 years old) having at least one primary tooth with a cavitated dentine carious lesion were included in the study. The teeth presented no clinical or radiographic signs of pulpal involvement. The sample of 74 teeth was randomly divided into three different groups: Papacárie® (n = 25), Carisolv™ (n = 27) and Manual (n = 22). Samples of carious and sound dentine were collected with sterile excavators before and after caries removal in the three groups. The dentine samples were transferred to glass tubes containing 1 mL of thioglycollate medium used as a carrier and enriched for microbiological detection of mutans streptococci and Lactobacillus spp, after incubation for 6 h at room temperature. The minimum detection value for colony forming units (CFU) was 3.3 × 10² CFU/mL, and the results were converted into scores from 0 to 4. A significant difference was observed in relation to the microbiological scores before and after caries removal for all methods (Wilcoxon test; p < 0.001). The use of chemomechanical methods for caries removal did not improve the reduction of cariogenic microorganisms in dentine caries lesions, in comparison with manual excavation.
Hazard Function Estimation with Cause-of-Death Data Missing at Random.
Wang, Qihua; Dinse, Gregg E; Liu, Chunling
2012-04-01
Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
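To make the contrast concrete, the sketch below (scikit-learn, synthetic data) compares random CV with a clustering-based CV in which whole clusters of samples are held out together; the clusters here are a crude stand-in for the experimental conditions discussed above, not the paper's exact CCV procedure.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold, GroupKFold, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 50))                                  # e.g. regulator expression levels
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=200)      # target gene expression

    model = Ridge(alpha=1.0)

    # Random CV: test folds are drawn at random from all samples.
    rcv = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

    # Clustering-based CV: cluster the samples first, then hold out whole clusters,
    # so test samples are more "distinct" from the training samples.
    groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
    ccv = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=groups)

    print("random CV R^2:", rcv.mean(), "  clustering-based CV R^2:", ccv.mean())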
Randomized trials published in some Chinese journals: how many are randomized?
Wu, Taixiang; Li, Youping; Bian, Zhaoxiang; Liu, Guanjian; Moher, David
2009-01-01
Background The approximately 1100 medical journals now active in China are publishing a rapidly increasing number of research reports, including many studies identified by their authors as randomized controlled trials. It has been noticed that these reports mostly present positive results, and their quality and authenticity have consequently been called into question. We investigated the adequacy of randomization of clinical trials published in recent years in China to determine how many of them met acceptable standards for allocating participants to treatment groups. Methods The China National Knowledge Infrastructure electronic database was searched for reports of randomized controlled trials on 20 common diseases published from January 1994 to June 2005. From this sample, a subset of trials that appeared to have used randomization methods was selected. Twenty-one investigators trained in the relevant knowledge, communication skills and quality control issues interviewed the original authors of these trials about the participant randomization methods and related quality-control features of their trials. Results From an initial sample of 37,313 articles identified in the China National Knowledge Infrastructure database, we found 3137 apparent randomized controlled trials. Of these, 1452 were studies of conventional medicine (published in 411 journals) and 1685 were studies of traditional Chinese medicine (published in 352 journals). Interviews with the authors of 2235 of these reports revealed that only 207 studies adhered to accepted methodology for randomization and could on those grounds be deemed authentic randomized controlled trials (6.8%, 95% confidence interval 5.9–7.7). There was no statistically significant difference in the rate of authenticity between randomized controlled trials of traditional interventions and those of conventional interventions. Randomized controlled trials conducted at hospitals affiliated to medical universities were more likely to be authentic than trials conducted at level 3 and level 2 hospitals (relative risk 1.58, 95% confidence interval 1.18–2.13, and relative risk 14.42, 95% confidence interval 9.40–22.10, respectively). The likelihood of authenticity was higher in level 3 hospitals than in level 2 hospitals (relative risk 9.32, 95% confidence interval 5.83–14.89). All randomized controlled trials of pre-market drugs were authentic by our criteria. Of the trials conducted at university-affiliated hospitals, 56.3% were authentic (95% confidence interval 32.0–81.0). Conclusion Most reports of randomized controlled trials published in some Chinese journals lacked an adequate description of randomization. Similarly, most so-called 'randomized controlled trials' were not real randomized controlled trials owing to a lack of adequate understanding, on the part of the authors, of rigorous clinical trial design. All randomized controlled trials of pre-market drugs included in this research were authentic. Randomized controlled trials conducted by authors in high-level hospitals, especially hospitals affiliated to medical universities, had a higher rate of authenticity. That so many non-randomized controlled trials were published as randomized controlled trials reflects the fact that peer review needs to be improved, and a good-practice guide for peer review, including how to identify the authenticity of a study, urgently needs to be developed. PMID:19573242
Accounting for selection bias in association studies with complex survey data.
Wirth, Kathleen E; Tchetgen Tchetgen, Eric J
2014-05-01
Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal
2009-01-01
The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455
A method to estimate the effect of deformable image registration uncertainties on daily dose mapping
Murphy, Martin J.; Salguero, Francisco J.; Siebers, Jeffrey V.; Staub, David; Vaman, Constantin
2012-01-01
Purpose: To develop a statistical sampling procedure for spatially-correlated uncertainties in deformable image registration and then use it to demonstrate their effect on daily dose mapping. Methods: Sequential daily CT studies are acquired to map anatomical variations prior to fractionated external beam radiotherapy. The CTs are deformably registered to the planning CT to obtain displacement vector fields (DVFs). The DVFs are used to accumulate the dose delivered each day onto the planning CT. Each DVF has spatially-correlated uncertainties associated with it. Principal components analysis (PCA) is applied to measured DVF error maps to produce decorrelated principal component modes of the errors. The modes are sampled independently and reconstructed to produce synthetic registration error maps. The synthetic error maps are convolved with dose mapped via deformable registration to model the resulting uncertainty in the dose mapping. The results are compared to the dose mapping uncertainty that would result from uncorrelated DVF errors that vary randomly from voxel to voxel. Results: The error sampling method is shown to produce synthetic DVF error maps that are statistically indistinguishable from the observed error maps. Spatially-correlated DVF uncertainties modeled by our procedure produce patterns of dose mapping error that are different from that due to randomly distributed uncertainties. Conclusions: Deformable image registration uncertainties have complex spatial distributions. The authors have developed and tested a method to decorrelate the spatial uncertainties and make statistical samples of highly correlated error maps. The sample error maps can be used to investigate the effect of DVF uncertainties on daily dose mapping via deformable image registration. An initial demonstration of this methodology shows that dose mapping uncertainties can be sensitive to spatial patterns in the DVF uncertainties. PMID:22320766
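The sketch below (NumPy, synthetic data) mirrors the sampling procedure described above: take the principal component modes of a set of observed error maps, sample the mode coefficients independently, and reconstruct synthetic spatially-correlated error maps; the toy error maps and dimensions are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    # observed DVF error maps: n_maps observations, each flattened to n_voxels values
    n_maps, n_voxels = 30, 500
    mixing = rng.normal(size=(n_voxels, n_voxels))
    errors = 0.01 * rng.normal(size=(n_maps, n_voxels)) @ mixing   # spatially-correlated toy data

    mean_map = errors.mean(axis=0)
    centered = errors - mean_map

    # principal component modes of the error maps (SVD of the centred data)
    _, s, modes = np.linalg.svd(centered, full_matrices=False)
    coeff_std = s / np.sqrt(n_maps - 1)                            # spread of the scores along each mode

    def synthetic_error_map():
        """Sample each decorrelated mode independently, then reconstruct a map."""
        z = rng.normal(size=coeff_std.size)
        return mean_map + (z * coeff_std) @ modes

    print(synthetic_error_map().shape)   # (n_voxels,): one synthetic correlated error map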
Extraction of linear features on SAR imagery
NASA Astrophysics Data System (ADS)
Liu, Junyi; Li, Deren; Mei, Xin
2006-10-01
Linear features are usually extracted from SAR imagery by edge detectors derived from the contrast ratio edge detector with a constant probability of false alarm. On the other hand, the Hough Transform is an elegant way of extracting global features such as curve segments from binary edge images. The Randomized Hough Transform can reduce the computation time and memory usage of the HT drastically, but it produces a great number of invalid accumulator cells during the random sampling. In this paper, we propose a new approach to extract linear features from SAR imagery, an almost automatic algorithm based on edge detection and the Randomized Hough Transform. The proposed improvement makes full use of the directional information of each edge candidate point in order to solve the invalid accumulation problem. The applied results are in good agreement with the theoretical study, and the main linear features of the SAR imagery have been extracted automatically. The method saves storage space and computational time, which shows its effectiveness and applicability.
Multivariate analysis: greater insights into complex systems
USDA-ARS?s Scientific Manuscript database
Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...
Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hampton, Jerrad; Doostan, Alireza, E-mail: alireza.doostan@colorado.edu
2015-01-01
Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the coherence-optimal sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases, the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.
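As a toy illustration of the sparse-recovery step described above, the sketch below samples a one-dimensional Legendre expansion at a handful of random points and recovers its sparse coefficients with an ℓ1 penalty; it uses scikit-learn's Lasso as a stand-in for the ℓ1-minimization problem and does not implement importance or coherence-optimal sampling.

    import numpy as np
    from numpy.polynomial.legendre import legvander
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    order = 30
    true_c = np.zeros(order + 1)
    true_c[[1, 4, 11]] = [1.0, -0.7, 0.4]            # sparse "PC" coefficients (illustrative)

    # a few Monte Carlo samples of the uniform input (far fewer than basis functions)
    x_samp = rng.uniform(-1.0, 1.0, 20)
    y_samp = legvander(x_samp, order) @ true_c       # model evaluations at the samples

    Phi = legvander(x_samp, order)                   # measurement matrix of basis evaluations
    fit = Lasso(alpha=1e-3, max_iter=50_000, fit_intercept=False).fit(Phi, y_samp)
    print(np.round(fit.coef_, 2))                    # approximately recovers the sparse coefficients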
Probability of coincidental similarity among the orbits of small bodies - I. Pairing
NASA Astrophysics Data System (ADS)
Jopek, Tadeusz Jan; Bronikowska, Małgorzata
2017-09-01
The probability of coincidental clustering among orbits of comets, asteroids and meteoroids depends on many factors, such as the size of the orbital sample searched for clusters and the size of the identified group; it is different for groups of 2, 3, 4, … members. The probability of coincidental clustering is assessed by numerical simulation and therefore also depends on the method used to generate the synthetic orbits. We have tested the impact of some of these factors. For a given size of the orbital sample we have assessed the probability of random pairing among several orbital populations of different sizes. We have found how these probabilities vary with the size of the orbital samples. Finally, keeping the size of the orbital sample fixed, we have shown that the probability of random pairing can be significantly different for orbital samples obtained by different observation techniques. For the user's convenience we have also obtained several formulae which, for a given size of the orbital sample, can be used to calculate the similarity threshold corresponding to a small value of the probability of coincidental similarity between two orbits.
Kretzschmar, A; Durand, E; Maisonnasse, A; Vallon, J; Le Conte, Y
2015-06-01
A new procedure of stratified sampling is proposed in order to establish an accurate estimation of Varroa destructor populations on sticky bottom boards of the hive. It is based on spatial sampling theory, which recommends using regular grid stratification in the case of a spatially structured process. Since the distribution of varroa mites on the sticky board is observed to be spatially structured, we designed a sampling scheme based on a regular grid with circles centered on each grid element. This new procedure is then compared with a former method using partially random sampling. Improvements in relative error are presented on the basis of a large sample of simulated sticky boards (n=20,000), which provides a complete range of spatial structures, from a random structure to a highly frame-driven structure. The improvement in varroa mite number estimation is then measured by the percentage of counts with an error greater than a given level.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed, based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.
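Of the techniques surveyed above, parallel tempering is simple enough to sketch in a few lines; below is a toy replica-exchange sampler for a double-well density, with an illustrative energy function and temperature ladder (not drawn from the article).

    import math, random

    def energy(x):                           # double well: two well-separated modes at x = +/-1
        return (x * x - 1.0) ** 2 / 0.05

    temps = [1.0, 5.0, 25.0]                 # one replica per "temperature"
    xs = [0.0] * len(temps)
    cold_chain = []

    for step in range(100_000):
        # Metropolis update within each replica
        for i, T in enumerate(temps):
            prop = xs[i] + random.gauss(0.0, 0.3)
            dE = energy(prop) - energy(xs[i])
            if dE <= 0.0 or random.random() < math.exp(-dE / T):
                xs[i] = prop
        # occasionally attempt a swap between neighbouring temperatures (Metropolis criterion)
        if step % 10 == 0:
            i = random.randrange(len(temps) - 1)
            log_a = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (energy(xs[i]) - energy(xs[i + 1]))
            if random.random() < math.exp(min(0.0, log_a)):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
        cold_chain.append(xs[0])             # keep only the lowest-temperature chain

    # with the swaps, the cold chain visits both modes; roughly half the samples are positive
    print(sum(1 for x in cold_chain if x > 0) / len(cold_chain))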
Effective Recruitment of Schools for Randomized Clinical Trials: Role of School Nurses.
Petosa, R L; Smith, L
2017-01-01
In school settings, nurses lead efforts to improve the student health and well-being to support academic success. Nurses are guided by evidenced-based practice and data to inform care decisions. The randomized controlled trial (RCT) is considered the gold standard of scientific rigor for clinical trials. RCTs are critical to the development of evidence-based health promotion programs in schools. The purpose of this article is to present practical solutions to implementing principles of randomization to RCT trials conducted in school settings. Randomization is a powerful sampling method used to build internal and external validity. The school's daily organization and educational mission provide several barriers to randomization. Based on the authors' experience in conducting school-based RCTs, they offer a host of practical solutions to working with schools to successfully implement randomization procedures. Nurses play a critical role in implementing RCTs in schools to promote rigorous science in support of evidence-based practice.
Cromwell, Elizabeth A; Ngondi, Jeremiah; McFarland, Deborah; King, Jonathan D; Emerson, Paul M
2012-10-01
In the context of trachoma control, population coverage with mass drug administration (MDA) using antibiotics is measured using routine data. Due to the limitations of administrative records as well as the potential for bias from incomplete or incorrect records, a literature review of coverage survey methods applied in neglected tropical disease control programmes and immunisation outreach was conducted to inform the design of coverage surveys for trachoma control. Several methods were identified, including the '30 × 7' survey method for the Expanded Programme on Immunization (EPI 30×7), other cluster random sampling (CRS) methods, lot quality assurance sampling (LQAS), purposive sampling and routine data. When compared against one another, the EPI and other CRS methods produced similar population coverage estimates, whilst LQAS, purposive sampling and use of administrative data did not generate estimates consistent with CRS. In conclusion, CRS methods present a consistent approach for MDA coverage surveys despite different methods of household selection. They merit use until standard guidelines are available. CRS methods should be used to verify population coverage derived from LQAS, purposive sampling methods and administrative reports. Copyright © 2012 Royal Society of Tropical Medicine and Hygiene. Published by Elsevier Ltd. All rights reserved.
Designing a national soil erosion monitoring network for England and Wales
NASA Astrophysics Data System (ADS)
Lark, Murray; Rawlins, Barry; Anderson, Karen; Evans, Martin; Farrow, Luke; Glendell, Miriam; James, Mike; Rickson, Jane; Quine, Timothy; Quinton, John; Brazier, Richard
2014-05-01
Although soil erosion is recognised as a significant threat to sustainable land use and may be a priority for action in any forthcoming EU Soil Framework Directive, those responsible for setting national policy with respect to erosion are constrained by a lack of robust, representative data at large spatial scales. This reflects the process-orientated nature of much soil erosion research. Recognising this limitation, the UK Department for Environment, Food and Rural Affairs (Defra) established a project to pilot a cost-effective framework for monitoring of soil erosion in England and Wales (E&W). The pilot will compare different soil erosion monitoring methods at a site scale and provide statistical information for the final design of the full national monitoring network that will: (1) provide unbiased estimates of the spatial mean of soil erosion rate across E&W (tonnes ha-1 yr-1) for each of three land-use classes (arable and horticultural; grassland; upland and semi-natural habitats); and (2) quantify the uncertainty of these estimates with confidence intervals. Probability (design-based) sampling provides the most efficient unbiased estimates of spatial means. In this study, a 16 hectare area (a square of 400 x 400 m) positioned at the centre of a 1-km grid cell, selected at random from mapped land use across E&W, provided the sampling support for measurement of erosion rates, with at least 94% of the support area corresponding to the target land use classes. Very small or zero erosion rates likely to be encountered at many sites reduce the sampling efficiency and make it difficult to compare different methods of soil erosion monitoring. Therefore, to increase the proportion of samples with larger erosion rates without biasing our estimates, we increased the inclusion probability density in areas where the erosion rate is likely to be large by using stratified random sampling. First, each sampling domain (land use class in E&W) was divided into strata; e.g. two sub-domains within which, respectively, small or no erosion rates, and moderate or larger erosion rates are expected. Each stratum was then sampled independently and at random. The sample density need not be equal in the two strata, but is known and is accounted for in the estimation of the mean and its standard error. To divide the domains into strata we used information on slope angle, previous interpretation of erosion susceptibility of the soil associations that correspond to the soil map of E&W at 1:250 000 (Soil Survey of England and Wales, 1983), and visual interpretation of evidence of erosion from aerial photography. While each domain could be stratified on the basis of the first two criteria, air photo interpretation across the whole country was not feasible. For this reason we used a two-phase random sampling for stratification (TPRS) design (de Gruijter et al., 2006). First, we formed an initial random sample of 1-km grid cells from the target domain. Second, each cell was then allocated to a stratum on the basis of the three criteria. A subset of the selected cells from each stratum was then selected for field survey at random, with a specified sampling density for each stratum so as to increase the proportion of cells where moderate or larger erosion rates were expected.
Once measurements of erosion have been made, an estimate of the spatial mean of the erosion rate over the target domain, its standard error and associated uncertainty can be calculated by an expression which accounts for the estimated proportions of the two strata within the initial random sample. References: de Gruijter, J.J., Brus, D.J., Bierkens, M.F.P. & Knotters, M. 2006. Sampling for Natural Resource Monitoring. Springer, Berlin. Soil Survey of England and Wales. 1983. National Soil Map NATMAP Vector 1:250,000. National Soil Research Institute, Cranfield University.
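For the design described above, the standard design-based estimator of the domain mean and its standard error from a stratified random sample looks like the sketch below (illustrative stratum weights and erosion rates; the extra variance contribution from estimating the stratum proportions in the two-phase design is not shown).

    import numpy as np

    # toy data: erosion rates (t/ha/yr) measured in two strata of one land-use domain;
    # W is each stratum's share of the domain (here taken as known for simplicity)
    strata = {
        "low_risk":  {"W": 0.8, "y": np.array([0.0, 0.1, 0.0, 0.3, 0.0, 0.2])},
        "high_risk": {"W": 0.2, "y": np.array([1.4, 2.3, 0.9, 3.1])},
    }

    mean = sum(s["W"] * s["y"].mean() for s in strata.values())
    var = sum(s["W"] ** 2 * s["y"].var(ddof=1) / s["y"].size for s in strata.values())
    se = var ** 0.5

    print(f"estimated mean erosion rate: {mean:.3f} t/ha/yr, standard error: {se:.3f}")
    # an approximate 95% confidence interval is mean +/- 1.96 * se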
Poisson-Box Sampling algorithms for three-dimensional Markov binary mixtures
NASA Astrophysics Data System (ADS)
Larmier, Coline; Zoia, Andrea; Malvagi, Fausto; Dumonteil, Eric; Mazzolo, Alain
2018-02-01
Particle transport in Markov mixtures can be addressed by the so-called Chord Length Sampling (CLS) methods, a family of Monte Carlo algorithms taking into account the effects of stochastic media on particle propagation by generating on-the-fly the material interfaces crossed by the random walkers during their trajectories. Such methods enable a significant reduction of computational resources as opposed to reference solutions obtained by solving the Boltzmann equation for a large number of realizations of random media. CLS solutions, which neglect correlations induced by the spatial disorder, are faster albeit approximate, and might thus show discrepancies with respect to reference solutions. In this work we propose a new family of algorithms (called 'Poisson Box Sampling', PBS) aimed at improving the accuracy of the CLS approach for transport in d-dimensional binary Markov mixtures. In order to probe the features of PBS methods, we will focus on three-dimensional Markov media and revisit the benchmark problem originally proposed by Adams, Larsen and Pomraning [1] and extended by Brantley [2]: for these configurations we will compare reference solutions, standard CLS solutions and the new PBS solutions for scalar particle flux, transmission and reflection coefficients. PBS will be shown to perform better than CLS at the expense of a reasonable increase in computational time.
Statistical auditing and randomness test of lotto k/N-type games
NASA Astrophysics Data System (ADS)
Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.
2008-11-01
One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
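A small simulation makes the audit idea concrete: draw many synthetic lotto results without replacement and compare the empirical mean, variance and covariance of the drawn numbers with the theoretical values for a k/N game (here k = 6, N = 49; the simulated history stands in for real published draws).

    import random
    import numpy as np

    N, k, n_draws = 49, 6, 5000
    rng = random.Random(0)
    history = np.array([rng.sample(range(1, N + 1), k) for _ in range(n_draws)])

    mean_theory = (N + 1) / 2                  # marginal mean of a drawn number
    var_theory = (N * N - 1) / 12              # marginal variance of a drawn number
    cov_theory = -(N + 1) / 12                 # covariance between two distinct drawn numbers

    print("mean:", history.mean(), "vs", mean_theory)
    print("variance:", history.var(), "vs", var_theory)
    print("covariance:", np.cov(history[:, 0], history[:, 1])[0, 1], "vs", cov_theory)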
Dezhdar, Shahin; Jahanpour, Faezeh; Firouz Bakht, Saeedeh; Ostovar, Afshin
2016-04-01
Hospitalized premature babies often undergo various painful procedures. Kangaroo mother care (KMC) and swaddling are two pain reduction methods. This study was undertaken to compare the effects of swaddling and KMC on pain during venous sampling in premature neonates. This study was performed as a randomized clinical trial on 90 premature neonates. The neonates were divided into three groups using a random allocation block. The three groups were group A (swaddling), group B (KMC), and group C (control). In all three groups, the heart rate and arterial oxygen saturation were measured and recorded in time intervals of 30 seconds before, during, and 30, 60, 90, and 120 seconds after blood sampling. The neonate's face was video recorded and assessed using the premature infant pain profile (PIPP) at time intervals of 30 seconds. The data was analyzed using the t-test, chi-square test, Repeated Measure analysis of variance (ANOVA), Kruskal-Wallis, Post-hoc, and Bonferroni test. The findings revealed that pain was reduced to a great extent in the swaddling and KMC methods compared to the control group. However, there was no significant difference between KMC and swaddling (P ≥ 0.05). The results of this study indicate that there is no meaningful difference between swaddling and KMC on physiological indexes and pain in neonates. Therefore, the swaddling method may be a good substitute for KMC.
Williams, M S; Ebel, E D; Cao, Y
2013-01-01
The fitting of statistical distributions to microbial sampling data is a common application in quantitative microbiology and risk assessment applications. An underlying assumption of most fitting techniques is that data are collected with simple random sampling, which is often times not the case. This study develops a weighted maximum likelihood estimation framework that is appropriate for microbiological samples that are collected with unequal probabilities of selection. A weighted maximum likelihood estimation framework is proposed for microbiological samples that are collected with unequal probabilities of selection. Two examples, based on the collection of food samples during processing, are provided to demonstrate the method and highlight the magnitude of biases in the maximum likelihood estimator when data are inappropriately treated as a simple random sample. Failure to properly weight samples to account for how data are collected can introduce substantial biases into inferences drawn from the data. The proposed methodology will reduce or eliminate an important source of bias in inferences drawn from the analysis of microbial data. This will also make comparisons between studies and the combination of results from different studies more reliable, which is important for risk assessment applications. © 2012 No claim to US Government works.
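A minimal sketch of the weighted (pseudo-)likelihood idea: each observation's log-likelihood contribution is multiplied by its sampling weight, the inverse of its selection probability. The lognormal model, the selection mechanism and all numbers below are placeholders, not the authors' data or code.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import lognorm

    rng = np.random.default_rng(0)

    # true concentrations; units with high values are over-sampled, so each kept
    # observation carries weight w = 1 / (selection probability)
    x = rng.lognormal(mean=1.0, sigma=0.8, size=2000)
    pi = np.clip(0.1 + 0.05 * x, 0.1, 0.9)                 # illustrative unequal selection probabilities
    keep = rng.uniform(size=x.size) < pi
    x_obs, w = x[keep], 1.0 / pi[keep]

    def neg_weighted_loglik(params):
        mu, sigma = params
        if sigma <= 0:
            return np.inf
        return -np.sum(w * lognorm.logpdf(x_obs, s=sigma, scale=np.exp(mu)))

    fit = minimize(neg_weighted_loglik, x0=[0.0, 1.0], method="Nelder-Mead")
    print("weighted MLE (mu, sigma):", fit.x)              # close to the true (1.0, 0.8)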
Spectral and correlation analysis with applications to middle-atmosphere radars
NASA Technical Reports Server (NTRS)
Rastogi, Prabhat K.
1989-01-01
The correlation and spectral analysis methods for uniformly sampled stationary random signals, estimation of their spectral moments, and problems arising due to nonstationarity are reviewed. Some of these methods are already in routine use in atmospheric radar experiments. Other methods based on the maximum entropy principle and time series models have been used in analyzing data, but are just beginning to receive attention in the analysis of radar signals. These methods are also briefly discussed.
Kim, Min-Kyu; Hong, Seong-Kwan; Kwon, Oh-Kyong
2015-12-26
This paper presents a fast multiple sampling method for low-noise CMOS image sensor (CIS) applications with column-parallel successive approximation register analog-to-digital converters (SAR ADCs). The 12-bit SAR ADC using the proposed multiple sampling method decreases the A/D conversion time by repeatedly converting a pixel output to 4-bit after the first 12-bit A/D conversion, reducing noise of the CIS by one over the square root of the number of samplings. The area of the 12-bit SAR ADC is reduced by using a 10-bit capacitor digital-to-analog converter (DAC) with four scaled reference voltages. In addition, a simple up/down counter-based digital processing logic is proposed to perform complex calculations for multiple sampling and digital correlated double sampling. To verify the proposed multiple sampling method, a 256 × 128 pixel array CIS with 12-bit SAR ADCs was fabricated using a 0.18 μm CMOS process. The measurement results show that the proposed multiple sampling method reduces each A/D conversion time from 1.2 μs to 0.45 μs and random noise from 848.3 μV to 270.4 μV, achieving a dynamic range of 68.1 dB and an SNR of 39.2 dB.
Some practical problems in implementing randomization.
Downs, Matt; Tucker, Kathryn; Christ-Schmidt, Heidi; Wittes, Janet
2010-06-01
While often theoretically simple, implementing randomization to treatment in a masked, but confirmable, fashion can prove difficult in practice. At least three categories of problems occur in randomization: (1) bad judgment in the choice of method, (2) design and programming errors in implementing the method, and (3) human error during the conduct of the trial. This article focuses on these latter two types of errors, dealing operationally with what can go wrong after trial designers have selected the allocation method. We offer several case studies and corresponding recommendations for lessening the frequency of problems in allocating treatment or for mitigating the consequences of errors. Recommendations include: (1) reviewing the randomization schedule before starting a trial, (2) being especially cautious of systems that use on-demand random number generators, (3) drafting unambiguous randomization specifications, (4) performing thorough testing before entering a randomization system into production, (5) maintaining a dataset that captures the values investigators used to randomize participants, thereby allowing the process of treatment allocation to be reproduced and verified, (6) resisting the urge to correct errors that occur in individual treatment assignments, (7) preventing inadvertent unmasking to treatment assignments in kit allocations, and (8) checking a sample of study drug kits to allow detection of errors in drug packaging and labeling. Although we performed a literature search of documented randomization errors, the examples that we provide and the resultant recommendations are based largely on our own experience in industry-sponsored clinical trials. We do not know how representative our experience is or how common errors of the type we have seen occur. Our experience underscores the importance of verifying the integrity of the treatment allocation process before and during a trial. Clinical Trials 2010; 7: 235-245. http://ctj.sagepub.com.
Digital simulation of an arbitrary stationary stochastic process by spectral representation.
Yura, Harold T; Hanson, Steen G
2011-04-01
In this paper we present a straightforward, efficient, and computationally fast method for creating a large number of discrete samples with an arbitrary given probability density function and a specified spectral content. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In contrast to previous work, where the analyses were limited to autoregressive and/or iterative techniques to obtain satisfactory results, we find that a single application of the inverse transform method yields satisfactory results for a wide class of arbitrary probability distributions. Although a single application of the inverse transform technique does not conserve the power spectra exactly, it yields highly accurate numerical results for a wide range of probability distributions and target power spectra that are sufficient for system simulation purposes and can thus be regarded as an accurate engineering approximation, which can be used for a wide range of practical applications. A sufficiency condition is presented regarding the range of parameter values where a single application of the inverse transform method yields satisfactory agreement between the simulated and target power spectra, and a series of examples relevant for the optics community are presented and discussed. Outside this parameter range the agreement gracefully degrades but does not distort in shape. Although we demonstrate the method here focusing on stationary random processes, we see no reason why the method could not be extended to simulate non-stationary random processes. © 2011 Optical Society of America
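A compact sketch of the two-step recipe summarized above: colour a white Gaussian sequence in the frequency domain, then map it through the Gaussian CDF and the inverse CDF of the target distribution. The low-pass filter shape and the exponential target marginal are illustrative choices, not the paper's examples.

    import numpy as np
    from scipy.stats import norm, expon

    rng = np.random.default_rng(0)
    n = 2 ** 16

    # step 1: shape white Gaussian noise to a target power spectrum (here a low-pass shape)
    white = rng.standard_normal(n)
    freqs = np.fft.rfftfreq(n)
    H = 1.0 / np.sqrt(1.0 + (freqs / 0.01) ** 2)           # illustrative filter amplitude
    colored = np.fft.irfft(np.fft.rfft(white) * H, n)
    colored = (colored - colored.mean()) / colored.std()   # back to zero mean, unit variance

    # step 2: memoryless transform to the desired marginal distribution
    u = norm.cdf(colored)                                  # correlated uniform variates
    samples = expon.ppf(u, scale=2.0)                      # e.g. exponential with mean 2

    print(samples.mean(), samples.std())                   # both ~2.0 for an exponential target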
ERIC Educational Resources Information Center
Abdu-Raheem, B. O.
2012-01-01
This study investigated the effects of problem-solving method of teaching on secondary school students' achievement and retention in Social Studies. The study adopted the quasi-experimental, pre-test, post-test, control group design. The sample for the study consisted of 240 Junior Secondary School Class II students randomly selected from six…
An Analysis of Methods Used To Reduce Nonresponse Bias in Survey Research.
ERIC Educational Resources Information Center
Johnson, Victoria A.
The effectiveness of five methods used to estimate the population parameters of a variable of interest from a random sample in the presence of non-response to mail surveys was tested in conditions that vary the return rate and the relationship of the variable of interest to the likelihood of response. Data from 125,092 adult Alabama residents in…
ERIC Educational Resources Information Center
Pálsdóttir, Ágústa
2014-01-01
Introduction: The paper presents findings from a study investigating the health and lifestyle information behaviour of different groups of Icelanders. The paper focuses on the use of social media and its role in current information behaviour. Method: Quantitative methods were used. Two random samples were used in the study and the data were…
Measuring Link-Resolver Success: Comparing 360 Link with a Local Implementation of WebBridge
ERIC Educational Resources Information Center
Herrera, Gail
2011-01-01
This study reviewed link resolver success comparing 360 Link and a local implementation of WebBridge. Two methods were used: (1) comparing article-level access and (2) examining technical issues for 384 randomly sampled OpenURLs. Google Analytics was used to collect user-generated OpenURLs. For both methods, 360 Link out-performed the local…
Smith, Blair H; Hannaford, Philip C; Elliott, Alison M; Smith, W Cairns; Chambers, W Alastair
2005-04-01
Sampling for primary care research must strike a balance between efficiency and external validity. For most conditions, even a large population sample will yield a small number of cases, yet other sampling techniques risk problems with extrapolation of findings. The aim was to compare the efficiency and external validity of two sampling methods for both an intervention study and epidemiological research in primary care--a convenience sample and a general population sample--by comparing the response and follow-up rates and the demographic and clinical characteristics of each sample, and by calculating the 'number needed to sample' (NNS) for a hypothetical randomized controlled trial. In 1996, we selected two random samples of adults from 29 general practices in Grampian for an epidemiological study of chronic pain. One sample of 4175 was identified by an electronic questionnaire that listed patients receiving regular analgesic prescriptions--the 'repeat prescription sample'. The other sample of 5036 was identified from all patients on practice lists--the 'general population sample'. Questionnaires, including demographic, pain and general health measures, were sent to all. A similar follow-up questionnaire was sent in 2000 to all those agreeing to participate in further research. We identified a potential group of subjects for a hypothetical trial in primary care based on a recently published trial (those aged 25-64, with severe chronic back pain, willing to participate in further research). The repeat prescription sample produced better response rates than the general sample overall (86% compared with 82%, P < 0.001), from both genders and from the oldest and youngest age groups. The NNS using convenience sampling was 10 for each member of the final potential trial sample, compared with 55 using general population sampling. There were important differences between the samples in age, marital and employment status, social class and educational level. However, among the potential trial sample, there were no demographic differences. Those from the repeat prescription sample had poorer indices than the general population sample in all pain and health measures. The repeat prescription sampling method was approximately five times more efficient than the general population method. However, demographic and clinical differences in the repeat prescription sample might hamper extrapolation of findings to the general population, particularly in an epidemiological study, and demonstrate that simple comparison with age and gender of the target population is insufficient.
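The 'number needed to sample' is simply the number of people invited divided by the number of eligible, willing trial candidates identified. The candidate counts below are hypothetical values back-calculated only to reproduce the reported NNS of roughly 10 and 55; they are not figures from the study.

# Illustration of the NNS idea: invitations sent divided by eligible, willing
# trial candidates identified. The candidate counts are hypothetical.
def number_needed_to_sample(invited, eligible_candidates):
    return invited / eligible_candidates

print(number_needed_to_sample(4175, 418))   # ~10 per candidate (repeat prescription sample)
print(number_needed_to_sample(5036, 92))    # ~55 per candidate (general population sample)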
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
The random choice of initial cluster centers in the k-means algorithm makes the clustering results sensitive to outlying data samples and unstable across repeated runs. To address this, a center initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent the sample density, and data samples with both larger distance and higher density are selected as the initial clustering centers to optimize the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm; experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
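A rough sketch of this kind of distance-and-density initialization follows: density is approximated by the reciprocal of each point's mean distance to the others, and each subsequent center must be both dense and far from the centers already chosen. The product score and the unweighted mean distance are illustrative choices, not necessarily the authors' exact formulation.

# Sketch of a distance-and-density-based k-means initialization.
import numpy as np

def init_centers(X, k):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    density = 1.0 / (d.mean(axis=1) + 1e-12)                    # denser points score higher
    centers = [int(np.argmax(density))]                         # densest point first
    while len(centers) < k:
        dist_to_centers = d[:, centers].min(axis=1)             # spread away from chosen centers
        score = density * dist_to_centers                       # illustrative combined score
        score[centers] = -np.inf                                # never pick a center twice
        centers.append(int(np.argmax(score)))
    return X[centers]

X = np.random.default_rng(1).normal(size=(200, 2))
print(init_centers(X, 3))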
Brodaty, Henry; Mothakunnel, Annu; de Vel-Palumbo, Melissa; Ames, David; Ellis, Kathryn A; Reppermund, Simone; Kochan, Nicole A; Savage, Greg; Trollor, Julian N; Crawford, John; Sachdev, Perminder S
2014-01-01
We examined whether differences in findings of studies examining mild cognitive impairment (MCI) were associated with recruitment methods by comparing sample characteristics in two contemporaneous Australian studies, using population-based and convenience sampling. The Sydney Memory and Aging Study invited participants randomly from the electoral roll in defined geographic areas in Sydney. The Australian Imaging, Biomarkers and Lifestyle Study of Ageing recruited cognitively normal (CN) individuals via media appeals and MCI participants via referrals from clinicians in Melbourne and Perth. Demographic and cognitive variables were harmonized, and similar diagnostic criteria were applied to both samples retrospectively. CN participants recruited via convenience sampling were younger, better educated, more likely to be married and have a family history of dementia, and performed better cognitively than those recruited via population-based sampling. MCI participants recruited via population-based sampling had better memory performance and were less likely to carry the apolipoprotein E ε4 allele than clinically referred participants but did not differ on other demographic variables. Compared with a purportedly random sample, a convenience sample of normal controls is likely to be younger and better functioning, and a convenience sample of an MCI group is likely to perform worse. Sampling bias should be considered when interpreting findings. Copyright © 2014 Elsevier Inc. All rights reserved.
Generation of Stationary Non-Gaussian Time Histories with a Specified Cross-spectral Density
Smallwood, David O.
1997-01-01
The paper reviews several methods for the generation of stationary realizations of sampled time histories with non-Gaussian distributions and introduces a new method which can be used to control the cross-spectral density matrix and the probability density functions (pdfs) of the multiple input problem. Discussed first are two methods for the specialized case of matching the auto (power) spectrum, the skewness, and kurtosis using generalized shot noise and using polynomial functions. It is then shown that the skewness and kurtosis can also be controlled by the phase of a complex frequency domain description of the random process. The general case of matching a target probability density function using a zero memory nonlinear (ZMNL) function is then covered. Next, methods for generating vectors of random variables with a specified covariance matrix for a class of spherically invariant random vectors (SIRV) are discussed. Finally, the general case of matching the cross-spectral density matrix of a vector of inputs with non-Gaussian marginal distributions is presented.
Martín, Julia; Rodríguez-Gómez, Rocío; Zafra-Gómez, Alberto; Alonso, Esteban; Vílchez, José L; Navalón, Alberto
2016-04-01
A new method for the determination of four perfluoroalkyl carboxylic acids (from C5 to C8) and perfluorooctane sulfonate in human milk samples using stir-bar sorptive extraction-ultra-HPLC-MS/MS has been optimized and validated. Polydimethylsiloxane and polyethyleneglycol modified silicone materials were evaluated. Overall, polyethyleneglycol led to a better sensitivity. After optimizing experimental variables, the method was validated, reaching detection limits in the range of 0.05-0.20 ng ml(-1), recovery rates from 81 to 105%, and relative standard deviations below 13% in all cases. The method was applied to milk samples from five randomly selected women. All samples were positive for at least one of the target compounds, with concentrations ranging between 0.8 and 6.6 ng ml(-1), perfluorooctane sulfonate being the most abundant.
Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-01-01
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914
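The idea of chained-equation imputation with a random forest model for each partially observed variable can be sketched with scikit-learn's experimental IterativeImputer; this is a generic illustration, not the authors' implementation, and it produces a single completed dataset rather than the multiple imputations of MICE proper.

# Sketch: chained-equation style imputation where each partially observed
# variable is modelled with a random forest (single imputation illustration;
# MICE proper would draw several imputed datasets).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 3] = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)   # nonlinear dependence
X[rng.random((500, 4)) < 0.2] = np.nan                      # ~20% values missing at random

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)
print(np.isnan(X_imputed).sum())                             # 0 -> all missing values filled in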
NASA Astrophysics Data System (ADS)
Narimani, M.; Sadeghieh Ahari, S.; Rajabi, S.
This research aims to determine and compare the efficacy of two therapeutic methods, Eye Movement Desensitization and Reprocessing (EMDR) and Cognitive Behavioral Therapy (CBT), for reducing anxiety and depression in Iranian combatants afflicted with Post-traumatic Stress Disorder (PTSD) after the imposed war. The statistical population of the current study comprises combatants afflicted with PTSD who were hospitalized in the Isar Hospital of Ardabil province or living in Ardabil. These persons were selected through simple random sampling and were randomly allocated to three groups. An experimental method with a multi-group test-retest design was used. The instrument used was the Hospital Anxiety and Depression Scale. The study showed that EMDR and CBT both produced a significant reduction of anxiety and depression.
Interest of LQAS method in a survey of HTLV-I infection in Benin (West Africa).
Houinato, Dismand; Preux, Pierre-Marie; Charriere, Bénédicte; Massit, Bruno; Avodé, Gilbert; Denis, François; Dumas, Michel; Boutros-Toni, Fernand; Salamon, Roger
2002-02-01
HTLV-I is heterogeneously distributed in Sub-Saharan Africa. Traditional survey methods such as cluster sampling could provide information for a country or region of interest. However, they cannot identify small areas with higher prevalences of infection to help in health policy planning. Identification of such areas could be done by a Lot Quality Assurance Sampling (LQAS) method, which is currently used in industry to identify poor performance in assembly lines. The LQAS method was used in Atacora (Northern Benin) between March and May 1998 to identify areas with an HTLV-I seroprevalence higher than 4%. Sixty-five subjects were randomly selected in each of 36 communes (lots) of this department. Lots were classified as unacceptable when the sample contained at least one positive subject. The LQAS method identified 25 (69.4%) communes with a prevalence higher than 4%. Using stratified sampling theory, the overall HTLV-I seroprevalence was 4.5% (95% CI: 3.6-5.4%). These data show the value of applying the LQAS method under field conditions to detect clusters of infection.
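The decision rule used here (a lot is unacceptable if at least one positive appears among 65 subjects) has a simple operating characteristic, the probability of accepting a lot as a function of its true prevalence. The sketch below computes it under an assumption of independent draws.

# Sketch of the LQAS decision rule: a lot (commune) is unacceptable when at
# least one positive subject appears among n sampled; the function gives the
# probability that a lot is accepted (zero positives) for a given prevalence.
def prob_lot_accepted(prevalence, n=65):
    return (1.0 - prevalence) ** n            # P(zero positives in n independent draws)

for p in (0.01, 0.04, 0.10):
    print(f"prevalence {p:.0%}: P(accepted) = {prob_lot_accepted(p):.3f}")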
Genome-Wide Chromosomal Targets of Oncogenic Transcription Factors
2008-04-01
[Figure caption fragments only: (a) comparison between STAGE and ChIP-chip when the same sample was analyzed by both methods, with predicted STAGE targets indicated; numbers of single-hit tags plotted against the frequencies of those tags in random and experimental tag sets; a window size of 500 bp gave optimal separation between random and real data.]
Network Sampling with Memory: A proposal for more efficient sampling from social networks.
Mouw, Ted; Verdery, Ashton M
2012-08-01
Techniques for sampling from networks have grown into an important area of research across several fields. For sociologists, the possibility of sampling from a network is appealing for two reasons: (1) A network sample can yield substantively interesting data about network structures and social interactions, and (2) it is useful in situations where study populations are difficult or impossible to survey with traditional sampling approaches because of the lack of a sampling frame. Despite its appeal, methodological concerns about the precision and accuracy of network-based sampling methods remain. In particular, recent research has shown that sampling from a network using a random walk based approach such as Respondent Driven Sampling (RDS) can result in high design effects (DE)-the ratio of the sampling variance to the sampling variance of simple random sampling (SRS). A high design effect means that more cases must be collected to achieve the same level of precision as SRS. In this paper we propose an alternative strategy, Network Sampling with Memory (NSM), which collects network data from respondents in order to reduce design effects and, correspondingly, the number of interviews needed to achieve a given level of statistical power. NSM combines a "List" mode, where all individuals on the revealed network list are sampled with the same cumulative probability, with a "Search" mode, which gives priority to bridge nodes connecting the current sample to unexplored parts of the network. We test the relative efficiency of NSM compared to RDS and SRS on 162 school and university networks from Add Health and Facebook that range in size from 110 to 16,278 nodes. The results show that the average design effect for NSM on these 162 networks is 1.16, which is very close to the efficiency of a simple random sample (DE=1), and 98.5% lower than the average DE we observed for RDS.
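The design-effect comparison described above can be written down directly: DE is the variance of an estimator under the network sampling design divided by its variance under simple random sampling of the same size. The replicate estimates below are hypothetical stand-ins, not values from the paper.

# Sketch of the design-effect calculation used to compare sampling designs.
import numpy as np

def design_effect(estimates_design, estimates_srs):
    return np.var(estimates_design, ddof=1) / np.var(estimates_srs, ddof=1)

rng = np.random.default_rng(0)
srs_estimates = rng.normal(0.30, 0.02, size=1000)     # hypothetical SRS replicate estimates
rds_estimates = rng.normal(0.30, 0.06, size=1000)     # hypothetical, noisier RDS replicates
print(design_effect(rds_estimates, srs_estimates))    # DE >> 1 means more interviews needed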
2012-01-01
Background: With the current focus on personalized medicine, patient/subject level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient level inference. However, for very large data sets that are characterized by large sample size, it can be difficult to fit REM using commonly available statistical software such as SAS, since they require inordinate amounts of computer time and memory allocations beyond what are available, preventing model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult, and the same problems exist in Stata's gllamm or R's lme packages. Thus, this study proposes and assesses the performance of a meta regression approach and makes comparison with methods based on sampling of the full data. Data: We use both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394), which was created by linking multiple patient and administrative files resulting in a cohort with longitudinal data collected over 5 years. Methods and results: The outcome of interest was mean annual HbA1c measured over a 5-year period. Using this outcome, we compared parameter estimates from the proposed random effects meta regression (REMR) with estimates based on simple random sampling and VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that REMR provides parameter estimates that are less likely to be biased, with tighter confidence intervals, when the VISN level estimates are homogeneous. Conclusion: When the interest is to fit REM in repeated measures data with very large sample size, REMR can be used as a good alternative. It leads to reasonable inference for both Gaussian and non-Gaussian responses if parameter estimates are homogeneous across VISNs. PMID:23095325
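The split-then-pool idea behind such an approach, fit the model separately within each stratum (e.g. VISN) and then combine the stratum-level coefficients with a random-effects meta-analysis, can be sketched with a DerSimonian-Laird style pooling. This is a generic illustration with made-up stratum estimates, not the authors' REMR code.

# Sketch: pool stratum-level coefficient estimates with DerSimonian-Laird
# random-effects weights (inputs below are hypothetical estimates and SEs).
import numpy as np

def random_effects_pool(betas, ses):
    betas, var = np.asarray(betas, float), np.asarray(ses, float) ** 2
    w = 1.0 / var                                       # fixed-effect weights
    beta_fe = np.sum(w * betas) / np.sum(w)
    q = np.sum(w * (betas - beta_fe) ** 2)              # Cochran's Q heterogeneity statistic
    tau2 = max(0.0, (q - (len(betas) - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_re = 1.0 / (var + tau2)                           # random-effects weights
    beta_re = np.sum(w_re * betas) / np.sum(w_re)
    return beta_re, np.sqrt(1.0 / np.sum(w_re))         # pooled estimate and its SE

print(random_effects_pool([0.21, 0.18, 0.25, 0.20], [0.03, 0.04, 0.05, 0.03]))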
ERIC Educational Resources Information Center
Chidi, Christopher O.; Shadare, Oluseyi A.
2011-01-01
This study investigated the influence of host community on industrial relations practices and policies using Agbara community and Power Holding Company of Nigeria PLC as a case. The study adopted both the qualitative and quantitative methods. A total of 120 samples were drawn from the population using the simple random sampling technique in which…
ERIC Educational Resources Information Center
Al Rabadi, Wail Minwer; Salem, Rifqa Khleif
2018-01-01
The study was designed to identify the effect of high-order thinking on the quality of life among Ajloun University students. The study used the associative method. The randomly selected sample consisted of 147 students from Ajloun University College. The study used two tools: The two measures were applied to the sample of the current study after…
Paul F. Hessburg; Bradley G. Smith; Craig A. Miller; Scott D. Kreiter; R. Brion Salter
1999-01-01
In the interior Columbia River basin midscale ecological assessment, including portions of the Klamath and Great Basins, we mapped and characterized historical and current vegetation composition and structure of 337 randomly sampled subwatersheds (9500 ha average size) in 43 subbasins (404 000 ha average size). We compared landscape patterns, vegetation structure and...
NASA Astrophysics Data System (ADS)
Rougier, Simon; Puissant, Anne; Stumpf, André; Lachiche, Nicolas
2016-09-01
Vegetation monitoring is becoming a major issue in the urban environment because of the services vegetation provides, and it requires accurate and up-to-date mapping. Very High Resolution satellite images enable a detailed mapping of urban tree and herbaceous vegetation. Several supervised classifications with statistical learning techniques have provided good results for the detection of urban vegetation but require a large amount of training data. In this context, this study investigates the performance of different sampling strategies in order to reduce the number of examples needed. Two window-based active learning algorithms from the state of the art are compared to a classical stratified random sampling, and a third strategy combining active learning and stratified sampling is proposed. The efficiency of these strategies is evaluated on two medium-sized French cities, Strasbourg and Rennes, associated with different datasets. Results demonstrate that classical stratified random sampling can in some cases be just as effective as active learning methods and that it should be used more frequently to evaluate new active learning methods. Moreover, the active learning strategies proposed in this work reduce the computational runtime by selecting multiple windows at each iteration without increasing the number of windows needed.
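The stratified random sampling baseline amounts to drawing training windows at random within each class stratum in proportion to the stratum size. The class labels, proportions and sample sizes below are illustrative assumptions.

# Sketch of proportional stratified random sampling of training examples.
import numpy as np

def stratified_sample(labels, n_total, rng):
    labels = np.asarray(labels)
    picks = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_cls = max(1, round(n_total * len(idx) / len(labels)))   # proportional allocation
        picks.extend(rng.choice(idx, size=min(n_cls, len(idx)), replace=False))
    return np.array(picks)

rng = np.random.default_rng(0)
labels = rng.choice(["tree", "herbaceous", "non-vegetation"], size=5000, p=[0.2, 0.3, 0.5])
print(len(stratified_sample(labels, 300, rng)))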
Compressed-Sensing Multi-Spectral Imaging of the Post-Operative Spine
Worters, Pauline W.; Sung, Kyunghyun; Stevens, Kathryn J.; Koch, Kevin M.; Hargreaves, Brian A.
2012-01-01
Purpose: To apply compressed sensing (CS) to in vivo multi-spectral imaging (MSI), which uses additional encoding to avoid MRI artifacts near metal, and demonstrate the feasibility of CS-MSI in post-operative spinal imaging. Materials and Methods: Thirteen subjects referred for spinal MRI were examined using T2-weighted MSI. A CS undersampling factor was first determined using a structural similarity index as a metric for image quality. Next, these fully sampled datasets were retrospectively undersampled using a variable-density random sampling scheme and reconstructed using an iterative soft-thresholding method. The fully- and under-sampled images were compared by using a 5-point scale. Prospectively undersampled CS-MSI data were also acquired from two subjects to ensure that the prospective random sampling did not affect the image quality. Results: A two-fold outer reduction factor was deemed feasible for the spinal datasets. CS-MSI images were shown to be equivalent to or better than the original MSI images in all categories: nerve visualization: p = 0.00018; image artifact: p = 0.00031; image quality: p = 0.0030. No alteration of image quality or T2 contrast was observed from prospectively undersampled CS-MSI. Conclusion: This study shows that the inherently sparse nature of MSI data allows modest undersampling followed by CS reconstruction with no loss of diagnostic quality. PMID:22791572
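Iterative soft-thresholding reconstruction of randomly undersampled data can be illustrated on a toy sparse-recovery problem; the actual CS-MSI reconstruction operates on undersampled k-space with a sparsifying transform, so the random Gaussian measurement matrix and sparse vector below are simplifying assumptions.

# Generic iterative soft-thresholding (ISTA) sketch for recovering a sparse
# signal from random undersampled measurements.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)            # random measurement matrix
y = A @ x_true                                      # undersampled measurements

step = 1.0 / np.linalg.norm(A, 2) ** 2              # 1 / Lipschitz constant of A^T A
lam, x = 0.01, np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x - y)                        # gradient of the data-fidelity term
    z = x - step * grad
    x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft-thresholding step

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))     # relative reconstruction error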
Compressive Sampling based Image Coding for Resource-deficient Visual Communication.
Liu, Xianming; Zhai, Deming; Zhou, Jiantao; Zhang, Xinfeng; Zhao, Debin; Gao, Wen
2016-04-14
In this paper, a new compressive sampling based image coding scheme is developed to achieve competitive coding efficiency at lower encoder computational complexity, while supporting error resilience. This technique is particularly suitable for visual communication with resource-deficient devices. At the encoder, a compact image representation is produced, which is a polyphase down-sampled version of the input image; but the conventional low-pass filter prior to down-sampling is replaced by a local random binary convolution kernel. The pixels of the resulting down-sampled pre-filtered image are local random measurements and placed in the original spatial configuration. The advantages of local random measurements are twofold: 1) they preserve high-frequency image features that would otherwise be discarded by low-pass filtering; and 2) the result remains a conventional image and can therefore be coded by any standardized codec to remove statistical redundancy at larger scales. Moreover, measurements generated by different kernels can be considered as multiple descriptions of the original image and therefore the proposed scheme has the advantage of multiple description coding. At the decoder, a unified sparsity-based soft-decoding technique is developed to recover the original image from received measurements in a framework of compressive sensing. Experimental results demonstrate that the proposed scheme is competitive compared with existing methods, with a unique strength of recovering fine details and sharp edges at low bit-rates.
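The encoder idea, convolving with a local random binary kernel instead of a low-pass filter and then keeping one polyphase component, can be sketched as below; the kernel size, normalization and down-sampling factor are illustrative assumptions, not the paper's exact parameters.

# Sketch: local random binary convolution followed by polyphase down-sampling.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((256, 256))                       # stand-in for the input image
kernel = rng.integers(0, 2, size=(4, 4)).astype(float)
if kernel.sum() == 0:
    kernel[0, 0] = 1.0                               # guard against an all-zero kernel
kernel /= kernel.sum()                               # normalize the binary kernel

filtered = convolve2d(image, kernel, mode="same", boundary="symm")
measurements = filtered[::4, ::4]                    # keep one polyphase component
print(measurements.shape)                            # (64, 64) image of local random measurements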
Estimating the Size of a Large Network and its Communities from a Random Sample
Chen, Lin; Karbasi, Amin; Crawford, Forrest W.
2017-01-01
Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. PMID:28867924
Efficient sampling of complex network with modified random walk strategies
NASA Astrophysics Data System (ADS)
Xie, Yunya; Chang, Shuhua; Zhang, Zhipeng; Zhang, Mi; Yang, Lei
2018-02-01
We present two novel random walk strategies, choosing seed node (CSN) random walk and no-retracing (NR) random walk. Different from the classical random walk sampling, the CSN and NR strategies focus on the influences of the seed node choice and path overlap, respectively. The three random walk samplings are applied in the Erdös-Rényi (ER), Barabási-Albert (BA), Watts-Strogatz (WS), and the weighted USAir networks, respectively. Then, the major properties of the sampled subnets, such as sampling efficiency, degree distributions, average degree and average clustering coefficient, are studied. Similar conclusions can be reached with all three random walk strategies. First, networks with small scale and simple structure are more amenable to sampling. Second, the average degree and the average clustering coefficient of the sampled subnet tend towards the corresponding values of the original network within a limited number of steps. Third, the degree distributions of the subnets are all slightly biased towards the high-degree side. However, the NR strategy performs better for the average clustering coefficient of the subnet. In the real weighted USAir network, salient characteristics such as the larger clustering coefficient and the fluctuation of the degree distribution are reproduced well by these random walk strategies.
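One reading of the no-retracing rule, the walker avoids stepping straight back onto the node it just left unless that is the only option, can be sketched as follows. The exact NR rule in the paper may differ; the BA test network and the subnet statistics printed at the end are illustrative.

# Sketch of a no-retracing random walk sampler on a network.
import random
import networkx as nx

def no_retracing_walk(G, start, steps, seed=0):
    rng = random.Random(seed)
    sampled, prev, current = [start], None, start
    for _ in range(steps):
        neighbours = [v for v in G.neighbors(current) if v != prev] or list(G.neighbors(current))
        prev, current = current, rng.choice(neighbours)   # never retrace unless forced
        sampled.append(current)
    return sampled

G = nx.barabasi_albert_graph(1000, 3, seed=1)             # BA network, as in the experiments
subnet_nodes = set(no_retracing_walk(G, start=0, steps=500))
print(len(subnet_nodes), nx.average_clustering(G.subgraph(subnet_nodes)))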
Wang, Yuhao; Li, Xin; Xu, Kai; Ren, Fengbo; Yu, Hao
2017-04-01
Compressive sensing is widely used in biomedical applications, and the sampling matrix plays a critical role on both quality and power consumption of signal acquisition. It projects a high-dimensional vector of data into a low-dimensional subspace by matrix-vector multiplication. An optimal sampling matrix can ensure accurate data reconstruction and/or high compression ratio. Most existing optimization methods can only produce real-valued embedding matrices that result in large energy consumption during data acquisition. In this paper, we propose an efficient method that finds an optimal Boolean sampling matrix in order to reduce the energy consumption. Compared to random Boolean embedding, our data-driven Boolean sampling matrix can improve the image recovery quality by 9 dB. Moreover, in terms of sampling hardware complexity, it reduces the energy consumption by 4.6× and the silicon area by 1.9× over the data-driven real-valued embedding.
Convenience samples and caregiving research: how generalizable are the findings?
Pruchno, Rachel A; Brill, Jonathan E; Shands, Yvonne; Gordon, Judith R; Genderson, Maureen Wilson; Rose, Miriam; Cartwright, Francine
2008-12-01
We contrast characteristics of respondents recruited using convenience strategies with those of respondents recruited by random digit dial (RDD) methods. We compare sample variances, means, and interrelationships among variables generated from the convenience and RDD samples. Women aged 50 to 64 who work full time and provide care to a community-dwelling older person were recruited using either RDD (N = 55) or convenience methods (N = 87). Telephone interviews were conducted using reliable, valid measures of demographics, characteristics of the care recipient, help provided to the care recipient, evaluations of caregiver-care recipient relationship, and outcomes common to caregiving research. Convenience and RDD samples had similar variances on 68.4% of the examined variables. We found significant mean differences for 63% of the variables examined. Bivariate correlations suggest that one would reach different conclusions using the convenience and RDD sample data sets. Researchers should use convenience samples cautiously, as they may have limited generalizability.
Soil Sampling Techniques For Alabama Grain Fields
NASA Technical Reports Server (NTRS)
Thompson, A. N.; Shaw, J. N.; Mask, P. L.; Touchton, J. T.; Rickman, D.
2003-01-01
Characterizing the spatial variability of nutrients facilitates precision soil sampling. Questions exist regarding the best technique for directed soil sampling based on a priori knowledge of soil and crop patterns. The objective of this study was to evaluate zone delineation techniques for Alabama grain fields to determine which method best minimized the soil test variability. Site one (25.8 ha) and site three (20.0 ha) were located in the Tennessee Valley region, and site two (24.2 ha) was located in the Coastal Plain region of Alabama. Tennessee Valley soils ranged from well drained Rhodic and Typic Paleudults to somewhat poorly drained Aquic Paleudults and Fluventic Dystrudepts. Coastal Plain soils ranged from coarse-loamy Rhodic Kandiudults to loamy Arenic Kandiudults. Soils were sampled by grid soil sampling methods (grid sizes of 0.40 ha and 1 ha) consisting of: 1) twenty composited cores collected randomly throughout each grid (grid-cell sampling) and 2) six composited cores collected randomly from a 3 x 3 m area at the center of each grid (grid-point sampling). Zones were established from 1) an Order 1 Soil Survey, 2) corn (Zea mays L.) yield maps, and 3) airborne remote sensing images. All soil properties were moderately to strongly spatially dependent as per semivariogram analyses. Differences in grid-point and grid-cell soil test values suggested grid-point sampling does not accurately represent grid values. Zones created by soil survey, yield data, and remote sensing images displayed lower coefficients of variation (CV) for soil test values than overall field values, suggesting these techniques group soil test variability. However, few differences were observed between the three zone delineation techniques. Results suggest directed sampling using zone delineation techniques outlined in this paper would result in more efficient soil sampling for these Alabama grain fields.