Sample records for estimating sample sizes

  1. Sample Size Estimation: The Easy Way

    ERIC Educational Resources Information Center

    Weller, Susan C.

    2015-01-01

    This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…

  2. Using known populations of pronghorn to evaluate sampling plans and estimators

    USGS Publications Warehouse

    Kraft, K.M.; Johnson, D.H.; Samuelson, J.M.; Allen, S.H.

    1995-01-01

    Although sampling plans and estimators of abundance have good theoretical properties, their performance in real situations is rarely assessed because true population sizes are unknown. We evaluated widely used sampling plans and estimators of population size on 3 known clustered distributions of pronghorn (Antilocapra americana). Our criteria were accuracy of the estimate, coverage of 95% confidence intervals, and cost. Sampling plans were combinations of sampling intensities (16, 33, and 50%), sample selection (simple random sampling without replacement, systematic sampling, and probability proportional to size sampling with replacement), and stratification. We paired sampling plans with suitable estimators (simple, ratio, and probability proportional to size). We used area of the sampling unit as the auxiliary variable for the ratio and probability proportional to size estimators. All estimators were nearly unbiased, but precision was generally low (overall mean coefficient of variation [CV] = 29). Coverage of 95% confidence intervals was only 89% because of the highly skewed distribution of the pronghorn counts and small sample sizes, especially with stratification. Stratification combined with accurate estimates of optimal stratum sample sizes increased precision, reducing the mean CV from 33 without stratification to 25 with stratification; costs increased 23%. Precise results (mean CV = 13) but poor confidence interval coverage (83%) were obtained with simple and ratio estimators when the allocation scheme included all sampling units in the stratum containing most pronghorn. Although areas of the sampling units varied, ratio estimators and probability proportional to size sampling did not increase precision, possibly because of the clumped distribution of pronghorn. Managers should be cautious in using sampling plans and estimators to estimate abundance of aggregated populations.

  3. Sample Size Calculations for Population Size Estimation Studies Using Multiplier Methods With Respondent-Driven Sampling Surveys.

    PubMed

    Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R

    2017-09-14

    While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. Our objective was to guide the design of multiplier-method population size estimation studies using respondent-driven sampling surveys so as to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from both M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multiplier methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small, so balancing against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 14.09.2017.
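
    The core computation the abstract describes (N = M / P, with random error driven mainly by the variance of P) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (M treated as fixed, a delta-method variance for the ratio, an assumed design effect); the function name and numbers are hypothetical, not figures from the Harare study:

```python
import math

def multiplier_estimate(M, p_hat, n, design_effect=2.0, z=1.96):
    """Multiplier-method population size estimate N = M / P.

    M: unique objects distributed (treated here as known without error);
    p_hat: proportion of survey respondents reporting receipt;
    n: respondent-driven sampling survey sample size;
    design_effect: inflates Var(p_hat) relative to simple random sampling.
    """
    N = M / p_hat
    var_p = design_effect * p_hat * (1 - p_hat) / n
    se_N = M * math.sqrt(var_p) / p_hat**2  # delta method for M / P
    return N, (N - z * se_N, N + z * se_N)

# Hypothetical inputs:
N_hat, ci = multiplier_estimate(M=10000, p_hat=0.40, n=500)
```

    Because the standard error scales with 1/P², a small P inflates the uncertainty sharply, which is consistent with the authors' advice to favor design choices that push P higher.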

  4. The special case of the 2 × 2 table: asymptotic unconditional McNemar test can be used to estimate sample size even for analysis based on GEE.

    PubMed

    Borkhoff, Cornelia M; Johnston, Patrick R; Stephens, Derek; Atenafu, Eshetu

    2015-07-01

    Aligning the method used to estimate sample size with the planned analytic method ensures the sample size needed to achieve the planned power. When using generalized estimating equations (GEE) to analyze a paired binary primary outcome with no covariates, many use an exact McNemar test to calculate sample size. We reviewed the approaches to sample size estimation for paired binary data and compared the sample size estimates on the same numerical examples. We used the hypothesized sample proportions for the 2 × 2 table to calculate the correlation between the marginal proportions and estimate sample size based on GEE. We solved for the inside proportions based on the correlation and the marginal proportions to estimate sample size based on the exact McNemar, asymptotic unconditional McNemar, and asymptotic conditional McNemar tests. The asymptotic unconditional McNemar test is a good approximation of the GEE method of Pan. The exact McNemar test is too conservative and yields unnecessarily larger sample size estimates than all the other methods. In the special case of a 2 × 2 table, even when a GEE approach to binary logistic regression is the planned analytic method, the asymptotic unconditional McNemar test can be used to estimate sample size. We do not recommend using an exact McNemar test. Copyright © 2015 Elsevier Inc. All rights reserved.
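
    The asymptotic unconditional McNemar sample size the authors recommend has a closed-form expression (often attributed to Connor). A minimal sketch, with illustrative discordant cell probabilities that are not taken from the paper:

```python
import math
from statistics import NormalDist

def mcnemar_n(p10, p01, alpha=0.05, power=0.80):
    """Pairs required by the asymptotic unconditional McNemar test
    for discordant cell probabilities p10 and p01 of the 2x2 table."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    psi, delta = p10 + p01, p10 - p01
    n = (z_a * math.sqrt(psi) + z_b * math.sqrt(psi - delta**2))**2 / delta**2
    return math.ceil(n)

n_pairs = mcnemar_n(p10=0.25, p01=0.10)
```

    The exact (binomial) McNemar calculation conditions on the number of discordant pairs and, as the abstract notes, tends to return larger n than this asymptotic formula.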

  5. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

    PubMed

    Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu

    2018-05-30

    One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments, so additional issues must be carefully addressed, including the false discovery rate for multiple statistical tests and the widely varying read counts and dispersions of different genes. To address these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-Seq data. Datasets from previous similar experiments, such as The Cancer Genome Atlas (TCGA), can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in the R language and can be installed from the Bioconductor website. A user-friendly web graphical interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-Seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.

  6. Blinded sample size re-estimation in three-arm trials with 'gold standard' design.

    PubMed

    Mütze, Tobias; Friede, Tim

    2017-10-15

    In this article, we study blinded sample size re-estimation in the 'gold standard' design with an internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials, in which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials, as is expected, because an overestimation of the variance, and thus of the sample size, is generally required for the re-estimation procedure to meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator, such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and can therefore be calculated prior to a trial. Moreover, we prove that sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Estimating population size with correlated sampling unit estimates

    Treesearch

    David C. Bowden; Gary C. White; Alan B. Franklin; Joseph L. Ganey

    2003-01-01

    Finite population sampling theory is useful in estimating total population size (abundance) from abundance estimates of each sampled unit (quadrat). We develop estimators that allow correlated quadrat abundance estimates, even for quadrats in different sampling strata. Correlated quadrat abundance estimates based on mark–recapture or distance sampling methods occur...

  8. Estimation of sample size and testing power (Part 4).

    PubMed

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-01-01

    Sample size estimation is necessary for any experimental or survey research, and an appropriate estimate based on known information and statistical knowledge is of great significance. This article introduces methods of sample size estimation for difference tests with a one-factor, two-level design, covering both quantitative and qualitative data: the sample size estimation formulas, and their realization via the formulas themselves and the POWER procedure of SAS software. In addition, the article presents worked examples, which will help researchers implement the repetition principle during the research design phase.
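
    For quantitative data with a one-factor, two-level design, the standard normal-approximation formula behind such calculations is n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². A minimal sketch with illustrative numbers (the article itself works through formulas and SAS's POWER procedure; this is only the generic formula):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    two-sample comparison of means (one factor, two levels).

    delta: smallest mean difference worth detecting; sigma: common SD.
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power))**2 * (sigma / delta)**2
    return math.ceil(n)

# Detect a difference of half a standard deviation:
n_needed = n_per_group(delta=5.0, sigma=10.0)
```

    The t-distribution refinement used by software such as SAS adds a few subjects on top of this normal approximation for small n.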

  9. Accounting for twin births in sample size calculations for randomised trials.

    PubMed

    Yelland, Lisa N; Sullivan, Thomas R; Collins, Carmel T; Price, David J; McPhee, Andrew J; Lee, Katherine J

    2018-05-04

    Including twins in randomised trials leads to non-independence or clustering in the data. Clustering has important implications for sample size calculations, yet few trials take this into account. Estimates of the intracluster correlation coefficient (ICC), or the correlation between outcomes of twins, are needed to assist with sample size planning. Our aims were to provide ICC estimates for infant outcomes, describe the information that must be specified in order to account for clustering due to twins in sample size calculations, and develop a simple tool for performing sample size calculations for trials including twins. ICCs were estimated for infant outcomes collected in four randomised trials that included twins. The information required to account for clustering due to twins in sample size calculations is described. A tool that calculates the sample size based on this information was developed in Microsoft Excel and in R as a Shiny web app. ICC estimates ranged between -0.12, indicating a weak negative relationship, and 0.98, indicating a strong positive relationship between outcomes of twins. Example calculations illustrate how the ICC estimates and sample size calculator can be used to determine the target sample size for trials including twins. Clustering among outcomes measured on twins should be taken into account in sample size calculations to obtain the desired power. Our ICC estimates and sample size calculator will be useful for designing future trials that include twins. Publication of additional ICCs is needed to further assist with sample size planning for future trials. © 2018 John Wiley & Sons Ltd.
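
    The variance-inflation step the abstract describes can be sketched with a simple design-effect argument: infants with a co-twin contribute correlated observations, so for an equal-weight analysis the variance of the mean grows by approximately 1 + ICC × (proportion of infants who are twins). This is a rough sketch, not the authors' Excel/Shiny tool; the function name and numbers are hypothetical:

```python
import math

def n_with_twins(n_independent, icc, prop_twin_infants):
    """Inflate an independence-based sample size for clustering due to
    twins. Twin infants sit in clusters of size 2 and singletons in
    clusters of size 1, giving an average design effect of
    1 + icc * prop_twin_infants for an equal-weight analysis."""
    deff = 1 + icc * prop_twin_infants
    return math.ceil(n_independent * deff)

# e.g. 300 infants needed under independence, ICC 0.5, half are twins:
n_adj = n_with_twins(n_independent=300, icc=0.5, prop_twin_infants=0.5)
```

    With a negative ICC, as the abstract reports for some outcomes, the design effect drops below 1 and the required sample size shrinks.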

  10. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    ERIC Educational Resources Information Center

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  11. Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

    PubMed

    Lu, Kaifeng

    2016-05-01

    We consider blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for evaluating the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and the final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level that ensures type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
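
    The blinded re-estimation step itself is simple to sketch: pool the interim data without treatment labels, take the one-sample variance, and plug it into a standard two-sample formula. A minimal illustration with hypothetical names (note that under the alternative, this lumped variance overestimates σ² by δ²/4 with 1:1 allocation, one reason the procedure tends to overpower):

```python
import math
from statistics import NormalDist, variance

def blinded_reestimated_n(pooled_interim, delta, alpha=0.05, power=0.90):
    """Blinded sample size re-estimation with the simple one-sample
    (lumped) variance estimator: the variance of the pooled interim
    data, computed without treatment labels, replaces sigma^2 in the
    normal-approximation two-sample formula. Returns n per arm."""
    z = NormalDist().inv_cdf
    s2 = variance(pooled_interim)  # blinded: labels are never used
    n = 2 * (z(1 - alpha / 2) + z(power))**2 * s2 / delta**2
    return math.ceil(n)
```

    The unblinded alternatives the paper compares against would instead estimate the within-group variance using the treatment labels.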

  12. Estimating the size of hidden populations using respondent-driven sampling data: Case examples from Morocco

    PubMed Central

    Johnston, Lisa G; McLaughlin, Katherine R; Rhilani, Houssine El; Latifi, Amina; Toufik, Abdalla; Bennani, Aziza; Alami, Kamal; Elomari, Boutaina; Handcock, Mark S

    2015-01-01

    Background: Respondent-driven sampling is used worldwide to estimate the population prevalence of characteristics such as HIV/AIDS and associated risk factors in hard-to-reach populations. Estimating the total size of these populations is of great interest to national and international organizations; however, reliable measures of population size often do not exist. Methods: Successive Sampling-Population Size Estimation (SS-PSE), along with network size imputation, allows population size estimates to be made without relying on separate studies or additional data (as in network scale-up, multiplier and capture-recapture methods), which may be biased. Results: Ten population size estimates were calculated for people who inject drugs, female sex workers, men who have sex with men, and migrants from sub-Saharan Africa in six different cities in Morocco. SS-PSE estimates fell within or very close to the likely values provided by experts and the estimates from previous studies using other methods. Conclusions: SS-PSE is an effective method for estimating the size of hard-to-reach populations that leverages important information within respondent-driven sampling studies. The addition of a network size imputation method helps to smooth network sizes, allowing for more accurate results. However, caution should be used, particularly when there is reason to believe that clustered subgroups may exist within the population of interest or when the sample size is small in relation to the population. PMID:26258908

  13. Effects of sample size on estimates of population growth rates calculated with matrix models.

    PubMed

    Fiske, Ian J; Bruna, Emilio M; Bolker, Benjamin M

    2008-08-28

    Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda: Jensen's inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated whether sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more realistic inverse-J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
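
    The sampling-variance effect described here is easy to reproduce in miniature: estimate a survival rate from a finite binomial sample, rebuild the projection matrix, and look at the dominant eigenvalue across replicates. A toy sketch with a hypothetical 2-stage matrix (not the authors' 3842-plant dataset):

```python
import numpy as np

rng = np.random.default_rng(42)

def mean_lambda_hat(s, g, f, n_sampled, reps=2000):
    """Monte Carlo sketch: estimate stage survival s from n_sampled
    binomial trials, rebuild a hypothetical 2-stage projection matrix
    (growth g, fecundity f), and average the dominant eigenvalue."""
    lams = []
    for _ in range(reps):
        s_hat = rng.binomial(n_sampled, s) / n_sampled
        A = np.array([[s_hat * (1 - g), f],
                      [s_hat * g,       s_hat]])
        lams.append(max(abs(np.linalg.eigvals(A))))
    return float(np.mean(lams))

lam_small = mean_lambda_hat(0.5, 0.3, 1.2, n_sampled=10)    # few individuals
lam_large = mean_lambda_hat(0.5, 0.3, 1.2, n_sampled=1000)  # many individuals
```

    With many individuals the average estimate settles near the true lambda (about 0.856 for these toy parameters); with few, sampling variance spreads the estimates and, because lambda is a nonlinear function of the vital rates, Jensen's inequality shifts their mean.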

  14. Effects of tree-to-tree variations on sap flux-based transpiration estimates in a forested watershed

    NASA Astrophysics Data System (ADS)

    Kume, Tomonori; Tsuruta, Kenji; Komatsu, Hikaru; Kumagai, Tomo'omi; Higashi, Naoko; Shinohara, Yoshinori; Otsuki, Kyoichi

    2010-05-01

    To estimate forest stand-scale water use, we assessed how sample size affects confidence in stand-scale transpiration (E) estimates calculated from sap flux (Fd) and sapwood area (AS_tree) measurements of individual trees. In a Japanese cypress plantation, we measured Fd and AS_tree in all trees (n = 58) within a 20 × 20 m study plot, which was divided into four 10 × 10 m subplots. We calculated E from stand AS_tree (AS_stand) and mean stand Fd (JS) values. Using Monte Carlo analyses, we examined potential errors associated with sample sizes in E, AS_stand, and JS by using the original AS_tree and Fd data sets. Consequently, we defined optimal sample sizes of 10 and 15 for AS_stand and JS estimates, respectively, in the 20 × 20 m plot. Sample sizes greater than the optimal sample sizes did not decrease potential errors. The optimal sample sizes for JS changed according to plot size (e.g., 10 × 10 m and 10 × 20 m), while the optimal sample sizes for AS_stand did not. Likewise, the optimal sample sizes for JS did not change under different vapor pressure deficit conditions. In terms of E estimates, these results suggest that tree-to-tree variations in Fd differ among plots, and that the plot size used to capture tree-to-tree variations in Fd is an important factor. This study also discusses planning balanced sampling designs to extrapolate stand-scale estimates to catchment-scale estimates.

  15. Species richness in soil bacterial communities: a proposed approach to overcome sample size bias.

    PubMed

    Youssef, Noha H; Elshahed, Mostafa S

    2008-09-01

    Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that, regardless of the approach utilized, species richness estimates depend on the size of the analyzed clone library. Here we propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (maximum likelihood-based and rarefaction curve-based) and non-parametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates increased with library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the "true" species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size-unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, respectively, compared to bias-uncorrected values of 15,009, 11,913, and 20,909.

  16. Improving the accuracy of livestock distribution estimates through spatial interpolation.

    PubMed

    Bryssinckx, Ward; Ducheyne, Els; Muhwezi, Bernard; Godfrey, Sunday; Mintiens, Koen; Leirs, Herwig; Hendrickx, Guy

    2012-11-01

    Animal distribution maps serve many purposes, such as estimating the transmission risk of zoonotic pathogens to both animals and humans. The reliability and usability of such maps is highly dependent on the quality of the input data. However, decisions on how to perform livestock surveys are often based on previous work without considering the possible consequences. A better understanding of the impact of different sample designs and processing steps on the accuracy of livestock distribution estimates was acquired through iterative experiments using detailed survey data. The importance of sample size, sample design and aggregation is demonstrated, and spatial interpolation is presented as a potential way to improve cattle number estimates. As expected, results show that an increasing sample size increased the precision of cattle number estimates, but these improvements were mainly seen when the initial sample size was relatively low (e.g. a median relative error decrease of 0.04% per sampled parish for sample sizes below 500 parishes). For higher sample sizes, the added value of further increasing the number of samples declined rapidly (e.g. a median relative error decrease of 0.01% per sampled parish for sample sizes above 500 parishes). When a two-stage stratified sample design was applied to yield more evenly distributed samples, accuracy levels were higher for low sample densities and stabilised at lower sample sizes compared to one-stage stratified sampling. Aggregating the resulting cattle number estimates yielded significantly more accurate results because under- and over-estimates average out (e.g. when aggregating cattle number estimates from subcounty to district level, P <0.009 based on a sample of 2,077 parishes using one-stage stratified samples). During aggregation, area-weighted mean values were assigned to higher administrative unit levels. However, when this step is preceded by spatial interpolation to fill in missing values in non-sampled areas, accuracy improves remarkably. This is especially true for low sample sizes and spatially evenly distributed samples (e.g. P <0.001 for a sample of 170 parishes using one-stage stratified sampling and aggregation at district level). Whether the same observations apply at a lower spatial scale should be further investigated.

  17. Influences of sampling size and pattern on the uncertainty of correlation estimation between soil water content and its influencing factors

    NASA Astrophysics Data System (ADS)

    Lai, Xiaoming; Zhu, Qing; Zhou, Zhiwen; Liao, Kaihua

    2017-12-01

    In this study, seven random-combination sampling strategies were applied to investigate the uncertainties in estimating the hillslope mean soil water content (SWC) and the correlation coefficients between SWC and soil/terrain properties on a tea + bamboo hillslope. One of the sampling strategies is global random sampling; the other six are stratified random sampling on the top, middle, toe, top + mid, top + toe and mid + toe slope positions. For each sampling strategy, sample sizes were gradually reduced, and each sampling size contained 3000 replicates. Under each sampling size of each sampling strategy, the relative errors (REs) and coefficients of variation (CVs) of the estimated hillslope mean SWC and correlation coefficients were calculated to quantify accuracy and uncertainty. The results showed that the uncertainty of the estimates decreased as the sampling size increased. However, larger sample sizes were required to reduce the uncertainty in correlation coefficient estimation than in hillslope mean SWC estimation. Under global random sampling, 12 randomly sampled sites on this hillslope were adequate to estimate the hillslope mean SWC with RE and CV ≤10%. However, at least 72 randomly sampled sites were needed to ensure estimated correlation coefficients with REs and CVs ≤10%. Of all the sampling strategies, reducing sampling sites on the middle slope had the least influence on the estimation of hillslope mean SWC and correlation coefficients. Under this strategy, 60 sites (10 on the middle slope and 50 on the top and toe slopes) were enough to ensure estimated correlation coefficients with REs and CVs ≤10%. This suggests that when designing SWC sampling, the proportion of sites on the middle slope can be reduced to 16.7% of the total number of sites. The findings of this study will be useful for optimal SWC sampling design.
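
    The replicated-subsampling procedure the study uses can be sketched generically: draw many random subsets of a given size, compute the estimate of interest, and summarize its spread. A toy version for the hillslope-mean case, with synthetic data standing in for the tea + bamboo hillslope measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_cv(values, n_sites, reps=3000):
    """Monte Carlo sketch of the study's procedure: repeatedly draw
    n_sites values without replacement, estimate the hillslope mean,
    and return the coefficient of variation (%) of those estimates."""
    means = [rng.choice(values, size=n_sites, replace=False).mean()
             for _ in range(reps)]
    return 100 * float(np.std(means)) / float(np.mean(means))

# Synthetic soil water contents (%) for 100 candidate sites:
swc = rng.normal(loc=30, scale=5, size=100)
```

    Running this for increasing n_sites reproduces the qualitative result above: the CV of the estimated mean falls steadily as more sites are sampled. Correlation coefficients, being functions of two variables, need substantially larger subsamples for the same precision.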

  18. Estimation of sample size and testing power (part 5).

    PubMed

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-02-01

    Estimation of sample size and testing power is an important component of research design. This article introduces methods for sample size and testing power estimation of difference tests for quantitative and qualitative data with the single-group design, the paired design or the crossover design. Specifically, it introduces the corresponding formulas, their realization via the formulas themselves and the POWER procedure of SAS software, and elaborates with examples, which will benefit researchers in implementing the repetition principle.

  19. Detecting spatial structures in throughfall data: The effect of extent, sample size, sampling design, and variogram estimation method

    NASA Astrophysics Data System (ADS)

    Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander

    2016-09-01

    In the last decades, an increasing number of studies analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling) and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100%. Given that most previous throughfall studies relied on method-of-moments variogram estimation and sample sizes ≪200, currently available data are prone to large uncertainties.

  20. A computer program for sample size computations for banding studies

    USGS Publications Warehouse

    Wilson, K.R.; Nichols, J.D.; Hines, J.E.

    1989-01-01

    Sample sizes necessary for estimating survival rates of banded birds, adults and young, are derived based on specified levels of precision. The banding study can be new or ongoing. The desired coefficient of variation (CV) for annual survival estimates, the CV for mean annual survival estimates, and the length of the study must be specified to compute sample sizes. A computer program is available for computation of the sample sizes, and a description of the input and output is provided.

  21. Performance and separation occurrence of binary probit regression estimators using the maximum likelihood method and Firth's approach under different sample sizes

    NASA Astrophysics Data System (ADS)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of a binary probit regression model are commonly estimated by the maximum likelihood estimation (MLE) method. However, MLE has a limitation when the binary data contain separation. Separation is the condition in which one or more independent variables exactly predict the categories of the binary response. It causes the MLE estimators to fail to converge, so that they cannot be used in modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims: first, to compare the chance of separation occurring in binary probit regression between the MLE method and Firth's approach; second, to compare the performance of the estimators obtained by the two methods using the RMSE criterion. Both comparisons were performed by simulation under different sample sizes. The results showed that the chance of separation occurring under the MLE method is higher than under Firth's approach for small sample sizes. For larger sample sizes, the probability decreases and is nearly identical between the two methods. Meanwhile, Firth's estimators have smaller RMSE than the MLEs, especially for smaller sample sizes; for larger sample sizes, the RMSEs differ little. In other words, Firth's estimators outperform the MLE estimators.

  2. Sample Size and Item Parameter Estimation Precision When Utilizing the One-Parameter "Rasch" Model

    ERIC Educational Resources Information Center

    Custer, Michael

    2015-01-01

    This study examines the relationship between sample size and item parameter estimation precision when utilizing the one-parameter model. Item parameter estimates are examined relative to "true" values by evaluating the decline in root mean squared deviation (RMSD) and the number of outliers as sample size increases. This occurs across…

  3. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    The bootstrap is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial from a relatively small sample. In this paper, bootstrap sample size estimation for comparing two parallel-design arms with continuous data is presented for various test types (inequality, non-inferiority, superiority, and equivalence). Sample sizes for the identical data are also calculated with mathematical formulas that assume a normal distribution. The power difference between the two calculation methods is acceptably small for all test types, showing that the bootstrap procedure is a credible technique for sample size estimation. We then compared the powers determined by the two methods on data that violate the normal distribution assumption; to accommodate this feature, the nonparametric Wilcoxon test was applied to compare the two groups during bootstrap power estimation. Here the power estimated by the normal distribution-based formula is far larger than the bootstrap power at every sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation from the beginning, and that each bootstrap sample be analyzed with the same statistical method as will be used in the subsequent analysis, provided historical data are available that represent well the population to which the proposed trial is intended to extrapolate.
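The bootstrap power loop the paper relies on can be sketched as follows. This is a minimal illustration with a t-test on hypothetical pilot data, not the paper's procedure; for non-normal data, `ttest_ind` would be swapped for `scipy.stats.mannwhitneyu` or another nonparametric test, exactly as the abstract recommends.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def bootstrap_power(pilot_a, pilot_b, n_per_arm, alpha=0.05, n_boot=2000):
    """Estimate the power of a two-sample test at a candidate per-arm sample
    size by resampling (with replacement) from pilot data in each arm."""
    hits = 0
    for _ in range(n_boot):
        a = rng.choice(pilot_a, size=n_per_arm, replace=True)
        b = rng.choice(pilot_b, size=n_per_arm, replace=True)
        if ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_boot

# Hypothetical pilot data with a true mean difference of about one SD
pilot_a = rng.normal(0.0, 1.0, 30)
pilot_b = rng.normal(1.0, 1.0, 30)
# Sweep n_per_arm upward until the estimated power reaches the target (e.g. 0.8)
```

The sample size is then the smallest `n_per_arm` whose estimated power clears the target, which is the search the formulas shortcut under normality.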

  4. Detecting spatial structures in throughfall data: the effect of extent, sample size, sampling design, and variogram estimation method

    NASA Astrophysics Data System (ADS)

    Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander

    2016-04-01

    In the last three decades, an increasing number of studies analyzed spatial patterns in throughfall to investigate the consequences of rainfall redistribution for biogeochemical and hydrological processes in forests. In the majority of cases, variograms were used to characterize the spatial properties of the throughfall data. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and an appropriate layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation methods on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with heavy outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling), and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. 
Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the numbers recommended by studies dealing with Gaussian data by up to 100%. Given that most previous throughfall studies relied on method-of-moments variogram estimation and sample sizes well below 200, our current knowledge about throughfall spatial variability stands on shaky ground.
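The method-of-moments estimator the study benchmarks can be sketched directly (Matheron form; the residual maximum likelihood alternative the authors favor is not shown here, and the binning scheme is an assumption):

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Method-of-moments (Matheron) estimator:
    gamma(h) = (1 / (2 N(h))) * sum of (z_i - z_j)^2 over the N(h) pairs
    whose separation distance falls in lag bin h."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    dists, sqdiffs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            sqdiffs.append((values[i] - values[j]) ** 2)
    dists, sqdiffs = np.array(dists), np.array(sqdiffs)
    gammas = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dists >= lo) & (dists < hi)
        gammas.append(0.5 * sqdiffs[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(gammas)
```

Because each squared difference enters the average unweighted, a single heavy outlier inflates every lag bin it touches, which is why the study finds this estimator needs larger samples on skewed throughfall data.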

  5. Effects of Calibration Sample Size and Item Bank Size on Ability Estimation in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Sahin, Alper; Weiss, David J.

    2015-01-01

    This study aimed to investigate the effects of calibration sample size and item bank size on examinee ability estimation in computerized adaptive testing (CAT). For this purpose, a 500-item bank pre-calibrated using the three-parameter logistic model with 10,000 examinees was simulated. Calibration samples of varying sizes (150, 250, 350, 500,…

  6. Simulation analyses of space use: Home range estimates, variability, and sample size

    USGS Publications Warehouse

    Bekoff, Marc; Mech, L. David

    1984-01-01

    Simulations of space use by animals were run to determine the relationship among home range area estimates, variability, and sample size (number of locations). As sample size increased, home range size increased asymptotically, whereas variability among mean home range area estimates generated by multiple simulations for the same sample size decreased. Our results suggest that field workers should obtain between 100 and 200 locations to estimate home range area reliably. In some cases, this suggested guideline is higher than values found in the few published studies that address the relationship between home range area and number of locations. Sampling differences for small species occupying relatively small home ranges indicate that fewer locations may suffice for a reliable estimate of home range. Intraspecific variability in social status (group member, loner, resident, transient), age, sex, reproductive condition, and food resources also must be considered, as do season, habitat, and differences in sampling and analytical methods. Comparative data are still needed.
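The asymptotic rise of estimated area with number of locations is easy to reproduce in a minimal simulation. The sketch below uses the minimum convex polygon as the home range estimator on hypothetical relocations; the record does not specify the original estimator, so this is an illustrative assumption:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(42)

def mcp_area(points):
    """Area of the minimum convex polygon around relocation points
    (a simple, classical home range estimator).
    For 2-D input, ConvexHull.volume is the enclosed area."""
    return ConvexHull(points).volume

# Hypothetical relocations from a bivariate normal utilization distribution
fixes = rng.normal(0.0, 1.0, size=(1000, 2))

# Area estimated from the first n locations rises toward an asymptote,
# mirroring the simulation result described in the record
areas = [mcp_area(fixes[:n]) for n in (10, 50, 100, 200, 1000)]
assert all(a <= b for a, b in zip(areas, areas[1:]))
```

Because the point sets are nested, the hull area can only grow with n; the practical question the paper addresses is where the curve flattens enough to stop sampling.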

  7. Estimation After a Group Sequential Trial.

    PubMed

    Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Kenward, Michael G; Tsiatis, Anastasios A; Davidian, Marie; Verbeke, Geert

    2015-10-01

    Group sequential trials are one important instance of studies for which the sample size is not fixed a priori but rather takes one of a finite set of pre-specified values, dependent on the observed data. Much work has been devoted to the inferential consequences of this design feature. Molenberghs et al (2012) and Milanzi et al (2012) reviewed and extended the existing literature, focusing on a collection of seemingly disparate, but related, settings, namely completely random sample sizes, group sequential studies with deterministic and random stopping rules, incomplete data, and random cluster sizes. They showed that the ordinary sample average is a viable option for estimation following a group sequential trial, for a wide class of stopping rules and for random outcomes with a distribution in the exponential family. Their results are somewhat surprising in the sense that the sample average is not optimal, and further, there does not exist an optimal, or even unbiased, linear estimator. However, the sample average is asymptotically unbiased, both conditionally upon the observed sample size and marginalized over it. By exploiting ignorability they showed that the sample average is the conventional maximum likelihood estimator. They also showed that a conditional maximum likelihood estimator is finite-sample unbiased, but is less efficient than the sample average and has a larger mean squared error. Asymptotically, the sample average and the conditional maximum likelihood estimator are equivalent. This previous work is restricted, however, to the situation in which the random sample size can take only two values, N = n or N = 2n. In this paper, we consider the more practically useful setting of sample sizes in the finite set {n_1, n_2, …, n_L}. It is shown that the sample average is then a justifiable estimator, in the sense that it follows from joint likelihood estimation, and it is consistent and asymptotically unbiased.
We also show why simulations can give the false impression of bias in the sample average when considered conditional upon the sample size. The consequence is that no corrections need to be made to estimators following sequential trials. When small-sample bias is of concern, the conditional likelihood estimator provides a relatively straightforward modification to the sample average. Finally, it is shown that classical likelihood-based standard errors and confidence intervals can be applied, obviating the need for technical corrections.

  8. Using known map category marginal frequencies to improve estimates of thematic map accuracy

    NASA Technical Reports Server (NTRS)

    Card, D. H.

    1982-01-01

    By means of two simple sampling plans suggested in the accuracy-assessment literature, it is shown how one can use knowledge of map-category relative sizes to improve estimates of various probabilities. The fact that maximum likelihood estimates of cell probabilities for the simple random sampling and map category-stratified sampling were identical has permitted a unified treatment of the contingency-table analysis. A rigorous analysis of the effect of sampling independently within map categories is made possible by results for the stratified case. It is noted that such matters as optimal sample size selection for the achievement of a desired level of precision in various estimators are irrelevant, since the estimators derived are valid irrespective of how sample sizes are chosen.
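The weighting idea this record describes can be sketched as follows; the function name and inputs are illustrative, but the computation is the standard category-weighted accuracy that exploits known map-category marginal proportions:

```python
def overall_accuracy(map_props, n_sampled, n_correct):
    """Overall thematic map accuracy using known map-category marginal
    proportions (Card 1982-style weighting, sketched):

        acc = sum_k w_k * (correct_k / sampled_k)

    map_props : known relative size w_k of each map category (sums to 1)
    n_sampled : reference samples checked per category
    n_correct : of those, how many were correctly mapped
    """
    return sum(w * c / n for w, n, c in zip(map_props, n_sampled, n_correct))

# A small category sampled as heavily as a large one no longer distorts
# the overall figure, because each rate is weighted by its true share.
```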

  9. A cautionary note on Bayesian estimation of population size by removal sampling with diffuse priors.

    PubMed

    Bord, Séverine; Bioche, Christèle; Druilhet, Pierre

    2018-05-01

    We consider the problem of estimating a population size by removal sampling when the sampling rate is unknown. Bayesian methods are now widespread and allow prior knowledge to be included in the analysis. However, we show that Bayes estimates based on default improper priors lead to improper posteriors or infinite estimates. Similarly, weakly informative priors give unstable estimators that are sensitive to the choice of hyperparameters. By examining the likelihood, we show that population size estimates can be stabilized by penalizing small values of the sampling rate or large values of the population size. Based on theoretical results and simulation studies, we propose some recommendations on the choice of the prior. We then apply our results to real datasets. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
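The instability the authors penalize is visible already in the classical two-pass removal (Zippin-type) moment estimator, sketched below as a hedged stand-in for intuition, not as their Bayesian method:

```python
def zippin_two_pass(c1, c2):
    """Two-pass removal moment estimator: with capture probability p each
    pass, E[c1] = N*p and E[c2] = N*(1-p)*p, giving

        p-hat = (c1 - c2) / c1,    N-hat = c1^2 / (c1 - c2).

    N-hat blows up as c2 approaches c1 (i.e. as the estimated sampling
    rate approaches zero) -- the same pathology that makes diffuse priors
    on (N, p) yield improper posteriors or infinite estimates."""
    if c2 >= c1:
        raise ValueError("no finite estimate when c2 >= c1")
    p = (c1 - c2) / c1
    N = c1 ** 2 / (c1 - c2)
    return N, p

# e.g. catches of 60 then 30 imply p = 0.5 and N = 120
```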

  10. Sampling system for wheat (Triticum aestivum L) area estimation using digital LANDSAT MSS data and aerial photographs. [Brazil

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Moreira, M. A.; Chen, S. C.; Batista, G. T.

    1984-01-01

    A procedure is developed to estimate wheat (Triticum aestivum L) area using a sampling technique based on aerial photographs and digital LANDSAT MSS data. Aerial photographs covering 720 square km are visually analyzed. To estimate wheat area, a regression approach is applied using different sample sizes and various sampling units. As the size of the sampling unit decreases, the percentage of sampled area required to obtain similar estimation performance also decreases. The lowest percentage of area sampled that still yields wheat estimates of relatively high precision and accuracy through regression estimation is 13.90%, with 10 square km as the sampling unit. Wheat area estimation using only aerial photographs is less precise and accurate than that obtained by regression estimation.

  11. Sample size considerations for paired experimental design with incomplete observations of continuous outcomes.

    PubMed

    Zhu, Hong; Xu, Xiaohan; Ahn, Chul

    2017-01-01

    Paired experimental design is widely used in clinical and health behavioral studies, where each study unit contributes a pair of observations. Investigators often encounter incomplete observations of paired outcomes in the data collected: some study units contribute complete pairs of observations, while others contribute only pre- or post-intervention observations. Statistical inference for paired experimental design with incomplete observations of continuous outcomes has been extensively studied in the literature. However, sample size methods for such study designs are sparsely available. We derive a closed-form sample size formula based on the generalized estimating equation approach by treating the incomplete observations as missing data in a linear model. The proposed method properly accounts for the mixed structure of the observed data: a combination of paired and unpaired outcomes. The sample size formula is flexible enough to accommodate different missing patterns, magnitudes of missingness, and correlation parameter values. We demonstrate that under complete observations, the proposed generalized estimating equation sample size estimate is the same as that based on the paired t-test. In the presence of missing data, the proposed method leads to a more accurate sample size estimate compared with the crude adjustment. Simulation studies are conducted to evaluate the finite-sample performance of the generalized estimating equation sample size formula. A real application example is presented for illustration.
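The abstract notes that with complete observations the GEE sample size reduces to the paired t-test quantity. That baseline can be sketched with the usual normal approximation (the paper's missing-data adjustment is not reproduced here):

```python
import math
from scipy.stats import norm

def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.80):
    """Pairs needed for a two-sided paired t-test, normal approximation:

        n = ((z_{1-alpha/2} + z_{1-beta}) * sd_diff / delta)^2

    delta   : mean within-pair difference to detect
    sd_diff : standard deviation of the within-pair differences
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil((z * sd_diff / delta) ** 2)

# e.g. detecting a half-SD mean difference at 80% power needs 32 pairs
# (a t-based calculation would add a couple more)
```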

  12. Sampling for area estimation: A comparison of full-frame sampling with the sample segment approach. [Kansas

    NASA Technical Reports Server (NTRS)

    Hixson, M. M.; Bauer, M. E.; Davis, B. J.

    1979-01-01

    The effect of sampling on the accuracy (precision and bias) of crop area estimates made from classifications of LANDSAT MSS data was investigated. Full-frame classifications of wheat and non-wheat for eighty counties in Kansas were repetitively sampled to simulate alternative sampling plans. Four sampling schemes involving different numbers of samples and sampling units of different sizes were evaluated. The precision of the wheat area estimates increased as segment size decreased and the number of segments increased. Although the average bias associated with the various sampling schemes was not significantly different, the maximum absolute bias was directly related to sampling unit size.

  13. Sample size requirements for estimating effective dose from computed tomography using solid-state metal-oxide-semiconductor field-effect transistor dosimetry

    PubMed Central

    Trattner, Sigal; Cheng, Bin; Pieniazek, Radoslaw L.; Hoffmann, Udo; Douglas, Pamela S.; Einstein, Andrew J.

    2014-01-01

    Purpose: Effective dose (ED) is a widely used metric for comparing ionizing radiation burden between different imaging modalities, scanners, and scan protocols. In computed tomography (CT), ED can be estimated by performing scans on an anthropomorphic phantom in which metal-oxide-semiconductor field-effect transistor (MOSFET) solid-state dosimeters have been placed to enable organ dose measurements. Here a statistical framework is established to determine the sample size (number of scans) needed for estimating ED to a desired precision and confidence, for a particular scanner and scan protocol, subject to practical limitations. Methods: The statistical scheme involves solving equations which minimize the sample size required for estimating ED to desired precision and confidence. It is subject to a constrained variation of the estimated ED and solved using the Lagrange multiplier method. The scheme incorporates measurement variation introduced both by MOSFET calibration, and by variation in MOSFET readings between repeated CT scans. Sample size requirements are illustrated on cardiac, chest, and abdomen–pelvis CT scans performed on a 320-row scanner and chest CT performed on a 16-row scanner. Results: Sample sizes for estimating ED vary considerably between scanners and protocols. Sample size increases as the required precision or confidence is higher and also as the anticipated ED is lower. For example, for a helical chest protocol, for 95% confidence and 5% precision for the ED, 30 measurements are required on the 320-row scanner and 11 on the 16-row scanner when the anticipated ED is 4 mSv; these sample sizes are 5 and 2, respectively, when the anticipated ED is 10 mSv. Conclusions: Applying the suggested scheme, it was found that even at modest sample sizes, it is feasible to estimate ED with high precision and a high degree of confidence. 
As CT technology develops enabling ED to be lowered, more MOSFET measurements are needed to estimate ED with the same precision and confidence. PMID:24694150
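Stripped of the paper's MOSFET-calibration variance and Lagrange machinery, the precision-and-confidence framing corresponds to the textbook sample size for estimating a mean to a relative precision, which can be sketched as:

```python
import math
from scipy.stats import norm

def n_for_relative_precision(cv, precision, confidence=0.95):
    """Repeated measurements needed so the estimated mean falls within a
    given relative precision of the true mean with given confidence
    (simplified; not the paper's constrained-optimization scheme):

        n = (z_{1-alpha/2} * CV / precision)^2

    cv        : coefficient of variation of a single measurement
    precision : relative half-width, e.g. 0.05 for 5%
    """
    z = norm.ppf(0.5 + confidence / 2)
    return math.ceil((z * cv / precision) ** 2)
```

Because CV = SD/mean grows as the anticipated ED shrinks, this simple form already reproduces the paper's qualitative finding that lower doses demand more scans.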

  14. Estimating numbers of females with cubs-of-the-year in the Yellowstone grizzly bear population

    USGS Publications Warehouse

    Keating, K.A.; Schwartz, C.C.; Haroldson, M.A.; Moody, D.

    2001-01-01

    For grizzly bears (Ursus arctos horribilis) in the Greater Yellowstone Ecosystem (GYE), minimum population size and allowable numbers of human-caused mortalities have been calculated as a function of the number of unique females with cubs-of-the-year (FCUB) seen during a 3-year period. This approach underestimates the total number of FCUB, thereby biasing estimates of population size and sustainable mortality. Also, it does not permit calculation of valid confidence bounds. Many statistical methods can resolve or mitigate these problems, but there is no universal best method. Instead, relative performances of different methods can vary with population size, sample size, and degree of heterogeneity among sighting probabilities for individual animals. We compared 7 nonparametric estimators, using Monte Carlo techniques to assess performances over the range of sampling conditions deemed plausible for the Yellowstone population. Our goal was to estimate the number of FCUB present in the population each year. Our evaluation differed from previous comparisons of such estimators by including sample coverage methods and by treating individual sightings, rather than sample periods, as the sample unit. Consequently, our conclusions also differ from earlier studies. Recommendations regarding estimators and necessary sample sizes are presented, together with estimates of annual numbers of FCUB in the Yellowstone population with bootstrap confidence bounds.
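The record does not name the seven estimators compared. As a hedged illustration of the nonparametric family involved (not necessarily one of the seven), a Chao1-type lower bound from sighting-frequency data looks like this:

```python
def chao1(s_obs, f1, f2):
    """Chao1-type nonparametric lower bound on the number of distinct
    individuals (here, females with cubs) from sighting frequencies:

        N-hat = S_obs + f1^2 / (2 * f2)

    s_obs : number of distinct individuals actually seen
    f1    : number seen exactly once
    f2    : number seen exactly twice
    """
    if f2 == 0:
        raise ValueError("use the bias-corrected form f1*(f1-1)/(2*(f2+1))")
    return s_obs + f1 * f1 / (2 * f2)

# Many singletons relative to doubletons signal many unseen individuals,
# which is why counting only unique sightings biases the total low.
```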

  15. Estimation after classification using lot quality assurance sampling: corrections for curtailed sampling with application to evaluating polio vaccination campaigns.

    PubMed

    Olives, Casey; Valadez, Joseph J; Pagano, Marcello

    2014-03-01

    To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation, and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage when using two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design. Proposed estimators show no bias. Clustering does not affect the bias of these estimators. Across simulations, standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision; curtailed LQAS designs further reduce the sample size when coverage is high. Results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation with curtailed designs is not only possible, but that these designs also reduce the sample size. © 2014 John Wiley & Sons Ltd.

  16. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    PubMed

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
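The "14 per cent more clusters" figure comes from the paper's PQL-specific derivation. A cruder, commonly used approximation conveys the same mechanics: the design effect under unequal cluster sizes (an Eldridge-style formula, assumed here, not the paper's own):

```python
def deff_varying_clusters(mean_size, cv_size, icc):
    """Approximate design effect for a cluster randomized trial with
    unequal cluster sizes:

        DEFF = 1 + ((1 + CV^2) * m - 1) * ICC

    mean_size : mean cluster size m
    cv_size   : coefficient of variation of cluster sizes (0 if equal)
    icc       : intracluster correlation coefficient
    """
    return 1 + ((1 + cv_size ** 2) * mean_size - 1) * icc

# The ratio DEFF(cv > 0) / DEFF(cv = 0) approximates the factor by which
# the number of clusters must grow to repair the efficiency loss.
```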

  17. Influence of sampling window size and orientation on parafoveal cone packing density

    PubMed Central

    Lombardo, Marco; Serrao, Sebastiano; Ducoli, Pietro; Lombardo, Giuseppe

    2013-01-01

    We assessed the agreement between sampling windows of different size and orientation on packing density estimates in images of the parafoveal cone mosaic acquired using a flood-illumination adaptive optics retinal camera. Horizontally and vertically oriented sampling windows of different sizes (320×160 µm, 160×80 µm, and 80×40 µm) were selected in two retinal locations along the horizontal meridian in one eye of ten subjects. At each location, cone density tended to decline with decreasing sampling area. Although the differences in cone density estimates were not statistically significant, Bland-Altman plots showed that the agreement between cone densities estimated under the different sampling window conditions was moderate. The percentage of the preferred packing arrangements of cones by Voronoi tiles was slightly affected by window size and orientation. The results underscore the importance of specifying the size and orientation of the sampling window used to derive cone metric estimates, to facilitate comparison between studies. PMID:24009995

  18. Monitoring landscape metrics by point sampling: accuracy in estimating Shannon's diversity and edge density.

    PubMed

    Ramezani, Habib; Holm, Sören; Allard, Anna; Ståhl, Göran

    2010-05-01

    Environmental monitoring of landscapes is of increasing interest. To quantify landscape patterns, a number of metrics are used, of which Shannon's diversity, edge length, and density are studied here. As an alternative to complete mapping, point sampling was applied to estimate the metrics for already mapped landscapes selected from the National Inventory of Landscapes in Sweden (NILS). Monte-Carlo simulation was applied to study the performance of different designs. Random and systematic samplings were applied for four sample sizes and five buffer widths. The latter feature was relevant for edge length, since length was estimated through the number of points falling in buffer areas around edges. In addition, two landscape complexities were tested by applying two classification schemes with seven or 20 land cover classes to the NILS data. As expected, the root mean square error (RMSE) of the estimators decreased with increasing sample size. The estimators of both metrics were slightly biased, but the bias of Shannon's diversity estimator was shown to decrease when sample size increased. In the edge length case, an increasing buffer width resulted in larger bias due to the increased impact of boundary conditions; this effect was shown to be independent of sample size. However, we also developed adjusted estimators that eliminate the bias of the edge length estimator. The rates of decrease of RMSE with increasing sample size and buffer width were quantified by a regression model. Finally, indicative cost-accuracy relationships were derived showing that point sampling could be a competitive alternative to complete wall-to-wall mapping.
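Point sampling yields class proportions that feed a plug-in Shannon estimator; a minimal sketch is below. The small-sample negative bias the study observes shrinking with sample size is the well-known roughly (K−1)/(2n) term of this plug-in form:

```python
import math

def shannon_from_points(class_counts):
    """Plug-in Shannon diversity from point-intercept counts per land
    cover class:  H = -sum_k p_k * ln(p_k),  p_k = n_k / n.
    Biased low for small n, consistent with the bias shrinking as
    sample size grows."""
    n = sum(class_counts)
    props = [c / n for c in class_counts if c > 0]
    return -sum(p * math.log(p) for p in props)

# Two equally common classes give H = ln 2; concentrating the sample
# points in one class drives H toward 0
```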

  19. Experiments with central-limit properties of spatial samples from locally covariant random fields

    USGS Publications Warehouse

    Barringer, T.H.; Smith, T.E.

    1992-01-01

    When spatial samples are statistically dependent, the classical estimator of sample-mean standard deviation is well known to be inconsistent. For locally dependent samples, however, consistent estimators of sample-mean standard deviation can be constructed. The present paper investigates the sampling properties of one such estimator, designated as the tau estimator of sample-mean standard deviation. In particular, the asymptotic normality properties of standardized sample means based on tau estimators are studied in terms of computer experiments with simulated sample-mean distributions. The effects of both sample size and dependency levels among samples are examined for various values of tau (denoting the size of the spatial kernel for the estimator). The results suggest that even for small degrees of spatial dependency, the tau estimator exhibits significantly stronger normality properties than does the classical estimator of standardized sample means. © 1992.

  20. Considerations in Forest Growth Estimation Between Two Measurements of Mapped Forest Inventory Plots

    Treesearch

    Michael T. Thompson

    2006-01-01

    Several aspects of the enhanced Forest Inventory and Analysis (FIA) program's national plot design complicate change estimation. The design incorporates up to three separate plot sizes (microplot, subplot, and macroplot) to sample trees of different sizes. Because multiple plot sizes are involved, change estimators designed for polyareal plot sampling, such as those...

  1. Sample size and power calculations for detecting changes in malaria transmission using antibody seroconversion rate.

    PubMed

    Sepúlveda, Nuno; Paulino, Carlos Daniel; Drakeley, Chris

    2015-12-30

    Several studies have highlighted the use of serological data in detecting a reduction in malaria transmission intensity. These studies have typically used serology as an adjunct measure, and no formal examination of sample size calculations for this approach has been conducted. A sample size calculator is proposed for cross-sectional surveys using data simulation from a reverse catalytic model assuming a reduction in seroconversion rate (SCR) at a given change point before sampling. This calculator is based on logistic approximations to the underlying power curves for detecting a reduction in SCR relative to the hypothesis of a stable SCR for the same data. Sample sizes are illustrated for a hypothetical cross-sectional survey from an African population assuming a known or unknown change point. Overall, data simulation demonstrates that power is strongly affected by whether the change point is assumed known or unknown. Small sample sizes are sufficient to detect strong reductions in SCR, but invariably lead to poor precision of estimates of the current SCR; in this situation, sample size is better determined by controlling the precision of SCR estimates. Conversely, larger sample sizes are required to detect more subtle reductions in malaria transmission, but these invariably increase precision while reducing putative estimation bias. The proposed sample size calculator, although based on data simulation, shows promise of being easily applicable to a range of populations and survey types. Since the change point is a major source of uncertainty, obtaining or assuming prior information about this parameter might reduce both the sample size and the chance of generating biased SCR estimates.
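The reverse catalytic model underlying the simulation can be sketched as follows (symbols assumed: lambda for the SCR, rho for the seroreversion rate; the power-curve machinery itself is not reproduced):

```python
import math

def seroprevalence(age, scr, srr):
    """Reverse catalytic model: expected proportion seropositive at a
    given age, with seroconversion rate `scr` (lambda) and seroreversion
    rate `srr` (rho):

        p(a) = lambda / (lambda + rho) * (1 - exp(-(lambda + rho) * a))
    """
    total = scr + srr
    return scr / total * (1.0 - math.exp(-total * age))

# A drop in SCR at a change point t years before sampling lowers the
# plateau reached by cohorts younger than t -- the age-pattern signal
# the sample size calculation must be powered to detect.
```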

  2. Sample size requirements for estimating effective dose from computed tomography using solid-state metal-oxide-semiconductor field-effect transistor dosimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trattner, Sigal; Cheng, Bin; Pieniazek, Radoslaw L.

    2014-04-15

    Purpose: Effective dose (ED) is a widely used metric for comparing ionizing radiation burden between different imaging modalities, scanners, and scan protocols. In computed tomography (CT), ED can be estimated by performing scans on an anthropomorphic phantom in which metal-oxide-semiconductor field-effect transistor (MOSFET) solid-state dosimeters have been placed to enable organ dose measurements. Here a statistical framework is established to determine the sample size (number of scans) needed for estimating ED to a desired precision and confidence, for a particular scanner and scan protocol, subject to practical limitations. Methods: The statistical scheme involves solving equations which minimize the sample size required for estimating ED to desired precision and confidence. It is subject to a constrained variation of the estimated ED and solved using the Lagrange multiplier method. The scheme incorporates measurement variation introduced both by MOSFET calibration, and by variation in MOSFET readings between repeated CT scans. Sample size requirements are illustrated on cardiac, chest, and abdomen–pelvis CT scans performed on a 320-row scanner and chest CT performed on a 16-row scanner. Results: Sample sizes for estimating ED vary considerably between scanners and protocols. Sample size increases as the required precision or confidence is higher and also as the anticipated ED is lower. For example, for a helical chest protocol, for 95% confidence and 5% precision for the ED, 30 measurements are required on the 320-row scanner and 11 on the 16-row scanner when the anticipated ED is 4 mSv; these sample sizes are 5 and 2, respectively, when the anticipated ED is 10 mSv. Conclusions: Applying the suggested scheme, it was found that even at modest sample sizes, it is feasible to estimate ED with high precision and a high degree of confidence. 
As CT technology develops enabling ED to be lowered, more MOSFET measurements are needed to estimate ED with the same precision and confidence.

  3. Post-stratified estimation: with-in strata and total sample size recommendations

    Treesearch

    James A. Westfall; Paul L. Patterson; John W. Coulston

    2011-01-01

    Post-stratification is used to reduce the variance of estimates of the mean. Because the stratification is not fixed in advance, within-strata sample sizes can be quite small. The survey statistics literature provides some guidance on minimum within-strata sample sizes; however, the recommendations and justifications are inconsistent and apply broadly for many...

  4. Estimating the probability that the sample mean is within a desired fraction of the standard deviation of the true mean.

    PubMed

    Schillaci, Michael A; Schillaci, Mario E

    2009-02-01

    The use of small sample sizes in human and primate evolutionary research is commonplace. Estimating how well small samples represent the underlying population, however, is not commonplace. Because the accuracy of determinations of taxonomy, phylogeny, and evolutionary process is dependent upon how well the study sample represents the population of interest, characterizing the uncertainty, or potential error, associated with analyses of small sample sizes is essential. We present a method for estimating the probability that the sample mean is within a desired fraction of the standard deviation of the true mean using small (n < 10) or very small (n ≤ 5) sample sizes. This method can be used by researchers to determine post hoc the probability that their sample is a meaningful approximation of the population parameter. We tested the method using a large craniometric data set commonly used by researchers in the field. Given our results, we suggest that sample estimates of the population mean can be reasonable and meaningful even when based on small, and perhaps even very small, sample sizes.
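    Under a normality assumption, the probability described above has a closed form: the sample mean has standard deviation σ/√n, so P(|x̄ − μ| ≤ kσ) = 2Φ(k√n) − 1. A small sketch (the authors' small-sample method may adjust this, e.g. via the t distribution):

```python
from statistics import NormalDist

def prob_within(k, n):
    """P(|sample mean - true mean| <= k * sigma) for a normal population:
    the sample mean has standard deviation sigma / sqrt(n)."""
    return 2 * NormalDist().cdf(k * n ** 0.5) - 1

p5 = prob_within(0.5, 5)    # very small sample, within half an SD
p10 = prob_within(0.5, 10)  # probability rises with n
```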

  5. Investigation of the Specht density estimator

    NASA Technical Reports Server (NTRS)

    Speed, F. M.; Rydl, L. M.

    1971-01-01

    The feasibility of using the Specht density estimator function on the IBM 360/44 computer is investigated. Factors such as storage, speed, amount of calculations, size of the smoothing parameter and sample size have an effect on the results. The reliability of the Specht estimator for normal and uniform distributions and the effects of the smoothing parameter and sample size are investigated.
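    Specht's density estimator belongs to the Parzen kernel family; a minimal Gaussian-kernel sketch showing the roles of the smoothing parameter and the sample size (illustrative data, not the study's, and not Specht's polynomial-expansion implementation):

```python
import math

def kernel_density(x, sample, sigma):
    """Gaussian Parzen-window estimate of the density at x;
    sigma is the smoothing parameter."""
    norm = 1.0 / (len(sample) * sigma * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in sample)

data = [-0.4, -0.1, 0.0, 0.2, 0.5]        # illustrative sample
d_center = kernel_density(0.0, data, sigma=0.5)
d_tail = kernel_density(5.0, data, sigma=0.5)
```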

  6. Sample size for estimating mean and coefficient of variation in species of crotalarias.

    PubMed

    Toebe, Marcos; Machado, Letícia N; Tartaglia, Francieli L; Carvalho, Juliana O DE; Bandeira, Cirineu T; Cargnelutti Filho, Alberto

    2018-04-16

    The objective of this study was to determine the sample size necessary to estimate the mean and coefficient of variation in four species of crotalarias (C. juncea, C. spectabilis, C. breviflora and C. ochroleuca). An experiment was carried out for each species during the 2014/15 season. At harvest, 1,000 pods of each species were randomly collected. For each pod, the following were measured: mass of pod with and without seeds, length, width and height of pods, number and mass of seeds per pod, and mass of a hundred seeds. Measures of central tendency, variability and distribution were calculated, and normality was verified. The sample size necessary to estimate the mean and coefficient of variation with amplitudes of the 95% confidence interval (ACI95%) of 2%, 4%, ..., 20% was determined by resampling with replacement. The sample size varies among species and characters, with a larger sample size needed to estimate the mean than to estimate the coefficient of variation.

  7. Sampling strategies for estimating brook trout effective population size

    Treesearch

    Andrew R. Whiteley; Jason A. Coombs; Mark Hudy; Zachary Robinson; Keith H. Nislow; Benjamin H. Letcher

    2012-01-01

    The influence of sampling strategy on estimates of effective population size (Ne) from single-sample genetic methods has not been rigorously examined, though these methods are increasingly used. For headwater salmonids, spatially close kin association among age-0 individuals suggests that sampling strategy (number of individuals and location from...

  8. Approximate sample sizes required to estimate length distributions

    USGS Publications Warehouse

    Miranda, L.E.

    2007-01-01

    The sample sizes required to estimate fish length were determined by bootstrapping from reference length distributions. Depending on population characteristics and species-specific maximum lengths, 1-cm length-frequency histograms required 375-1,200 fish to estimate within 10% with 80% confidence, 2.5-cm histograms required 150-425 fish, proportional stock density required 75-140 fish, and mean length required 75-160 fish. In general, smaller species, smaller populations, populations with higher mortality, and simpler length statistics required fewer samples. Indices that require low sample sizes may be suitable for monitoring population status, and when large changes in length are evident, additional sampling effort may be allocated to more precisely define length status with more informative estimators. © 2007 American Fisheries Society.
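    The bootstrapping approach described above can be sketched as follows: resample with replacement from a reference distribution at increasing sample sizes until the sample mean falls within 10% of the true mean in at least 80% of resamples (the reference data here are hypothetical, not the study's fish lengths):

```python
import random

def required_n(reference, rel_err=0.10, confidence=0.80, n_boot=500, seed=1):
    """Smallest n (stepping by 5) such that the bootstrap sample mean lands
    within rel_err of the reference mean in at least `confidence` of resamples."""
    rng = random.Random(seed)
    true_mean = sum(reference) / len(reference)
    for n in range(5, len(reference) + 1, 5):
        hits = sum(
            abs(sum(rng.choices(reference, k=n)) / n - true_mean)
            <= rel_err * abs(true_mean)
            for _ in range(n_boot)
        )
        if hits / n_boot >= confidence:
            return n
    return len(reference)

# hypothetical reference length distribution: mean ~30 cm, SD ~8 cm
rng = random.Random(7)
reference = [rng.gauss(30, 8) for _ in range(1000)]
n_needed = required_n(reference)
```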

  9. Validation of abundance estimates from mark–recapture and removal techniques for rainbow trout captured by electrofishing in small streams

    USGS Publications Warehouse

    Rosenberger, Amanda E.; Dunham, Jason B.

    2005-01-01

    Estimation of fish abundance in streams using the removal model or the Lincoln-Petersen mark-recapture model is a common practice in fisheries. These models produce misleading results if their assumptions are violated. We evaluated the assumptions of these two models via electrofishing of rainbow trout Oncorhynchus mykiss in central Idaho streams. For one-, two-, three-, and four-pass sampling effort in closed sites, we evaluated the influences of fish size and habitat characteristics on sampling efficiency and the accuracy of removal abundance estimates. We also examined the use of models to generate unbiased estimates of fish abundance through adjustment of total catch or biased removal estimates. Our results suggested that the assumptions of the mark-recapture model were satisfied and that abundance estimates based on this approach were unbiased. In contrast, the removal model assumptions were not met. Decreasing sampling efficiencies over removal passes resulted in underestimated population sizes and overestimates of sampling efficiency. This bias decreased, but was not eliminated, with increased sampling effort. Biased removal estimates based on different levels of effort were highly correlated with each other but were less correlated with unbiased mark-recapture estimates. Stream size decreased sampling efficiency, and stream size and instream wood increased the negative bias of removal estimates. We found that reliable estimates of population abundance could be obtained from models of sampling efficiency for different levels of effort. Validation of abundance estimates requires extra attention to routine sampling considerations but can help fisheries biologists avoid pitfalls associated with biased data and facilitate standardized comparisons among studies that employ different sampling methods.
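    For concreteness, the two estimators being compared have simple closed forms: the two-pass removal estimate (Seber-Le Cren form) and the mark-recapture estimate (shown here in Chapman's bias-corrected form). The catch counts below are invented for illustration:

```python
def removal_two_pass(c1, c2):
    """Seber-Le Cren two-pass removal estimate; assumes equal capture
    probability on both passes (the assumption the study found violated)."""
    if c1 <= c2:
        raise ValueError("second-pass catch must be smaller than the first")
    return c1 ** 2 / (c1 - c2)

def lincoln_petersen(marked, caught, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen mark-recapture estimate."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

n_removal = removal_two_pass(60, 24)       # passes catch 60 then 24 fish
n_mr = lincoln_petersen(60, 55, 30)        # 60 marked, 55 caught, 30 recaptured
```

    If efficiency drops between passes, c2 stays closer to c1 than the equal-probability model predicts, which is the mechanism behind the underestimation the study reports.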

  10. Estimation of the bottleneck size in Florida panthers

    USGS Publications Warehouse

    Culver, M.; Hedrick, P.W.; Murphy, K.; O'Brien, S.; Hornocker, M.G.

    2008-01-01

    We have estimated the extent of genetic variation in museum (1890s) and contemporary (1980s) samples of Florida panthers Puma concolor coryi for both nuclear loci and mtDNA. The microsatellite heterozygosity in the contemporary sample was only 0.325 times that in the museum samples, although our sample size and number of loci are limited. Support for this estimate is provided by a sample of 84 microsatellite loci in contemporary Florida panthers and Idaho pumas Puma concolor hippolestes, in which the contemporary Florida panther sample had only 0.442 times the heterozygosity of the Idaho pumas. The estimated diversities in mtDNA in the museum and contemporary samples were 0.600 and 0.000, respectively. Using a population genetics approach, we estimated that to reduce either the microsatellite heterozygosity or the mtDNA diversity this much (in a period of c. 80 years during the 20th century when numbers were thought to be low), a very small bottleneck size of c. 2 for several generations and a small effective population size in other generations are necessary. Using demographic data from Yellowstone pumas, we estimated the ratio of effective to census population size to be 0.315. Using this ratio, the census population size in the Florida panthers necessary to explain the loss of microsatellite variation was c. 41 for the non-bottleneck generations and 6.2 for the two bottleneck generations. These low bottleneck population sizes and the concomitant reduced effectiveness of selection are probably responsible for the high frequency of several detrimental traits in Florida panthers, namely undescended testicles and poor sperm quality. The recent intensive monitoring both before and after the introduction of Texas pumas in 1995 will make the recovery and genetic restoration of Florida panthers a classic study of an endangered species. Our estimates of the bottleneck size responsible for the loss of genetic variation in the Florida panther complete a previously unknown aspect of this account. © 2008 The Authors. Journal compilation © 2008 The Zoological Society of London.

  11. Effects of LiDAR point density, sampling size and height threshold on estimation accuracy of crop biophysical parameters.

    PubMed

    Luo, Shezhou; Chen, Jing M; Wang, Cheng; Xi, Xiaohuan; Zeng, Hongcheng; Peng, Dailiang; Li, Dong

    2016-05-30

    Vegetation leaf area index (LAI), height, and aboveground biomass are key biophysical parameters. Corn is an important and globally distributed crop, and reliable estimations of these parameters are essential for corn yield forecasting, health monitoring and ecosystem modeling. Light Detection and Ranging (LiDAR) is considered an effective technology for estimating vegetation biophysical parameters. However, the estimation accuracies of these parameters are affected by multiple factors. In this study, we first estimated corn LAI, height and biomass (R² = 0.80, 0.874 and 0.838, respectively) using the original LiDAR data (7.32 points/m²), and the results showed that LiDAR data could accurately estimate these biophysical parameters. Second, comprehensive research was conducted on the effects of LiDAR point density, sampling size and height threshold on the estimation accuracy of LAI, height and biomass. Our findings indicated that LiDAR point density had an important effect on the estimation accuracy for vegetation biophysical parameters; however, high point density did not always produce highly accurate estimates, and reduced point density could deliver reasonable estimation results. Furthermore, the results showed that sampling size and height threshold were additional key factors that affect the estimation accuracy of biophysical parameters. Therefore, the optimal sampling size and height threshold should be determined to improve the estimation accuracy of biophysical parameters. Our results also implied that a higher LiDAR point density, larger sampling size and height threshold were required to obtain accurate corn LAI estimation when compared with height and biomass estimations. In general, our results provide valuable guidance for LiDAR data acquisition and estimation of vegetation biophysical parameters using LiDAR data.

  12. Sampling for area estimation: A comparison of full-frame sampling with the sample segment approach

    NASA Technical Reports Server (NTRS)

    Hixson, M.; Bauer, M. E.; Davis, B. J. (Principal Investigator)

    1979-01-01

    The author has identified the following significant results. Full-frame classifications of wheat and non-wheat for eighty counties in Kansas were repetitively sampled to simulate alternative sampling plans. Evaluation of four sampling schemes involving different numbers of samples and different size sampling units shows that the precision of the wheat estimates increased as the segment size decreased and the number of segments was increased. Although the average bias associated with the various sampling schemes was not significantly different, the maximum absolute bias was directly related to the size of the sampling unit.

  13. Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling

    PubMed Central

    2006-01-01

    Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling. PMID:16937083

  14. Estimating accuracy of land-cover composition from two-stage cluster sampling

    USGS Publications Warehouse

    Stehman, S.V.; Wickham, J.D.; Fattorini, L.; Wade, T.D.; Baffetta, F.; Smith, J.H.

    2009-01-01

    Land-cover maps are often used to compute land-cover composition (i.e., the proportion or percent of area covered by each class), for each unit in a spatial partition of the region mapped. We derive design-based estimators of mean deviation (MD), mean absolute deviation (MAD), root mean square error (RMSE), and correlation (CORR) to quantify accuracy of land-cover composition for a general two-stage cluster sampling design, and for the special case of simple random sampling without replacement (SRSWOR) at each stage. The bias of the estimators for the two-stage SRSWOR design is evaluated via a simulation study. The estimators of RMSE and CORR have small bias except when sample size is small and the land-cover class is rare. The estimator of MAD is biased for both rare and common land-cover classes except when sample size is large. A general recommendation is that rare land-cover classes require large sample sizes to ensure that the accuracy estimators have small bias. © 2009 Elsevier Inc.

  15. Four hundred or more participants needed for stable contingency table estimates of clinical prediction rule performance.

    PubMed

    Kent, Peter; Boyle, Eleanor; Keating, Jennifer L; Albert, Hanne B; Hartvigsen, Jan

    2017-02-01

    To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. An analysis of three pre-existing sets of large cohort data (n = 4,062-8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400-600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance. Copyright © 2016 Elsevier Inc. All rights reserved.
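    The resampling experiment described above is easy to reproduce in miniature: draw repeated samples of a given size from a hypothetical population with known prevalence and test sensitivity, and watch the spread of the sensitivity estimates shrink as the sample grows (all parameters here are illustrative, not the study's cohorts):

```python
import random

def sensitivity_spread(prevalence, sens, n, reps=100, seed=3):
    """Spread (max - min) of estimated sensitivity over `reps` repeated
    samples of size n from a simulated population."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        tp = fn = 0
        for _ in range(n):
            if rng.random() < prevalence:        # a truly positive case
                if rng.random() < sens:
                    tp += 1                      # correctly predicted
                else:
                    fn += 1                      # missed
        if tp + fn:
            estimates.append(tp / (tp + fn))
    return max(estimates) - min(estimates)

spread_100 = sensitivity_spread(0.3, 0.8, 100)   # below the 400 threshold
spread_600 = sensitivity_spread(0.3, 0.8, 600)   # in the stable range
```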

  16. Blinded versus unblinded estimation of a correlation coefficient to inform interim design adaptations.

    PubMed

    Kunz, Cornelia U; Stallard, Nigel; Parsons, Nicholas; Todd, Susan; Friede, Tim

    2017-03-01

    Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under- or overestimation of the sample size. Both situations are unfavourable as the first one decreases the power and the latter one leads to a waste of resources. Hence, designs have been suggested that allow a re-assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one-sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance depending on the sample size per group and the number of groups. © 2016 The Authors. Biometrical Journal Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
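    The naïve (one-sample) estimator referred to above is simply the Pearson correlation computed on the pooled, blinded data with group labels ignored. A sketch under the equal-group-means setting, with simulated data standing in for trial measurements (true correlation 0.6 by construction):

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation; on blinded data this is the naive
    one-sample estimator (treatment labels ignored)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# simulated paired measurements: y = 0.6*x + noise gives correlation 0.6
rng = random.Random(5)
xs, ys = [], []
for _ in range(400):
    x = rng.gauss(0, 1)
    xs.append(x)
    ys.append(0.6 * x + rng.gauss(0, 0.8))
r_naive = pearson(xs, ys)
```

    When group means differ, this pooled estimator picks up between-group variation as well, which is why the abstract reports that other estimators then perform better.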

  17. Blinded versus unblinded estimation of a correlation coefficient to inform interim design adaptations

    PubMed Central

    Stallard, Nigel; Parsons, Nicholas; Todd, Susan; Friede, Tim

    2016-01-01

    Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under‐ or overestimation of the sample size. Both situations are unfavourable as the first one decreases the power and the latter one leads to a waste of resources. Hence, designs have been suggested that allow a re‐assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one‐sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance depending on the sample size per group and the number of groups. PMID:27886393

  18. Estimating the Size of a Large Network and its Communities from a Random Sample

    PubMed Central

    Chen, Lin; Karbasi, Amin; Crawford, Forrest W.

    2017-01-01

    Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. PMID:28867924

  19. Estimating the Size of a Large Network and its Communities from a Random Sample.

    PubMed

    Chen, Lin; Karbasi, Amin; Crawford, Forrest W

    2016-01-01

    Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios.

  20. [Practical aspects regarding sample size in clinical research].

    PubMed

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    Knowing the right sample size lets us judge whether the results published in medical papers rest on a suitable design and whether the conclusions follow from the statistical analysis. To estimate the sample size we must consider the type I error, the type II error, the variance, the size of the effect, and the significance level and power of the test. The choice of formula depends on the type of study: a prevalence study, an estimation of means, or a comparative study. In this paper we explain some basic statistical concepts and describe four simple examples of sample size estimation.
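    The standard normal-approximation formulas behind such estimates, for a mean with known σ and for a proportion, can be sketched as follows (the paper's four worked examples are not reproduced here):

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """n so a sample mean estimates the true mean to within `margin`:
    n = (z * sigma / margin)^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

def n_for_proportion(p, margin, confidence=0.95):
    """n for estimating a prevalence p to within `margin` (absolute):
    n = z^2 * p * (1 - p) / margin^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_mean = n_for_mean(10, 2)              # sigma = 10, margin of error 2
n_prev = n_for_proportion(0.5, 0.05)    # worst-case p = 0.5, margin 5%
```

    The worst-case proportion (p = 0.5) at 95% confidence and a 5% margin gives the familiar figure of 385 subjects.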

  1. One-step estimation of networked population size: Respondent-driven capture-recapture with anonymity.

    PubMed

    Khan, Bilal; Lee, Hsuan-Wei; Fellows, Ian; Dombrowski, Kirk

    2018-01-01

    Size estimation is particularly important for populations whose members experience disproportionate health issues or pose elevated health risks to the ambient social structures in which they are embedded. Efforts to derive size estimates are often frustrated when the population is hidden or hard-to-reach in ways that preclude conventional survey strategies, as is the case when social stigma is associated with group membership or when group members are involved in illegal activities. This paper extends prior research on the problem of network population size estimation, building on established survey/sampling methodologies commonly used with hard-to-reach groups. Three novel one-step, network-based population size estimators are presented, for use in the context of uniform random sampling, respondent-driven sampling, and when networks exhibit significant clustering effects. We give provably sufficient conditions for the consistency of these estimators in large configuration networks. Simulation experiments across a wide range of synthetic network topologies validate the performance of the estimators, which also perform well on a real-world location-based social networking data set with significant clustering. Finally, the proposed schemes are extended to allow them to be used in settings where participant anonymity is required. Systematic experiments show favorable tradeoffs between anonymity guarantees and estimator performance. Taken together, we demonstrate that reasonable population size estimates are derived from anonymous respondent driven samples of 250-750 individuals, within ambient populations of 5,000-40,000. The method thus represents a novel and cost-effective means for health planners and those agencies concerned with health and disease surveillance to estimate the size of hidden populations. We discuss limitations and future work in the concluding section.

  2. Sample sizes needed for specified margins of relative error in the estimates of the repeatability and reproducibility standard deviations.

    PubMed

    McClure, Foster D; Lee, Jung K

    2005-01-01

    Sample size formulas are developed to estimate the repeatability and reproducibility standard deviations (Sr and SR) such that the actual errors in Sr and SR relative to their respective true values, σr and σR, are at predefined levels. The statistical consequences associated with the AOAC INTERNATIONAL required sample size to validate an analytical method are discussed. In addition, formulas to estimate the uncertainties of Sr and SR were derived and are provided as supporting documentation.

  3. Estimating the Size of the Methamphetamine-Using Population in New York City Using Network Sampling Techniques.

    PubMed

    Dombrowski, Kirk; Khan, Bilal; Wendel, Travis; McLean, Katherine; Misshula, Evan; Curtis, Ric

    2012-12-01

    As part of a recent study of the dynamics of the retail market for methamphetamine use in New York City, we used network sampling methods to estimate the size of the total networked population. This process involved sampling from respondents' lists of co-use contacts, which in turn became the basis for capture-recapture estimation. Recapture sampling was based on links to other respondents derived from demographic and "telefunken" matching procedures, the latter being an anonymized version of telephone-number matching. This paper describes the matching process used to discover the links between the solicited contacts and project respondents, the capture-recapture calculation, the estimation of "false matches", and the development of confidence intervals for the final population estimates. A final population of 12,229 was estimated, with a range of 8,235-23,750. The techniques described here have the special virtue of deriving an estimate for a hidden population while retaining respondent anonymity and the anonymity of network alters, but likely require a larger sample size than the 132 persons interviewed to attain acceptable confidence levels for the estimate.

  4. The Impact of Sample Size and Other Factors When Estimating Multilevel Logistic Models

    ERIC Educational Resources Information Center

    Schoeneberger, Jason A.

    2016-01-01

    The design of research studies utilizing binary multilevel models must necessarily incorporate knowledge of multiple factors, including estimation method, variance component size, or number of predictors, in addition to sample sizes. This Monte Carlo study examined the performance of random effect binary outcome multilevel models under varying…

  5. Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

    PubMed Central

    Karcher, Michael D.; Palacios, Julia A.; Bedford, Trevor; Suchard, Marc A.; Minin, Vladimir N.

    2016-01-01

    Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples. PMID:26938243

  6. Weighting by Inverse Variance or by Sample Size in Random-Effects Meta-Analysis

    ERIC Educational Resources Information Center

    Marin-Martinez, Fulgencio; Sanchez-Meca, Julio

    2010-01-01

    Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a…

  7. The effects of sample size on population genomic analyses--implications for the tests of neutrality.

    PubMed

    Subramanian, Sankar

    2016-02-20

    One of the fundamental measures of molecular genetic variation is the Watterson's estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population growth. It is well known that the estimation of θ is biased when these assumptions are violated. However, the effects of sample size in modulating the bias were not well appreciated. We examined this issue in detail based on large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and this effect was much higher for constrained genomic regions than for neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes. However, this difference was 2.5 times for the nonsynonymous sites of the same data. We observed a positive correlation between the rate of increase in θ estimates (with respect to the sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, this difference was only 2 times for the less constrained genes (dN/dS > 0.9). The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D and those based on the McDonald-Kreitman test: the Neutrality Index and the fraction of adaptive substitutions. For instance, use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions compared to that obtained using 512 exomes (24% vs 10%).
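    Watterson's estimator itself is simple: θ_W = S / a_n with a_n = Σ_{i=1}^{n−1} 1/i. Because a_n grows only logarithmically in n, θ_W rises with sample size whenever the segregating-site count S grows faster than a_n, which is the pattern the abstract reports for constrained sites. A sketch with invented counts:

```python
def watterson_theta(segregating_sites, n_sequences):
    """Watterson's estimator: theta_W = S / a_n, a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n

# with S held fixed, the estimate shrinks only slowly as n grows,
# so any faster growth of S with sample size inflates theta
theta_small = watterson_theta(50, 16)    # hypothetical counts
theta_large = watterson_theta(50, 512)
```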

  8. The Petersen-Lincoln estimator and its extension to estimate the size of a shared population.

    PubMed

    Chao, Anne; Pan, H-Y; Chiang, Shu-Chuan

    2008-12-01

    The Petersen-Lincoln estimator has been used to estimate the size of a population in a single mark-release experiment. However, the estimator is not valid when the capture sample and recapture sample are not independent. We provide an intuitive interpretation of "independence" between samples based on 2 × 2 categorical data formed by capture/non-capture in each of the two samples. From this interpretation, we review a general measure of "dependence" and quantify the correlation bias of the Petersen-Lincoln estimator when two types of dependence (local list dependence and heterogeneity of capture probability) exist. An important implication for the census undercount problem is that instead of using a post-enumeration sample to assess the undercount of a census, one should conduct a prior enumeration sample to avoid correlation bias. We extend the Petersen-Lincoln method to the case of two populations. A new estimator of the size of the shared population is proposed and its variance derived. We discuss a special case where the correlation bias of the proposed estimator due to dependence between samples vanishes. The proposed method is applied to a study of the relapse rate of illicit drug use in Taiwan. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
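
    A minimal sketch of the single mark-release estimator the abstract starts from (the classic Petersen-Lincoln form, plus Chapman's bias-corrected variant; both assume the capture and recapture samples are independent):

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Petersen-Lincoln: N = n1 * n2 / m, where n1 animals are marked and
    released, n2 are captured later, and m of those carry marks."""
    if m == 0:
        raise ValueError("no recaptures: estimate undefined")
    return n1 * n2 / m

def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

    With 100 animals marked, 50 captured in the second sample, and 10 marked among them, the Petersen-Lincoln estimate is 500; Chapman's correction gives roughly 467.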

  9. An audit of the statistics and the comparison with the parameter in the population

    NASA Astrophysics Data System (ADS)

    Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad

    2015-10-01

    Determining the sample size sufficient for sample statistics to closely estimate particular population parameters is a long-standing issue. Although a sample size may be calculated according to the objectives of a study, it is difficult to confirm whether the resulting statistics are close to the parameters of a particular population. Meanwhile, the guideline of a p-value less than 0.05 is widely used as inferential evidence. This study therefore audited results obtained from various sub-samples and statistical analyses and compared them with the parameters of three different populations. Eight types of statistical analysis were examined, with eight sub-samples for each. The statistics were found to be consistent and close to the parameters when the sample covered at least 15% to 35% of the population. Larger sample sizes are needed to estimate parameters involving categorical variables than those involving numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters of a medium-sized population.

  10. Optimal flexible sample size design with robust power.

    PubMed

    Zhang, Lanju; Cui, Lu; Yang, Bo

    2016-08-30

    It is well recognized that sample size determination is challenging because of the uncertainty on the treatment effect size. Several remedies are available in the literature. Group sequential designs start with a sample size based on a conservative (smaller) effect size and allow early stop at interim looks. Sample size re-estimation designs start with a sample size based on an optimistic (larger) effect size and allow sample size increase if the observed effect size is smaller than planned. Different opinions favoring one type over the other exist. We propose an optimal approach using an appropriate optimality criterion to select the best design among all the candidate designs. Our results show that (1) for the same type of designs, for example, group sequential designs, there is room for significant improvement through our optimization approach; (2) optimal promising zone designs appear to have no advantages over optimal group sequential designs; and (3) optimal designs with sample size re-estimation deliver the best adaptive performance. We conclude that to deal with the challenge of sample size determination due to effect size uncertainty, an optimal approach can help to select the best design that provides most robust power across the effect size range of interest. Copyright © 2016 John Wiley & Sons, Ltd.

  11. On the role of dimensionality and sample size for unstructured and structured covariance matrix estimation

    NASA Technical Reports Server (NTRS)

    Morgera, S. D.; Cooper, D. B.

    1976-01-01

    The experimental observation that a surprisingly small sample size vis-a-vis dimension is needed to achieve good signal-to-interference ratio (SIR) performance with an adaptive predetection filter is explained. The adaptive filter requires estimates of the inverse of the filter input data covariance matrix, obtained by a recursive stochastic algorithm. The SIR performance as a function of sample size is compared for the situations where the covariance matrix estimates are of unstructured (generalized) form and of structured (finite Toeplitz) form; the latter case is consistent with weak stationarity of the input data stochastic process.

  12. Evaluation of species richness estimators based on quantitative performance measures and sensitivity to patchiness and sample grain size

    NASA Astrophysics Data System (ADS)

    Willie, Jacob; Petre, Charles-Albert; Tagg, Nikki; Lens, Luc

    2012-11-01

    Data from forest herbaceous plants in a site of known species richness in Cameroon were used to test the performance of rarefaction and eight species richness estimators (ACE, ICE, Chao1, Chao2, Jack1, Jack2, Bootstrap and MM). Bias, accuracy, precision and sensitivity to patchiness and sample grain size were the evaluation criteria. An evaluation of the effects of sampling effort and patchiness on diversity estimation is also provided. Stems were identified and counted in linear series of 1-m² contiguous square plots distributed in six habitat types. Initially, 500 plots were sampled in each habitat type. The sampling process was monitored using rarefaction and a set of richness estimator curves. Curves from the first dataset suggested adequate sampling in riparian forest only. Additional plots ranging from 523 to 2143 were subsequently added in the undersampled habitats until most of the curves stabilized. Jack1 and ICE, the non-parametric richness estimators, performed better, being more accurate and less sensitive to patchiness and sample grain size, and significantly reducing biases that could not be detected by rarefaction and other estimators. This study confirms the usefulness of non-parametric incidence-based estimators, and recommends Jack1 or ICE alongside rarefaction while describing taxon richness and comparing results across areas sampled using similar or different grain sizes. As patchiness varied across habitat types, accurate estimations of diversity did not require the same number of plots. The number of samples needed to fully capture diversity is not necessarily the same across habitats, and can only be known when taxon sampling curves have indicated adequate sampling. Differences in observed species richness between habitats were generally due to differences in patchiness, except between two habitats where they resulted from differences in abundance. We suggest that communities should first be sampled thoroughly using appropriate taxon sampling curves before explaining differences in diversity.
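
    For illustration, a sketch of two of the estimators evaluated above, under their textbook definitions: abundance-based Chao1 (S_obs + F1²/(2·F2)) and the first-order jackknife on incidence data (S_obs + Q1·(m−1)/m). The input values are hypothetical:

```python
from collections import Counter

def chao1(abundances):
    """Abundance-based Chao1: S_obs + F1^2 / (2 * F2), where F1 and F2 are
    the numbers of singleton and doubleton species."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)
    f = Counter(counts)
    f1, f2 = f[1], f[2]
    if f2 == 0:  # bias-corrected form when there are no doubletons
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 * f1 / (2.0 * f2)

def jackknife1(incidence_counts, num_samples):
    """First-order jackknife (Jack1) on incidence data: S_obs + Q1*(m-1)/m,
    where incidence_counts[i] is the number of plots occupied by species i."""
    s_obs = len([q for q in incidence_counts if q > 0])
    q1 = sum(1 for q in incidence_counts if q == 1)
    return s_obs + q1 * (num_samples - 1) / num_samples
```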

  13. Effects of sample size on kernel home range estimates

    USGS Publications Warehouse

    Seaman, D.E.; Millspaugh, J.J.; Kernohan, Brian J.; Brundige, Gary C.; Raedeke, Kenneth J.; Gitzen, Robert A.

    1999-01-01

    Kernel methods for estimating home range are being used increasingly in wildlife research, but the effect of sample size on their accuracy is not known. We used computer simulations of 10-200 points/home range and compared accuracy of home range estimates produced by fixed and adaptive kernels with the reference (REF) and least-squares cross-validation (LSCV) methods for determining the amount of smoothing. Simulated home ranges varied from simple to complex shapes created by mixing bivariate normal distributions. We used the size of the 95% home range area and the relative mean squared error of the surface fit to assess the accuracy of the kernel home range estimates. For both measures, the bias and variance approached an asymptote at about 50 observations/home range. The fixed kernel with smoothing selected by LSCV provided the least-biased estimates of the 95% home range area. All kernel methods produced similar surface fit for most simulations, but the fixed kernel with LSCV had the lowest frequency and magnitude of very poor estimates. We reviewed 101 papers published in The Journal of Wildlife Management (JWM) between 1980 and 1997 that estimated animal home ranges. A minority of these papers used nonparametric utilization distribution (UD) estimators, and most did not adequately report sample sizes. We recommend that home range studies using kernel estimates use LSCV to determine the amount of smoothing, obtain a minimum of 30 observations per animal (but preferably ≥50), and report sample sizes in published results.

  14. Intra-class correlation estimates for assessment of vitamin A intake in children.

    PubMed

    Agarwal, Girdhar G; Awasthi, Shally; Walter, Stephen D

    2005-03-01

    In many community-based surveys, multi-level sampling is inherent in the design. In designing these studies, especially to calculate the appropriate sample size, investigators need good estimates of the intra-class correlation coefficient (ICC), along with the cluster size, to adjust for variance inflation due to clustering at each level. The present study used data on the assessment of clinical vitamin A deficiency and intake of vitamin A-rich food in children in a district in India. For the survey, 16 households were sampled from 200 villages nested within eight randomly-selected blocks of the district. ICCs and components of variance were estimated from a three-level hierarchical random-effects analysis of variance model. Estimates of ICCs and variance components were obtained at the village and block levels. Between-cluster variation was evident at each level of clustering. ICCs were inversely related to cluster size, but the design effect could be substantial for large clusters. At the block level, most ICC estimates were below 0.07; at the village level, ICC estimates ranged from 0.014 to 0.45. These estimates may provide useful information for the design of epidemiological studies in which the sampled (or allocated) units range in size from households to large administrative zones.
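
    The variance inflation mentioned above is usually summarized by the design effect, DEFF = 1 + (m − 1) × ICC for average cluster size m. A minimal sketch with hypothetical ICC and cluster-size values:

```python
import math

def design_effect(icc: float, cluster_size: float) -> float:
    """Variance inflation from cluster sampling: DEFF = 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc

def inflated_sample_size(n_srs: int, icc: float, cluster_size: float) -> int:
    """Sample size under clustering needed to match a simple random sample
    of n_srs subjects."""
    return math.ceil(n_srs * design_effect(icc, cluster_size))

# e.g. a hypothetical ICC of 0.05 with 16 households per village gives
# DEFF = 1.75, so 400 SRS subjects become 700 under clustering.
```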

  15. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).

  16. Generalizing the Network Scale-Up Method: A New Estimator for the Size of Hidden Populations*

    PubMed Central

    Feehan, Dennis M.; Salganik, Matthew J.

    2018-01-01

    The network scale-up method enables researchers to estimate the size of hidden populations, such as drug injectors and sex workers, using sampled social network data. The basic scale-up estimator offers advantages over other size estimation techniques, but it depends on problematic modeling assumptions. We propose a new generalized scale-up estimator that can be used in settings with non-random social mixing and imperfect awareness about membership in the hidden population. Further, the new estimator can be used when data are collected via complex sample designs and from incomplete sampling frames. However, the generalized scale-up estimator also requires data from two samples: one from the frame population and one from the hidden population. In some situations these data from the hidden population can be collected by adding a small number of questions to already planned studies. For other situations, we develop interpretable adjustment factors that can be applied to the basic scale-up estimator. We conclude with practical recommendations for the design and analysis of future studies. PMID:29375167
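
    A sketch of the basic scale-up estimator that the paper generalizes: N̂_H = N_F × Σy_i / Σd_i, where y_i is the number of hidden-population members respondent i reports knowing, d_i is the respondent's personal network size, and N_F is the frame population size. All input values here are hypothetical:

```python
def basic_scale_up(known_hidden, network_sizes, frame_population_size):
    """Basic network scale-up estimator: N_H = N_F * sum(y_i) / sum(d_i)."""
    return frame_population_size * sum(known_hidden) / sum(network_sizes)

# Three hypothetical respondents who know 2, 0 and 1 hidden-population
# members out of personal networks of 100, 200 and 100 people, in a frame
# population of one million: 1_000_000 * 3 / 400 = 7500.
```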

  17. Accuracy or precision: Implications of sample design and methodology on abundance estimation

    USGS Publications Warehouse

    Kowalewski, Lucas K.; Chizinski, Christopher J.; Powell, Larkin A.; Pope, Kevin L.; Pegg, Mark A.

    2015-01-01

    Sampling by spatially replicated counts (point-count) is an increasingly popular method of estimating population size of organisms. Challenges exist when sampling by the point-count method: it is often impractical to sample the entire area of interest and impossible to detect every individual present. Ecologists encounter logistical limitations that force them to sample either few large sample units or many small sample units, introducing biases to sample counts. We generated a computer environment and simulated sampling scenarios to test the role of number of samples, sample unit area, number of organisms, and distribution of organisms in the estimation of population sizes using N-mixture models. Many sample units of small area provided estimates that were consistently closer to true abundance than sample scenarios with few sample units of large area. However, sample scenarios with few sample units of large area provided more precise abundance estimates than abundance estimates derived from sample scenarios with many sample units of small area. It is important to consider the accuracy and precision of abundance estimates during the sample design process, with study goals and objectives fully recognized; in practice, however, such consideration is often an afterthought that occurs during the data analysis process.

  18. Sampling studies to estimate the HIV prevalence rate in female commercial sex workers.

    PubMed

    Pascom, Ana Roberta Pati; Szwarcwald, Célia Landmann; Barbosa Júnior, Aristides

    2010-01-01

    We investigated the sampling methods being used to estimate the HIV prevalence rate among female commercial sex workers. The studies were classified according to whether the sample size was adequate to estimate the HIV prevalence rate and according to the sampling method (probabilistic or convenience). We identified 75 studies that estimated the HIV prevalence rate among female sex workers. Most of the studies employed convenience samples, and the sample size was not adequate to estimate the HIV prevalence rate in 35 studies. The use of convenience samples limits statistical inference for the whole group. The number of published studies has increased since 2005, as has the number of studies that used probabilistic samples. This represents a large advance in the monitoring of risk behavior practices and the HIV prevalence rate in this group.

  19. A novel, efficient method for estimating the prevalence of acute malnutrition in resource-constrained and crisis-affected settings: A simulation study.

    PubMed

    Frison, Severine; Kerac, Marko; Checchi, Francesco; Nicholas, Jennifer

    2017-01-01

    The assessment of the prevalence of acute malnutrition in children under five is widely used for the detection of emergencies, planning interventions, advocacy, and monitoring and evaluation. This study examined PROBIT methods, which convert the parameters (mean and standard deviation (SD)) of a normally distributed variable into a cumulative probability below any cut-off, to estimate acute malnutrition in children under five using Middle-Upper Arm Circumference (MUAC). We assessed the performance of: PROBIT Method I, with mean MUAC from the survey sample and MUAC SD from a database of previous surveys; and PROBIT Method II, with mean and SD of MUAC observed in the survey sample. Specifically, we generated sub-samples from 852 survey datasets, simulating 100 surveys for each of eight sample sizes. Overall, the methods were tested on 681,600 simulated surveys. PROBIT methods relying on sample sizes as small as 50 performed better than the classic method for estimating and classifying the prevalence of acute malnutrition. They had better precision in the estimation of acute malnutrition for all sample sizes and better coverage for smaller sample sizes, while having relatively little bias. They classified situations accurately for a threshold of 5% acute malnutrition. Both PROBIT methods had similar outcomes. PROBIT methods have a clear advantage in the assessment of acute malnutrition prevalence based on MUAC, compared with the classic method. Their use would require much lower sample sizes, thus enabling great time and resource savings and permitting timely and/or locally relevant prevalence estimates of acute malnutrition for a swift and well-targeted response.
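
    A sketch of the PROBIT idea under a normal model for MUAC: the estimated prevalence is the normal cumulative probability below the cut-off (125 mm is the conventional MUAC threshold for global acute malnutrition; the mean and SD used in the example are hypothetical):

```python
from statistics import NormalDist

def probit_prevalence(mean_muac_mm: float, sd_muac_mm: float,
                      cutoff_mm: float = 125.0) -> float:
    """PROBIT-style prevalence estimate: P(MUAC < cutoff) under a normal
    model with the given mean and SD (all in millimetres)."""
    return NormalDist(mean_muac_mm, sd_muac_mm).cdf(cutoff_mm)

# With a hypothetical survey mean of 150 mm and SD of 12.5 mm, the cut-off
# sits 2 SDs below the mean, giving a prevalence of about 2.3%.
```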

  20. Malaria prevalence metrics in low- and middle-income countries: an assessment of precision in nationally-representative surveys.

    PubMed

    Alegana, Victor A; Wright, Jim; Bosco, Claudio; Okiro, Emelda A; Atkinson, Peter M; Snow, Robert W; Tatem, Andrew J; Noor, Abdisalan M

    2017-11-21

    One pillar to monitoring progress towards the Sustainable Development Goals is the investment in high-quality data to strengthen the scientific basis for decision-making. At present, nationally-representative surveys are the main source of data for establishing a scientific evidence base, monitoring, and evaluation of health metrics. However, the optimal precision of various population-level health and development indicators remains unquantified in nationally-representative household surveys. Here, a retrospective analysis of the precision of prevalence estimates from these surveys was conducted. Using malaria indicators, data were assembled in nine sub-Saharan African countries with at least two nationally-representative surveys. A Bayesian statistical model was used to estimate between- and within-cluster variability for fever and malaria prevalence, and insecticide-treated bed net (ITN) use in children under the age of 5 years. The intra-class correlation coefficient was estimated along with the optimal sample size for each indicator with associated uncertainty. Results suggest that the estimated sample sizes for the current nationally-representative surveys increase with declining malaria prevalence. Comparison between the actual sample size and the modelled estimate showed a requirement to increase the sample size for parasite prevalence by up to 77.7% (95% Bayesian credible intervals 74.7-79.4) for the 2015 Kenya MIS (estimated sample size of children 0-4 years 7218 [7099-7288]), and 54.1% [50.1-56.5] for the 2014-2015 Rwanda DHS (12,220 [11,950-12,410]). This study highlights the importance of defining indicator-relevant sample sizes to achieve the required precision in the current national surveys. While expanding the current surveys would need additional investment, the study also highlights the need for improved approaches to cost-effective sampling.

  1. A Scanning Transmission Electron Microscopy Method for Determining Manganese Composition in Welding Fume as a Function of Primary Particle Size

    PubMed Central

    Richman, Julie D.; Livi, Kenneth J.T.; Geyh, Alison S.

    2011-01-01

    Increasing evidence suggests that the physicochemical properties of inhaled nanoparticles influence the resulting toxicokinetics and toxicodynamics. This report presents a method using scanning transmission electron microscopy (STEM) to measure the Mn content throughout the primary particle size distribution of welding fume particle samples collected on filters for application in exposure and health research. Dark field images were collected to assess the primary particle size distribution and energy-dispersive X-ray and electron energy loss spectroscopy were performed for measurement of Mn composition as a function of primary particle size. A manual method incorporating imaging software was used to measure the primary particle diameter and to select an integration region for compositional analysis within primary particles throughout the size range. To explore the variation in the developed metric, the method was applied to 10 gas metal arc welding (GMAW) fume particle samples of mild steel that were collected under a variety of conditions. The range of Mn composition by particle size was −0.10 to 0.19 %/nm, where a positive estimate indicates greater relative abundance of Mn increasing with primary particle size and a negative estimate conversely indicates decreasing Mn content with size. However, the estimate was only statistically significant (p<0.05) in half of the samples (n=5), which all had a positive estimate. In the remaining samples, no significant trend was measured. Our findings indicate that the method is reproducible and that differences in the abundance of Mn by primary particle size among welding fume samples can be detected. PMID:21625364

  3. The impact of sample size on the reproducibility of voxel-based lesion-deficit mappings.

    PubMed

    Lorca-Puls, Diego L; Gajardo-Vidal, Andrea; White, Jitrachote; Seghier, Mohamed L; Leff, Alexander P; Green, David W; Crinion, Jenny T; Ludersdorfer, Philipp; Hope, Thomas M H; Bowman, Howard; Price, Cathy J

    2018-07-01

    This study investigated how sample size affects the reproducibility of findings from univariate voxel-based lesion-deficit analyses (e.g., voxel-based lesion-symptom mapping and voxel-based morphometry). Our effect of interest was the strength of the mapping between brain damage and speech articulation difficulties, as measured in terms of the proportion of variance explained. First, we identified a region of interest by searching on a voxel-by-voxel basis for brain areas where greater lesion load was associated with poorer speech articulation using a large sample of 360 right-handed English-speaking stroke survivors. We then randomly drew thousands of bootstrap samples from this data set that included either 30, 60, 90, 120, 180, or 360 patients. For each resample, we recorded effect size estimates and p values after conducting exactly the same lesion-deficit analysis within the previously identified region of interest and holding all procedures constant. The results show (1) how often small effect sizes in a heterogeneous population fail to be detected; (2) how effect size and its statistical significance varies with sample size; (3) how low-powered studies (due to small sample sizes) can greatly over-estimate as well as under-estimate effect sizes; and (4) how large sample sizes (N ≥ 90) can yield highly significant p values even when effect sizes are so small that they become trivial in practical terms. The implications of these findings for interpreting the results from univariate voxel-based lesion-deficit analyses are discussed. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.
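
    The over-estimation in low-powered studies noted in points (2)-(3) can be reproduced with a toy simulation: among small-sample replicates that happen to reach significance, the average observed effect overestimates the true effect. This is a one-sample z-test sketch, not the authors' analysis; the true effect, n, replicate count, and seed are all arbitrary:

```python
import math
import random

def mean_significant_effect(true_d=0.2, n=30, reps=2000, seed=1):
    """Mean observed effect among replicates that reach 'significance'.

    One-sample z-test sketch with known SD = 1: a replicate counts as
    significant when its sample mean exceeds 1.96 / sqrt(n) (positive side).
    """
    rng = random.Random(seed)
    crit = 1.96 / math.sqrt(n)
    hits = []
    for _ in range(reps):
        xbar = sum(rng.gauss(true_d, 1.0) for _ in range(n)) / n
        if xbar > crit:
            hits.append(xbar)
    return sum(hits) / len(hits)

# Every retained xbar exceeds crit (~0.36 for n = 30), so the mean of the
# significant replicates necessarily overstates the true effect of 0.2.
```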

  4. Estimating population sizes for elusive animals: the forest elephants of Kakum National Park, Ghana.

    PubMed

    Eggert, L S; Eggert, J A; Woodruff, D S

    2003-06-01

    African forest elephants are difficult to observe in the dense vegetation, and previous studies have relied upon indirect methods to estimate population sizes. Using multilocus genotyping of noninvasively collected samples, we performed a genetic survey of the forest elephant population at Kakum National Park, Ghana. We estimated population size, sex ratio and genetic variability from our data, then combined this information with field observations to divide the population into age groups. Our population size estimate was very close to that obtained using dung counts, the most commonly used indirect method of estimating the population sizes of forest elephant populations. As their habitat is fragmented by expanding human populations, management will be increasingly important to the persistence of forest elephant populations. The data that can be obtained from noninvasively collected samples will help managers plan for the conservation of this keystone species.

  5. Sample Size Determination for Regression Models Using Monte Carlo Methods in R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander

    2014-01-01

    A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…

  6. The Effects of Model Misspecification and Sample Size on LISREL Maximum Likelihood Estimates.

    ERIC Educational Resources Information Center

    Baldwin, Beatrice

    The robustness of LISREL computer program maximum likelihood estimates under specific conditions of model misspecification and sample size was examined. The population model used in this study contains one exogenous variable; three endogenous variables; and eight indicator variables, two for each latent variable. Conditions of model…

  7. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.

  8. A sampling system for estimating the cultivation of wheat (Triticum aestivum L) from LANDSAT data. M.S. Thesis - 21 Jul. 1983

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Moreira, M. A.

    1983-01-01

    Using digitally processed MSS/LANDSAT data as an auxiliary variable, a methodology to estimate wheat (Triticum aestivum L.) area by means of sampling techniques was developed. To perform this research, aerial photographs covering 720 sq km of the Cruz Alta test site in the NW of Rio Grande do Sul State were visually analyzed. LANDSAT digital data were analyzed using non-supervised and supervised classification algorithms; as post-processing, the classification was subjected to spatial filtering. To estimate wheat area, the regression estimation method was applied, and different sample sizes and various sampling unit areas (10, 20, 30, 40 and 60 sq km) were tested. Based on the four decision criteria established for this research, it was concluded that: (1) as the size of the sampling units decreased, the percentage of sampled area required to obtain similar estimation performance also decreased; (2) the lowest percentage of the area sampled for wheat estimation with relatively high precision and accuracy through regression estimation was 90% using 10 sq km as the sampling unit; and (3) wheat area estimation by direct expansion (using only aerial photographs) was less precise and accurate when compared with that obtained by means of regression estimation.
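
    A sketch of the regression estimator used above: the sample mean of the target variable (photo-interpreted wheat area) is adjusted by the sample regression slope times the gap between the population and sample means of the auxiliary variable. The auxiliary values here are hypothetical stand-ins for classified LANDSAT wheat area:

```python
def regression_estimate(y_sample, x_sample, x_pop_mean):
    """Regression estimator of the population mean of y, using an auxiliary
    variable x whose population mean is known for the whole frame:
    y_reg = ybar + b * (X_pop - xbar), with b the OLS slope of y on x."""
    n = len(y_sample)
    ybar = sum(y_sample) / n
    xbar = sum(x_sample) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - xbar) ** 2 for x in x_sample)
    b = sxy / sxx
    return ybar + b * (x_pop_mean - xbar)
```

    Multiplying the estimated mean by the number of sampling units in the frame gives the estimated total area; the gain over direct expansion comes from the correlation between the photo and LANDSAT measurements.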

  9. Trap configuration and spacing influences parameter estimates in spatial capture-recapture models

    USGS Publications Warehouse

    Sun, Catherine C.; Fuller, Angela K.; Royle, J. Andrew

    2014-01-01

    An increasing number of studies employ spatial capture-recapture models to estimate population size, but there has been limited research on how different spatial sampling designs and trap configurations influence parameter estimators. Spatial capture-recapture models provide an advantage over non-spatial models by explicitly accounting for heterogeneous detection probabilities among individuals that arise due to the spatial organization of individuals relative to sampling devices. We simulated black bear (Ursus americanus) populations and spatial capture-recapture data to evaluate the influence of trap configuration and trap spacing on estimates of population size and a spatial scale parameter, sigma, that relates to home range size. We varied detection probability and home range size, and considered three trap configurations common to large-mammal mark-recapture studies: regular spacing, clustered, and a temporal sequence of different cluster configurations (i.e., trap relocation). We explored trap spacing and number of traps per cluster by varying the number of traps. The clustered arrangement performed well when detection rates were low, and provides for easier field implementation than the sequential trap arrangement. However, performance differences between trap configurations diminished as home range size increased. Our simulations suggest it is important to consider trap spacing relative to home range sizes, with traps ideally spaced no more than twice the spatial scale parameter. While spatial capture-recapture models can accommodate different sampling designs and still estimate parameters with accuracy and precision, our simulations demonstrate that aspects of sampling design, namely trap configuration and spacing, must consider study area size, ranges of individual movement, and home range sizes in the study population.

  10. Dispersion and sampling of adult Dermacentor andersoni in rangeland in Western North America.

    PubMed

    Rochon, K; Scoles, G A; Lysyk, T J

    2012-03-01

    A fixed precision sampling plan was developed for off-host populations of the adult Rocky Mountain wood tick, Dermacentor andersoni (Stiles), based on data collected by dragging at 13 locations in Alberta, Canada; Washington; and Oregon. In total, 222 site-date combinations were sampled. Each site-date combination was considered a sample, and each sample ranged in size from 86 to 250 10 m2 quadrats. Analysis of simulated quadrats ranging in size from 10 to 50 m2 indicated that the most precise sample unit was the 10 m2 quadrat. Samples taken when abundance was < 0.04 ticks per 10 m2 were more likely not to depart significantly from statistical randomness than samples taken at greater abundance. Data were grouped into ten abundance classes and assessed for fit to the Poisson and negative binomial distributions. The Poisson distribution fit only data in abundance classes < 0.02 ticks per 10 m2, while the negative binomial distribution fit data from all abundance classes. A negative binomial distribution with common k = 0.3742 fit data in eight of the ten abundance classes. Both the Taylor and Iwao mean-variance relationships were fit and used to predict sample sizes for a fixed level of precision. Sample sizes predicted using the Taylor model tended to underestimate actual sample sizes, while those estimated using the Iwao model tended to overestimate them. A negative binomial with common k provided estimates of required sample sizes closest to empirically calculated sample sizes.
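    For negative binomial counts with dispersion k, a standard fixed-precision result (with precision D defined as the ratio of the standard error to the mean) gives the required number of sampling units as n = (1/m + 1/k)/D². A sketch using the common k reported above; the mean density and precision level are illustrative, not taken from the paper:

```python
from math import ceil

def nb_quadrats_needed(mean, k, precision):
    """Quadrats needed so that SE/mean <= precision when counts follow a
    negative binomial with dispersion k: n = (1/m + 1/k) / D**2."""
    return ceil((1 / mean + 1 / k) / precision ** 2)

# Common k from the study; mean (ticks per 10 m2) and precision are illustrative.
print(nb_quadrats_needed(mean=0.04, k=0.3742, precision=0.25))  # 443
```

At very low abundances the 1/m term dominates, which is why sparse tick populations demand so many quadrats for a fixed relative precision.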

  11. Sample Size Estimation for Alzheimer's Disease Trials from Japanese ADNI Serial Magnetic Resonance Imaging.

    PubMed

    Fujishima, Motonobu; Kawaguchi, Atsushi; Maikusa, Norihide; Kuwano, Ryozo; Iwatsubo, Takeshi; Matsuda, Hiroshi

    2017-01-01

    Little is known about the sample sizes required for clinical trials of Alzheimer's disease (AD)-modifying treatments using atrophy measures from serial brain magnetic resonance imaging (MRI) in the Japanese population. The primary objective of the present study was to estimate how large a sample size would be needed for future clinical trials for AD-modifying treatments in Japan using atrophy measures of the brain as a surrogate biomarker. Sample sizes were estimated from the rates of change of the whole brain and hippocampus by the k-means normalized boundary shift integral (KN-BSI) and cognitive measures using the data of 537 Japanese Alzheimer's Neuroimaging Initiative (J-ADNI) participants with a linear mixed-effects model. We also examined the potential use of ApoE status as a trial enrichment strategy. The hippocampal atrophy rate required smaller sample sizes than cognitive measures of AD and mild cognitive impairment (MCI). Inclusion of ApoE status reduced sample sizes for AD and MCI patients in the atrophy measures. These results show the potential use of longitudinal hippocampal atrophy measurement using automated image analysis as a progression biomarker and ApoE status as a trial enrichment strategy in a clinical trial of AD-modifying treatment in Japanese people.

  12. Estimation of Effect Size from a Series of Experiments Involving Paired Comparisons.

    ERIC Educational Resources Information Center

    Gibbons, Robert D.; And Others

    1993-01-01

    A distribution theory is derived for a G. V. Glass-type (1976) estimator of effect size from studies involving paired comparisons. The possibility of combining effect sizes from studies involving a mixture of related and unrelated samples is also explored. Resulting estimates are illustrated using data from previous psychiatric research. (SLD)

  13. Estimating the quadratic mean diameter of fine woody debris for forest type groups of the United States

    Treesearch

    Christopher W. Woodall; Vicente J. Monleon

    2009-01-01

    The Forest Inventory and Analysis program of the Forest Service, U.S. Department of Agriculture conducts a national inventory of fine woody debris (FWD); however, the sampling protocols involve tallying only the number of FWD pieces by size class that intersect a sampling transect with no measure of actual size. The line intersect estimator used with those samples...

  14. Generalized Sample Size Determination Formulas for Investigating Contextual Effects by a Three-Level Random Intercept Model.

    PubMed

    Usami, Satoshi

    2017-03-01

    Behavioral and psychological researchers have shown strong interest in investigating contextual effects (i.e., the influences of combinations of individual- and group-level predictors on individual-level outcomes). The present research provides generalized formulas for determining the sample size needed to investigate contextual effects according to the desired level of statistical power as well as the width of the confidence interval. These formulas are derived within a three-level random intercept model that includes one predictor/contextual variable at each level, to simultaneously cover the various kinds of contextual effects in which researchers may be interested. The relative influences of the indices included in the formulas on the standard errors of contextual effect estimates are investigated with the aim of further simplifying sample size determination procedures. In addition, simulation studies are performed to investigate the finite sample behavior of the calculated statistical power, showing that estimated sample sizes based on the derived formulas can be both positively and negatively biased due to complex effects of unreliability of contextual variables, multicollinearity, and violation of the assumption of known variances. Thus, it is advisable to compare estimated sample sizes under various specifications of the indices and to evaluate their potential bias, as illustrated in the example.

  15. Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter.

    PubMed

    Lord, Dominique

    2006-07-01

    There has been considerable research conducted on the development of statistical models for predicting crashes on highway facilities. Despite numerous advancements made for improving the estimation tools of statistical models, the most common probabilistic structures used for modeling motor vehicle crashes remain the traditional Poisson and Poisson-gamma (or negative binomial) distributions; when crash data exhibit over-dispersion, the Poisson-gamma model is usually the model most favored by transportation safety modelers. Crash data collected for safety studies often have the unusual attribute of being characterized by low sample mean values. Studies have shown that the goodness-of-fit of statistical models produced from such datasets can be significantly affected. This issue has been defined as the "low mean problem" (LMP). Despite recent developments on methods to circumvent the LMP and test the goodness-of-fit of models developed using such datasets, no work has so far examined how the LMP affects the fixed dispersion parameter of Poisson-gamma models used for modeling motor vehicle crashes. The dispersion parameter plays an important role in many types of safety studies and should, therefore, be reliably estimated. The primary objective of this research project was to verify whether the LMP affects the estimation of the dispersion parameter and, if so, to determine the magnitude of the problem. The secondary objective consisted of determining the effects of an unreliably estimated dispersion parameter on common analyses performed in highway safety studies. To accomplish the objectives of the study, a series of Poisson-gamma distributions were simulated using different values describing the mean, the dispersion parameter, and the sample size. 
Three estimators commonly used by transportation safety modelers for estimating the dispersion parameter of Poisson-gamma models were evaluated: the method of moments, weighted regression, and maximum likelihood. In an attempt to complement the outcome of the simulation study, Poisson-gamma models were fitted to crash data collected in Toronto, Ont., characterized by a low sample mean and small sample size. The study shows that a low sample mean combined with a small sample size can seriously affect the estimation of the dispersion parameter, no matter which estimator is used in the estimation process. The probability that the dispersion parameter is unreliably estimated increases significantly as the sample mean and sample size decrease. Consequently, the results show that an unreliably estimated dispersion parameter can significantly undermine empirical Bayes (EB) estimates as well as the estimation of confidence intervals for the gamma mean and predicted response. The paper ends with recommendations for minimizing the likelihood of producing Poisson-gamma models with an unreliable dispersion parameter for modeling motor vehicle crashes.
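    Of the estimators mentioned, the method of moments is the simplest to illustrate: for a Poisson-gamma (negative binomial) variable with mean m and dispersion k, Var = m + m²/k, so k̂ = m̄²/(s² − m̄). A minimal sketch with hypothetical count data (note that some crash-modeling software parameterizes by α = 1/k instead):

```python
from statistics import fmean, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of the negative binomial dispersion k:
    Var = m + m**2 / k  =>  k_hat = m**2 / (s2 - m).
    Returns None when the sample shows no overdispersion (s2 <= m)."""
    m, s2 = fmean(counts), variance(counts)
    if s2 <= m:
        return None
    return m * m / (s2 - m)

crashes = [0, 0, 1, 0, 3, 0, 0, 2, 0, 5]  # hypothetical site counts
print(round(moment_dispersion(crashes), 4))  # 0.6406
```

With a low sample mean, s² − m̄ sits close to zero, so small fluctuations in the data swing k̂ wildly; this is the instability the study documents.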

  16. Estimating sample size for landscape-scale mark-recapture studies of North American migratory tree bats

    USGS Publications Warehouse

    Ellison, Laura E.; Lukacs, Paul M.

    2014-01-01

    Concern for migratory tree-roosting bats in North America has grown because of possible population declines from wind energy development. This concern has driven interest in estimating population-level changes. Mark-recapture methodology is one possible analytical framework for assessing bat population changes, but the sample sizes required to produce reliable estimates have not been determined. To illustrate the sample sizes necessary for a mark-recapture-based monitoring program, we conducted power analyses using a statistical model that allows reencounters of live and dead marked individuals. We ran 1,000 simulations for each of five broad sample size categories in a Burnham joint model, and then compared the proportion of simulations in which 95% confidence intervals overlapped between and among years for a 4-year study. Additionally, we conducted sensitivity analyses of sample size to various capture probabilities and recovery probabilities. More than 50,000 individuals per year would need to be captured and released to accurately determine 10% and 15% declines in annual survival. To detect more dramatic declines of 33% or 50% in survival over four years, sample sizes of 25,000 or 10,000 per year, respectively, would be sufficient. Sensitivity analyses revealed that increasing recovery of dead marked individuals may be more valuable than increasing capture probability of marked individuals. Because of the extraordinary effort that would be required, and the difficulty in attaining reliable estimates, we advise caution should such a mark-recapture effort be initiated. We make recommendations for the techniques that show the most promise for mark-recapture studies of bats, because some techniques violate the assumptions of mark-recapture methodology when used to mark bats.

  17. Inference and sample size calculation for clinical trials with incomplete observations of paired binary outcomes.

    PubMed

    Zhang, Song; Cao, Jing; Ahn, Chul

    2017-02-20

    We investigate the estimation of intervention effect and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes with some incomplete observations. We propose a hybrid estimator to appropriately account for the mixed nature of observed data: paired outcomes from those who contribute complete pairs of observations and unpaired outcomes from those who contribute either pre-intervention or post-intervention outcomes. We theoretically prove that if incomplete data are evenly distributed between the pre-intervention and post-intervention periods, the proposed estimator will always be more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator will be superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample size maintains the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example. Copyright © 2016 John Wiley & Sons, Ltd.

  18. Sample size determination for estimating antibody seroconversion rate under stable malaria transmission intensity.

    PubMed

    Sepúlveda, Nuno; Drakeley, Chris

    2015-04-03

    In the last decade, several epidemiological studies have demonstrated the potential of using seroprevalence (SP) and the seroconversion rate (SCR) as informative indicators of malaria burden in low transmission settings or in populations on the cusp of elimination. However, most studies are designed to control the ensuing statistical inference over parasite rates, not over these alternative malaria burden measures. SP is in essence a proportion and, thus, many methods exist for the respective sample size determination. In contrast, designing a study where SCR is the primary endpoint is not an easy task, because precision and statistical power are affected by the age distribution of a given population. Two sample size calculators for SCR estimation are proposed. The first one consists of transforming the confidence interval for SP into the corresponding one for SCR given a known seroreversion rate (SRR). The second calculator extends the previous one to the most common situation where SRR is unknown. In this situation, data simulation was used together with linear regression in order to study the expected relationship between sample size and precision. The performance of the first sample size calculator was studied in terms of the coverage of the confidence intervals for SCR. The results pointed to potential problems of under- or over-coverage for sample sizes ≤ 250 in very low and high malaria transmission settings (SCR ≤ 0.0036 and SCR ≥ 0.29, respectively). Correct coverage was obtained for the remaining transmission intensities with sample sizes ≥ 50. Sample size determination was then carried out for cross-sectional surveys using realistic SCRs from past sero-epidemiological studies and typical age distributions from African and non-African populations. For SCR < 0.058, African studies require a larger sample size than their non-African counterparts in order to obtain the same precision. The opposite happens for the remaining transmission intensities. 
With respect to the second sample size calculator, simulation revealed the likelihood of not having enough information to estimate SRR in low transmission settings (SCR ≤ 0.0108). In that case, the respective estimates tend to underestimate the true SCR. This problem is minimized by sample sizes of no less than 500 individuals. The sample sizes determined by this second method highlighted the prior expectation that, when SRR is not known, sample sizes increase relative to the situation of a known SRR. In contrast to the first sample size calculation, African studies would now require fewer individuals than their counterparts conducted elsewhere, irrespective of the transmission intensity. Although the proposed sample size calculators can be instrumental in designing future cross-sectional surveys, the choice of a particular sample size must be seen as a much broader exercise that involves weighing statistical precision against ethical issues, available human and economic resources, and possible time constraints. Moreover, if sample size determination is carried out over varying transmission intensities, as done here, the respective sample sizes can also be used in studies comparing sites with different malaria transmission intensities. In conclusion, the proposed sample size calculators are a step towards the design of better sero-epidemiological studies. Their basic ideas show promise for application to the planning of alternative sampling schemes that may target or oversample specific age groups.
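    The link between SP and SCR that the first calculator exploits is usually written via the reversible catalytic model, under which expected seroprevalence at age a is SP(a) = λ/(λ+ρ)·(1 − e^(−(λ+ρ)a)), with λ the seroconversion rate and ρ the seroreversion rate. A sketch with illustrative parameter values (not taken from the paper):

```python
from math import exp

def seroprevalence(age, scr, srr):
    """Expected seroprevalence at a given age under the reversible catalytic
    model: SP(a) = scr/(scr + srr) * (1 - exp(-(scr + srr) * a))."""
    rate = scr + srr
    return scr / rate * (1 - exp(-rate * age))

# Illustrative values: with no seroreversion, SP simply accumulates with age;
# with seroreversion, it plateaus at scr / (scr + srr).
print(round(seroprevalence(10, scr=0.05, srr=0.0), 4))   # 0.3935
print(round(seroprevalence(60, scr=0.02, srr=0.01), 4))  # 0.5565
```

Because SP at any age depends on the whole age profile of the surveyed population, the same SCR yields different precision in African and non-African age distributions, which is the effect the calculators are built around.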

  19. State Estimates of Disability in America. Disability Statistics Report 3.

    ERIC Educational Resources Information Center

    LaPlante, Mitchell P.

    This study presents and discusses existing data on disability by state, from the 1980 and 1990 censuses, the Current Population Survey (CPS), and the National Health Interview Survey (NHIS). The study used direct methods for states with large sample sizes and synthetic estimates for states with low sample sizes. The study's highlighted findings…

  20. Moment and maximum likelihood estimators for Weibull distributions under length- and area-biased sampling

    Treesearch

    Jeffrey H. Gove

    2003-01-01

    Many of the most popular sampling schemes used in forestry are probability proportional to size methods. These methods are also referred to as size biased because sampling is actually from a weighted form of the underlying population distribution. Length- and area-biased sampling are special cases of size-biased sampling where the probability weighting comes from a...

  1. Accounting for Incomplete Species Detection in Fish Community Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A; Orth, Dr. Donald J; Jager, Yetta

    2013-01-01

    Riverine fish assemblages are heterogeneous and very difficult to characterize with a one-size-fits-all approach to sampling. Furthermore, detecting changes in fish assemblages over time requires accounting for variation in sampling designs. We present a modeling approach that permits heterogeneous sampling by accounting for site and sampling covariates (including method) in a model-based framework for estimation (versus a sampling-based framework). We snorkeled during three surveys and electrofished during a single survey in a suite of delineated habitats stratified by reach types. We developed single-species occupancy models to determine the covariates influencing patch occupancy and species detection probabilities, whereas community occupancy models estimated species richness in light of incomplete detections. For most species, information-theoretic criteria showed higher support for models that included patch size and reach as covariates of occupancy. In addition, models including patch size and sampling method as covariates of detection probabilities also had higher support. Detection probability estimates for snorkeling surveys were higher for larger non-benthic species, whereas electrofishing was more effective at detecting smaller benthic species. The number of sites and sampling occasions required to accurately estimate occupancy varied among fish species. For rare benthic species, our results suggested that a higher number of occasions, and especially the addition of electrofishing, may be required to improve detection probabilities and obtain accurate occupancy estimates. Community models suggested that richness was 41% higher than the number of species actually observed, and the addition of an electrofishing survey increased estimated richness by 13%. These results can be useful to future fish assemblage monitoring efforts by informing sampling designs, such as site selection (e.g., stratifying based on patch size) and the effort required (e.g., number of sites versus occasions).

  2. How Large Should a Statistical Sample Be?

    ERIC Educational Resources Information Center

    Menil, Violeta C.; Ye, Ruili

    2012-01-01

    This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…
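    Tables like those described follow from the standard sample size formula for a proportion, n₀ = z²p(1−p)/e², adjusted by the finite population correction when the population size is known. A minimal sketch (the article's tables may use a different rounding or correction; these defaults are assumptions):

```python
from math import ceil
from statistics import NormalDist

def proportion_sample_size(p, error, confidence=0.95, population=None):
    """n0 = z**2 * p * (1 - p) / error**2, optionally shrunk by the
    finite population correction n = n0 / (1 + (n0 - 1) / N)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z * z * p * (1 - p) / error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)

# p = 0.5 is the conservative (worst-case) choice when p is unknown.
print(proportion_sample_size(0.5, 0.05))                   # 385
print(proportion_sample_size(0.5, 0.05, population=1000))  # 278
```

The finite population correction matters exactly in the classroom setting the article targets: sampling 385 people from a population of 1,000 would be needlessly large.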

  3. Revisiting sample size: are big trials the answer?

    PubMed

    Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

    2012-07-18

    The superiority of the evidence generated in randomized controlled trials over observational data is not conditional on randomization alone. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, the probability that a trial will detect a difference when a real difference between treatments exists, depends strongly on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
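    The dependence of power on sample size that this review emphasizes can be made concrete with the normal-approximation power of a two-arm comparison of means (a generic sketch, not a formula from the paper):

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(n_per_arm, delta, sigma, alpha=0.05):
    """Approximate power of a two-arm z-test for a true mean difference delta."""
    nd = NormalDist()
    ncp = abs(delta) / (sigma * sqrt(2 / n_per_arm))  # noncentrality
    return nd.cdf(ncp - nd.inv_cdf(1 - alpha / 2))

# Power rises steeply with sample size for a fixed 0.5 SD difference.
for n in (20, 40, 63, 100):
    print(n, round(two_sample_power(n, 0.5, 1), 3))
```

Running this shows power climbing from roughly one-third at 20 per arm to about 80% at 63 per arm, illustrating why undersized trials so often miss real treatment effects.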

  4. Sub-sampling genetic data to estimate black bear population size: A case study

    USGS Publications Warehouse

    Tredick, C.A.; Vaughan, M.R.; Stauffer, D.F.; Simek, S.L.; Eason, T.

    2007-01-01

    Costs for genetic analysis of hair samples collected for individual identification of bears average approximately US$50 [2004] per sample. This can easily exceed budgetary allowances for large-scale studies or studies of high-density bear populations. We used 2 genetic datasets from 2 areas in the southeastern United States to explore how reducing costs of analysis by sub-sampling affected precision and accuracy of resulting population estimates. We used several sub-sampling scenarios to create subsets of the full datasets and compared summary statistics, population estimates, and precision of estimates generated from these subsets to estimates generated from the complete datasets. Our results suggested that bias and precision of estimates improved as the proportion of total samples used increased, and heterogeneity models (e.g., Mh[CHAO]) were more robust to reduced sample sizes than other models (e.g., behavior models). We recommend that only high-quality samples (>5 hair follicles) be used when budgets are constrained, and efforts should be made to maximize capture and recapture rates in the field.

  5. A Note on Sample Size and Solution Propriety for Confirmatory Factor Analytic Models

    ERIC Educational Resources Information Center

    Jackson, Dennis L.; Voth, Jennifer; Frey, Marc P.

    2013-01-01

    Determining an appropriate sample size for use in latent variable modeling techniques has presented ongoing challenges to researchers. In particular, small sample sizes are known to present concerns over sampling error for the variances and covariances on which model estimation is based, as well as for fit indexes and convergence failures. The…

  6. Seasonal variation in size-dependent survival of juvenile Atlantic salmon (Salmo salar): Performance of multistate capture-mark-recapture models

    USGS Publications Warehouse

    Letcher, B.H.; Horton, G.E.

    2008-01-01

    We estimated the magnitude and shape of size-dependent survival (SDS) across multiple sampling intervals for two cohorts of stream-dwelling Atlantic salmon (Salmo salar) juveniles using multistate capture-mark-recapture (CMR) models. Simulations designed to test the effectiveness of multistate models for detecting SDS in our system indicated that error in SDS estimates was low and that both time-invariant and time-varying SDS could be detected with sample sizes of >250, average survival of >0.6, and average probability of capture of >0.6, except for cases of very strong SDS. In the field (N ≈ 750, survival 0.6-0.8 among sampling intervals, probability of capture 0.6-0.8 among sampling occasions), about one-third of the sampling intervals showed evidence of SDS, with poorer survival of larger fish during the age-2+ autumn and quadratic survival (opposite direction between cohorts) during age-1+ spring. The varying magnitude and shape of SDS among sampling intervals suggest a potential mechanism for the maintenance of the very wide observed size distributions. Estimating SDS using multistate CMR models appears complementary to established approaches, can provide estimates with low error, and can be used to detect intermittent SDS. © 2008 NRC Canada.

  7. Statistical power analysis in wildlife research

    USGS Publications Warehouse

    Steidl, R.J.; Hayes, J.P.

    1997-01-01

    Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be ≤ 0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true. 
We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to be reported, then the α-level, effect sizes, and sample sizes used in calculations must also be reported.
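    The legitimate retrospective use the authors endorse, asking what effect size a completed study could have detected, can be sketched with a z-approximation for a two-arm comparison (a generic illustration; variable names are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_difference(n_per_arm, sigma, alpha=0.05, power=0.8):
    """Smallest true mean difference a completed two-arm study of this size
    could have detected with the stated power (z-approximation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sigma * sqrt(2 / n_per_arm)

# A study with 63 subjects per arm could only have detected differences of
# about half a standard deviation or larger with 80% power.
print(round(minimum_detectable_difference(63, 1), 3))  # 0.499
```

If this minimum detectable difference exceeds the smallest biologically significant effect, the non-rejected null carries little evidential weight, which is exactly the authors' argument for reporting intervals rather than observed-effect power.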

  8. A novel measure of effect size for mediation analysis.

    PubMed

    Lachowicz, Mark J; Preacher, Kristopher J; Kelley, Ken

    2018-06-01

    Mediation analysis has become one of the most popular statistical methods in the social sciences. However, many currently available effect size measures for mediation have limitations that restrict their use to specific mediation models. In this article, we develop a measure of effect size that addresses these limitations. We show how modification of a currently existing effect size measure results in a novel effect size measure with many desirable properties. We also derive an expression for the bias of the sample estimator for the proposed effect size measure and propose an adjusted version of the estimator. We present a Monte Carlo simulation study conducted to examine the finite sampling properties of the adjusted and unadjusted estimators, which shows that the adjusted estimator is effective at recovering the true value it estimates. Finally, we demonstrate the use of the effect size measure with an empirical example. We provide freely available software so that researchers can immediately implement the methods we discuss. Our developments here extend the existing literature on effect sizes and mediation by developing a potentially useful method of communicating the magnitude of mediation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. Effects of social organization, trap arrangement and density, sampling scale, and population density on bias in population size estimation using some common mark-recapture estimators.

    PubMed

    Gupta, Manan; Joshi, Amitabh; Vidya, T N C

    2017-01-01

    Mark-recapture estimators are commonly used for population size estimation and typically yield unbiased estimates for most solitary species with small to moderate home ranges. However, these methods assume independence of captures among individuals, an assumption that is clearly violated in social species showing fission-fusion dynamics, such as the Asian elephant. In the specific case of Asian elephants, doubts have been raised about the accuracy of population size estimates. More importantly, the potential problem that social organization in general poses for mark-recapture methods has not been systematically addressed. We developed an individual-based simulation framework to systematically examine the potential effects of the type of social organization, as well as other factors such as trap density and arrangement, spatial scale of sampling, and population density, on bias in population sizes estimated by POPAN, Robust Design, and Robust Design with detection heterogeneity. In the present study, we ran simulations with biological, demographic, and ecological parameters relevant to Asian elephant populations, but the simulation framework is easily extended to address questions relevant to other social species. We collected capture history data from the simulations and used those data to test for bias in population size estimation. Social organization significantly affected bias in most analyses, but the effect sizes were variable, depending on other factors. Social organization tended to introduce large bias when trap arrangement was uniform and sampling effort was low. POPAN clearly outperformed the two Robust Design models we tested, yielding close to zero bias when traps were arranged at random in the study area and when population density and trap density were not too low. Social organization did not have a major effect on bias at the parameter combinations for which POPAN gave more or less unbiased population size estimates. Therefore, the effect of social organization on bias in population estimation can be removed by using POPAN with specific parameter combinations to obtain population size estimates in a social species.

  10. Effects of social organization, trap arrangement and density, sampling scale, and population density on bias in population size estimation using some common mark-recapture estimators

    PubMed Central

    Joshi, Amitabh; Vidya, T. N. C.

    2017-01-01

    Mark-recapture estimators are commonly used for population size estimation and typically yield unbiased estimates for most solitary species with small to moderate home ranges. However, these methods assume independence of captures among individuals, an assumption that is clearly violated in social species showing fission-fusion dynamics, such as the Asian elephant. In the specific case of Asian elephants, doubts have been raised about the accuracy of population size estimates. More importantly, the potential problem that social organization in general poses for mark-recapture methods has not been systematically addressed. We developed an individual-based simulation framework to systematically examine the potential effects of the type of social organization, as well as other factors such as trap density and arrangement, spatial scale of sampling, and population density, on bias in population sizes estimated by POPAN, Robust Design, and Robust Design with detection heterogeneity. In the present study, we ran simulations with biological, demographic, and ecological parameters relevant to Asian elephant populations, but the simulation framework is easily extended to address questions relevant to other social species. We collected capture history data from the simulations and used those data to test for bias in population size estimation. Social organization significantly affected bias in most analyses, but the effect sizes were variable, depending on other factors. Social organization tended to introduce large bias when trap arrangement was uniform and sampling effort was low. POPAN clearly outperformed the two Robust Design models we tested, yielding close to zero bias when traps were arranged at random in the study area and when population density and trap density were not too low. Social organization did not have a major effect on bias at the parameter combinations for which POPAN gave more or less unbiased population size estimates. Therefore, the effect of social organization on bias in population estimation can be removed by using POPAN with specific parameter combinations to obtain population size estimates in a social species. PMID:28306735
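
    To illustrate the kind of capture-history simulation these entries describe, here is a minimal sketch (all function names and parameter values are hypothetical). It simulates independent captures, which is exactly the assumption that social structure violates, and applies a classical two-occasion Chapman estimator rather than POPAN or Robust Design:

```python
import random

def simulate_captures(n_animals, n_occasions, p_capture, seed=1):
    """Simulate independent capture histories (ignores social structure)."""
    rng = random.Random(seed)
    return [[int(rng.random() < p_capture) for _ in range(n_occasions)]
            for _ in range(n_animals)]

def chapman_estimate(histories):
    """Two-occasion Chapman (bias-corrected Lincoln-Petersen) abundance estimate."""
    n1 = sum(h[0] for h in histories)                # caught on occasion 1
    n2 = sum(h[1] for h in histories)                # caught on occasion 2
    m = sum(1 for h in histories if h[0] and h[1])   # caught on both occasions
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

    With independence holding, the estimate lands near the true population size; correlated captures within social groups would inflate the variance and can bias such estimators, which is the problem the simulations above quantify.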

  11. Accounting for sampling error when inferring population synchrony from time-series data: a Bayesian state-space modelling approach with applications.

    PubMed

    Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

    2014-01-01

    Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., of density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies, although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony, as well as to underestimating the extinction risk of a metapopulation. The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we present a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplify our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compare our results to those of a standard approach neglecting sampling variance. We show that ignoring sampling variance can mask a synchrony pattern whatever its true value, and that the common practice of averaging a few replicates of population size estimates performs poorly at reducing the bias of the classical estimator of synchrony strength. The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provide a user-friendly R program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates.

  12. Accounting for Sampling Error When Inferring Population Synchrony from Time-Series Data: A Bayesian State-Space Modelling Approach with Applications

    PubMed Central

    Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

    2014-01-01

    Background Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., of density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies, although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony, as well as to underestimating the extinction risk of a metapopulation. Methodology/Principal findings The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we present a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplify our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compare our results to those of a standard approach neglecting sampling variance. We show that ignoring sampling variance can mask a synchrony pattern whatever its true value, and that the common practice of averaging a few replicates of population size estimates performs poorly at reducing the bias of the classical estimator of synchrony strength. Conclusion/Significance The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provide a user-friendly R program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates. PMID:24489839
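
    The downward bias described above is the classical attenuation of a correlation by independent measurement error. As a minimal illustration (a textbook disattenuation formula, not the authors' state-space model; the names are hypothetical):

```python
def disattenuated_correlation(r_obs, reliability_x, reliability_y):
    """Correct an observed zero-lag correlation for independent sampling error.

    Each reliability is signal variance / (signal variance + sampling-error
    variance); with error-free data both equal 1 and no correction is applied.
    """
    return r_obs / (reliability_x * reliability_y) ** 0.5
```

    For example, if sampling error makes up 20% of the variance of each series, an observed correlation of 0.5 corresponds to a true synchrony of about 0.625.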

  13. A modified approach to estimating sample size for simple logistic regression with one continuous covariate.

    PubMed

    Novikov, I; Fund, N; Freedman, L S

    2010-01-15

    Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.
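
    A sketch of the normal-approximation calculation in this setting (a Hsieh-style formula parameterized by the event probability at the covariate mean; this is an illustration of the general approach, not the Schouten-based modification the authors propose):

```python
import math
from statistics import NormalDist

def lr_sample_size(p_at_mean, beta_per_sd, alpha=0.05, power=0.8):
    """Approximate n for simple logistic regression with one standard-normal
    covariate.  p_at_mean is the event probability when the covariate equals
    its mean; beta_per_sd is the log odds ratio per SD of the covariate."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 /
                     (p_at_mean * (1 - p_at_mean) * beta_per_sd ** 2))
```

    Note that, as the abstract warns, supplying the population prevalence here instead of the prevalence at the covariate mean can distort the result, especially for large covariate effects.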

  14. A new estimator of the discovery probability.

    PubMed

    Favaro, Stefano; Lijoi, Antonio; Prünster, Igor

    2012-12-01

    Species sampling problems have a long history in ecological and biological studies and raise a number of issues, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety. Such inferential problems have recently emerged in genomic applications as well, where they exhibit some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), of which only a small portion has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it provides the degree of flexibility typically needed in this framework. Based on an observed sample of size n, the focus is on prediction of a key aspect of the outcome of an additional sample of size m, namely, the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. Such an estimator admits a closed-form expression that can be evaluated exactly. The result we obtain allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species as m increases. Natural applications are represented by the estimation of the probability of discovering rare genes within genomic libraries, and the results are illustrated by means of two expressed sequence tags datasets. © 2012, The International Biometric Society.
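
    For intuition, the probability that the next observation is a previously unseen species can be estimated with the classical Good-Turing rule, a frequentist baseline rather than the closed-form Bayesian nonparametric estimator the paper derives (sketch; names are illustrative):

```python
from collections import Counter

def new_species_probability(sample):
    """Good-Turing estimate of the chance that the next draw is an unseen
    species: the fraction of observations that are singletons."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)
```

    In a sample of four reads containing two singletons, for instance, the estimated discovery probability is 0.5; Bayesian nonparametric estimators refine this behavior for large additional samples.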

  15. Development of a sampling strategy and sample size calculation to estimate the distribution of mammographic breast density in Korean women.

    PubMed

    Jun, Jae Kwan; Kim, Mi Jin; Choi, Kui Son; Suh, Mina; Jung, Kyu-Won

    2012-01-01

    Mammographic breast density is a known risk factor for breast cancer. To conduct a survey to estimate the distribution of mammographic breast density in Korean women, appropriate sampling strategies for a representative and efficient sampling design were evaluated through simulation. Using the target population of 1,340,362 women from the National Cancer Screening Programme (NCSP) for breast cancer in 2009, we verified the distribution estimate by repeating a stratified random sampling simulation 1,000 times. According to the simulation results, a sampling design stratifying the nation into three groups (metropolitan, urban, and rural), with a total sample size of 4,000, estimated the distribution of breast density in Korean women to within a tolerance of 0.01%. Based on the results of our study, a nationwide survey to estimate the distribution of mammographic breast density among Korean women can be conducted efficiently.
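
    A proportional-allocation stratified sample of the kind simulated here can be sketched as follows (illustrative only; the stratum labels mirror the metropolitan/urban/rural grouping, and the proportional allocation rule is an assumption):

```python
import random

def stratified_sample(population, stratum_of, n_total, seed=0):
    """Draw a stratified random sample with proportional allocation."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    n_pop = len(population)
    sample = []
    for units in strata.values():
        k = round(len(units) / n_pop * n_total)   # proportional allocation
        sample.extend(rng.sample(units, min(k, len(units))))
    return sample
```

    Repeating such draws many times against known population records is exactly the kind of verification-by-simulation the abstract describes.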

  16. Influence of item distribution pattern and abundance on efficiency of benthic core sampling

    USGS Publications Warehouse

    Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.

    2014-01-01

    Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of density estimates, the detection probability of items, and the time costs. When items were distributed randomly rather than clumped, bias decreased and precision increased with increasing sample size, and precision increased slightly with increasing core sampler size. Bias and precision were affected by benthic item density only at very low values (500-1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small-diameter core samples was always more time-efficient than taking fewer large-diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
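
    The simulation can be sketched without GIS software by scattering points in a square plot and intersecting them with circular cores (a minimal Monte Carlo, not the authors' GIS framework; all parameter values are hypothetical):

```python
import math
import random

def core_sampling_estimate(true_density, plot_side_m, core_area_cm2,
                           n_cores, seed=0):
    """Scatter items uniformly at random, take circular core samples at random
    locations, and estimate density (items/m^2) from the mean catch per core."""
    rng = random.Random(seed)
    n_items = int(true_density * plot_side_m ** 2)
    items = [(rng.uniform(0, plot_side_m), rng.uniform(0, plot_side_m))
             for _ in range(n_items)]
    core_area_m2 = core_area_cm2 / 1e4
    radius = math.sqrt(core_area_m2 / math.pi)
    total_hits = 0
    for _ in range(n_cores):
        cx = rng.uniform(0, plot_side_m)
        cy = rng.uniform(0, plot_side_m)
        total_hits += sum(1 for x, y in items
                          if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)
    return total_hits / (n_cores * core_area_m2)
```

    Repeating this over many seeds, and over clumped rather than uniform point patterns, reproduces the bias and precision comparisons the abstract describes.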

  17. Sample Size and Allocation of Effort in Point Count Sampling of Birds in Bottomland Hardwood Forests

    Treesearch

    Winston P. Smith; Daniel J. Twedt; Robert J. Cooper; David A. Wiedenfeld; Paul B. Hamel; Robert P. Ford

    1995-01-01

    To examine sample size requirements and optimum allocation of effort in point count sampling of bottomland hardwood forests, we computed minimum sample sizes from variation recorded during 82 point counts (May 7-May 16, 1992) from three localities containing three habitat types across three regions of the Mississippi Alluvial Valley (MAV). Also, we estimated the effect...

  18. Monitoring Species of Concern Using Noninvasive Genetic Sampling and Capture-Recapture Methods

    DTIC Science & Technology

    2016-11-01

    ABBREVIATIONS AICc Akaike’s Information Criterion with small sample size correction AZGFD Arizona Game and Fish Department BMGR Barry M. Goldwater...MNKA Minimum Number Known Alive N Abundance Ne Effective Population Size NGS Noninvasive Genetic Sampling NGS-CR Noninvasive Genetic...parameter estimates from capture-recapture models require sufficient sample sizes, capture probabilities and low capture biases. For NGS-CR, sample

  19. Meta-analysis of multiple outcomes: a multilevel approach.

    PubMed

    Van den Noortgate, Wim; López-López, José Antonio; Marín-Martínez, Fulgencio; Sánchez-Meca, Julio

    2015-12-01

    In meta-analysis, dependent effect sizes are very common. An example is where in one or more studies the effect of an intervention is evaluated on multiple outcome variables for the same sample of participants. In this paper, we evaluate a three-level meta-analytic model to account for this kind of dependence, extending the simulation results of Van den Noortgate, López-López, Marín-Martínez, and Sánchez-Meca (Behavior Research Methods, 45, 576-594, 2013) by allowing for variation in the number of effect sizes per study, in the between-study variance, in the correlations between pairs of outcomes, and in the sample size of the studies. At the same time, we explore the performance of the approach if the outcomes used in a study can be regarded as a random sample from a population of outcomes. We conclude that although this approach is relatively simple and does not require prior estimates of the sampling covariances between effect sizes, it gives appropriate mean effect size estimates, standard error estimates, and confidence interval coverage proportions in a variety of realistic situations.
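
    For reference, the baseline that the three-level model generalizes is ordinary inverse-variance pooling of one independent effect size per study (a fixed-effect sketch for brevity; names are illustrative):

```python
def pool_fixed_effect(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5
    return estimate, se
```

    When studies contribute several correlated effect sizes, these weights are no longer valid, which is exactly the situation the multilevel approach addresses without requiring the sampling covariances.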

  20. How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation

    ERIC Educational Resources Information Center

    Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard

    2006-01-01

    Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…

  1. Sample Size Calculation for Estimating or Testing a Nonzero Squared Multiple Correlation Coefficient

    ERIC Educational Resources Information Center

    Krishnamoorthy, K.; Xia, Yanping

    2008-01-01

    The problems of hypothesis testing and interval estimation of the squared multiple correlation coefficient of a multivariate normal distribution are considered. It is shown that available one-sided tests are uniformly most powerful, and the one-sided confidence intervals are uniformly most accurate. An exact method of calculating sample size to…

  2. Sample size and power considerations in network meta-analysis

    PubMed Central

    2012-01-01

    Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327
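
    A common heuristic conveys the idea: for an indirect comparison of treatment A versus C through a shared comparator B, the evidence behaves like a pairwise comparison with fewer patients. The sketch below uses the harmonic-style formula often quoted for this purpose; the article's exact derivation may differ:

```python
def indirect_effective_sample_size(n_ab, n_bc):
    """Heuristic effective sample size for an indirect comparison of A vs C
    through a common comparator B (assumes comparable per-patient variances)."""
    return n_ab * n_bc / (n_ab + n_bc)
```

    Two trials of 100 patients each thus carry indirect evidence roughly equivalent to a 50-patient head-to-head comparison, illustrating why power in treatment networks is easily overestimated.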

  3. Estimating the breeding population of long-billed curlew in the United States

    USGS Publications Warehouse

    Stanley, T.R.; Skagen, S.K.

    2007-01-01

    Determining population size and long-term trends in population size for species of high concern is a priority of international, national, and regional conservation plans. Long-billed curlews (Numenius americanus) are a species of special concern in North America due to apparent declines in their population. Because long-billed curlews are not adequately monitored by existing programs, we undertook a 2-year study with the goals of 1) determining present long-billed curlew distribution and breeding population size in the United States and 2) providing recommendations for a long-term long-billed curlew monitoring protocol. We selected a stratified random sample of survey routes in 16 western states for sampling in 2004 and 2005, and we analyzed count data from these routes to estimate detection probabilities and abundance. In addition, we evaluated habitat along roadsides to determine how well roadsides represented habitat throughout the sampling units. We estimated there were 164,515 (SE = 42,047) breeding long-billed curlews in 2004, and 109,533 (SE = 31,060) breeding individuals in 2005. These estimates far exceed currently accepted estimates based on expert opinion. We found that habitat along roadsides was representative of long-billed curlew habitat in general. We make recommendations for improving sampling methodology, and we present power curves to provide guidance on minimum sample sizes required to detect trends in abundance.
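
    The basic design-based calculation behind such estimates divides counts by the detection probability and expands by the sampled fraction of the study area (a simplified sketch with hypothetical numbers, not the authors' full estimator):

```python
def abundance_estimate(raw_count, detection_prob, area_sampled, area_total):
    """Correct a raw count for imperfect detection, then expand from the
    sampled area to the total study area (areas in the same units)."""
    return (raw_count / detection_prob) * (area_total / area_sampled)
```

    Uncertainty in the estimated detection probability propagates into the abundance estimate, which is one reason the reported standard errors are large.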

  4. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models.

    PubMed

    Liu, Jingxia; Colditz, Graham A

    2018-05-01

    There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, cluster sizes are often assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing a set of correlated data is the generalized estimating equation (GEE) approach proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect under equal cluster sizes to that under unequal cluster sizes. We discuss a working correlation structure commonly used in CRTs, the exchangeable structure, and derive a simpler formula for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to compensate for the efficiency loss. Additionally, we propose an optimal sample size estimation based on the GEE models under a fixed budget, for both known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
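
    Under an exchangeable working correlation with a continuous outcome, the comparison can be sketched directly: each cluster of size m contributes information proportional to m / (1 + (m - 1)ρ), and equal sizes maximize the total information for a fixed number of observations (illustrative code only; the article's formulas for binary and count outcomes differ):

```python
def relative_efficiency(cluster_sizes, rho):
    """RE of equal vs unequal cluster sizes for a continuous outcome under an
    exchangeable working correlation: ratio of information sums, where each
    cluster of size m contributes m / (1 + (m - 1) * rho)."""
    info = lambda m: m / (1 + (m - 1) * rho)
    k = len(cluster_sizes)
    m_bar = sum(cluster_sizes) / k            # equal-size design with same total
    return sum(info(m) for m in cluster_sizes) / (k * info(m_bar))
```

    The ratio is 1 for equal clusters and drops below 1 as sizes become more unequal, which motivates the proposed sample size adjustment.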

  5. Diagnostic test accuracy and prevalence inferences based on joint and sequential testing with finite population sampling.

    PubMed

    Su, Chun-Lung; Gardner, Ian A; Johnson, Wesley O

    2004-07-30

    The two-test two-population model, originally formulated by Hui and Walter for the estimation of test accuracy and prevalence, assumes conditionally independent tests, constant accuracy across populations, and binomial sampling. The binomial assumption is incorrect if all individuals in a population (e.g., a child-care centre, a village in Africa, or a cattle herd) are sampled, or if the sample size is large relative to the population size. In this paper, we develop statistical methods for evaluating diagnostic test accuracy and estimating prevalence based on finite population sample data in the absence of a gold standard. Moreover, two tests are often applied simultaneously for the purpose of obtaining a 'joint' testing strategy that has either higher overall sensitivity or specificity than either of the two tests considered singly. Sequential versions of such strategies are often applied in order to reduce the cost of testing. We thus discuss joint (simultaneous and sequential) testing strategies and inference for them. Using the developed methods, we analyse two real data sets and one simulated data set, and we compare 'hypergeometric' and 'binomial-based' inferences. Our findings indicate that the posterior standard deviations for prevalence (but not sensitivity and specificity) based on finite population sampling tend to be smaller than their counterparts for infinite population sampling. Finally, we make recommendations about how small the sample size should be relative to the population size to warrant use of the binomial model for prevalence estimation. Copyright 2004 John Wiley & Sons, Ltd.
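
    The intuition for the sharper prevalence inferences under finite population sampling parallels the classical finite population correction for a sample proportion (a frequentist sketch, not the authors' Bayesian machinery):

```python
def prevalence_se(p_hat, n, population_size=None):
    """Standard error of a sample prevalence; applies the finite population
    correction when the population size N is supplied."""
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    if population_size is not None:
        se *= ((population_size - n) / (population_size - 1)) ** 0.5
    return se
```

    When the whole population is sampled (n = N), the corrected standard error collapses to zero, mirroring the finding that finite-population inferences for prevalence are tighter than binomial-based ones.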

  6. Population size and stopover duration estimation using mark–resight data and Bayesian analysis of a superpopulation model

    USGS Publications Warehouse

    Lyons, James E.; Kendall, William L.; Royle, J. Andrew; Converse, Sarah J.; Andres, Brad A.; Buchanan, Joseph B.

    2016-01-01

    We present a novel formulation of a mark–recapture–resight model that allows estimation of population size, stopover duration, and arrival and departure schedules at migration areas. Estimation is based on encounter histories of uniquely marked individuals and relative counts of marked and unmarked animals. We use a Bayesian analysis of a state–space formulation of the Jolly–Seber mark–recapture model, integrated with a binomial model for counts of unmarked animals, to derive estimates of population size and arrival and departure probabilities. We also provide a novel estimator for stopover duration that is derived from the latent state variable representing the interim between arrival and departure in the state–space model. We conduct a simulation study of field sampling protocols to understand the impact of superpopulation size, proportion marked, and number of animals sampled on bias and precision of estimates. Simulation results indicate that relative bias of estimates of the proportion of the population with marks was low for all sampling scenarios and never exceeded 2%. Our approach does not require enumeration of all unmarked animals detected or direct knowledge of the number of marked animals in the population at the time of the study. This provides flexibility and potential application in a variety of sampling situations (e.g., migratory birds, breeding seabirds, sea turtles, fish, pinnipeds, etc.). Application of the methods is demonstrated with data from a study of migratory sandpipers.
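
    At the core of a mark-resight count model is the marked-to-unmarked ratio; a naive moment estimator illustrates it (a toy sketch, far simpler than the Bayesian superpopulation model described, and note that it assumes the number of marked animals is known, which the abstract stresses their method does not require):

```python
def marked_ratio_abundance(n_marked, marked_seen, unmarked_seen):
    """Scale the known number of marked animals by the observed marked
    fraction to estimate total abundance."""
    p_marked = marked_seen / (marked_seen + unmarked_seen)
    return n_marked / p_marked
```

    If 100 animals carry marks and one fifth of resighted animals are marked, the implied population is about 500; the state-space formulation additionally handles arrival, departure, and stopover duration.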

  7. Reproducibility of preclinical animal research improves with heterogeneity of study samples

    PubMed Central

    Vogt, Lucile; Sena, Emily S.; Würbel, Hanno

    2018-01-01

    Single-laboratory studies conducted under highly standardized conditions are the gold standard in preclinical animal research. Using simulations based on 440 preclinical studies across 13 different interventions in animal models of stroke, myocardial infarction, and breast cancer, we compared the accuracy of effect size estimates between single-laboratory and multi-laboratory study designs. Single-laboratory studies generally failed to predict effect size accurately, and larger sample sizes rendered effect size estimates even less accurate. By contrast, multi-laboratory designs including as few as 2 to 4 laboratories increased coverage probability by up to 42 percentage points without a need for larger sample sizes. These findings demonstrate that within-study standardization is a major cause of poor reproducibility. More representative study samples are required to improve the external validity and reproducibility of preclinical animal research and to prevent wasting animals and resources for inconclusive research. PMID:29470495

  8. Design and analysis of three-arm trials with negative binomially distributed endpoints.

    PubMed

    Mütze, Tobias; Munk, Axel; Friede, Tim

    2016-02-20

    A three-arm clinical trial design with an experimental treatment, an active control, and a placebo control, commonly referred to as the gold standard design, enables testing of non-inferiority or superiority of the experimental treatment compared with the active control. In this paper, we propose methods for designing and analyzing three-arm trials with negative binomially distributed endpoints. In particular, we develop a Wald-type test with a restricted maximum-likelihood variance estimator for testing non-inferiority or superiority. For this test, sample size and power formulas as well as optimal sample size allocations will be derived. The performance of the proposed test will be assessed in an extensive simulation study with regard to type I error rate, power, sample size, and sample size allocation. For the purpose of comparison, Wald-type statistics with a sample variance estimator and an unrestricted maximum-likelihood estimator are included in the simulation study. We found that the proposed Wald-type test with a restricted variance estimator performed well across the considered scenarios and is therefore recommended for application in clinical trials. The methods proposed are motivated and illustrated by a recent clinical trial in multiple sclerosis. The R package ThreeArmedTrials, which implements the methods discussed in this paper, is available on CRAN. Copyright © 2015 John Wiley & Sons, Ltd.

  9. The Influence of Mark-Recapture Sampling Effort on Estimates of Rock Lobster Survival

    PubMed Central

    Kordjazi, Ziya; Frusher, Stewart; Buxton, Colin; Gardner, Caleb; Bird, Tomas

    2016-01-01

    Five annual capture-mark-recapture surveys on Jasus edwardsii were used to evaluate the effect of sample size and fishing effort on the precision of estimated survival probability. Datasets of different numbers of individual lobsters (ranging from 200 to 1,000 lobsters) were created by random subsampling from each annual survey. This process of random subsampling was also used to create 12 datasets of different levels of effort based on three levels of the number of traps (15, 30 and 50 traps per day) and four levels of the number of sampling-days (2, 4, 6 and 7 days). The most parsimonious Cormack-Jolly-Seber (CJS) model for estimating survival probability shifted from a constant model towards sex-dependent models with increasing sample size and effort. A sample of 500 lobsters or 50 traps used on four consecutive sampling-days was required for obtaining precise survival estimations for males and females, separately. Reduced sampling effort of 30 traps over four sampling days was sufficient if a survival estimate for both sexes combined was sufficient for management of the fishery. PMID:26990561

  10. Application of particle size distributions to total particulate stack samples to estimate PM2.5 and PM10 emission factors for agricultural sources

    USDA-ARS?s Scientific Manuscript database

    Particle size distributions (PSD) have long been used to more accurately estimate the PM10 fraction of total particulate matter (PM) stack samples taken from agricultural sources. These PSD analyses were typically conducted using a Coulter Counter with 50 micrometer aperture tube. With recent increa...

  11. The use of group sequential, information-based sample size re-estimation in the design of the PRIMO study of chronic kidney disease.

    PubMed

    Pritchett, Yili; Jemiai, Yannis; Chang, Yuchiao; Bhan, Ishir; Agarwal, Rajiv; Zoccali, Carmine; Wanner, Christoph; Lloyd-Jones, Donald; Cannata-Andía, Jorge B; Thompson, Taylor; Appelbaum, Evan; Audhya, Paul; Andress, Dennis; Zhang, Wuyan; Solomon, Scott; Manning, Warren J; Thadhani, Ravi

    2011-04-01

    Chronic kidney disease is associated with a marked increase in risk for left ventricular hypertrophy and cardiovascular mortality compared with the general population. Therapy with vitamin D receptor activators has been linked with reduced mortality in chronic kidney disease and an improvement in left ventricular hypertrophy in animal studies. PRIMO (Paricalcitol capsules benefits in Renal failure Induced cardia MOrbidity) is a multinational, multicenter randomized controlled trial to assess the effects of paricalcitol (a selective vitamin D receptor activator) on mild to moderate left ventricular hypertrophy in patients with chronic kidney disease. Subjects with mild-moderate chronic kidney disease are randomized to paricalcitol or placebo after confirming left ventricular hypertrophy using a cardiac echocardiogram. Cardiac magnetic resonance imaging is then used to assess left ventricular mass index at baseline, 24 and 48 weeks, which is the primary efficacy endpoint of the study. Because of limited prior data to estimate sample size, a maximum information group sequential design with sample size re-estimation is implemented to allow sample size adjustment based on the nuisance parameter estimated using the interim data. An interim efficacy analysis is planned at a pre-specified time point conditioned on the status of enrollment. The decision to increase sample size depends on the observed treatment effect. A repeated measures analysis model, using available data at Week 24 and 48 with a backup model of an ANCOVA analyzing change from baseline to the final nonmissing observation, are pre-specified to evaluate the treatment effect. Gamma-family of spending function is employed to control family-wise Type I error rate as stopping for success is planned in the interim efficacy analysis. 
If enrollment is slower than anticipated, the smaller sample size available for the interim efficacy analysis and the greater percentage of missing Week 48 data might decrease the accuracy of parameter estimation, whether for the nuisance parameter or for the treatment effect, which might in turn affect the interim decision-making. Combining a group sequential design with sample size re-estimation in clinical trial design has the potential to improve efficiency and to increase the probability of trial success while ensuring the integrity of the study.
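    The nuisance-parameter-driven re-estimation described above can be sketched with the standard two-sample normal-approximation formula, n per arm = 2((z_{1-α/2}+z_{1-β})·σ/Δ)², recomputed with the interim estimate of σ. This is an illustrative sketch of the general technique, not the trial's actual algorithm; the function name and defaults are assumptions.

```python
import math
from statistics import NormalDist

def reestimated_n_per_arm(interim_sd, delta, alpha=0.05, power=0.9):
    """Sample size per arm for a two-sample comparison of means,
    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2,
    with the nuisance parameter sigma replaced by its interim estimate.
    Illustrative sketch only, not the PRIMO trial's algorithm."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # two-sided significance quantile
    z_b = z(power)           # power quantile
    return math.ceil(2 * ((z_a + z_b) * interim_sd / delta) ** 2)
```

A larger interim SD estimate directly increases the re-estimated sample size, which is the mechanism by which slow enrollment and missing Week 48 data can distort the interim decision.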

  12. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in the case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into prediction in a statistically rigorous way. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries previously studied with frequentist methods are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
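    For quantity (a) above, the classical frequentist counterpart is the Good-Turing coverage estimate, Ĉ = 1 − f₁/n, where f₁ is the number of genes seen exactly once among n reads. The sketch below shows that estimator (not the paper's Bayesian nonparametric one) for comparison; the function name is an assumption.

```python
from collections import Counter

def turing_coverage(read_labels):
    """Good-Turing estimate of sample coverage: the estimated probability
    mass of genes already observed, C_hat = 1 - f1/n, where f1 is the
    number of genes seen exactly once and n the total number of reads.
    Frequentist counterpart to estimate (a); illustrative only."""
    counts = Counter(read_labels)
    n = len(read_labels)
    f1 = sum(1 for c in counts.values() if c == 1)
    return 1 - f1 / n
```

A library where most reads hit singleton genes gets low coverage (much of the library remains unseen), signalling that further sequencing would still be productive.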

  13. A standardized sampling protocol for channel catfish in prairie streams

    USGS Publications Warehouse

    Vokoun, Jason C.; Rabeni, Charles F.

    2001-01-01

    Three alternative gears—an AC electrofishing raft, bankpoles, and a 15-hoop-net set—were used in a standardized manner to sample channel catfish Ictalurus punctatus in three prairie streams of varying size in three seasons. We compared these gears as to time required per sample, size selectivity, mean catch per unit effort (CPUE) among months, mean CPUE within months, effect of fluctuating stream stage, and sensitivity to population size. According to these comparisons, the 15-hoop-net set used during stable water levels in October had the most desirable characteristics. Using our catch data, we estimated the precision of CPUE and size structure by varying sample sizes for the 15-hoop-net set. We recommend that 11–15 repetitions of the 15-hoop-net set be used for most management activities. This standardized basic unit of effort will increase the precision of estimates and allow better comparisons among samples as well as increased confidence in management decisions.
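    The "11-15 repetitions" recommendation above rests on how the CV of mean CPUE shrinks with the number of net sets, CV(mean) ≈ CV(sample)/√n. A minimal sketch of that calculation, assuming a target CV for the mean (the 12.5% default and example data are illustrative, not the paper's values):

```python
import math
import statistics

def sets_needed(cpue_samples, target_cv_of_mean=0.125):
    """Number of hoop-net-set repetitions needed so that the coefficient
    of variation of mean CPUE falls to the target, using the relation
    CV(mean) ~ CV(sample) / sqrt(n).  Target and data are illustrative."""
    cv = statistics.stdev(cpue_samples) / statistics.mean(cpue_samples)
    return math.ceil((cv / target_cv_of_mean) ** 2)
```

Because the required n grows with the square of the sample CV, highly variable catches (typical of clumped channel catfish distributions) drive repetition counts up quickly.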

  14. Optimizing Sampling Efficiency for Biomass Estimation Across NEON Domains

    NASA Astrophysics Data System (ADS)

    Abercrombie, H. H.; Meier, C. L.; Spencer, J. J.

    2013-12-01

    Over the course of 30 years, the National Ecological Observatory Network (NEON) will measure plant biomass and productivity across the U.S. to enable an understanding of terrestrial carbon cycle responses to ecosystem change drivers. Over the next several years, prior to operational sampling at a site, NEON will complete construction and characterization phases during which a limited amount of sampling will be done at each site to inform sampling designs, and guide standardization of data collection across all sites. Sampling biomass in 60+ sites distributed among 20 different eco-climatic domains poses major logistical and budgetary challenges. Traditional biomass sampling methods such as clip harvesting and direct measurements of Leaf Area Index (LAI) involve collecting and processing plant samples, and are time and labor intensive. Possible alternatives include indirect sampling methods for estimating LAI, such as digital hemispherical photography (DHP) or a LI-COR 2200 Plant Canopy Analyzer. These LAI estimates can then be used as a proxy for biomass. The calculated biomass estimates can then inform the clip harvest sampling design during NEON operations, optimizing both sample size and number so that standardized uncertainty limits can be achieved with a minimum amount of sampling effort. In 2011, LAI and clip harvest data were collected from co-located sampling points at the Central Plains Experimental Range located in northern Colorado, a short grass steppe ecosystem that is the NEON Domain 10 core site. LAI was measured with a LI-COR 2200 Plant Canopy Analyzer. The layout of the sampling design included four 300-m transects, with clip harvest plots spaced every 50 m and LAI sub-transects spaced every 10 m. LAI was measured at four points along 6-m sub-transects running perpendicular to the 300-m transect. Clip harvest plots were co-located 4 m from the corresponding LAI transects, and had dimensions of 0.1 m by 2 m. 
We conducted regression analyses with LAI and clip harvest data to determine whether LAI can be used as a suitable proxy for aboveground standing biomass. We also compared optimal sample sizes derived from LAI data, and clip-harvest data from two different size clip harvest areas (0.1 m by 1 m vs. 0.1 m by 2 m). Sample sizes were calculated in order to estimate the mean to within a standardized level of uncertainty that will be used to guide sampling effort across all vegetation types (i.e. estimated to within ±10% with 95% confidence). Finally, we employed a semivariogram approach to determine optimal sample size and spacing.
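    The "within ±10% with 95% confidence" criterion maps to the standard normal-approximation sample size n = (z·s/(0.10·x̄))². A minimal sketch, assuming pilot data supply the mean and SD (the example values are illustrative, not NEON data):

```python
import math
import statistics

def n_for_relative_margin(values, rel_margin=0.10, z=1.96):
    """Sample size needed so the mean is estimated to within
    +/- rel_margin (e.g. 10%) of its value with ~95% confidence:
    n = (z * s / (rel_margin * mean))^2.  Pilot values are assumed."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return math.ceil((z * s / (rel_margin * m)) ** 2)
```

This is the calculation that lets pilot clip-harvest or LAI data set operational sample sizes before full sampling begins.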

  15. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range.

    PubMed

    Luo, Dehui; Wan, Xiang; Liu, Jiming; Tong, Tiejun

    2018-06-01

    The era of big data is coming, and evidence-based medicine is attracting increasing attention as a means to improve decision making in medical practice by integrating evidence from well designed and conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimate of treatment effectiveness. The sample mean and standard deviation are two commonly used statistics in meta-analysis, but some trials report the median, the minimum and maximum values, or sometimes the first and third quartiles. Thus, to pool results in a consistent format, researchers need to transform that information back to the sample mean and standard deviation. In this article, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, despite its importance, is either ignored or used in a stepwise but somewhat arbitrary manner, e.g. in the well-known method proposed by Hozo et al. We solve this issue by incorporating the sample size as a smoothly changing weight in the estimators to reach the optimal estimation. Our proposed estimators not only improve significantly on the existing ones but also share the same virtue of simplicity. The real data application indicates that our proposed estimators can serve as "rules of thumb" and be widely applied in evidence-based medicine.
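    For the minimum/median/maximum scenario, the smoothly changing weight described above is commonly stated as w = 4/(4 + n^0.75), giving x̄ ≈ w·(a+b)/2 + (1−w)·m. The sketch below uses that form as we recall it; verify the exact weight against the original paper before relying on it.

```python
def mean_from_min_median_max(a, m, b, n):
    """Estimate the sample mean from the minimum a, median m, maximum b
    and sample size n, using a smooth sample-size-dependent weight
    (the Luo et al. estimator for this scenario as commonly stated:
    w = 4 / (4 + n**0.75)).  Check the formula against the paper."""
    w = 4.0 / (4.0 + n ** 0.75)
    return w * (a + b) / 2.0 + (1.0 - w) * m
```

As n grows, w shrinks smoothly toward zero, so the estimate leans on the median rather than on the increasingly extreme minimum and maximum; this is the fix for the abrupt stepwise weights of earlier methods.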

  16. Combining counts and incidence data: an efficient approach for estimating the log-normal species abundance distribution and diversity indices.

    PubMed

    Bellier, Edwige; Grøtan, Vidar; Engen, Steinar; Schartau, Ann Kristin; Diserud, Ola H; Finstad, Anders G

    2012-10-01

    Obtaining accurate estimates of diversity indices is difficult because the number of species encountered in a sample increases with sampling intensity. We introduce a novel method that requires only that the presence of species in a sample be assessed, while counts of the number of individuals per species are needed for only a small part of the sample. To account for species included as incidence data in the species abundance distribution, we modify the likelihood function of the classical Poisson log-normal distribution. Using simulated community assemblages, we contrast diversity estimates based on a community sample, a subsample randomly extracted from the community sample, and a mixture sample where incidence data are added to a subsample. We show that the mixture sampling approach provides more accurate estimates than the subsample, at little extra cost. Diversity indices estimated from a freshwater zooplankton community sampled using the mixture approach show the same pattern of results as the simulation study. Our method efficiently increases the accuracy of diversity estimates and the comprehension of the left tail of the species abundance distribution. We show how to choose the sample size needed for a compromise between information gained, accuracy of the estimates, and cost expended when assessing biological diversity. The sample size estimates are obtained from key community characteristics, such as the expected number of species in the community, the expected number of individuals in a sample, and the evenness of the community.
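    The key modelling idea above is that counted species contribute a Poisson log-normal probability for their observed count, while presence-only species contribute P(K>0) = 1 − P(K=0). The sketch below illustrates that structure with a crude numerical integration; it is an assumption-laden illustration of the general idea, not the authors' likelihood (which, for instance, must also handle truncation details we gloss over).

```python
import math

def pln_pmf(k, mu, sigma, grid=2000, lo=-10.0, hi=10.0):
    """Poisson log-normal pmf P(K=k) = ∫ Pois(k; e^x) N(x; mu, sigma) dx,
    approximated by a midpoint rule on a fixed grid (illustrative only)."""
    step = (hi - lo) / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * step
        lam = math.exp(x)
        # Poisson term in log space to avoid overflow for large k.
        pois = math.exp(-lam + k * x - math.lgamma(k + 1))
        norm = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) \
            / (sigma * math.sqrt(2 * math.pi))
        total += pois * norm * step
    return total

def mixed_log_likelihood(counts, n_incidence_only, mu, sigma):
    """Sketch of a mixed count/incidence likelihood: counted (observed)
    species contribute the zero-truncated pmf; presence-only species
    contribute P(K>0) = 1 - P(K=0).  Illustrative, not the paper's model."""
    p0 = pln_pmf(0, mu, sigma)
    ll = sum(math.log(pln_pmf(k, mu, sigma) / (1 - p0)) for k in counts)
    ll += n_incidence_only * math.log(1 - p0)
    return ll
```

The incidence term is what lets cheap presence/absence records inform the abundance-distribution parameters alongside the small counted subsample.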

  17. Qualitative Meta-Analysis on the Hospital Task: Implications for Research

    ERIC Educational Resources Information Center

    Noll, Jennifer; Sharma, Sashi

    2014-01-01

    The "law of large numbers" indicates that as sample size increases, sample statistics become less variable and more closely estimate their corresponding population parameters. Different research studies investigating how people consider sample size when evaluating the reliability of a sample statistic have found a wide range of…

  18. What is the extent of prokaryotic diversity?

    PubMed Central

    Curtis, Thomas P; Head, Ian M; Lunn, Mary; Woodcock, Stephen; Schloss, Patrick D; Sloan, William T

    2006-01-01

    The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term ‘diversity’ may allude to the number of taxa or species richness as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is however possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from that local community. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in ‘real life’. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet. PMID:17028084

  19. Interval estimation and optimal design for the within-subject coefficient of variation for continuous and binary variables

    PubMed Central

    Shoukri, Mohamed M; Elkum, Nasser; Walter, Stephen D

    2006-01-01

    Background In this paper we propose the use of the within-subject coefficient of variation as an index of a measurement's reliability. For continuous variables, and based on its maximum likelihood estimation, we derive a variance-stabilizing transformation and discuss confidence interval construction within the framework of a one-way random effects model. We investigate sample size requirements for the within-subject coefficient of variation for continuous and binary variables. Methods We investigate the validity of the approximate normal confidence interval by Monte Carlo simulations. In designing a reliability study, a crucial issue is the balance between the number of subjects to be recruited and the number of repeated measurements per subject. We discuss efficiency of estimation and cost considerations for the optimal allocation of the sample resources. The approach is illustrated by an example on Magnetic Resonance Imaging (MRI). We also discuss the issue of sample size estimation for dichotomous responses with two examples. Results For the continuous variable, we found that the variance-stabilizing transformation improves the asymptotic coverage probabilities of the confidence interval for the within-subject coefficient of variation. For the binary variable, the maximum likelihood estimation and the sample size estimation based on a pre-specified width of the confidence interval are novel contributions to the literature. Conclusion Using the sample size formulas, we hope to help clinical epidemiologists and practicing statisticians to efficiently design reliability studies using the within-subject coefficient of variation, whether the variable of interest is continuous or binary. PMID:16686943
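    In the one-way random effects layout above, the within-subject CV is the square root of the pooled within-subject variance divided by the grand mean. The sketch below is a simple moment-based version of that quantity, not the maximum likelihood estimator derived in the paper:

```python
import statistics

def within_subject_cv(replicates_by_subject):
    """Within-subject coefficient of variation from a one-way layout
    (one list of repeated measurements per subject):
    sqrt(pooled within-subject variance) / grand mean.
    Moment-based sketch, not the paper's ML estimator."""
    all_vals = [v for reps in replicates_by_subject for v in reps]
    grand_mean = statistics.mean(all_vals)
    # Pooled within-subject sum of squares and degrees of freedom.
    ss_within = sum((v - statistics.mean(reps)) ** 2
                    for reps in replicates_by_subject for v in reps)
    df = sum(len(reps) - 1 for reps in replicates_by_subject)
    return (ss_within / df) ** 0.5 / grand_mean
```

Adding subjects and adding replicates per subject both increase df; the allocation question the paper studies is which addition buys more precision per unit cost.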

  20. Rock magnetic properties estimated from coercivity - blocking temperature diagram: application to recent volcanic rocks

    NASA Astrophysics Data System (ADS)

    Terada, T.; Sato, M.; Mochizuki, N.; Yamamoto, Y.; Tsunakawa, H.

    2013-12-01

    Magnetic properties of ferromagnetic minerals generally depend on their chemical composition, crystal structure, size, and shape. In a typical paleomagnetic study, we use a bulk sample, which is an assemblage of magnetic minerals with broad distributions of various magnetic properties. Microscopic and Curie-point observations of the bulk sample enable us to identify the constituent magnetic minerals, while other measurements, for example stepwise thermal and/or alternating field demagnetizations (ThD, AFD), make it possible to estimate the size, shape, and domain state of the constituent magnetic grains. However, estimation based on stepwise demagnetization has a limitation: magnetic grains with the same coercivity Hc (or blocking temperature Tb) are identified as a single population even though they could have different sizes and shapes. Dunlop and West (1969) carried out mapping of grain size and coercivity (Hc) using pTRM. However, their mapping method is applicable essentially only to natural rocks containing SD grains, since the grain sizes are estimated on the basis of single domain theory (Neel, 1949). In addition, it is impossible to check for thermal alteration due to laboratory heating in their experiment. In the present study we propose a new experimental method that makes it possible to estimate the distribution of size and shape of magnetic minerals in a bulk sample. The method consists of simple procedures: (1) imparting ARM to a bulk sample, (2) ThD at a certain temperature, (3) stepwise AFD of the remaining ARM, (4) repeating steps (1)-(3) with ThD at increasing temperatures up to the Curie temperature of the sample. After completion of the whole procedure, ARM spectra are calculated and mapped on the Hc-Tb plane (hereafter called the Hc-Tb diagram). 
We analyze the Hc-Tb diagrams as follows: (1) For uniaxial SD populations, a theoretical curve for a given grain size (or shape anisotropy) is drawn on the Hc-Tb diagram. The curves are calculated using single domain theory, since the coercivity and blocking temperature of uniaxial SD grains can be expressed as functions of size and shape. (2) The boundary between SD and MD grains is calculated and drawn on the Hc-Tb diagram according to the theory of Butler and Banerjee (1975). (3) The theoretical predictions from (1) and (2) are compared with the obtained ARM spectra to estimate the quantitative distribution of size, shape, and domain state of magnetic grains in the sample. This mapping method has been applied to three samples: a Hawaiian basaltic lava extruded in 1995, the Ueno basaltic lava formed during the Matuyama chron, and the Oshima basaltic lava extruded in 1986. We will discuss the physical states of the magnetic grains (size, shape, domain state, etc.) and their possible origins.

  1. Sample allocation balancing overall representativeness and stratum precision.

    PubMed

    Diaz-Quijano, Fredi Alexander

    2018-05-07

    In large-scale surveys, it is often necessary to distribute a preset sample size among a number of strata. Researchers must decide between prioritizing overall representativeness and the precision of stratum estimates. Hence, I evaluated different sample allocation strategies based on stratum size. The strategies evaluated herein included allocation proportional to stratum population; an equal sample for all strata; and allocation proportional to the natural logarithm, cubic root, and square root of the stratum population. This study used the fact that, for a preset sample size, the dispersion index of the stratum sampling fractions is correlated with the error of the population estimator, while the dispersion index of the stratum-specific sampling errors measures the inequality in the distribution of precision. Identification of a balanced and efficient strategy was based on comparing these two dispersion indices. The balance and efficiency of the strategies changed depending on overall sample size. As the sample to be distributed increased, the most efficient allocation strategies were, in order, an equal sample for each stratum; allocation proportional to the logarithm, to the cubic root, and to the square root; and allocation proportional to the stratum population. Depending on sample size, each of the strategies evaluated could be considered in optimizing the sample to preserve both overall representativeness and stratum-specific precision. Copyright © 2018 Elsevier Inc. All rights reserved.
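    The allocation rules compared above differ only in the transform applied to stratum population before normalizing. A minimal sketch (naive rounding; largest-remainder schemes would be used in practice, and the rule names are my labels, not the paper's):

```python
import math

def allocate(total_n, stratum_pops, rule="sqrt"):
    """Distribute a preset total sample among strata in proportion to a
    transform of stratum population: 'proportional', 'sqrt', 'cbrt',
    'log', or 'equal'.  Sketch of the strategies compared in the paper;
    rounding is naive."""
    f = {"proportional": lambda p: p,
         "sqrt": math.sqrt,
         "cbrt": lambda p: p ** (1 / 3),
         "log": math.log,
         "equal": lambda p: 1.0}[rule]
    weights = [f(p) for p in stratum_pops]
    s = sum(weights)
    return [round(total_n * w / s) for w in weights]
```

Moving down the list from 'proportional' to 'equal' progressively flattens the allocation, trading overall representativeness for more uniform stratum-level precision.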

  2. Temporal dynamics of linkage disequilibrium in two populations of bighorn sheep

    PubMed Central

    Miller, Joshua M; Poissant, Jocelyn; Malenfant, René M; Hogg, John T; Coltman, David W

    2015-01-01

    Linkage disequilibrium (LD) is the nonrandom association of alleles at two markers. Patterns of LD have biological implications as well as practical ones when designing association studies or conservation programs aimed at identifying the genetic basis of fitness differences within and among populations. However, the temporal dynamics of LD in wild populations have received little empirical attention. In this study, we examined the overall extent of LD, the effect of sample size on the accuracy and precision of LD estimates, and the temporal dynamics of LD in two populations of bighorn sheep (Ovis canadensis) with different demographic histories. Using over 200 microsatellite loci, we assessed two metrics of multi-allelic LD: D′ and χ′². We found that both populations exhibited high levels of LD, although the extent was much shorter in a native population than in one that was founded via translocation, experienced a prolonged bottleneck post founding, and underwent recent admixture. In addition, we observed significant variation in LD in relation to the sample size used, with small sample sizes leading to depressed estimates of the extent of LD but inflated estimates of background levels of LD. In contrast, there was not much variation in LD among yearly cross-sections within either population once sample size was accounted for. Lack of pronounced interannual variability suggests that researchers may not have to worry about interannual variation when estimating LD in a population and can instead focus on obtaining the largest sample size possible. PMID:26380673

  3. Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes.

    PubMed

    Creel, Scott; Spong, Goran; Sands, Jennifer L; Rotella, Jay; Zeigle, Janet; Joe, Lawrence; Murphy, Kerry M; Smith, Douglas

    2003-07-01

    Determining population sizes can be difficult, but is essential for conservation. By counting distinct microsatellite genotypes, DNA from noninvasive samples (hair, faeces) allows estimation of population size. Problems arise because genotypes from noninvasive samples are error-prone, but genotyping errors can be reduced by multiple polymerase chain reaction (PCR). For faecal genotypes from wolves in Yellowstone National Park, error rates varied substantially among samples, often above the 'worst-case threshold' suggested by simulation. Consequently, a substantial proportion of multilocus genotypes held one or more errors, despite multiple PCR. These genotyping errors created several genotypes per individual and caused overestimation (up to 5.5-fold) of population size. We propose a 'matching approach' to eliminate this overestimation bias.
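    The overestimation mechanism above (each genotyping error spawns a new apparent individual) is easy to reproduce in a toy simulation. Everything here is an illustrative assumption (allele alphabet, three replicates per individual, the error process), not the wolf study's data or error model:

```python
import random

def distinct_observed_genotypes(n_individuals, n_loci, error_rate, seed=1):
    """Simulate how per-locus genotyping errors split one true individual
    into several apparent multilocus genotypes, inflating a count-distinct
    population estimate.  All parameters are illustrative."""
    rng = random.Random(seed)
    # True multilocus genotypes: one tuple of alleles per individual.
    truth = [tuple(rng.randint(0, 9) for _ in range(n_loci))
             for _ in range(n_individuals)]
    observed = set()
    for g in truth:
        for _ in range(3):  # each individual's sample genotyped three times
            obs = tuple(a if rng.random() > error_rate else rng.randint(0, 9)
                        for a in g)
            observed.add(obs)
    return len(observed)
```

With a nonzero per-locus error rate, the distinct-genotype count quickly exceeds the true population size, which is why the paper's 'matching approach' (collapsing near-identical genotypes) is needed before counting.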

  4. The impact of multiple endpoint dependency on Q and I² in meta-analysis.

    PubMed

    Thompson, Christopher Glen; Becker, Betsy Jane

    2014-09-01

    A common assumption in meta-analysis is that effect sizes are independent. When correlated effect sizes are analyzed using traditional univariate techniques, this assumption is violated. This research assesses the impact of dependence arising from treatment-control studies with multiple endpoints on homogeneity measures Q and I² in scenarios using the unbiased standardized-mean-difference effect size. Univariate and multivariate meta-analysis methods are examined. Conditions included different overall outcome effects, study sample sizes, numbers of studies, between-outcomes correlations, dependency structures, and ways of computing the correlation. The univariate approach used typical fixed-effects analyses whereas the multivariate approach used generalized least-squares (GLS) estimates of a fixed-effects model, weighted by the inverse variance-covariance matrix. Increased dependence among effect sizes led to increased Type I error rates from univariate models. When effect sizes were strongly dependent, error rates were drastically higher than nominal levels regardless of study sample size and number of studies. In contrast, using GLS estimation to account for multiple-endpoint dependency maintained error rates within nominal levels. Conversely, mean I² values were not greatly affected by increased amounts of dependency. Last, we point out that the between-outcomes correlation should be estimated as a pooled within-groups correlation rather than using a full-sample estimator that does not consider treatment/control group membership. Copyright © 2014 John Wiley & Sons, Ltd.
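    The homogeneity measures studied above have standard closed forms: Cochran's Q = Σ wᵢ(yᵢ − ȳ)² with inverse-variance weights, and I^2 = max(0, (Q − df)/Q)·100. A minimal univariate fixed-effects sketch for reference:

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 for a univariate fixed-effects
    meta-analysis: Q = sum(w_i * (y_i - y_bar)^2) with w_i = 1/v_i,
    I^2 = max(0, (Q - df) / Q) * 100.  Standard formulas."""
    w = [1.0 / v for v in variances]
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

The paper's point is that when the yᵢ are correlated (multiple endpoints per study), this univariate Q no longer has its nominal null distribution, inflating Type I error; the GLS alternative replaces the scalar weights with the inverse variance-covariance matrix.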

  5. Accounting for missing data in the estimation of contemporary genetic effective population size (Ne).

    PubMed

    Peel, D; Waples, R S; Macbeth, G M; Do, C; Ovenden, J R

    2013-03-01

    Theoretical models are often applied to population genetic data sets without fully considering the effect of missing data. Researchers can deal with missing data by removing individuals that have failed to yield genotypes and/or by removing loci that have failed to yield allelic determinations, but despite their best efforts, most data sets still contain some missing data. As a consequence, realized sample size differs among loci, and this poses a problem for unbiased methods that must explicitly account for random sampling error. One commonly used solution for the calculation of contemporary effective population size (Ne) is to calculate the effective sample size as an unweighted mean or harmonic mean across loci. This is not ideal because it fails to account for the fact that loci with different numbers of alleles have different information content. Here we consider this problem for genetic estimators of contemporary effective population size (Ne). To evaluate bias and precision of several statistical approaches for dealing with missing data, we simulated populations with known Ne and various degrees of missing data. Across all scenarios, one method of correcting for missing data (fixed-inverse variance-weighted harmonic mean) consistently performed the best for both single-sample and two-sample (temporal) methods of estimating Ne and outperformed some methods currently in widespread use. The approach adopted here may be a starting point to adjust other population genetics methods that include per-locus sample size components. © 2012 Blackwell Publishing Ltd.
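    The two simple corrections mentioned above (unweighted mean and harmonic mean of per-locus sample sizes) are one-liners with the stdlib; the better-performing fixed-inverse-variance-weighted harmonic mean additionally weights loci by information content and is not shown here.

```python
from statistics import harmonic_mean, mean

def effective_sample_sizes(per_locus_n):
    """Arithmetic vs harmonic mean of realized per-locus sample sizes,
    the two simple missing-data corrections discussed in the paper.
    The harmonic mean is pulled down by poorly genotyped loci."""
    return mean(per_locus_n), harmonic_mean(per_locus_n)
```

Because the harmonic mean is dominated by the smallest values, a few loci with heavy missingness depress it well below the arithmetic mean, which is one reason a single summary sample size can misrepresent the data.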

  6. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
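    The first summary statistic above, the folded allele frequency spectrum, can be computed directly from unpolarized genotype counts. The sketch below is illustrative code for that statistic only, not the PopSizeABC implementation (which also computes binned zygotic LD):

```python
from collections import Counter

def folded_afs(genotype_matrix):
    """Folded allele frequency spectrum from unphased diploid genotypes.
    genotype_matrix: one row per SNP, each entry the 0/1/2 copy count of
    an arbitrary allele in one individual.  Entry k of the result is the
    number of SNPs whose minor-allele count equals k.  Illustrative only."""
    n_chrom = 2 * len(genotype_matrix[0])
    spectrum = Counter()
    for snp in genotype_matrix:
        ac = sum(snp)                       # allele count at this SNP
        spectrum[min(ac, n_chrom - ac)] += 1  # fold: minor-allele count
    return dict(spectrum)
```

Folding (taking the minor-allele count) is what makes the statistic usable on unpolarized SNP data, where the ancestral allele is unknown.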

  7. A log-linear model approach to estimation of population size using the line-transect sampling method

    USGS Publications Warehouse

    Anderson, D.R.; Burnham, K.P.; Crain, B.R.

    1978-01-01

    The technique of estimating wildlife population size and density using the belt or line-transect sampling method has been used in many past projects, such as the estimation of the density of waterfowl nesting sites in marshes, and is being used currently in such areas as the assessment of Pacific porpoise stocks in regions of tuna fishing activity. A mathematical framework for line-transect methodology has only emerged in the last 5 yr. In the present article, we extend this mathematical framework to a line-transect estimator based upon a log-linear model approach.
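    For context, the basic line-transect density estimator is D̂ = n/(2L·ESW), where ESW is the effective strip half-width implied by the detection function. The sketch below uses a half-normal detection function (σ̂² = Σx²/n, ESW = σ√(π/2)) as a textbook illustration; it is not the log-linear estimator developed in the paper.

```python
import math

def halfnormal_density(perp_distances, transect_length):
    """Line-transect density with a half-normal detection function:
    sigma^2 MLE = sum(x^2)/n, effective strip half-width = sigma*sqrt(pi/2),
    D = n / (2 * L * ESW).  Textbook sketch, not the paper's estimator."""
    n = len(perp_distances)
    sigma2 = sum(x * x for x in perp_distances) / n
    esw = math.sqrt(sigma2 * math.pi / 2)
    return n / (2 * transect_length * esw)
```

Distances and transect length must share one unit (e.g. both in km), giving density in animals per squared unit.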

  8. Combining the boundary shift integral and tensor-based morphometry for brain atrophy estimation

    NASA Astrophysics Data System (ADS)

    Michalkiewicz, Mateusz; Pai, Akshay; Leung, Kelvin K.; Sommer, Stefan; Darkner, Sune; Sørensen, Lauge; Sporring, Jon; Nielsen, Mads

    2016-03-01

    Brain atrophy measured from structural magnetic resonance images (MRIs) is widely used as an imaging surrogate marker for Alzheimer's disease. Its utility has been limited by the large degree of variance and consequently high sample size estimates. The only consistent and reasonably powerful atrophy estimation method has been the boundary shift integral (BSI). In this paper, we first propose a tensor-based morphometry (TBM) method to measure voxel-wise atrophy, which we then combine with BSI. The combined model decreases the sample size estimates significantly when compared to BSI and TBM alone.

  9. Sample Size for Tablet Compression and Capsule Filling Events During Process Validation.

    PubMed

    Charoo, Naseem Ahmad; Durivage, Mark; Rahman, Ziyaur; Ayad, Mohamad Haitham

    2017-12-01

    During solid dosage form manufacturing, the uniformity of dosage units (UDU) is ensured by testing samples at 2 stages, that is, the blend stage and the tablet compression or capsule/powder filling stage. The aim of this work is to propose a sample size selection approach based on quality risk management principles for process performance qualification (PPQ) and continued process verification (CPV) stages by linking UDU to potential formulation and process risk factors. The Bayes success-run theorem appeared to be the most appropriate approach among the various methods considered in this work for computing sample size for PPQ. The sample sizes for high-risk (reliability level of 99%), medium-risk (reliability level of 95%), and low-risk factors (reliability level of 90%) were estimated to be 299, 59, and 29, respectively. Risk-based assignment of reliability levels was supported by the fact that at a low defect rate, the confidence to detect out-of-specification units decreases, and this must be compensated by an increase in sample size to enhance confidence in the estimation. Based on the level of knowledge acquired during PPQ and the level of knowledge further required to comprehend the process, the sample size for CPV was calculated using Bayesian statistics to accomplish a reduced sampling design for CPV. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
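    The risk-tiered sample sizes quoted above follow directly from the success-run theorem with zero allowed failures, n = ln(1 − C)/ln(R) at confidence C and reliability R; the Bayesian refinement used for CPV is not shown here.

```python
import math

def success_run_n(confidence, reliability):
    """Success-run theorem: smallest number of consecutive passing units
    (no failures allowed) demonstrating the given reliability at the
    given confidence, n = ln(1 - C) / ln(R)."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

# At 95% confidence, this reproduces the paper's PPQ sample sizes:
# reliability 99% -> 299, 95% -> 59, 90% -> 29.
```

The steep growth of n as R approaches 1 is the quantitative basis for assigning higher reliability levels (and thus much larger samples) only to high-risk factors.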

  10. A Re-Evaluation of the Size of the White Shark (Carcharodon carcharias) Population off California, USA

    PubMed Central

    Burgess, George H.; Bruce, Barry D.; Cailliet, Gregor M.; Goldman, Kenneth J.; Grubbs, R. Dean; Lowe, Christopher G.; MacNeil, M. Aaron; Mollet, Henry F.; Weng, Kevin C.; O'Sullivan, John B.

    2014-01-01

    White sharks are highly migratory and segregate by sex, age and size. Unlike marine mammals, they neither surface to breathe nor frequent haul-out sites, hindering generation of abundance data required to estimate population size. A recent tag-recapture study used photographic identifications of white sharks at two aggregation sites to estimate abundance in “central California” at 219 mature and sub-adult individuals. They concluded this represented approximately one-half of the total abundance of mature and sub-adult sharks in the entire eastern North Pacific Ocean (ENP). This low estimate generated great concern within the conservation community, prompting petitions for governmental endangered species designations. We critically examine that study and find violations of model assumptions that, when considered in total, lead to population underestimates. We also use a Bayesian mixture model to demonstrate that the inclusion of transient sharks, characteristic of white shark aggregation sites, would substantially increase abundance estimates for the adults and sub-adults in the surveyed sub-population. Using a dataset obtained from the same sampling locations and widely accepted demographic methodology, our analysis indicates a minimum all-life stages population size of >2000 individuals in the California subpopulation is required to account for the number and size range of individual sharks observed at the two sampled sites. Even accounting for methodological and conceptual biases, an extrapolation of these data to estimate the white shark population size throughout the ENP is inappropriate. The true ENP white shark population size is likely several-fold greater as both our study and the original published estimate exclude non-aggregating sharks and those that independently aggregate at other important ENP sites. 
Accurately estimating the central California and ENP white shark population size requires methodologies that account for biases introduced by sampling a limited number of sites and that account for all life history stages across the species' range of habitats. PMID:24932483

  12. Tests of Independence in Contingency Tables with Small Samples: A Comparison of Statistical Power.

    ERIC Educational Resources Information Center

    Parshall, Cynthia G.; Kromrey, Jeffrey D.

    1996-01-01

    Power and Type I error rates were estimated for contingency tables with small sample sizes for the following four types of tests: (1) Pearson's chi-square; (2) chi-square with Yates's continuity correction; (3) the likelihood ratio test; and (4) Fisher's Exact Test. Various marginal distributions, sample sizes, and effect sizes were examined. (SLD)

  13. A general unified framework to assess the sampling variance of heritability estimates using pedigree or marker-based relationships.

    PubMed

    Visscher, Peter M; Goddard, Michael E

    2015-01-01

    Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N^2, where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N. Copyright © 2015 by the Genetics Society of America.
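
    The 1/N^2 scaling for population samples can be illustrated numerically. A minimal sketch, assuming the approximation var(h2_hat) ~ 2 / (N^2 * var(relatedness)) and an illustrative value of 2e-5 for the variance of genomic relationships among conventionally unrelated individuals:

```python
import math

def se_h2_unrelated(n, var_rel=2e-5):
    """Approximate standard error of a marker-based heritability estimate
    from a population sample of unrelated individuals.  var_rel=2e-5 is
    an assumed typical variance of genomic relatedness, not a constant
    taken from this record."""
    return math.sqrt(2.0 / (n ** 2 * var_rel))
```

    With these assumptions, se_h2_unrelated(10000) is roughly 0.032, and halving N doubles the standard error, consistent with the 1/N^2 variance scaling described in the abstract.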

  14. Sample Size Calculations for Precise Interval Estimation of the Eta-Squared Effect Size

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2015-01-01

    Analysis of variance is one of the most frequently used statistical analyses in the behavioral, educational, and social sciences, and special attention has been paid to the selection and use of an appropriate effect size measure of association in analysis of variance. This article presents the sample size procedures for precise interval estimation…

  15. Finite mixture model: A maximum likelihood estimation approach on time series data

    NASA Astrophysics Data System (ADS)

    Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad

    2014-09-01

    Recently, statisticians have emphasized fitting finite mixture models by maximum likelihood estimation because of its desirable asymptotic properties: the estimator is consistent as the sample size increases to infinity and is asymptotically unbiased. Moreover, the parameter estimates obtained from maximum likelihood estimation have the smallest variance among comparable statistical methods as the sample size increases. Thus, maximum likelihood estimation is adopted in this paper to fit a two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, Philippines and Indonesia. Results show a negative relationship between rubber price and exchange rate for all selected countries.

  16. Simulating realistic predator signatures in quantitative fatty acid signature analysis

    USGS Publications Warehouse

    Bromaghin, Jeffrey F.

    2015-01-01

    Diet estimation is an important field within quantitative ecology, providing critical insights into many aspects of ecology and community dynamics. Quantitative fatty acid signature analysis (QFASA) is a prominent method of diet estimation, particularly for marine mammal and bird species. Investigators using QFASA commonly use computer simulation to evaluate statistical characteristics of diet estimators for the populations they study. Similar computer simulations have been used to explore and compare the performance of different variations of the original QFASA diet estimator. In both cases, computer simulations involve bootstrap sampling prey signature data to construct pseudo-predator signatures with known properties. However, bootstrap sample sizes have been selected arbitrarily and pseudo-predator signatures therefore may not have realistic properties. I develop an algorithm to objectively establish bootstrap sample sizes that generates pseudo-predator signatures with realistic properties, thereby enhancing the utility of computer simulation for assessing QFASA estimator performance. The algorithm also appears to be computationally efficient, resulting in bootstrap sample sizes that are smaller than those commonly used. I illustrate the algorithm with an example using data from Chukchi Sea polar bears (Ursus maritimus) and their marine mammal prey. The concepts underlying the approach may have value in other areas of quantitative ecology in which bootstrap samples are post-processed prior to their use.
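
    The core pseudo-predator construction can be sketched as follows: bootstrap-sample each prey library, average, and mix by a known diet. This is a simplified illustration only; real QFASA simulations also apply calibration coefficients, and the record's contribution is choosing n_boot objectively rather than arbitrarily.

```python
import random

def pseudo_predator(prey_libs, diet, n_boot):
    """Construct one pseudo-predator fatty-acid signature with known diet.
    prey_libs: dict prey -> list of signatures (lists of FA proportions);
    diet: dict prey -> diet proportion (sums to 1);
    n_boot: bootstrap sample size per prey type (hypothetical input)."""
    n_fa = len(next(iter(prey_libs.values()))[0])
    sig = [0.0] * n_fa
    for prey, prop in diet.items():
        boot = random.choices(prey_libs[prey], k=n_boot)  # bootstrap sample
        mean = [sum(s[j] for s in boot) / n_boot for j in range(n_fa)]
        sig = [sig[j] + prop * mean[j] for j in range(n_fa)]
    total = sum(sig)
    return [v / total for v in sig]  # renormalise to proportions
```
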

  17. Re-estimating sample size in cluster randomised trials with active recruitment within clusters.

    PubMed

    van Schie, S; Moerbeek, M

    2014-08-30

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster level and individual level variance should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
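
    The re-calculation step in such an internal pilot can be sketched with the standard design-effect formula: compute the usual two-sample n, then inflate by 1 + (m - 1) * ICC. The effect size and variance inputs below are illustrative assumptions, not values from the record.

```python
import math
from statistics import NormalDist

def cluster_trial_n(delta, sigma, icc, m, alpha=0.05, power=0.8):
    """Per-arm number of participants for a cluster randomised trial:
    the two-sample normal-approximation n, inflated by the design effect
    1 + (m - 1) * ICC for clusters of size m.  In an internal pilot,
    re-run this with the ICC re-estimated from the pilot data."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_flat = 2 * (z * sigma / delta) ** 2   # ignoring clustering
    deff = 1 + (m - 1) * icc                # design effect
    return math.ceil(n_flat * deff)
```

    For example, with delta = 0.5, sigma = 1 and clusters of 20, re-estimating the ICC downward from 0.05 to 0.02 mid-trial reduces the per-arm requirement from 123 to 87 participants, which is the kind of adjustment the record describes.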

  18. Novel joint selection methods can reduce sample size for rheumatoid arthritis clinical trials with ultrasound endpoints.

    PubMed

    Allen, John C; Thumboo, Julian; Lye, Weng Kit; Conaghan, Philip G; Chew, Li-Ching; Tan, York Kiat

    2018-03-01

    To determine whether novel methods of selecting joints through (i) ultrasonography (individualized-ultrasound [IUS] method), or (ii) ultrasonography and clinical examination (individualized-composite-ultrasound [ICUS] method) translate into smaller rheumatoid arthritis (RA) clinical trial sample sizes when compared to existing methods utilizing predetermined joint sites for ultrasonography. Cohen's effect size (ES) was estimated (ES-hat) and a 95% CI (ES-hat_L, ES-hat_U) calculated on a mean change in 3-month total inflammatory score for each method. Corresponding 95% CIs [n_L(ES-hat_U), n_U(ES-hat_L)] were obtained on a post hoc sample size reflecting the uncertainty in ES-hat. Sample size calculations were based on a one-sample t-test as the patient numbers needed to provide 80% power at α = 0.05 to reject a null hypothesis H0: ES = 0 versus alternative hypotheses H1: ES = ES-hat, ES = ES-hat_L and ES = ES-hat_U. We aimed to provide point and interval estimates of projected sample sizes for future studies reflecting the uncertainty in our study ES-hat values. Twenty-four treated RA patients were followed up for 3 months. Utilizing the 12-joint approach and existing methods, the post hoc sample size (95% CI) was 22 (10-245). Corresponding sample sizes using ICUS and IUS were 11 (7-40) and 11 (6-38), respectively. Utilizing a seven-joint approach, the corresponding sample sizes using ICUS and IUS methods were nine (6-24) and 11 (6-35), respectively. Our pilot study suggests that sample size for RA clinical trials with ultrasound endpoints may be reduced using the novel methods, providing justification for larger studies to confirm these observations. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
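
    The point-and-interval sample-size idea above can be sketched with a normal approximation: evaluate the one-sample n at the estimated effect size and again at each CI limit. The record used a t-test, so the z approximation here slightly understates n for small samples; the example effect sizes are illustrative, not the study's values.

```python
import math
from statistics import NormalDist

def n_one_sample(es, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a one-sample test of
    H0: ES = 0 at two-sided alpha with the given power, where ES is a
    standardised (Cohen) effect size."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil((z / es) ** 2)
```

    Evaluating n_one_sample at ES-hat gives the point sample size, and evaluating it at the lower and upper CI limits of ES-hat gives the corresponding interval, mirroring the record's [n_L, n_U] construction.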

  19. Statistical theory and methodology for remote sensing data analysis

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1974-01-01

    A model is developed for the evaluation of acreages (proportions) of different crop types over a geographical area using a classification approach, and methods for estimating the crop acreages are given. In estimating the acreage of a specific crop type such as wheat, it is suggested that the problem be treated as a two-crop problem: wheat vs. nonwheat, since this simplifies the estimation problem considerably. The error analysis and the sample size problem are investigated for the two-crop approach. Certain numerical results for sample sizes are given for a JSC-ERTS-1 data example on wheat identification performance in Hill County, Montana and Burke County, North Dakota. Lastly, for a large-area crop acreage inventory, a sampling scheme is suggested for acquiring sample data, and the problem of crop acreage estimation and the associated error analysis is discussed.

  20. Improving size estimates of open animal populations by incorporating information on age

    USGS Publications Warehouse

    Manly, Bryan F.J.; McDonald, Trent L.; Amstrup, Steven C.; Regehr, Eric V.

    2003-01-01

    Around the world, a great deal of effort is expended each year to estimate the sizes of wild animal populations. Unfortunately, population size has proven to be one of the most intractable parameters to estimate. The capture-recapture estimation models most commonly used (of the Jolly-Seber type) are complicated and require numerous, sometimes questionable, assumptions. The derived estimates usually have large variances and lack consistency over time. In capture–recapture studies of long-lived animals, the ages of captured animals can often be determined with great accuracy and relative ease. We show how to incorporate age information into size estimates for open populations, where the size changes through births, deaths, immigration, and emigration. The proposed method allows more precise estimates of population size than the usual models, and it can provide these estimates from two sample occasions rather than the three usually required. Moreover, this method does not require specialized programs for capture-recapture data; researchers can derive their estimates using the logistic regression module in any standard statistical package.

  1. The numerical evaluation of maximum-likelihood estimates of the parameters for a mixture of normal distributions from partially identified samples

    NASA Technical Reports Server (NTRS)

    Walker, H. F.

    1976-01-01

    Likelihood equations determined by the two types of samples, which are necessary conditions for a maximum-likelihood estimate, are considered. These equations suggest certain successive-approximations iterative procedures for obtaining maximum-likelihood estimates. These are generalized steepest ascent (deflected gradient) procedures. It is shown that, with probability 1 as N sub 0 approaches infinity (regardless of the relative sizes of N sub 0 and N sub i, i=1,...,m), these procedures converge locally to the strongly consistent maximum-likelihood estimates whenever the step size is between 0 and 2. Furthermore, the value of the step size which yields optimal local convergence rates is bounded from below by a number which always lies between 1 and 2.

  2. ESTIMATING SAMPLE REQUIREMENTS FOR FIELD EVALUATIONS OF PESTICIDE LEACHING

    EPA Science Inventory

    A method is presented for estimating the number of samples needed to evaluate pesticide leaching threats to ground water at a desired level of precision. Sample size projections are based on desired precision (exhibited as relative tolerable error), level of confidence (90 or 95%...
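
    A generic version of this kind of projection can be sketched with the standard normal-approximation formula n = (z * CV / E)^2, where E is the relative tolerable error. This is a textbook sketch; the EPA method may differ in detail, and the CV below is an assumed input.

```python
import math
from statistics import NormalDist

def n_for_relative_error(cv, rel_err, confidence=0.90):
    """Samples needed so the estimated mean falls within rel_err
    (relative tolerable error) of the true mean at the given two-sided
    confidence, for data with coefficient of variation cv."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * cv / rel_err) ** 2)
```

    For instance, with CV = 0.3 and a 10% relative tolerable error, 25 samples suffice at 90% confidence but 35 are needed at 95%, showing how the confidence level drives the projection.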

  3. Sample Size and Power Estimates for a Confirmatory Factor Analytic Model in Exercise and Sport: A Monte Carlo Approach

    ERIC Educational Resources Information Center

    Myers, Nicholas D.; Ahn, Soyeon; Jin, Ying

    2011-01-01

    Monte Carlo methods can be used in data analytic situations (e.g., validity studies) to make decisions about sample size and to estimate power. The purpose of using Monte Carlo methods in a validity study is to improve the methodological approach within a study where the primary focus is on construct validity issues and not on advancing…

  4. Sample size and power for cost-effectiveness analysis (part 1).

    PubMed

    Glick, Henry A

    2011-03-01

    Basic sample size and power formulae for cost-effectiveness analysis have been established in the literature. These formulae are reviewed, and the similarities and differences between sample size and power for cost-effectiveness analysis and for the analysis of other continuous variables, such as changes in blood pressure or weight, are described. The types of sample size and power tables that are commonly calculated for cost-effectiveness analysis are also described, and the impact of varying the assumed parameter values on the resulting sample size and power estimates is discussed. Finally, the way in which the data for these calculations may be derived is discussed.
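
    A common way to frame such a calculation is via net monetary benefit (NMB) at a willingness-to-pay threshold. The sketch below uses the standard two-sample normal-approximation formula with the NMB variance expanded in terms of the cost and effect standard deviations and their correlation; all numeric inputs in the test are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_per_arm_nmb(lam, d_e, d_c, sd_e, sd_c, rho, alpha=0.05, power=0.8):
    """Per-arm sample size for a cost-effectiveness comparison framed as
    net monetary benefit at willingness-to-pay lam.  d_e/d_c are the
    expected differences in effects and costs, sd_e/sd_c their standard
    deviations, rho their correlation (all hypothetical inputs)."""
    d_nmb = lam * d_e - d_c
    var_nmb = lam ** 2 * sd_e ** 2 + sd_c ** 2 - 2 * lam * rho * sd_e * sd_c
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z ** 2 * var_nmb / d_nmb ** 2)
```

    Because the cost-effect correlation enters the NMB variance with a negative sign, a stronger positive correlation shrinks the required sample size, one of the parameter sensitivities the record discusses.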

  5. Chi-Squared Test of Fit and Sample Size-A Comparison between a Random Sample Approach and a Chi-Square Value Adjustment Method.

    PubMed

    Bergh, Daniel

    2015-01-01

    Chi-square statistics are commonly used for tests of fit of measurement models. Chi-square is also sensitive to sample size, which is why several approaches to handling large samples in test of fit analysis have been developed. One strategy to handle the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and compare these two strategies using simulated data. Given an original sample size of 21,000, for reductions of sample sizes down to the order of 5,000 the adjusted sample size function works as well as the random sample approach. In contrast, when applying adjustments to sample sizes of lower order, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, the fit is exaggerated and misfit underestimated using the adjusted sample size function. Although there are large differences in chi-square values between the two approaches at lower sample sizes, the inferences based on the p-values may be the same.
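
    The adjustment idea rests on the fact that, for a fixed degree of misfit, the chi-square statistic grows roughly linearly with n. A minimal sketch of that rescaling (illustrative only; the published adjustment function may differ in detail):

```python
def adjust_chi_square(chi2, n_orig, n_adj):
    """Rescale a chi-square fit statistic to a hypothetical smaller
    sample size by the ratio n_adj / n_orig, exploiting the roughly
    linear growth of chi-square with n under fixed misfit."""
    return chi2 * n_adj / n_orig
```

    The random sample approach would instead refit the model on an actual subsample of size n_adj; the record's finding is that the two agree down to roughly n_adj = 5,000 for an original n of 21,000, but diverge below that.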

  6. Approximate Sample Size Formulas for Testing Group Mean Differences when Variances Are Unequal in One-Way ANOVA

    ERIC Educational Resources Information Center

    Guo, Jiin-Huarng; Luh, Wei-Ming

    2008-01-01

    This study proposes an approach for determining appropriate sample size for Welch's F test when unequal variances are expected. Given a certain maximum deviation in population means and using the quantile of F and t distributions, there is no need to specify a noncentrality parameter and it is easy to estimate the approximate sample size needed…

  7. Non-invasive genetic censusing and monitoring of primate populations.

    PubMed

    Arandjelovic, Mimi; Vigilant, Linda

    2018-03-01

    Knowing the density or abundance of primate populations is essential for their conservation management and contextualizing socio-demographic and behavioral observations. When direct counts of animals are not possible, genetic analysis of non-invasive samples collected from wildlife populations allows estimates of population size with higher accuracy and precision than is possible using indirect signs. Furthermore, in contrast to traditional indirect survey methods, prolonged or periodic genetic sampling across months or years enables inference of group membership, movement, dynamics, and some kin relationships. Data may also be used to estimate sex ratios, sex differences in dispersal distances, and detect gene flow among locations. Recent advances in capture-recapture models have further improved the precision of population estimates derived from non-invasive samples. Simulations using these methods have shown that the confidence interval of point estimates includes the true population size when assumptions of the models are met, and therefore this range of population size minima and maxima should be emphasized in population monitoring studies. Innovations such as the use of sniffer dogs or anti-poaching patrols for sample collection are important to ensure adequate sampling, and the expected development of efficient and cost-effective genotyping by sequencing methods for DNAs derived from non-invasive samples will automate and speed analyses. © 2018 Wiley Periodicals, Inc.

  8. Conservative Sample Size Determination for Repeated Measures Analysis of Covariance.

    PubMed

    Morgan, Timothy M; Case, L Douglas

    2013-07-05

    In the design of a randomized clinical trial with one pre and multiple post randomized assessments of the outcome variable, one needs to account for the repeated measures in determining the appropriate sample size. Unfortunately, one seldom has a good estimate of the variance of the outcome measure, let alone the correlations among the measurements over time. We show how sample sizes can be calculated by making conservative assumptions regarding the correlations for a variety of covariance structures. The most conservative choice for the correlation depends on the covariance structure and the number of repeated measures. In the absence of good estimates of the correlations, the sample size is often based on a two-sample t-test, making the 'ultra' conservative and unrealistic assumption that there are zero correlations between the baseline and follow-up measures while at the same time assuming there are perfect correlations between the follow-up measures. Compared to the case of taking a single measurement, substantial savings in sample size can be realized by accounting for the repeated measures, even with very conservative assumptions regarding the parameters of the assumed correlation matrix. Assuming compound symmetry, the sample size from the two-sample t-test calculation can be reduced at least 44%, 56%, and 61% for repeated measures analysis of covariance by taking 2, 3, and 4 follow-up measures, respectively. The results offer a rational basis for determining a fairly conservative, yet efficient, sample size for clinical trials with repeated measures and a baseline value.

  9. Automated sampling assessment for molecular simulations using the effective sample size

    PubMed Central

    Zhang, Xin; Bhatt, Divesh; Zuckerman, Daniel M.

    2010-01-01

    To quantify the progress in the development of algorithms and forcefields used in molecular simulations, a general method for the assessment of the sampling quality is needed. Statistical mechanics principles suggest the populations of physical states characterize equilibrium sampling in a fundamental way. We therefore develop an approach for analyzing the variances in state populations, which quantifies the degree of sampling in terms of the effective sample size (ESS). The ESS estimates the number of statistically independent configurations contained in a simulated ensemble. The method is applicable to both traditional dynamics simulations as well as more modern (e.g., multicanonical) approaches. Our procedure is tested in a variety of systems from toy models to atomistic protein simulations. We also introduce a simple automated procedure to obtain approximate physical states from dynamic trajectories: this allows sample-size estimation in systems for which physical states are not known in advance. PMID:21221418

  10. Increased accuracy of batch fecundity estimates using oocyte stage ratios in Plectropomus leopardus.

    PubMed

    Carter, A B; Williams, A J; Russ, G R

    2009-08-01

    Using the ratio of the number of migratory nuclei to hydrated oocytes to estimate batch fecundity of common coral trout Plectropomus leopardus increases the time over which samples can be collected and, therefore, increases the sample size available and reduces biases in batch fecundity estimates.

  11. "PowerUp"!: A Tool for Calculating Minimum Detectable Effect Sizes and Minimum Required Sample Sizes for Experimental and Quasi-Experimental Design Studies

    ERIC Educational Resources Information Center

    Dong, Nianbo; Maynard, Rebecca

    2013-01-01

    This paper and the accompanying tool are intended to complement existing supports for conducting power analysis tools by offering a tool based on the framework of Minimum Detectable Effect Sizes (MDES) formulae that can be used in determining sample size requirements and in estimating minimum detectable effect sizes for a range of individual- and…

  12. Improved variance estimation of classification performance via reduction of bias caused by small sample size.

    PubMed

    Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders

    2006-03-13

    Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, different methods for small sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RSS), are also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed, indicating that the method in its present form cannot be directly applied to small data sets.

  13. Comparison of Two Methods Used to Model Shape Parameters of Pareto Distributions

    USGS Publications Warehouse

    Liu, C.; Charpentier, R.R.; Su, J.

    2011-01-01

    Two methods are compared for estimating the shape parameters of Pareto field-size (or pool-size) distributions for petroleum resource assessment. Both methods assume mature exploration in which most of the larger fields have been discovered. Both methods use the sizes of larger discovered fields to estimate the numbers and sizes of smaller fields: (1) the tail-truncated method uses a plot of field size versus size rank, and (2) the log-geometric method uses data binned in field-size classes and the ratios of adjacent bin counts. Simulation experiments were conducted using discovered oil and gas pool-size distributions from four petroleum systems in Alberta, Canada and using Pareto distributions generated by Monte Carlo simulation. The estimates of the shape parameters of the Pareto distributions, calculated by both the tail-truncated and log-geometric methods, generally stabilize where discovered pool numbers are greater than 100. However, with fewer than 100 discoveries, these estimates can vary greatly with each new discovery. The estimated shape parameters of the tail-truncated method are more stable and larger than those of the log-geometric method where the number of discovered pools is more than 100. Both methods, however, tend to underestimate the shape parameter. Monte Carlo simulation was also used to create sequences of discovered pool sizes by sampling from a Pareto distribution with a discovery process model using a defined exploration efficiency (in order to show how biased the sampling was in favor of larger fields being discovered first). A higher (more biased) exploration efficiency gives better estimates of the Pareto shape parameters. © 2011 International Association for Mathematical Geosciences.

  14. Estimating population size for Capercaillie (Tetrao urogallus L.) with spatial capture-recapture models based on genotypes from one field sample

    USGS Publications Warehouse

    Mollet, Pierre; Kery, Marc; Gardner, Beth; Pasinelli, Gilberto; Royle, Andy

    2015-01-01

    We conducted a survey of an endangered and cryptic forest grouse, the capercaillie Tetrao urogallus, based on droppings collected on two sampling occasions in eight forest fragments in central Switzerland in early spring 2009. We used genetic analyses to sex and individually identify birds. We estimated sex-dependent detection probabilities and population size using a modern spatial capture-recapture (SCR) model for the data from pooled surveys. A total of 127 capercaillie genotypes were identified (77 males, 46 females, and 4 of unknown sex). The SCR model yielded a total population size estimate (posterior mean) of 137.3 capercaillies (posterior sd 4.2, 95% CRI 130–147). The observed sex ratio was skewed towards males (0.63). The posterior mean of the sex ratio under the SCR model was 0.58 (posterior sd 0.02, 95% CRI 0.54–0.61), suggesting a male-biased sex ratio in our study area. A subsampling simulation study indicated that a reduced sampling effort representing 75% of the actual detections would still yield practically acceptable estimates of total size and sex ratio in our population. Hence, field work and financial effort could be reduced without compromising accuracy when the SCR model is used to estimate key population parameters of cryptic species.

  15. Effective population sizes of a major vector of human diseases, Aedes aegypti.

    PubMed

    Saarman, Norah P; Gloria-Soria, Andrea; Anderson, Eric C; Evans, Benjamin R; Pless, Evlyn; Cosme, Luciano V; Gonzalez-Acosta, Cassandra; Kamgang, Basile; Wesson, Dawn M; Powell, Jeffrey R

    2017-12-01

    The effective population size (Ne) is a fundamental parameter in population genetics that determines the relative strength of selection and random genetic drift, the effect of migration, levels of inbreeding, and linkage disequilibrium. In many cases where it has been estimated in animals, Ne is on the order of 10%-20% of the census size. In this study, we use 12 microsatellite markers and 14,888 single nucleotide polymorphisms (SNPs) to empirically estimate Ne in Aedes aegypti, the major vector of yellow fever, dengue, chikungunya, and Zika viruses. We used the method of temporal sampling to estimate Ne on a global dataset made up of 46 samples of Ae. aegypti that included multiple time points from 17 widely distributed geographic localities. Our Ne estimates for Ae. aegypti fell within a broad range (~25-3,000) and averaged between 400 and 600 across all localities and time points sampled. Adult census size (Nc) estimates for this species range between one and five thousand, so the Ne/Nc ratio is about the same as for most animals. These Ne values are lower than estimates available for other insects and have important implications for the design of genetic control strategies to reduce the impact of this species of mosquito on human health.
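
    The temporal method mentioned here infers Ne from the drift-driven change in allele frequencies between two sampling points. A minimal sketch using the Nei-Tajima standardised variance Fc with a Waples-style correction for sampling noise; the frequencies and sample sizes in the test are hypothetical, and published estimators differ in weighting details:

```python
def temporal_ne(p0, pt, t_gen, s0, st):
    """Temporal-method estimate of effective population size from allele
    frequencies at two time points t_gen generations apart, with s0 and
    st individuals sampled at the two times.
    p0, pt: frequencies of the same alleles at the two time points."""
    # Nei-Tajima standardised variance of allele-frequency change, Fc
    fc_vals = [(x - y) ** 2 / ((x + y) / 2 - x * y) for x, y in zip(p0, pt)]
    fc = sum(fc_vals) / len(fc_vals)
    # Subtract the expected sampling contribution at both time points
    f_prime = fc - 1 / (2 * s0) - 1 / (2 * st)
    return t_gen / (2 * f_prime)
```

    Larger frequency shifts over fewer generations imply stronger drift and hence a smaller Ne; when sampling noise dominates the observed shift, f_prime can go negative and the estimate is undefined, a known limitation of the method.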

  16. Improved population estimates through the use of auxiliary information

    USGS Publications Warehouse

    Johnson, D.H.; Ralph, C.J.; Scott, J.M.

    1981-01-01

    When estimating the size of a population of birds, the investigator may have, in addition to an estimator based on a statistical sample, information on one of several auxiliary variables, such as: (1) estimates of the population made on previous occasions, (2) measures of habitat variables associated with the size of the population, and (3) estimates of the population sizes of other species that correlate with the species of interest. Although many studies have described the relationships between each of these kinds of data and the population size to be estimated, very little work has been done to improve the estimator by incorporating such auxiliary information. A statistical methodology termed 'empirical Bayes' seems to be appropriate to these situations. The potential that empirical Bayes methodology has for improved estimation of the population size of the Mallard (Anas platyrhynchos) is explored. In the example considered, three empirical Bayes estimators were found to reduce the error by one-fourth to one-half of that of the usual estimator.
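
    The basic shrinkage idea behind such estimators can be sketched as a precision-weighted composite of the direct survey estimate and an auxiliary prediction. This is an illustrative sketch only: real empirical Bayes methods estimate the auxiliary variance from the data rather than assuming it known, and the numbers in the test are hypothetical.

```python
def eb_composite(direct, var_direct, aux, var_aux):
    """Precision-weighted composite of a direct survey estimate and an
    auxiliary prediction (e.g. last year's estimate or a habitat-model
    prediction).  Returns the composite estimate and its variance,
    which is smaller than either input variance."""
    w = var_aux / (var_direct + var_aux)  # weight on the direct estimate
    est = w * direct + (1 - w) * aux
    var = var_direct * var_aux / (var_direct + var_aux)
    return est, var
```

    For example, a direct count of 1000 (SE 400) combined with an auxiliary prediction of 1200 (SE 300) yields a composite near 1128 with SE 240, illustrating the one-quarter to one-half error reductions the record reports.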

  17. Effects of sample size and sampling frequency on studies of brown bear home ranges and habitat use

    USGS Publications Warehouse

    Arthur, Steve M.; Schwartz, Charles C.

    1999-01-01

    We equipped 9 brown bears (Ursus arctos) on the Kenai Peninsula, Alaska, with collars containing both conventional very-high-frequency (VHF) transmitters and global positioning system (GPS) receivers programmed to determine an animal's position at 5.75-hr intervals. We calculated minimum convex polygon (MCP) and fixed and adaptive kernel home ranges for randomly selected subsets of the GPS data to examine the effects of sample size on accuracy and precision of home range estimates. We also compared results obtained by weekly aerial radiotracking versus more frequent GPS locations to test for biases in conventional radiotracking data. Home ranges based on the MCP were 20-606 km² (x̄ = 201) for aerial radiotracking data (n = 12-16 locations/bear) and 116-1,505 km² (x̄ = 522) for the complete GPS data sets (n = 245-466 locations/bear). Fixed kernel home ranges were 34-955 km² (x̄ = 224) for radiotracking data and 16-130 km² (x̄ = 60) for the GPS data. Differences between means for radiotracking and GPS data were due primarily to the larger samples provided by the GPS data. Means did not differ between radiotracking data and equivalent-sized subsets of GPS data (P > 0.10). For the MCP, home range area increased and variability decreased asymptotically with number of locations. For the kernel models, both area and variability decreased with increasing sample size. Simulations suggested that the MCP and kernel models required >60 and >80 locations, respectively, for estimates to be both accurate (change in area <1%/additional location) and precise (CV < 50%). Although the radiotracking data appeared unbiased, except for the relationship between area and sample size, these data failed to indicate some areas that likely were important to bears. Our results suggest that the usefulness of conventional radiotracking data may be limited by potential biases and variability due to small samples. Investigators who use home range estimates in statistical tests should consider the effects of variability of those estimates. Use of GPS-equipped collars can facilitate obtaining larger samples of unbiased data and improve accuracy and precision of home range estimates.
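The minimum convex polygon estimate described above is simply the area of the convex hull of an animal's relocations. A minimal sketch using scipy (the helper name is illustrative; note that for 2-D points `ConvexHull.volume` is the enclosed area, while `.area` is the perimeter):

```python
import numpy as np
from scipy.spatial import ConvexHull

def mcp_area(locations):
    """Minimum convex polygon home-range area from (x, y) relocations.
    Coordinates should be in a projected system (e.g., UTM metres) so the
    returned value is in squared map units."""
    return ConvexHull(np.asarray(locations, float)).volume
```

Subsampling relocations and recomputing this area reproduces the kind of area-versus-sample-size curves the authors examined.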

  18. Estimation of sample size and testing power (part 6).

    PubMed

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-03-01

    The design of one factor with k levels (k ≥ 3) refers to research that involves a single experimental factor with k levels (k ≥ 3), with no other important non-experimental factors arranged in the design. This paper introduces the estimation of sample size and testing power for quantitative data, and for qualitative data having a binary response variable, under the design of one factor with k levels (k ≥ 3).
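For the one-factor, k-level design with a quantitative response, power can be computed from the noncentral F distribution. A sketch assuming Cohen's effect size f as the effect parameterization (an assumption; the paper's own formulas may parameterize the effect differently):

```python
from scipy.stats import f as f_dist, ncf

def anova_power(k, n, effect_f, alpha=0.05):
    """Power of a one-way ANOVA with k groups and n subjects per group,
    where effect_f is Cohen's f.  The noncentrality parameter is
    lambda = f^2 * (total sample size)."""
    df1, df2 = k - 1, k * (n - 1)
    lam = effect_f ** 2 * k * n
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    return 1 - ncf.cdf(f_crit, df1, df2, lam)

def min_n_per_group(k, effect_f, power=0.80, alpha=0.05):
    """Smallest per-group n reaching the target power (simple search)."""
    n = 2
    while anova_power(k, n, effect_f, alpha) < power:
        n += 1
    return n
```

For k = 3 groups and a medium effect (f = 0.25), this search gives roughly 50-53 subjects per group for 80% power, consistent with standard power tables.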

  19. Estimation of sample size and testing power (Part 3).

    PubMed

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2011-12-01

    This article introduces the definition and sample size estimation of three special tests (non-inferiority, equivalence, and superiority tests) for qualitative data with the design of one factor with two levels having a binary response variable. A non-inferiority test is a design whose objective is to verify that the efficacy of the experimental drug is not clinically inferior to that of the positive control drug. An equivalence test is a design whose objective is to verify that the experimental drug and the control drug have clinically equivalent efficacy. A superiority test is a design whose objective is to verify that the efficacy of the experimental drug is clinically superior to that of the control drug. Using specific examples, this article presents sample size estimation formulas for the three special tests and details their implementation in SAS.
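For binary outcomes, a common normal-approximation formula for the non-inferiority case gives a per-group sample size of n = (z₁₋α + z₁₋β)² [p₁(1−p₁) + p₂(1−p₂)] / (p₁ − p₂ + δ)². The sketch below implements this textbook version under the stated assumptions; the article's SAS examples may use a different variance convention:

```python
from math import ceil
from scipy.stats import norm

def n_noninferiority(p_exp, p_ctrl, margin, alpha=0.025, power=0.80):
    """Per-group sample size for a non-inferiority test of two proportions
    (unpooled normal approximation).  `margin` (> 0) is the largest
    clinically acceptable deficit of the experimental arm; `alpha` is
    one-sided."""
    z_a = norm.ppf(1 - alpha)
    z_b = norm.ppf(power)
    var = p_exp * (1 - p_exp) + p_ctrl * (1 - p_ctrl)
    diff = p_exp - p_ctrl + margin  # distance from the NI boundary
    return ceil((z_a + z_b) ** 2 * var / diff ** 2)
```

With equal true response rates of 0.85, a 0.10 margin, one-sided α = 0.025, and 80% power, the formula gives about 200 patients per group.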

  20. The relationship between national-level carbon dioxide emissions and population size: an assessment of regional and temporal variation, 1960-2005.

    PubMed

    Jorgenson, Andrew K; Clark, Brett

    2013-01-01

    This study examines the regional and temporal differences in the statistical relationship between national-level carbon dioxide emissions and national-level population size. The authors analyze panel data from 1960 to 2005 for a diverse sample of nations, and employ descriptive statistics and rigorous panel regression modeling techniques. Initial descriptive analyses indicate that all regions experienced overall increases in carbon emissions and population size during the 45-year period of investigation, but with notable differences. For carbon emissions, the sample of countries in Asia experienced the largest percent increase, followed by countries in Latin America, Africa, and lastly the sample of relatively affluent countries in Europe, North America, and Oceania combined. For population size, the sample of countries in Africa experienced the largest percent increase, followed by countries in Latin America, Asia, and the combined sample of countries in Europe, North America, and Oceania. Findings for two-way fixed effects panel regression elasticity models of national-level carbon emissions indicate that the estimated elasticity coefficient for population size is much smaller for nations in Africa than for nations in other regions of the world. Regarding potential temporal changes, from 1960 to 2005 the estimated elasticity coefficient for population size decreased by 25% for the sample of African countries, 14% for the sample of Asian countries, 6.5% for the sample of Latin American countries, but remained the same in size for the sample of countries in Europe, North America, and Oceania. Overall, while population size continues to be the primary driver of total national-level anthropogenic carbon dioxide emissions, the findings for this study highlight the need for future research and policies to recognize that the actual impacts of population size on national-level carbon emissions differ across both time and region.

  1. Sample size planning for composite reliability coefficients: accuracy in parameter estimation via narrow confidence intervals.

    PubMed

    Terry, Leann; Kelley, Ken

    2012-11-01

    Composite measures play an important role in psychology and related disciplines. Composite measures almost always have error. Correspondingly, it is important to understand the reliability of the scores from any particular composite measure. However, the point estimates of the reliability of composite measures are fallible and thus all such point estimates should be accompanied by a confidence interval. When confidence intervals are wide, there is much uncertainty in the population value of the reliability coefficient. Given the importance of reporting confidence intervals for estimates of reliability, coupled with the undesirability of wide confidence intervals, we develop methods that allow researchers to plan sample size in order to obtain narrow confidence intervals for population reliability coefficients. We first discuss composite reliability coefficients and then provide a discussion on confidence interval formation for the corresponding population value. Using the accuracy in parameter estimation approach, we develop two methods to obtain accurate estimates of reliability by planning sample size. The first method provides a way to plan sample size so that the expected confidence interval width for the population reliability coefficient is sufficiently narrow. The second method ensures that the confidence interval width will be sufficiently narrow with some desired degree of assurance (e.g., 99% assurance that the 95% confidence interval for the population reliability coefficient will be less than W units wide). The effectiveness of our methods was verified with Monte Carlo simulation studies. We demonstrate how to easily implement the methods with easy-to-use and freely available software. ©2011 The British Psychological Society.

  2. An internal pilot design for prospective cancer screening trials with unknown disease prevalence.

    PubMed

    Brinton, John T; Ringham, Brandy M; Glueck, Deborah H

    2015-10-13

    For studies that compare the diagnostic accuracy of two screening tests, the sample size depends on the prevalence of disease in the study population, and on the variance of the outcome. Both parameters may be unknown during the design stage, which makes finding an accurate sample size difficult. To solve this problem, we propose adapting an internal pilot design. In this adapted design, researchers will accrue some percentage of the planned sample size, then estimate both the disease prevalence and the variances of the screening tests. The updated estimates of the disease prevalence and variance are used to conduct a more accurate power and sample size calculation. We demonstrate that in large samples, the adapted internal pilot design produces no Type I inflation. For small samples (N less than 50), we introduce a novel adjustment of the critical value to control the Type I error rate. We apply the method to two proposed prospective cancer screening studies: 1) a small oral cancer screening study in individuals with Fanconi anemia and 2) a large oral cancer screening trial. Conducting an internal pilot study without adjusting the critical value can cause Type I error rate inflation in small samples, but not in large samples. An internal pilot approach usually achieves goal power and, for most studies with sample size greater than 50, requires no Type I error correction. Further, we have provided a flexible and accurate approach to bound Type I error below a goal level for studies with small sample size.

  3. Sampling effort and estimates of species richness based on prepositioned area electrofisher samples

    USGS Publications Warehouse

    Bowen, Z.H.; Freeman, Mary C.

    1998-01-01

    Estimates of species richness based on electrofishing data are commonly used to describe the structure of fish communities. One electrofishing method for sampling riverine fishes that has become popular in the last decade is the prepositioned area electrofisher (PAE). We investigated the relationship between sampling effort and fish species richness at seven sites in the Tallapoosa River system, USA based on 1,400 PAE samples collected during 1994 and 1995. First, we estimated species richness at each site using the first-order jackknife and compared observed values for species richness and jackknife estimates of species richness to estimates based on historical collection data. Second, we used a permutation procedure and nonlinear regression to examine rates of species accumulation. Third, we used regression to predict the number of PAE samples required to collect the jackknife estimate of species richness at each site during 1994 and 1995. We found that jackknife estimates of species richness generally were less than or equal to estimates based on historical collection data. The relationship between PAE electrofishing effort and species richness in the Tallapoosa River was described by a positive asymptotic curve as found in other studies using different electrofishing gears in wadable streams. Results from nonlinear regression analyses indicated that rates of species accumulation were variable among sites and between years. Across sites and years, predictions of sampling effort required to collect jackknife estimates of species richness suggested that doubling sampling effort (to 200 PAEs) would typically increase observed species richness by not more than six species. However, sampling effort beyond about 60 PAE samples typically increased observed species richness by < 10%. We recommend using historical collection data in conjunction with a preliminary sample size of at least 70 PAE samples to evaluate estimates of species richness in medium-sized rivers. Seventy PAE samples should provide enough information to describe the relationship between sampling effort and species richness and thus facilitate evaluation of sampling effort.

  4. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA.

    PubMed

    Kelly, Brendan J; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D; Collman, Ronald G; Bushman, Frederic D; Li, Hongzhe

    2015-08-01

    The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω²). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Body mass estimates of hominin fossils and the evolution of human body size.

    PubMed

    Grabowski, Mark; Hatala, Kevin G; Jungers, William L; Richmond, Brian G

    2015-08-01

    Body size directly influences an animal's place in the natural world, including its energy requirements, home range size, relative brain size, locomotion, diet, life history, and behavior. Thus, an understanding of the biology of extinct organisms, including species in our own lineage, requires accurate estimates of body size. Since the last major review of hominin body size based on postcranial morphology over 20 years ago, new fossils have been discovered, species attributions have been clarified, and methods improved. Here, we present the most comprehensive and thoroughly vetted set of individual fossil hominin body mass predictions to date, and estimation equations based on a large (n = 220) sample of modern humans of known body masses. We also present species averages based exclusively on fossils with reliable taxonomic attributions, estimates of species averages by sex, and a metric for levels of sexual dimorphism. Finally, we identify individual traits that appear to be the most reliable for mass estimation for each fossil species, for use when only one measurement is available for a fossil. Our results show that many early hominins were generally smaller-bodied than previously thought, an outcome likely due to larger estimates in previous studies resulting from the use of large-bodied modern human reference samples. Current evidence indicates that modern human-like large size first appeared by at least 3-3.5 Ma in some Australopithecus afarensis individuals. Our results challenge an evolutionary model arguing that body size increased from Australopithecus to early Homo. Instead, we show that there is no reliable evidence that the body size of non-erectus early Homo differed from that of australopiths, and confirm that Homo erectus evolved larger average body size than earlier hominins. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Walker, H. F.

    1978-01-01

    This paper addresses the problem of obtaining numerically maximum-likelihood estimates of the parameters for a mixture of normal distributions. In recent literature, a certain successive-approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, we introduce a general iterative procedure, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. We show that, with probability 1 as the sample size grows large, this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. We also show that the step-size which yields optimal local convergence rates for large samples is determined in a sense by the 'separation' of the component normal densities and is bounded below by a number between 1 and 2.
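The successive-approximations procedure reduces to the familiar EM fixed-point map when the step-size is 1; the generalized version relaxes each update by a step-size in (0, 2). The sketch below illustrates this for a two-component univariate mixture (an assumption for brevity; the paper treats the general case):

```python
import numpy as np

def em_mixture(x, pi, mu, sigma, step=1.0, iters=200):
    """Successive-approximations iteration for a two-component univariate
    normal mixture.  With step=1.0 this is exactly EM; other step sizes in
    (0, 2) give the relaxed update new = old + step * (em_target - old)."""
    x = np.asarray(x, float)
    for _ in range(iters):
        # E-step: posterior responsibility of component 1
        # (normal densities up to a common constant, which cancels)
        d1 = np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
        d2 = np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
        r = pi * d1 / (pi * d1 + (1 - pi) * d2)
        # M-step targets (the EM fixed-point map)
        pi_new = r.mean()
        mu_new = [(r * x).sum() / r.sum(),
                  ((1 - r) * x).sum() / (1 - r).sum()]
        sg_new = [np.sqrt((r * (x - mu_new[0]) ** 2).sum() / r.sum()),
                  np.sqrt(((1 - r) * (x - mu_new[1]) ** 2).sum() / (1 - r).sum())]
        # relaxed update with the chosen step-size
        pi = pi + step * (pi_new - pi)
        mu = [m + step * (mn - m) for m, mn in zip(mu, mu_new)]
        sigma = [s + step * (sn - s) for s, sn in zip(sigma, sg_new)]
    return pi, mu, sigma
```

On well-separated components the iteration recovers the mixing proportion and component means from reasonable starting values.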

  7. An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions, 2

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Walker, H. F.

    1976-01-01

    The problem of obtaining numerically maximum likelihood estimates of the parameters for a mixture of normal distributions is addressed. In recent literature, a certain successive approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, a general iterative procedure is introduced, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. With probability 1 as the sample size grows large, it is shown that this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. The step-size which yields optimal local convergence rates for large samples is determined in a sense by the separation of the component normal densities and is bounded below by a number between 1 and 2.

  8. Overlap between treatment and control distributions as an effect size measure in experiments.

    PubMed

    Hedges, Larry V; Olkin, Ingram

    2016-03-01

    The proportion π of treatment group observations that exceed the control group mean has been proposed as an effect size measure for experiments that randomly assign independent units into 2 groups. We give the exact distribution of a simple estimator of π based on the standardized mean difference and use it to study the small sample bias of this estimator. We also give the minimum variance unbiased estimator of π under 2 models, one in which the variance of the mean difference is known and one in which the variance is unknown. We show how to use the relation between the standardized mean difference and the overlap measure to compute confidence intervals for π and show that these results can be used to obtain unbiased estimators, large sample variances, and confidence intervals for 3 related effect size measures based on the overlap. Finally, we show how the effect size π can be used in a meta-analysis. (c) 2016 APA, all rights reserved.
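The simple estimator of π discussed above plugs the estimated standardized mean difference into the normal CDF: π̂ = Φ(d̂). A sketch of this naive plug-in version (not the minimum variance unbiased estimator the paper derives):

```python
import numpy as np
from scipy.stats import norm

def overlap_pi(treatment, control):
    """Plug-in estimate of pi, the proportion of treatment observations
    exceeding the control mean: pi_hat = Phi(d_hat), with d_hat the
    standardized mean difference using the pooled SD."""
    t, c = np.asarray(treatment, float), np.asarray(control, float)
    nt, nc = len(t), len(c)
    sp = np.sqrt(((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1))
                 / (nt + nc - 2))
    d = (t.mean() - c.mean()) / sp
    return norm.cdf(d)
```

When the groups are identical, d̂ = 0 and π̂ = 0.5, i.e., half the treatment observations exceed the control mean, as expected under no effect.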

  9. Using satellite imagery as ancillary data for increasing the precision of estimates for the Forest Inventory and Analysis program of the USDA Forest Service

    Treesearch

    Ronald E. McRoberts; Geoffrey R. Holden; Mark D. Nelson; Greg C. Liknes; Dale D. Gormanson

    2006-01-01

    Forest inventory programs report estimates of forest variables for areas of interest ranging in size from municipalities, to counties, to states or provinces. Because of numerous factors, sample sizes are often insufficient to estimate attributes as precisely as is desired, unless the estimation process is enhanced using ancillary data. Classified satellite imagery has...

  10. Sampling hazelnuts for aflatoxin: uncertainty associated with sampling, sample preparation, and analysis.

    PubMed

    Ozay, Guner; Seyhan, Ferda; Yilmaz, Aysun; Whitaker, Thomas B; Slate, Andrew B; Giesbrecht, Francis

    2006-01-01

    The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
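The percentage contributions quoted above (99.4, 0.4, and 0.2% of total variability) follow directly from the three variance components. A quick arithmetic check:

```python
def variance_shares(sampling, prep, analytical):
    """Percentage of total testing variance contributed by the sampling,
    sample preparation, and analytical steps, rounded to one decimal."""
    total = sampling + prep + analytical
    return tuple(round(100 * v / total, 1)
                 for v in (sampling, prep, analytical))
```

Applied to the variances reported for a 10 ng/g lot (174.40, 0.74, 0.27), the function reproduces the quoted 99.4/0.4/0.2% split, showing that sampling dominates the uncertainty.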

  11. Deep sea animal density and size estimated using a Dual-frequency IDentification SONar (DIDSON) offshore the island of Hawaii

    NASA Astrophysics Data System (ADS)

    Giorli, Giacomo; Drazen, Jeffrey C.; Neuheimer, Anna B.; Copeland, Adrienne; Au, Whitlow W. L.

    2018-01-01

    Pelagic animals that form deep sea scattering layers (DSLs) represent an important link in the food web between zooplankton and top predators. While estimating the composition, density and location of the DSL is important to understand mesopelagic ecosystem dynamics and to predict top predators' distribution, DSL composition and density are often estimated from trawls which may be biased in terms of extrusion, avoidance, and gear-associated biases. Instead, location and biomass of DSLs can be estimated from active acoustic techniques, though estimates are often in aggregate without regard to size or taxon specific information. For the first time in the open ocean, we used a DIDSON sonar to characterize the fauna in DSLs. Estimates of the numerical density and length of animals at different depths and locations along the Kona coast of the Island of Hawaii were determined. Data were collected below and inside the DSLs with the sonar mounted on a profiler. A total of 7068 animals were counted and sized. We estimated numerical densities ranging from 1 to 7 animals/m3 and individuals as long as 3 m were detected. These numerical densities were orders of magnitude higher than those estimated from trawls and average sizes of animals were much larger as well. A mixed model was used to characterize numerical density and length of animals as a function of deep sea layer sampled, location, time of day, and day of the year. Numerical density and length of animals varied by month, with numerical density also a function of depth. The DIDSON proved to be a good tool for open-ocean/deep-sea estimation of the numerical density and size of marine animals, especially larger ones. Further work is needed to understand how this methodology relates to volume backscatter estimates obtained with standard echosounding techniques and density measures obtained with other sampling methodologies, and to precisely evaluate sampling biases.

  12. Sample Size Methods for Estimating HIV Incidence from Cross-Sectional Surveys

    PubMed Central

    Brookmeyer, Ron

    2015-01-01

    Understanding HIV incidence, the rate at which new infections occur in populations, is critical for tracking and surveillance of the epidemic. In this paper we derive methods for determining sample sizes for cross-sectional surveys to estimate incidence with sufficient precision. We further show how to specify sample sizes for two successive cross-sectional surveys to detect changes in incidence with adequate power. In these surveys biomarkers such as CD4 cell count, viral load, and recently developed serological assays are used to determine which individuals are in an early disease stage of infection. The total number of individuals in this stage, divided by the number of people who are uninfected, is used to approximate the incidence rate. Our methods account for uncertainty in the durations of time spent in the biomarker defined early disease stage. We find that failure to account for this uncertainty when designing surveys can lead to imprecise estimates of incidence and underpowered studies. We evaluated our sample size methods in simulations and found that they performed well in a variety of underlying epidemics. Code for implementing our methods in R is available with this paper at the Biometrics website on Wiley Online Library. PMID:26302040

  13. Sample size methods for estimating HIV incidence from cross-sectional surveys.

    PubMed

    Konikoff, Jacob; Brookmeyer, Ron

    2015-12-01

    Understanding HIV incidence, the rate at which new infections occur in populations, is critical for tracking and surveillance of the epidemic. In this article, we derive methods for determining sample sizes for cross-sectional surveys to estimate incidence with sufficient precision. We further show how to specify sample sizes for two successive cross-sectional surveys to detect changes in incidence with adequate power. In these surveys biomarkers such as CD4 cell count, viral load, and recently developed serological assays are used to determine which individuals are in an early disease stage of infection. The total number of individuals in this stage, divided by the number of people who are uninfected, is used to approximate the incidence rate. Our methods account for uncertainty in the durations of time spent in the biomarker defined early disease stage. We find that failure to account for this uncertainty when designing surveys can lead to imprecise estimates of incidence and underpowered studies. We evaluated our sample size methods in simulations and found that they performed well in a variety of underlying epidemics. Code for implementing our methods in R is available with this article at the Biometrics website on Wiley Online Library. © 2015, The International Biometric Society.
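The snapshot estimator described in the abstract, together with a rough Poisson-based survey size for a target coefficient of variation, can be sketched as follows. Both helpers are illustrative and treat the mean early-stage duration as known, ignoring the duration uncertainty that the paper's methods account for:

```python
from math import ceil

def incidence_estimate(n_early, n_uninfected, mean_duration_years):
    """Cross-sectional incidence estimate: the count of biomarker-defined
    early-stage infections divided by the count of uninfected individuals,
    scaled by the mean time spent in the early stage (in years).  Returns
    infections per person-year."""
    return (n_early / n_uninfected) / mean_duration_years

def n_for_cv(incidence, mean_duration_years, target_cv):
    """Rough survey size so the Poisson CV of the expected early-stage
    count meets target_cv: n ~ 1 / (cv^2 * incidence * duration)."""
    expected_early_per_person = incidence * mean_duration_years
    return ceil(1 / (target_cv ** 2 * expected_early_per_person))
```

For example, 30 early-stage cases among 5,000 uninfected respondents with a 6-month mean early-stage duration implies an incidence of about 1.2 per 100 person-years; holding the CV of the early-stage count to 25% at that incidence would require a survey of a few thousand people.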

  14. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…

  15. Parameter Estimation with Small Sample Size: A Higher-Order IRT Model Approach

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Hong, Yuan

    2010-01-01

    Sample size ranks as one of the most important factors that affect the item calibration task. However, due to practical concerns (e.g., item exposure) items are typically calibrated with much smaller samples than what is desired. To address the need for a more flexible framework that can be used in small sample item calibration, this article…

  16. Estimation and applications of size-biased distributions in forestry

    Treesearch

    Jeffrey H. Gove

    2003-01-01

    Size-biased distributions arise naturally in several contexts in forestry and ecology. Simple power relationships (e.g. basal area and diameter at breast height) between variables are one such area of interest arising from a modelling perspective. Another, probability proportional to size (PPS) sampling, is found in the most widely used methods for sampling standing or...

  17. 40 CFR Table F-5 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized “Typical” Coarse Aerosol Size...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 5 2011-07-01 2011-07-01 false Estimated Mass Concentration... 53—Estimated Mass Concentration Measurement of PM2.5 for Idealized “Typical” Coarse Aerosol Size Distribution Particle Aerodynamic Diameter (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass...

  18. 40 CFR Table F-5 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized “Typical” Coarse Aerosol Size...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... 53—Estimated Mass Concentration Measurement of PM2.5 for Idealized “Typical” Coarse Aerosol Size Distribution Particle Aerodynamic Diameter (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass...

  19. Determination of Minimum Training Sample Size for Microarray-Based Cancer Outcome Prediction–An Empirical Assessment

    PubMed Central

    Cheng, Ningtao; Wu, Leihong; Cheng, Yiyu

    2013-01-01

    The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of statistical parameters involved in classifiers, and these parameters cannot be reliably estimated with only a small number of training samples. Therefore, it is of vital importance to determine the minimum number of training samples needed to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively based on 3 large-scale cancer microarray datasets provided by the second phase of the MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for minimum training sample size determination. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into clinical routine applications, the SSNR-based protocol would provide great convenience in microarray-based cancer outcome prediction by improving classifier reliability. PMID:23861920

  20. Unfolding sphere size distributions with a density estimator based on Tikhonov regularization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weese, J.; Korat, E.; Maier, D.

    1997-12-01

    This report proposes a method for unfolding sphere size distributions, given a sample of radii, that combines the advantages of a density estimator with those of Tikhonov regularization methods. To develop this method, the report discusses the following topics: the relation between the profile and the sphere size distribution; the method for unfolding sphere size distributions; results based on simulations; and a comparison with experimental data.
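The regularization step can be sketched with a zeroth-order Tikhonov (ridge) solution of the discretized unfolding problem. The kernel matrix A is left generic here, so this shows only the regularization, not the Wicksell-type kernel relating profile radii to sphere radii:

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve min ||A x - b||^2 + lam * ||x||^2 via the normal equations
    (A^T A + lam I) x = A^T b.  In the unfolding setting, A would
    discretize the integral kernel mapping the sphere-radius density x to
    the observed profile b; lam > 0 trades data fit against smoothness."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```

With A the identity, the solution is simply the data shrunk by 1/(1 + λ), which makes the damping effect of the penalty easy to see.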

  1. Maximum inflation of the type 1 error rate when sample size and allocation rate are adapted in a pre-planned interim look.

    PubMed

    Graf, Alexandra C; Bauer, Peter

    2011-06-30

    We calculate the maximum type 1 error rate of the pre-planned conventional fixed sample size test for comparing the means of independent normal distributions (with common known variance) that can result when sample size and allocation rate to the treatment arms are modified in an interim analysis. It is assumed that the experimenter fully exploits knowledge of the unblinded interim estimates of the treatment effects in order to maximize the conditional type 1 error rate. The 'worst-case' strategies require knowledge of the unknown common treatment effect under the null hypothesis. Although this is a rather hypothetical scenario, it may be approached in practice when using a standard control treatment for which precise estimates are available from historical data. The maximum inflation of the type 1 error rate is substantially larger than that derived by Proschan and Hunsberger (Biometrics 1995; 51:1315-1324) for design modifications applying balanced samples before and after the interim analysis. Corresponding upper limits for the maximum type 1 error rate are calculated for a number of situations arising from practical considerations (e.g. restricting the maximum sample size, not allowing the sample size to decrease, allowing only an increase in the sample size of the experimental treatment). The application is discussed for a motivating example. Copyright © 2011 John Wiley & Sons, Ltd.

  2. Problems with sampling desert tortoises: A simulation analysis based on field data

    USGS Publications Warehouse

    Freilich, J.E.; Camp, R.J.; Duda, J.J.; Karl, A.E.

    2005-01-01

    The desert tortoise (Gopherus agassizii) was listed as a U.S. threatened species in 1990 based largely on population declines inferred from mark-recapture surveys of 2.59-km2 (1-mi2) plots. Since then, several census methods have been proposed and tested, but all methods still pose logistical or statistical difficulties. We conducted computer simulations using actual tortoise location data from 2 1-mi2 plot surveys in southern California, USA, to identify strengths and weaknesses of current sampling strategies. We considered tortoise population estimates based on these plots as "truth" and then tested various sampling methods based on sampling smaller plots or transect lines passing through the mile squares. Data were analyzed using Schnabel's mark-recapture estimate and program CAPTURE. Experimental subsampling with replacement of the 1-mi2 data using 1-km2 and 0.25-km2 plot boundaries produced data sets of smaller plot sizes, which we compared to estimates from the 1-mi2 plots. We also tested distance sampling by saturating a 1-mi2 site with computer-simulated transect lines, once again evaluating bias in density estimates. Subsampling estimates from 1-km2 plots did not differ significantly from the estimates derived at 1-mi2. The 0.25-km2 subsamples significantly overestimated population sizes, chiefly because too few recaptures were made. Distance sampling simulations were biased 80% of the time and had high coefficient of variation to density ratios. Furthermore, a prospective power analysis suggested limited ability to detect population declines as high as 50%. We concluded that poor performance and bias of both sampling procedures was driven by insufficient sample size, suggesting that all efforts must be directed to increasing numbers found in order to produce reliable results. Our results suggest that present methods may not be capable of accurately estimating desert tortoise populations.
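    For reference, the Schnabel mark-recapture estimate used in the analysis above has a simple closed form; the sketch below is a generic illustration (the catch and recapture numbers are hypothetical, and the +1 Chapman-style correction in the denominator is one common variant, not necessarily the one used in the study):

```python
def schnabel(catches, recaptures):
    """Schnabel multiple-census estimate of closed population size."""
    # catches[t]: number of animals caught on occasion t
    # recaptures[t]: how many of those were already marked
    marked = 0        # marked animals at large before occasion t
    numerator = 0
    total_recaps = 0
    for c, r in zip(catches, recaptures):
        numerator += c * marked
        total_recaps += r
        marked += c - r   # newly marked animals join the marked pool
    # the +1 is the Chapman-style correction for small recapture counts
    return numerator / (total_recaps + 1)

# hypothetical three-occasion survey
print(schnabel([30, 28, 25], [0, 6, 9]))  # → 133.75
```

    Low recapture counts, as in the 0.25-km2 subsamples, make the estimate unstable because the denominator is driven by just a handful of recaptures.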

  3. Effect of Small Numbers of Test Results on Accuracy of Hoek-Brown Strength Parameter Estimations: A Statistical Simulation Study

    NASA Astrophysics Data System (ADS)

    Bozorgzadeh, Nezam; Yanagimura, Yoko; Harrison, John P.

    2017-12-01

    The Hoek-Brown empirical strength criterion for intact rock is widely used as the basis for estimating the strength of rock masses. Estimations of the intact rock H-B parameters, namely the empirical constant m and the uniaxial compressive strength σc, are commonly obtained by fitting the criterion to triaxial strength data sets of small sample size. This paper investigates how such small sample sizes affect the uncertainty associated with the H-B parameter estimations. We use Monte Carlo (MC) simulation to generate data sets of different sizes and different combinations of H-B parameters, and then investigate the uncertainty in H-B parameters estimated from these limited data sets. We show that the uncertainties depend not only on the level of variability but also on the particular combination of parameters being investigated. As particular combinations of H-B parameters can informally be considered to represent specific rock types, we argue that the minimum number of required samples depends on rock type and should correspond to an acceptable level of uncertainty in the estimations. Also, a comparison of the results from our analysis with actual rock strength data shows that the probability of obtaining reliable strength parameter estimations using small samples may be very low. We further discuss the impact of this on ongoing implementation of reliability-based design protocols and conclude with suggestions for improvements in this respect.

  4. Study samples are too small to produce sufficiently precise reliability coefficients.

    PubMed

    Charter, Richard A

    2003-04-01

    In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.
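    To see why a median retest sample of 64 yields imprecise coefficients, one can sketch a confidence interval for a correlation-type reliability coefficient via Fisher's z transformation (the coefficient value of .80 below is hypothetical, and this normal-theory interval is an illustration, not the author's method):

```python
import math

def fisher_ci(r, n, z=1.96):
    # 95% CI for a correlation-type coefficient via Fisher's z
    zr = math.atanh(r)
    half = z / math.sqrt(n - 3)
    return math.tanh(zr - half), math.tanh(zr + half)

lo, hi = fisher_ci(0.80, 64)  # median retest N from the survey
print(round(lo, 3), round(hi, 3))  # → 0.69 0.874
```

    An interval nearly 0.2 wide around a coefficient of .80 illustrates the imprecision the author warns about for small reliability samples.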

  5. Using the Sampling Margin of Error to Assess the Interpretative Validity of Student Evaluations of Teaching

    ERIC Educational Resources Information Center

    James, David E.; Schraw, Gregory; Kuch, Fred

    2015-01-01

    We present an equation, derived from standard statistical theory, that can be used to estimate sampling margin of error for student evaluations of teaching (SETs). We use the equation to examine the effect of sample size, response rates and sample variability on the estimated sampling margin of error, and present results in four tables that allow…
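    The ERIC abstract is truncated, but the kind of equation it describes follows from standard sampling theory; a margin of error for a proportion with a finite population correction might be sketched as follows (the class size and response count are hypothetical):

```python
import math

Z_95 = 1.96  # standard normal quantile for 95% confidence

def margin_of_error(p, n, N):
    # p: observed proportion, n: responses received, N: enrolled students
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    return Z_95 * se * fpc

# 20 of 40 enrolled students respond; p = 0.5 is the most conservative choice
print(round(margin_of_error(0.5, 20, 40), 3))  # → 0.157
```

    The finite population correction shrinks the margin as the response rate approaches 100%, which is why response rate matters alongside sample size when interpreting SETs.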

  6. Stratum variance estimation for sample allocation in crop surveys. [Great Plains Corridor]

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Chhikara, R. S. (Principal Investigator)

    1980-01-01

    The problem of determining the stratum variances needed to achieve an optimum sample allocation for crop surveys by remote sensing is investigated by considering an approach based on the concept of stratum variance as a function of sampling unit size. A methodology using existing and easily available historical crop statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to estimate stratum variances for wheat in the U.S. Great Plains and is evaluated based on the numerical results thus obtained. It is shown that the proposed technique is viable and performs satisfactorily when a conservative value for the field size and crop statistics from the small political subdivision level are used, judging by comparison of the estimated stratum variances with those obtained using LANDSAT data.

  7. Effects of model complexity and priors on estimation using sequential importance sampling/resampling for species conservation

    USGS Publications Warehouse

    Dunham, Kylee; Grand, James B.

    2016-01-01

    We examined the effects of complexity and priors on the accuracy of models used to estimate ecological and observational processes, and to make predictions regarding population size and structure. State-space models are useful for estimating complex, unobservable population processes and making predictions about future populations based on limited data. To better understand the utility of state space models in evaluating population dynamics, we used them in a Bayesian framework and compared the accuracy of models with differing complexity, with and without informative priors using sequential importance sampling/resampling (SISR). Count data were simulated for 25 years using known parameters and observation process for each model. We used kernel smoothing to reduce the effect of particle depletion, which is common when estimating both states and parameters with SISR. Models using informative priors estimated parameter values and population size with greater accuracy than their non-informative counterparts. While the estimates of population size and trend did not suffer greatly in models using non-informative priors, the algorithm was unable to accurately estimate demographic parameters. This model framework provides reasonable estimates of population size when little to no information is available; however, when information on some vital rates is available, SISR can be used to obtain more precise estimates of population size and process. Incorporating model complexity such as that required by structured populations with stage-specific vital rates affects precision and accuracy when estimating latent population variables and predicting population dynamics. These results are important to consider when designing monitoring programs and conservation efforts requiring management of specific population segments.

  8. Effect of Study Design on Sample Size in Studies Intended to Evaluate Bioequivalence of Inhaled Short‐Acting β‐Agonist Formulations

    PubMed Central

    Zeng, Yaohui; Singh, Sachinkumar; Wang, Kai

    2017-01-01

    Abstract Pharmacodynamic studies that use methacholine challenge to assess bioequivalence of generic and innovator albuterol formulations are generally designed per published Food and Drug Administration guidance, with 3 reference doses and 1 test dose (3‐by‐1 design). These studies are challenging and expensive to conduct, typically requiring large sample sizes. We proposed 14 modified study designs as alternatives to the Food and Drug Administration–recommended 3‐by‐1 design, hypothesizing that adding reference and/or test doses would reduce sample size and cost. We used Monte Carlo simulation to estimate sample size. Simulation inputs were selected based on published studies and our own experience with this type of trial. We also estimated effects of these modified study designs on study cost. Most of these altered designs reduced sample size and cost relative to the 3‐by‐1 design, some decreasing cost by more than 40%. The most effective single study dose to add was 180 μg of test formulation, which resulted in an estimated 30% relative cost reduction. Adding a single test dose of 90 μg was less effective, producing only a 13% cost reduction. Adding a lone reference dose of either 180, 270, or 360 μg yielded little benefit (less than 10% cost reduction), whereas adding 720 μg resulted in a 19% cost reduction. Of the 14 study design modifications we evaluated, the most effective was addition of both a 90‐μg test dose and a 720‐μg reference dose (42% cost reduction). Combining a 180‐μg test dose and a 720‐μg reference dose produced an estimated 36% cost reduction. PMID:29281130

  9. Joint inversion of NMR and SIP data to estimate pore size distribution of geomaterials

    NASA Astrophysics Data System (ADS)

    Niu, Qifei; Zhang, Chi

    2018-03-01

    There is growing interest in using geophysical tools to characterize the microstructure of geomaterials because of their non-invasive nature and applicability in the field. In these applications, multiple types of geophysical data sets are usually processed separately, which may be inadequate to constrain key features of the target variables. Simultaneous processing of multiple data sets could therefore potentially improve resolution. In this study, we propose a method to estimate pore size distribution by joint inversion of nuclear magnetic resonance (NMR) T2 relaxation and spectral induced polarization (SIP) spectra. The petrophysical relation between NMR T2 relaxation time and SIP relaxation time is incorporated in a nonlinear least squares problem formulation, which is solved using the Gauss-Newton method. The joint inversion scheme is applied to a synthetic sample and a Berea sandstone sample. The jointly estimated pore size distributions are very close to the true model and to results from other experimental methods. Even when knowledge of the petrophysical models of the sample is incomplete, the joint inversion can still capture the main features of the pore size distribution of the samples, including the general shape and relative peak positions of the distribution curves. The numerical example also shows that the surface relaxivity of the sample can be extracted with the joint inversion of NMR and SIP data if the diffusion coefficient of the ions in the electrical double layer is known. Compared to individual inversions, the joint inversion improves the resolution of the estimated pore size distribution because of the additional data sets. The proposed approach might constitute a first step towards a comprehensive joint inversion that can extract full pore geometry information for a geomaterial from NMR and SIP data.

  10. Sample size in studies on diagnostic accuracy in ophthalmology: a literature survey.

    PubMed

    Bochmann, Frank; Johnson, Zoe; Azuara-Blanco, Augusto

    2007-07-01

    To assess the sample sizes used in studies on diagnostic accuracy in ophthalmology. Design and sources: a survey of literature published in 2005. The frequency of reported sample size calculations and the sample sizes used were extracted from the published literature. A manual search of five leading clinical journals in ophthalmology with the highest impact (Investigative Ophthalmology and Visual Science, Ophthalmology, Archives of Ophthalmology, American Journal of Ophthalmology and British Journal of Ophthalmology) was conducted by two independent investigators. A total of 1698 articles were identified, of which 40 were studies on diagnostic accuracy. One study reported that the sample size was calculated before initiating the study. Another study reported consideration of sample size without a calculation. The mean (SD) sample size of all diagnostic studies was 172.6 (218.9). The median prevalence of the target condition was 50.5%. Only a few studies considered sample size in their methods. Inadequate sample sizes in diagnostic accuracy studies may result in misleading estimates of test accuracy. An improvement over the current standards on the design and reporting of diagnostic studies is warranted.
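    For context, a standard precision-based sample size calculation for a diagnostic accuracy study (in the style of Buderer's formula; the target sensitivity and precision below are hypothetical, not from the survey) looks like this:

```python
import math

Z_95 = 1.96

def n_diseased(sensitivity, half_width):
    # subjects WITH the condition needed to estimate sensitivity to +/- half_width
    return math.ceil(Z_95**2 * sensitivity * (1 - sensitivity) / half_width**2)

def n_total(sensitivity, half_width, prevalence):
    # inflate by prevalence so enough diseased subjects are recruited
    return math.ceil(n_diseased(sensitivity, half_width) / prevalence)

# sensitivity 0.90 to within +/- 0.05, at the survey's median prevalence of 50.5%
print(n_total(0.90, 0.05, 0.505))  # → 276
```

    A required total of roughly 276 subjects, against the surveyed mean of 172.6, illustrates how easily diagnostic studies fall short of adequate precision.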

  11. An estimate of field size distributions for selected sites in the major grain producing countries

    NASA Technical Reports Server (NTRS)

    Podwysocki, M. H.

    1977-01-01

    The field size distributions for the major grain producing countries of the world were estimated. LANDSAT-1 and 2 images were evaluated for two areas each in the United States, the People's Republic of China, and the USSR. One scene each was evaluated for France, Canada, and India. Grid sampling was done for representative sub-samples of each image, measuring the long and short axes of each field; area was then calculated. Each of the resulting data sets was computer-analyzed for its frequency distribution. Nearly all frequency distributions were highly peaked and skewed (shifted) towards small values, approaching either a Poisson or a log-normal distribution. The data were normalized by a log transformation, creating a Gaussian distribution whose moments are readily interpretable and useful for estimating the total population of fields. The resultant predictors of the field size estimates are discussed.
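    The normalization step described above, log-transforming a skewed size distribution and working with the resulting Gaussian moments, can be sketched as follows (the log-normal parameters are illustrative, not estimates from the LANDSAT imagery):

```python
import numpy as np

rng = np.random.default_rng(7)
# hypothetical field areas (hectares), skewed like the observed distributions
fields = rng.lognormal(mean=3.0, sigma=0.8, size=5000)

logs = np.log(fields)               # log transform -> approximately Gaussian
mu, sd = logs.mean(), logs.std(ddof=1)

median_area = np.exp(mu)            # back-transformed log-mean = median field size
mean_area = np.exp(mu + sd**2 / 2)  # log-normal mean exceeds the median
```

    Note that back-transforming the log-mean recovers the median, not the mean; the skew correction exp(sd²/2) matters when scaling up to the total population of fields.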

  12. Long-term effective population size dynamics of an intensively monitored vertebrate population

    PubMed Central

    Mueller, A-K; Chakarov, N; Krüger, O; Hoffman, J I

    2016-01-01

    Long-term genetic data from intensively monitored natural populations are important for understanding how effective population sizes (Ne) can vary over time. We therefore genotyped 1622 common buzzard (Buteo buteo) chicks sampled over 12 consecutive years (2002–2013 inclusive) at 15 microsatellite loci. This data set allowed us both to compare single-sample with temporal approaches and to explore temporal patterns in the effective number of parents that produced each cohort in relation to the observed population dynamics. We found reasonable consistency between linkage disequilibrium-based single-sample and temporal estimators, particularly during the latter half of the study, but no clear relationship between annual Ne estimates and census sizes. We also documented a 14-fold increase in estimated Ne between 2008 and 2011, a period during which the census size doubled, probably reflecting a combination of higher adult survival and immigration from further afield. Our study thus reveals appreciable temporal heterogeneity in the effective population size of a natural vertebrate population, confirms the need for long-term studies and cautions against drawing conclusions from a single sample. PMID:27553455

  13. [Potentials in the regionalization of health indicators using small-area estimation methods : Exemplary results based on the 2009, 2010 and 2012 GEDA studies].

    PubMed

    Kroll, Lars Eric; Schumann, Maria; Müters, Stephan; Lampert, Thomas

    2017-12-01

    Nationwide health surveys can be used to estimate regional differences in health. Using traditional estimation techniques, the spatial depth for these estimates is limited due to the constrained sample size. So far - without special refreshment samples - results have only been available for larger populated federal states of Germany. An alternative is regression-based small-area estimation techniques. These models can generate smaller-scale data, but are also subject to greater statistical uncertainties because of the model assumptions. In the present article, exemplary regionalized results based on the studies "Gesundheit in Deutschland aktuell" (GEDA studies) 2009, 2010 and 2012, are compared to the self-rated health status of the respondents. The aim of the article is to analyze the range of regional estimates in order to assess the usefulness of the techniques for health reporting more adequately. The results show that the estimated prevalence is relatively stable when using different samples. Important determinants of the variation of the estimates are the achieved sample size on the district level and the type of the district (cities vs. rural regions). Overall, the present study shows that small-area modeling of prevalence is associated with additional uncertainties compared to conventional estimates, which should be taken into account when interpreting the corresponding findings.

  14. A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification.

    PubMed

    Jiang, Wenyu; Simon, Richard

    2007-12-20

    This paper first provides a critical review of some existing methods for estimating the prediction error in classifying microarray data, where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave-one-out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate of the prediction error. Even with small samples, it does not suffer from the large upward bias of the leave-one-out bootstrap and the 0.632+ bootstrap, nor from the large variability of leave-one-out cross-validation in microarray applications. Copyright (c) 2007 John Wiley & Sons, Ltd.
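    A minimal sketch of the plain leave-one-out bootstrap that RLOOB builds on, with a nearest-centroid classifier standing in for any learner (the data and classifier here are illustrative; RLOOB additionally repeats this over several learning-set sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

def loob_error(X, y, B=100):
    # leave-one-out bootstrap: each specimen is predicted only by models
    # trained on bootstrap learning sets that happen to exclude it
    n = len(y)
    errs, counts = np.zeros(n), np.zeros(n)
    for _ in range(B):
        idx = rng.integers(0, n, n)            # bootstrap learning set
        out = np.setdiff1d(np.arange(n), idx)  # specimens left out of it
        if out.size == 0:
            continue
        # nearest-centroid classifier on one feature
        c0 = X[idx][y[idx] == 0].mean()
        c1 = X[idx][y[idx] == 1].mean()
        pred = (np.abs(X[out] - c1) < np.abs(X[out] - c0)).astype(int)
        errs[out] += pred != y[out]
        counts[out] += 1
    mask = counts > 0
    return (errs[mask] / counts[mask]).mean()

# well-separated toy classes, so the estimated error should be small
X = np.concatenate([rng.normal(-2, 1, 20), rng.normal(2, 1, 20)])
y = np.array([0] * 20 + [1] * 20)
err = loob_error(X, y)
```

    Because each specimen is scored only by models that never saw it, the estimate avoids the optimism of resubstitution error, at the cost of the upward bias the paper's ABS correction targets.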

  15. Modeling misidentification errors that result from use of genetic tags in capture-recapture studies

    USGS Publications Warehouse

    Yoshizaki, J.; Brownie, C.; Pollock, K.H.; Link, W.A.

    2011-01-01

    Misidentification of animals is potentially important when naturally existing features (natural tags) such as DNA fingerprints (genetic tags) are used to identify individual animals. For example, when misidentification leads to multiple identities being assigned to an animal, traditional estimators tend to overestimate population size. Accounting for misidentification in capture-recapture models requires detailed understanding of the mechanism. Using genetic tags as an example, we outline a framework for modeling the effect of misidentification in closed population studies when individual identification is based on natural tags that are consistent over time (non-evolving natural tags). We first assume a single sample is obtained per animal for each capture event, and then generalize to the case where multiple samples (such as hair or scat samples) are collected per animal per capture occasion. We introduce methods for estimating population size and, using a simulation study, we show that our new estimators perform well for cases with moderately high capture probabilities or high misidentification rates. In contrast, conventional estimators can seriously overestimate population size when errors due to misidentification are ignored. © 2009 Springer Science+Business Media, LLC.

  16. Accounting for imperfect detection of groups and individuals when estimating abundance.

    PubMed

    Clement, Matthew J; Converse, Sarah J; Royle, J Andrew

    2017-09-01

    If animals are independently detected during surveys, many methods exist for estimating animal abundance despite detection probabilities <1. Common estimators include double-observer models, distance sampling models and combined double-observer and distance sampling models (known as mark-recapture-distance-sampling models; MRDS). When animals reside in groups, however, the assumption of independent detection is violated. In this case, the standard approach is to account for imperfect detection of groups, while assuming that individuals within groups are detected perfectly. However, this assumption is often unsupported. We introduce an abundance estimator for grouped animals when detection of groups is imperfect and group size may be under-counted, but not over-counted. The estimator combines an MRDS model with an N-mixture model to account for imperfect detection of individuals. The new MRDS-Nmix model requires the same data as an MRDS model (independent detection histories, an estimate of distance to transect, and an estimate of group size), plus a second estimate of group size provided by the second observer. We extend the model to situations in which detection of individuals within groups declines with distance. We simulated 12 data sets and used Bayesian methods to compare the performance of the new MRDS-Nmix model to an MRDS model. Abundance estimates generated by the MRDS-Nmix model exhibited minimal bias and nominal coverage levels. In contrast, MRDS abundance estimates were biased low and exhibited poor coverage. Many species of conservation interest reside in groups and could benefit from an estimator that better accounts for imperfect detection. Furthermore, the ability to relax the assumption of perfect detection of individuals within detected groups may allow surveyors to re-allocate resources toward detection of new groups instead of extensive surveys of known groups. We believe the proposed estimator is feasible because the only additional field data required are a second estimate of group size.

  18. How accurate is the Pearson r-from-Z approximation? A Monte Carlo simulation study.

    PubMed

    Hittner, James B; May, Kim

    2012-01-01

    The Pearson r-from-Z approximation estimates the sample correlation (as an effect size measure) from the ratio of two quantities: the standard normal deviate equivalent (Z-score) corresponding to a one-tailed p-value divided by the square root of the total (pooled) sample size. The formula has utility in meta-analytic work when reports of research contain minimal statistical information. Although simple to implement, the accuracy of the Pearson r-from-Z approximation has not been empirically evaluated. To address this omission, we performed a series of Monte Carlo simulations. Results indicated that in some cases the formula did accurately estimate the sample correlation. However, when sample size was very small (N = 10) and effect sizes were small to small-moderate (ds of 0.1 and 0.3), the Pearson r-from-Z approximation was very inaccurate. Detailed figures that provide guidance as to when the Pearson r-from-Z formula will likely yield valid inferences are presented.
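    The approximation itself is simply r ≈ Z/√N; the simulation sketch below (our own illustration, not the authors' code) checks it for a moderate effect size and a sample size well above the problematic N = 10:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def r_from_z(p_one_tailed, n_total):
    # Pearson r-from-Z approximation: r = Z / sqrt(N)
    z = stats.norm.ppf(1.0 - p_one_tailed)
    return z / np.sqrt(n_total)

d, n_per_group, reps = 0.5, 100, 1000
estimates = []
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(d, 1.0, n_per_group)
    t, p_two = stats.ttest_ind(b, a)
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2  # one-tailed p-value
    estimates.append(r_from_z(p_one, 2 * n_per_group))

r_true = d / np.sqrt(d**2 + 4)  # point-biserial r implied by d, equal groups
```

    At N = 200 the mean estimate tracks r_true closely; the paper's point is that the same formula becomes very inaccurate at N = 10 with small effects.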

  19. Inference from single occasion capture experiments using genetic markers.

    PubMed

    Hettiarachchige, Chathurika K H; Huggins, Richard M

    2018-05-01

    Accurate estimation of the size of animal populations is an important task in ecological science. Recent advances in molecular genetics allow the use of genetic data to estimate the size of a population from a single capture occasion rather than the repeated occasions of the usual capture-recapture experiments. However, estimates of population size based on genetic data have sometimes differed markedly from each other and from classical capture-recapture estimates. Here, we develop a closed-form estimator that uses genetic information to estimate the size of a population consisting of mothers and daughters, focusing on estimating the number of mothers, using data from a single sample. We demonstrate that the estimator is consistent and propose a parametric bootstrap to estimate the standard errors. The estimator is evaluated in a simulation study and applied to real data. We also consider maximum likelihood in this setting and uncover problems that preclude its general use. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Small-Sample DIF Estimation Using SIBTEST, Cochran's Z, and Log-Linear Smoothing

    ERIC Educational Resources Information Center

    Lei, Pui-Wa; Li, Hongli

    2013-01-01

    Minimum sample sizes of about 200 to 250 per group are often recommended for differential item functioning (DIF) analyses. However, there are times when sample sizes for one or both groups of interest are smaller than 200 due to practical constraints. This study attempts to examine the performance of Simultaneous Item Bias Test (SIBTEST),…

  1. Conditional Optimal Design in Three- and Four-Level Experiments

    ERIC Educational Resources Information Center

    Hedges, Larry V.; Borenstein, Michael

    2014-01-01

    The precision of estimates of treatment effects in multilevel experiments depends on the sample sizes chosen at each level. It is often desirable to choose sample sizes at each level to obtain the smallest variance for a fixed total cost, that is, to obtain optimal sample allocation. This article extends previous results on optimal allocation to…
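    The classical two-level version of this optimization has a closed form: for a fixed total cost, the variance-minimizing cluster size is n* = sqrt((C_cluster/C_subject)·(1−ρ)/ρ). A sketch (the costs and intraclass correlation are hypothetical; the article's three- and four-level results generalize this idea):

```python
import math

def optimal_cluster_size(cost_cluster, cost_subject, icc):
    # classical two-level optimal allocation:
    # n* = sqrt((C_cluster / C_subject) * (1 - rho) / rho)
    return math.sqrt((cost_cluster / cost_subject) * (1.0 - icc) / icc)

# e.g. recruiting a cluster costs 300, a subject 10, with ICC = 0.05
print(round(optimal_cluster_size(300, 10, 0.05), 1))  # → 23.9
```

    Intuitively, expensive clusters and low intraclass correlation both push toward fewer, larger clusters.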

  2. Sequential ensemble-based optimal design for parameter estimation: SEQUENTIAL ENSEMBLE-BASED OPTIMAL DESIGN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Man, Jun; Zhang, Jiangjiang; Li, Weixuan

    2016-10-01

    The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.

  3. Evaluating the Impact of Guessing and Its Interactions With Other Test Characteristics on Confidence Interval Procedures for Coefficient Alpha

    PubMed Central

    Paek, Insu

    2015-01-01

    The effect of guessing on the point estimate of coefficient alpha has been studied in the literature, but the impact of guessing and its interactions with other test characteristics on the interval estimators for coefficient alpha has not been fully investigated. This study examined the impact of guessing and its interactions with other test characteristics on four confidence interval (CI) procedures for coefficient alpha in terms of coverage rate (CR), length, and the degree of asymmetry of CI estimates. In addition, interval estimates of coefficient alpha when data follow the essentially tau-equivalent condition were investigated as a supplement to the case of dichotomous data with examinee guessing. For dichotomous data with guessing, the results did not reveal salient negative effects of guessing and its interactions with other test characteristics (sample size, test length, coefficient alpha levels) on CR and the degree of asymmetry, but the effect of guessing was salient as a main effect and an interaction effect with sample size on the length of the CI estimates, lengthening CI estimates as guessing increases, especially when combined with a small sample size. Other important effects (e.g., CI procedures on CR) are also discussed. PMID:29795863

  4. Human body mass estimation: a comparison of "morphometric" and "mechanical" methods.

    PubMed

    Auerbach, Benjamin M; Ruff, Christopher B

    2004-12-01

    In the past, body mass was reconstructed from hominin skeletal remains using both "mechanical" methods which rely on the support of body mass by weight-bearing skeletal elements, and "morphometric" methods which reconstruct body mass through direct assessment of body size and shape. A previous comparison of two such techniques, using femoral head breadth (mechanical) and stature and bi-iliac breadth (morphometric), indicated a good general correspondence between them (Ruff et al. [1997] Nature 387:173-176). However, the two techniques were never systematically compared across a large group of modern humans of diverse body form. This study incorporates skeletal measures taken from 1,173 Holocene adult individuals, representing diverse geographic origins, body sizes, and body shapes. Femoral head breadth, bi-iliac breadth (after pelvic rearticulation), and long bone lengths were measured on each individual. Statures were estimated from long bone lengths using appropriate reference samples. Body masses were calculated using three available femoral head breadth (FH) formulae and the stature/bi-iliac breadth (STBIB) formula, and compared. All methods yielded similar results. Correlations between FH estimates and STBIB estimates are 0.74-0.81. Slight differences in results between the three FH estimates can be attributed to sampling differences in the original reference samples, and in particular, the body-size ranges included in those samples. There is no evidence for systematic differences in results due to differences in body proportions. Since the STBIB method was validated on other samples, and the FH methods produced similar estimates, this argues that either may be applied to skeletal remains with some confidence. 2004 Wiley-Liss, Inc.

  5. Heritabilities of measured and mid-infrared predicted milk fat globule size, milk fat and protein percentages, and their genetic correlations.

    PubMed

    Fleming, A; Schenkel, F S; Koeck, A; Malchiodi, F; Ali, R A; Corredig, M; Mallard, B; Sargolzaei, M; Miglior, F

    2017-05-01

    The objective of this study was to estimate the heritability of milk fat globule (MFG) size and mid-infrared (MIR) predicted MFG size in Holstein cattle. The genetic correlations between measured and predicted MFG size with milk fat and protein percentage were also investigated. Average MFG size was measured in 1,583 milk samples taken from 254 Holstein cows from 29 herds across Canada. Size was expressed as volume moment mean (D[4,3]) and surface moment mean (D[3,2]). Analyzed milk samples also had average MFG size predicted from their MIR spectral records. Fat and protein percentages were obtained for all test-day milk samples in the cow's lactation. Univariate and bivariate repeatability animal models were used to estimate heritability and genetic correlations. Moderate heritabilities of 0.364 and 0.466 were found for D[4,3] and D[3,2], respectively, and a strong genetic correlation was found between the 2 traits (0.98). The heritabilities for the MIR-predicted MFG size were lower than those estimated for the measured MFG size at 0.300 for predicted D[4,3] and 0.239 for predicted D[3,2]. The genetic correlation between measured and predicted D[4,3] was 0.685; the correlation was slightly higher between measured and predicted D[3,2] at 0.764, likely due to the better prediction accuracy of D[3,2]. Milk fat percentage had moderate genetic correlations with both D[4,3] and D[3,2] (0.538 and 0.681, respectively). The genetic correlation between predicted MFG size and fat percentage was much stronger (greater than 0.97 for both predicted D[4,3] and D[3,2]). The stronger correlation suggests a limitation for the use of the predicted values of MFG size as indicator traits for true average MFG size in milk in selection programs. Larger sample sizes are required to provide better evidence of the estimated genetic parameters. A genetic component appears to exist for the average MFG size in bovine milk, and the variation could be exploited in selection programs.
Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  6. Field substitution of nonresponders can maintain sample size and structure without altering survey estimates-the experience of the Italian behavioral risk factors surveillance system (PASSI).

    PubMed

    Baldissera, Sandro; Ferrante, Gianluigi; Quarchioni, Elisa; Minardi, Valentina; Possenti, Valentina; Carrozzi, Giuliano; Masocco, Maria; Salmaso, Stefania

    2014-04-01

    Field substitution of nonrespondents can be used to maintain the planned sample size and structure in surveys but may introduce additional bias. Sample weighting is suggested as the preferable alternative; however, limited empirical evidence exists comparing the two methods. We wanted to assess the impact of substitution on surveillance results using data from Progressi delle Aziende Sanitarie per la Salute in Italia-Progress by Local Health Units towards a Healthier Italy (PASSI). PASSI is conducted by Local Health Units (LHUs) through telephone interviews of stratified random samples of residents. Nonrespondents are replaced with substitutes randomly preselected in the same LHU stratum. We compared the weighted estimates obtained in the original PASSI sample (used as a reference) and in the substitutes' sample. The differences were evaluated using a Wald test. In 2011, 50,697 units were selected: 37,252 were from the original sample and 13,445 were substitutes; 37,162 persons were interviewed. The initially planned size and demographic composition were restored. No significant differences in the estimates between the original and the substitutes' sample were found. In our experience, field substitution is an acceptable method for dealing with nonresponse, maintaining the characteristics of the original sample without affecting the results. This evidence can support appropriate decisions about planning and implementing a surveillance system. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. An empirical Bayes approach to analyzing recurring animal surveys

    USGS Publications Warehouse

    Johnson, D.H.

    1989-01-01

    Recurring estimates of the size of animal populations are often required by biologists or wildlife managers. Because of cost or other constraints, estimates frequently lack the accuracy desired but cannot readily be improved by additional sampling. This report proposes a statistical method employing empirical Bayes (EB) estimators as alternatives to those customarily used to estimate population size, and evaluates them by a subsampling experiment on waterfowl surveys. EB estimates, especially a simple limited-translation version, were more accurate and provided shorter confidence intervals with greater coverage probabilities than customary estimates.

  8. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA

    PubMed Central

    Kelly, Brendan J.; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D.; Collman, Ronald G.; Bushman, Frederic D.; Li, Hongzhe

    2015-01-01

    Motivation: The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence–absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. Results: We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω2). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. Availability and implementation: http://github.com/brendankelly/micropower. Contact: brendank@mail.med.upenn.edu or hongzhe@upenn.edu PMID:25819674
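    The PERMANOVA statistic this power framework targets partitions within-group and between-group squared pairwise distances into a pseudo-F ratio, with significance assessed by permuting group labels. A minimal sketch of that one-way analysis on a symmetric distance matrix follows; this is an illustration of the underlying test, not the authors' micropower package.

```python
# One-way PERMANOVA: pseudo-F from a pairwise distance matrix, with a
# label-permutation p-value. A sketch of the test that the power
# framework above simulates, not the micropower implementation.
import numpy as np

def permanova(dist, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a one-way design."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)

    def pseudo_f(lab):
        # Total sum of squares from all pairwise squared distances.
        iu = np.triu_indices(n, k=1)
        ss_total = np.sum(dist[iu] ** 2) / n
        # Within-group sum of squares, accumulated group by group.
        ss_within = 0.0
        for g in groups:
            idx = np.where(lab == g)[0]
            sub = dist[np.ix_(idx, idx)]
            su = np.triu_indices(len(idx), k=1)
            ss_within += np.sum(sub[su] ** 2) / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(rng.permutation(labels)) >= f_obs
               for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```

    For two well-separated groups the observed pseudo-F exceeds almost every permuted value, so the p-value approaches its attainable minimum for the chosen number of permutations.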

  9. Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

    PubMed Central

    Li, Peng; Redden, David T.

    2014-01-01

    The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10, and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that with appropriate control of type I error rates under small sample sizes, we recommend the use of the GEE approach in CRTs with binary outcomes due to fewer assumptions and robustness to the misspecification of the covariance structure. PMID:25345738

  10. A post hoc evaluation of a sample size re-estimation in the Secondary Prevention of Small Subcortical Strokes study.

    PubMed

    McClure, Leslie A; Szychowski, Jeff M; Benavente, Oscar; Hart, Robert G; Coffey, Christopher S

    2016-10-01

    The use of adaptive designs has been increasing in randomized clinical trials. Sample size re-estimation is a type of adaptation in which nuisance parameters are estimated at an interim point in the trial and the sample size re-computed based on these estimates. The Secondary Prevention of Small Subcortical Strokes study was a randomized clinical trial assessing the impact of single- versus dual-antiplatelet therapy and control of systolic blood pressure to a higher (130-149 mmHg) versus lower (<130 mmHg) target on recurrent stroke risk in a two-by-two factorial design. A sample size re-estimation was performed during the Secondary Prevention of Small Subcortical Strokes study, resulting in an increase from the planned sample size of 2500 to 3020, and we sought to determine the impact of the sample size re-estimation on the study results. We assessed the results of the primary efficacy and safety analyses with the full 3020 patients and compared them to the results that would have been observed had randomization ended with 2500 patients. The primary efficacy outcome considered was recurrent stroke, and the primary safety outcomes were major bleeds and death. We computed incidence rates for the efficacy and safety outcomes and used Cox proportional hazards models to examine the hazard ratios for each of the two treatment interventions (i.e. the antiplatelet and blood pressure interventions). In the antiplatelet intervention, the hazard ratio was not materially modified by increasing the sample size, nor did the conclusions regarding the efficacy of mono- versus dual-therapy change: there was no difference in the effect of dual- versus monotherapy on the risk of recurrent stroke (n = 3020 HR (95% confidence interval): 0.92 (0.72, 1.2), p = 0.48; n = 2500 HR (95% confidence interval): 1.0 (0.78, 1.3), p = 0.85). 
With respect to the blood pressure intervention, increasing the sample size resulted in less certainty in the results, as the hazard ratio for the higher versus lower systolic blood pressure target approached, but did not achieve, statistical significance with the larger sample (n = 3020 HR (95% confidence interval): 0.81 (0.63, 1.0), p = 0.089; n = 2500 HR (95% confidence interval): 0.89 (0.68, 1.17), p = 0.40). The results from the safety analyses were similar with 3020 and 2500 patients for both study interventions. Other trial-related factors, such as contracts, finances, and study management, were impacted as well. Adaptive designs can have benefits in randomized clinical trials, but do not always result in significant findings. The impact of adaptive designs should be measured in terms of both trial results and practical issues related to trial management. More post hoc analyses of study adaptations will lead to better understanding of the balance between the benefits and the costs. © The Author(s) 2016.

  11. Thoracic and respirable particle definitions for human health risk assessment.

    PubMed

    Brown, James S; Gordon, Terry; Price, Owen; Asgharian, Bahman

    2013-04-10

    Particle size-selective sampling refers to the collection of particles of varying sizes that potentially reach and adversely affect specific regions of the respiratory tract. Thoracic and respirable fractions are defined as the fraction of inhaled particles capable of passing beyond the larynx and ciliated airways, respectively, during inhalation. In an attempt to afford greater protection to exposed individuals, current size-selective sampling criteria overestimate the population means of particle penetration into regions of the lower respiratory tract. The purpose of our analyses was to provide estimates of the thoracic and respirable fractions for adults and children during typical activities with both nasal and oral inhalation, that may be used in the design of experimental studies and interpretation of health effects evidence. We estimated the fraction of inhaled particles (0.5-20 μm aerodynamic diameter) penetrating beyond the larynx (based on experimental data) and ciliated airways (based on a mathematical model) for an adult male, adult female, and a 10 yr old child during typical daily activities and breathing patterns. Our estimates show less penetration of coarse particulate matter into the thoracic and gas exchange regions of the respiratory tract than current size-selective criteria. Of the parameters we evaluated, particle penetration into the lower respiratory tract was most dependent on route of breathing. For typical activity levels and breathing habits, we estimated a 50% cut-size for the thoracic fraction at an aerodynamic diameter of around 3 μm in adults and 5 μm in children, whereas current ambient and occupational criteria suggest a 50% cut-size of 10 μm. By design, current size-selective sample criteria overestimate the mass of particles generally expected to penetrate into the lower respiratory tract to provide protection for individuals who may breathe orally. 
We provide estimates of thoracic and respirable fractions for a variety of breathing habits and activities that may benefit the design of experimental studies and interpretation of particle size-specific health effects.

  12. Thoracic and respirable particle definitions for human health risk assessment

    PubMed Central

    2013-01-01

    Background Particle size-selective sampling refers to the collection of particles of varying sizes that potentially reach and adversely affect specific regions of the respiratory tract. Thoracic and respirable fractions are defined as the fraction of inhaled particles capable of passing beyond the larynx and ciliated airways, respectively, during inhalation. In an attempt to afford greater protection to exposed individuals, current size-selective sampling criteria overestimate the population means of particle penetration into regions of the lower respiratory tract. The purpose of our analyses was to provide estimates of the thoracic and respirable fractions for adults and children during typical activities with both nasal and oral inhalation, that may be used in the design of experimental studies and interpretation of health effects evidence. Methods We estimated the fraction of inhaled particles (0.5-20 μm aerodynamic diameter) penetrating beyond the larynx (based on experimental data) and ciliated airways (based on a mathematical model) for an adult male, adult female, and a 10 yr old child during typical daily activities and breathing patterns. Results Our estimates show less penetration of coarse particulate matter into the thoracic and gas exchange regions of the respiratory tract than current size-selective criteria. Of the parameters we evaluated, particle penetration into the lower respiratory tract was most dependent on route of breathing. For typical activity levels and breathing habits, we estimated a 50% cut-size for the thoracic fraction at an aerodynamic diameter of around 3 μm in adults and 5 μm in children, whereas current ambient and occupational criteria suggest a 50% cut-size of 10 μm. Conclusions By design, current size-selective sample criteria overestimate the mass of particles generally expected to penetrate into the lower respiratory tract to provide protection for individuals who may breathe orally. 
We provide estimates of thoracic and respirable fractions for a variety of breathing habits and activities that may benefit the design of experimental studies and interpretation of particle size-specific health effects. PMID:23575443

  13. Determination of the influence of dispersion pattern of pesticide-resistant individuals on the reliability of resistance estimates using different sampling plans.

    PubMed

    Shah, R; Worner, S P; Chapman, R B

    2012-10-01

    Pesticide resistance monitoring includes resistance detection and subsequent documentation/measurement. Resistance detection would require at least one (≥1) resistant individual(s) to be present in a sample to initiate management strategies. Resistance documentation, on the other hand, would attempt to estimate the entire population (≥90%) of resistant individuals. A computer simulation model was used to compare the efficiency of simple random and systematic sampling plans to detect resistant individuals and to document their frequencies when the resistant individuals were randomly or patchily distributed. A patchy dispersion pattern of resistant individuals influenced the sampling efficiency of systematic sampling plans while the efficiency of random sampling was independent of such patchiness. When resistant individuals were randomly distributed, sample sizes required to detect at least one resistant individual (resistance detection) with a probability of 0.95 were 300 (1%) and 50 (10% and 20%); whereas, when resistant individuals were patchily distributed, using systematic sampling, sample sizes required for such detection were 6000 (1%), 600 (10%) and 300 (20%). Sample sizes of 900 and 400 would be required to detect ≥90% of resistant individuals (resistance documentation) with a probability of 0.95 when resistant individuals were randomly dispersed and present at a frequency of 10% and 20%, respectively; whereas, when resistant individuals were patchily distributed, using systematic sampling, a sample size of 3000 and 1500, respectively, was necessary. Small sample sizes either underestimated or overestimated the resistance frequency. A simple random sampling plan is, therefore, recommended for insecticide resistance detection and subsequent documentation.
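    For randomly (binomially) distributed resistant individuals, the detection sample size follows from solving 1 − (1 − p)ⁿ ≥ 0.95 for n. A sketch of that standard calculation is below; the 1% figure quoted above (~300) matches it, while the coarser quoted values and the patchy-distribution figures come from the authors' simulations rather than this closed form.

```python
# Sample size needed to detect at least one resistant individual with
# probability `power`, assuming resistance frequency p and random
# (binomial) dispersion: solve 1 - (1 - p)^n >= power for n.
import math

def detection_sample_size(p, power=0.95):
    return math.ceil(math.log(1 - power) / math.log(1 - p))

# At a 1% resistance frequency, roughly 300 individuals are needed,
# consistent with the random-dispersion figure quoted above.
n_one_percent = detection_sample_size(0.01)
```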

  14. Rule-of-thumb adjustment of sample sizes to accommodate dropouts in a two-stage analysis of repeated measurements.

    PubMed

    Overall, John E; Tonidandel, Scott; Starbuck, Robert R

    2006-01-01

    Recent contributions to the statistical literature have provided elegant model-based solutions to the problem of estimating sample sizes for testing the significance of differences in mean rates of change across repeated measures in controlled longitudinal studies with differentially correlated error and missing data due to dropouts. However, the mathematical complexity and model specificity of these solutions make them generally inaccessible to most applied researchers who actually design and undertake treatment evaluation research in psychiatry. In contrast, this article relies on a simple two-stage analysis in which dropout-weighted slope coefficients fitted to the available repeated measurements for each subject separately serve as the dependent variable for a familiar ANCOVA test of significance for differences in mean rates of change. This article shows how a sample size that is estimated or calculated to provide desired power for testing that hypothesis without considering dropouts can be adjusted appropriately to take dropouts into account. Empirical results support the conclusion that, whatever reasonable level of power would be provided by a given sample size in the absence of dropouts, essentially the same power can be realized in the presence of dropouts simply by adding to the original dropout-free sample size the number of subjects who would be expected to drop from a sample of that original size under conditions of the proposed study.
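    The rule of thumb described above reduces to a one-line adjustment: add to the dropout-free sample size the number of subjects expected to drop from a sample of that size. A sketch:

```python
# Rule-of-thumb dropout adjustment: recruit the dropout-free sample size N
# plus the number of subjects expected to drop from a sample of size N.
import math

def adjust_for_dropout(n, dropout_rate):
    return n + math.ceil(n * dropout_rate)

# e.g. 100 subjects with 20% expected dropout -> recruit 120
```

    Note this adds d·N rather than inflating by 1/(1 − d); the addition is what the abstract's empirical results support.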

  15. Inferring the demographic history from DNA sequences: An importance sampling approach based on non-homogeneous processes.

    PubMed

    Ait Kaci Azzou, S; Larribe, F; Froda, S

    2016-10-01

    In Ait Kaci Azzou et al. (2015) we introduced an Importance Sampling (IS) approach for estimating the demographic history of a sample of DNA sequences, the skywis plot. More precisely, we proposed a new nonparametric estimate of a population size that changes over time. We showed on simulated data that the skywis plot can work well in typical situations where the effective population size does not undergo very steep changes. In this paper, we introduce an iterative procedure which extends the previous method and gives good estimates under such rapid variations. In the iterative calibrated skywis plot we approximate the effective population size by a piecewise constant function, whose values are re-estimated at each step. These piecewise constant functions are used to generate the waiting times of non-homogeneous Poisson processes related to a coalescent process with mutation under a variable population size model. Moreover, the present IS procedure is based on a modified version of the Stephens and Donnelly (2000) proposal distribution. Finally, we apply the iterative calibrated skywis plot method to a simulated data set from a rapidly expanding exponential model, and we show that the method based on this new IS strategy correctly reconstructs the demographic history. Copyright © 2016. Published by Elsevier Inc.

  16. Adaptive cluster sampling: An efficient method for assessing inconspicuous species

    Treesearch

    Andrea M. Silletti; Joan Walker

    2003-01-01

    Restorationists typically evaluate the success of a project by estimating the population sizes of species that have been planted or seeded. Because total census is rarely feasible, they must rely on sampling methods for population estimates. However, traditional random sampling designs may be inefficient for species that, for one reason or another, are challenging to...

  17. Sequential sampling: a novel method in farm animal welfare assessment.

    PubMed

    Heath, C A E; Main, D C J; Mullan, S; Haskell, M J; Browne, W J

    2016-02-01

    Lameness in dairy cows is an important welfare issue. As part of a welfare assessment, herd level lameness prevalence can be estimated from scoring a sample of animals, where higher levels of accuracy are associated with larger sample sizes. As the financial cost is related to the number of cows sampled, smaller samples are preferred. Sequential sampling schemes have been used for informing decision making in clinical trials. Sequential sampling involves taking samples in stages, where sampling can stop early depending on the estimated lameness prevalence. When welfare assessment is used for a pass/fail decision, a similar approach could be applied to reduce the overall sample size. The sampling schemes proposed here apply the principles of sequential sampling within a diagnostic testing framework. This study develops three sequential sampling schemes of increasing complexity to classify 80 fully assessed UK dairy farms, each with known lameness prevalence. Using the Welfare Quality herd-size-based sampling scheme, the first 'basic' scheme involves two sampling events. At the first sampling event half the Welfare Quality sample size is drawn, and then depending on the outcome, sampling either stops or is continued and the same number of animals is sampled again. In the second 'cautious' scheme, an adaptation is made to ensure that correctly classifying a farm as 'bad' is done with greater certainty. The third scheme is the only scheme to go beyond lameness as a binary measure and investigates the potential for increasing accuracy by incorporating the number of severely lame cows into the decision. The three schemes are evaluated with respect to accuracy and average sample size by running 100 000 simulations for each scheme, and a comparison is made with the fixed size Welfare Quality herd-size-based sampling scheme. All three schemes performed almost as well as the fixed size scheme but with much smaller average sample sizes. 
For the third scheme, an overall association between lameness prevalence and the proportion of lame cows that were severely lame on a farm was found. However, as this association was found to not be consistent across all farms, the sampling scheme did not prove to be as useful as expected. The preferred scheme was therefore the 'cautious' scheme for which a sampling protocol has also been developed.
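    The 'basic' two-stage scheme described above can be sketched as follows: score half the planned sample, stop early if the interim prevalence is decisive, otherwise score the remainder. The pass/fail threshold and early-stopping margin used here are hypothetical illustrations, not values from the study.

```python
# Sketch of a 'basic' two-stage sequential sampling scheme for a herd
# pass/fail lameness decision. `threshold` and `margin` are hypothetical.
import random

def two_stage_assessment(herd, full_n, threshold=0.15, margin=0.05, seed=0):
    """Classify a herd; `herd` is a list of 0/1 lameness scores.

    Stage 1 scores half the planned sample; if the interim prevalence is
    at least `margin` away from `threshold`, stop early, otherwise score
    the remaining animals up to the full planned sample size.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(herd, len(herd))  # random order, no replacement
    half = full_n // 2
    prev = sum(shuffled[:half]) / half
    if abs(prev - threshold) >= margin:     # decisive after stage 1
        return ("fail" if prev > threshold else "pass"), half
    prev = sum(shuffled[:full_n]) / full_n  # score the second half too
    return ("fail" if prev > threshold else "pass"), full_n
```

    Herds with prevalence far from the threshold stop after half the sample, which is the source of the smaller average sample sizes reported above.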

  18. A robust measure of HIV-1 population turnover within chronically infected individuals.

    PubMed

    Achaz, G; Palmer, S; Kearney, M; Maldarelli, F; Mellors, J W; Coffin, J M; Wakeley, J

    2004-10-01

    A simple nonparametric test for population structure was applied to temporally spaced samples of HIV-1 sequences from the gag-pol region within two chronically infected individuals. The results show that temporal structure can be detected for samples separated by about 22 months or more. The performance of the method, which was originally proposed to detect geographic structure, was tested for temporally spaced samples using neutral coalescent simulations. Simulations showed that the method is robust to variation in sample sizes and mutation rates, and to the presence/absence of recombination, and that the power to detect temporal structure is high. By comparing levels of temporal structure in simulations to the levels observed in real data, we estimate the effective intra-individual population size of HIV-1 to be between 10(3) and 10(4) viruses, which is in agreement with some previous estimates. Using this estimate and a simple measure of sequence diversity, we estimate an effective neutral mutation rate of about 5 x 10(-6) per site per generation in the gag-pol region. The definition and interpretation of estimates of such "effective" population parameters are discussed.

  19. Comparison of day snorkeling, night snorkeling, and electrofishing to estimate bull trout abundance and size structure in a second-order Idaho stream

    Treesearch

    Russell F. Thurow; Daniel J. Schill

    1996-01-01

    Biologists lack sufficient information to develop protocols for sampling the abundance and size structure of bull trout Salvelinus confluentus. We compared summer estimates of the abundance and size structure of bull trout in a second-order central Idaho stream, derived by day snorkeling, night snorkeling, and electrofishing. We also examined the influence of water...

  20. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range.

    PubMed

    Wan, Xiang; Wang, Wenqian; Liu, Jiming; Tong, Tiejun

    2014-12-19

In systematic reviews and meta-analyses, researchers often pool the sample means and standard deviations from a set of similar clinical trials. A number of trials, however, report only the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we improve the existing literature in several directions. First, we show that the sample standard deviation estimate in Hozo et al.'s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and often performs poorly in practice. Motivated by this, we propose a new estimation method that incorporates the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other settings in which the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios. For the first two scenarios, our method greatly improves on existing methods, providing a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and offer suggestions on which scenario is preferred in real-world applications. We conclude our work with a summary table (an Excel spreadsheet including all formulas) that serves as comprehensive guidance for performing meta-analyses in different situations.
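The closed-form approximations this line of work builds on are simple enough to sketch. A minimal Python version of two scenarios, assuming the formulas as published in Wan et al. (2014): mean from (min, median, max) or (q1, median, q3), and standard deviation scaled by an inverse-normal quantile that depends on the sample size.

```python
from statistics import NormalDist

def mean_sd_from_range(a, m, b, n):
    """Estimate sample mean and SD from min (a), median (m), max (b),
    and sample size n (Wan et al. 2014, min/median/max scenario)."""
    mean = (a + 2 * m + b) / 4
    z = NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    sd = (b - a) / (2 * z)
    return mean, sd

def mean_sd_from_iqr(q1, m, q3, n):
    """Estimate sample mean and SD from the quartiles and the median
    (Wan et al. 2014, interquartile-range scenario)."""
    mean = (q1 + m + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd
```

For a normal sample with n = 100, min 1, median 5, max 9, this gives a mean of 5 and an SD of roughly 1.6; the corresponding quartile-based formula recovers roughly IQR/1.35 for large n, as expected for normal data.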

  1. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

Recent studies have pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographic (sEMG) data in several contexts, such as fatigue and increasing muscle force. Following this idea, criteria have been proposed to monitor these shape modifications, mainly using High Order Statistics (HOS) parameters such as skewness and kurtosis. In experimental conditions, these parameters must be estimated from small sample sizes. The small sample size induces errors in the estimated HOS parameters, hindering real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from the CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine kernel density estimation and PDF shape distances to evaluate shape modifications even in the presence of small sample sizes. The proposed statistics are then tested, using Monte Carlo simulations, on both normal and log-normal PDFs that mimic the sEMG PDF shape behavior observed during muscle contraction. According to the results obtained, the functional statistics appear more robust than HOS parameters to small-sample-size effects and more accurate in sEMG PDF shape screening applications.
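The small-sample instability the authors describe is easy to reproduce with the plain moment-based HOS estimators (the CSM-based statistics themselves are not specified in the abstract). A quick Monte Carlo sketch on log-normal data, which is an assumption mirroring the simulation setup described above:

```python
import random
from math import sqrt

def skewness(x):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def kurtosis(x):
    """Moment-based sample kurtosis: m4 / m2**2 (not excess kurtosis)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / m2 ** 2

random.seed(1)

def skewness_spread(n, reps=200):
    """Monte Carlo SD of the skewness estimator at sample size n,
    for log-normal data (sigma = 0.5 chosen for illustration)."""
    ests = [skewness([random.lognormvariate(0, 0.5) for _ in range(n)])
            for _ in range(reps)]
    mu = sum(ests) / reps
    return sqrt(sum((e - mu) ** 2 for e in ests) / reps)
```

Comparing `skewness_spread(30)` against `skewness_spread(1000)` shows the estimator's sampling variability shrinking markedly with n, which is exactly why small experimental windows make HOS-based shape monitoring noisy.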

  2. Spatially-explicit estimation of Wright's neighborhood size in continuous populations

    Treesearch

    Andrew J. Shirk; Samuel A. Cushman

    2014-01-01

    Effective population size (Ne) is an important parameter in conservation genetics because it quantifies a population's capacity to resist loss of genetic diversity due to inbreeding and drift. The classical approach to estimate Ne from genetic data involves grouping sampled individuals into discretely defined subpopulations assumed to be panmictic. Importantly,...

  3. Microdiamond grade as a regionalised variable - some basic requirements for successful local microdiamond resource estimation of kimberlites

    NASA Astrophysics Data System (ADS)

    Stiefenhofer, Johann; Thurston, Malcolm L.; Bush, David E.

    2018-04-01

Microdiamonds offer several advantages as a resource estimation tool, such as access to deeper parts of a deposit which may be beyond the reach of large diameter drilling (LDD) techniques, the recovery of the total diamond content in the kimberlite, and a cost benefit due to the cheaper treatment cost compared to large diameter samples. In this paper we take the first step towards local estimation by showing that microdiamond samples can be treated as a regionalised variable suitable for use in geostatistical applications, and we show examples of such output. Examples of microdiamond variograms are presented, the variance-support relationship for microdiamonds is demonstrated, and the consistency of the diamond size frequency distribution (SFD) is shown with the aid of real datasets. The focus therefore is on why local microdiamond estimation should be possible, not on how to generate such estimates. Data from our case studies and examples demonstrate a positive correlation between micro- and macrodiamond sample grades as well as block estimates. This relationship can be demonstrated repeatedly across multiple mining operations. The smaller sample support of microdiamond samples is a key difference between micro- and macrodiamond estimates, and this aspect must be taken into account during the estimation process. We discuss three methods that can be used to validate or reconcile the estimates against macrodiamond data, either as estimates or in the form of production grades: (i) reconciliation using production data, (ii) comparing LDD-based grade estimates against microdiamond-based estimates, and (iii) using simulation techniques.

  4. Sample sizes to control error estimates in determining soil bulk density in California forest soils

    Treesearch

    Youzhi Han; Jianwei Zhang; Kim G. Mattson; Weidong Zhang; Thomas A. Weber

    2016-01-01

    Characterizing forest soil properties with high variability is challenging, sometimes requiring large numbers of soil samples. Soil bulk density is a standard variable needed along with element concentrations to calculate nutrient pools. This study aimed to determine the optimal sample size, the number of observation (n), for predicting the soil bulk density with a...

  5. Inventory implications of using sampling variances in estimation of growth model coefficients

    Treesearch

    Albert R. Stage; William R. Wykoff

    2000-01-01

Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...

  6. Blinded and unblinded internal pilot study designs for clinical trials with count data.

    PubMed

    Schneider, Simon; Schmidli, Heinz; Friede, Tim

    2013-07-01

    Internal pilot studies are a popular design feature to address uncertainties in the sample size calculations caused by vague information on nuisance parameters. Despite their popularity, blinded sample size reestimation procedures for trials with count data were proposed, and their properties systematically investigated, only very recently. Although blinded procedures are favored by regulatory authorities, practical application is somewhat limited by fears that blinded procedures are prone to bias if the treatment effect was misspecified in the planning. Here, we compare unblinded and blinded procedures with respect to bias, error rates, and sample size distribution. We find that both procedures maintain the desired power and that the unblinded procedure is slightly liberal, whereas the actual significance level of the blinded procedure is close to the nominal level. Furthermore, we show that in situations where uncertainty about the assumed treatment effect exists, the blinded estimator of the control event rate is biased, in contrast to the unblinded estimator, which results in differences in mean sample sizes in favor of the unblinded procedure. However, these differences are rather small compared to the deviations of the mean sample sizes from the sample size required to detect the true, but unknown, effect. We demonstrate that the variation of the sample size resulting from the blinded procedure is, in many practically relevant situations, considerably smaller than that of the unblinded procedure. The methods are extended to overdispersed counts using a quasi-likelihood approach and are illustrated by trials in relapsing multiple sclerosis. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
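The mechanics of a blinded internal pilot can be sketched with a deliberately simplified sample-size formula. This is not the procedure from the paper: it uses a crude Wald approximation for comparing two Poisson rates (one unit of follow-up per patient, no overdispersion), and the "blinded" step recovers the control rate from the pooled blinded rate under the assumed rate ratio.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(rate_control, rate_ratio, alpha=0.05, power=0.9):
    """Rough per-arm sample size for comparing two Poisson rates
    (Wald approximation; illustration only)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    r1 = rate_control
    r2 = rate_control * rate_ratio
    return ceil((z_a + z_b) ** 2 * (r1 + r2) / (r1 - r2) ** 2)

def blinded_reestimate(pooled_rate, assumed_ratio, alpha=0.05, power=0.9):
    """Internal-pilot update: the pooled event rate can be estimated
    without unblinding; under the assumed rate ratio theta,
    rate_control = 2 * pooled_rate / (1 + theta). Then recompute n."""
    rate_control = 2 * pooled_rate / (1 + assumed_ratio)
    return n_per_arm(rate_control, assumed_ratio, alpha, power)
```

The bias the abstract discusses arises in the first line of `blinded_reestimate`: if the true rate ratio differs from `assumed_ratio`, the recovered control rate is systematically wrong even though the pooled rate is estimated correctly.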

  7. Sampling through time and phylodynamic inference with coalescent and birth–death models

    PubMed Central

    Volz, Erik M.; Frost, Simon D. W.

    2014-01-01

    Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth–death-sampling model (BDM), in the context of estimating population size and birth rates in a population growing exponentially according to the birth–death branching process. For sequences sampled at a single time, we found the coalescent and the BDM gave virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth–death model estimators are subject to large bias if the sampling process is misspecified. Since BDMs incorporate a model of the sampling process, we show how much of the statistical power of BDMs arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information. PMID:25401173

  8. Simultaneous unbiased estimates of multiple downed wood attributes in perpendicular distance sampling

    Treesearch

    Mark J. Ducey; Jeffrey H. Gove; Harry T. Valentine

    2008-01-01

    Perpendicular distance sampling (PDS) is a fast probability-proportional-to-size method for inventory of downed wood. However, previous development of PDS had limited the method to estimating only one variable (such as volume per hectare, or surface area per hectare) at a time. Here, we develop a general design-unbiased estimator for PDS. We then show how that...

  9. Analysis of variograms with various sample sizes from a multispectral image

    USDA-ARS?s Scientific Manuscript database

    Variogram plays a crucial role in remote sensing application and geostatistics. It is very important to estimate variogram reliably from sufficient data. In this study, the analysis of variograms with various sample sizes of remotely sensed data was conducted. A 100x100-pixel subset was chosen from ...
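The estimator at issue is the classical empirical semivariogram; the sample-size effect the study examines shows up directly in the number of pixel pairs available at each lag. A minimal 1-D (transect) sketch, with the usual 2-D image case differing only in how pairs at distance h are collected:

```python
def semivariogram(z, max_lag):
    """Empirical semivariogram along a 1-D transect:
    gamma(h) = sum over pairs of (z[i+h] - z[i])**2 / (2 * n_pairs).
    Fewer pairs are available at larger lags, so gamma(h) gets noisier."""
    gamma = {}
    for h in range(1, max_lag + 1):
        sq_diffs = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
        gamma[h] = sum(sq_diffs) / (2 * len(sq_diffs))
    return gamma
```

On an alternating series such as 0, 1, 0, 1, ... the lag-1 semivariance is 0.5 and the lag-2 semivariance is 0, which is a handy sanity check for any implementation.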

  10. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    PubMed Central

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
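For a scalar parameter, the ESS the abstract refers to is conventionally computed from the chain's autocorrelations; the contribution of the paper is extending this idea to tree topologies, which is not sketched here. A minimal scalar version, assuming the common heuristic of truncating the autocorrelation sum at the first non-positive lag:

```python
import random

def autocorr(x, k):
    """Lag-k sample autocorrelation of a sequence x."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / n
    return cov / var

def ess(x, max_lag=None):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncating the sum
    at the first non-positive autocorrelation (a common heuristic)."""
    n = len(x)
    max_lag = max_lag or n // 2
    s = 0.0
    for k in range(1, max_lag + 1):
        r = autocorr(x, k)
        if r <= 0:
            break
        s += r
    return n / (1 + 2 * s)

random.seed(3)
# Stand-in for an MCMC trace: an independent chain has ESS close to n,
# while repeating each state 5 times cuts the ESS roughly fivefold.
chain = [random.gauss(0, 1) for _ in range(2000)]
```

The "rule of thumb that the ESS of all parameters should be greater than 200" mentioned above is applied to exactly this kind of quantity, one scalar parameter at a time.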

  11. The efficacy of respondent-driven sampling for the health assessment of minority populations.

    PubMed

    Badowski, Grazyna; Somera, Lilnabeth P; Simsiman, Brayan; Lee, Hye-Ryeon; Cassel, Kevin; Yamanaka, Alisha; Ren, JunHao

    2017-10-01

    Respondent-driven sampling (RDS) is a relatively new network sampling technique typically employed for hard-to-reach populations. Like snowball sampling, initial respondents or "seeds" recruit additional respondents from their network of friends. Under certain assumptions, the method promises to produce a sample independent from the biases that may have been introduced by the non-random choice of "seeds." We conducted a survey on health communication in Guam's general population using the RDS method, the first survey to utilize this methodology in Guam. It was conducted in hopes of identifying a cost-efficient non-probability sampling strategy that could generate reasonable population estimates for both minority and general populations. RDS data were collected in Guam in 2013 (n=511) and population estimates were compared with 2012 BRFSS data (n=2031) and 2010 census data. The estimates were calculated using the unweighted RDS sample and the weighted sample using RDS inference methods, and compared with known population characteristics. The sample size was reached in 23 days, providing evidence that the RDS method is a viable, cost-effective data collection method that can provide reasonable population estimates. However, the results also suggest that the RDS inference methods used to reduce bias, based on self-reported estimates of network sizes, may not always work. Caution is needed when interpreting RDS study findings. For a more diverse sample, data collection should not be conducted in just one location. Fewer questions about network estimates should be asked, and more careful consideration should be given to the kind of incentives offered to participants. Copyright © 2017. Published by Elsevier Ltd.
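The "RDS inference methods ... based on self-reported estimates of network sizes" mentioned above typically weight each respondent by the inverse of their reported degree. A minimal sketch in the style of the Volz–Heckathorn (RDS-II) proportion estimator, which is an assumption about the specific weighting the study used:

```python
def rds_proportion(in_group, degrees):
    """Inverse-degree weighted estimate of the proportion of the
    population in a group: respondents with large personal networks are
    oversampled by recruitment chains, so they are down-weighted."""
    numerator = sum(1 / d for g, d in zip(in_group, degrees) if g)
    denominator = sum(1 / d for d in degrees)
    return numerator / denominator
```

The caution in the abstract follows directly from this formula: the weights are only as good as the self-reported degrees `d`, so systematic misreporting of network size propagates straight into the population estimate.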

  12. The Relationship between National-Level Carbon Dioxide Emissions and Population Size: An Assessment of Regional and Temporal Variation, 1960–2005

    PubMed Central

    Jorgenson, Andrew K.; Clark, Brett

    2013-01-01

    This study examines the regional and temporal differences in the statistical relationship between national-level carbon dioxide emissions and national-level population size. The authors analyze panel data from 1960 to 2005 for a diverse sample of nations, and employ descriptive statistics and rigorous panel regression modeling techniques. Initial descriptive analyses indicate that all regions experienced overall increases in carbon emissions and population size during the 45-year period of investigation, but with notable differences. For carbon emissions, the sample of countries in Asia experienced the largest percent increase, followed by countries in Latin America, Africa, and lastly the sample of relatively affluent countries in Europe, North America, and Oceania combined. For population size, the sample of countries in Africa experienced the largest percent increase, followed by countries in Latin America, Asia, and the combined sample of countries in Europe, North America, and Oceania. Findings for two-way fixed effects panel regression elasticity models of national-level carbon emissions indicate that the estimated elasticity coefficient for population size is much smaller for nations in Africa than for nations in other regions of the world. Regarding potential temporal changes, from 1960 to 2005 the estimated elasticity coefficient for population size decreased by 25% for the sample of African countries, 14% for the sample of Asian countries, and 6.5% for the sample of Latin American countries, but remained the same in size for the sample of countries in Europe, North America, and Oceania. Overall, while population size continues to be the primary driver of total national-level anthropogenic carbon dioxide emissions, the findings of this study highlight the need for future research and policies to recognize that the actual impacts of population size on national-level carbon emissions differ across both time and region. PMID:23437323

  13. Non-parametric methods for cost-effectiveness analysis: the central limit theorem and the bootstrap compared.

    PubMed

    Nixon, Richard M; Wonderling, David; Grieve, Richard D

    2010-03-01

    Cost-effectiveness analyses (CEA) alongside randomised controlled trials commonly estimate incremental net benefits (INB), with 95% confidence intervals, and compute cost-effectiveness acceptability curves and confidence ellipses. Two alternative non-parametric methods for estimating INB are to apply the central limit theorem (CLT) or to use the non-parametric bootstrap method, although it is unclear which method is preferable. This paper describes the statistical rationale underlying each of these methods and illustrates their application with a trial-based CEA. It compares the sampling uncertainty from using either technique in a Monte Carlo simulation. The experiments are repeated varying the sample size and the skewness of costs in the population. The results showed that, even when data were highly skewed, both methods accurately estimated the true standard errors (SEs) when sample sizes were moderate to large (n>50), and also gave good estimates for small data sets with low skewness. However, when sample sizes were relatively small and the data highly skewed, using the CLT rather than the bootstrap led to slightly more accurate SEs. We conclude that while in general using either method is appropriate, the CLT is easier to implement, and provides SEs that are at least as accurate as the bootstrap. (c) 2009 John Wiley & Sons, Ltd.
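The two SE methods compared in this paper are both short enough to sketch. A minimal illustration on skewed, log-normal "cost" data (the distribution is an assumption chosen to mimic typical per-patient costs, not the paper's trial data):

```python
import random
from math import exp, sqrt

random.seed(7)
# Skewed per-patient costs: log-normal, a common model for cost data
costs = [exp(random.gauss(7.0, 1.0)) for _ in range(200)]

def se_clt(x):
    """Central-limit-theorem SE of the mean: s / sqrt(n)."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((v - mu) ** 2 for v in x) / (n - 1)
    return sqrt(s2 / n)

def se_bootstrap(x, reps=1000):
    """Non-parametric bootstrap SE of the mean: SD of the means of
    `reps` resamples drawn with replacement."""
    n = len(x)
    means = []
    for _ in range(reps):
        resample = [x[random.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    mu = sum(means) / reps
    return sqrt(sum((m - mu) ** 2 for m in means) / (reps - 1))
```

At moderate sample sizes the two SEs agree closely even for highly skewed data, which is the paper's central finding; the CLT version is clearly the cheaper of the two to compute.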

  14. A Simulation Approach to Assessing Sampling Strategies for Insect Pests: An Example with the Balsam Gall Midge

    PubMed Central

    Carleton, R. Drew; Heard, Stephen B.; Silk, Peter J.

    2013-01-01

    Estimation of pest density is a basic requirement for integrated pest management in agriculture and forestry, and efficiency in density estimation is a common goal. Sequential sampling techniques promise efficient sampling, but their application can involve cumbersome mathematics and/or intensive warm-up sampling when pests have complex within- or between-site distributions. We provide tools for assessing the efficiency of sequential sampling and of alternative, simpler sampling plans, using computer simulation with “pre-sampling” data. We illustrate our approach using data for balsam gall midge (Paradiplosis tumifex) attack in Christmas tree farms. Paradiplosis tumifex proved recalcitrant to sequential sampling techniques. Midge distributions could not be fit by a common negative binomial distribution across sites. Local parameterization, using warm-up samples to estimate the clumping parameter k for each site, performed poorly: k estimates were unreliable even for samples of n∼100 trees. These methods were further confounded by significant within-site spatial autocorrelation. Much simpler sampling schemes, involving random or belt-transect sampling to preset sample sizes, were effective and efficient for P. tumifex. Sampling via belt transects (through the longest dimension of a stand) was the most efficient, with sample means converging on true mean density for sample sizes of n∼25–40 trees. Pre-sampling and simulation techniques provide a simple method for assessing sampling strategies for estimating insect infestation. We suspect that many pests will resemble P. tumifex in challenging the assumptions of sequential sampling methods. Our software will allow practitioners to optimize sampling strategies before they are brought to real-world applications, while potentially avoiding the need for the cumbersome calculations required for sequential sampling methods. PMID:24376556
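The "clumping parameter k" that proved so hard to estimate from warm-up samples is the negative binomial dispersion parameter. A minimal method-of-moments sketch shows both the estimator and its fragility (the paper's own estimation method is not specified in the abstract, so this is the textbook version):

```python
def nb_k_moments(counts):
    """Method-of-moments estimate of the negative binomial clumping
    parameter: k = mean**2 / (variance - mean).
    Returns None when the data are not overdispersed (variance <= mean),
    in which case a negative binomial fit is not meaningful."""
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    if s2 <= m:
        return None
    return m * m / (s2 - m)
```

Because k̂ depends on the difference s² − m in the denominator, small samples where the variance barely exceeds the mean produce wildly unstable estimates, consistent with the unreliable k estimates the authors report even at n of about 100 trees.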

  15. Estimation of tiger densities in the tropical dry forests of Panna, Central India, using photographic capture-recapture sampling

    USGS Publications Warehouse

    Karanth, K.Ullas; Chundawat, Raghunandan S.; Nichols, James D.; Kumar, N. Samba

    2004-01-01

    Tropical dry-deciduous forests comprise more than 45% of the tiger (Panthera tigris) habitat in India. However, in the absence of rigorously derived estimates of ecological densities of tigers in dry forests, critical baseline data for managing tiger populations are lacking. In this study tiger densities were estimated using photographic capture–recapture sampling in the dry forests of Panna Tiger Reserve in Central India. Over a 45-day survey period, 60 camera trap sites were sampled in a well-protected part of the 542-km2 reserve during 2002. A total sampling effort of 914 camera-trap-days yielded photo-captures of 11 individual tigers over 15 sampling occasions that effectively covered a 418-km2 area. The closed capture–recapture model Mh, which incorporates individual heterogeneity in capture probabilities, fitted these photographic capture history data well. The estimated capture probability per sample, p̂ = 0.04, resulted in an estimated tiger population size and standard error, N̂ (SE(N̂)), of 29 (9.65), and a density, D̂ (SE(D̂)), of 6.94 (3.23) tigers/100 km2. The estimated tiger density matched predictions based on prey abundance. Our results suggest that, if managed appropriately, the available dry forest habitat in India has the potential to support a population size of about 9000 wild tigers.

  16. Confidence intervals for the population mean tailored to small sample sizes, with applications to survey sampling.

    PubMed

    Rosenblum, Michael A; Laan, Mark J van der

    2009-01-07

    The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes. We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability under much weaker assumptions than are required for standard methods. A drawback of this approach, as we show, is that these confidence intervals are often quite wide. In response to this, we present a method for constructing much narrower confidence intervals, which are better suited for practical applications, and that are still more robust than confidence intervals based on standard methods, when dealing with small sample sizes. We show how to extend our approaches to much more general estimation problems than estimating the sample mean. We describe how these methods can be used to obtain more reliable confidence intervals in survey sampling. As a concrete example, we construct confidence intervals using our methods for the number of violent deaths between March 2003 and July 2006 in Iraq, based on data from the study "Mortality after the 2003 invasion of Iraq: A cross sectional cluster sample survey," by Burnham et al. (2006).
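The contrast the paper draws can be sketched numerically. The code below is a plug-in illustration, not the paper's exact procedure: it inverts Bernstein's inequality for a variable bounded in [0, b] using the sample variance (the paper's intervals use more careful finite-sample constructions), and compares the result against the standard CLT interval.

```python
from math import log, sqrt

def bernstein_ci(x, b, delta=0.05):
    """Two-sided interval for the mean of a variable bounded in [0, b],
    from Bernstein's tail bound with the sample variance plugged in.
    Solves n*t**2 - (2*b*L/3)*t - 2*s2*L = 0 for t, where L = ln(2/delta)."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((v - mu) ** 2 for v in x) / n
    L = log(2 / delta)
    t = ((2 * b * L / 3) + sqrt((2 * b * L / 3) ** 2 + 8 * n * s2 * L)) / (2 * n)
    return mu - t, mu + t

def normal_ci(x, delta=0.05):
    """Standard CLT-based interval (z = 1.96 for delta = 0.05)."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((v - mu) ** 2 for v in x) / (n - 1)
    t = 1.96 * sqrt(s2 / n)
    return mu - t, mu + t
```

On a small binary sample the Bernstein-style interval is noticeably wider than the CLT interval, which is exactly the drawback the authors identify and then address with narrower, still-robust constructions.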

  17. DS — Software for analyzing data collected using double sampling

    USGS Publications Warehouse

    Bart, Jonathan; Hartley, Dana

    2011-01-01

    DS analyzes count data to estimate density or relative density and population size when appropriate. The software is available at http://iwcbm.dev4.fsr.com/IWCBM/default.asp?PageID=126. The software was designed to analyze data collected using double sampling, but it also can be used to analyze index data. DS is not currently configured to apply distance methods or methods based on capture-recapture theory. Double sampling for the purpose of this report means surveying a sample of locations with a rapid method of unknown accuracy and surveying a subset of these locations using a more intensive method assumed to yield unbiased estimates. "Detection ratios" are calculated as the ratio of results from rapid surveys on intensive plots to the number actually present as determined from the intensive surveys. The detection ratios are used to adjust results from the rapid surveys. The formula for density is (results from rapid survey)/(estimated detection ratio from intensive surveys). Population sizes are estimated as (density)(area). Double sampling is well-established in the survey sampling literature—see Cochran (1977) for the basic theory, Smith (1995) for applications of double sampling in waterfowl surveys, Bart and Earnst (2002, 2005) for discussions of its use in wildlife studies, and Bart and others (in press) for a detailed account of how the method was used to survey shorebirds across the arctic region of North America. Indices are surveys that do not involve complete counts of well-defined plots or recording information to estimate detection rates (Thompson and others, 1998). In most cases, such data should not be used to estimate density or population size but, under some circumstances, may be used to compare two densities or estimate how density changes through time or across space (Williams and others, 2005). The Breeding Bird Survey (Sauer and others, 2008) provides a good example of an index survey. 
Surveyors record all birds detected but do not record any information, such as distance or whether each bird is recorded in subperiods, that could be used to estimate detection rates. Nonetheless, the data are widely used to estimate temporal trends and spatial patterns in abundance (Sauer and others, 2008). DS produces estimates of density (or relative density for indices) by species and stratum. Strata are usually defined using region and habitat but other variables may be used, and the entire study area may be classified as a single stratum. Population size in each stratum and for the entire study area also is estimated for each species. For indices, the estimated totals generally are only useful if (a) plots are surveyed so that densities can be calculated and extrapolated to the entire study area and (b) if the detection rates are close to 1.0. All estimates are accompanied by standard errors (SE) and coefficients of variation (CV, that is, SE/estimate).
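The double-sampling arithmetic described above (detection ratio, adjusted density, population total) is a one-liner per step; the function below follows the report's formulas directly, with plot and study-area sizes as hypothetical inputs for illustration.

```python
def double_sampling_estimate(rapid_counts, rapid_on_intensive,
                             true_on_intensive, plot_area, total_area):
    """Double-sampling estimate following the formulas in the text:
    detection ratio = (rapid counts on intensive plots) / (true counts);
    density = (rapid-survey density) / (detection ratio);
    population size = density * area."""
    ratio = sum(rapid_on_intensive) / sum(true_on_intensive)
    raw_density = sum(rapid_counts) / (len(rapid_counts) * plot_area)
    density = raw_density / ratio
    return density, density * total_area
```

For example, if rapid surveys on the intensive subset found 14 birds where intensive surveys found 20 (detection ratio 0.7), a raw rapid-survey density of 7 birds/km2 adjusts to 10 birds/km2, and a 100-km2 study area yields an estimated 1000 birds.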

  18. Sampling plantations to determine white-pine weevil injury

    Treesearch

Robert L. Talerico; Robert W. Wilson, Jr.

    1973-01-01

    Use of 1/10-acre square plots to obtain estimates of the proportion of never-weeviled trees necessary for evaluating and scheduling white-pine weevil control is described. The optimum number of trees to observe per plot is estimated from data obtained from sample plantations in the Northeast, and a table is given of the sample size required to achieve a standard error of...

  19. Spatially explicit dynamic N-mixture models

    USGS Publications Warehouse

    Zhao, Qing; Royle, Andy; Boomer, G. Scott

    2017-01-01

    Knowledge of demographic parameters such as survival, reproduction, emigration, and immigration is essential to understand metapopulation dynamics. Traditionally the estimation of these demographic parameters requires intensive data from marked animals. The development of dynamic N-mixture models makes it possible to estimate demographic parameters from count data of unmarked animals, but the original dynamic N-mixture model does not distinguish emigration and immigration from survival and reproduction, limiting its ability to explain important metapopulation processes such as movement among local populations. In this study we developed a spatially explicit dynamic N-mixture model that estimates survival, reproduction, emigration, local population size, and detection probability from count data under the assumption that movement only occurs among adjacent habitat patches. Simulation studies showed that the inference of our model depends on detection probability, local population size, and the implementation of robust sampling design. Our model provides reliable estimates of survival, reproduction, and emigration when detection probability is high, regardless of local population size or the type of sampling design. When detection probability is low, however, our model only provides reliable estimates of survival, reproduction, and emigration when local population size is moderate to high and robust sampling design is used. A sensitivity analysis showed that our model is robust against the violation of the assumption that movement only occurs among adjacent habitat patches, suggesting wide applications of this model. Our model can be used to improve our understanding of metapopulation dynamics based on count data that are relatively easy to collect in many systems.

  20. Evaluating estimators for numbers of females with cubs-of-the-year in the Yellowstone grizzly bear population

    USGS Publications Warehouse

    Cherry, S.; White, G.C.; Keating, K.A.; Haroldson, Mark A.; Schwartz, Charles C.

    2007-01-01

    Current management of the grizzly bear (Ursus arctos) population in Yellowstone National Park and surrounding areas requires annual estimation of the number of adult female bears with cubs-of-the-year. We examined the performance of nine estimators of population size via simulation. Data were simulated using two methods for different combinations of population size, sample size, and coefficient of variation of individual sighting probabilities. We show that the coefficient of variation does not, by itself, adequately describe the effects of capture heterogeneity, because two different distributions of capture probabilities can have the same coefficient of variation. All estimators produced biased estimates of population size with bias decreasing as effort increased. Based on the simulation results we recommend the Chao estimator for model Mh be used to estimate the number of female bears with cubs-of-the-year; however, the estimator of Chao and Shen may also be useful depending on the goals of the research.
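The recommended Chao estimator for model Mh is a simple function of the capture-frequency counts. A minimal sketch, using the standard form (with the usual bias-corrected fallback when no animals are seen exactly twice):

```python
def chao_mh(capture_frequencies):
    """Chao's lower-bound estimator for model Mh:
    N_hat = S + f1**2 / (2 * f2), where S is the number of distinct
    animals seen, f1 the number seen exactly once, f2 exactly twice.
    When f2 = 0, the bias-corrected form S + f1*(f1-1)/(2*(f2+1)) is used."""
    s = len(capture_frequencies)
    f1 = sum(1 for f in capture_frequencies if f == 1)
    f2 = sum(1 for f in capture_frequencies if f == 2)
    if f2 == 0:
        return s + f1 * (f1 - 1) / 2
    return s + f1 * f1 / (2 * f2)
```

For example, sighting histories in which 10 distinct females were seen, 4 of them once and 2 of them twice, give an estimate of 10 + 16/4 = 14 females, with the heterogeneity in sighting probabilities captured through the singleton and doubleton counts.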

  1. Change-in-ratio methods for estimating population size

    USGS Publications Warehouse

    Udevitz, Mark S.; Pollock, Kenneth H.; McCullough, Dale R.; Barrett, Reginald H.

    2002-01-01

    Change-in-ratio (CIR) methods can provide an effective, low-cost approach for estimating the size of wildlife populations. They rely on being able to observe changes in proportions of population subclasses that result from the removal of a known number of individuals from the population. These methods were first introduced in the 1940s to estimate the size of populations with 2 subclasses under the assumption of equal subclass encounter probabilities. Over the next 40 years, closed population CIR models were developed to consider additional subclasses and use additional sampling periods. Models with assumptions about how encounter probabilities vary over time, rather than between subclasses, also received some attention. Recently, all of these CIR models have been shown to be special cases of a more general model. Under the general model, information from additional samples can be used to test assumptions about the encounter probabilities and to provide estimates of subclass sizes under relaxations of these assumptions. These developments have greatly extended the applicability of the methods. CIR methods are attractive because they do not require the marking of individuals, and subclass proportions often can be estimated with relatively simple sampling procedures. However, CIR methods require a carefully monitored removal of individuals from the population, and the estimates will be of poor quality unless the removals induce substantial changes in subclass proportions. In this paper, we review the state of the art for closed population estimation with CIR methods. Our emphasis is on the assumptions of CIR methods and on identifying situations where these methods are likely to be effective. We also identify some important areas for future CIR research.
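The classical two-subclass CIR estimator described in this review reduces to one line. A minimal sketch of the Paulik–Robson form, assuming equal encounter probabilities across subclasses:

```python
def cir_estimate(p1, p2, removed_x, removed_total):
    """Two-subclass change-in-ratio estimate of initial population size:
    N = (Rx - R * p2) / (p1 - p2), where p1 and p2 are the subclass-x
    proportions observed before and after the removal, Rx is the number
    of subclass-x individuals removed, and R is the total removed."""
    return (removed_x - removed_total * p2) / (p1 - p2)
```

For example, removing 200 males from a population that was 50% male and observing that males then make up 37.5% of the survivors yields N = (200 − 200·0.375)/(0.5 − 0.375) = 1000. The denominator p1 − p2 also makes the review's warning concrete: when removals barely shift the subclass proportions, the estimate divides by a number near zero and becomes extremely unstable.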

  2. Incorporating partially identified sample segments into acreage estimation procedures: Estimates using only observations from the current year

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr. (Principal Investigator)

    1981-01-01

    Several methods of estimating individual crop acreages using a mixture of completely identified and partially identified (generic) segments from a single growing year are derived and discussed. A small Monte Carlo study of eight estimators is presented. The relative empirical behavior of these estimators is discussed, as are the effects of segment sample size and amount of partial identification. The principal recommendations are: (1) do not exclude, but rather incorporate, partially identified sample segments into the estimation procedure; (2) try to avoid having a large percentage (say 80%) of only partially identified segments in the sample; and (3) use the maximum likelihood estimator, although the weighted least squares estimator and least squares ratio estimator both perform almost as well. Sets of spring small grains (North Dakota) data were used.

  3. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali.

    PubMed

    Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco

    2012-10-12

    Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard error of VC and ICC estimates were increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.

  4. Assessing relative abundance and reproductive success of shrubsteppe raptors

    USGS Publications Warehouse

    Lehman, Robert N.; Carpenter, L.B.; Steenhof, Karen; Kochert, Michael N.

    1998-01-01

    From 1991 to 1994, we quantified relative abundance and reproductive success of the Ferruginous Hawk (Buteo regalis), Northern Harrier (Circus cyaneus), Burrowing Owl (Speotyto cunicularia), and Short-eared Owl (Asio flammeus) on the shrubsteppe plateaus (benchlands) in and near the Snake River Birds of Prey National Conservation Area in southwestern Idaho. To assess relative abundance, we searched randomly selected plots using four sampling methods: point counts, line transects, and quadrats of two sizes. On a per-sampling-effort basis, transects were slightly more effective than point counts and quadrats for locating raptor nests (3.4 pairs detected/100 h of effort vs. 2.2-3.1 pairs). Random sampling using quadrats failed to detect a Short-eared Owl population increase from 1993 to 1994. To evaluate nesting success, we tried to determine reproductive outcome for all nesting attempts located during random, historical, and incidental nest searches. We compared nesting success estimates based on all nesting attempts, on attempts found during incubation, and the Mayfield model. Most pairs used to evaluate success were pairs found incidentally. Visits to historical nesting areas yielded the highest number of pairs per sampling effort (14.6/100 h), but reoccupancy rates for most species decreased through time. Estimates based on all attempts had the highest sample sizes but probably overestimated success for all species except the Ferruginous Hawk. Estimates of success based on nesting attempts found during incubation had the lowest sample sizes. All three methods yielded biased nesting success estimates for the Northern Harrier and Short-eared Owl. The estimate based on pairs found during incubation probably provided the least biased estimate for the Burrowing Owl. Assessments of nesting success were hindered by difficulties in confirming egg laying and nesting success for all species except the Ferruginous Hawk.

  5. Comparing the accuracy and precision of three techniques used for estimating missing landmarks when reconstructing fossil hominin crania.

    PubMed

    Neeser, Rudolph; Ackermann, Rebecca Rogers; Gain, James

    2009-09-01

    Various methodological approaches have been used for reconstructing fossil hominin remains in order to increase sample sizes and to better understand morphological variation. Among these, morphometric quantitative techniques for reconstruction are increasingly common. Here we compare the accuracy of three approaches (mean substitution, thin plate splines, and multiple linear regression) for estimating missing landmarks of damaged fossil specimens. Comparisons are made varying the number of missing landmarks, sample sizes, and the reference species of the population used to perform the estimation. The testing is performed on landmark data from individuals of Homo sapiens, Pan troglodytes and Gorilla gorilla, and nine hominin fossil specimens. Results suggest that when a small, same-species fossil reference sample is available to guide reconstructions, thin plate spline approaches perform best. However, if no such sample is available (or if the species of the damaged individual is uncertain), estimates of missing morphology based on a single individual (or even a small sample) of close taxonomic affinity are less accurate than those based on a large sample of individuals drawn from more distantly related extant populations using a technique (such as a regression method) able to leverage the information (e.g., variation/covariation patterning) contained in this large sample. Thin plate splines also show an unexpectedly large amount of error in estimating landmarks, especially over large areas. Recommendations are made for estimating missing landmarks under various scenarios. Copyright 2009 Wiley-Liss, Inc.

  6. Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review.

    PubMed

    Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra

    2015-06-01

    To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Impact of Different Visual Field Testing Paradigms on Sample Size Requirements for Glaucoma Clinical Trials.

    PubMed

    Wu, Zhichao; Medeiros, Felipe A

    2018-03-20

    Visual field testing is an important endpoint in glaucoma clinical trials, and the testing paradigm used can have a significant impact on the sample size requirements. To investigate this, the study included 353 eyes of 247 glaucoma patients seen over a 3-year period to extract real-world visual field rates of change and variability estimates to provide sample size estimates from computer simulations. The clinical trial scenario assumed that a new treatment was added to one of two groups that were both under routine clinical care, with various treatment effects examined. Three different visual field testing paradigms were evaluated: a) evenly spaced testing, b) United Kingdom Glaucoma Treatment Study (UKGTS) follow-up scheme, which adds clustered tests at the beginning and end of follow-up in addition to evenly spaced testing, and c) clustered testing paradigm, with clusters of tests at the beginning and end of the trial period and two intermediary visits. The sample size requirements were reduced by 17-19% and 39-40% using the UKGTS and clustered testing paradigms, respectively, when compared to the evenly spaced approach. These findings highlight how the clustered testing paradigm can substantially reduce sample size requirements and improve the feasibility of future glaucoma clinical trials.

  8. The effect of age and body composition on body mass estimation of males using the stature/bi-iliac method.

    PubMed

    Junno, Juho-Antti; Niskanen, Markku; Maijanen, Heli; Holt, Brigitte; Sladek, Vladimir; Niinimäki, Sirpa; Berner, Margit

    2018-02-01

    The stature/bi-iliac breadth method provides reasonably precise, skeletal frame size (SFS) based body mass (BM) estimations across adults as a whole. In this study, we examine the potential effects of age changes in anthropometric dimensions on the estimation accuracy of SFS-based body mass estimation. We use anthropometric data from the literature and our own skeletal data from two osteological collections to study effects of age on stature, bi-iliac breadth, body mass, and body composition, as they are major components behind body size and body size estimations. We focus on males, as relevant longitudinal data are based on male study samples. As a general rule, lean body mass (LBM) increases through adolescence and early adulthood until people are in their 30s or 40s, and starts to decline in the late 40s or early 50s. Fat mass (FM) tends to increase until the mid-50s and declines thereafter, but in more mobile traditional societies it may decline throughout adult life. Because BM is the sum of LBM and FM, it exhibits a curvilinear age-related pattern in all societies. Skeletal frame size is based on stature and bi-iliac breadth, and both of those dimensions are affected by age. Skeletal frame size based body mass estimation tends to increase throughout adult life in both skeletal and anthropometric samples because an age-related increase in bi-iliac breadth more than compensates for an age-related stature decline commencing in the 30s or 40s. Combined with the above-mentioned curvilinear BM change, this results in curvilinear estimation bias. However, for simulations involving low to moderate percent body fat, the stature/bi-iliac method works well in predicting body mass in younger and middle-aged adults. Such conditions are likely to have applied to most human paleontological and archaeological samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Hierarchical distance-sampling models to estimate population size and habitat-specific abundance of an island endemic

    USGS Publications Warehouse

    Sillett, Scott T.; Chandler, Richard B.; Royle, J. Andrew; Kéry, Marc; Morrison, Scott A.

    2012-01-01

    Population size and habitat-specific abundance estimates are essential for conservation management. A major impediment to obtaining such estimates is that few statistical models are able to simultaneously account for both spatial variation in abundance and heterogeneity in detection probability, and still be amenable to large-scale applications. The hierarchical distance-sampling model of J. A. Royle, D. K. Dawson, and S. Bates provides a practical solution. Here, we extend this model to estimate habitat-specific abundance and rangewide population size of a bird species of management concern, the Island Scrub-Jay (Aphelocoma insularis), which occurs solely on Santa Cruz Island, California, USA. We surveyed 307 randomly selected, 300 m diameter, point locations throughout the 250 km² island during October 2008 and April 2009. Population size was estimated to be 2267 (95% CI 1613-3007) and 1705 (1212-2369) during the fall and spring respectively, considerably lower than a previously published but statistically problematic estimate of 12,500. This large discrepancy emphasizes the importance of proper survey design and analysis for obtaining reliable information for management decisions. Jays were most abundant in low-elevation chaparral habitat; the detection function depended primarily on the percent cover of chaparral and forest within count circles. Vegetation change on the island has been dramatic in recent decades, due to release from herbivory following the eradication of feral sheep (Ovis aries) from the majority of the island in the mid-1980s. We applied best-fit fall and spring models of habitat-specific jay abundance to a vegetation map from 1985, and estimated the population size of A. insularis was 1400-1500 at that time. The 20-30% increase in the jay population suggests that the species has benefited from the recovery of native vegetation since sheep removal. 
Nevertheless, this jay's tiny range and small population size make it vulnerable to natural disasters and to habitat alteration related to climate change. Our results demonstrate that hierarchical distance-sampling models hold promise for estimating population size and spatial density variation at large scales. Our statistical methods have been incorporated into the R package unmarked to facilitate their use by animal ecologists, and we provide annotated code in the Supplement.

  10. Using e-mail recruitment and an online questionnaire to establish effect size: A worked example.

    PubMed

    Kirkby, Helen M; Wilson, Sue; Calvert, Melanie; Draper, Heather

    2011-06-09

    Sample size calculations require effect size estimations. Sometimes, effect size estimations and standard deviation may not be readily available, particularly if efficacy is unknown because the intervention is new or developing, or the trial targets a new population. In such cases, one way to estimate the effect size is to gather expert opinion. This paper reports the use of a simple strategy to gather expert opinion to estimate a suitable effect size to use in a sample size calculation. Researchers involved in the design and analysis of clinical trials were identified at the University of Birmingham and via the MRC Hubs for Trials Methodology Research. An email invited them to participate. An online questionnaire was developed using the free online tool 'Survey Monkey©'. The questionnaire described an intervention, an electronic participant information sheet (e-PIS), which may increase recruitment rates to a trial. Respondents were asked how much they would need to see recruitment rates increased by, based on 90%, 70%, 50%, and 30% baseline rates (in a hypothetical study), before they would consider using an e-PIS in their research. Analyses comprised simple descriptive statistics. The invitation to participate was sent to 122 people; 7 responded to say they were not involved in trial design and could not complete the questionnaire, 64 attempted it, and 26 failed to complete it. Thirty-eight people completed the questionnaire and were included in the analysis (response rate 33%; 38/115). Of those who completed the questionnaire, 44.7% (17/38) were at the academic grade of research fellow, 26.3% (10/38) senior research fellow, and 28.9% (11/38) professor. Dependent upon the baseline recruitment rates presented in the questionnaire, participants wanted recruitment rates to increase from 6.9% to 28.9% before they would consider using the intervention. 
This paper has shown that in situations where effect size estimations cannot be collected from previous research, opinions from researchers and trialists can be quickly and easily collected by conducting a simple study using email recruitment and an online questionnaire. The results collected from the survey were successfully used in sample size calculations for a PhD research study protocol.
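The kind of calculation the elicited effect sizes would feed into can be sketched with a standard two-proportion sample size formula. This is a normal-approximation sketch, not the authors' exact protocol calculation; the function name and numbers are illustrative:

```python
import math

Z_TWO_SIDED_05 = 1.959964  # standard normal quantile for two-sided alpha = 0.05
Z_POWER_80 = 0.841621      # standard normal quantile for 80% power

def n_per_arm(p1, p2, z_alpha=Z_TWO_SIDED_05, z_beta=Z_POWER_80):
    """Approximate per-arm sample size to detect a change in a proportion
    from p1 to p2 (pooled/unpooled normal approximation, two-sided test)."""
    pbar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * pbar * (1 - pbar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# e.g. a 50% baseline recruitment rate, hoped to rise to 60% with an e-PIS:
n = n_per_arm(0.50, 0.60)
```

The formula makes the survey's purpose concrete: the smaller the increase respondents consider worthwhile, the larger the trial needed to detect it.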

  11. Practical sampling plans for Varroa destructor (Acari: Varroidae) in Apis mellifera (Hymenoptera: Apidae) colonies and apiaries.

    PubMed

    Lee, K V; Moon, R D; Burkness, E C; Hutchison, W D; Spivak, M

    2010-08-01

    The parasitic mite Varroa destructor Anderson & Trueman (Acari: Varroidae) is arguably the most detrimental pest of the European-derived honey bee, Apis mellifera L. Unfortunately, beekeepers lack a standardized sampling plan to make informed treatment decisions. Based on data from 31 commercial apiaries, we developed sampling plans for use by beekeepers and researchers to estimate the density of mites in individual colonies or whole apiaries. Beekeepers can estimate a colony's mite density with a chosen level of precision by dislodging mites from approximately 300 adult bees taken from one brood box frame in the colony, and they can extrapolate to mite density on a colony's adults and pupae combined by doubling the number of mites on adults. For sampling whole apiaries, beekeepers can repeat the process in each of n = 8 colonies, regardless of apiary size. Researchers desiring greater precision can estimate mite density in an individual colony by examining three, 300-bee sample units. Extrapolation to density on adults and pupae may require independent estimates of numbers of adults, of pupae, and of their respective mite densities. Researchers can estimate apiary-level mite density by taking one 300-bee sample unit per colony, but should do so from a variable number of colonies, depending on apiary size. These practical sampling plans will allow beekeepers and researchers to quantify mite infestation levels and enhance understanding and management of V. destructor.
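The beekeeper-facing part of the plan reduces to simple arithmetic; a minimal sketch assuming the stated 300-bee sample unit, the doubling rule for adults plus pupae, and eight colonies per apiary (function names are illustrative):

```python
def colony_mite_density(mites_found, bees_sampled=300):
    """Mites per 100 adult bees from a single ~300-bee sample."""
    return 100 * mites_found / bees_sampled

def colony_total_density(mites_found, bees_sampled=300):
    """Rule of thumb from the abstract: mite density on adults and pupae
    combined is roughly double the density observed on adults alone."""
    return 2 * colony_mite_density(mites_found, bees_sampled)

def apiary_mite_density(mite_counts, bees_sampled=300):
    """Apiary-level estimate: one 300-bee sample from each of n colonies
    (the abstract suggests n = 8 regardless of apiary size)."""
    per_colony = [100 * m / bees_sampled for m in mite_counts]
    return sum(per_colony) / len(per_colony)

# 9 mites dislodged from one 300-bee sample -> 3 mites per 100 bees on
# adults, or about 6 per 100 bees colony-wide.
density = colony_total_density(9)
```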

  12. Technical note: Alternatives to reduce adipose tissue sampling bias.

    PubMed

    Cruz, G D; Wang, Y; Fadel, J G

    2014-10-01

    Understanding the mechanisms by which nutritional and pharmaceutical factors can manipulate adipose tissue growth and development in production animals has direct and indirect effects on the profitability of an enterprise. Adipocyte cellularity (number and size) is a key biological response that is commonly measured in animal science research. The variability and sampling of adipocyte cellularity within a muscle has been addressed in previous studies, but no attempt to critically investigate these issues has been proposed in the literature. The present study evaluated 2 sampling techniques (random and systematic) in an attempt to minimize sampling bias and to determine the minimum number of samples, from 1 to 15, needed to represent the overall adipose tissue in the muscle. Both sampling procedures were applied to adipose tissue samples dissected from 30 longissimus muscles from cattle finished either on grass or grain. Briefly, adipose tissue samples were fixed with osmium tetroxide, and size and number of adipocytes were determined by a Coulter Counter. These results were then fit in a finite mixture model to obtain distribution parameters of each sample. To evaluate the benefits of increasing the number of samples and the advantage of the new sampling technique, the concept of acceptance ratio was used; simply stated, the higher the acceptance ratio, the better the representation of the overall population. As expected, a great improvement in the estimation of the overall adipocyte cellularity parameters was observed with both sampling techniques as sample size increased from 1 to 15 samples: the acceptance ratio of both techniques increased from approximately 3% to 25%. When comparing sampling techniques, the systematic procedure slightly improved parameter estimation. The results suggest that more detailed research using other sampling techniques may provide better estimates for minimum sampling.

  13. On the Post Hoc Power in Testing Mean Differences

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Maxwell, Scott

    2005-01-01

    Retrospective or post hoc power analysis is recommended by reviewers and editors of many journals. Little literature has been found that gave a serious study of the post hoc power. When the sample size is large, the observed power is a good estimator of the true power. This article studies whether such a power estimator provides valuable…
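What "post hoc power" computes can be sketched for a two-sided two-sample comparison using a normal approximation. The observed standardized effect size d is plugged in as if it were the true effect, which is exactly the practice the article scrutinizes; function names are illustrative:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def observed_power(d, n_per_group):
    """Approximate post hoc power of a two-sided two-sample comparison at
    alpha = 0.05, treating the observed standardized effect size d as the
    true effect (normal approximation to the t test)."""
    z_crit = 1.959964                     # z for two-sided alpha = 0.05
    ncp = d * math.sqrt(n_per_group / 2)  # approximate noncentrality
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)

# e.g. an observed d of 0.5 with 64 subjects per group:
p = observed_power(0.5, 64)
```

Because d is itself a noisy estimate, the resulting power figure inherits that noise, which is the crux of the article's question about whether such an estimator is valuable.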

  14. Measuring Compartment Size and Gas Solubility in Marine Mammals

    DTIC Science & Technology

    2014-09-30

    The goal of this study is to develop methods to estimate marine mammal tissue compartment sizes and tissue gas solubility. We aim to improve the data available for ... Samples are analyzed by gas chromatography; injection of the sample into the gas chromatograph is done using a sample loop to minimize volume injection error.

  15. Minimax Estimation of Functionals of Discrete Distributions

    PubMed Central

    Jiao, Jiantao; Venkat, Kartik; Han, Yanjun; Weissman, Tsachy

    2017-01-01

    We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the support size S is unknown and may be comparable with or even much larger than the number of observations n. We treat the respective regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the smooth regime, we apply a bias-corrected version of the maximum likelihood estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: 1) the entropy H(P) = −∑_{i=1}^{S} p_i ln p_i and 2) F_α(P) = ∑_{i=1}^{S} p_i^α, α > 0. We obtain the minimax L2 rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n ≍ S/ln S for entropy estimation. We also demonstrate that the sample complexity for estimating F_α(P), 0 < α < 1, is n ≍ S^{1/α}/ln S, which can be achieved by our estimator but not the MLE. For 1 < α < 3/2, we show the minimax L2 rate for estimating F_α(P) is (n ln n)^{−2(α−1)} for infinite support size, while the maximum L2 rate for the MLE is n^{−2(α−1)}. For all the above cases, the behavior of the minimax rate-optimal estimators with n samples is essentially that of the MLE (plug-in rule) with n ln n samples, which we term “effective sample size enlargement.” We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with various existing approaches, and demonstrate that our approach reduces running time and boosts the accuracy. 
Moreover, we show that the minimax rate-optimal mutual information estimator yielded by our framework leads to significant performance boosts over the Chow–Liu algorithm in learning graphical models. The wide use of information measure estimation suggests that the insights and estimators obtained in this paper could be broadly applicable. PMID:29375152
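The "plug-in rule" (MLE) the paper compares against can be sketched in a few lines, together with the classical Miller-Madow bias correction; this is a standard baseline, not the paper's polynomial-approximation estimator:

```python
import math
from collections import Counter

def entropy_mle(samples):
    """Plug-in (MLE) entropy estimate in nats: H evaluated at the
    empirical distribution p_hat."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def entropy_miller_madow(samples):
    """MLE plus the classical Miller-Madow bias correction
    (S_hat - 1) / (2n), where S_hat is the number of distinct symbols
    observed; a simple instance of bias-corrected plug-in estimation."""
    n = len(samples)
    s_hat = len(set(samples))
    return entropy_mle(samples) + (s_hat - 1) / (2 * n)

# A fair coin observed 100 times: the plug-in estimate is ln 2.
data = [0, 1] * 50
h_mle = entropy_mle(data)
h_mm = entropy_miller_madow(data)
```

The plug-in estimator is biased downward for small n relative to S, which is the regime (n comparable to or smaller than S) where the paper's minimax constructions pay off.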

  16. 7 CFR 275.11 - Sampling.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... between 1 and k. The value of k is dependent upon the estimated size of the universe and the sample size... included in the active universe defined in paragraph (e)(1) of this section) during the annual review... review (i.e., households which are part of the negative universe defined in paragraph (e)(2) of this...
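The scheme described in the excerpt, a random start between 1 and k followed by every kth record, is ordinary systematic sampling; a minimal sketch with illustrative names, where k would be chosen from the estimated universe size and the required sample size:

```python
import random

def systematic_sample(universe, k, rng=random):
    """Every-kth systematic sample with a random start between 1 and k
    (1-indexed, as in the regulation's description). k is typically
    about universe size divided by desired sample size."""
    start = rng.randint(1, k)          # random start in [1, k]
    return universe[start - 1::k]      # then every kth record

# e.g. a universe of 1000 case records sampled at interval k = 50:
cases = list(range(1000))
sample = systematic_sample(cases, 50)
```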

  17. 7 CFR 275.11 - Sampling.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... between 1 and k. The value of k is dependent upon the estimated size of the universe and the sample size... included in the active universe defined in paragraph (e)(1) of this section) during the annual review... review (i.e., households which are part of the negative universe defined in paragraph (e)(2) of this...

  18. 7 CFR 275.11 - Sampling.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... between 1 and k. The value of k is dependent upon the estimated size of the universe and the sample size... included in the active universe defined in paragraph (e)(1) of this section) during the annual review... review (i.e., households which are part of the negative universe defined in paragraph (e)(2) of this...

  19. Exploratory Factor Analysis with Small Sample Sizes

    ERIC Educational Resources Information Center

    de Winter, J. C. F.; Dodou, D.; Wieringa, P. A.

    2009-01-01

    Exploratory factor analysis (EFA) is generally regarded as a technique for large sample sizes ("N"), with N = 50 as a reasonable absolute minimum. This study offers a comprehensive overview of the conditions in which EFA can yield good quality results for "N" below 50. Simulations were carried out to estimate the minimum required "N" for different…

  20. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    ERIC Educational Resources Information Center

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  1. An Investigation of Sample Size Splitting on ATFIND and DIMTEST

    ERIC Educational Resources Information Center

    Socha, Alan; DeMars, Christine E.

    2013-01-01

    Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

  2. Power and Precision in Confirmatory Factor Analytic Tests of Measurement Invariance

    ERIC Educational Resources Information Center

    Meade, Adam W.; Bauer, Daniel J.

    2007-01-01

    This study investigates the effects of sample size, factor overdetermination, and communality on the precision of factor loading estimates and the power of the likelihood ratio test of factorial invariance in multigroup confirmatory factor analysis. Although sample sizes are typically thought to be the primary determinant of precision and power,…

  3. Capturing heterogeneity: The role of a study area's extent for estimating mean throughfall

    NASA Astrophysics Data System (ADS)

    Zimmermann, Alexander; Voss, Sebastian; Metzger, Johanna Clara; Hildebrandt, Anke; Zimmermann, Beate

    2016-11-01

    The selection of an appropriate spatial extent of a sampling plot is one among several important decisions involved in planning a throughfall sampling scheme. In fact, the choice of the extent may determine whether or not a study can adequately characterize the hydrological fluxes of the studied ecosystem. Previous attempts to optimize throughfall sampling schemes focused on the selection of an appropriate sample size, support, and sampling design, while comparatively little attention has been given to the role of the extent. In this contribution, we investigated the influence of the extent on the representativeness of mean throughfall estimates for three forest ecosystems of varying stand structure. Our study is based on virtual sampling of simulated throughfall fields. We derived these fields from throughfall data sampled in a simply structured forest (young tropical forest) and two heterogeneous forests (old tropical forest, unmanaged mixed European beech forest). We then sampled the simulated throughfall fields with three common extents and various sample sizes for a range of events and for accumulated data. Our findings suggest that the size of the study area should be carefully adapted to the complexity of the system under study and to the required temporal resolution of the throughfall data (i.e. event-based versus accumulated). Generally, event-based sampling in complex structured forests (conditions that favor comparatively long autocorrelations in throughfall) requires the largest extents. For event-based sampling, the choice of an appropriate extent can be as important as using an adequate sample size.

  4. Image analysis of representative food structures: application of the bootstrap method.

    PubMed

    Ramírez, Cristian; Germain, Juan C; Aguilera, José M

    2009-08-01

    Images (for example, photomicrographs) are routinely used as qualitative evidence of the microstructure of foods. In quantitative image analysis it is important to estimate the area (or volume) to be sampled, the field of view, and the resolution. The bootstrap method is proposed to estimate the size of the sampling area as a function of the coefficient of variation (CV(Bn)) and standard error (SE(Bn)) of the bootstrap, taking sub-areas of different sizes. The bootstrap method was applied to simulated and real structures (apple tissue). For simulated structures, 10 computer-generated images were constructed containing 225 black circles (elements) and different coefficients of variation (CV(image)). For apple tissue, 8 images of apple tissue containing cellular cavities with different CV(image) were analyzed. Results confirmed that for simulated and real structures, increasing the size of the sampling area decreased the CV(Bn) and SE(Bn). Furthermore, there was a linear relationship between the CV(image) and CV(Bn). For example, to obtain a CV(Bn) = 0.10 in an image with CV(image) = 0.60, a sampling area of 400 x 400 pixels (11% of whole image) was required, whereas if CV(image) = 1.46, a sampling area of 1000 x 1000 pixels (69% of whole image) became necessary. This suggests that a large dispersion of element sizes in an image requires increasingly larger sampling areas or a larger number of images.
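The bootstrap quantities CV(Bn) and SE(Bn) can be sketched as the coefficient of variation and standard error of bootstrap means; a minimal stdlib sketch with illustrative data, not the authors' image-analysis pipeline:

```python
import random
import statistics

def bootstrap_cv_se(values, n_boot=2000, seed=42):
    """Bootstrap the mean of `values` (e.g. element counts in sampled
    sub-areas) and return (CV, SE) of the bootstrap means."""
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        resample = [rng.choice(values) for _ in values]
        boot_means.append(statistics.fmean(resample))
    se = statistics.stdev(boot_means)       # SE(Bn): spread of bootstrap means
    cv = se / statistics.fmean(boot_means)  # CV(Bn): relative spread
    return cv, se

# A larger sampling area (here: more sub-area counts) shrinks both metrics,
# mirroring the paper's finding.
small_area = [5, 7, 9, 11, 13]
large_area = small_area * 8
cv_small, se_small = bootstrap_cv_se(small_area)
cv_large, se_large = bootstrap_cv_se(large_area)
```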

  5. Reference interval estimation: Methodological comparison using extensive simulations and empirical data.

    PubMed

    Daly, Caitlin H; Higgins, Victoria; Adeli, Khosrow; Grey, Vijay L; Hamid, Jemila S

    2017-12-01

    To statistically compare and evaluate commonly used methods of estimating reference intervals and to determine which method is best based on characteristics of the distribution of various data sets. Three approaches for estimating reference intervals, i.e. parametric, non-parametric, and robust, were compared with simulated Gaussian and non-Gaussian data. The hierarchy of the performances of each method was examined based on bias and measures of precision. The findings of the simulation study were illustrated through real data sets. In all Gaussian scenarios, the parametric approach provided the least biased and most precise estimates. In non-Gaussian scenarios, no single method provided the least biased and most precise estimates for both limits of a reference interval across all sample sizes, although the non-parametric approach performed the best for most scenarios. The hierarchy of the performances of the three methods was only impacted by sample size and skewness. Differences between reference interval estimates established by the three methods were inflated by variability. Whenever possible, laboratories should attempt to transform data to a Gaussian distribution and use the parametric approach to obtain the most optimal reference intervals. When this is not possible, laboratories should consider sample size and skewness as factors in their choice of reference interval estimation method. The consequences of false positives or false negatives may also serve as factors in this decision. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
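Two of the three interval definitions compared in the study can be sketched in a few lines; a minimal illustration of the parametric and non-parametric approaches (the robust approach is omitted), not the authors' full methodology:

```python
import statistics

def parametric_ri(values):
    """Parametric 95% reference interval: mean +/- 1.96 SD. Assumes a
    Gaussian distribution, possibly after transformation."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return (m - 1.96 * s, m + 1.96 * s)

def nonparametric_ri(values):
    """Non-parametric interval: the 2.5th and 97.5th sample percentiles."""
    qs = statistics.quantiles(values, n=40, method="inclusive")
    return (qs[0], qs[-1])  # cut points at 1/40 = 2.5% and 39/40 = 97.5%

values = list(range(1, 101))  # illustrative, deliberately non-Gaussian data
lo_p, hi_p = parametric_ri(values)
lo_n, hi_n = nonparametric_ri(values)
```

On skewed or uniform data like this, the two methods disagree noticeably, which is the study's point: the best choice depends on the distribution, sample size, and skewness.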

  6. A note on power and sample size calculations for the Kruskal-Wallis test for ordered categorical data.

    PubMed

    Fan, Chunpeng; Zhang, Donghui

    2012-01-01

    Although the Kruskal-Wallis test has been widely used to analyze ordered categorical data, power and sample size methods for this test have been investigated to a much lesser extent when the underlying multinomial distributions are unknown. This article generalizes the power and sample size procedures proposed by Fan et al. (2011) for continuous data to ordered categorical data, when estimates from a pilot study are used in place of knowledge of the true underlying distribution. Simulations show that the proposed power and sample size formulas perform well. A myelin oligodendrocyte glycoprotein (MOG) induced experimental autoimmune encephalomyelitis (EAE) mouse study is used to demonstrate the application of the methods.
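    Since the approach above plugs pilot-study multinomial estimates into a power calculation, a simulation-based analogue is easy to sketch. The category probabilities below are invented pilot estimates, and SciPy's stock Kruskal-Wallis test stands in for the paper's analytic formulas:

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(2)

# Hypothetical pilot estimates of ordered-category probabilities
# (categories 0..3) for two groups; these stand in for the unknown truth.
p_control = [0.40, 0.30, 0.20, 0.10]
p_treated = [0.15, 0.25, 0.30, 0.30]

def simulated_power(n_per_group, n_sim=500, alpha=0.05):
    """Monte Carlo power of the Kruskal-Wallis test at a given sample size."""
    cats = np.arange(4)
    hits = 0
    for _ in range(n_sim):
        a = rng.choice(cats, size=n_per_group, p=p_control)
        b = rng.choice(cats, size=n_per_group, p=p_treated)
        if kruskal(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sim

print(simulated_power(20), simulated_power(60))  # power rises with n
```

    A sample size search would simply increase n_per_group until the simulated power clears the target (e.g. 0.80).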

  7. Children's accuracy of portion size estimation using digital food images: effects of interface design and size of image on computer screen.

    PubMed

    Baranowski, Tom; Baranowski, Janice C; Watson, Kathleen B; Martin, Shelby; Beltran, Alicia; Islam, Noemi; Dadabhoy, Hafza; Adame, Su-heyla; Cullen, Karen; Thompson, Debbe; Buday, Richard; Subar, Amy

    2011-03-01

    To test the effect of image size and presence of size cues on the accuracy of portion size estimation by children. Children were randomly assigned to seeing images with or without food size cues (utensils and checked tablecloth) and were presented with sixteen food models (foods commonly eaten by children) in varying portion sizes, one at a time. They estimated each food model's portion size by selecting a digital food image. The same food images were presented in two ways: (i) as small, graduated portion size images all on one screen or (ii) by scrolling across large, graduated portion size images, one per sequential screen. Laboratory-based with computer and food models. Volunteer multi-ethnic sample of 120 children, equally distributed by gender and ages (8 to 13 years) in 2008-2009. Average percentage of correctly classified foods was 60.3%. There were no differences in accuracy by any design factor or demographic characteristic. Multiple small pictures on the screen at once took half the time to estimate portion size compared with scrolling through large pictures. Larger pictures had more overestimation of size. Multiple images of successively larger portion sizes of a food on one computer screen facilitated quicker portion size responses with no decrease in accuracy. This is the method of choice for portion size estimation on a computer.

  8. A Model Based Approach to Sample Size Estimation in Recent Onset Type 1 Diabetes

    PubMed Central

    Bundy, Brian; Krischer, Jeffrey P.

    2016-01-01

    The area under the curve of C-peptide following a 2-hour mixed-meal tolerance test, measured from baseline to 12 months after enrollment in 481 individuals enrolled in 5 prior TrialNet studies of recent-onset type 1 diabetes, was modelled to produce estimates of its rate of loss and variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies yields a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide that can be used in Observed vs. Expected calculations to estimate the presumption of benefit in ongoing trials. PMID:26991448

  9. Confidence intervals for population allele frequencies: the general case of sampling from a finite diploid population of any size.

    PubMed

    Fung, Tak; Keenan, Kevin

    2014-01-01

    The estimation of population allele frequencies using sample data forms a central component of studies in population genetics. These estimates can be used to test hypotheses on the evolutionary processes governing changes in genetic variation among populations. However, existing studies frequently do not account for sampling uncertainty in these estimates, thus compromising their utility. Incorporation of this uncertainty has been hindered by the lack of a method for constructing confidence intervals containing the population allele frequencies, for the general case of sampling from a finite diploid population of any size. In this study, we address this important knowledge gap by presenting a rigorous mathematical method to construct such confidence intervals. For a range of scenarios, the method is used to demonstrate that for a particular allele, in order to obtain accurate estimates within 0.05 of the population allele frequency with high probability (≥95%), a sample size of >30 is often required. This analysis is augmented by an application of the method to empirical sample allele frequency data for two populations of the checkerspot butterfly (Melitaea cinxia L.), occupying meadows in Finland. For each population, the method is used to derive ≥98.3% confidence intervals for the population frequencies of three alleles. These intervals are then used to construct two joint ≥95% confidence regions, one for the set of three frequencies for each population. These regions are then used to derive a ≥95% confidence interval for Jost's D, a measure of genetic differentiation between the two populations. Overall, the results demonstrate the practical utility of the method with respect to informing sampling design and accounting for sampling uncertainty in studies of population genetics, important for scientific hypothesis-testing and also for risk-based natural resource management.
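    The paper's contribution is the general finite-diploid-population case; as a point of contrast, here is a minimal sketch of the classical infinite-population approximation (a Wilson score interval for a sample allele frequency; the counts are invented):

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson 95% CI for a proportion: the simple infinite-population
    approximation, NOT the finite-diploid-population method of the paper."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 18 copies of an allele among 2N = 50 allele copies
# (N = 25 diploid individuals; hypothetical numbers)
lo, hi = wilson_interval(k=18, n=50)
print(round(lo, 3), round(hi, 3))
```

    Note how wide the interval is at this sample size, consistent with the paper's finding that >30 individuals are often needed for frequency estimates accurate to within 0.05.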

  10. Efficient design and inference for multistage randomized trials of individualized treatment policies.

    PubMed

    Dawson, Ree; Lavori, Philip W

    2012-01-01

    Clinical demand for individualized "adaptive" treatment policies in diverse fields has spawned development of clinical trial methodology for their experimental evaluation via multistage designs, building upon methods intended for the analysis of naturalistically observed strategies. Because often there is no need to parametrically smooth multistage trial data (in contrast to observational data for adaptive strategies), it is possible to establish direct connections among different methodological approaches. We show by algebraic proof that the maximum likelihood (ML) and optimal semiparametric (SP) estimators of the population mean of the outcome of a treatment policy and its standard error are equal under certain experimental conditions. This result is used to develop a unified and efficient approach to design and inference for multistage trials of policies that adapt treatment according to discrete responses. We derive a sample size formula expressed in terms of a parametric version of the optimal SP population variance. Nonparametric (sample-based) ML estimation performed well in simulation studies, in terms of achieved power, for scenarios most likely to occur in real studies, even though sample sizes were based on the parametric formula. ML outperformed the SP estimator; differences in achieved power predominantly reflected differences in their estimates of the population mean (rather than estimated standard errors). Neither methodology could mitigate the potential for overestimated sample sizes when strong nonlinearity was purposely simulated for certain discrete outcomes; however, such departures from linearity may not be an issue for many clinical contexts that make evaluation of competitive treatment policies meaningful.

  11. Accounting for randomness in measurement and sampling in studying cancer cell population dynamics.

    PubMed

    Ghavami, Siavash; Wolkenhauer, Olaf; Lahouti, Farshad; Ullah, Mukhtar; Linnebacher, Michael

    2014-10-01

    Knowing the expected temporal evolution of the proportion of different cell types in sample tissues gives an indication of the progression of the disease and its possible response to drugs. Such systems have been modelled using Markov processes. We here consider an experimentally realistic scenario in which transition probabilities are estimated from noisy cell population size measurements. Using aggregated data of FACS measurements, we develop MMSE and ML estimators and formulate two problems to find the minimum number of required samples and measurements to guarantee the accuracy of predicted population sizes. Our numerical results show that transition probabilities and steady states estimated with the standard deterministic approach from noisy measurements differ widely from the real values. This provides support for our argument that for the analysis of FACS data one should consider the observed state as a random variable. The second problem we address concerns the consequences of estimating the probability of a cell being in a particular state from measurements of a small population of cells. We show how the uncertainty arising from small sample sizes can be captured by a distribution for the state probability.

  12. Massively parallel rRNA gene sequencing exacerbates the potential for biased community diversity comparisons due to variable library sizes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gihring, Thomas; Green, Stefan; Schadt, Christopher Warren

    2011-01-01

    Technologies for massively parallel sequencing are revolutionizing microbial ecology and are vastly increasing the scale of ribosomal RNA (rRNA) gene studies. Although pyrosequencing has increased the breadth and depth of possible rRNA gene sampling, one drawback is that the number of reads obtained per sample is difficult to control. Pyrosequencing libraries typically vary widely in the number of sequences per sample, even within individual studies, and there is a need to revisit the behaviour of richness estimators and diversity indices with variable gene sequence library sizes. Multiple reports and review papers have demonstrated the bias in non-parametric richness estimators (e.g. Chao1 and ACE) and diversity indices when using clone libraries. However, we found that biased community comparisons are accumulating in the literature. Here we demonstrate the effects of sample size on Chao1, ACE, CatchAll, Shannon, Chao-Shen and Simpson's estimates specifically using pyrosequencing libraries. The need to equalize the number of reads being compared across libraries is reiterated, and investigators are directed towards available tools for making unbiased diversity comparisons.
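    The recommended remedy, equalizing read counts before comparing libraries, can be sketched as a simple rarefaction step. The libraries below are toy count vectors, not real rRNA data, and Shannon diversity stands in for the several indices examined:

```python
import numpy as np

rng = np.random.default_rng(3)

def shannon(counts):
    """Shannon diversity H' from a vector of per-taxon read counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def rarefy(counts, depth):
    """Randomly subsample reads without replacement down to a fixed depth."""
    reads = np.repeat(np.arange(len(counts)), counts)
    sub = rng.choice(reads, size=depth, replace=False)
    return np.bincount(sub, minlength=len(counts))

# Two hypothetical libraries of very different size (toy data)
lib_a = np.array([500, 300, 150, 40, 10])        # 1,000 reads
lib_b = np.array([5000, 2500, 1500, 700, 300])   # 10,000 reads

depth = 1000  # rarefy both to the smaller library's size
h_a = shannon(rarefy(lib_a, depth))
h_b = shannon(rarefy(lib_b, depth))
print(h_a, h_b)  # indices now computed at equal depth are comparable
```

    Comparing the raw libraries instead would confound diversity differences with sequencing depth, which is exactly the bias the authors warn against.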

  13. Effect of Study Design on Sample Size in Studies Intended to Evaluate Bioequivalence of Inhaled Short-Acting β-Agonist Formulations.

    PubMed

    Zeng, Yaohui; Singh, Sachinkumar; Wang, Kai; Ahrens, Richard C

    2018-04-01

    Pharmacodynamic studies that use methacholine challenge to assess bioequivalence of generic and innovator albuterol formulations are generally designed per published Food and Drug Administration guidance, with 3 reference doses and 1 test dose (3-by-1 design). These studies are challenging and expensive to conduct, typically requiring large sample sizes. We proposed 14 modified study designs as alternatives to the Food and Drug Administration-recommended 3-by-1 design, hypothesizing that adding reference and/or test doses would reduce sample size and cost. We used Monte Carlo simulation to estimate sample size. Simulation inputs were selected based on published studies and our own experience with this type of trial. We also estimated effects of these modified study designs on study cost. Most of these altered designs reduced sample size and cost relative to the 3-by-1 design, some decreasing cost by more than 40%. The most effective single study dose to add was 180 μg of test formulation, which resulted in an estimated 30% relative cost reduction. Adding a single test dose of 90 μg was less effective, producing only a 13% cost reduction. Adding a lone reference dose of either 180, 270, or 360 μg yielded little benefit (less than 10% cost reduction), whereas adding 720 μg resulted in a 19% cost reduction. Of the 14 study design modifications we evaluated, the most effective was addition of both a 90-μg test dose and a 720-μg reference dose (42% cost reduction). Combining a 180-μg test dose and a 720-μg reference dose produced an estimated 36% cost reduction. © 2017, The Authors. The Journal of Clinical Pharmacology published by Wiley Periodicals, Inc. on behalf of American College of Clinical Pharmacology.

  14. SU-E-I-46: Sample-Size Dependence of Model Observers for Estimating Low-Contrast Detection Performance From CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reiser, I; Lu, Z

    2014-06-01

    Purpose: Recently, task-based assessment of diagnostic CT systems has attracted much attention. Detection task performance can be estimated using human observers, or mathematical observer models. While most models are well established, considerable bias can be introduced when performance is estimated from a limited number of image samples. Thus, the purpose of this work was to assess the effect of sample size on bias and uncertainty of two channelized Hotelling observers and a template-matching observer. Methods: The image data used for this study consisted of 100 signal-present and 100 signal-absent regions-of-interest, which were extracted from CT slices. The experimental conditions included two signal sizes and five different x-ray beam current settings (mAs). Human observer performance for these images was determined in 2-alternative forced choice experiments. These data were provided by the Mayo Clinic in Rochester, MN. Detection performance was estimated from three observer models, including channelized Hotelling observers (CHO) with Gabor or Laguerre-Gauss (LG) channels, and a template-matching observer (TM). Different sample sizes were generated by randomly selecting a subset of image pairs (N=20, 40, 60, 80). Observer performance was quantified as proportion of correct responses (PC). Bias was quantified as the relative difference of PC for 20 and 80 image pairs. Results: For N=100, all observer models predicted human performance across mAs and signal sizes. Bias was 23% for CHO (Gabor), 7% for CHO (LG), and 3% for TM. The relative standard deviation, σ(PC)/PC, at N=20 was highest for the TM observer (11%) and lowest for the CHO (Gabor) observer (5%). Conclusion: In order to make image quality assessment feasible in clinical practice, a statistically efficient observer model that can predict performance from few samples is needed. Our results identified two observer models that may be suited for this task.

  15. Biofouling on buoyant marine plastics: An experimental study into the effect of size on surface longevity.

    PubMed

    Fazey, Francesca M C; Ryan, Peter G

    2016-03-01

    Recent estimates suggest that roughly 100 times more plastic litter enters the sea than is found floating at the sea surface, despite the buoyancy and durability of many plastic polymers. Biofouling by marine biota is one possible mechanism responsible for this discrepancy. Microplastics (<5 mm in diameter) are scarcer than larger size classes, which makes sense because fouling is a function of surface area whereas buoyancy is a function of volume; the smaller an object, the greater its relative surface area. We tested whether plastic items with high surface area to volume ratios sank more rapidly by submerging 15 different sizes of polyethylene samples in False Bay, South Africa, for 12 weeks to determine the time required for samples to sink. All samples became sufficiently fouled to sink within the study period, but small samples lost buoyancy much faster than larger ones. There was a direct relationship between sample volume (buoyancy) and the time to attain a 50% probability of sinking, which ranged from 17 to 66 days of exposure. Our results provide the first estimates of the longevity of different sizes of plastic debris at the ocean surface. Further research is required to determine how fouling rates differ on free floating debris in different regions and in different types of marine environments. Such estimates could be used to improve model predictions of the distribution and abundance of floating plastic debris globally. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. A two-stage Monte Carlo approach to the expression of uncertainty with finite sample sizes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crowder, Stephen Vernon; Moyer, Robert D.

    2005-05-01

    Proposed supplement I to the GUM outlines a 'propagation of distributions' approach to deriving the distribution of a measurand for any non-linear function and for any set of random inputs. The supplement's proposed Monte Carlo approach assumes that the distributions of the random inputs are known exactly. This implies that the sample sizes are effectively infinite. In this case, the mean of the measurand can be determined precisely using a large number of Monte Carlo simulations. In practice, however, the distributions of the inputs will rarely be known exactly, but must be estimated using possibly small samples. If these approximated distributions are treated as exact, the uncertainty in estimating the mean is not properly taken into account. In this paper, we propose a two-stage Monte Carlo procedure that explicitly takes into account the finite sample sizes used to estimate parameters of the input distributions. We will illustrate the approach with a case study involving the efficiency of a thermistor mount power sensor. The performance of the proposed approach will be compared to the standard GUM approach for finite samples using simple non-linear measurement equations. We will investigate performance in terms of coverage probabilities of derived confidence intervals.
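    A minimal sketch of the two-stage idea under stated assumptions: the outer stage resamples the small observed input samples (here via a simple bootstrap, one of several possible choices) to represent uncertainty in the input distributions, and the inner stage propagates draws through a toy non-linear measurement equation y = x1 * x2 (not the paper's thermistor-mount model):

```python
import numpy as np

rng = np.random.default_rng(4)

# Small observed samples of two inputs (n = 8 each; toy data)
x1_obs = rng.normal(10.0, 0.5, size=8)
x2_obs = rng.normal(2.0, 0.1, size=8)

def two_stage_mc(obs1, obs2, n_outer=200, n_inner=200):
    """Outer stage: bootstrap the finite samples to get plausible means/SDs.
    Inner stage: propagate Gaussian draws through y = x1 * x2."""
    ys = []
    for _ in range(n_outer):
        b1 = rng.choice(obs1, size=len(obs1), replace=True)
        b2 = rng.choice(obs2, size=len(obs2), replace=True)
        x1 = rng.normal(b1.mean(), b1.std(ddof=1), size=n_inner)
        x2 = rng.normal(b2.mean(), b2.std(ddof=1), size=n_inner)
        ys.append(x1 * x2)
    return np.concatenate(ys)

y = two_stage_mc(x1_obs, x2_obs)
print(y.mean(), np.percentile(y, [2.5, 97.5]))
```

    A single-stage propagation that treats the sample means and SDs as exact would produce a narrower interval; the outer loop is what widens it to reflect the finite sample sizes.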

  17. Number of pins in two-stage stratified sampling for estimating herbage yield

    Treesearch

    William G. O' Regan; C. Eugene Conrad

    1975-01-01

    In a two-stage stratified procedure for sampling herbage yield, plots are stratified by a pin frame in stage one, and clipped. In stage two, clippings from selected plots are sorted, dried, and weighed. Sample size and distribution of plots between the two stages are determined by equations. A way to compute the effect of number of pins on the variance of estimated...

  18. Improving the accuracy of hyaluronic acid molecular weight estimation by conventional size exclusion chromatography.

    PubMed

    Shanmuga Doss, Sreeja; Bhatt, Nirav Pravinbhai; Jayaraman, Guhan

    2017-08-15

    There is an unreasonably high variation in the literature reports on molecular weight of hyaluronic acid (HA) estimated using conventional size exclusion chromatography (SEC). This variation is most likely due to errors in estimation. Working with commercially available HA molecular weight standards, this work examines the extent of error in molecular weight estimation due to two factors: use of non-HA based calibration and concentration of sample injected into the SEC column. We develop a multivariate regression correlation to correct for concentration effect. Our analysis showed that SEC calibration based on non-HA standards like polyethylene oxide and pullulan led to approximately 2 and 10 times overestimation, respectively, when compared to HA-based calibration. Further, we found that injected sample concentration has an effect on molecular weight estimation. Even at 1 g/l injected sample concentration, HA molecular weight standards of 0.7 and 1.64 MDa showed appreciable underestimation of 11-24%. The multivariate correlation developed was found to reduce error in estimations at 1 g/l to <4%. The correlation was also successfully applied to accurately estimate the molecular weight of HA produced by a recombinant Lactococcus lactis fermentation. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. EVALUATING PROBABILITY SAMPLING STRATEGIES FOR ESTIMATING REDD COUNTS: AN EXAMPLE WITH CHINOOK SALMON (Oncorhynchus tshawytscha)

    EPA Science Inventory

    Precise, unbiased estimates of population size are an essential tool for fisheries management. For a wide variety of salmonid fishes, redd counts from a sample of reaches are commonly used to monitor annual trends in abundance. Using a 9-year time series of georeferenced censuses...

  20. The effects of neutralized particles on the sampling efficiency of polyurethane foam used to estimate the extrathoracic deposition fraction.

    PubMed

    Tomyn, Ronald L; Sleeth, Darrah K; Thiese, Matthew S; Larson, Rodney R

    2016-01-01

    In addition to chemical composition, the site of deposition of inhaled particles is important for determining the potential health effects from an exposure. As a result, the International Organization for Standardization adopted a particle deposition sampling convention. This includes extrathoracic particle deposition sampling conventions for the anterior nasal passages (ET1) and the posterior nasal and oral passages (ET2). This study assessed how well a polyurethane foam insert placed in an Institute of Occupational Medicine (IOM) sampler can match an extrathoracic deposition sampling convention, while accounting for possible static buildup in the test particles. In this way, the study aimed to assess whether neutralized particles affected the performance of this sampler for estimating extrathoracic particle deposition. A total of three different particle sizes (4.9, 9.5, and 12.8 µm) were used. For each trial, one particle size was introduced into a low-speed wind tunnel with a wind speed set at 0.2 m/s (∼40 ft/min). This wind speed was chosen to closely match the conditions of most indoor working environments. Each particle size was tested twice, either neutralized, using a high voltage neutralizer, or left in its normal (non-neutralized) state as standard particles. IOM samplers were fitted with a polyurethane foam insert and placed on a rotating mannequin inside the wind tunnel. Foam sampling efficiencies were calculated for all trials to compare against the normalized ET1 sampling deposition convention. The foam sampling efficiencies matched the ET1 deposition convention well for the larger particle sizes, but had a general trend of underestimating for all three particle sizes. The results of a Wilcoxon Rank Sum Test also showed that only at 4.9 µm was there a statistically significant difference (p-value = 0.03) between the foam sampling efficiency using the standard particles and the neutralized particles. This is interpreted to mean that static buildup may be occurring and that neutralizing particles 4.9 µm in diameter did affect the performance of the foam sampler when estimating extrathoracic particle deposition.

  1. Quantitative endoscopy: initial accuracy measurements.

    PubMed

    Truitt, T O; Adelman, R A; Kelly, D H; Willging, J P

    2000-02-01

    The geometric optics of an endoscope can be used to determine the absolute size of an object in an endoscopic field without knowing the actual distance from the object. This study explores the accuracy of a technique that estimates absolute object size from endoscopic images. Quantitative endoscopy involves calibrating a rigid endoscope to produce size estimates from 2 images taken with a known traveled distance between the images. The heights of 12 samples, ranging in size from 0.78 to 11.80 mm, were estimated with this calibrated endoscope. Backup distances of 5 mm and 10 mm were used for comparison. The mean percent error for all estimated measurements when compared with the actual object sizes was 1.12%. The mean errors for 5-mm and 10-mm backup distances were 0.76% and 1.65%, respectively. The mean errors for objects <2 mm and ≥2 mm were 0.94% and 1.18%, respectively. Quantitative endoscopy estimates endoscopic image size to within 5% of the actual object size. This method remains promising for quantitatively evaluating object size from endoscopic images. It does not require knowledge of the absolute distance of the endoscope from the object; rather, it requires only the distance traveled by the endoscope between images.

  2. Scalable population estimates using spatial-stream-network (SSN) models, fish density surveys, and national geospatial database frameworks for streams

    Treesearch

    Daniel J. Isaak; Jay M. Ver Hoef; Erin E. Peterson; Dona L. Horan; David E. Nagel

    2017-01-01

    Population size estimates for stream fishes are important for conservation and management, but sampling costs limit the extent of most estimates to small portions of river networks that encompass 100s–10 000s of linear kilometres. However, the advent of large fish density data sets, spatial-stream-network (SSN) models that benefit from nonindependence among samples,...

  3. The Effects of Small Sample Size on Identifying Polytomous DIF Using the Liu-Agresti Estimator of the Cumulative Common Odds Ratio

    ERIC Educational Resources Information Center

    Carvajal, Jorge; Skorupski, William P.

    2010-01-01

    This study is an evaluation of the behavior of the Liu-Agresti estimator of the cumulative common odds ratio when identifying differential item functioning (DIF) with polytomously scored test items using small samples. The Liu-Agresti estimator has been proposed by Penfield and Algina as a promising approach for the study of polytomous DIF but no…

  4. Optimization of scat detection methods for a social ungulate, the wild pig, and experimental evaluation of factors affecting detection of scat

    USGS Publications Warehouse

    Keiter, David A.; Cunningham, Fred L.; Rhodes, Olin E.; Irwin, Brian J.; Beasley, James

    2016-01-01

    Collection of scat samples is common in wildlife research, particularly for genetic capture-mark-recapture applications. Due to high degradation rates of genetic material in scat, large numbers of samples must be collected to generate robust estimates. Optimization of sampling approaches to account for taxa-specific patterns of scat deposition is, therefore, necessary to ensure sufficient sample collection. While scat collection methods have been widely studied in carnivores, research to maximize scat collection and noninvasive sampling efficiency for social ungulates is lacking. Further, environmental factors or scat morphology may influence detection of scat by observers. We contrasted performance of novel radial search protocols with existing adaptive cluster sampling protocols to quantify differences in observed amounts of wild pig (Sus scrofa) scat. We also evaluated the effects of environmental (percentage of vegetative ground cover and occurrence of rain immediately prior to sampling) and scat characteristics (fecal pellet size and number) on the detectability of scat by observers. We found that 15- and 20-m radial search protocols resulted in greater numbers of scats encountered than the previously used adaptive cluster sampling approach across habitat types, and that fecal pellet size, number of fecal pellets, percent vegetative ground cover, and recent rain events were significant predictors of scat detection. Our results suggest that use of a fixed-width radial search protocol may increase the number of scats detected for wild pigs, or other social ungulates, allowing more robust estimation of population metrics using noninvasive genetic sampling methods. Further, as fecal pellet size affected scat detection, juvenile or smaller-sized animals may be less detectable than adult or large animals, which could introduce bias into abundance estimates. 
Knowledge of relationships between environmental variables and scat detection may allow researchers to optimize sampling protocols to maximize utility of noninvasive sampling for wild pigs and other social ungulates.

  5. Optimization of scat detection methods for a social ungulate, the wild pig, and experimental evaluation of factors affecting detection of scat

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, David A.; Cunningham, Fred L.; Rhodes, Jr., Olin E.

    Collection of scat samples is common in wildlife research, particularly for genetic capture-mark-recapture applications. Due to high degradation rates of genetic material in scat, large numbers of samples must be collected to generate robust estimates. Optimization of sampling approaches to account for taxa-specific patterns of scat deposition is, therefore, necessary to ensure sufficient sample collection. While scat collection methods have been widely studied in carnivores, research to maximize scat collection and noninvasive sampling efficiency for social ungulates is lacking. Further, environmental factors or scat morphology may influence detection of scat by observers. We contrasted performance of novel radial search protocols with existing adaptive cluster sampling protocols to quantify differences in observed amounts of wild pig (Sus scrofa) scat. We also evaluated the effects of environmental (percentage of vegetative ground cover and occurrence of rain immediately prior to sampling) and scat characteristics (fecal pellet size and number) on the detectability of scat by observers. We found that 15- and 20-m radial search protocols resulted in greater numbers of scats encountered than the previously used adaptive cluster sampling approach across habitat types, and that fecal pellet size, number of fecal pellets, percent vegetative ground cover, and recent rain events were significant predictors of scat detection. Our results suggest that use of a fixed-width radial search protocol may increase the number of scats detected for wild pigs, or other social ungulates, allowing more robust estimation of population metrics using noninvasive genetic sampling methods. Further, as fecal pellet size affected scat detection, juvenile or smaller-sized animals may be less detectable than adult or large animals, which could introduce bias into abundance estimates. 
    In conclusion, knowledge of relationships between environmental variables and scat detection may allow researchers to optimize sampling protocols to maximize utility of noninvasive sampling for wild pigs and other social ungulates.

  6. Optimization of Scat Detection Methods for a Social Ungulate, the Wild Pig, and Experimental Evaluation of Factors Affecting Detection of Scat.

    PubMed

    Keiter, David A; Cunningham, Fred L; Rhodes, Olin E; Irwin, Brian J; Beasley, James C

    2016-01-01

    Collection of scat samples is common in wildlife research, particularly for genetic capture-mark-recapture applications. Due to high degradation rates of genetic material in scat, large numbers of samples must be collected to generate robust estimates. Optimization of sampling approaches to account for taxa-specific patterns of scat deposition is, therefore, necessary to ensure sufficient sample collection. While scat collection methods have been widely studied in carnivores, research to maximize scat collection and noninvasive sampling efficiency for social ungulates is lacking. Further, environmental factors or scat morphology may influence detection of scat by observers. We contrasted performance of novel radial search protocols with existing adaptive cluster sampling protocols to quantify differences in observed amounts of wild pig (Sus scrofa) scat. We also evaluated the effects of environmental (percentage of vegetative ground cover and occurrence of rain immediately prior to sampling) and scat characteristics (fecal pellet size and number) on the detectability of scat by observers. We found that 15- and 20-m radial search protocols resulted in greater numbers of scats encountered than the previously used adaptive cluster sampling approach across habitat types, and that fecal pellet size, number of fecal pellets, percent vegetative ground cover, and recent rain events were significant predictors of scat detection. Our results suggest that use of a fixed-width radial search protocol may increase the number of scats detected for wild pigs, or other social ungulates, allowing more robust estimation of population metrics using noninvasive genetic sampling methods. Further, as fecal pellet size affected scat detection, juvenile or smaller-sized animals may be less detectable than adult or large animals, which could introduce bias into abundance estimates. 
Knowledge of relationships between environmental variables and scat detection may allow researchers to optimize sampling protocols to maximize utility of noninvasive sampling for wild pigs and other social ungulates.

  7. Optimization of scat detection methods for a social ungulate, the wild pig, and experimental evaluation of factors affecting detection of scat

    DOE PAGES

    Keiter, David A.; Cunningham, Fred L.; Rhodes, Jr., Olin E.; ...

    2016-05-25

    Collection of scat samples is common in wildlife research, particularly for genetic capture-mark-recapture applications. Due to high degradation rates of genetic material in scat, large numbers of samples must be collected to generate robust estimates. Optimization of sampling approaches to account for taxa-specific patterns of scat deposition is, therefore, necessary to ensure sufficient sample collection. While scat collection methods have been widely studied in carnivores, research to maximize scat collection and noninvasive sampling efficiency for social ungulates is lacking. Further, environmental factors or scat morphology may influence detection of scat by observers. We contrasted performance of novel radial search protocols with existing adaptive cluster sampling protocols to quantify differences in observed amounts of wild pig (Sus scrofa) scat. We also evaluated the effects of environmental (percentage of vegetative ground cover and occurrence of rain immediately prior to sampling) and scat characteristics (fecal pellet size and number) on the detectability of scat by observers. We found that 15- and 20-m radial search protocols resulted in greater numbers of scats encountered than the previously used adaptive cluster sampling approach across habitat types, and that fecal pellet size, number of fecal pellets, percent vegetative ground cover, and recent rain events were significant predictors of scat detection. Our results suggest that use of a fixed-width radial search protocol may increase the number of scats detected for wild pigs, or other social ungulates, allowing more robust estimation of population metrics using noninvasive genetic sampling methods. Further, as fecal pellet size affected scat detection, juvenile or smaller-sized animals may be less detectable than adult or large animals, which could introduce bias into abundance estimates.
    In conclusion, knowledge of relationships between environmental variables and scat detection may allow researchers to optimize sampling protocols to maximize utility of noninvasive sampling for wild pigs and other social ungulates.

  8. Sample size determination for GEE analyses of stepped wedge cluster randomized trials.

    PubMed

    Li, Fan; Turner, Elizabeth L; Preisser, John S

    2018-06-19

    In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator. © 2018, The International Biometric Society.

  9. Estimating relative decline in populations of subterranean termites (Isoptera: Rhinotermitidae) due to baiting.

    PubMed

    Evans, T A

    2001-12-01

    Although mark-recapture protocols produce inaccurate population estimates of termite colonies, they might be employed to estimate a relative change in colony size. This possibility was tested using two Australian, mound-building, wood-eating, subterranean Coptotermes species. Three different toxicants delivered in baits were used to decrease (but not eliminate) colony size, and a single mark-recapture protocol was used to estimate pre- and postbaiting population sizes. For both species, the numbers of termites retrieved from bait stations varied widely, resulting in no significant differences in the numbers of termites sampled between treatments in either the pre- or postbaiting protocols. There were significantly fewer termites sampled in all treatments, controls included, in the postbaiting protocol compared with the prebaiting one, suggesting a seasonal change in forager numbers. The comparison of population estimates shows a large decrease in toxicant-treated colonies compared with little change in control colonies, which suggests that estimating the relative decline in population size using mark-recapture protocols might be possible. However, the change in population estimate was due entirely to the significantly lower recapture rate in the control colonies relative to the toxicant-treated colonies, as numbers of unmarked termites did not change between treatments. The population estimates should be treated with caution because low recapture rates produce dubious population estimates and, in some cases, postbaiting mark-recapture population estimates could be much greater than those at prebaiting, despite consumption of bait in sufficient quantities to cause population decline. A possible interaction between fat-stain markers and toxicants should be investigated if mark-recapture population estimates are used. Alternative methods of estimating population change, including indirect measures, are advised.
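
    The abstract's caution about recapture rates can be illustrated with the standard Chapman (bias-corrected Lincoln-Petersen) estimator; the counts below are hypothetical, not from the study:

```python
def chapman_estimate(marked, captured, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen estimator of population
    size: N ~ (M + 1)(C + 1) / (R + 1) - 1."""
    return (marked + 1) * (captured + 1) / (recaptured + 1) - 1

# Hypothetical counts: identical first- and second-sample sizes with a
# lower recapture rate yield a much larger estimate, showing how a
# treatment-induced drop in recaptures alone can mimic a population change.
high_recapture = chapman_estimate(1000, 800, 160)  # ~ 4,979
low_recapture = chapman_estimate(1000, 800, 40)    # ~ 19,555
```

    With the recapture count quartered, the estimate roughly quadruples, which is why a lower recapture rate in control colonies can drive the apparent "decline" described above.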

  10. Using Structural Equation Modeling to Assess Functional Connectivity in the Brain: Power and Sample Size Considerations

    ERIC Educational Resources Information Center

    Sideridis, Georgios; Simos, Panagiotis; Papanicolaou, Andrew; Fletcher, Jack

    2014-01-01

    The present study assessed the impact of sample size on the power and fit of structural equation modeling applied to functional brain connectivity hypotheses. The data consisted of time-constrained minimum norm estimates of regional brain activity during performance of a reading task obtained with magnetoencephalography. Power analysis was first…

  11. Determining sample size for tree utilization surveys

    Treesearch

    Stanley J. Zarnoch; James W. Bentley; Tony G. Johnson

    2004-01-01

    The U.S. Department of Agriculture Forest Service has conducted many studies to determine what proportion of the timber harvested in the South is actually utilized. This paper describes the statistical methods used to determine required sample sizes for estimating utilization ratios for a required level of precision. The data used are those for 515 hardwood and 1,557...

  12. A Monte-Carlo simulation analysis for evaluating the severity distribution functions (SDFs) calibration methodology and determining the minimum sample-size requirements.

    PubMed

    Shirazi, Mohammadali; Reddy Geedipally, Srinivas; Lord, Dominique

    2017-01-01

    Severity distribution functions (SDFs) are used in highway safety to estimate the severity of crashes and conduct different types of safety evaluations and analyses. Developing a new SDF is a difficult task and demands significant time and resources. To simplify the process, the Highway Safety Manual (HSM) has started to document SDF models for different types of facilities. As such, SDF models have recently been introduced for freeways and ramps in the HSM addendum. However, since these functions or models are fitted and validated using data from a small number of selected states, they must be calibrated to local conditions when applied to a new jurisdiction. The HSM provides a methodology to calibrate the models through a scalar calibration factor. However, the proposed methodology to calibrate SDFs was never validated through research. Furthermore, there are no concrete guidelines to select a reliable sample size. Using extensive simulation, this paper documents an analysis that examined the bias between the 'true' and 'estimated' calibration factors. The analysis indicated that, as the value of the true calibration factor deviates further from '1', more bias is observed between the 'true' and 'estimated' calibration factors. In addition, simulation studies were performed to determine the calibration sample size for various conditions. It was found that, as the average coefficient of variation (CV) of the 'KAB' and 'C' crashes increases, the analyst needs to collect a larger sample size to calibrate SDF models. Taking this observation into account, sample-size guidelines are proposed based on the average CV of the crash severities used for the calibration process. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. In Search of the Largest Possible Tsunami: An Example Following the 2011 Japan Tsunami

    NASA Astrophysics Data System (ADS)

    Geist, E. L.; Parsons, T.

    2012-12-01

    Many tsunami hazard assessments focus on estimating the largest possible tsunami: i.e., the worst-case scenario. This is typically performed by examining historic and prehistoric tsunami data or by estimating the largest source that can produce a tsunami. We demonstrate that worst-case assessments derived from tsunami and tsunami-source catalogs are greatly affected by sampling bias. Both tsunami and tsunami sources are well represented by a Pareto distribution. It is intuitive to assume that there is some limiting size (i.e., runup or seismic moment) for which a Pareto distribution is truncated or tapered. Likelihood methods are used to determine whether a limiting size can be determined from existing catalogs. Results from synthetic catalogs indicate that several observations near the limiting size are needed for accurate parameter estimation. Accordingly, the catalog length needed to empirically determine the limiting size is dependent on the difference between the limiting size and the observation threshold, with larger catalog lengths needed for larger limiting-threshold size differences. Most, if not all, tsunami catalogs and regional tsunami source catalogs are of insufficient length to determine the upper bound on tsunami runup. As an example, estimates of the empirical tsunami runup distribution are obtained from the Miyako tide gauge station in Japan, which recorded the 2011 Tohoku-oki tsunami as the largest tsunami among 51 other events. Parameter estimation using a tapered Pareto distribution is made both with and without the Tohoku-oki event. The catalog without the 2011 event appears to have a low limiting tsunami runup. However, this is an artifact of undersampling. Including the 2011 event, the catalog conforms more to a pure Pareto distribution with no confidence in estimating a limiting runup. Estimating the size distribution of regional tsunami sources is subject to the same sampling bias. 
Physical attenuation mechanisms such as wave breaking likely limit the maximum tsunami runup at a particular site. However, historic and prehistoric data alone cannot determine the upper bound on tsunami runup. Because of problems endemic to sampling Pareto distributions of tsunamis and their sources, we recommend that tsunami hazard assessment be based on a specific design probability of exceedance following a pure Pareto distribution, rather than attempting to determine the worst-case scenario.
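
    A minimal sketch of the tail-parameter estimation described above, assuming a pure (untapered) Pareto distribution and a synthetic catalog; the shape value and threshold are illustrative only:

```python
import math
import random

def pareto_shape_mle(sizes, threshold):
    """Maximum-likelihood (Hill-type) estimate of the shape parameter of a
    pure Pareto distribution for observations at or above a threshold."""
    tail = [x for x in sizes if x >= threshold]
    return len(tail) / sum(math.log(x / threshold) for x in tail)

# Synthetic catalog drawn from a pure Pareto via inverse-CDF sampling.
random.seed(7)
beta, x_min = 1.5, 1.0
catalog = [x_min * random.random() ** (-1.0 / beta) for _ in range(5000)]
estimate = pareto_shape_mle(catalog, x_min)  # close to the true 1.5
```

    Note that this recovers only the shape of the tail; as the abstract argues, detecting a truncation or taper near the limiting size requires several observations near that size, which short catalogs rarely contain.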

  14. Technical Factors Influencing Cone Packing Density Estimates in Adaptive Optics Flood Illuminated Retinal Images

    PubMed Central

    Lombardo, Marco; Serrao, Sebastiano; Lombardo, Giuseppe

    2014-01-01

    Purpose To investigate the influence of various technical factors on the variation of cone packing density estimates in adaptive optics flood illuminated retinal images. Methods Adaptive optics images of the photoreceptor mosaic were obtained in fifteen healthy subjects. The cone density and Voronoi diagrams were assessed in sampling windows of 320×320 µm, 160×160 µm and 64×64 µm at 1.5 degree temporal and superior eccentricity from the preferred locus of fixation (PRL). The technical factors that have been analyzed included the sampling window size, the corrected retinal magnification factor (RMFcorr), the conversion from radial to linear distance from the PRL, the displacement between the PRL and foveal center and the manual checking of cone identification algorithm. Bland-Altman analysis was used to assess the agreement between cone density estimated within the different sampling window conditions. Results The cone density declined with decreasing sampling area and data between areas of different size showed low agreement. A high agreement was found between sampling areas of the same size when comparing density calculated with or without using individual RMFcorr. The agreement between cone density measured at radial and linear distances from the PRL and between data referred to the PRL or the foveal center was moderate. The percentage of Voronoi tiles with hexagonal packing arrangement was comparable between sampling areas of different size. The boundary effect, presence of any retinal vessels, and the manual selection of cones missed by the automated identification algorithm were identified as the factors influencing variation of cone packing arrangements in Voronoi diagrams. Conclusions The sampling window size is the main technical factor that influences variation of cone density. Clear identification of each cone in the image and the use of a large buffer zone are necessary to minimize factors influencing variation of Voronoi diagrams of the cone mosaic. 
PMID:25203681

  15. Technical factors influencing cone packing density estimates in adaptive optics flood illuminated retinal images.

    PubMed

    Lombardo, Marco; Serrao, Sebastiano; Lombardo, Giuseppe

    2014-01-01

    To investigate the influence of various technical factors on the variation of cone packing density estimates in adaptive optics flood illuminated retinal images. Adaptive optics images of the photoreceptor mosaic were obtained in fifteen healthy subjects. The cone density and Voronoi diagrams were assessed in sampling windows of 320×320 µm, 160×160 µm and 64×64 µm at 1.5 degree temporal and superior eccentricity from the preferred locus of fixation (PRL). The technical factors that have been analyzed included the sampling window size, the corrected retinal magnification factor (RMFcorr), the conversion from radial to linear distance from the PRL, the displacement between the PRL and foveal center and the manual checking of cone identification algorithm. Bland-Altman analysis was used to assess the agreement between cone density estimated within the different sampling window conditions. The cone density declined with decreasing sampling area and data between areas of different size showed low agreement. A high agreement was found between sampling areas of the same size when comparing density calculated with or without using individual RMFcorr. The agreement between cone density measured at radial and linear distances from the PRL and between data referred to the PRL or the foveal center was moderate. The percentage of Voronoi tiles with hexagonal packing arrangement was comparable between sampling areas of different size. The boundary effect, presence of any retinal vessels, and the manual selection of cones missed by the automated identification algorithm were identified as the factors influencing variation of cone packing arrangements in Voronoi diagrams. The sampling window size is the main technical factor that influences variation of cone density. Clear identification of each cone in the image and the use of a large buffer zone are necessary to minimize factors influencing variation of Voronoi diagrams of the cone mosaic.

  16. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali

    PubMed Central

    2012-01-01

    Background Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of VC and ICC estimates were increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445
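
    The misclassification risk discussed in the Results can be illustrated with a minimal binomial sketch of an LQAS-style decision rule; the rule parameters here are hypothetical, and the simple binomial model ignores clustering, which the study shows inflates the risk:

```python
from math import comb

def prob_classified_acceptable(n, d, p_true):
    """Binomial probability that a sample of n yields at least d vaccinated
    subjects, i.e. the area is classified as adequately covered.
    Hypothetical decision-rule parameters (n, d); clustering (ICC > 0)
    would make misclassification more likely than this model suggests."""
    return sum(comb(n, k) * p_true ** k * (1 - p_true) ** (n - k)
               for k in range(d, n + 1))

# Risk of wrongly accepting an area whose true coverage is only 70%,
# under an illustrative rule of d = 24 successes out of n = 30:
risk = prob_classified_acceptable(30, 24, 0.70)
```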

  17. Design and Weighting Methods for a Nationally Representative Sample of HIV-infected Adults Receiving Medical Care in the United States-Medical Monitoring Project

    PubMed Central

    Iachan, Ronaldo; Johnson, Christopher H.; Harding, Richard L.; Kyle, Tonja; Saavedra, Pedro; Frazier, Emma L.; Beer, Linda; Mattson, Christine L.; Skarbinski, Jacek

    2016-01-01

    Background: Health surveys of the general US population are inadequate for monitoring human immunodeficiency virus (HIV) infection because the relatively low prevalence of the disease (<0.5%) leads to small subpopulation sample sizes. Objective: To collect a nationally and locally representative probability sample of HIV-infected adults receiving medical care to monitor clinical and behavioral outcomes, supplementing the data in the National HIV Surveillance System. This paper describes the sample design and weighting methods for the Medical Monitoring Project (MMP) and provides estimates of the size and characteristics of this population. Methods: To develop a method for obtaining valid, representative estimates of the in-care population, we implemented a cross-sectional, three-stage design that sampled 23 jurisdictions, then 691 facilities, then 9,344 HIV patients receiving medical care, using probability-proportional-to-size methods. The data weighting process followed standard methods, accounting for the probabilities of selection at each stage and adjusting for nonresponse and multiplicity. Nonresponse adjustments accounted for differing response at both facility and patient levels. Multiplicity adjustments accounted for visits to more than one HIV care facility. Results: MMP used a multistage stratified probability sampling design that was approximately self-weighting in each of the 23 project areas and nationally. The probability sample represents the estimated 421,186 HIV-infected adults receiving medical care during January through April 2009. Methods were efficient (i.e., induced small, unequal weighting effects and small standard errors for a range of weighted estimates). Conclusion: The information collected through MMP allows monitoring trends in clinical and behavioral outcomes and informs resource allocation for treatment and prevention activities. PMID:27651851
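
    The probability-proportional-to-size stage of such a design can be sketched with cumulative-size systematic selection; this is an illustrative sketch of the PPS idea, not the MMP implementation:

```python
import random

def pps_systematic(sizes, n, seed=1):
    """Systematic probability-proportional-to-size selection of n units.
    A unit with measure size_i is selected with probability about
    n * size_i / sum(sizes), assuming no unit's size exceeds the sampling
    interval. Illustrative sketch only."""
    total = sum(sizes)
    interval = total / n
    random.seed(seed)
    start = random.uniform(0, interval)
    points = [start + i * interval for i in range(n)]
    chosen, cum, j = [], 0.0, 0
    for i, size in enumerate(sizes):
        cum += size
        while j < n and points[j] <= cum:
            chosen.append(i)  # selection point falls inside this unit
            j += 1
    return chosen

# Hypothetical facility caseloads; the largest facility (half the total
# size, equal to the sampling interval) is selected with certainty.
selected = pps_systematic([50, 10, 10, 10, 10, 10], 2)
```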

  18. Some comments on Anderson and Pospahala's correction of bias in line transect sampling

    USGS Publications Warehouse

    Anderson, D.R.; Burnham, K.P.; Chain, B.R.

    1980-01-01

    ANDERSON and POSPAHALA (1970) investigated the estimation of wildlife population size using the belt or line transect sampling method and devised a correction for bias, thus leading to an estimator with interesting characteristics. This work was given a uniform mathematical framework in BURNHAM and ANDERSON (1976). In this paper we show that the ANDERSON-POSPAHALA estimator is optimal in the sense of being the (unique) best linear unbiased estimator within the class of estimators which are linear combinations of cell frequencies, provided certain assumptions are met.

  19. An internal pilot study for a randomized trial aimed at evaluating the effectiveness of iron interventions in children with non-anemic iron deficiency: the OptEC trial.

    PubMed

    Abdullah, Kawsari; Thorpe, Kevin E; Mamak, Eva; Maguire, Jonathon L; Birken, Catherine S; Fehlings, Darcy; Hanley, Anthony J; Macarthur, Colin; Zlotkin, Stanley H; Parkin, Patricia C

    2015-07-14

    The OptEC trial aims to evaluate the effectiveness of oral iron in young children with non-anemic iron deficiency (NAID). The initial sample size calculated for the OptEC trial ranged from 112-198 subjects. Given the uncertainty regarding the parameters used to calculate the sample, an internal pilot study was conducted. The objectives of this internal pilot study were to obtain reliable estimates of the parameters (standard deviation and design factor) needed to recalculate the sample size and to assess the adherence rate and reasons for non-adherence in children enrolled in the pilot study. The first 30 subjects enrolled into the OptEC trial constituted the internal pilot study. The primary outcome of the OptEC trial is the Early Learning Composite (ELC). For estimation of the SD of the ELC, descriptive statistics of the 4-month follow-up ELC scores were assessed within each intervention group. The observed SD within each group was then pooled to obtain an estimated SD (S₂) of the ELC. Correlation (ρ) between the ELC measured at baseline and follow-up was assessed. Recalculation of the sample size was performed using the analysis of covariance (ANCOVA) method, which uses the design factor (1 − ρ²). Adherence rate was calculated using a parent-reported rate of missed doses of the study intervention. The new estimate of the SD of the ELC (S₂) was found to be 17.40. The design factor was (1 − ρ²) = 0.21. Using a significance level of 5%, power of 80%, S₂ = 17.40, and an effect estimate (Δ) ranging from 6-8 points, the new sample size based on the ANCOVA method ranged from 32-56 subjects (16-28 per group). Adherence ranged between 14% and 100%, with 44% of the children having an adherence rate ≥ 86%. Information generated from our internal pilot study was used to update the design of the full and definitive trial, including recalculation of sample size, determination of the adequacy of adherence, and application of strategies to improve adherence.
ClinicalTrials.gov Identifier: NCT01481766 (date of registration: November 22, 2011).
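
    The recalculation described in this abstract can be reproduced with the standard normal-approximation ANCOVA sample size formula; a sketch under those assumptions, using the pilot estimates quoted above:

```python
import math
from statistics import NormalDist

def ancova_n_per_group(sd, design_factor, delta, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-arm comparison of means,
    with the variance deflated by the ANCOVA design factor (1 - rho**2).
    Normal-approximation sketch of the recalculation described above."""
    z = NormalDist()
    z_alpha, z_beta = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2
                     * design_factor / delta ** 2)

# Pilot estimates: SD = 17.40, design factor = 0.21, effect of 6-8 points.
n_small_effect = ancova_n_per_group(17.40, 0.21, 6)  # 28 per group
n_large_effect = ancova_n_per_group(17.40, 0.21, 8)  # 16 per group
```

    These reproduce the abstract's range of 16-28 per group (32-56 total); without the design factor, the same formula gives the much larger original range.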

  20. Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

    PubMed

    Waller, Niels G; Feuerstahler, Leah

    2017-01-01

    In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler Item Response Theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated R code that shows how to estimate 4PM item and person parameters in R (Chalmers, 2012).
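
    For reference, the 4PM item response function mentioned above is a four-parameter logistic curve; a sketch with illustrative parameter values:

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Four-parameter logistic item response function: discrimination a,
    difficulty b, lower asymptote c (guessing), and upper asymptote d < 1
    (slipping)."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b the curve sits exactly halfway between its two asymptotes;
# the parameter values here are illustrative, not estimates from the study.
p_mid = irf_4pm(0.5, a=1.2, b=0.5, c=0.20, d=0.95)  # (0.20 + 0.95) / 2
```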

  1. Non-destructive crystal size determination in geological samples of archaeological use by means of infrared spectroscopy.

    PubMed

    Olivares, M; Larrañaga, A; Irazola, M; Sarmiento, A; Murelaga, X; Etxebarria, N

    2012-08-30

    The determination of crystal size in chert samples can provide useful information about the raw material used for the manufacture of archeological items. X-ray diffraction (XRD) has been widely used for this purpose in several scientific areas. However, the historical value of archeological pieces sometimes makes this procedure unfeasible, and thus new non-invasive analytical approaches are required. To this end, a new method was developed relating the crystal size obtained by means of XRD to infrared spectroscopy (IR) using partial least squares regression. The IR spectra collected from a large number of different geological chert samples of archeological use were pre-processed following different treatments (i.e., derivatization or sample-wise normalization) to obtain the best regression model. The fully cross-validated model was satisfactorily validated using real samples; the experimental root mean standard error of prediction was 165 Å, whereas the average precision of the estimated size value was 3%. The features of the infrared bands were also evaluated in order to understand the basis of the model's predictive ability. In the studied case, the variance in the model was associated with differences in the characteristic stretching and bending infrared bands of SiO₂. Based on this fact, it would be feasible to estimate the crystal size provided a chemometric model relating the size measured by standard methods to the IR spectra is built beforehand. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Estimation of the diagnostic threshold accounting for decision costs and sampling uncertainty.

    PubMed

    Skaltsa, Konstantina; Jover, Lluís; Carrasco, Josep Lluís

    2010-10-01

    Medical diagnostic tests are used to classify subjects as non-diseased or diseased. The classification rule usually consists of classifying subjects using the values of a continuous marker that is dichotomised by means of a threshold. Here, the optimum threshold estimate is found by minimising a cost function that accounts for both decision costs and sampling uncertainty. The cost function is optimised either analytically in a normal distribution setting or empirically in a free-distribution setting when the underlying probability distributions of diseased and non-diseased subjects are unknown. Inference of the threshold estimates is based on approximate analytically standard errors and bootstrap-based approaches. The performance of the proposed methodology is assessed by means of a simulation study, and the sample size required for a given confidence interval precision and sample size ratio is also calculated. Finally, a case example based on previously published data concerning the diagnosis of Alzheimer's patients is provided in order to illustrate the procedure.
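
    A minimal sketch of the cost-minimising threshold idea, assuming normal markers and a plain grid search (the paper additionally accounts for sampling uncertainty); all parameter values below are hypothetical:

```python
from statistics import NormalDist

def optimal_threshold(mu0, sd0, mu1, sd1, prev, cost_fp, cost_fn, grid=2000):
    """Grid search for the threshold c minimising expected decision cost
    when subjects with marker value >= c are classified as diseased.
    Non-diseased ~ N(mu0, sd0), diseased ~ N(mu1, sd1). Illustrative
    sketch only; it ignores the sampling uncertainty the paper models."""
    h0, h1 = NormalDist(mu0, sd0), NormalDist(mu1, sd1)
    lo = min(mu0, mu1) - 4 * max(sd0, sd1)
    hi = max(mu0, mu1) + 4 * max(sd0, sd1)
    best_c, best_cost = lo, float("inf")
    for i in range(grid + 1):
        c = lo + (hi - lo) * i / grid
        cost = (cost_fp * (1 - prev) * (1 - h0.cdf(c))   # false positives
                + cost_fn * prev * h1.cdf(c))            # false negatives
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c
```

    In the symmetric case (equal variances, prevalence 0.5, equal costs) the optimum falls midway between the two means, a useful sanity check on the search.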

  3. How bandwidth selection algorithms impact exploratory data analysis using kernel density estimation.

    PubMed

    Harpole, Jared K; Woods, Carol M; Rodebaugh, Thomas L; Levinson, Cheri A; Lenze, Eric J

    2014-09-01

    Exploratory data analysis (EDA) can reveal important features of underlying distributions, and these features often have an impact on inferences and conclusions drawn from data. Graphical analysis is central to EDA, and graphical representations of distributions often benefit from smoothing. A viable method of estimating and graphing the underlying density in EDA is kernel density estimation (KDE). This article provides an introduction to KDE and examines alternative methods for specifying the smoothing bandwidth in terms of their ability to recover the true density. We also illustrate the comparison and use of KDE methods with 2 empirical examples. Simulations were carried out in which we compared 8 bandwidth selection methods (Sheather-Jones plug-in [SJDP], normal rule of thumb, Silverman's rule of thumb, least squares cross-validation, biased cross-validation, and 3 adaptive kernel estimators) using 5 true density shapes (standard normal, positively skewed, bimodal, skewed bimodal, and standard lognormal) and 9 sample sizes (15, 25, 50, 75, 100, 250, 500, 1,000, 2,000). Results indicate that, overall, SJDP outperformed all methods. However, for smaller sample sizes (25 to 100) either biased cross-validation or Silverman's rule of thumb was recommended, and for larger sample sizes the adaptive kernel estimator with SJDP was recommended. Information is provided about implementing the recommendations in the R computing language. PsycINFO Database Record (c) 2014 APA, all rights reserved.
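
    Silverman's rule of thumb, one of the bandwidth selectors compared above, can be sketched directly along with a basic Gaussian KDE; the helper names are our own, not from the article:

```python
import math
import statistics

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def gaussian_kde(x, data, h):
    """Evaluate the Gaussian-kernel density estimate at point x."""
    n = len(data)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in data) / (n * h * math.sqrt(2 * math.pi))

# Illustrative data: the rule shrinks the bandwidth as n grows (n^{-1/5}).
sample = [float(i) for i in range(100)]
h = silverman_bandwidth(sample)
density_at_center = gaussian_kde(50.0, sample, h)
```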

  4. A preliminary report on the magnetic measurements of samples 72275 and 72255. [direction and magnitude of remanent magnetization

    NASA Technical Reports Server (NTRS)

    Banerjee, S. K.

    1974-01-01

    The direction and magnitude of natural remanent magnetization of five approximately 3-g subsamples of 72275 and 72255, and the high-field saturation magnetization, coercive force, and isothermal remanent magnetization of a 100-mg chip from each of these samples, were studied. Given an understanding of the magnetization processes, group 1 experiments provide information about the absolute direction of the ancient magnetizing field and a qualitative estimate of its size (paleointensity). The group 2 experiments yield a quantitative estimate of the iron content and a qualitative idea of the grain sizes.

  5. Visual search for tropical web spiders: the influence of plot length, sampling effort, and phase of the day on species richness.

    PubMed

    Pinto-Leite, C M; Rocha, P L B

    2012-12-01

    Empirical studies using visual search methods to investigate spider communities were conducted with different sampling protocols, including a variety of plot sizes, sampling efforts, and diurnal periods for sampling. We sampled 11 plots ranging in size from 5 by 10 m to 5 by 60 m. In each plot, we computed the total number of species detected every 10 min during 1 hr during the daytime and during the nighttime (0630 hours to 1100 hours, both a.m. and p.m.). We measured the influence of time effort on the measurement of species richness by comparing the curves produced by sample-based rarefaction and species richness estimation (first-order jackknife). We used a general linear model with repeated measures to assess whether the phase of the day during which sampling occurred and the differences in the plot lengths influenced the number of species observed and the number of species estimated. To measure the differences in species composition between the phases of the day, we used a multiresponse permutation procedure and a graphical representation based on nonmetric multidimensional scaling. After 50 min of sampling, we noted a decreased rate of species accumulation and a tendency of the estimated richness curves to reach an asymptote. We did not detect an effect of plot size on the number of species sampled. However, differences in observed species richness and species composition were found between phases of the day. Based on these results, we propose guidelines for visual search for tropical web spiders.

  6. STREAMBED PARTICLE SIZE FROM PEBBLE COUNTS USING VISUALLY ESTIMATED SIZE CLASSES: JUNK OR USEFUL DATA?

    EPA Science Inventory

    In large-scale studies, it is often neither feasible nor necessary to obtain the large samples of 400 particles advocated by many geomorphologists to adequately quantify streambed surface particle-size distributions. Synoptic surveys such as U.S. Environmental Protection Agency...

  7. A novel sample size formula for the weighted log-rank test under the proportional hazards cure model.

    PubMed

    Xiong, Xiaoping; Wu, Jianrong

    2017-01-01

    The treatment of cancer has progressed dramatically in recent decades, such that it is no longer uncommon to see a cure or long-term survival in a significant proportion of patients with various types of cancer. To adequately account for the cure fraction when designing clinical trials, cure models should be used. In this article, a sample size formula for the weighted log-rank test is derived under the fixed alternative hypothesis for the proportional hazards cure model. Simulation showed that the proposed sample size formula provides an accurate estimation of sample size for designing clinical trials under the proportional hazards cure model. Copyright © 2016 John Wiley & Sons, Ltd.
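
    The cure-model formula itself is not given in the abstract. For context, the standard Schoenfeld calculation for an ordinary log-rank test (no cure fraction) can be sketched as follows; a cure fraction lowers the probability of ever observing an event, which is one reason the cure-model formula inflates the required N:

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Events required for a two-sided log-rank test
    (standard Schoenfeld formula, no cure fraction)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 / (
        alloc * (1 - alloc) * math.log(hr) ** 2)

def total_sample_size(hr, event_prob, **kw):
    """Total N given the overall probability of observing an event;
    a cure fraction reduces event_prob and so inflates N."""
    return math.ceil(schoenfeld_events(hr, **kw) / event_prob)

# e.g. HR = 0.5 at 80% power needs ~66 events; if only 60% of
# patients can ever have an event, N rises to ~109
```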

  8. A model-based approach to sample size estimation in recent onset type 1 diabetes.

    PubMed

    Bundy, Brian N; Krischer, Jeffrey P

    2016-11-01

    Area-under-the-curve C-peptide measurements from a 2-h mixed meal tolerance test, from 498 individuals enrolled in five prior TrialNet studies of recent-onset type 1 diabetes, from baseline to 12 months after enrolment, were modelled to produce estimates of the rate of C-peptide loss and its variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies results in a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide that can be used in observed-versus-expected calculations to estimate the presumption of benefit in ongoing trials. Copyright © 2016 John Wiley & Sons, Ltd.

  9. State-space modeling of population sizes and trends in Nihoa Finch and Millerbird

    USGS Publications Warehouse

    Gorresen, P. Marcos; Brinck, Kevin W.; Camp, Richard J.; Farmer, Chris; Plentovich, Sheldon M.; Banko, Paul C.

    2016-01-01

    Both of the 2 passerines endemic to Nihoa Island, Hawai‘i, USA—the Nihoa Millerbird (Acrocephalus familiaris kingi) and Nihoa Finch (Telespiza ultima)—are listed as endangered by federal and state agencies. Their abundances have been estimated by irregularly implemented fixed-width strip-transect sampling from 1967 to 2012, from which area-based extrapolation of the raw counts produced highly variable abundance estimates for both species. To evaluate an alternative survey method and improve abundance estimates, we conducted variable-distance point-transect sampling between 2010 and 2014. We compared our results to those obtained from strip-transect samples. In addition, we applied state-space models to derive improved estimates of population size and trends from the legacy time series of strip-transect counts. Both species were fairly evenly distributed across Nihoa and occurred in all or nearly all available habitat. Population trends for Nihoa Millerbird were inconclusive because of high within-year variance. Trends for Nihoa Finch were positive, particularly since the early 1990s. Distance-based analysis of point-transect counts produced mean estimates of abundance similar to those from strip-transects but was generally more precise. However, both survey methods produced biologically unrealistic variability between years. State-space modeling of the long-term time series of abundances obtained from strip-transect counts effectively reduced uncertainty in both within- and between-year estimates of population size, and allowed short-term changes in abundance trajectories to be smoothed into a long-term trend.

  10. The quantification of spermatozoa by real-time quantitative PCR, spectrophotometry, and spermatophore cap size.

    PubMed

    Doyle, Jacqueline M; McCormick, Cory R; DeWoody, J Andrew

    2011-01-01

    Many animals, such as crustaceans, insects, and salamanders, package their sperm into spermatophores, and the number of spermatozoa contained in a spermatophore is relevant to studies of sexual selection and sperm competition. We used two molecular methods, real-time quantitative polymerase chain reaction (RT-qPCR) and spectrophotometry, to estimate sperm numbers from spermatophores. First, we designed gene-specific primers that produced a single amplicon in four species of ambystomatid salamanders. A standard curve generated from cloned amplicons revealed a strong positive relationship between template DNA quantity and cycle threshold, suggesting that RT-qPCR could be used to quantify sperm in a given sample. We then extracted DNA from multiple Ambystoma maculatum spermatophores, performed RT-qPCR on each sample, and estimated template copy numbers (i.e. sperm number) using the standard curve. Second, we used spectrophotometry to determine the number of sperm per spermatophore by measuring DNA concentration relative to the genome size. We documented a significant positive relationship between the estimates of sperm number based on RT-qPCR and those based on spectrophotometry. When these molecular estimates were compared to spermatophore cap size, which in principle could predict the number of sperm contained in the spermatophore, we also found a significant positive relationship between sperm number and spermatophore cap size. This linear model allows estimates of sperm number strictly from cap size, an approach which could greatly simplify the estimation of sperm number in future studies. These methods may help explain variation in fertilization success where sperm competition is mediated by sperm quantity. © 2010 Blackwell Publishing Ltd.
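
    The standard-curve step described above is a linear fit of cycle threshold (Ct) against log10 copy number, inverted to estimate sperm number from an unknown sample's Ct. A sketch with an illustrative dilution series (a slope of -3.32 corresponds to roughly 100% amplification efficiency; the numbers are not the study's data):

```python
import math

def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = m * log10(copies) + b from a dilution series."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mx, my = sum(x) / n, sum(ct) / n
    m = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, ct))
         / sum((xi - mx) ** 2 for xi in x))
    return m, my - m * mx

def estimate_copies(ct_value, m, b):
    """Invert the standard curve to estimate template (sperm) copy number."""
    return 10 ** ((ct_value - b) / m)

# illustrative dilution series
series_copies = [1e3, 1e4, 1e5, 1e6]
series_ct = [30.00, 26.68, 23.36, 20.04]
```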

  11. Estimation of the vortex length scale and intensity from two-dimensional samples

    NASA Technical Reports Server (NTRS)

    Reuss, D. L.; Cheng, W. P.

    1992-01-01

    A method is proposed for estimating flow features that influence flame wrinkling in reciprocating internal combustion engines, where traditional statistical measures of turbulence are suspect. Candidate methods were tested in a computed channel flow where traditional turbulence measures are valid and performance can be rationally evaluated. Two concepts are tested. First, spatial filtering is applied to the two-dimensional velocity distribution and found to reveal structures corresponding to the vorticity field. Decreasing the spatial-frequency cutoff of the filter locally changes the character and size of the flow structures that are revealed by the filter. Second, vortex length scale and intensity are estimated by computing the ensemble-average velocity distribution conditionally sampled on the vorticity peaks. The resulting conditionally sampled 'average vortex' has a peak velocity less than half the rms velocity and a size approximately equal to the two-point-correlation integral length scale.

  12. Sampling methods, dispersion patterns, and fixed precision sequential sampling plans for western flower thrips (Thysanoptera: Thripidae) and cotton fleahoppers (Hemiptera: Miridae) in cotton.

    PubMed

    Parajulee, M N; Shrestha, R B; Leser, J F

    2006-04-01

    A 2-yr field study was conducted to examine the effectiveness of two sampling methods (visual and plant washing techniques) for western flower thrips, Frankliniella occidentalis (Pergande), and five sampling methods (visual, beat bucket, drop cloth, sweep net, and vacuum) for cotton fleahopper, Pseudatomoscelis seriatus (Reuter), in Texas cotton, Gossypium hirsutum (L.), and to develop sequential sampling plans for each pest. The plant washing technique gave results similar to the visual method in detecting adult thrips, but it detected a significantly higher number of thrips larvae than visual sampling did. Visual sampling detected the highest number of fleahoppers, followed by beat bucket, drop cloth, vacuum, and sweep net sampling, with no significant difference in catch efficiency between the vacuum and sweep net methods. However, based on fixed precision cost reliability, sweep net sampling was the most cost-effective method, followed by vacuum, beat bucket, drop cloth, and visual sampling. Taylor's power law analysis revealed that the field dispersion patterns of both thrips and fleahoppers were aggregated throughout the crop growing season. For thrips management decisions based on visual sampling (0.25 precision), 15 plants were estimated to be the minimum sample size when the estimated population density was one thrips per plant, whereas the minimum sample size was nine plants when thrips density approached 10 thrips per plant. The minimum visual sample size for cotton fleahoppers was 16 plants when the density was one fleahopper per plant, but the sample size decreased rapidly with increasing fleahopper density, requiring only four plants to be sampled when the density was 10 fleahoppers per plant. Sequential sampling plans were developed and validated with independent data for both thrips and cotton fleahoppers.
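
    The minimum sample sizes quoted above follow from combining Taylor's power law (s^2 = a * m^b) with a fixed-precision criterion D = SE/mean. A sketch; the coefficients a and b below are illustrative values chosen only to roughly reproduce the reported visual-sampling numbers for thrips, not the study's fitted parameters:

```python
import math

def min_sample_size(mean_density, a, b, precision=0.25):
    """Minimum n for fixed precision D = SE/mean, with variance from
    Taylor's power law s^2 = a * m^b:  n = a * m^(b - 2) / D^2."""
    return math.ceil(a * mean_density ** (b - 2) / precision ** 2)

# illustrative aggregated-dispersion coefficients (b > 1 means clumped)
a, b = 0.9, 1.78
# one thrips per plant -> ~15 plants; ten per plant -> ~9 plants
```

    Because b < 2, the required sample size falls as density rises, which matches the pattern reported for both pests.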

  13. Sample size calculation for studies with grouped survival data.

    PubMed

    Li, Zhiguo; Wang, Xiaofei; Wu, Yuan; Owzar, Kouros

    2018-06-10

    Grouped survival data arise often in studies where the disease status is assessed at regular visits to clinic. The time to the event of interest can only be determined to be between two adjacent visits or is right censored at one visit. In data analysis, replacing the survival time with the endpoint or midpoint of the grouping interval leads to biased estimators of the effect size in group comparisons. Prentice and Gloeckler developed a maximum likelihood estimator for the proportional hazards model with grouped survival data and the method has been widely applied. Previous work on sample size calculation for designing studies with grouped data is based on either the exponential distribution assumption or the approximation of variance under the alternative with variance under the null. Motivated by studies in HIV trials, cancer trials and in vitro experiments to study drug toxicity, we develop a sample size formula for studies with grouped survival endpoints that use the method of Prentice and Gloeckler for comparing two arms under the proportional hazards assumption. We do not impose any distributional assumptions, nor do we use any approximation of variance of the test statistic. The sample size formula only requires estimates of the hazard ratio and survival probabilities of the event time of interest and the censoring time at the endpoints of the grouping intervals for one of the two arms. The formula is shown to perform well in a simulation study and its application is illustrated in the three motivating examples. Copyright © 2018 John Wiley & Sons, Ltd.

  14. Using flow cytometry to estimate pollen DNA content: improved methodology and applications

    PubMed Central

    Kron, Paul; Husband, Brian C.

    2012-01-01

    Background and Aims Flow cytometry has been used to measure nuclear DNA content in pollen, mostly to understand pollen development and detect unreduced gametes. Published data have not always met the high-quality standards required for some applications, in part due to difficulties inherent in the extraction of nuclei. Here we describe a simple and relatively novel method for extracting pollen nuclei, involving the bursting of pollen through a nylon mesh, compare it with other methods and demonstrate its broad applicability and utility. Methods The method was tested across 80 species, 64 genera and 33 families, and the data were evaluated using established criteria for estimating genome size and analysing cell cycle. Filter bursting was directly compared with chopping in five species, yields were compared with published values for sonicated samples, and the method was applied by comparing genome size estimates for leaf and pollen nuclei in six species. Key Results Data quality met generally applied standards for estimating genome size in 81 % of species and the higher best practice standards for cell cycle analysis in 51 %. In 41 % of species we met the most stringent criterion of screening 10 000 pollen grains per sample. In direct comparison with two chopping techniques, our method produced better quality histograms with consistently higher nuclei yields, and yields were higher than previously published results for sonication. In three binucleate and three trinucleate species we found that pollen-based genome size estimates differed from leaf tissue estimates by 1·5 % or less when 1C pollen nuclei were used, while estimates from 2C generative nuclei differed from leaf estimates by up to 2·5 %. Conclusions The high success rate, ease of use and wide applicability of the filter bursting method show that this method can facilitate the use of pollen for estimating genome size and dramatically improve unreduced pollen production estimation with flow cytometry. 
PMID:22875815

  15. Small area estimation (SAE) model: Case study of poverty in West Java Province

    NASA Astrophysics Data System (ADS)

    Suhartini, Titin; Sadik, Kusman; Indahwati

    2016-02-01

    This paper compares direct estimation with an indirect small area estimation (SAE) model. Model selection addressed multicollinearity among the auxiliary variables, either by retaining only non-collinear variables or by applying principal components (PC). The parameters of interest were the area-level proportions of poor households with agricultural enterprises and of agricultural poor households in West Java Province. These parameters can be estimated either directly or via SAE. Direct estimation was problematic: because of small sample sizes, three areas had estimates of zero or could not be estimated at all. The estimated proportion of poor households with agricultural enterprises was 19.22%, and that of agricultural poor households 46.79%. For the former, the best model retained only non-collinear auxiliary variables; for the latter, the best model used principal components. For both parameters, the SAE estimator outperformed direct estimation. SAE thus overcame the small-sample-size problem and yielded area-level estimates with higher accuracy and better precision than the direct estimator.

  16. Evaluating multi-level models to test occupancy state responses of Plethodontid salamanders

    USGS Publications Warehouse

    Kroll, Andrew J.; Garcia, Tiffany S.; Jones, Jay E.; Dugger, Catherine; Murden, Blake; Johnson, Josh; Peerman, Summer; Brintz, Ben; Rochelle, Michael

    2015-01-01

    Plethodontid salamanders are diverse and widely distributed taxa and play critical roles in ecosystem processes. Due to salamander use of structurally complex habitats, and because only a portion of a population is available for sampling, evaluation of sampling designs and estimators is critical to provide strong inference about Plethodontid ecology and responses to conservation and management activities. We conducted a simulation study to evaluate the effectiveness of multi-scale and hierarchical single-scale occupancy models in the context of a Before-After Control-Impact (BACI) experimental design with multiple levels of sampling. Also, we fit the hierarchical single-scale model to empirical data collected for Oregon slender and Ensatina salamanders across two years on 66 forest stands in the Cascade Range, Oregon, USA. All models were fit within a Bayesian framework. Estimator precision in both models improved with increasing numbers of primary and secondary sampling units, underscoring the potential gains accrued when adding secondary sampling units. Both models showed evidence of estimator bias at low detection probabilities and low sample sizes; this problem was particularly acute for the multi-scale model. Our results suggested that sufficient sample sizes at both the primary and secondary sampling levels could ameliorate this issue. Empirical data indicated Oregon slender salamander occupancy was associated strongly with the amount of coarse woody debris (posterior mean = 0.74; SD = 0.24); Ensatina occupancy was not associated with amount of coarse woody debris (posterior mean = -0.01; SD = 0.29). Our simulation results indicate that either model is suitable for use in an experimental study of Plethodontid salamanders provided that sample sizes are sufficiently large. However, hierarchical single-scale and multi-scale models describe different processes and estimate different parameters. 
As a result, we recommend careful consideration of study questions and objectives prior to sampling data and fitting models.

  17. Selection of the effect size for sample size determination for a continuous response in a superiority clinical trial using a hybrid classical and Bayesian procedure.

    PubMed

    Ciarleglio, Maria M; Arendt, Christopher D; Peduzzi, Peter N

    2016-06-01

    When designing studies that have a continuous outcome as the primary endpoint, the hypothesized effect size (ES), that is, the hypothesized difference in means (δ) relative to the assumed variability of the endpoint (σ), plays an important role in sample size and power calculations. Point estimates for δ and σ are often calculated using historical data. However, the uncertainty in these estimates is rarely addressed. This article presents a hybrid classical and Bayesian procedure that formally integrates prior information on the distributions of δ and σ into the study's power calculation. Conditional expected power, which averages the traditional power curve using the prior distributions of δ and σ as the averaging weight, is used, and the value of ES is found that equates the prespecified frequentist power (1 − β) and the conditional expected power of the trial. This hypothesized effect size is then used in traditional sample size calculations when determining sample size for the study. The value of ES found using this method may be expressed as a function of the prior means of δ and σ and their prior standard deviations. We show that the "naïve" estimate of the effect size, that is, the ratio of prior means, should be down-weighted to account for the variability in the parameters. An example is presented for designing a placebo-controlled clinical trial testing the antidepressant effect of alprazolam as monotherapy for major depression. Through this method, we are able to formally integrate prior information on the uncertainty and variability of both the treatment effect and the common standard deviation into the design of the study while maintaining a frequentist framework for the final analysis. Solving for the effect size that the study has a high probability of correctly detecting, based on the available prior information on the difference δ and the standard deviation σ, provides a valuable, substantiated estimate that can form the basis for discussion of the study's feasibility during the design phase. © The Author(s) 2016.
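
    Conditional expected power as described can be approximated by Monte Carlo: draw (delta, sigma) from their priors, evaluate the classical power at each draw, and average. A sketch using a normal-approximation power formula and normal priors; the paper's exact formulation may differ, and this one-tailed approximation slightly understates two-sided power when delta can change sign:

```python
import math
import random
from statistics import NormalDist

def power(delta, sigma, n_per_arm, alpha=0.05):
    """Normal-approximation power for a two-arm comparison of means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = delta / (sigma * math.sqrt(2.0 / n_per_arm))
    return NormalDist().cdf(ncp - z_crit)

def conditional_expected_power(n_per_arm, mu_delta, sd_delta,
                               mu_sigma, sd_sigma, draws=4000, seed=1):
    """Average the power curve over normal priors on delta and sigma
    (sigma draws floored at a small positive value)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        d = rng.gauss(mu_delta, sd_delta)
        s = max(rng.gauss(mu_sigma, sd_sigma), 1e-6)
        total += power(d, s, n_per_arm)
    return total / draws
```

    Searching over the effect size until the conditional expected power matches the prespecified frequentist power recovers the down-weighted ES the authors describe.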

  18. Estimating Mutual Information for High-to-Low Calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Michaud, Isaac James; Williams, Brian J.; Weaver, Brian Phillip

    Presentation shows that KSG 2 is superior to KSG 1 because it scales locally automatically; KSG estimators are limited to a maximum MI due to sample size; LNC extends the capability of KSG without onerous assumptions; iLNC allows LNC to estimate information gain.

  19. Transport Loss Estimation of Fine Particulate Matter in Sampling Tube Based on Numerical Computation

    NASA Astrophysics Data System (ADS)

    Luo, L.; Cheng, Z.

    2016-12-01

    In-situ measurement of the physical and chemical properties of PM2.5 is a key approach for investigating the mechanisms of PM2.5 pollution. Minimizing PM2.5 transport loss in the sampling tube is essential for ensuring the accuracy of the measurement results. To estimate the integrated PM2.5 transport efficiency in sampling tubes and to optimize tube designs, the effects of different tube factors (length, bore size, and number of bends) on PM2.5 transport were analyzed by numerical computation. The results show that the PM2.5 mass concentration transport efficiency of a vertical tube with a flowrate of 20.0 L·min-1, a bore size of 4 mm, and a length of 1.0 m was 89.6%; the efficiency increases to 98.3% when the bore size is increased to 14 mm. The transport efficiency of a horizontal tube with a flowrate of 1.0 L·min-1, a bore size of 4 mm, and a length of 10.0 m is 86.7%, increasing to 99.2% when the length is reduced to 0.5 m. A low transport efficiency of 85.2% for PM2.5 mass concentration is estimated in a bend with a flowrate of 20.0 L·min-1, a bore size of 4 mm, and a curvature angle of 90°. Keeping the air flow in the tube laminar, by holding the ratio of flowrate (L·min-1) to bore size (mm) below 1.4, helps decrease PM2.5 transport loss. To achieve a PM2.5 transport efficiency above 97%, vertical sampling tubes shorter than 6.0 m are advised for flowrates of 2.5, 5.0, and 10.0 L·min-1, with bore sizes larger than 12 mm for flowrates of 16.7 or 20.0 L·min-1. For horizontal sampling tubes, the tube length is determined by the ratio of flowrate to bore size. It is also suggested to minimize the number of bends in tubes with turbulent flow.
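
    The flowrate-to-bore-size criterion can be read as a Reynolds-number condition: for air (kinematic viscosity nu ~ 1.5e-5 m^2/s), Re = 4Q/(pi * d * nu) works out to roughly 1400 * Q/d with Q in L·min-1 and d in mm, so Q/d < 1.4 corresponds to Re below about 2000, the usual laminar threshold. A sketch of that check:

```python
import math

def reynolds(flow_lpm, bore_mm, nu=1.5e-5):
    """Pipe Reynolds number Re = 4Q / (pi * d * nu) for a sample flow
    Q (L/min) in a tube of bore d (mm); nu = kinematic viscosity of air."""
    q = flow_lpm / 60000.0   # L/min -> m^3/s
    d = bore_mm / 1000.0     # mm -> m
    return 4.0 * q / (math.pi * d * nu)

def is_laminar(flow_lpm, bore_mm):
    """Laminar-flow check against the conventional Re < 2000 threshold."""
    return reynolds(flow_lpm, bore_mm) < 2000.0

# 20.0 L/min in a 15 mm bore stays laminar; the same flow in a 4 mm
# bore is strongly turbulent, consistent with the advice above
```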

  20. Use of Bayes theorem to correct size-specific sampling bias in growth data.

    PubMed

    Troynikov, V S

    1999-03-01

    The Bayesian decomposition of the posterior distribution was used to develop a likelihood function to correct bias in estimates of population parameters from data collected randomly with size-specific selectivity. Positive distributions with time as a parameter were used to parametrize the growth data. Numerical illustrations are provided. Alternative applications of the likelihood to estimate selectivity parameters are discussed.

  1. Addressing small sample size bias in multiple-biomarker trials: Inclusion of biomarker-negative patients and Firth correction.

    PubMed

    Habermehl, Christina; Benner, Axel; Kopp-Schneider, Annette

    2018-03-01

    In recent years, numerous approaches for biomarker-based clinical trials have been developed. One of these developments is the multiple-biomarker trial, which aims to investigate multiple biomarkers simultaneously in independent subtrials. For low-prevalence biomarkers, small sample sizes within the subtrials have to be expected, as well as many biomarker-negative patients at the screening stage. The small sample sizes may make it unfeasible to analyze the subtrials individually. This imposes the need to develop new approaches for the analysis of such trials. With an expected large group of biomarker-negative patients, it seems reasonable to explore options to benefit from including them in such trials. We consider advantages and disadvantages of the inclusion of biomarker-negative patients in a multiple-biomarker trial with a survival endpoint. We discuss design options that include biomarker-negative patients in the study and address the issue of small sample size bias in such trials. We carry out a simulation study for a design where biomarker-negative patients are kept in the study and are treated with standard of care. We compare three different analysis approaches based on the Cox model to examine if the inclusion of biomarker-negative patients can provide a benefit with respect to bias and variance of the treatment effect estimates. We apply the Firth correction to reduce the small sample size bias. The results of the simulation study suggest that for small sample situations, the Firth correction should be applied to adjust for the small sample size bias. Additional to the Firth penalty, the inclusion of biomarker-negative patients in the analysis can lead to further but small improvements in bias and standard deviation of the estimates. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Radar volume reflectivity estimation using an array of ground-based rainfall drop size detectors

    NASA Astrophysics Data System (ADS)

    Lane, John; Merceret, Francis; Kasparis, Takis; Roy, D.; Muller, Brad; Jones, W. Linwood

    2000-08-01

    Rainfall drop size distribution (DSD) measurements made by single disdrometers at isolated ground sites have traditionally been used to estimate the transformation between weather radar reflectivity Z and rainfall rate R. Despite the immense disparity in sampling geometries, the resulting Z-R relation obtained from these single-point measurements has historically been important in the study of applied radar meteorology. Simultaneous DSD measurements made at several ground sites within a microscale area may be used to improve the estimate of radar reflectivity in the air volume surrounding the disdrometer array. By applying the equations of motion for non-interacting hydrometeors, a volume estimate of Z is obtained from the array of ground-based disdrometers by first calculating a 3D drop size distribution. The 3D-DSD model assumes that only gravity and terminal velocity due to atmospheric drag within the sampling volume influence hydrometeor dynamics. The sampling volume is characterized by wind velocities, composed of vertical and horizontal components, which are input parameters to the 3D-DSD model. Reflectivity data from four consecutive WSR-88D volume scans, acquired during a thunderstorm near Melbourne, FL on June 1, 1997, are compared to data processed using the 3D-DSD model and data from three ground-based disdrometers of a microscale array.

  3. The Applicability of Confidence Intervals of Quantiles for the Generalized Logistic Distribution

    NASA Astrophysics Data System (ADS)

    Shin, H.; Heo, J.; Kim, T.; Jung, Y.

    2007-12-01

    The generalized logistic (GL) distribution has been widely used for frequency analysis. However, there are few studies of the confidence intervals, which indicate the prediction accuracy of the distribution, for the GL distribution. In this paper, the estimation of confidence intervals of quantiles for the GL distribution is presented based on the method of moments (MOM), maximum likelihood (ML), and probability weighted moments (PWM), and the asymptotic variances of each quantile estimator are derived as functions of the sample sizes, return periods, and parameters. Monte Carlo simulation experiments are also performed to verify the applicability of the derived confidence intervals of quantiles. The results show that the relative bias (RBIAS) and relative root mean square error (RRMSE) of the confidence intervals generally increase as the return period increases and decrease as the sample size increases. PWM performs better than the other methods in terms of RRMSE when the data are almost symmetric, while ML shows the smallest RBIAS and RRMSE when the data are more skewed and the sample size is moderately large. The GL model was applied to fit the distribution of annual maximum rainfall data. The results show little difference in the estimated quantiles between ML and PWM, but distinct differences for MOM.
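
    The GLO quantile function underlying this analysis, and the RBIAS/RRMSE criteria, can be sketched as follows. The Monte Carlo loop below uses the naive empirical quantile rather than MOM/ML/PWM fits, purely to illustrate the evaluation procedure:

```python
import math
import random

def glo_quantile(f, xi, alpha, kappa):
    """Generalized logistic quantile function (kappa != 0):
    x(F) = xi + (alpha / kappa) * (1 - ((1 - F) / F) ** kappa)."""
    return xi + (alpha / kappa) * (1.0 - ((1.0 - f) / f) ** kappa)

def rbias_rrmse(n, f, xi, alpha, kappa, reps=500, seed=7):
    """Monte Carlo relative bias and relative RMSE of the empirical
    quantile as an estimator of the true GLO quantile x(F)."""
    rng = random.Random(seed)
    true = glo_quantile(f, xi, alpha, kappa)
    errs = []
    for _ in range(reps):
        # inverse-CDF sampling, with u kept strictly inside (0, 1)
        sample = sorted(
            glo_quantile(min(max(rng.random(), 1e-12), 1 - 1e-12),
                         xi, alpha, kappa)
            for _ in range(n))
        est = sample[min(int(f * n), n - 1)]
        errs.append((est - true) / true)
    rbias = sum(errs) / reps
    rrmse = math.sqrt(sum(e * e for e in errs) / reps)
    return rbias, rrmse
```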

  4. A Note on Structural Equation Modeling Estimates of Reliability

    ERIC Educational Resources Information Center

    Yang, Yanyun; Green, Samuel B.

    2010-01-01

    Reliability can be estimated using structural equation modeling (SEM). Two potential problems with this approach are that estimates may be unstable with small sample sizes and biased with misspecified models. A Monte Carlo study was conducted to investigate the quality of SEM estimates of reliability by themselves and relative to coefficient…

  5. Guidelines for sampling aboveground biomass and carbon in mature central hardwood forests

    Treesearch

    Martin A. Spetich; Stephen R. Shifley

    2017-01-01

    As impacts of climate change expand, determining accurate measures of forest biomass and associated carbon storage in forests is critical. We present sampling guidance for 12 combinations of percent error, plot size, and alpha levels by disturbance regime to help determine the optimal size of plots to estimate aboveground biomass and carbon in an old-growth Central...

  6. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  7. Effect of sample moisture content on XRD-estimated cellulose crystallinity index and crystallite size

    Treesearch

    Umesh P. Agarwal; Sally A. Ralph; Carlos Baez; Richard S. Reiner; Steve P. Verrill

    2017-01-01

    Although X-ray diffraction (XRD) has been the most widely used technique to investigate crystallinity index (CrI) and crystallite size (L200) of cellulose materials, there are not many studies that have taken into account the role of sample moisture on these measurements. The present investigation focuses on a variety of celluloses and cellulose...

  8. Equations for hydraulic conductivity estimation from particle size distribution: A dimensional analysis

    NASA Astrophysics Data System (ADS)

    Wang, Ji-Peng; François, Bertrand; Lambert, Pierre

    2017-09-01

    Estimating hydraulic conductivity from the particle size distribution (PSD) is an important issue for various engineering problems. Classical models such as the Hazen model, the Beyer model, and the Kozeny-Carman model usually regard the grain diameter at 10% passing (d10) as an effective grain size, and the effects of particle size uniformity (in the Beyer model) or porosity (in the Kozeny-Carman model) are sometimes embedded. This technical note applies dimensional analysis (Buckingham's ∏ theorem) to analyze the relationship between hydraulic conductivity and the particle size distribution. The porosity is regarded as dependent on the grain size distribution in unconsolidated conditions. The analysis indicates that the coefficient of grain size uniformity and a dimensionless group representing the gravity effect, which is proportional to the mean grain volume, are the two main determinative parameters for estimating hydraulic conductivity. Regression analysis is then carried out on a database comprising 431 samples collected from different depositional environments, and new equations are developed for hydraulic conductivity estimation. The new equation, validated on specimens beyond the database, shows improved prediction compared with the classical models.
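
    The classical baselines mentioned, Hazen and Kozeny-Carman, are easy to state. A sketch; coefficient values and unit conventions vary across references, so the constants below are common textbook choices, not the note's fitted equation:

```python
def hazen_k(d10_mm, c=1.0):
    """Hazen estimate: K [cm/s] ~ C * d10^2 with d10 in mm, C ~ 0.4-1.2."""
    return c * d10_mm ** 2

def kozeny_carman_k(d10_mm, porosity, g=9.81, nu=1.0e-6):
    """Kozeny-Carman estimate: K [m/s] = (g/nu) * n^3/(1-n)^2 * d10^2/180,
    with d10 converted from mm to m; nu = kinematic viscosity of water."""
    d = d10_mm / 1000.0
    return (g / nu) * porosity ** 3 / (1 - porosity) ** 2 * d * d / 180.0

# fine sand, d10 = 0.2 mm: Hazen gives 0.04 cm/s; Kozeny-Carman with
# n = 0.4 gives a value of the same order
```

    Note that Hazen ignores porosity entirely, which is one of the gaps the dimensional-analysis approach addresses.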

  9. Estimating Animal Abundance in Ground Beef Batches Assayed with Molecular Markers

    PubMed Central

    Hu, Xin-Sheng; Simila, Janika; Platz, Sindey Schueler; Moore, Stephen S.; Plastow, Graham; Meghen, Ciaran N.

    2012-01-01

    Estimating animal abundance in industrial-scale batches of ground meat is important for mapping meat products through the manufacturing process and for effectively tracing the finished product during a food safety recall. The processing of ground beef involves a potentially large number of animals from diverse sources in a single product batch, which produces high heterogeneity in capture probability. In order to estimate animal abundance through DNA profiling of ground beef constituents, two parameter-based statistical models were developed for incidence data. Simulations were used to evaluate the maximum likelihood estimate (MLE) of a joint likelihood function from multiple surveys: compared with existing models, it was superior in the presence of high capture heterogeneity with small sample sizes and comparable in the presence of low capture heterogeneity with a large sample size. Our model employs the full information on the pattern of capture-recapture frequencies from multiple samples. We applied the proposed models to estimate animal abundance in six manufacturing beef batches, genotyped using 30 single nucleotide polymorphism (SNP) markers, from a large-scale beef grinding facility. The results show that between 411 and 1,367 animals were present in the six batches. These estimates are informative as a reference for improving recall processes and tracing finished meat products back to source. PMID:22479559
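
    The authors' joint-likelihood MLE is not specified in closed form in the abstract, but the capture-recapture idea it generalizes can be illustrated with the classic two-survey Chapman estimator (a baseline sketch, not the paper's model):

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimator of abundance.

    n1: distinct animals detected in survey 1, n2: in survey 2,
    m: animals detected in both surveys ("recaptures").
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("recaptures must satisfy 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

    With 100 distinct DNA profiles per survey and 20 shared profiles, the estimate is roughly 485 animals. Heterogeneous capture probabilities bias such simple estimators downward, which is the problem the paper's multi-survey likelihood is built to address.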

  10. Sample size requirements for the design of reliability studies: precision consideration.

    PubMed

    Shieh, Gwowen

    2014-09-01

    In multilevel modeling, the intraclass correlation coefficient based on the one-way random-effects model is routinely employed to measure the reliability or degree of resemblance among group members. To facilitate the advocated practice of reporting confidence intervals in future reliability studies, this article presents exact sample size procedures for precise interval estimation of the intraclass correlation coefficient under various allocation and cost structures. Although the suggested approaches do not admit explicit sample size formulas and require special algorithms for iterative computation, they are more accurate than the closed-form formulas constructed from large-sample approximations with respect to the expected-width and assurance-probability criteria. This investigation notes the deficiency of existing methods and expands sample size methodology for the design of reliability studies in directions not previously discussed in the literature.
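
    For orientation, the quantity being sized for, the one-way random-effects ICC, is computed from the between- and within-group mean squares. A minimal sketch for equal group sizes (the paper's exact interval procedures are iterative and not reproduced here):

```python
def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA; `groups` is a list of
    equal-sized lists of measurements, one inner list per group."""
    n = len(groups)                 # number of groups
    k = len(groups[0])              # members/raters per group
    grand = sum(x for g in groups for x in g) / (n * k)
    means = [sum(g) / k for g in groups]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

    Perfect within-group agreement gives ICC = 1; groups that differ only within themselves give ICC ≤ 0. The sampling variability of this point estimate is what drives the expected width of the confidence interval that the paper's procedures control.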

  11. A method for analysing small samples of floral pollen for free and protein-bound amino acids.

    PubMed

    Stabler, Daniel; Power, Eileen F; Borland, Anne M; Barnes, Jeremy D; Wright, Geraldine A

    2018-02-01

    Pollen provides floral visitors with essential nutrients including proteins, lipids, vitamins and minerals. As an important nutrient resource for pollinators, including honeybees and bumblebees, pollen quality is of growing interest in assessing the nutrition available to foraging bees. To date, quantifying the protein-bound amino acids in pollen has been difficult, and methods rely on large amounts of pollen, typically more than 1 g. More usual is to estimate a crude protein value based on the nitrogen content of pollen; however, such methods provide no information on the distribution of the essential and non-essential amino acids constituting the proteins. Here, we describe a method of microwave-assisted acid hydrolysis using low amounts of pollen that allows exploration of amino acid composition, quantified using ultra-high-performance liquid chromatography (UHPLC), and a back-calculation to estimate the crude protein content of pollen. Reliable analysis of protein-bound and free amino acids, as well as an estimation of crude protein concentration, was obtained from pollen samples as small as 1 mg. Greater variation in both protein-bound and free amino acids was found in pollen samples of <1 mg. Because of this variability in the recovery of amino acids from smaller samples, we suggest a correction factor for specific sample sizes of pollen when estimating total crude protein content. The method described in this paper will allow researchers to explore the composition of amino acids in pollen and will aid research assessing the nutrition available to pollinating animals. It will be particularly useful in assaying the pollen of wild plants, from which it is difficult to obtain large sample weights.

  12. Individualized statistical learning from medical image databases: application to identification of brain lesions.

    PubMed

    Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

    2014-04-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Reporting of sample size calculations in analgesic clinical trials: ACTTION systematic review.

    PubMed

    McKeown, Andrew; Gewandter, Jennifer S; McDermott, Michael P; Pawlowski, Joseph R; Poli, Joseph J; Rothstein, Daniel; Farrar, John T; Gilron, Ian; Katz, Nathaniel P; Lin, Allison H; Rappaport, Bob A; Rowbotham, Michael C; Turk, Dennis C; Dworkin, Robert H; Smith, Shannon M

    2015-03-01

    Sample size calculations determine the number of participants required to have sufficiently high power to detect a given treatment effect. In this review, we examined the reporting quality of sample size calculations in 172 publications of double-blind randomized controlled trials of noninvasive pharmacologic or interventional (ie, invasive) pain treatments published in European Journal of Pain, Journal of Pain, and Pain from January 2006 through June 2013. Sixty-five percent of publications reported a sample size calculation but only 38% provided all elements required to replicate the calculated sample size. In publications reporting at least 1 element, 54% provided a justification for the treatment effect used to calculate sample size, and 24% of studies with continuous outcome variables justified the variability estimate. Publications of clinical pain condition trials reported a sample size calculation more frequently than experimental pain model trials (77% vs 33%, P < .001) but did not differ in the frequency of reporting all required elements. No significant differences in reporting of any or all elements were detected between publications of trials with industry and nonindustry sponsorship. Twenty-eight percent included a discrepancy between the reported number of planned and randomized participants. This study suggests that sample size calculation reporting in analgesic trial publications is usually incomplete. Investigators should provide detailed accounts of sample size calculations in publications of clinical trials of pain treatments, which is necessary for reporting transparency and communication of pre-trial design decisions. In this systematic review of analgesic clinical trials, sample size calculations and the required elements (eg, treatment effect to be detected; power level) were incompletely reported. A lack of transparency regarding sample size calculations may raise questions about the appropriateness of the calculated sample size. 
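
    The elements the review says are often missing (treatment effect to be detected, variability estimate, alpha, power) are exactly the inputs of the standard normal-approximation formula for a two-arm comparison of means. A sketch:

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided two-sample comparison of means:
    n = 2 * (sigma/delta)^2 * (z_{1-alpha/2} + z_{power})^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (sigma / delta) ** 2 * (z_alpha + z_beta) ** 2)
```

    Detecting a half-standard-deviation difference (delta = 0.5, sigma = 1) at 80% power needs 63 participants per arm under this approximation (a t-distribution correction adds a few more). Omitting any one of the four inputs makes the calculation unreproducible, which is the review's central point.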
Copyright © 2015 American Pain Society. All rights reserved.

  14. Analysis of methods commonly used in biomedicine for treatment versus control comparison of very small samples.

    PubMed

    Ristić-Djurović, Jasna L; Ćirković, Saša; Mladenović, Pavle; Romčević, Nebojša; Trbovich, Alexander M

    2018-04-01

    A rough estimate indicated that the use of samples no larger than ten is not uncommon in biomedical research, and that many such studies are limited to strong effects because their sample sizes are smaller than six. For data collected in biomedical experiments it is also often unknown whether the mathematical requirements embedded in the sample comparison methods are satisfied. Computer-simulated experiments were used to examine the performance of methods for qualitative sample comparison and its dependence on the effectiveness of exposure, effect intensity, distribution of the studied parameter values in the population, and sample size. The Type I and Type II errors, their average, and the maximal errors were considered. A sample size of 9 with the t-test at p = 5% ensured an error smaller than 5% even for weak effects. For sample sizes 6-8 the same method enabled detection of weak effects with errors smaller than 20%. If the sample sizes were 3-5, weak effects could not be detected with acceptable error; however, the smallest maximal error in the most general case that includes weak effects is given by the standard-error-of-the-mean method. Increasing the sample size from 5 to 9 made detection of weak effects seven times more accurate. Strong effects were detected regardless of the sample size and method used. The minimal recommended sample size for biomedical experiments is therefore 9; use of smaller sizes, and the method of their comparison, should be justified by the objective of the experiment. Copyright © 2018 Elsevier B.V. All rights reserved.
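
    Simulated experiments of this kind can be mimicked with a small Monte Carlo: draw two normal samples, apply a pooled-variance t statistic, and count rejections. A sketch (the critical value 2.12 approximates the two-sided 5% point of t with 16 df for n = 9 per group; the authors' exact simulation settings are assumptions here):

```python
import math
import random
import statistics

def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(1 / na + 1 / nb))

def rejection_rate(n, shift, crit=2.12, trials=2000, seed=1):
    """Fraction of simulated experiments in which |t| exceeds crit.
    shift = 0 gives the Type I error rate; shift > 0 gives power."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        if abs(t_stat(a, b)) > crit:
            hits += 1
    return hits / trials
```

    With n = 9 per group the empirical Type I error sits near 5%, while a strong effect (shift = 1.5 SD) is detected most of the time; shrinking n toward 3-5 collapses the power for weak effects, consistent with the paper's recommendation.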

  15. Population Size Estimation of Men Who Have Sex with Men in Tbilisi, Georgia; Multiple Methods and Triangulation of Findings.

    PubMed

    Sulaberidze, Lela; Mirzazadeh, Ali; Chikovani, Ivdity; Shengelia, Natia; Tsereteli, Nino; Gotsadze, George

    2016-01-01

    An accurate estimate of the population size of men who have sex with men (MSM) is critical to the success of HIV program planning and to monitoring the response to the epidemic as a whole, but it is quite often missing. In this study, our aim was to estimate the population size of MSM in Tbilisi, Georgia, and to compare it with other estimates in the region. In the absence of a gold standard for estimating the population size of MSM, this study reports a range of methods, including network scale-up, mobile/web apps multiplier, service and unique object multiplier, network-based capture-recapture, Handcock RDS-based and Wisdom of Crowds methods. To apply all these methods, two surveys were conducted: first, a household survey among 1,015 adults from the general population, and second, a respondent-driven sample of 210 MSM. We also conducted a literature review of MSM size estimation in Eastern European and Central Asian countries. The median population size of MSM generated from all the aforementioned methods was estimated to be 5,100 (95% Confidence Interval (CI): 3,243-9,088). This corresponds to 1.42% (95% CI: 0.9%-2.53%) of the adult male population in Tbilisi, a value that falls within the ranges reported in other Eastern European and Central Asian countries. These estimates can provide valuable information for country-level HIV prevention program planning and evaluation. Furthermore, we believe that our results will narrow the gap in data availability on estimates of the population size of MSM in the region.
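
    Of the methods listed, the service/unique-object multiplier is the simplest to state: if a benchmark count M of MSM are known to use a service, and a proportion p of the survey sample reports using it, the population size is M / p. A sketch with hypothetical numbers (not the study's actual benchmark counts):

```python
def multiplier_estimate(benchmark_count, sample_proportion):
    """Service/unique-object multiplier population size estimate: N = M / p."""
    if not 0.0 < sample_proportion <= 1.0:
        raise ValueError("sample_proportion must be in (0, 1]")
    return benchmark_count / sample_proportion
```

    For example, a benchmark of 255 service users combined with 5% reported use in the sample would give an estimate of 5,100, the order of the study's median. In practice each multiplier, the scale-up and the capture-recapture methods yield separate estimates that are then triangulated, as done above.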

  16. Measuring Housework Participation: The Gap between "Stylised" Questionnaire Estimates and Diary-Based Estimates

    ERIC Educational Resources Information Center

    Kan, Man Yee

    2008-01-01

    This article compares stylised (questionnaire-based) estimates and diary-based estimates of housework time collected from the same respondents. Data come from the Home On-line Study (1999-2001), a British national household survey that contains both types of estimates (sample size = 632 men and 666 women). It shows that the gap between the two…

  17. Monitoring the effective population size of a brown bear (Ursus arctos) population using new single-sample approaches.

    PubMed

    Skrbinšek, Tomaž; Jelenčič, Maja; Waits, Lisette; Kos, Ivan; Jerina, Klemen; Trontelj, Peter

    2012-02-01

    The effective population size (Ne) could be the ideal parameter for monitoring populations of conservation concern as it conveniently summarizes both the evolutionary potential of the population and its sensitivity to genetic stochasticity. However, tracing its change through time is difficult in natural populations. We applied four new methods for estimating Ne from a single sample of genotypes to trace temporal change in Ne for bears in the Northern Dinaric Mountains. We genotyped 510 bears using 20 microsatellite loci and determined their age. The samples were organized into cohorts with regard to the year when the animals were born and yearly samples with age categories for every year when they were alive. We used the Estimator by Parentage Assignment (EPA) to directly estimate both Ne and generation interval for each yearly sample. For cohorts, we estimated the effective number of breeders (Nb) using linkage disequilibrium, sibship assignment and approximate Bayesian computation methods and extrapolated these estimates to Ne using the generation interval. The Ne estimate by EPA is 276 (183-350 95% CI), meeting the inbreeding-avoidance criterion of Ne > 50 but short of the long-term minimum viable population goal of Ne > 500. The results obtained by the other methods are highly consistent with this result, and all indicate a rapid increase in Ne probably in the late 1990s and early 2000s. The new single-sample approaches to the estimation of Ne provide efficient means for including Ne in monitoring frameworks and will be of great importance for future management and conservation. © 2012 Blackwell Publishing Ltd.

  18. Nearest neighbor density ratio estimation for large-scale applications in astronomy

    NASA Astrophysics Data System (ADS)

    Kremer, J.; Gieseke, F.; Steenstrup Pedersen, K.; Igel, C.

    2015-09-01

    In astronomical applications of machine learning, the distribution of objects used for building a model is often different from the distribution of the objects the model is later applied to. This is known as sample selection bias, which is a major challenge for statistical inference as one can no longer assume that the labeled training data are representative. To address this issue, one can re-weight the labeled training patterns to match the distribution of unlabeled data that are available already in the training phase. There are many examples in practice where this strategy yielded good results, but estimating the weights reliably from a finite sample is challenging. We consider an efficient nearest neighbor density ratio estimator that can exploit large samples to increase the accuracy of the weight estimates. To solve the problem of choosing the right neighborhood size, we propose to use cross-validation on a model selection criterion that is unbiased under covariate shift. The resulting algorithm is our method of choice for density ratio estimation when the feature space dimensionality is small and sample sizes are large. The approach is simple and, because of the model selection, robust. We empirically find that it is on a par with established kernel-based methods on relatively small regression benchmark datasets. However, when applied to large-scale photometric redshift estimation, our approach outperforms the state-of-the-art.
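
    The nearest-neighbor density-ratio idea can be sketched from the classical kNN density estimate p(x) ≈ k/(nV), where V is the volume of the ball reaching the k-th neighbor: in the ratio of a test-set and a training-set estimate, the volumes reduce to a ratio of k-th-neighbor radii. This is a simplified stand-in, not the paper's exact estimator or its cross-validated model selection:

```python
import math

def knn_radius(x, points, k):
    """Distance from x to its k-th nearest neighbor in `points`."""
    return sorted(math.dist(x, p) for p in points)[k - 1]

def density_ratio(x, test_pts, train_pts, k=5):
    """Estimate p_test(x) / p_train(x) from k-th-neighbor radii:
    ratio = (n_train / n_test) * (r_train / r_test)^dim."""
    dim = len(x)
    r_te = knn_radius(x, test_pts, k)
    r_tr = knn_radius(x, train_pts, k)
    return (len(train_pts) / len(test_pts)) * (r_tr / r_te) ** dim
```

    Labeled training patterns are then re-weighted by this ratio so that the training sample mimics the unlabeled target distribution; the neighborhood size k is the tuning knob that the paper selects by cross-validation under covariate shift.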

  19. Evaluating probability sampling strategies for estimating redd counts: an example with Chinook salmon (Oncorhynchus tshawytscha)

    Treesearch

    Jean-Yves Courbois; Stephen L. Katz; Daniel J. Isaak; E. Ashley Steel; Russell F. Thurow; A. Michelle Wargo Rub; Tony Olsen; Chris E. Jordan

    2008-01-01

    Precise, unbiased estimates of population size are an essential tool for fisheries management. For a wide variety of salmonid fishes, redd counts from a sample of reaches are commonly used to monitor annual trends in abundance. Using a 9-year time series of georeferenced censuses of Chinook salmon (Oncorhynchus tshawytscha) redds from central Idaho,...

  20. Estimating individual glomerular volume in the human kidney: clinical perspectives.

    PubMed

    Puelles, Victor G; Zimanyi, Monika A; Samuel, Terence; Hughson, Michael D; Douglas-Denton, Rebecca N; Bertram, John F; Armitage, James A

    2012-05-01

    Measurement of individual glomerular volumes (IGV) has allowed the identification of drivers of glomerular hypertrophy in subjects without overt renal pathology. This study aims to highlight the relevance of IGV measurements with possible clinical implications and to determine how many profiles must be measured in order to achieve stable size distribution estimates. We re-analysed 2250 IGV estimates obtained using the disector/Cavalieri method in 41 African and 34 Caucasian Americans. Pooled IGV analysis of mean and variance was conducted. Monte Carlo (jackknife) simulations determined the effect of the number of sampled glomeruli on mean IGV. Lin's concordance coefficient (Rc), the coefficient of variation (CV) and the coefficient of error (CE) measured reliability. IGV mean and variance increased with overweight and hypertensive status. Superficial glomeruli were significantly smaller than juxtamedullary glomeruli in all subjects (P < 0.01), by race (P < 0.05) and in obese individuals (P < 0.01). Subjects with multiple chronic kidney disease (CKD) comorbidities showed significant increases in IGV mean and variability. Overall, mean IGV was particularly reliable with nine or more sampled glomeruli (Rc > 0.95, <5% difference in CV and CE). These observations were not affected by a reduced sample size and did not disrupt the inverse linear correlation between mean IGV and estimated total glomerular number. Multiple comorbidities for CKD, including overweight, obesity and hypertension, are associated with increased IGV mean and variance within subjects. Zonal selection and the number of sampled glomeruli do not represent drawbacks for future longitudinal biopsy-based studies of glomerular size and distribution.

  1. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  2. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel.

    PubMed

    Khatkar, Mehar S; Nicholas, Frank W; Collins, Andrew R; Zenger, Kyall R; Cavanagh, Julie A L; Barris, Wes; Schnabel, Robert D; Taylor, Jeremy F; Raadsma, Herman W

    2008-04-24

    The extent of linkage disequilibrium (LD) within a population determines the number of markers that will be required for successful association mapping and marker-assisted selection. Most studies on LD in cattle reported to date are based on microsatellite markers or small numbers of single nucleotide polymorphisms (SNPs) covering one or only a few chromosomes. This is the first comprehensive study of the extent of LD in cattle, analyzing data on 1,546 Holstein-Friesian bulls genotyped for 15,036 SNP markers covering all regions of all autosomes. Furthermore, most studies in cattle have used relatively small sample sizes and, consequently, may have had biased estimates of measures commonly used to describe LD. We examine minimum sample sizes required to estimate LD without bias and loss in accuracy. Finally, relatively little information is available on comparative LD structure in other mammalian species such as human and mouse, and we compare LD structure in cattle with public-domain data from both human and mouse. We computed three LD estimates, D', Dvol and r2, for 1,566,890 syntenic SNP pairs and a sample of 365,400 non-syntenic pairs. Mean D' is 0.189 among syntenic SNPs, and 0.105 among non-syntenic SNPs; mean r2 is 0.024 among syntenic SNPs and 0.0032 among non-syntenic SNPs. All three measures of LD for syntenic pairs decline with distance; the decline is much steeper for r2 than for D' and Dvol. The values of D' and Dvol are quite similar. Significant LD in cattle extends to 40 kb (when estimated as r2) and 8.2 Mb (when estimated as D'). The mean values for LD at large physical distances are close to those for non-syntenic SNPs. The minor allele frequency threshold affects the distribution and extent of LD. For unbiased and accurate estimates of LD across marker intervals spanning < 1 kb to > 50 Mb, minimum sample sizes of 400 (for D') and 75 (for r2) are required. The bias due to small sample sizes increases with the inter-marker interval. 
LD in cattle is much less extensive than in a mouse population created from crossing inbred lines, and more extensive than in humans. For association mapping in Holstein-Friesian cattle, for a given design, at least one SNP is required for each 40 kb, giving a total requirement of at least 75,000 SNPs for a low power whole-genome scan (median r2 > 0.19) and up to 300,000 markers at 10 kb intervals for a high power genome scan (median r2 > 0.62). For estimation of LD by D' and Dvol with sufficient precision, a sample size of at least 400 is required, whereas for r2 a minimum sample of 75 is adequate.
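
    The two LD measures being compared are simple functions of allele and haplotype frequencies. A sketch for a pair of biallelic SNPs (frequencies assumed known rather than estimated from genotype data):

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci, given the frequency p_ab of
    haplotype AB and the allele frequencies p_a and p_b."""
    d = p_ab - p_a * p_b                         # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

    Complete association (p_ab = p_a = p_b) gives D' = r² = 1, while independent loci give both ≈ 0. Because r² also penalizes allele frequency differences between the loci, it decays faster with distance, consistent with the much shorter useful-LD range reported above for r² (40 kb) than for D' (8.2 Mb).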

  3. Estimating the quadratic mean diameters of fine woody debris in forests of the United States

    Treesearch

    Christopher W. Woodall; Vicente J. Monleon

    2010-01-01

    Most fine woody debris (FWD) line-intersect sampling protocols and associated estimators require an approximation of the quadratic mean diameter (QMD) of each individual FWD size class. There is a lack of empirically derived QMDs by FWD size class and species/forest type across the U.S. The objective of this study is to evaluate a technique known as the graphical...

  4. ESTIMATION OF DIFFUSION LOSSES WHEN SAMPLING DIESEL AEROSOL: A QUALITY ASSURANCE MEASURE

    EPA Science Inventory

    A fundamental component of the QA work for the assessment of instruments and sampling system performance was the investigation of particle losses in sampling lines. Along the aerosol sample pathway from its source to the collection media or measuring instrument, some nano-size p...

  5. A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.; Cheevatanarak, Suchittra

    Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…

  6. Chapter 33: Offshore Population Estimates of Marbled Murrelets in California

    Treesearch

    C. John Ralph; Sherri L. Miller

    1995-01-01

    We devised a method of estimating population size of Marbled Murrelets (Brachyramphus marmoratus) found in California’s offshore waters. The method involves determining the distribution of birds from the shore outward to 6,000 m offshore. Applying this distribution to data from boat surveys, we derived population estimates and estimates of sampling...

  7. Sample size re-estimation and other midcourse adjustments with sequential parallel comparison design.

    PubMed

    Silverman, Rachel K; Ivanova, Anastasia

    2017-01-01

    Sequential parallel comparison design (SPCD) was proposed to reduce placebo response in randomized trials with a placebo comparator. Subjects are randomized between placebo and drug in stage 1 of the trial, and then placebo non-responders are re-randomized in stage 2. Efficacy analysis includes all data from stage 1 and all placebo non-responding subjects from stage 2. This article investigates the possibility of re-estimating the sample size and adjusting the design parameters of SPCD, namely the allocation proportion to placebo in stage 1 and the weight of stage 1 data in the overall efficacy test statistic, during an interim analysis.

  8. Do adults show a curse of knowledge in false-belief reasoning? A robust estimate of the true effect size.

    PubMed

    Ryskin, Rachel A; Brown-Schmidt, Sarah

    2014-01-01

    Seven experiments use large sample sizes to robustly estimate the effect size of a previous finding that adults are more likely to commit egocentric errors in a false-belief task when the egocentric response is plausible in light of their prior knowledge. We estimate the true effect size to be less than half of that reported in the original findings. Even though we found effects in the same direction as the original, they were substantively smaller; the original study would have had less than 33% power to detect an effect of this magnitude. The influence of plausibility on the curse of knowledge in adults appears to be small enough that its impact on real-life perspective-taking may need to be reevaluated.

  9. Size at emergence improves accuracy of age estimates in forensically-useful beetle Creophilus maxillosus L. (Staphylinidae).

    PubMed

    Matuszewski, Szymon; Frątczak-Łagiewska, Katarzyna

    2018-02-05

    Insects colonizing human or animal cadavers may be used to estimate the post-mortem interval (PMI), usually by aging larvae or pupae sampled at a crime scene. The accuracy of insect age estimates in a forensic context is reduced by large intraspecific variation in insect development time. Here we test the concept that insect size at emergence may be used to predict insect physiological age and accordingly to improve the accuracy of age estimates in forensic entomology. Using results of a laboratory study on the development of the forensically useful beetle Creophilus maxillosus (Linnaeus, 1758) (Staphylinidae), we demonstrate that its physiological age at emergence [i.e. the thermal summation value (K) needed for emergence] falls as beetle size increases. In a validation study, K estimated from adult insect size was significantly closer to the true K than K from the general thermal summation model. Using beetle length at emergence as a predictor variable, with male- or female-specific models regressing K against beetle length, gave the most accurate predictions of age. These results demonstrate that the size of C. maxillosus at emergence improves the accuracy of age estimates in a forensic context.
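
    The thermal summation model underlying such age estimates accumulates temperature above a developmental threshold until a species-specific constant K is reached. A minimal sketch (the paper's size-conditional regressions for K are not reproduced; the base temperature and K values below are placeholders, not C. maxillosus constants):

```python
def accumulated_degree_days(daily_temps, base_temp):
    """Thermal summation: degree-days above base_temp, with days at or
    below the developmental threshold contributing zero."""
    return sum(max(t - base_temp, 0.0) for t in daily_temps)

def age_at_emergence_days(k_required, mean_temp, base_temp):
    """Days needed to accumulate k_required degree-days at a constant
    mean temperature above the threshold."""
    per_day = mean_temp - base_temp
    if per_day <= 0:
        raise ValueError("no development at or below the base temperature")
    return k_required / per_day
```

    Conditioning k_required on adult body length, as the paper does, adjusts the summation target per specimen and thereby tightens the resulting PMI estimate.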

  10. Comparative analyses of basal rate of metabolism in mammals: data selection does matter.

    PubMed

    Genoud, Michel; Isler, Karin; Martin, Robert D

    2018-02-01

    Basal rate of metabolism (BMR) is a physiological parameter that should be measured under strictly defined experimental conditions. In comparative analyses among mammals BMR is widely used as an index of the intensity of the metabolic machinery or as a proxy for energy expenditure. Many databases with BMR values for mammals are available, but the criteria used to select metabolic data as BMR estimates have often varied and the potential effect of this variability has rarely been questioned. We provide a new, expanded BMR database reflecting compliance with standard criteria (resting, postabsorptive state; thermal neutrality; adult, non-reproductive status for females) and examine potential effects of differential selectivity on the results of comparative analyses. The database includes 1739 different entries for 817 species of mammals, compiled from the original sources. It provides information permitting assessment of the validity of each estimate and presents the value closest to a proper BMR for each entry. Using different selection criteria, several alternative data sets were extracted and used in comparative analyses of (i) the scaling of BMR to body mass and (ii) the relationship between brain mass and BMR. It was expected that results would be especially dependent on selection criteria with small sample sizes and with relatively weak relationships. Phylogenetically informed regression (phylogenetic generalized least squares, PGLS) was applied to the alternative data sets for several different clades (Mammalia, Eutheria, Metatheria, or individual orders). For Mammalia, a 'subsampling procedure' was also applied, in which random subsamples of different sample sizes were taken from each original data set and successively analysed. In each case, two data sets with identical sample size and species, but comprising BMR data with different degrees of reliability, were compared. 
Selection criteria had minor effects on scaling equations computed for large clades (Mammalia, Eutheria, Metatheria), although less-reliable estimates of BMR were generally about 12-20% larger than more-reliable ones. Larger effects were found with more-limited clades, such as sciuromorph rodents. For the relationship between BMR and brain mass the results of comparative analyses were found to depend strongly on the data set used, especially with more-limited, order-level clades. In fact, with small sample sizes (e.g. <100) results often appeared erratic. Subsampling revealed that sample size has a non-linear effect on the probability of a zero slope for a given relationship. Depending on the species included, results could differ dramatically, especially with small sample sizes. Overall, our findings indicate a need for due diligence when selecting BMR estimates and caution regarding results (even if seemingly significant) with small sample sizes. © 2017 Cambridge Philosophical Society.

  11. Using pilot data to size a two-arm randomized trial to find a nearly optimal personalized treatment strategy.

    PubMed

    Laber, Eric B; Zhao, Ying-Qi; Regh, Todd; Davidian, Marie; Tsiatis, Anastasios; Stanford, Joseph B; Zeng, Donglin; Song, Rui; Kosorok, Michael R

    2016-04-15

A personalized treatment strategy formalizes evidence-based treatment selection by mapping patient information to a recommended treatment. Personalized treatment strategies can produce better patient outcomes while reducing cost and treatment burden. Thus, among clinical and intervention scientists, there is growing interest in conducting randomized clinical trials in which one of the primary aims is estimation of a personalized treatment strategy. However, at present, there are no appropriate sample size formulae to assist in the design of such a trial. Furthermore, because the sampling distribution of the estimated outcome under an estimated optimal treatment strategy can be highly sensitive to small perturbations in the underlying generative model, sample size calculations based on standard (uncorrected) asymptotic approximations or computer simulations may not be reliable. We offer a simple and robust method for powering a single-stage, two-arm randomized clinical trial when the primary aim is estimating the optimal single-stage personalized treatment strategy. The proposed method is based on inverting a plug-in projection confidence interval and is thereby regular and robust to small perturbations of the underlying generative model. It requires elicitation of two clinically meaningful parameters from clinical scientists and uses data from a small pilot study to estimate nuisance parameters, which are not easily elicited. The method performs well in simulated experiments and is illustrated using data from a pilot study of time to conception and fertility awareness. Copyright © 2015 John Wiley & Sons, Ltd.
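For orientation only, the textbook normal-approximation sample size for a conventional two-arm comparison of means is sketched below; this is standard background against which such designs are compared, not the projection-interval method proposed in the paper:

```python
import math
from statistics import NormalDist

def per_arm_n(delta, sigma, alpha=0.05, power=0.80):
    """Textbook per-arm sample size for detecting a mean difference delta
    with common SD sigma, two-sided test: n = 2*sigma^2*(z_a + z_b)^2/delta^2."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (sigma * (z(1 - alpha / 2) + z(power)) / delta) ** 2)

print(per_arm_n(delta=0.5, sigma=1.0))  # 63 per arm for a half-SD effect
```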

  12. Creel survey sampling designs for estimating effort in short-duration Chinook salmon fisheries

    USGS Publications Warehouse

    McCormick, Joshua L.; Quist, Michael C.; Schill, Daniel J.

    2013-01-01

    Chinook Salmon Oncorhynchus tshawytscha sport fisheries in the Columbia River basin are commonly monitored using roving creel survey designs and require precise, unbiased catch estimates. The objective of this study was to examine the relative bias and precision of total catch estimates using various sampling designs to estimate angling effort under the assumption that mean catch rate was known. We obtained information on angling populations based on direct visual observations of portions of Chinook Salmon fisheries in three Idaho river systems over a 23-d period. Based on the angling population, Monte Carlo simulations were used to evaluate the properties of effort and catch estimates for each sampling design. All sampling designs evaluated were relatively unbiased. Systematic random sampling (SYS) resulted in the most precise estimates. The SYS and simple random sampling designs had mean square error (MSE) estimates that were generally half of those observed with cluster sampling designs. The SYS design was more efficient (i.e., higher accuracy per unit cost) than a two-cluster design. Increasing the number of clusters available for sampling within a day decreased the MSE of estimates of daily angling effort, but the MSE of total catch estimates was variable depending on the fishery. The results of our simulations provide guidelines on the relative influence of sample sizes and sampling designs on parameters of interest in short-duration Chinook Salmon fisheries.
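A stripped-down version of such a Monte Carlo comparison can be sketched as follows; the diurnal effort curve, sample size, and the simple/systematic estimators are illustrative assumptions, not the fisheries data or full design set of the study:

```python
import math, random

random.seed(0)
# Hypothetical hourly angler counts over a 24 h day with a diurnal trend.
hours = [10 + 8 * math.sin(math.pi * h / 24) for h in range(24)]
true_total = sum(hours)

def srs_estimate(n):
    """Simple random sample of n hours, expanded to a daily effort total."""
    picks = random.sample(range(24), n)
    return 24 * sum(hours[h] for h in picks) / n

def sys_estimate(n):
    """Systematic sample: random start, then every (24/n)-th hour."""
    step = 24 // n
    start = random.randrange(step)
    return 24 * sum(hours[h] for h in range(start, 24, step)) / n

def mse(estimator, n, reps=20000):
    return sum((estimator(n) - true_total) ** 2 for _ in range(reps)) / reps

mse_srs, mse_sys = mse(srs_estimate, 4), mse(sys_estimate, 4)
print(mse_srs, mse_sys)
```

Because the systematic picks are spread across the diurnal trend, their MSE is far smaller than that of simple random sampling for the same sample size, matching the direction of the study's finding.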

  13. Can we estimate molluscan abundance and biomass on the continental shelf?

    NASA Astrophysics Data System (ADS)

    Powell, Eric N.; Mann, Roger; Ashton-Alcox, Kathryn A.; Kuykendall, Kelsey M.; Chase Long, M.

    2017-11-01

    Few empirical studies have focused on the effect of sample density on the estimate of abundance of the dominant carbonate-producing fauna of the continental shelf. Here, we present such a study and consider the implications of suboptimal sampling design on estimates of abundance and size-frequency distribution. We focus on a principal carbonate producer of the U.S. Atlantic continental shelf, the Atlantic surfclam, Spisula solidissima. To evaluate the degree to which the results are typical, we analyze a dataset for the principal carbonate producer of Mid-Atlantic estuaries, the Eastern oyster Crassostrea virginica, obtained from Delaware Bay. These two species occupy different habitats and display different lifestyles, yet demonstrate similar challenges to survey design and similar trends with sampling density. The median of a series of simulated survey mean abundances, the central tendency obtained over a large number of surveys of the same area, always underestimated true abundance at low sample densities. More dramatic were the trends in the probability of a biased outcome. As sample density declined, the probability of a survey availability event, defined as a survey yielding indices >125% or <75% of the true population abundance, increased and that increase was disproportionately biased towards underestimates. For these cases where a single sample accessed about 0.001-0.004% of the domain, 8-15 random samples were required to reduce the probability of a survey availability event below 40%. The problem of differential bias, in which the probabilities of a biased-high and a biased-low survey index were distinctly unequal, was resolved with fewer samples than the problem of overall bias. These trends suggest that the influence of sampling density on survey design comes with a series of incremental challenges. At woefully inadequate sampling density, the probability of a biased-low survey index will substantially exceed the probability of a biased-high index. 
The survey time series on the average will return an estimate of the stock that underestimates true stock abundance. If sampling intensity is increased, the frequency of biased indices balances between high and low values. Incrementing sample number from this point steadily reduces the likelihood of a biased survey; however, the number of samples necessary to drive the probability of survey availability events to a preferred level of infrequency may be daunting. Moreover, certain size classes will be disproportionately susceptible to such events and the impact on size frequency will be species specific, depending on the relative dispersion of the size classes.
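The notion of a survey availability event can be illustrated with a toy simulation; the right-skewed quadrat distribution and sample sizes below are assumptions for illustration, not the surfclam or oyster data:

```python
import random

random.seed(2)
# Hypothetical clumped population: strongly right-skewed quadrat densities.
pop = [random.lognormvariate(0.0, 1.5) for _ in range(2000)]
true_mean = sum(pop) / len(pop)

def event_rates(n, reps=4000):
    """Fractions of simulated n-sample surveys whose index is biased low
    (<75% of truth) or biased high (>125% of truth)."""
    low = high = 0
    for _ in range(reps):
        m = sum(random.choice(pop) for _ in range(n)) / n
        low += m < 0.75 * true_mean
        high += m > 1.25 * true_mean
    return low / reps, high / reps

rates = {n: event_rates(n) for n in (8, 32, 128)}
for n, (low, high) in rates.items():
    print(n, low, high)
```

At small n the biased-low rate dominates (the skewed distribution makes most sample means undershoot), and the total event rate declines only gradually as n grows, echoing the incremental challenges described above.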

  14. Extreme Mean and Its Applications

    NASA Technical Reports Server (NTRS)

    Swaroop, R.; Brownlow, J. D.

    1979-01-01

Extreme value statistics obtained from normally distributed data are considered. An extreme mean is defined as the mean of the p-th probability-truncated normal distribution. An unbiased estimate of this extreme mean and its large-sample distribution are derived. The distribution of this estimate, even for very large samples, is found to be nonnormal. Further, as the sample size increases, the variance of the unbiased estimate converges to the Cramer-Rao lower bound. The computer program used to obtain the density and distribution functions of the standardized unbiased estimate, and the confidence intervals of the extreme mean for any data, is included for ready application. An example demonstrates the usefulness of the extreme mean.
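For the standard normal, the mean of the upper tail beyond the p-th quantile has the closed form E[X | X > z_p] = φ(z_p)/(1 − p); the sketch below illustrates that population quantity only, not the paper's unbiased estimator for sample data:

```python
from statistics import NormalDist

def extreme_mean(p):
    """Mean of the standard normal truncated to its upper (1-p) tail:
    E[X | X > z_p] = pdf(z_p) / (1 - p)."""
    nd = NormalDist()
    z = nd.inv_cdf(p)
    return nd.pdf(z) / (1 - p)

print(round(extreme_mean(0.95), 4))  # mean of the top 5%, about 2.063
```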

  15. Estimation of within-stratum variance for sample allocation: Foreign commodity production forecasting

    NASA Technical Reports Server (NTRS)

    Chhikara, R. S.; Perry, C. R., Jr. (Principal Investigator)

    1980-01-01

The problem of determining the stratum variances required for an optimum sample allocation for remotely sensed crop surveys is investigated, with emphasis on an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using existing and easily available historical statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to stratum variance estimation for wheat in the U.S. Great Plains and is evaluated on the basis of the numerical results obtained. It is shown that the proposed technique is viable and performs satisfactorily with the use of a conservative value (smaller than the expected value) for the field size and with the use of crop statistics from the small political division level.

  16. Determining the linkage of disease-resistance genes to molecular markers: the LOD-SCORE method revisited with regard to necessary sample sizes.

    PubMed

    Hühn, M

    1995-05-01

    Some approaches to molecular marker-assisted linkage detection for a dominant disease-resistance trait based on a segregating F2 population are discussed. Analysis of two-point linkage is carried out by the traditional measure of maximum lod score. It depends on (1) the maximum-likelihood estimate of the recombination fraction between the marker and the disease-resistance gene locus, (2) the observed absolute frequencies, and (3) the unknown number of tested individuals. If one replaces the absolute frequencies by expressions depending on the unknown sample size and the maximum-likelihood estimate of recombination value, the conventional rule for significant linkage (maximum lod score exceeds a given linkage threshold) can be resolved for the sample size. For each sub-population used for linkage analysis [susceptible (= recessive) individuals, resistant (= dominant) individuals, complete F2] this approach gives a lower bound for the necessary number of individuals required for the detection of significant two-point linkage by the lod-score method.

  17. Ranked set sampling: cost and optimal set size.

    PubMed

    Nahhas, Ramzi W; Wolfe, Douglas A; Chen, Haiying

    2002-12-01

    McIntyre (1952, Australian Journal of Agricultural Research 3, 385-390) introduced ranked set sampling (RSS) as a method for improving estimation of a population mean in settings where sampling and ranking of units from the population are inexpensive when compared with actual measurement of the units. Two of the major factors in the usefulness of RSS are the set size and the relative costs of the various operations of sampling, ranking, and measurement. In this article, we consider ranking error models and cost models that enable us to assess the effect of different cost structures on the optimal set size for RSS. For reasonable cost structures, we find that the optimal RSS set sizes are generally larger than had been anticipated previously. These results will provide a useful tool for determining whether RSS is likely to lead to an improvement over simple random sampling in a given setting and, if so, what RSS set size is best to use in this case.
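The basic precision gain of RSS over SRS under perfect ranking can be demonstrated with a small simulation; the standard-normal population and set size k = 4 are illustrative assumptions:

```python
import random

random.seed(3)

def srs_mean(k):
    """Mean of a simple random sample of k measured units."""
    return sum(random.gauss(0, 1) for _ in range(k)) / k

def rss_mean(k):
    """One RSS cycle: from the i-th ranked set of k units, measure only the
    i-th smallest (perfect ranking assumed), then average the k measurements."""
    vals = []
    for i in range(k):
        ranked_set = sorted(random.gauss(0, 1) for _ in range(k))
        vals.append(ranked_set[i])
    return sum(vals) / k

def mc_var(estimator, k, reps=20000):
    xs = [estimator(k) for _ in range(reps)]
    m = sum(xs) / reps
    return sum((x - m) ** 2 for x in xs) / reps

v_srs, v_rss = mc_var(srs_mean, 4), mc_var(rss_mean, 4)
print(v_srs, v_rss)  # both use k = 4 measurements; RSS ranks 16 units
```

RSS buys roughly a twofold variance reduction here at the price of ranking k² units per cycle, which is exactly the measurement-versus-ranking cost trade-off the article analyzes.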

  18. Evaluation of single and two-stage adaptive sampling designs for estimation of density and abundance of freshwater mussels in a large river

    USGS Publications Warehouse

    Smith, D.R.; Rogala, J.T.; Gray, B.R.; Zigler, S.J.; Newton, T.J.

    2011-01-01

    Reliable estimates of abundance are needed to assess consequences of proposed habitat restoration and enhancement projects on freshwater mussels in the Upper Mississippi River (UMR). Although there is general guidance on sampling techniques for population assessment of freshwater mussels, the actual performance of sampling designs can depend critically on the population density and spatial distribution at the project site. To evaluate various sampling designs, we simulated sampling of populations, which varied in density and degree of spatial clustering. Because of logistics and costs of large river sampling and spatial clustering of freshwater mussels, we focused on adaptive and non-adaptive versions of single and two-stage sampling. The candidate designs performed similarly in terms of precision (CV) and probability of species detection for fixed sample size. Both CV and species detection were determined largely by density, spatial distribution and sample size. However, designs did differ in the rate that occupied quadrats were encountered. Occupied units had a higher probability of selection using adaptive designs than conventional designs. We used two measures of cost: sample size (i.e. number of quadrats) and distance travelled between the quadrats. Adaptive and two-stage designs tended to reduce distance between sampling units, and thus performed better when distance travelled was considered. Based on the comparisons, we provide general recommendations on the sampling designs for the freshwater mussels in the UMR, and presumably other large rivers.

  19. Geometric k-nearest neighbor estimation of entropy and mutual information

    NASA Astrophysics Data System (ADS)

    Lord, Warren M.; Sun, Jie; Bollt, Erik M.

    2018-03-01

    Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for a large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggest that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite the manipulation of the local geometry. In addition, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large, but the data size is limited.
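A minimal 1-D version of the conventional (geometrically regular) knn approach, the Kozachenko-Leonenko entropy estimator, can be sketched as follows; the sample size and k are arbitrary choices, and this is the baseline method the g-knn estimators refine, not the g-knn estimator itself:

```python
import math, random

random.seed(4)

def digamma(x):
    # Recurrence plus asymptotic series; plenty accurate for this sketch.
    r = 0.0
    while x < 6:
        r -= 1.0 / x
        x += 1.0
    return r + math.log(x) - 1 / (2 * x) - 1 / (12 * x * x)

def kl_entropy(xs, k=3):
    """Kozachenko-Leonenko k-nn differential entropy estimate (1-D, in nats),
    using ordinary interval ('geometrically regular') volume elements."""
    n = len(xs)
    s = sorted(xs)
    total = 0.0
    for i, x in enumerate(s):
        # k-th nearest neighbour lies among the k sorted values on each side
        dists = sorted(abs(x - s[j])
                       for j in range(max(0, i - k), min(n, i + k + 1))
                       if j != i)
        total += math.log(2 * dists[k - 1])
    return digamma(n) - digamma(k) + total / n

sample = [random.gauss(0, 1) for _ in range(4000)]
h = kl_entropy(sample)
print(round(h, 3))  # true value: 0.5*ln(2*pi*e) ≈ 1.419
```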

  20. Change-in-ratio estimators for populations with more than two subclasses

    USGS Publications Warehouse

    Udevitz, Mark S.; Pollock, Kenneth H.

    1991-01-01

    Change-in-ratio methods have been developed to estimate the size of populations with two or three population subclasses. Most of these methods require the often unreasonable assumption of equal sampling probabilities for individuals in all subclasses. This paper presents new models based on the weaker assumption that ratios of sampling probabilities are constant over time for populations with three or more subclasses. Estimation under these models requires that a value be assumed for one of these ratios when there are two samples. Explicit expressions are given for the maximum likelihood estimators under models for two samples with three or more subclasses and for three samples with two subclasses. A numerical method using readily available statistical software is described for obtaining the estimators and their standard errors under all of the models. Likelihood ratio tests that can be used in model selection are discussed. Emphasis is on the two-sample, three-subclass models for which Monte-Carlo simulation results and an illustrative example are presented.
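For the simplest two-sample, two-subclass case under the equal-sampling-probability assumption that this paper relaxes, the classic change-in-ratio estimator can be sketched as follows (the harvest numbers are hypothetical):

```python
def cir_abundance(p1, p2, removed_x, removed_total):
    """Classic two-sample, two-subclass change-in-ratio estimator of initial
    population size (equal sampling probabilities across subclasses assumed):
        N1 = (Rx - R * p2) / (p1 - p2)
    where p1, p2 are the observed subclass-x proportions before and after
    a known removal of R animals, Rx of them from subclass x."""
    return (removed_x - removed_total * p2) / (p1 - p2)

# Hypothetical harvest: the subclass-x fraction drops from 0.40 to 0.25
# after removing 300 animals, 180 of them from subclass x.
print(cir_abundance(0.40, 0.25, 180, 300))
```

The example is internally consistent: N1 = 700 implies 280 subclass-x animals initially; removing 180 of them and 120 others leaves 100 of 400, i.e. p2 = 0.25.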

  1. Statistical considerations for agroforestry studies

    Treesearch

    James A. Baldwin

    1993-01-01

Statistical topics related to agroforestry studies are discussed, including study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and much-improved histogram display is described.

  2. Fish assemblages

    USGS Publications Warehouse

    McGarvey, Daniel J.; Falke, Jeffrey A.; Li, Hiram W.; Li, Judith; Hauer, F. Richard; Lamberti, G.A.

    2017-01-01

    Methods to sample fishes in stream ecosystems and to analyze the raw data, focusing primarily on assemblage-level (all fish species combined) analyses, are presented in this chapter. We begin with guidance on sample site selection, permitting for fish collection, and information-gathering steps to be completed prior to conducting fieldwork. Basic sampling methods (visual surveying, electrofishing, and seining) are presented with specific instructions for estimating population sizes via visual, capture-recapture, and depletion surveys, in addition to new guidance on environmental DNA (eDNA) methods. Steps to process fish specimens in the field including the use of anesthesia and preservation of whole specimens or tissue samples (for genetic or stable isotope analysis) are also presented. Data analysis methods include characterization of size-structure within populations, estimation of species richness and diversity, and application of fish functional traits. We conclude with three advanced topics in assemblage-level analysis: multidimensional scaling (MDS), ecological networks, and loop analysis.

  3. 50 CFR 648.90 - NE multispecies assessment, framework procedures and specifications, and flexible area action...

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...; survey results; stock status; current estimates of fishing mortality and overfishing levels; social and... survey data or, if sea sampling data are unavailable, length frequency information from trawl surveys... size; sea sampling, port sampling, and survey data or, if sea sampling data are unavailable, length...

  4. Using variance components to estimate power in a hierarchically nested sampling design improving monitoring of larval Devils Hole pupfish

    USGS Publications Warehouse

    Dzul, Maria C.; Dixon, Philip M.; Quist, Michael C.; Dinsmore, Stephen J.; Bower, Michael R.; Wilson, Kevin P.; Gaines, D. Bailey

    2013-01-01

    We used variance components to assess allocation of sampling effort in a hierarchically nested sampling design for ongoing monitoring of early life history stages of the federally endangered Devils Hole pupfish (DHP) (Cyprinodon diabolis). Sampling design for larval DHP included surveys (5 days each spring 2007–2009), events, and plots. Each survey was comprised of three counting events, where DHP larvae on nine plots were counted plot by plot. Statistical analysis of larval abundance included three components: (1) evaluation of power from various sample size combinations, (2) comparison of power in fixed and random plot designs, and (3) assessment of yearly differences in the power of the survey. Results indicated that increasing the sample size at the lowest level of sampling represented the most realistic option to increase the survey's power, fixed plot designs had greater power than random plot designs, and the power of the larval survey varied by year. This study provides an example of how monitoring efforts may benefit from coupling variance components estimation with power analysis to assess sampling design.

  5. A Monte Carlo Evaluation of Estimated Parameters of Five Shrinkage Estimate Formuli.

    ERIC Educational Resources Information Center

    Newman, Isadore; And Others

    1979-01-01

    A Monte Carlo simulation was employed to determine the accuracy with which the shrinkage in R squared can be estimated by five different shrinkage formulas. The study dealt with the use of shrinkage formulas for various sample sizes, different R squared values, and different degrees of multicollinearity. (Author/JKS)
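Two commonly compared shrinkage formulas, Wherry's adjusted R² and Browne's cross-validity estimate, can be sketched as follows (a minimal illustration; the study's five formulas are not all reproduced here):

```python
def wherry(r2, n, p):
    """Wherry/Ezekiel adjusted R^2: 1 - (1 - R^2)(n - 1)/(n - p - 1),
    for sample size n and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def browne(r2, n, p):
    """Browne's estimate of the population cross-validated R^2 (one common
    form), computed from the Wherry-adjusted value rho2."""
    rho2 = wherry(r2, n, p)
    return ((n - p - 3) * rho2**2 + rho2) / ((n - 2 * p - 2) * rho2 + p)

# Observed R^2 = 0.50 with n = 30 cases and p = 5 predictors.
print(wherry(0.50, 30, 5), browne(0.50, 30, 5))
```

Browne's value sits below the adjusted R², reflecting the additional shrinkage expected when the fitted weights are applied to new data.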

  6. Sample Size for Estimation of G and Phi Coefficients in Generalizability Theory

    ERIC Educational Resources Information Center

    Atilgan, Hakan

    2013-01-01

    Problem Statement: Reliability, which refers to the degree to which measurement results are free from measurement errors, as well as its estimation, is an important issue in psychometrics. Several methods for estimating reliability have been suggested by various theories in the field of psychometrics. One of these theories is the generalizability…

  7. Methods for estimating confidence intervals in interrupted time series analyses of health interventions.

    PubMed

    Zhang, Fang; Wagner, Anita K; Soumerai, Stephen B; Ross-Degnan, Dennis

    2009-02-01

    Interrupted time series (ITS) is a strong quasi-experimental research design, which is increasingly applied to estimate the effects of health services and policy interventions. We describe and illustrate two methods for estimating confidence intervals (CIs) around absolute and relative changes in outcomes calculated from segmented regression parameter estimates. We used multivariate delta and bootstrapping methods (BMs) to construct CIs around relative changes in level and trend, and around absolute changes in outcome based on segmented linear regression analyses of time series data corrected for autocorrelated errors. Using previously published time series data, we estimated CIs around the effect of prescription alerts for interacting medications with warfarin on the rate of prescriptions per 10,000 warfarin users per month. Both the multivariate delta method (MDM) and the BM produced similar results. BM is preferred for calculating CIs of relative changes in outcomes of time series studies, because it does not require large sample sizes when parameter estimates are obtained correctly from the model. Caution is needed when sample size is small.

  8. Measuring selected PPCPs in wastewater to estimate the population in different cities in China.

    PubMed

    Gao, Jianfa; O'Brien, Jake; Du, Peng; Li, Xiqing; Ort, Christoph; Mueller, Jochen F; Thai, Phong K

    2016-10-15

Sampling and analysis of wastewater from municipal wastewater treatment plants (WWTPs) has become a useful tool for understanding population exposure to chemicals. Both wastewater-based studies and catchment management and planning require information on the catchment population at the time of monitoring. Recently, a model was developed and calibrated using selected pharmaceutical and personal care products (PPCPs) measured in influent wastewater to estimate the population in different catchments in Australia. The present study aimed to evaluate the feasibility of applying this population estimation approach in China. Twenty-four hour composite influent samples were collected from 31 WWTPs in 17 cities, with catchment sizes from 200,000-3,450,000 people, representing all seven regions of China. The samples were analyzed for 19 PPCPs using liquid chromatography coupled to tandem mass spectrometry in direct injection mode. Eight chemicals were detected in more than 50% of the samples. Significant positive correlations were found between individual PPCP mass loads and population estimates provided by WWTP operators. Using the PPCP mass load modeling approach calibrated with WWTP operator data, we estimated the population size of each catchment with good agreement with WWTP operator values (between 50-200% for all sites and 75-125% for 23 of the 31 sites). Overall, despite much lower detection frequencies and relatively high heterogeneity in PPCP consumption across China, the model provided a good estimate of the population contributing to a given wastewater sample. Wastewater analysis could also provide an objective picture of PPCP consumption in China. Copyright © 2016 Elsevier B.V. All rights reserved.
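The core back-calculation behind such mass-load models can be sketched as follows; the concentration, flow, and per-capita excretion figures are hypothetical, and the published model additionally relies on calibration against operator-reported populations:

```python
def population_estimate(conc_ng_per_L, flow_ML_per_day, per_capita_mg_per_day):
    """Back-calculate catchment population from one chemical's mass load.
    conc (ng/L) * flow (ML/day) gives mg/day directly, since 1 ML = 1e6 L
    and 1 mg = 1e6 ng; dividing by per-capita excretion yields persons."""
    load_mg_per_day = conc_ng_per_L * flow_ML_per_day
    return load_mg_per_day / per_capita_mg_per_day

# Hypothetical: 2000 ng/L of a PPCP, 150 ML/day influent, 1.2 mg/person/day.
print(round(population_estimate(2000, 150, 1.2)))
```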

  9. Analysis of YBCO high temperature superconductor doped with silver nanoparticles and carbon nanotubes using Williamson-Hall and size-strain plot

    NASA Astrophysics Data System (ADS)

    Dadras, Sedigheh; Davoudiniya, Masoumeh

    2018-05-01

This paper sets out to investigate and compare the effects of Ag nanoparticle and carbon nanotube (CNT) doping on the mechanical properties of the Y1Ba2Cu3O7-δ (YBCO) high temperature superconductor. For this purpose, pure and doped YBCO samples were synthesized by the sol-gel method. Microstructural analysis of the samples was performed using X-ray diffraction (XRD). The crystallite size, lattice strain and stress of the pure and doped YBCO samples were estimated by modified forms of Williamson-Hall (W-H) analysis, namely the uniform deformation model (UDM) and the uniform deformation stress model (UDSM), and by the size-strain plot method (SSP). The results show that the crystallite size, lattice strain and stress of the YBCO samples decline with Ag nanoparticle and CNT doping.
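The uniform deformation model fits β·cosθ = Kλ/D + 4ε·sinθ as a straight line in sinθ, with the intercept giving crystallite size D and the slope giving strain ε. A minimal sketch on synthetic peak data (assumed D = 50 nm, ε = 2×10⁻³, and the Cu Kα wavelength; not the paper's measurements) recovers the inputs:

```python
import math

# Synthetic W-H (UDM) data generated from assumed D and epsilon.
K, lam = 0.9, 0.15406            # shape factor; Cu K-alpha wavelength in nm
D_true, eps_true = 50.0, 2e-3
thetas = [math.radians(t) for t in (15, 20, 25, 30, 35)]
betas = [(K * lam / D_true + 4 * eps_true * math.sin(t)) / math.cos(t)
         for t in thetas]

# Linear least squares of y = beta*cos(theta) against x = 4*sin(theta)
xs = [4 * math.sin(t) for t in thetas]
ys = [b * math.cos(t) for b, t in zip(betas, thetas)]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
print(K * lam / intercept, slope)  # recovered crystallite size (nm), strain
```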

  10. Application of the Zero-Order Reaction Rate Model and Transition State Theory to predict porous Ti6Al4V bending strength.

    PubMed

    Reig, L; Amigó, V; Busquets, D; Calero, J A; Ortiz, J L

    2012-08-01

Porous Ti6Al4V samples were produced by microsphere sintering. The Zero-Order Reaction Rate Model and Transition State Theory were used to model the sintering process and to estimate the bending strength of the porous samples developed. The evolution of the surface area during the sintering process was used to obtain sintering parameters (sintering constant, activation energy, frequency factor, constant of activation and Gibbs energy of activation). These were then correlated with the bending strength in order to obtain a simple model with which to estimate the evolution of the bending strength of the samples when the sintering temperature and time are modified: σ_Y = P + B·[ln(T·t) − ΔG_a/(R·T)]. Although the sintering parameters were obtained only for the microsphere sizes analysed here, the strength of intermediate sizes could easily be estimated following this model. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. A comparison of two sampling approaches for assessing the urban forest canopy cover from aerial photography.

    Treesearch

    Ucar Zennure; Pete Bettinger; Krista Merry; Jacek Siry; J.M. Bowker

    2016-01-01

    Two different sampling approaches for estimating urban tree canopy cover were applied to two medium-sized cities in the United States, in conjunction with two freely available remotely sensed imagery products. A random point-based sampling approach, which involved 1000 sample points, was compared against a plot/grid sampling (cluster sampling) approach that involved a...

  12. A feasibility study in adapting Shamos Bickel and Hodges Lehman estimator into T-Method for normalization

    NASA Astrophysics Data System (ADS)

    Harudin, N.; Jamaludin, K. R.; Muhtazaruddin, M. Nabil; Ramlie, F.; Muhamad, Wan Zuki Azman Wan

    2018-03-01

The T-Method is one of the techniques governed under the Mahalanobis Taguchi System, developed specifically for multivariate data prediction. Prediction using the T-Method is possible even with very limited sample sizes. Users of the T-Method must clearly understand the population data trend, since the method does not account for the effect of outliers. Outliers may cause apparent non-normality, under which classical methods break down. Robust parameter estimates exist that provide satisfactory results both when the data contain outliers and when they are free of them; among these are the robust location and scale estimators known as Shamos-Bickel (SB) and Hodges-Lehmann (HL), which can be used in place of the classical mean and standard deviation. Embedding these into the T-Method normalization stage may help enhance the accuracy of the T-Method as well as allow its robustness to be analysed. In the higher-sample-size case study, however, the T-Method had the lowest average error percentage (3.09%) on data with extreme outliers, while HL and SB had the lowest error percentage (4.67%) on data without extreme outliers, with only minimal differences relative to the T-Method. The prediction error trend was reversed in the lower-sample-size case study. The results show that with a minimum sample size, where outliers pose low risk, the T-Method performs better, and that at higher sample sizes with extreme outliers the T-Method also shows better prediction than the alternatives. For the case studies conducted in this research, normalization with the T-Method gives satisfactory results, and adapting HL and SB (or the normal mean and standard deviation) into it is not worthwhile, since it changes the error percentages only minimally. Normalization using the T-Method is still considered to carry lower risk with respect to outlier effects.
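The two robust estimators themselves can be sketched directly (one common variant; some definitions of the Hodges-Lehmann estimator also include self-pairs, and the Shamos normal-consistency constant is approximate):

```python
from itertools import combinations
from statistics import median

def hodges_lehmann(xs):
    """HL location estimate: median of all pairwise averages (Walsh averages
    over pairs i < j in this variant)."""
    return median((a + b) / 2 for a, b in combinations(xs, 2))

def shamos(xs):
    """Shamos scale estimate: median of all pairwise absolute differences,
    scaled by ~1.048 for consistency at the normal distribution."""
    return 1.048358 * median(abs(a - b) for a, b in combinations(xs, 2))

data = [2.1, 2.4, 2.2, 2.3, 9.9]  # one extreme outlier
print(hodges_lehmann(data), shamos(data))
```

The single outlier drags the classical mean to 3.78, while the HL estimate stays near the bulk of the data, which is the robustness property discussed above.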

  13. Study design and sampling intensity for demographic analyses of bear populations

    USGS Publications Warehouse

    Harris, R.B.; Schwartz, C.C.; Mace, R.D.; Haroldson, M.A.

    2011-01-01

The rate of population change through time (λ) is a fundamental element of a wildlife population's conservation status, yet estimating it with acceptable precision for bears is difficult. For studies that follow known (usually marked) bears, λ can be estimated during some defined time by applying either life-table or matrix projection methods to estimates of individual vital rates. Usually, however, confidence intervals surrounding the estimate are broader than one would like. Using an estimator suggested by Doak et al. (2005), we explored the precision to be expected in λ from demographic analyses of typical grizzly (Ursus arctos) and American black (U. americanus) bear data sets. We also evaluated some trade-offs among vital rates in sampling strategies. Confidence intervals around λ were more sensitive to adding to the duration of a short (e.g., 3 yrs) than a long (e.g., 10 yrs) study, and more sensitive to adding additional bears to studies with small (e.g., 10 adult females/yr) than large (e.g., 30 adult females/yr) sample sizes. Confidence intervals of λ projected using process-only variance of vital rates were only slightly smaller than those projected using total variances of vital rates. Under sampling constraints typical of most bear studies, it may be more efficient to invest additional resources into monitoring recruitment and juvenile survival rates of females already a part of the study, than to simply increase the sample size of study females. © 2011 International Association for Bear Research and Management.

  14. Are Antarctic minke whales unusually abundant because of 20th century whaling?

    PubMed

    Ruegg, Kristen C; Anderson, Eric C; Scott Baker, C; Vant, Murdoch; Jackson, Jennifer A; Palumbi, Stephen R

    2010-01-01

    Severe declines in megafauna worldwide illuminate the role of top predators in ecosystem structure. In the Antarctic, the Krill Surplus Hypothesis posits that the killing of more than 2 million large whales led to competitive release for smaller krill-eating species like the Antarctic minke whale. If true, the current size of the Antarctic minke whale population may be unusually high as an indirect result of whaling. Here, we estimate the long-term population size of the Antarctic minke whale prior to whaling by sequencing 11 nuclear genetic markers from 52 modern samples purchased in Japanese meat markets. We use coalescent simulations to explore the potential influence of population substructure and find that even though our samples are drawn from a limited geographic area, our estimate reflects ocean-wide genetic diversity. Using Bayesian estimates of the mutation rate and coalescent-based analyses of genetic diversity across loci, we calculate the long-term population size of the Antarctic minke whale to be 670,000 individuals (95% confidence interval: 374,000-1,150,000). Our estimate of long-term abundance is similar to, or greater than, contemporary abundance estimates, suggesting that managing Antarctic ecosystems under the assumption that Antarctic minke whales are unusually abundant is not warranted.

  15. Estimation and applications of size-based distributions in forestry

    Treesearch

    Jeffrey H. Gove

    2003-01-01

    Size-based distributions arise in several contexts in forestry and ecology. Simple power relationships (e.g., basal area and diameter at breast height) between variables are one such area of interest arising from a modeling perspective. Another, probability proportional to size sampline (PPS), is found in the most widely used methods for sampling standing or dead and...

  16. Catching ghosts with a coarse net: use and abuse of spatial sampling data in detecting synchronization

    PubMed Central

    2017-01-01

    Synchronization of population dynamics in different habitats is a frequently observed phenomenon. A common mathematical tool to reveal synchronization is the (cross)correlation coefficient between time courses of values of the population size of a given species where the population size is evaluated from spatial sampling data. The corresponding sampling net or grid is often coarse, i.e. it does not resolve all details of the spatial configuration, and the evaluation error—i.e. the difference between the true value of the population size and its estimated value—can be considerable. We show that this estimation error can make the value of the correlation coefficient very inaccurate or even irrelevant. We consider several population models to show that the value of the correlation coefficient calculated on a coarse sampling grid rarely exceeds 0.5, even if the true value is close to 1, so that the synchronization is effectively lost. We also observe ‘ghost synchronization’ when the correlation coefficient calculated on a coarse sampling grid is close to 1 but in reality the dynamics are not correlated. Finally, we suggest a simple test to check the sampling grid coarseness and hence to distinguish between the true and artifactual values of the correlation coefficient. PMID:28202589
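
    The core effect described here, that evaluation error on a coarse sampling net deflates the correlation coefficient, can be reproduced with a minimal simulation. This is a hedged sketch, not the authors' population models: two habitats share identical mean dynamics, each grid cell also carries independent local noise, and only every 16th cell in each direction is observed.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 64                     # time steps; fine grid is n x n cells

# Two habitats with perfectly synchronized mean dynamics plus
# independent fine-scale spatial noise in each cell.
common = rng.normal(0.0, 1.0, T)
pop_a = 10.0 + common[:, None, None] + rng.normal(0.0, 5.0, (T, n, n))
pop_b = 10.0 + common[:, None, None] + rng.normal(0.0, 5.0, (T, n, n))

true_a, true_b = pop_a.sum(axis=(1, 2)), pop_b.sum(axis=(1, 2))
r_true = np.corrcoef(true_a, true_b)[0, 1]

# Coarse sampling net: observe only every 16th cell in each direction,
# then scale up to estimate the total population size.
est_a = pop_a[:, ::16, ::16].sum(axis=(1, 2)) * 16**2
est_b = pop_b[:, ::16, ::16].sum(axis=(1, 2)) * 16**2
r_coarse = np.corrcoef(est_a, est_b)[0, 1]

# The evaluation error deflates the apparent synchronization.
print(f"true r = {r_true:.2f}, coarse-grid r = {r_coarse:.2f}")
```

    With these (assumed) noise levels the true totals are almost perfectly correlated while the coarse-grid estimates are not, illustrating how synchronization is "lost" to sampling error.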

  17. Estimation of the Human Extrathoracic Deposition Fraction of Inhaled Particles Using a Polyurethane Foam Collection Substrate in an IOM Sampler.

    PubMed

    Sleeth, Darrah K; Balthaser, Susan A; Collingwood, Scott; Larson, Rodney R

    2016-03-07

    Extrathoracic deposition of inhaled particles (i.e., in the head and throat) is an important exposure route for many hazardous materials. Current best practices for exposure assessment of aerosols in the workplace involve particle size selective sampling methods based on particle penetration into the human respiratory tract (i.e., inhalable or respirable sampling). However, the International Organization for Standardization (ISO) has recently adopted particle deposition sampling conventions (ISO 13138), including conventions for extrathoracic (ET) deposition into the anterior nasal passage (ET₁) and the posterior nasal and oral passages (ET₂). For this study, polyurethane foam was used as a collection substrate inside an inhalable aerosol sampler to provide an estimate of extrathoracic particle deposition. Aerosols of fused aluminum oxide (five sizes, 4.9 µm-44.3 µm) were used as a test dust in a low speed (0.2 m/s) wind tunnel. Samplers were placed on a rotating mannequin inside the wind tunnel to simulate orientation-averaged personal sampling. Collection efficiency data for the foam insert matched well to the extrathoracic deposition convention for the particle sizes tested. The concept of using a foam insert to match a particle deposition sampling convention was explored in this study and shows promise for future use as a sampling device.

  19. Population entropies estimates of proteins

    NASA Astrophysics Data System (ADS)

    Low, Wai Yee

    2017-05-01

The Shannon entropy equation provides a way to estimate the variability of amino acid sequences in a multiple sequence alignment of proteins. Knowledge of protein variability is useful in many areas such as vaccine design, identification of antibody binding sites, and exploration of protein 3D structural properties. In cases where the population entropies of a protein are of interest but only a small sample size can be obtained, a method based on linear regression and random subsampling can be used to estimate the population entropy. This method is useful for comparisons of entropies where the actual sequence counts differ and thus correction for alignment size bias is needed. In the current work, an R-based package named EntropyCorrect that enables estimation of population entropy is presented, and an empirical study of how well this new algorithm performs on simulated datasets with various combinations of population and sample sizes is discussed. The package is available at https://github.com/lloydlow/EntropyCorrect. This article, which was originally published online on 12 May 2017, contained an error in Eq. (1), where the summation sign was missing. The corrected equation appears in the Corrigendum attached to the pdf.
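
    The EntropyCorrect package itself is written in R; the Python sketch below only illustrates the general idea described in the abstract (regress subsample entropies on 1/n and take the intercept as the large-sample limit), under the assumption that the plug-in estimator's bias is approximately linear in 1/n. Function names and parameters here are illustrative, not the package's API.

```python
import math
import random
from collections import Counter

def shannon_entropy(symbols):
    """Plug-in Shannon entropy (nats) of a list of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())

def extrapolated_entropy(column, sizes=(20, 40, 80, 160), reps=200, seed=1):
    """Estimate population entropy by regressing mean subsample entropy on
    1/n and taking the intercept (the n -> infinity limit), since the
    plug-in estimator's bias is approximately -(K-1)/(2n), i.e. linear in 1/n."""
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        # subsample with replacement for simplicity
        h = sum(shannon_entropy(rng.choices(column, k=n)) for _ in range(reps)) / reps
        xs.append(1.0 / n)
        ys.append(h)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)   # ordinary least squares
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - slope * mx                          # intercept at 1/n = 0

# A skewed 4-letter "alignment column" whose entropy is about 1.28 nats.
column = list("A" * 40 + "C" * 30 + "G" * 20 + "T" * 10)
print(round(extrapolated_entropy(column), 3))
```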

  20. Undersampling power-law size distributions: effect on the assessment of extreme natural hazards

    USGS Publications Warehouse

    Geist, Eric L.; Parsons, Thomas E.

    2014-01-01

The effect of undersampling on estimating the size of extreme natural hazards from historical data is examined. Tests using synthetic catalogs indicate that the tail of an empirical size distribution sampled from a pure Pareto probability distribution can range from having one to several unusually large events to appearing depleted, relative to the parent distribution. Both of these effects are artifacts caused by limited catalog length. It is more difficult to diagnose the artificially depleted empirical distributions, since one expects that a pure Pareto distribution is physically limited in some way. Using maximum likelihood methods and the method of moments, we estimate the power-law exponent and the corner size parameter of tapered Pareto distributions for several natural hazard examples: tsunamis, floods, and earthquakes. Each of these examples has varying catalog lengths and measurement thresholds, relative to the largest event sizes. In many cases where there are only several orders of magnitude between the measurement threshold and the largest events, joint two-parameter estimation techniques are necessary to account for estimation dependence between the power-law scaling exponent and the corner size parameter. Results indicate that whereas the corner size parameter of a tapered Pareto distribution can be estimated, its upper confidence bound cannot be determined and the estimate itself is often unstable with time. Correspondingly, one cannot statistically reject a pure Pareto null hypothesis using natural hazard catalog data. Although physical limits to the hazard source size and attenuation mechanisms from source to site constrain the maximum hazard size, historical data alone often cannot reliably determine the corner size parameter. Probabilistic assessments incorporating theoretical constraints on source size and propagation effects are preferred over deterministic assessments of extreme natural hazards based on historical data.
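
    For the pure Pareto case, the maximum likelihood estimate of the power-law exponent has a closed form. The sketch below applies that textbook estimator to a synthetic catalog; it is not the paper's joint two-parameter procedure for the tapered Pareto distribution.

```python
import numpy as np

def pareto_mle(sizes, x_min):
    """Maximum likelihood estimate of the Pareto (power-law) exponent
    beta for event sizes x >= x_min: beta_hat = n / sum(ln(x_i / x_min))."""
    x = np.asarray(sizes, dtype=float)
    x = x[x >= x_min]
    return len(x) / np.log(x / x_min).sum()

# Synthetic "catalog" drawn from a pure Pareto with beta = 1 via
# inverse-transform sampling; short catalogs undersample the tail.
rng = np.random.default_rng(42)
beta_true, x_min = 1.0, 1.0
catalog = x_min * (1.0 - rng.uniform(size=500)) ** (-1.0 / beta_true)
print(round(pareto_mle(catalog, x_min), 2))
```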

  1. Estimating and comparing microbial diversity in the presence of sequencing errors

    PubMed Central

    Chiu, Chun-Huo

    2016-01-01

    Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. 
This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
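
    The Hill numbers referred to above have a simple closed form, qD = (sum of p_i^q)^(1/(1-q)), with the q = 1 case taken as the limit exp(Shannon entropy). Below is a minimal plug-in implementation for illustration only; it applies no singleton correction, unlike the estimators developed in the paper.

```python
import math

def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q from abundance
    counts: q=0 richness, q=1 exp(Shannon entropy), q=2 inverse Simpson."""
    n = sum(counts)
    p = [c / n for c in counts if c > 0]
    if q == 1:  # limit case: exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

counts = [50, 30, 15, 4, 1]  # a hypothetical taxa abundance distribution
profile = {q: round(hill_number(counts, q), 2) for q in (0, 1, 2)}
print(profile)
```

    For an uneven community like this, the profile decreases with q, reflecting the growing emphasis on common taxa.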

  2. Optically-derived estimates of phytoplankton size class and taxonomic group biomass in the Eastern Subarctic Pacific Ocean

    NASA Astrophysics Data System (ADS)

    Zeng, Chen; Rosengard, Sarah Z.; Burt, William; Peña, M. Angelica; Nemcek, Nina; Zeng, Tao; Arrigo, Kevin R.; Tortell, Philippe D.

    2018-06-01

We evaluate several algorithms for the estimation of phytoplankton size class (PSC) and functional type (PFT) biomass from ship-based optical measurements in the Subarctic Northeast Pacific Ocean. Using underway measurements of particulate absorption and backscatter in surface waters, we derived estimates of PSC/PFT based on chlorophyll-a concentrations (Chl-a), particulate absorption spectra and the wavelength dependence of particulate backscatter. Optically-derived [Chl-a] and phytoplankton absorption measurements were validated against discrete calibration samples, while the derived PSC/PFT estimates were validated using size-fractionated Chl-a measurements and HPLC analysis of diagnostic photosynthetic pigments (DPA). Our results show that PSC/PFT algorithms based on [Chl-a] and particulate absorption spectra performed significantly better than the backscatter slope approach. These two more successful algorithms yielded estimates of phytoplankton size classes that agreed well with HPLC-derived DPA estimates (RMSE = 12.9% and 16.6%, respectively) across a range of hydrographic and productivity regimes. Moreover, the [Chl-a] algorithm produced PSC estimates that agreed well with size-fractionated [Chl-a] measurements, and estimates of the biomass of specific phytoplankton groups that were consistent with values derived from HPLC. Based on these results, we suggest that simple [Chl-a] measurements should be more fully exploited to improve the classification of phytoplankton assemblages in the Northeast Pacific Ocean.

  3. High throughput nonparametric probability density estimation.

    PubMed

    Farmer, Jenny; Jacobs, Donald

    2018-01-01

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under- and over-fitting the data, as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.

  5. Estimating individual glomerular volume in the human kidney: clinical perspectives

    PubMed Central

    Puelles, Victor G.; Zimanyi, Monika A.; Samuel, Terence; Hughson, Michael D.; Douglas-Denton, Rebecca N.; Bertram, John F.

    2012-01-01

    Background. Measurement of individual glomerular volumes (IGV) has allowed the identification of drivers of glomerular hypertrophy in subjects without overt renal pathology. This study aims to highlight the relevance of IGV measurements with possible clinical implications and determine how many profiles must be measured in order to achieve stable size distribution estimates. Methods. We re-analysed 2250 IGV estimates obtained using the disector/Cavalieri method in 41 African and 34 Caucasian Americans. Pooled IGV analysis of mean and variance was conducted. Monte-Carlo (Jackknife) simulations determined the effect of the number of sampled glomeruli on mean IGV. Lin’s concordance coefficient (RC), coefficient of variation (CV) and coefficient of error (CE) measured reliability. Results. IGV mean and variance increased with overweight and hypertensive status. Superficial glomeruli were significantly smaller than juxtamedullary glomeruli in all subjects (P < 0.01), by race (P < 0.05) and in obese individuals (P < 0.01). Subjects with multiple chronic kidney disease (CKD) comorbidities showed significant increases in IGV mean and variability. Overall, mean IGV was particularly reliable with nine or more sampled glomeruli (RC > 0.95, <5% difference in CV and CE). These observations were not affected by a reduced sample size and did not disrupt the inverse linear correlation between mean IGV and estimated total glomerular number. Conclusions. Multiple comorbidities for CKD are associated with increased IGV mean and variance within subjects, including overweight, obesity and hypertension. Zonal selection and the number of sampled glomeruli do not represent drawbacks for future longitudinal biopsy-based studies of glomerular size and distribution. PMID:21984554

  6. Sampling benthic macroinvertebrates in a large flood-plain river: Considerations of study design, sample size, and cost

    USGS Publications Warehouse

    Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.

    1998-01-01

Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm²) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpson's) increased as sample area per site increased. The backwater area had higher diversity than the channel border area. 
The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.
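
    The kind of power analysis described (alpha = 0.05, beta = 0.20) can be approximated with the standard two-sample normal formula. The numbers below are hypothetical illustrations, not the study's variance estimates.

```python
import math
from statistics import NormalDist

def sites_per_stratum(delta, sd, alpha=0.05, beta=0.20):
    """Sites per stratum needed to detect a difference in mean abundance of
    size delta between two strata (two-sample normal approximation):
    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sd / delta)^2."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(1 - beta)
    return math.ceil(2 * ((za + zb) * sd / delta) ** 2)

# Hypothetical: detect a difference of 100 organisms/m^2 when the
# within-stratum standard deviation is 105 organisms/m^2.
print(sites_per_stratum(100, 105))
```

    With a standard deviation roughly equal to the detectable difference, this formula lands in the high teens of sites per stratum, consistent in spirit with the 18-site result above.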

  7. A comparison of small-area estimation techniques to estimate selected stand attributes using LiDAR-derived auxiliary variables

    Treesearch

    Michael E. Goerndt; Vicente J. Monleon; Hailemariam Temesgen

    2011-01-01

    One of the challenges often faced in forestry is the estimation of forest attributes for smaller areas of interest within a larger population. Small-area estimation (SAE) is a set of techniques well suited to estimation of forest attributes for small areas in which the existing sample size is small and auxiliary information is available. Selected SAE methods were...

  8. Innovative recruitment using online networks: lessons learned from an online study of alcohol and other drug use utilizing a web-based, respondent-driven sampling (webRDS) strategy.

    PubMed

    Bauermeister, José A; Zimmerman, Marc A; Johns, Michelle M; Glowacki, Pietreck; Stoddard, Sarah; Volz, Erik

    2012-09-01

    We used a web version of Respondent-Driven Sampling (webRDS) to recruit a sample of young adults (ages 18-24) and examined whether this strategy would result in alcohol and other drug (AOD) prevalence estimates comparable to national estimates (National Survey on Drug Use and Health [NSDUH]). We recruited 22 initial participants (seeds) via Facebook to complete a web survey examining AOD risk correlates. Sequential, incentivized recruitment continued until our desired sample size was achieved. After correcting for webRDS clustering effects, we contrasted our AOD prevalence estimates (past 30 days) to NSDUH estimates by comparing the 95% confidence intervals of prevalence estimates. We found comparable AOD prevalence estimates between our sample and NSDUH for the past 30 days for alcohol, marijuana, cocaine, Ecstasy (3,4-methylenedioxymethamphetamine, or MDMA), and hallucinogens. Cigarette use was lower than NSDUH estimates. WebRDS may be a suitable strategy to recruit young adults online. We discuss the unique strengths and challenges that may be encountered by public health researchers using webRDS methods.
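
    The comparison of prevalence estimates via 95% confidence intervals can be sketched as follows. Note this sketch uses simple Wald intervals on hypothetical counts; the study corrected variances for webRDS clustering effects, which is omitted here.

```python
import math

def wald_ci(k, n, z=1.96):
    """95% Wald confidence interval for a prevalence estimate k/n."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

def overlaps(ci1, ci2):
    """Crude comparability check used when contrasting survey estimates:
    do the two 95% confidence intervals overlap?"""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]

# Hypothetical past-30-day prevalences: webRDS sample vs. a benchmark survey.
sample_ci = wald_ci(180, 600)        # 30.0% in the online sample
benchmark_ci = wald_ci(2800, 10000)  # 28.0% in the benchmark
print(overlaps(sample_ci, benchmark_ci))
```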

  9. Identification of dust storm origin in South -West of Iran.

    PubMed

    Broomandi, Parya; Dabir, Bahram; Bonakdarpour, Babak; Rashidi, Yousef

    2017-01-01

Deserts are the main sources of emitted dust and are highly responsive to wind erosion. Low soil moisture content and lack of vegetation cover lead to the release of fine particles. One of the semi-arid bare lands of Iran, located in Khoozestan province in the South-West of the country, was selected to investigate sand and dust storm potential. This paper focused on the meteorological parameters of the sampling site, their changes and the relationship between these changes and dust storm occurrence; estimation of the Reconnaissance Drought Index (RDI); the Atterberg limits of soil samples and their relation to soil erodibility; the chemical composition and size distribution of soil and airborne dust samples; and estimation of the vertical mass flux by COMSALT, considering the effects of the Saffman force and interparticle cohesion forces, during the warm period (April-September) of 2010. Chemical compositions were measured with X-ray fluorescence, atomic absorption spectrophotometry and X-ray diffraction. Particle size distribution analysis was conducted using laser particle size and sieve techniques. There was a strong negative correlation between dust storm occurrence and annual and seasonal rainfall and relative humidity, and a strong positive correlation between annual and seasonal maximum temperature and dust storm frequency. Estimation of the standardized RDI over the studied period showed an extremely dry condition. The particle size distribution and soil consistency results revealed the weak structure of the soil. X-ray diffraction analyses of soil and dust samples showed that the soil mineralogy was dominated mainly by quartz and calcite. X-ray fluorescence analyses indicated that the most important major oxide compositions of the soil and airborne dust samples were SiO₂, Al₂O₃, CaO, MgO, Na₂O and Fe₂O₃, with similar percentages in soil and dust samples. 
Estimation of enrichment factors (EF) for all studied trace elements in soil samples showed Br, Cl, Mo, S, Zn and Hg with EF values higher than 10. These findings suggest a possible correlation between the degree of anthropogenic soil pollution and the remains of the Iraq-Iran war. The results, together with the vertical mass fluxes computed by COMSALT, demonstrate the sand and dust storm emission potential of this area.

  10. The effects of sample scheduling and sample numbers on estimates of the annual fluxes of suspended sediment in fluvial systems

    USGS Publications Warehouse

    Horowitz, Arthur J.; Clarke, Robin T.; Merten, Gustavo Henrique

    2015-01-01

Since the 1970s, there has been both continuing and growing interest in developing accurate estimates of the annual fluvial transport (fluxes and loads) of suspended sediment and sediment-associated chemical constituents. This study provides an evaluation of the effects of manual sample numbers (from 4 to 12 year⁻¹) and sample scheduling (random-based, calendar-based and hydrology-based) on the precision, bias and accuracy of annual suspended sediment flux estimates. The evaluation is based on data from selected US Geological Survey daily suspended sediment stations in the USA and covers basins ranging in area from just over 900 km² to nearly 2 million km² and annual suspended sediment fluxes ranging from about 4 kt year⁻¹ to about 200 Mt year⁻¹. The results appear to indicate that there is a scale effect for random-based and calendar-based sampling schemes, with larger sample numbers required as basin size decreases. All the sampling schemes evaluated display some level of positive (overestimates) or negative (underestimates) bias. The study further indicates that hydrology-based sampling schemes are likely to generate the most accurate annual suspended sediment flux estimates with the fewest number of samples, regardless of basin size. This type of scheme seems most appropriate when the determination of suspended sediment concentrations, sediment-associated chemical concentrations, annual suspended sediment and annual suspended sediment-associated chemical fluxes only represent a few of the parameters of interest in multidisciplinary, multiparameter monitoring programmes. 
The results are just as applicable to the calibration of autosamplers/suspended sediment surrogates currently used to measure/estimate suspended sediment concentrations and ultimately, annual suspended sediment fluxes, because manual samples are required to adjust the sample data/measurements generated by these techniques so that they provide depth-integrated and cross-sectionally representative data. 

  11. Designing image segmentation studies: Statistical power, sample size and reference standard quality.

    PubMed

    Gibson, Eli; Hu, Yipeng; Huisman, Henkjan J; Barratt, Dean C

    2017-12-01

    Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and on the quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
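
    The paper derives a power formula specific to segmentation studies and reference standard quality; as a rough point of comparison only, a textbook paired-design normal approximation for detecting a mean per-subject accuracy difference delta is n = ((z_{1-alpha/2} + z_power) * sd_diff / delta)^2. The numbers below are hypothetical, not drawn from the PROMISE12 case study.

```python
import math
from statistics import NormalDist

def subjects_needed(delta, sd_diff, alpha=0.05, power=0.80):
    """Subjects needed for a paired comparison of two algorithms'
    per-subject accuracies (proportion of voxels matching the reference),
    using the normal-approximation sample-size formula."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return math.ceil(((za + zb) * sd_diff / delta) ** 2)

# Hypothetical: detect a 1% accuracy difference when per-subject
# accuracy differences have a standard deviation of 2.5%.
print(subjects_needed(0.01, 0.025))
```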

  12. Estimation of PAHs dry deposition and BaP toxic equivalency factors (TEFs) study at Urban, Industry Park and rural sampling sites in central Taiwan, Taichung.

    PubMed

    Fang, Guor-Cheng; Chang, Kuan-Foo; Lu, Chungsying; Bai, Hsunling

    2004-05-01

The concentrations of polycyclic aromatic hydrocarbons (PAHs) in the gas phase and bound to particles were measured simultaneously at industrial (INDUSTRY), urban (URBAN), and rural (RURAL) sampling sites in Taichung, Taiwan. The PAH concentrations, size distributions, estimated dry deposition fluxes and associated health risks in the ambient air of central Taiwan were discussed in this study. Total PAH concentrations at the INDUSTRY, URBAN, and RURAL sampling sites were found to be 1650 +/- 1240, 1220 +/- 520, and 831 +/- 427 ng/m³, respectively. The results indicated that PAH concentrations were higher at the INDUSTRY and URBAN sampling sites than at the RURAL sampling site because of the greater number of industrial processes, traffic exhausts and human activities there. The estimated dry deposition and size distributions of PAHs were also studied; the estimated dry deposition fluxes of total PAHs were 58.5, 48.8, and 38.6 µg/m²/day at INDUSTRY, URBAN, and RURAL, respectively. The BaP equivalency results indicated that the health risk of gas-phase PAHs was higher than that of the particle phase at all three sampling sites. However, compared with BaP equivalency results from studies conducted in factories, this study indicated that the health risk of PAHs in the ambient air of central Taiwan was acceptable.

  13. The Ecological Genetics of Introduced Populations of the Giant Toad BUFO MARINUS. II. Effective Population Size

    PubMed Central

    Easteal, Simon

    1985-01-01

    The allele frequencies are described at ten polymorphic enzyme loci (of a total of 22 loci sampled) in 15 populations of the neotropical giant toad, Bufo marinus, introduced to Hawaii and Australia in the 1930s. The history of establishment of the ten populations is described and used as a framework for the analysis of allele frequency variances. The variances are used to determine the effective sizes of the populations. The estimates obtained (390 and 346) are reasonably precise, homogeneous between localities and much smaller than estimates of neighborhood size obtained previously using ecological methods. This discrepancy is discussed, and it is concluded that the estimates obtained here using genetic methods are the more reliable. PMID:3922852
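The variance-based approach mentioned here can be illustrated with the standard temporal method: a drift statistic Fc accumulated over t generations is corrected for sampling noise at both time points and inverted to give Ne. This is a simplified sketch with made-up allele frequencies, not Easteal's exact estimator:

```python
def temporal_ne(p0, pt, t_gens, s0, st):
    """Variance effective size from allele-frequency change between two
    samples t_gens generations apart, using a Nei/Tajima-style drift
    statistic Fc corrected for sampling at both time points."""
    fc_vals = []
    for a, b in zip(p0, pt):
        denom = (a + b) / 2 - a * b
        if denom > 0:
            fc_vals.append((a - b) ** 2 / denom)
    fc = sum(fc_vals) / len(fc_vals)
    corrected = fc - 1 / (2 * s0) - 1 / (2 * st)
    return t_gens / (2 * corrected) if corrected > 0 else float("inf")

# hypothetical frequencies at two loci, 10 generations apart,
# 50 individuals sampled at each time point
ne = temporal_ne([0.5, 0.4], [0.6, 0.3], t_gens=10, s0=50, st=50)
```

When the observed drift is no larger than the sampling noise, the corrected Fc is non-positive and the estimate is reported as unbounded.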

  14. Individualized Statistical Learning from Medical Image Databases: Application to Identification of Brain Lesions

    PubMed Central

    Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos

    2014-01-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
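The core idea of scoring deviation from a normative PCA model can be sketched in a few lines: fit principal components to healthy data, cap the number of retained components well below the sample size (a crude stand-in for the paper's "estimability" criterion), and flag test samples with large reconstruction residuals. The data, dimensions, and test points below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normative "healthy" data: 200 samples varying mainly along one axis.
normals = (rng.normal(size=(200, 1)) @ np.array([[1.0, 0.5, 0.2]])
           + 0.05 * rng.normal(size=(200, 3)))
mean = normals.mean(axis=0)
X = normals - mean

# Retain k components, kept small relative to the sample size
# (a crude stand-in for the paper's "estimability" criterion).
k = 1
_, _, vt = np.linalg.svd(X, full_matrices=False)
basis = vt[:k]

def abnormality(x):
    """Reconstruction residual relative to the normative subspace."""
    d = np.asarray(x, dtype=float) - mean
    recon = (d @ basis.T) @ basis
    return float(np.linalg.norm(d - recon))

inlier = [1.0, 0.5, 0.2]     # lies along the normative direction
outlier = [0.0, 3.0, -3.0]   # deviates from it
```

The paper iterates this idea over many lower-dimensional subspaces with target-specific feature selection; the single global PCA above is only the simplest version.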

  15. A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.

    PubMed

    Khondoker, Mizanur; Dobson, Richard; Skirrow, Caroline; Simmons, Andrew; Stahl, Daniel

    2016-10-01

    Recent literature on the comparison of machine learning methods has raised questions about the neutrality, unbiasedness and utility of many comparative studies. Reporting of results on favourable datasets and sampling error in the estimated performance measures based on single samples are thought to be the major sources of bias in such comparisons. Better performance in one or a few instances does not necessarily imply better performance on average or at the population level, and simulation studies may be a better alternative for objectively comparing the performance of machine learning algorithms. We compare the classification performance of a number of important and widely used machine learning algorithms, namely Random Forests (RF), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbour (kNN). Using massively parallel processing on high-performance supercomputers, we compare the generalisation errors at various combinations of levels of several factors: number of features, training sample size, biological variation, experimental variation, effect size, replication and correlation between features. For a smaller number of correlated features (not exceeding approximately half the sample size), LDA was found to be the method of choice in terms of average generalisation error as well as stability (precision) of error estimates. SVM (with RBF kernel) outperforms LDA as well as RF and kNN by a clear margin as the feature set gets larger, provided the sample size is not too small (at least 20). The performance of kNN also improves as the number of features grows, surpassing that of LDA and RF unless the data variability is too high and/or effect sizes are too small. RF was found to outperform only kNN, in some instances where the data are more variable and have smaller effect sizes, in which cases it also provides more stable error estimates than kNN and LDA. Applications to a number of real datasets supported the findings from the simulation study. © The Author(s) 2013.
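The simulation-based comparison strategy described above can be miniaturized. The toy study below simulates two Gaussian classes at a chosen effect size and feature count and averages the test error of a 1-nearest-neighbour classifier over replicates; the specific settings are arbitrary assumptions, and the paper's full factorial design and algorithm set (RF, SVM, LDA, kNN) are far richer:

```python
import random
import statistics

random.seed(1)

def simulate(n_train, n_test, effect, n_feat):
    """One replicate: two Gaussian classes whose means differ by
    `effect` on every feature."""
    def draw(n):
        data = []
        for _ in range(n):
            y = random.randint(0, 1)
            x = [random.gauss(effect * y, 1.0) for _ in range(n_feat)]
            data.append((x, y))
        return data
    return draw(n_train), draw(n_test)

def nn1_error(train, test):
    """Test error of a 1-nearest-neighbour classifier."""
    errors = 0
    for x, y in test:
        nearest = min(train, key=lambda t: sum((a - b) ** 2
                                               for a, b in zip(t[0], x)))
        errors += nearest[1] != y
    return errors / len(test)

# average generalisation error over independent replicates
mean_err = statistics.mean(
    nn1_error(*simulate(40, 50, effect=1.5, n_feat=5))
    for _ in range(20))
```

Averaging over replicates is exactly what protects such comparisons from the single-sample sampling error the abstract warns about.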

  16. Statistical inference involving binomial and negative binomial parameters.

    PubMed

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
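A quick Monte Carlo sketch (not one of the paper's tests) shows why inference mixing the two sampling schemes needs care: the natural negative binomial estimate 1/N from a single run of trials up to the first success is markedly biased upward (its expectation is -p ln p / (1 - p)), whereas the binomial estimate after the first success is unbiased. All numbers below are illustrative assumptions:

```python
import random
import statistics

random.seed(7)

def trials_to_first_success(p):
    """Number of Bernoulli(p) trials up to and including the first
    success -- i.e. negative binomial (geometric) sampling with r = 1."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

p_true = 0.3
# Estimate p by negative binomial sampling: p_hat = 1 / N.
neg_binom_est = statistics.mean(
    1 / trials_to_first_success(p_true) for _ in range(20000))
# Estimate p by binomial sampling: 50 trials after the first success.
binom_est = statistics.mean(
    sum(random.random() < p_true for _ in range(50)) / 50
    for _ in range(2000))
```

For p = 0.3 the expectation of 1/N is about 0.52, which is why tests combining both estimates cannot simply treat them symmetrically.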

  17. Estimating snow leopard population abundance using photography and capture-recapture techniques

    USGS Publications Warehouse

    Jackson, R.M.; Roe, J.D.; Wangchuk, R.; Hunter, D.O.

    2006-01-01

    Conservation and management of snow leopards (Uncia uncia) has largely relied on anecdotal evidence and presence-absence data due to their cryptic nature and the difficult terrain they inhabit. These methods generally lack the scientific rigor necessary to accurately estimate population size and monitor trends. We evaluated the use of photography in capture-mark-recapture (CMR) techniques for estimating snow leopard population abundance and density within Hemis National Park, Ladakh, India. We placed infrared camera traps along actively used travel paths, scent-sprayed rocks, and scrape sites within 16- to 30-km2 sampling grids in successive winters during January and March 2003-2004. We used head-on, oblique, and side-view camera configurations to obtain snow leopard photographs at varying body orientations. We calculated snow leopard abundance estimates using the program CAPTURE. We obtained a total of 66 and 49 snow leopard captures resulting in 8.91 and 5.63 individuals per 100 trap-nights during 2003 and 2004, respectively. We identified snow leopards based on the distinct pelage patterns located primarily on the forelimbs, flanks, and dorsal surface of the tail. Capture probabilities ranged from 0.33 to 0.67. Density estimates ranged from 8.49 (SE = 0.22) individuals per 100 km2 in 2003 to 4.45 (SE = 0.16) in 2004. We believe the density disparity between years is attributable to different trap density and placement rather than to an actual decline in population size. Our results suggest that photographic capture-mark-recapture sampling may be a useful tool for monitoring demographic patterns. However, we believe a larger sample size would be necessary for generating a statistically robust estimate of population density and abundance based on CMR models.
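The abundance estimates in this study come from closed-population CMR models fitted in program CAPTURE, which are not reproduced here. As a minimal illustration of the underlying mark-recapture logic, the two-session Chapman (bias-corrected Lincoln-Petersen) estimator with made-up capture counts is:

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen abundance estimator:
    n1 animals marked in session 1, n2 captured in session 2,
    m2 of which were recaptures."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# hypothetical counts, not the Hemis National Park data
n_hat = chapman_estimate(n1=10, n2=12, m2=8)
```

Multi-occasion models such as those in CAPTURE generalize this idea while allowing capture probabilities to vary, which matters at the small sample sizes the authors caution about.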

  18. Multiple estimates of effective population size for monitoring a long-lived vertebrate: An application to Yellowstone grizzly bears

    USGS Publications Warehouse

    Kamath, Pauline L.; Haroldson, Mark A.; Luikart, Gordon; Paetkau, David; Whitman, Craig L.; van Manen, Frank T.

    2015-01-01

    Effective population size (Ne) is a key parameter for monitoring the genetic health of threatened populations because it reflects a population's evolutionary potential and risk of extinction due to genetic stochasticity. However, its application to wildlife monitoring has been limited because it is difficult to measure in natural populations. The isolated and well-studied population of grizzly bears (Ursus arctos) in the Greater Yellowstone Ecosystem provides a rare opportunity to examine the usefulness of different Ne estimators for monitoring. We genotyped 729 Yellowstone grizzly bears using 20 microsatellites and applied three single-sample estimators to examine contemporary trends in generation interval (GI), effective number of breeders (Nb) and Ne during 1982–2007. We also used multisample methods to estimate variance (NeV) and inbreeding Ne (NeI). Single-sample estimates revealed positive trajectories, with over a fourfold increase in Ne (≈100 to 450) and near doubling of the GI (≈8 to 14) from the 1980s to 2000s. NeV (240–319) and NeI (256) were comparable with the harmonic mean single-sample Ne (213) over the time period. Reanalysing historical data, we found NeV increased from ≈80 in the 1910s–1960s to ≈280 in the contemporary population. The estimated ratio of effective to total census size (Ne/Nc) was stable and high (0.42–0.66) compared to previous brown bear studies. These results support independent demographic evidence for Yellowstone grizzly bear population growth since the 1980s. They further demonstrate how genetic monitoring of Ne can complement demographic-based monitoring of Nc and vital rates, providing a valuable tool for wildlife managers.

  19. Sampling scales define occupancy and underlying occupancy-abundance relationships in animals.

    PubMed

    Steenweg, Robin; Hebblewhite, Mark; Whittington, Jesse; Lukacs, Paul; McKelvey, Kevin

    2018-01-01

    Occupancy-abundance (OA) relationships are a foundational ecological phenomenon and field of study, and occupancy models are increasingly used to track population trends and understand ecological interactions. However, these two fields of ecological inquiry remain largely isolated, despite growing appreciation of the importance of integration. For example, using occupancy models to infer trends in abundance is predicated on positive OA relationships. Many occupancy studies collect data that violate geographical closure assumptions due to the choice of sampling scales and application to mobile organisms, which may change how occupancy and abundance are related. Little research, however, has explored how different occupancy sampling designs affect OA relationships. We develop a conceptual framework for understanding how sampling scales affect the definition of occupancy for mobile organisms, which drives OA relationships. We explore how spatial and temporal sampling scales, and the choice of sampling unit (areal vs. point sampling), affect OA relationships. We develop predictions using simulations, and test them using empirical occupancy data from remote cameras on 11 medium-large mammals. Surprisingly, our simulations demonstrate that when using point sampling, OA relationships are unaffected by spatial sampling grain (i.e., cell size). In contrast, when using areal sampling (e.g., species atlas data), OA relationships are affected by spatial grain. Furthermore, OA relationships are also affected by temporal sampling scales, where the curvature of the OA relationship increases with temporal sampling duration. Our empirical results support these predictions, showing that at any given abundance, the spatial grain of point sampling does not affect occupancy estimates, but longer surveys do increase occupancy estimates. For rare species (low occupancy), estimates of occupancy will quickly increase with longer surveys, even while abundance remains constant. 
Our results also clearly demonstrate that occupancy for mobile species without geographical closure is not true occupancy. The independence of occupancy estimates from spatial sampling grain depends on the sampling unit. Point-sampling surveys can, however, provide unbiased estimates of occupancy for multiple species simultaneously, irrespective of home-range size. The use of occupancy for trend monitoring needs to explicitly articulate how the chosen sampling scales define occupancy and affect the occupancy-abundance relationship. © 2017 by the Ecological Society of America.
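The duration effect reported above follows directly from how point-sampling occupancy is defined: if use events at a camera accrue as a Poisson process with rate proportional to local abundance, the probability of at least one event grows with survey length even when abundance is fixed. A minimal sketch of that relationship (the rates are illustrative assumptions, not the paper's simulation model):

```python
import math

def point_occupancy(rate, duration):
    """P(>=1 detection) for Poisson-distributed use events at a
    point sampler: 1 - exp(-rate * duration)."""
    return 1 - math.exp(-rate * duration)

# same abundance (rate), different survey durations
short_survey = point_occupancy(rate=0.5, duration=1.0)
long_survey = point_occupancy(rate=0.5, duration=4.0)
```

This is why, for rare species, occupancy estimates climb quickly with longer surveys even while abundance remains constant.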

  20. A U-statistics based approach to sample size planning of two-arm trials with discrete outcome criterion aiming to establish either superiority or noninferiority.

    PubMed

    Wellek, Stefan

    2017-02-28

    In current practice, the most frequently applied approach to the handling of ties in the Mann-Whitney-Wilcoxon (MWW) test is based on the conditional distribution of the sum of mid-ranks, given the observed pattern of ties. Starting from this conditional version of the testing procedure, a sample size formula was derived and investigated by Zhao et al. (Stat Med 2008). In contrast, the approach we pursue here is a nonconditional one exploiting explicit representations for the variances of and the covariance between the two U-statistics estimators involved in the Mann-Whitney form of the test statistic. The accuracy of both ways of approximating the sample sizes required for attaining a prespecified level of power in the MWW test for superiority with arbitrarily tied data is comparatively evaluated by means of simulation. The key qualitative conclusions to be drawn from these numerical comparisons are as follows: With the sample sizes calculated by means of the respective formula, both versions of the test maintain the level and the prespecified power with about the same degree of accuracy. Despite the equivalence in terms of accuracy, the sample size estimates obtained by means of the new formula are in many cases markedly lower than those calculated for the conditional test. Perhaps a still more important advantage of the nonconditional approach based on U-statistics is that it can also be adopted for noninferiority trials. Copyright © 2016 John Wiley & Sons, Ltd.
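Either sample-size formula can be checked the way such formulas usually are, by simulating the power of the MWW test on tied (ordinal) data. The sketch below estimates power by Monte Carlo, taking the critical value from a null simulation; the 5-point scale, shift alternative, and sample sizes are illustrative assumptions and do not reproduce Wellek's U-statistics derivation:

```python
import random

random.seed(3)

SCALE = [1, 2, 3, 4, 5]   # ordinal outcome -> heavy ties

def mww_u(x, y):
    """Mann-Whitney U with ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            u += 1.0 if xi > yj else (0.5 if xi == yj else 0.0)
    return u

def simulated_power(n, shift, reps=400):
    """One-sided Monte Carlo power at a ~5% level for a location shift
    on the 5-point scale, with the critical value taken from a null
    simulation of the same size."""
    def draw(sh):
        return [min(5, random.choice(SCALE) + sh) for _ in range(n)]
    null_u = sorted(mww_u(draw(0), draw(0)) for _ in range(reps))
    crit = null_u[int(0.95 * reps)]
    hits = sum(mww_u(draw(shift), draw(0)) > crit for _ in range(reps))
    return hits / reps

power = simulated_power(n=30, shift=2)
```

Running such simulations over a grid of n values is the brute-force counterpart of the closed-form sample size formulas the paper compares.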

  1. A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies

    PubMed Central

    2014-01-01

    Background The area under the receiver operating characteristic (ROC) curve, referred to as the AUC, is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold. There are many approaches for estimating the confidence interval for the AUC. However, all are relatively complicated to implement. Furthermore, many approaches perform poorly for large AUC values or small sample sizes. Methods The AUC is actually a probability. So we propose a modified Wald interval for a single proportion, which can be calculated on a pocket calculator. We performed a simulation study to compare this modified Wald interval (without and with continuity correction) with other intervals regarding coverage probability and statistical power. Results The main result is that the proposed modified Wald intervals maintain and exploit the type I error much better than the intervals of Agresti-Coull, Wilson, and Clopper-Pearson. The interval suggested by Bamber, the Mann-Whitney interval without transformation and also the interval of the binormal AUC are very liberal. For small sample sizes the Wald interval with continuity has a comparable coverage probability as the LT interval and higher power. For large sample sizes the results of the LT interval and of the Wald interval without continuity correction are comparable. Conclusions If individual patient data is not available, but only the estimated AUC and the total sample size, the modified Wald intervals can be recommended as confidence intervals for the AUC. For small sample sizes the continuity correction should be used. PMID:24552686
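The proposed interval treats the estimated AUC like a simple proportion, so it can indeed be computed by hand. The sketch below uses the total sample size directly as the denominator, which is an assumption on my part; the authors' modification may define the variance or effective sample size differently:

```python
import math
from statistics import NormalDist

def wald_auc_interval(auc_hat, n_total, level=0.95, continuity=False):
    """Wald interval for the AUC treated as a proportion, clipped to
    [0, 1]; simple enough for a pocket calculator."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(auc_hat * (1 - auc_hat) / n_total)
    cc = 1 / (2 * n_total) if continuity else 0.0
    return (max(0.0, auc_hat - z * se - cc),
            min(1.0, auc_hat + z * se + cc))

# e.g. an estimated AUC of 0.85 from 100 subjects in total
lo, hi = wald_auc_interval(auc_hat=0.85, n_total=100)
```

Per the authors' conclusion, the continuity-corrected version (`continuity=True`) would be preferred for small samples.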

  2. A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies.

    PubMed

    Kottas, Martina; Kuss, Oliver; Zapf, Antonia

    2014-02-19

    The area under the receiver operating characteristic (ROC) curve, referred to as the AUC, is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold. There are many approaches for estimating the confidence interval for the AUC. However, all are relatively complicated to implement. Furthermore, many approaches perform poorly for large AUC values or small sample sizes. The AUC is actually a probability. So we propose a modified Wald interval for a single proportion, which can be calculated on a pocket calculator. We performed a simulation study to compare this modified Wald interval (without and with continuity correction) with other intervals regarding coverage probability and statistical power. The main result is that the proposed modified Wald intervals maintain and exploit the type I error much better than the intervals of Agresti-Coull, Wilson, and Clopper-Pearson. The interval suggested by Bamber, the Mann-Whitney interval without transformation and also the interval of the binormal AUC are very liberal. For small sample sizes the Wald interval with continuity has a comparable coverage probability as the LT interval and higher power. For large sample sizes the results of the LT interval and of the Wald interval without continuity correction are comparable. If individual patient data is not available, but only the estimated AUC and the total sample size, the modified Wald intervals can be recommended as confidence intervals for the AUC. For small sample sizes the continuity correction should be used.

  3. Robustness of methods for blinded sample size re-estimation with overdispersed count data.

    PubMed

    Schneider, Simon; Schmidli, Heinz; Friede, Tim

    2013-09-20

    Counts of events are increasingly common as primary endpoints in randomized clinical trials. With between-patient heterogeneity leading to variances in excess of the mean (referred to as overdispersion), statistical models reflecting this heterogeneity by mixtures of Poisson distributions are frequently employed. Sample size calculation in the planning of such trials requires knowledge of the nuisance parameters, that is, the control (or overall) event rate and the overdispersion parameter. Usually, there is only little prior knowledge regarding these parameters in the design phase, resulting in considerable uncertainty regarding the sample size. In this situation internal pilot studies have been found very useful, and very recently several blinded procedures for sample size re-estimation have been proposed for overdispersed count data, one of which is based on an EM-algorithm. In this paper we investigate the EM-algorithm based procedure with respect to aspects of its implementation by studying the algorithm's dependence on the choice of convergence criterion, and find that the procedure is sensitive to the choice of the stopping criterion in scenarios relevant to clinical practice. We also compare the EM-based procedure to other competing procedures regarding their operating characteristics such as sample size distribution and power. Furthermore, the robustness of these procedures to deviations from the model assumptions is explored. We find that some of the procedures are robust to at least moderate deviations. The results are illustrated using data from the US National Heart, Lung and Blood Institute sponsored Asymptomatic Cardiac Ischemia Pilot study. Copyright © 2013 John Wiley & Sons, Ltd.
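For context on the nuisance parameters being re-estimated, a standard fixed-design planning formula for comparing two negative binomial rates is sketched below (Wald test on the log rate ratio; the rates and overdispersion are illustrative assumptions). Blinded re-estimation replaces the guessed nuisance values with estimates from pooled interim data; the EM-based procedure the paper studies is not reproduced here:

```python
import math
from statistics import NormalDist

def nb_sample_size(mu0, mu1, k, alpha=0.05, power=0.8):
    """Per-arm n for a Wald test of the log rate ratio between two
    negative binomial rates mu0, mu1 with overdispersion k."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = 1 / mu0 + 1 / mu1 + 2 * k     # approx. n * var of log rate ratio
    return math.ceil(z ** 2 * var / math.log(mu1 / mu0) ** 2)

# guessed nuisance parameters at the planning stage (assumptions)
n_per_arm = nb_sample_size(mu0=2.0, mu1=1.5, k=0.5)
```

The formula makes plain why misjudging the overdispersion k at the design stage distorts the sample size, and hence why mid-trial re-estimation is attractive.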

  4. Assessment of optimum threshold and particle shape parameter for the image analysis of aggregate size distribution of concrete sections

    NASA Astrophysics Data System (ADS)

    Ozen, Murat; Guler, Murat

    2014-02-01

    Aggregate gradation is one of the key design parameters affecting the workability and strength properties of concrete mixtures. Estimating aggregate gradation from hardened concrete samples can offer valuable insights into the quality of mixtures in terms of the degree of segregation and the amount of deviation from the specified gradation limits. In this study, a methodology is introduced to determine the particle size distribution of aggregates from 2D cross sectional images of concrete samples. The samples used in the study were fabricated from six mix designs by varying the aggregate gradation, aggregate source and maximum aggregate size with five replicates of each design combination. Each sample was cut into three pieces using a diamond saw and then scanned to obtain the cross sectional images using a desktop flatbed scanner. An algorithm is proposed to determine the optimum threshold for the image analysis of the cross sections. A procedure was also suggested to determine a suitable particle shape parameter to be used in the analysis of aggregate size distribution within each cross section. Results of analyses indicated that the optimum threshold hence the pixel distribution functions may be different even for the cross sections of an identical concrete sample. Besides, the maximum ferret diameter is the most suitable shape parameter to estimate the size distribution of aggregates when computed based on the diagonal sieve opening. The outcome of this study can be of practical value for the practitioners to evaluate concrete in terms of the degree of segregation and the bounds of mixture's gradation achieved during manufacturing.
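The paper derives its own optimum-threshold algorithm, which is not reproduced here. As a stand-in illustrating what "optimum threshold" means for a roughly bimodal cross-section histogram, here is classic Otsu thresholding (maximizing between-class variance) applied to a synthetic two-mode histogram:

```python
def otsu_threshold(hist):
    """Return the grey level t maximizing between-class variance when
    pixels <= t form one class and pixels > t the other (Otsu)."""
    total = sum(hist)
    grand_sum = sum(g * h for g, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        if w0 == 0 or w0 == total:
            continue
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (grand_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# synthetic bimodal histogram: dark aggregate vs bright paste pixels
hist = [0] * 256
for g in range(40, 60):
    hist[g] = 100
for g in range(180, 200):
    hist[g] = 100

t_opt = otsu_threshold(hist)
```

As the abstract notes, real cross sections of even the same sample can have different optimal thresholds, which is why a per-section procedure is needed rather than a single global value.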

  5. Brain size growth in wild and captive chimpanzees (Pan troglodytes).

    PubMed

    Cofran, Zachary

    2018-05-24

    Despite many studies of chimpanzee brain size growth, intraspecific variation is under-explored. Brain size data from chimpanzees of the Taï Forest and the Yerkes Primate Research Center enable a unique glimpse into brain growth variation as age at death is known for individuals, allowing cross-sectional growth curves to be estimated. Because Taï chimpanzees are from the wild but Yerkes apes are captive, potential environmental effects on neural development can also be explored. Previous research has revealed differences in growth and health between wild and captive primates, but such habitat effects have yet to be investigated for brain growth. Here, I use an iterative curve fitting procedure to estimate brain growth and regression parameters for each population, statistically comparing growth models using bootstrapped confidence intervals. Yerkes and Taï brain sizes overlap at all ages, although the sole Taï newborn is at the low end of captive neonatal variation. Growth rate and duration are statistically indistinguishable between the two populations. Resampling the Yerkes sample to match the Taï sample size and age group composition shows that ontogenetic variation in the two groups is remarkably similar despite the latter's limited size. Best fit growth curves for each sample indicate cessation of brain size growth at around 2 years, earlier than has previously been reported. The overall similarity between wild and captive chimpanzees points to the canalization of brain growth in this species. © 2018 Wiley Periodicals, Inc.

  6. SnagPRO: snag and tree sampling and analysis methods for wildlife

    Treesearch

    Lisa J. Bate; Michael J. Wisdom; Edward O. Garton; Shawn C. Clabough

    2008-01-01

    We describe sampling methods and provide software to accurately and efficiently estimate snag and tree densities at desired scales to meet a variety of research and management objectives. The methods optimize sampling effort by choosing a plot size appropriate for the specified forest conditions and sampling goals. Plot selection and data analyses are supported by...

  7. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  8. Capital Budgeting Decisions with Post-Audit Information

    DTIC Science & Technology

    1990-06-08

    estimates that were used during project selection. In similar fashion, this research introduces the equivalent sample size concept that permits the... equivalent sample size is extended to include the user's prior beliefs. 4. For a management tool, the concepts for Cash Flow Control Charts are... Accounting Research, vol. 7, no. 2, Autumn 1969, pp. 215-244. [9] Gaynor, Edwin W., "Use of Control Charts in Cost Control", National Association of Cost

  9. Annual design-based estimation for the annualized inventories of forest inventory and analysis: sample size determination

    Treesearch

    Hans T. Schreuder; Jin-Mann S. Lin; John Teply

    2000-01-01

    The Forest Inventory and Analysis units in the USDA Forest Service have been mandated by Congress to go to an annualized inventory where a certain percentage of plots, say 20 percent, will be measured in each State each year. Although this will result in an annual sample size that will be too small for reliable inference for many areas, it is a sufficiently large...

  10. Spatial pattern corrections and sample sizes for forest density estimates of historical tree surveys

    Treesearch

    Brice B. Hanberry; Shawn Fraver; Hong S. He; Jian Yang; Dan C. Dey; Brian J. Palik

    2011-01-01

    The U.S. General Land Office land surveys document trees present during European settlement. However, use of these surveys for calculating historical forest density and other derived metrics is limited by uncertainty about the performance of plotless density estimators under a range of conditions. Therefore, we tested two plotless density estimators, developed by...

  11. Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.; Thompson, Vanessa M.

    2011-01-01

    A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…

  12. The evolution of body size and shape in the human career

    PubMed Central

    Grabowski, Mark; Hatala, Kevin G.; Richmond, Brian G.

    2016-01-01

    Body size is a fundamental biological property of organisms, and documenting body size variation in hominin evolution is an important goal of palaeoanthropology. Estimating body mass appears deceptively simple but is laden with theoretical and pragmatic assumptions about best predictors and the most appropriate reference samples. Modern human training samples with known masses are arguably the ‘best’ for estimating size in early bipedal hominins such as the australopiths and all members of the genus Homo, but it is not clear if they are the most appropriate priors for reconstructing the size of the earliest putative hominins such as Orrorin and Ardipithecus. The trajectory of body size evolution in the early part of the human career is reviewed here and found to be complex and nonlinear. Australopith body size varies enormously across both space and time. The pre-erectus early Homo fossil record from Africa is poor and dominated by relatively small-bodied individuals, implying that the emergence of the genus Homo is probably not linked to an increase in body size or unprecedented increases in size variation. Body size differences alone cannot explain the observed variation in hominin body shape, especially when examined in the context of small fossil hominins and pygmy modern humans. This article is part of the themed issue ‘Major transitions in human evolution’. PMID:27298459

  13. The evolution of body size and shape in the human career.

    PubMed

    Jungers, William L; Grabowski, Mark; Hatala, Kevin G; Richmond, Brian G

    2016-07-05

    Body size is a fundamental biological property of organisms, and documenting body size variation in hominin evolution is an important goal of palaeoanthropology. Estimating body mass appears deceptively simple but is laden with theoretical and pragmatic assumptions about best predictors and the most appropriate reference samples. Modern human training samples with known masses are arguably the 'best' for estimating size in early bipedal hominins such as the australopiths and all members of the genus Homo, but it is not clear if they are the most appropriate priors for reconstructing the size of the earliest putative hominins such as Orrorin and Ardipithecus. The trajectory of body size evolution in the early part of the human career is reviewed here and found to be complex and nonlinear. Australopith body size varies enormously across both space and time. The pre-erectus early Homo fossil record from Africa is poor and dominated by relatively small-bodied individuals, implying that the emergence of the genus Homo is probably not linked to an increase in body size or unprecedented increases in size variation. Body size differences alone cannot explain the observed variation in hominin body shape, especially when examined in the context of small fossil hominins and pygmy modern humans. This article is part of the themed issue 'Major transitions in human evolution'. © 2016 The Author(s).

  14. Overcoming the winner's curse: estimating penetrance parameters from case-control data.

    PubMed

    Zollner, Sebastian; Pritchard, Jonathan K

    2007-04-01

    Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.
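The ascertainment effect itself is easy to reproduce. The simulation below (with illustrative numbers, not the paper's correction method) draws study-level effect estimates and keeps only those passing a significance threshold; the mean of the "winners" overshoots the true effect, which is exactly the bias that causes underpowered replication studies:

```python
import random
import statistics

random.seed(11)

true_effect = 0.3            # true standardized effect (assumption)
n, reps, z_crit = 30, 4000, 1.96
se = 1.0 / n ** 0.5          # standard error of the mean

sig_estimates = []
for _ in range(reps):
    est = statistics.mean(random.gauss(true_effect, 1.0)
                          for _ in range(n))
    if est / se > z_crit:    # keep only "significant" discoveries
        sig_estimates.append(est)

winner_mean = statistics.mean(sig_estimates)
```

Sizing a replication study from `winner_mean` rather than `true_effect` would understate the required sample size, which is the failure mode the paper's ascertainment-corrected estimator addresses.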

  15. Linear Combinations of Multiple Outcome Measures to Improve the Power of Efficacy Analysis: Application to Clinical Trials on Early Stage Alzheimer Disease

    PubMed Central

    Xiong, Chengjie; Luo, Jingqin; Morris, John C; Bateman, Randall

    2018-01-01

    Modern clinical trials on Alzheimer disease (AD) focus on the early symptomatic stage or even the preclinical stage. Subtle disease progression at the early stages, however, poses a major challenge in designing such clinical trials. We propose a multivariate mixed model on repeated measures to model the disease progression over time on multiple efficacy outcomes, and derive the optimum weights to combine multiple outcome measures by minimizing the sample sizes to adequately power the clinical trials. A cross-validation simulation study is conducted to assess the accuracy of the estimated weights as well as the improvement in reducing the sample sizes for such trials. The proposed methodology is applied to the multiple cognitive tests from the ongoing observational study of the Dominantly Inherited Alzheimer Network (DIAN) to power future clinical trials in the DIAN with a cognitive endpoint. Our results show that the optimum weights to combine multiple outcome measures can be accurately estimated, and that compared to the individual outcomes, the combined efficacy outcome with these weights significantly reduces the sample size required to adequately power clinical trials. When applied to the clinical trial in the DIAN, the estimated linear combination of six cognitive tests can adequately power the clinical trial. PMID:29546251
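A minimal sketch of the weighting idea, with made-up effects and covariance (not the DIAN estimates): required sample size is proportional to w'Σw / (w'δ)², which is minimized, up to scale, by w ∝ Σ⁻¹δ.

```python
import numpy as np

# Hypothetical inputs: per-outcome treatment effects (delta) and the
# covariance matrix (Sigma) of outcome changes; in practice both would
# be estimated from pilot or observational data.
delta = np.array([0.30, 0.25, 0.10])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# Optimal (up to scale) weights: w proportional to Sigma^{-1} delta.
w = np.linalg.solve(Sigma, delta)
w = w / w.sum()            # normalize so weights sum to 1

def n_factor(w):
    """Quantity proportional to the required sample size per arm."""
    return (w @ Sigma @ w) / (w @ delta) ** 2

equal = np.ones(3) / 3
print("optimal weights:", np.round(w, 3))
print("sample-size factor, optimal vs equal:",
      round(n_factor(w), 3), round(n_factor(equal), 3))
```

The optimal combination never requires more subjects than equal weighting, mirroring the sample-size reduction the abstract reports.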

  16. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.

    PubMed

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo

    2016-11-01

    We propose a cost-effective outcome-dependent sampling (ODS) design for failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criterion that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure are shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.

  17. A Generalized Least Squares Regression Approach for Computing Effect Sizes in Single-Case Research: Application Examples

    ERIC Educational Resources Information Center

    Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.

    2011-01-01

    A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…

  18. Cross-seasonal effects on waterfowl productivity: Implications under climate change

    USGS Publications Warehouse

    Osnas, Erik; Zhao, Qing; Runge, Michael C.; Boomer, G Scott

    2016-01-01

    Previous efforts to relate wintering-ground precipitation to subsequent reproductive success as measured by the ratio of juveniles to adults in the autumn failed to account for increased vulnerability of juvenile ducks to hunting and uncertainty in the estimated age ratio. Neglecting increased juvenile vulnerability will positively bias the mean productivity estimate, and neglecting increased vulnerability and estimation uncertainty will positively bias the year-to-year variance in productivity because raw age ratios are the product of sampling variation, the year-specific vulnerability, and year-specific reproductive success. Therefore, we estimated the effects of cumulative winter precipitation in the California Central Valley and the Mississippi Alluvial Valley on pintail (Anas acuta) and mallard (Anas platyrhynchos) reproduction, respectively, using hierarchical Bayesian methods to correct for sampling bias in productivity estimates and observation error in covariates. We applied the model to a hunter-collected parts survey implemented by the United States Fish and Wildlife Service and band recoveries reported to the United States Geological Survey Bird Banding Laboratory using data from 1961 to 2013. We compared our results to previous estimates that used simple linear regression on uncorrected age ratios from a smaller subset of years in pintail (1961–1985). Like previous analyses, we found large and consistent effects of population size and wetland conditions in prairie Canada on mallard productivity, and large effects of population size and mean latitude of the observed breeding population on pintail productivity. Unlike previous analyses, we report a large amount of uncertainty in the estimated effects of wintering-ground precipitation on pintail and mallard productivity, with considerable uncertainty in the sign of the estimated main effect, although the posterior medians of precipitation effects were consistent with past studies.
We found more consistent estimates in the sign of an interaction effect between population size and precipitation, suggesting that wintering-ground precipitation has a larger effect in years of high population size, especially for pintail. When we used the estimated effects in a population model to derive a sustainable harvest and population size projection (i.e., a yield curve), there was considerable uncertainty in the effect of increased or decreased wintering-ground precipitation on sustainable harvest potential and population size. These results suggest that the mechanism of cross-seasonal effects between winter habitat and reproduction in ducks occurs through a reduction in the strength of density dependence in years of above-average wintering-ground precipitation. We suggest additional investigation of the underlying mechanisms and that habitat managers and decision-makers consider the level of uncertainty in these estimates when attempting to integrate habitat management and harvest management decisions. Collection of annual data on the status of wintering-ground habitat in a rigorous sampling framework would likely be the most direct way to improve understanding of mechanisms and inform management.

  19. Inventory and mapping of flood inundation using interactive digital image analysis techniques

    USGS Publications Warehouse

    Rohde, Wayne G.; Nelson, Charles A.; Taranik, J.V.

    1979-01-01

    LANDSAT digital data and color infra-red photographs were used in a multiphase sampling scheme to estimate the area of agricultural land affected by a flood. The LANDSAT data were classified with a maximum likelihood algorithm. Stratification of the LANDSAT data, prior to classification, greatly reduced misclassification errors. The classification results were used to prepare a map overlay showing the areal extent of flooding. These data also provided statistics required to estimate sample size in a two phase sampling scheme, and provided quick, accurate estimates of areas flooded for the first phase. The measurements made in the second phase, based on ground data and photo-interpretation, were used with two phase sampling statistics to estimate the area of agricultural land affected by flooding. These results show that LANDSAT digital data can be used to prepare map overlays showing the extent of flooding on agricultural land and, with two phase sampling procedures, can provide acreage estimates with sampling errors of about 5 percent. This procedure provides a technique for rapidly assessing the areal extent of flood conditions on agricultural land and would provide a basis for designing a sampling framework to estimate the impact of flooding on crop production.
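The two-phase idea above can be sketched with a ratio estimator: a cheap phase-1 measurement (classified flooded acreage) is available for every unit, and an expensive phase-2 measurement for a small subsample rescales the phase-1 total. All numbers below are simulated, not from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Phase 1: classified flooded acreage x for all N sampling units.
N = 500
x = rng.gamma(4.0, 25.0, N)              # classified flooded acres per unit
# "True" flooded acres y, correlated with x (unobserved in full in practice).
y = 0.9 * x + rng.normal(0.0, 5.0, N)

# Phase 2: expensive ground/photo measurement on a small subsample.
n2 = 50
idx = rng.choice(N, size=n2, replace=False)

# Ratio estimator: scale the known phase-1 total by the phase-2 ratio y/x.
ratio = y[idx].mean() / x[idx].mean()
total_hat = x.sum() * ratio

print(f"estimated flooded area: {total_hat:,.0f} acres")
print(f"actual flooded area:    {y.sum():,.0f} acres")
```

Because classification and ground truth are strongly correlated, the ratio estimator attains small sampling error from only 50 ground-verified units, which is the efficiency gain two-phase sampling exploits.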

  20. Net trophic transfer efficiency of PCBs to Lake Michigan coho salmon from their prey

    USGS Publications Warehouse

    Madenjian, Charles P.; Elliott, Robert F.; Schmidt, Larry J.; DeSorcie, Timothy J.; Hesselberg, Robert J.; Quintal, Richard T.; Begnoche, Linda J.; Bouchard, Patrick M.; Holey, Mark E.

    1998-01-01

    Most of the polychlorinated biphenyl (PCB) body burden accumulated by coho salmon (Oncorhynchus kisutch) from the Laurentian Great Lakes is from their food. We used diet information, PCB determinations in both coho salmon and their prey, and bioenergetics modeling to estimate the efficiency with which Lake Michigan coho salmon retain PCBs from their food. Our estimate was the most reliable estimate to date because (a) the coho salmon and prey fish sampled during our study were sampled in spring, summer, and fall from various locations throughout the lake, (b) detailed measurements were made on the PCB concentrations of both coho salmon and prey fish over wide ranges in fish size, and (c) coho salmon diet was analyzed in detail from April through November over a wide range of salmon size from numerous locations throughout the lake. We estimated that coho salmon from Lake Michigan retain 50% of the PCBs that are contained within their food.

  1. Mourning dove population trend estimates from Call-Count and North American Breeding Bird Surveys

    USGS Publications Warehouse

    Sauer, J.R.; Dolton, D.D.; Droege, S.

    1994-01-01

    The mourning dove (Zenaida macroura) Call-Count Survey and the North American Breeding Bird Survey provide information on population trends of mourning doves throughout the continental United States. Because surveys are an integral part of the development of hunting regulations, a need exists to determine which survey provides precise information. We estimated population trends from 1966 to 1988 by state and dove management unit, and assessed the relative efficiency of each survey. Estimates of population trend differ (P < 0.05) between surveys in 11 of 48 states; 9 of 11 states with divergent results occur in the Eastern Management Unit. Differences were probably a consequence of smaller sample sizes in the Call-Count Survey. The Breeding Bird Survey generally provided trend estimates with smaller variances than did the Call-Count Survey. Although the Call-Count Survey probably provides more within-route accuracy because of survey methods and timing, the Breeding Bird Survey has a larger sample size of survey routes and greater consistency of coverage in the Eastern Unit.

  2. Testing homogeneity of proportion ratios for stratified correlated bilateral data in two-arm randomized clinical trials.

    PubMed

    Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai

    2014-11-10

    Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on the score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least-squares estimate and on the logarithmic transformation with the Mantel-Haenszel estimate are recommended, as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.

  3. Ratio of Cut Surface Area to Leaf Sample Volume for Water Potential Measurements by Thermocouple Psychrometers

    PubMed Central

    Walker, Sue; Oosterhuis, Derrick M.; Wiebe, Herman H.

    1984-01-01

    Evaporative losses from the cut edge of leaf samples are of considerable importance in measurements of leaf water potential using thermocouple psychrometers. The ratio of cut surface area to leaf sample volume (area to volume ratio) has been used to give an estimate of possible effects of evaporative loss in relation to sample size. A wide range of sample sizes with different area to volume ratios has been used. Our results using Glycine max L. Merr. cv Bragg indicate that leaf samples with area to volume values less than 0.2 square millimeter per cubic millimeter give psychrometric leaf water potential measurements that compare favorably with pressure chamber measurements. PMID:16663578
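Assuming a circular leaf disc (an illustrative shape, not stated in the abstract), the area-to-volume ratio has a simple closed form: the cut edge is the perimeter times the thickness, so the ratio reduces to 2/r and is independent of thickness.

```python
import math

def area_to_volume_ratio(radius_mm, thickness_mm):
    """Cut-edge area / sample volume for a circular leaf disc.

    Only the cut edge (perimeter * thickness) loses water, so
    ratio = 2*pi*r*t / (pi*r^2*t) = 2/r, independent of thickness t.
    """
    cut_area = 2 * math.pi * radius_mm * thickness_mm
    volume = math.pi * radius_mm ** 2 * thickness_mm
    return cut_area / volume

# Under this disc assumption, the 0.2 mm^2/mm^3 threshold corresponds
# to discs of radius >= 10 mm, whatever their thickness.
for r in (5, 10, 20):
    print(r, "mm radius ->", round(area_to_volume_ratio(r, 0.3), 3), "mm^2/mm^3")
```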

  4. Comparing population size estimators for plethodontid salamanders

    USGS Publications Warehouse

    Bailey, L.L.; Simons, T.R.; Pollock, K.H.

    2004-01-01

    Despite concern over amphibian declines, few studies estimate absolute abundances because of logistic and economic constraints and previously poor estimator performance. Two estimation approaches recommended for amphibian studies are mark-recapture and depletion (or removal) sampling. We compared abundance estimation via various mark-recapture and depletion methods, using data from a three-year study of terrestrial salamanders in Great Smoky Mountains National Park. Our results indicate that short-term closed-population, robust design, and depletion methods estimate surface population of salamanders (i.e., those near the surface and available for capture during a given sampling occasion). In longer duration studies, temporary emigration violates assumptions of both open- and closed-population mark-recapture estimation models. However, if the temporary emigration is completely random, these models should yield unbiased estimates of the total population (superpopulation) of salamanders in the sampled area. We recommend using Pollock's robust design in mark-recapture studies because of its flexibility to incorporate variation in capture probabilities and to estimate temporary emigration probabilities.

  5. Comparison of methods for estimating the attributable risk in the context of survival analysis.

    PubMed

    Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M

    2017-01-23

    The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. 
In practice, our study suggests using the semiparametric or parametric approach to estimate AR as a function of time in cohort studies when the proportional hazards assumption appears appropriate.
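The "simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio" is in the spirit of Levin's classical formula; a sketch with hypothetical numbers (not the E3N estimates) shows the computation.

```python
def levin_attributable_risk(prevalence, relative_risk):
    """Levin-style attributable risk from exposure prevalence p and
    relative risk RR: AR = p*(RR - 1) / (1 + p*(RR - 1))."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical inputs: 40% of the cohort exposed, hazard ratio 1.5.
ar = levin_attributable_risk(0.40, 1.5)
print(f"attributable risk: {ar:.1%}")
```

When RR = 1 the AR is zero, and the AR grows with both the prevalence of exposure and the strength of association, matching the interpretation of AR as the proportion of cases attributable to the exposure.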

  6. An evaluation of the efficiency of minnow traps for estimating the abundance of minnows in desert spring systems

    USGS Publications Warehouse

    Peterson, James T.; Scheerer, Paul D.; Clements, Shaun

    2015-01-01

    Desert springs are sensitive aquatic ecosystems that pose unique challenges to natural resource managers and researchers. Among the most important of these is the need to accurately quantify population parameters for resident fish, particularly when the species are of special conservation concern. We evaluated the efficiency of baited minnow traps for estimating the abundance of two at-risk species, Foskett Speckled Dace Rhinichthys osculus ssp. and Borax Lake Chub Gila boraxobius, in desert spring systems in southeastern Oregon. We evaluated alternative sample designs using simulation and found that capture–recapture designs with four capture occasions would maximize the accuracy of estimates and minimize fish handling. We implemented the design and estimated capture and recapture probabilities using the Huggins closed-capture estimator. Trap capture probabilities averaged 23% and 26% for Foskett Speckled Dace and Borax Lake Chub, respectively, but differed substantially among sample locations, through time, and nonlinearly with fish body size. Recapture probabilities for Foskett Speckled Dace were, on average, 1.6 times greater than (first) capture probabilities, suggesting “trap-happy” behavior. Comparison of population estimates from the Huggins model with the commonly used Lincoln–Petersen estimator indicated that the latter underestimated Foskett Speckled Dace and Borax Lake Chub population size by 48% and by 20%, respectively. These biases were due to variability in capture and recapture probabilities. Simulation of fish monitoring that included the range of capture and recapture probabilities observed indicated that variability in capture and recapture probabilities in time negatively affected the ability to detect annual decreases of up to 20% in fish population size. Failure to account for variability in capture and recapture probabilities can lead to poor quality data and study inferences. 
Therefore, we recommend that fishery researchers and managers employ sample designs and estimators that can account for this variability.
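The downward bias of the Lincoln–Petersen estimator under heterogeneous capture probabilities can be shown with a small simulation; the population size and capture probabilities below are invented for illustration, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1000                                  # true population size (hypothetical)
# Heterogeneous capture probabilities: half the fish are easy to trap,
# half are hard, echoing the variability the study reports.
p = np.where(rng.random(N) < 0.5, 0.10, 0.40)

reps = 2000
est = np.empty(reps)
for i in range(reps):
    c1 = rng.random(N) < p                # captured on occasion 1
    c2 = rng.random(N) < p                # captured on occasion 2
    n1, n2 = c1.sum(), c2.sum()
    m2 = (c1 & c2).sum()                  # marked recaptures
    est[i] = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1   # Chapman form of Lincoln-Petersen

print(f"true N = {N}, mean Lincoln-Petersen estimate = {est.mean():.0f}")
```

Catchable fish are over-represented among recaptures, inflating m2 relative to a homogeneous population and pulling the estimate well below the true N, which is the mechanism behind the 20-48% underestimates reported above.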

  7. Estimating effective population size from linkage disequilibrium between unlinked loci: theory and application to fruit fly outbreak populations.

    PubMed

    Sved, John A; Cameron, Emilie C; Gilchrist, A Stuart

    2013-01-01

    There is a substantial literature on the use of linkage disequilibrium (LD) to estimate effective population size using unlinked loci. The Ne estimates are extremely sensitive to the sampling process, and there is currently no theory to cope with the possible biases. We derive formulae for the analysis of idealised populations mating at random with multi-allelic (microsatellite) loci. The 'Burrows composite index' is introduced in a novel way with a 'composite haplotype table'. We show that in a sample of diploid size S, the mean value of x² or r² from the composite haplotype table is biased by a factor of 1 - 1/(2S-1)², rather than the usual factor 1 + 1/(2S-1) for a conventional haplotype table. But analysis of population data using these formulae leads to Ne estimates that are unrealistically low. We provide theory and simulation to show that this bias towards low Ne estimates is due to null alleles, and introduce a randomised permutation correction to compensate for the bias. We also consider the effect of introducing a within-locus disequilibrium factor to r², and find that this factor leads to a bias in the Ne estimate. However this bias can be overcome using the same randomised permutation correction, to yield an altered r² with lower variance than the original r², and one that is also insensitive to null alleles. The resulting formulae are used to provide Ne estimates on 40 samples of the Queensland fruit fly, Bactrocera tryoni, from populations with widely divergent Ne expectations. Linkage relationships are known for most of the microsatellite loci in this species. We find that there is little difference in the estimated Ne values from using known unlinked loci as compared to using all loci, which is important for conservation studies where linkage relationships are unknown.
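The two bias factors quoted in the abstract are easy to tabulate; a short sketch makes the comparison concrete and shows that the composite-table bias is second order in 1/(2S-1) and shrinks much faster with sample size.

```python
def composite_bias(S):
    """Bias factor for mean r^2 from the composite haplotype table:
    1 - 1/(2S-1)^2, for a diploid sample of size S."""
    return 1 - 1 / (2 * S - 1) ** 2

def conventional_bias(S):
    """Bias factor for mean r^2 from a conventional haplotype table:
    1 + 1/(2S-1)."""
    return 1 + 1 / (2 * S - 1)

for S in (25, 50, 100):
    print(f"S={S:4d}  composite {composite_bias(S):.5f}  "
          f"conventional {conventional_bias(S):.5f}")
```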

  8. Estimating modal abundances from the spectra of natural and laboratory pyroxene mixtures using the modified Gaussian model

    NASA Technical Reports Server (NTRS)

    Sunshine, Jessica M.; Pieters, Carle M.

    1993-01-01

    The modified Gaussian model (MGM) is used to explore spectra of samples containing multiple pyroxene components as a function of modal abundance. The MGM allows spectra to be analyzed directly, without the use of actual or assumed end-member spectra and therefore holds great promise for remote applications. A series of mass fraction mixtures created from several different particle size fractions are analyzed with the MGM to quantify the properties of pyroxene mixtures as a function of both modal abundance and grain size. Band centers, band widths, and relative band strengths of absorptions from individual pyroxenes in mixture spectra are found to be largely independent of particle size. Spectral properties of both zoned and exsolved pyroxene components are resolved in exsolved samples using the MGM, and modal abundances are accurately estimated to within 5-10 percent without predetermined knowledge of the end-member spectra.

  9. Characterization of particulate emissions from Australian open-cut coal mines: Toward improved emission estimates.

    PubMed

    Richardson, Claire; Rutherford, Shannon; Agranovski, Igor

    2018-06-01

    Given the significance of mining as a source of particulates, accurate characterization of emissions is important for the development of appropriate emission estimation techniques for use in modeling predictions and to inform regulatory decisions. The currently available emission estimation methods for Australian open-cut coal mines relate primarily to total suspended particulates and PM10 (particulate matter with an aerodynamic diameter <10 μm), and limited data are available relating to the PM2.5 (<2.5 μm) size fraction. To provide an initial analysis of the appropriateness of the currently available emission estimation techniques, this paper presents results of sampling completed at three open-cut coal mines in Australia. The monitoring data demonstrate that the particulate size fraction varies for different mining activities, and that the region in which the mine is located influences the characteristics of the particulates emitted to the atmosphere. The proportion of fine particulates in the sample increased with distance from the source, with the coarse fraction being a more significant proportion of total suspended particulates close to the source of emissions. In terms of particulate composition, the results demonstrate that the particulate emissions are predominantly sourced from naturally occurring geological material, and coal comprises less than 13% of the overall emissions. The size fractionation exhibited by the sampling data sets is similar to that adopted in current Australian emission estimation methods but differs from the size fractionation presented in the U.S. Environmental Protection Agency methodology. Development of region-specific emission estimation techniques for PM10 and PM2.5 from open-cut coal mines is necessary to allow accurate prediction of particulate emissions to inform regulatory decisions and for use in modeling predictions. 
Comprehensive air quality monitoring was undertaken, and corresponding recommendations were provided.

  10. Using simulation to aid trial design: Ring-vaccination trials.

    PubMed

    Hitchings, Matt David Thomas; Grais, Rebecca Freeman; Lipsitch, Marc

    2017-03-01

    The 2014-2016 West African Ebola epidemic highlights the need for rigorous, rapid clinical trial methods for vaccines. A challenge for trial design is making sample size calculations based on incidence within the trial, total vaccine effect, and intracluster correlation, when these parameters are uncertain in the presence of indirect effects of vaccination. We present a stochastic, compartmental model for a ring vaccination trial. After identification of an index case, a ring of contacts is recruited and either vaccinated immediately or after 21 days. The primary outcome of the trial is total vaccine effect, counting cases only from a pre-specified window in which the immediate arm is assumed to be fully protected and the delayed arm is not protected. Simulation results are used to calculate necessary sample size and estimated vaccine effect. Under baseline assumptions about vaccine properties, monthly incidence in unvaccinated rings and trial design, a standard sample-size calculation neglecting dynamic effects estimated that 7,100 participants would be needed to achieve 80% power to detect a difference in attack rate between arms, while incorporating dynamic considerations in the model increased the estimate to 8,900. This approach replaces assumptions about parameters at the ring level with assumptions about disease dynamics and vaccine characteristics at the individual level, so within this framework we were able to describe the sensitivity of the trial power and estimated effect to various parameters. We found that both of these quantities are sensitive to properties of the vaccine, to setting-specific parameters over which investigators have little control, and to parameters that are determined by the study design. Incorporating simulation into the trial design process can improve robustness of sample size calculations. 
For this specific trial design, vaccine effectiveness depends on properties of the ring vaccination design and on the measurement window, as well as the epidemiologic setting.
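The "standard sample-size calculation neglecting dynamic effects" mentioned above is the usual two-proportion formula. A sketch with hypothetical attack rates (the abstract's actual inputs are not given, so this does not reproduce the 7,100 figure) shows the kind of static calculation the simulation approach refines.

```python
import math

def n_per_arm(p_control, p_vaccine, alpha=0.05, power=0.80):
    """Standard two-proportion sample-size formula (per arm),
    ignoring clustering and indirect effects of vaccination."""
    z_a = 1.959964   # Phi^{-1}(1 - alpha/2) for alpha = 0.05
    z_b = 0.841621   # Phi^{-1}(power) for power = 0.80
    var = p_control * (1 - p_control) + p_vaccine * (1 - p_vaccine)
    return math.ceil((z_a + z_b) ** 2 * var / (p_control - p_vaccine) ** 2)

# Hypothetical attack rates over the counting window: 1% in delayed
# (control) rings, 70% vaccine efficacy -> 0.3% in immediate rings.
print("participants per arm:", n_per_arm(0.010, 0.003))
```

Indirect protection, intracluster correlation, and the measurement window all shift incidence away from these static inputs, which is why the simulated trial required more participants than this type of formula suggests.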

  11. Genetics of wellbeing and its components satisfaction with life, happiness, and quality of life: a review and meta-analysis of heritability studies.

    PubMed

    Bartels, Meike

    2015-03-01

    Wellbeing is a major topic of research across several disciplines, reflecting the increasing recognition of its strong value across major domains in life. Previous twin-family studies have revealed that individual differences in wellbeing are accounted for by both genetic as well as environmental factors. A systematic literature search identified 30 twin-family studies on wellbeing or a related measure such as satisfaction with life or happiness. Review of these studies showed considerable variation in heritability estimates (ranging from 0 to 64%), which makes it difficult to draw firm conclusions regarding the genetic influences on wellbeing. For overall wellbeing, twelve heritability estimates, from 10 independent studies, were meta-analyzed by computing a sample size weighted average heritability. Ten heritability estimates, derived from 9 independent samples, were used for the meta-analysis of satisfaction with life. The weighted average heritability of wellbeing, based on a sample size of 55,974 individuals, was 36% (34-38), while the weighted average heritability for satisfaction with life was 32% (29-35) (n = 47,750). With this result a more robust estimate of the relative influence of genetic effects on wellbeing is provided.
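The pooling step is a sample-size weighted mean; a sketch with invented per-study values (the meta-analysis pooled 12 estimates for overall wellbeing, but the individual numbers are not listed here) shows the computation.

```python
import numpy as np

# Hypothetical per-study heritability estimates and sample sizes.
h2 = np.array([0.22, 0.36, 0.41, 0.30, 0.45])
n = np.array([3000, 12000, 8000, 20000, 5000])

# Sample-size weighted average heritability, as in the abstract.
weighted_h2 = np.average(h2, weights=n)
print(f"weighted average heritability: {weighted_h2:.1%}")
```

Weighting by sample size pulls the pooled estimate toward the largest studies, which is why a single small study with an extreme estimate (0 or 64%) has little influence on the final figure.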

  12. Comparison of structural and least-squares lines for estimating geologic relations

    USGS Publications Warehouse

    Williams, G.P.; Troutman, B.M.

    1990-01-01

    Two different goals in fitting straight lines to data are to estimate a "true" linear relation (physical law) and to predict values of the dependent variable with the smallest possible error. Regarding the first goal, a Monte Carlo study indicated that the structural-analysis (SA) method of fitting straight lines to data is superior to the ordinary least-squares (OLS) method for estimating "true" straight-line relations. Number of data points, slope and intercept of the true relation, and variances of the errors associated with the independent (X) and dependent (Y) variables influence the degree of agreement. For example, differences between the two line-fitting methods decrease as error in X becomes small relative to error in Y. Regarding the second goal-predicting the dependent variable-OLS is better than SA. Again, the difference diminishes as X takes on less error relative to Y. With respect to estimation of slope and intercept and prediction of Y, agreement between Monte Carlo results and large-sample theory was very good for sample sizes of 100, and fair to good for sample sizes of 20. The procedures and error measures are illustrated with two geologic examples. © 1990 International Association for Mathematical Geology.
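A minimal Monte Carlo sketch of the contrast above, using a Deming-type structural slope with an assumed error-variance ratio of 1 (the true line, error magnitudes, and sample size are all hypothetical): OLS is attenuated by the error in X, while the structural fit recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(3)

# "True" physical law: y = 1 + 2x, with BOTH variables measured with
# error of equal variance (so the error-variance ratio lambda = 1).
n = 200
x_true = rng.normal(0.0, 1.0, n)
x = x_true + rng.normal(0.0, 0.5, n)            # observed X with error
y = 1.0 + 2.0 * x_true + rng.normal(0.0, 0.5, n)

sxx = np.var(x)
syy = np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]

# Ordinary least squares: attenuated toward zero by the error in X.
b_ols = sxy / sxx

# Structural (Deming) slope with error-variance ratio lambda = 1.
lam = 1.0
d = syy - lam * sxx
b_sa = (d + np.sqrt(d * d + 4 * lam * sxy ** 2)) / (2 * sxy)

print(f"OLS slope: {b_ols:.2f}   structural slope: {b_sa:.2f}   true: 2.00")
```

This mirrors the abstract's first conclusion: for estimating the "true" relation, the structural fit wins; for predicting Y from an error-laden X, the attenuated OLS slope is actually the better choice.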

  13. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
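The unweighted cluster summary method discussed above can be sketched as follows: each cluster contributes one within-cluster difference in event proportions between its two periods, and these are averaged across clusters. Cluster counts, period sizes, and event rates below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

k = 20                                     # clusters (ICUs), two periods each
m = rng.integers(150, 400, size=(k, 2))    # unbalanced period sizes
p_control, effect = 0.10, -0.02            # hypothetical mortality rates

diffs = np.empty(k)
for i in range(k):
    treat_first = i % 2 == 0               # crossover: treatment order alternates
    p = (p_control + effect, p_control) if treat_first else (p_control, p_control + effect)
    y0 = rng.binomial(m[i, 0], p[0]) / m[i, 0]   # period-1 event proportion
    y1 = rng.binomial(m[i, 1], p[1]) / m[i, 1]   # period-2 event proportion
    diffs[i] = (y0 - y1) if treat_first else (y1 - y0)   # treated minus control

est = diffs.mean()                          # unweighted cluster summary estimate
se = diffs.std(ddof=1) / np.sqrt(k)
print(f"estimated risk difference: {est:.3f} (SE {se:.3f})")
```

Because each cluster's difference is computed within the cluster, cluster-level confounding cancels, and inference reduces to a one-sample analysis of k differences; weighting by cluster size is the refinement the article compares against.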

  14. Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

    NASA Technical Reports Server (NTRS)

    Chapman, G. M. (Principal Investigator); Carnes, J. G.

    1981-01-01

Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/relative count estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE because it provides the greatest precision.
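    The proportional allocation step described above can be sketched generically (a minimal illustration using largest-remainder rounding; `proportional_allocation` is a hypothetical helper, not code from the experiment):

```python
def proportional_allocation(cluster_sizes, n_dots):
    # allocate labeling dots to clusters in proportion to cluster size,
    # with largest-remainder rounding so allocations sum to n_dots exactly
    total = sum(cluster_sizes)
    raw = [n_dots * s / total for s in cluster_sizes]
    alloc = [int(r) for r in raw]
    shortfall = n_dots - sum(alloc)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:shortfall]:
        alloc[i] += 1
    return alloc
```

    For example, three clusters covering 50%, 30%, and 20% of pixels would receive 5, 3, and 2 of 10 dots.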

  15. Effects of field plot size on prediction accuracy of aboveground biomass in airborne laser scanning-assisted inventories in tropical rain forests of Tanzania.

    PubMed

    Mauya, Ernest William; Hansen, Endre Hofstad; Gobakken, Terje; Bollandsås, Ole Martin; Malimbwi, Rogers Ernest; Næsset, Erik

    2015-12-01

Airborne laser scanning (ALS) has recently emerged as a promising tool to acquire auxiliary information for improving aboveground biomass (AGB) estimation in sample-based forest inventories. Under design-based and model-assisted inferential frameworks, the estimation relies on a model that relates the auxiliary ALS metrics to AGB estimated on ground plots. The size of the field plots has been identified as one source of model uncertainty because of so-called boundary effects, which increase with decreasing plot size. Recent research in tropical forests has aimed to quantify the boundary effects on model prediction accuracy, but evidence of the consequences for the final AGB estimates is lacking. In this study, we analyzed the effect of field plot size on model prediction accuracy and its implication when used in a model-assisted inferential framework. The results showed that the prediction accuracy of the model improved as the plot size increased. The adjusted R² increased from 0.35 to 0.74 while the relative root mean square error decreased from 63.6 to 29.2%. Indicators of boundary effects were identified and confirmed to have significant effects on the model residuals. Variance estimates of model-assisted mean AGB, relative to corresponding variance estimates of pure field-based AGB, decreased with increasing plot size in the range from 200 to 3000 m². The variance ratio of field-based estimates relative to model-assisted variance ranged from 1.7 to 7.7. This study showed that the relative improvement in precision of AGB estimation when increasing field-plot size was greater for an ALS-assisted inventory compared to that of a pure field-based inventory.

  16. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140

  17. Fossil shrews from Honduras and their significance for late glacial evolution in body size (Mammalia: Soricidae: Cryptotis)

    USGS Publications Warehouse

    Woodman, N.; Croft, D.A.

    2005-01-01

Our study of mammalian remains excavated in the 1940s from McGrew Cave, north of Copan, Honduras, yielded an assemblage of 29 taxa that probably accumulated predominantly as the result of predation by owls. Among the taxa present are three species of small-eared shrews, genus Cryptotis. One species, Cryptotis merriami, is relatively rare among the fossil remains. The other two shrews, Cryptotis goodwini and Cryptotis orophila, are abundant and exhibit morphometrical variation distinguishing them from modern populations. Fossils of C. goodwini are distinctly and consistently smaller than modern members of the species. To quantify the size differences, we derived common measures of body size for fossil C. goodwini using regression models based on modern samples of shrews in the Cryptotis mexicana-group. Estimated mean length of head and body for the fossil sample is 72-79 mm, and estimated mean mass is 7.6-9.6 g. These numbers indicate that the fossil sample averaged 6-14% smaller in head and body length and 39-52% less in mass than the modern sample and that increases of 6-17% in head and body length and 65-108% in mass occurred to achieve the mean body size of the modern sample. Conservative estimates of fresh (wet) food intake based on mass indicate that such a size increase would require a 37-58% increase in daily food consumption. In contrast to C. goodwini, fossil C. orophila from the cave is not different in mean body size from modern samples. The fossil sample does, however, show slightly greater variation in size than is currently present throughout the modern geographical distribution of the taxon. Moreover, variation in some other dental and mandibular characters is more constrained, exhibiting a more direct relationship to overall size. Our study of these species indicates that North American shrews have not all been static in size through time, as suggested by some previous work with fossil soricids. 
Lack of stratigraphic control within the site and our failure to obtain reliable radiometric dates on remains restrict our opportunities to place the site in a firm temporal context. However, the morphometrical differences we document for fossil C. orophila and C. goodwini show them to be distinct from modern populations of these shrews. Some other species of fossil mammals from McGrew Cave exhibit distinct size changes of the magnitudes experienced by many northern North American and some Mexican mammals during the transition from late glacial to Holocene environmental conditions, and it is likely that at least some of the remains from the cave are late Pleistocene in age. One curious factor is that, whereas most mainland mammals that exhibit large-scale size shifts during the late glacial/postglacial transition experienced dwarfing, C. goodwini increased in size. The lack of clinal variation in modern C. goodwini supports the hypothesis that size evolution can result from local selection rather than from cline translocation. Models of size change in mammals indicate that increased size, such as that observed for C. goodwini, is a likely consequence of increased availability of resources and, thereby, a relaxation of selection during critical times of the year.

  18. Finite-time and finite-size scalings in the evaluation of large-deviation functions: Numerical approach in continuous time.

    PubMed

    Guevara Hidalgo, Esteban; Nemoto, Takahiro; Lecomte, Vivien

    2017-06-01

Rare trajectories of stochastic systems are important to understand because of their potential impact. However, their properties are by definition difficult to sample directly. Population dynamics provides a numerical tool allowing their study, by means of simulating a large number of copies of the system, which are subjected to selection rules that favor the rare trajectories of interest. Such algorithms are plagued by finite simulation time and finite population size, effects that can render their use delicate. In this paper, we present a numerical approach which uses the finite-time and finite-size scalings of estimators of the large deviation functions associated with the distribution of rare trajectories. The method we propose allows one to extract the infinite-time and infinite-size limit of these estimators, which, as shown on the contact process, provides a significant improvement of the large deviation function estimators compared to the standard one.

  19. Extremes in ecology: Avoiding the misleading effects of sampling variation in summary analyses

    USGS Publications Warehouse

    Link, W.A.; Sauer, J.R.

    1996-01-01

Surveys such as the North American Breeding Bird Survey (BBS) produce large collections of parameter estimates. One's natural inclination when confronted with lists of parameter estimates is to look for the extreme values: in the BBS, these correspond to the species that appear to have the greatest changes in population size through time. Unfortunately, extreme estimates are liable to correspond to the most poorly estimated parameters. Consequently, the most extreme parameters may not match up with the most extreme parameter estimates. Ranking parameter values on the basis of their estimates is a difficult statistical problem. We use data from the BBS and simulations to illustrate the potentially misleading effects of sampling variation in rankings of parameters. We describe empirical Bayes and constrained empirical Bayes procedures which provide partial solutions to the problem of ranking in the presence of sampling variation.
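    The empirical Bayes idea (shrinking noisy estimates toward the grand mean before ranking, so that poorly estimated parameters no longer dominate the extremes) can be sketched with a simplified method-of-moments version; the BBS analyses use more elaborate models, and `eb_shrink` is an illustrative name:

```python
from statistics import mean, variance

def eb_shrink(estimates, ses):
    # shrink each estimate toward the grand mean; the noisier the estimate
    # (larger standard error), the stronger the shrinkage
    grand = mean(estimates)
    # method-of-moments guess at the true between-parameter variance:
    # observed spread minus the average sampling variance, floored at zero
    tau2 = max(variance(estimates) - mean(s * s for s in ses), 0.0)
    return [grand + tau2 / (tau2 + s * s) * (e - grand)
            for e, s in zip(estimates, ses)]
```

    Ranking the shrunken values instead of the raw estimates avoids rewarding species whose apparent extreme trends are artifacts of sampling variation.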

  20. Estimation of signal-dependent noise level function in transform domain via a sparse recovery model.

    PubMed

    Yang, Jingyu; Gan, Ziqiao; Wu, Zhaoyang; Hou, Chunping

    2015-05-01

    This paper proposes a novel algorithm to estimate the noise level function (NLF) of signal-dependent noise (SDN) from a single image based on the sparse representation of NLFs. Noise level samples are estimated from the high-frequency discrete cosine transform (DCT) coefficients of nonlocal-grouped low-variation image patches. Then, an NLF recovery model based on the sparse representation of NLFs under a trained basis is constructed to recover NLF from the incomplete noise level samples. Confidence levels of the NLF samples are incorporated into the proposed model to promote reliable samples and weaken unreliable ones. We investigate the behavior of the estimation performance with respect to the block size, sampling rate, and confidence weighting. Simulation results on synthetic noisy images show that our method outperforms existing state-of-the-art schemes. The proposed method is evaluated on real noisy images captured by three types of commodity imaging devices, and shows consistently excellent SDN estimation performance. The estimated NLFs are incorporated into two well-known denoising schemes, nonlocal means and BM3D, and show significant improvements in denoising SDN-polluted images.

  1. Asymptotics of empirical eigenstructure for high dimensional spiked covariance.

    PubMed

    Wang, Weichen; Fan, Jianqing

    2017-06-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.

  2. Asymptotics of empirical eigenstructure for high dimensional spiked covariance

    PubMed Central

    Wang, Weichen

    2017-01-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies. PMID:28835726

  3. Sample size and power estimation for studies with health related quality of life outcomes: a comparison of four methods using the SF-36.

    PubMed

    Walters, Stephen J

    2004-05-25

    We describe and compare four different methods for estimating sample size and power, when the primary outcome of the study is a Health Related Quality of Life (HRQoL) measure. These methods are: 1. assuming a Normal distribution and comparing two means; 2. using a non-parametric method; 3. Whitehead's method based on the proportional odds model; 4. the bootstrap. We illustrate the various methods, using data from the SF-36. For simplicity this paper deals with studies designed to compare the effectiveness (or superiority) of a new treatment compared to a standard treatment at a single point in time. The results show that if the HRQoL outcome has a limited number of discrete values (< 7) and/or the expected proportion of cases at the boundaries is high (scoring 0 or 100), then we would recommend using Whitehead's method (Method 3). Alternatively, if the HRQoL outcome has a large number of distinct values and the proportion at the boundaries is low, then we would recommend using Method 1. If a pilot or historical dataset is readily available (to estimate the shape of the distribution) then bootstrap simulation (Method 4) based on this data will provide a more accurate and reliable sample size estimate than conventional methods (Methods 1, 2, or 3). In the absence of a reliable pilot set, bootstrapping is not appropriate and conventional methods of sample size estimation or simulation will need to be used. Fortunately, with the increasing use of HRQoL outcomes in research, historical datasets are becoming more readily available. Strictly speaking, our results and conclusions only apply to the SF-36 outcome measure. Further empirical work is required to see whether these results hold true for other HRQoL outcomes. 
However, the SF-36 has many features in common with other HRQoL outcomes: multi-dimensional, ordinal or discrete response categories with upper and lower bounds, and skewed distributions; we therefore believe these results and conclusions using the SF-36 will be appropriate for other HRQoL measures.
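    The bootstrap approach (Method 4) can be sketched generically: resample the pilot data at the candidate sample size many times and count how often the test rejects. This is a minimal illustration with a normal-approximation two-sample test; `bootstrap_power` is a hypothetical name, not the paper's implementation:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def bootstrap_power(pilot_a, pilot_b, n, alpha=0.05, sims=2000, seed=1):
    # estimate power at n per arm by resampling pilot data with replacement
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(sims):
        a = [rng.choice(pilot_a) for _ in range(n)]
        b = [rng.choice(pilot_b) for _ in range(n)]
        se = sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
        if se > 0 and abs(mean(a) - mean(b)) / se > z_crit:
            hits += 1
    return hits / sims
```

    Searching over n for the smallest value with estimated power above the target (e.g. 0.8) yields the bootstrap sample-size estimate, and naturally respects the skewed, bounded shape of scale scores such as the SF-36.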

  4. Small Sample Sizes Yield Biased Allometric Equations in Temperate Forests

    PubMed Central

    Duncanson, L.; Rourke, O.; Dubayah, R.

    2015-01-01

Accurate quantification of forest carbon stocks is required for constraining the global carbon cycle and its impacts on climate. The accuracies of forest biomass maps are inherently dependent on the accuracy of the field biomass estimates used to calibrate models, which are generated with allometric equations. Here, we provide a quantitative assessment of the sensitivity of allometric parameters to sample size in temperate forests, focusing on the allometric relationship between tree height and crown radius. We use LiDAR remote sensing to isolate between 10,000 and more than 1,000,000 tree height and crown radius measurements per site in six U.S. forests. We find that fitted allometric parameters are highly sensitive to sample size, producing systematic overestimates of height. We extend our analysis to biomass through the application of empirical relationships from the literature, and show that given the small sample sizes used in common allometric equations for biomass, the average site-level biomass bias is ~+70% with a standard deviation of 71%, ranging from −4% to +193%. These findings underscore the importance of increasing the sample sizes used for allometric equation generation. PMID:26598233
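    Allometric relationships of this kind are typically fitted as a power law h = a·r^b by ordinary least squares on log-transformed data. A minimal sketch (illustrative only; with small samples the fitted a and b are highly variable, which is the bias mechanism the record describes):

```python
from math import exp, log
from statistics import mean

def fit_power_law(radii, heights):
    # OLS fit of log(h) = log(a) + b*log(r); returns (a, b)
    xs = [log(r) for r in radii]
    ys = [log(h) for h in heights]
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return exp(ybar - b * xbar), b
```

    Refitting on many small subsamples of a large dataset (as the LiDAR measurements allow) exposes how strongly the parameter estimates, and hence the back-transformed biomass predictions, depend on sample size.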

  5. Sparse targets in hydroacoustic surveys: Balancing quantity and quality of in situ target strength data

    USGS Publications Warehouse

    DuFour, Mark R.; Mayer, Christine M.; Kocovsky, Patrick; Qian, Song; Warner, David M.; Kraus, Richard T.; Vandergoot, Christopher

    2017-01-01

    Hydroacoustic sampling of low-density fish in shallow water can lead to low sample sizes of naturally variable target strength (TS) estimates, resulting in both sparse and variable data. Increasing maximum beam compensation (BC) beyond conventional values (i.e., 3 dB beam width) can recover more targets during data analysis; however, data quality decreases near the acoustic beam edges. We identified the optimal balance between data quantity and quality with increasing BC using a standard sphere calibration, and we quantified the effect of BC on fish track variability, size structure, and density estimates of Lake Erie walleye (Sander vitreus). Standard sphere mean TS estimates were consistent with theoretical values (−39.6 dB) up to 18-dB BC, while estimates decreased at greater BC values. Natural sources (i.e., residual and mean TS) dominated total fish track variation, while contributions from measurement related error (i.e., number of single echo detections (SEDs) and BC) were proportionally low. Increasing BC led to more fish encounters and SEDs per fish, while stability in size structure and density were observed at intermediate values (e.g., 18 dB). Detection of medium to large fish (i.e., age-2+ walleye) benefited most from increasing BC, as proportional changes in size structure and density were greatest in these size categories. Therefore, when TS data are sparse and variable, increasing BC to an optimal value (here 18 dB) will maximize the TS data quantity while limiting lower-quality data near the beam edges.

  6. Using Grain-Size Distribution Methods for Estimation of Air Permeability.

    PubMed

    Wang, Tiejun; Huang, Yuanyang; Chen, Xunhong; Chen, Xi

    2016-01-01

Knowledge of air permeability (ka) at dry conditions is critical for the use of air flow models in porous media; however, it is usually difficult and time consuming to measure ka at dry conditions. It is thus desirable to estimate ka at dry conditions from other readily obtainable properties. In this study, the feasibility of using information derived from grain-size distributions (GSDs) for estimating ka at dry conditions was examined. Fourteen GSD-based equations originally developed for estimating saturated hydraulic conductivity were tested using ka measured at dry conditions in both undisturbed and disturbed river sediment samples. On average, the estimated ka from all the equations, except for the method of Slichter, differed by less than ±4 times from the measured ka for both undisturbed and disturbed groups. In particular, for the two sediment groups, the results given by the methods of Terzaghi and Hazen-modified were comparable to the measured ka. In addition, two methods (e.g., Barr and Beyer) for the undisturbed samples and one method (e.g., Hazen-original) for the disturbed samples were also able to produce comparable ka estimates. Moreover, after adjusting the values of the coefficient C in the GSD-based equations, the estimation of ka was significantly improved, with the differences between the measured and estimated ka less than ±4% on average (except for the method of Barr). As demonstrated by this study, GSD-based equations may provide a promising and efficient way to estimate ka at dry conditions. © 2015, National Ground Water Association.
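    As an example of the GSD-based equations tested, the original Hazen formula estimates permeability from the effective grain size d10 (the diameter at which 10% of the sample by weight is finer). A minimal sketch, assuming a percent-finer curve given as paired lists (the study's exact coefficient values are not reproduced here):

```python
def d10(diams_mm, pct_finer):
    # linearly interpolate the grain diameter at 10% passing
    pts = sorted(zip(pct_finer, diams_mm))
    for (p0, d0), (p1, d1) in zip(pts, pts[1:]):
        if p0 <= 10 <= p1:
            return d0 + (d1 - d0) * (10 - p0) / (p1 - p0)
    raise ValueError("10% passing lies outside the measured curve")

def hazen_K(d10_mm, C=100.0):
    # Hazen (original): K [cm/s] ~ C * d10^2 with d10 in mm,
    # C ~ 100 for clean sands (empirical)
    return C * d10_mm ** 2
```

    The study's finding that recalibrating C markedly improves the estimates corresponds to tuning the C argument against measured ka rather than using the literature default.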

  7. An Unbiased Estimator of Gene Diversity with Improved Variance for Samples Containing Related and Inbred Individuals of any Ploidy

    PubMed Central

    Harris, Alexandre M.; DeGiorgio, Michael

    2016-01-01

Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H̃_BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H̃_BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H̃_BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
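    For context, the "original unbiased estimator" referred to above is Nei's gene diversity with the n/(n-1) small-sample correction, computed directly from allele counts at a locus; a minimal sketch:

```python
from collections import Counter

def unbiased_heterozygosity(alleles):
    # Nei's unbiased gene diversity: (n / (n - 1)) * (1 - sum(p_i^2)),
    # where n is the number of allele copies sampled at the locus
    n = len(alleles)
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)
```

    This corrects only for sample size; the improved estimator discussed in the record additionally replaces the sample allele frequencies with best linear unbiased estimates that account for relatedness and inbreeding.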

  8. Estimation of Rainfall Sampling Uncertainty: A Comparison of Two Diverse Approaches

    NASA Technical Reports Server (NTRS)

    Steiner, Matthias; Zhang, Yu; Baeck, Mary Lynn; Wood, Eric F.; Smith, James A.; Bell, Thomas L.; Lau, William K. M. (Technical Monitor)

    2002-01-01

The spatial and temporal intermittence of rainfall causes the averages of satellite observations of rain rate to differ from the "true" average rain rate over any given area and time period, even if the satellite observations are perfectly accurate. The difference between satellite averages based on occasional observation by satellite systems and the continuous-time average of rain rate is referred to as sampling error. In this study, rms sampling error estimates are obtained for average rain rates over boxes 100 km, 200 km, and 500 km on a side, for averaging periods of 1 day, 5 days, and 30 days. The study uses a multi-year, merged radar data product provided by Weather Services International Corp. at a resolution of 2 km in space and 15 min in time, over an area of the central U.S. extending from 35N to 45N in latitude and 100W to 80W in longitude. The intervals between satellite observations are assumed to be equal, and similar in size to what present and future satellite systems are able to provide (from 1 h to 12 h). The sampling error estimates are obtained using a resampling method called "resampling by shifts," and are compared to sampling error estimates proposed by Bell based on earlier work by Laughlin. The resampling estimates are found to scale with areal size and time period as the theory predicts. The dependence on average rain rate and time interval between observations is also similar to what the simple theory suggests.
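    The "resampling by shifts" idea can be sketched on a one-dimensional series: subsample the continuous record at the satellite revisit interval, once for each possible phase shift, and compare each subsample mean to the full-record mean. This is an illustrative reduction, not the paper's full spatio-temporal procedure:

```python
from math import sqrt
from statistics import mean

def rms_sampling_error(series, every):
    # rms difference between the full-record mean and the means of
    # subsampled records taken at interval `every`, one per phase shift
    full = mean(series)
    devs = [mean(series[s::every]) - full for s in range(every)]
    return sqrt(sum(d * d for d in devs) / len(devs))
```

    A constant rain-rate series gives zero sampling error regardless of the revisit interval, while an intermittent series gives errors that grow with the interval between observations, mirroring the behavior reported above.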

  9. Speckle pattern sequential extraction metric for estimating the focus spot size on a remote diffuse target.

    PubMed

    Yu, Zhan; Li, Yuanyang; Liu, Lisheng; Guo, Jin; Wang, Tingfeng; Yang, Guoqing

    2017-11-10

The speckle pattern (line-by-line) sequential extraction (SPSE) metric is proposed on the basis of one-dimensional speckle intensity level-crossing theory. Through sequential extraction of the received speckle information, speckle metrics for estimating the variation of the focusing spot size on a remote diffuse target are obtained. Based on simulation, we discuss the SPSE metric's range of application under theoretical conditions and how the aperture size of the observation system affects the metric's performance. The results of the analyses are verified by experiment. The method is applied to the detection of relatively static targets (speckle jitter frequency less than the CCD sampling frequency). The SPSE metric can determine the variation of the focusing spot size over a long distance and, moreover, can estimate the spot size itself under some conditions. Therefore, monitoring and feedback of the far-field spot can be implemented in laser focusing system applications and help the system optimize its focusing performance.

  10. Avoiding treatment bias of REDD+ monitoring by sampling with partial replacement.

    PubMed

    Köhl, Michael; Scott, Charles T; Lister, Andrew J; Demon, Inez; Plugge, Daniel

    2015-12-01

Implementing REDD+ renders the development of a measurement, reporting and verification (MRV) system necessary to monitor carbon stock changes. MRV systems generally apply a combination of remote sensing techniques and in-situ field assessments. In-situ assessments can be based on 1) permanent plots, which are assessed on all successive occasions, 2) temporary plots, which are assessed only once, and 3) a combination of both. The current study focuses on in-situ assessments and addresses the effect of treatment bias, which is introduced by managing permanent sampling plots differently than the surrounding forests. Temporary plots are not subject to treatment bias, but are associated with large sampling errors and low cost-efficiency. Sampling with partial replacement (SPR) utilizes both permanent and temporary plots. We apply a scenario analysis with different intensities of deforestation and forest degradation to show that SPR combines cost-efficiency with the handling of treatment bias. Without treatment bias, permanent plots generally provide lower sampling errors for change estimates than SPR and temporary plots, but they do not provide reliable estimates if treatment bias occurs. SPR allows for change estimates that are comparable to those provided by permanent plots, offers the flexibility to adjust sample sizes in the course of time, and allows comparison of data on permanent versus temporary plots for detecting treatment bias. Equivalence of biomass or carbon stock estimates between permanent and temporary plots serves as an indication for the absence of treatment bias, while differences suggest that there is evidence for treatment bias. SPR is a flexible tool for estimating emission factors from successive measurements. It does not entirely depend on sample plots that are installed at the first occasion but allows for the adjustment of sample sizes and placement of new plots at any occasion. 
This ensures that in-situ samples provide representative estimates over time. SPR offers the possibility to increase sampling intensity in areas with high degradation intensities or to establish new plots in areas where permanent plots are lost due to deforestation. SPR is also an ideal approach to mitigate concerns about treatment bias.

  11. Observed intra-cluster correlation coefficients in a cluster survey sample of patient encounters in general practice in Australia

    PubMed Central

    Knox, Stephanie A; Chondros, Patty

    2004-01-01

    Background Cluster sample study designs are cost effective, however cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. 
Conclusions The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit. PMID:15613248
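    Intra-cluster correlation coefficients like those reported above are commonly estimated from clustered data with the one-way ANOVA estimator; a minimal sketch, assuming each cluster is a list of numeric observations (this is a standard textbook estimator, not necessarily the BEACH study's exact method):

```python
from statistics import mean

def anova_icc(clusters):
    # one-way ANOVA estimator of the intra-cluster correlation:
    # (MSB - MSW) / (MSB + (m0 - 1) * MSW), with m0 the effective
    # cluster size adjusted for unequal cluster sizes
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    grand = mean(x for c in clusters for x in c)
    msb = sum(len(c) * (mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (n - k)
    m0 = (n - sum(len(c) ** 2 for c in clusters) / n) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)
```

    Even small ICCs matter at this cluster size: with 100 encounters per GP, the design effect 1 + (100 - 1) * 0.055 is roughly 6.4, so a naive sample-size calculation would understate the required sample more than sixfold.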

  12. 40 CFR Table F-4 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized Coarse Aerosol Size Distribution

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... Equivalent Methods for PM2.5 Pt. 53, Subpt. F, Table F-4 Table F-4 to Subpart F of Part 53—Estimated Mass... (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...

  13. 40 CFR Table F-4 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized Coarse Aerosol Size Distribution

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 5 2011-07-01 2011-07-01 false Estimated Mass Concentration... Equivalent Methods for PM2.5 Pt. 53, Subpt. F, Table F-4 Table F-4 to Subpart F of Part 53—Estimated Mass... (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...

  14. 40 CFR Table F-6 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized Fine Aerosol Size Distribution

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 5 2011-07-01 2011-07-01 false Estimated Mass Concentration... Equivalent Methods for PM2.5 Pt. 53, Subpt. F, Table F-6 Table F-6 to Subpart F of Part 53—Estimated Mass... (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...

  15. 40 CFR Table F-6 to Subpart F of... - Estimated Mass Concentration Measurement of PM2.5 for Idealized Fine Aerosol Size Distribution

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 5 2010-07-01 2010-07-01 false Estimated Mass Concentration... Equivalent Methods for PM2.5 Pt. 53, Subpt. F, Table F-6 Table F-6 to Subpart F of Part 53—Estimated Mass... (µm) Test Sampler Fractional Sampling Effectiveness Interval Mass Concentration (µg/m3) Estimated Mass...

  16. Historical Population Estimates For Several Fish Species At Offshore Oil and Gas Structures in the US Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Gitschlag, G.

    2016-02-01

    Population estimates were calculated for four fish species occurring at offshore oil and gas structures in water depths of 14-32 m off the Louisiana and upper Texas coasts in the US Gulf of Mexico. From 1993-1999 sampling was conducted at eight offshore platforms in conjunction with explosive salvage of the structures. To estimate fish population size prior to detonation of explosives, a fish mark-recapture study was conducted. Fish were captured on rod and reel using assorted hook sizes. Traps were occasionally used to supplement catches. Fish were tagged below the dorsal fin with plastic t-bar tags using tagging guns. Only fish that were alive and in good condition were released. Recapture sampling was conducted after explosives were detonated during salvage operations. Personnel operating from inflatable boats used dip nets to collect all dead fish that floated to the surface. Divers collected representative samples of dead fish that sank to the sea floor. Data provided estimates for red snapper (Lutjanus campechanus), Atlantic spadefish (Chaetodipterus faber), gray triggerfish (Balistes capriscus), and blue runner (Caranx crysos) at one or more of the eight platforms studied. At seven platforms, population size for red snapper was calculated at 503-1,943 with a 95% CI of 478. Abundance estimates for Atlantic spadefish at three platforms ranged from 1,432-1,782 with a 95% CI of 473. At three platforms, population size of gray triggerfish was 63-129 with a 95% CI of 82. Blue runner abundance at one platform was 558. Unlike the other three species which occur close to the platforms, blue runner range widely and recapture of this species was dependent on fish schools being in close proximity to the platform at the time explosives were detonated. Tag recapture was as high as 73% for red snapper at one structure studied.
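    The abstract does not name the estimator used; a standard choice for a single mark-recapture experiment of this kind is Chapman's bias-corrected form of the Lincoln-Petersen estimator, sketched here with hypothetical counts:

```python
def chapman_estimate(marked: int, caught: int, recaptured: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimator of closed population size.

    marked: animals tagged and released; caught: animals in the recapture sample;
    recaptured: tagged animals found in the recapture sample.
    """
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

# hypothetical counts: 120 fish tagged, 90 recovered after detonation, 55 of them tagged
n_hat = chapman_estimate(120, 90, 55)  # (121 * 91) / 56 - 1 = 195.625
```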

  17. Accuracy in parameter estimation for targeted effects in structural equation modeling: sample size planning for narrow confidence intervals.

    PubMed

    Lai, Keke; Kelley, Ken

    2011-06-01

    In addition to evaluating a structural equation model (SEM) as a whole, often the model parameters are of interest and confidence intervals for those parameters are formed. Given a model with a good overall fit, it is entirely possible for the targeted effects of interest to have very wide confidence intervals, thus giving little information about the magnitude of the population targeted effects. With the goal of obtaining sufficiently narrow confidence intervals for the model parameters of interest, sample size planning methods for SEM are developed from the accuracy in parameter estimation approach. One method plans for the sample size so that the expected confidence interval width is sufficiently narrow. An extended procedure ensures that the obtained confidence interval will be no wider than desired, with some specified degree of assurance. A Monte Carlo simulation study was conducted that verified the effectiveness of the procedures in realistic situations. The methods developed have been implemented in the MBESS package in R so that they can be easily applied by researchers. © 2011 American Psychological Association
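    The MBESS routines referenced above handle the SEM case; the underlying accuracy-in-parameter-estimation idea is easiest to see for a single mean, where the sample size is chosen so that the expected confidence-interval width does not exceed a target (a simplified sketch, not the paper's SEM procedure):

```python
import math
from statistics import NormalDist

def n_for_ci_width(sd: float, full_width: float, conf: float = 0.95) -> int:
    """Smallest n whose large-sample CI for a mean has expected full width <= full_width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. 1.96 for 95%
    return math.ceil((2 * z * sd / full_width) ** 2)

# sd = 1.0, desired full CI width 0.2 at 95% confidence
n = n_for_ci_width(1.0, 0.2)  # 385
```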

  18. Hard choices in assessing survival past dams — a comparison of single- and paired-release strategies

    USGS Publications Warehouse

    Zydlewski, Joseph D.; Stich, Daniel S.; Sigourney, Douglas B.

    2017-01-01

    Mark–recapture models are widely used to estimate survival of salmon smolts migrating past dams. Paired releases have been used to improve estimate accuracy by removing components of mortality not attributable to the dam. This method is accompanied by reduced precision because (i) sample size is reduced relative to a single, large release; and (ii) variance calculations inflate error. We modeled an idealized system with a single dam to assess trade-offs between accuracy and precision and compared methods using root mean squared error (RMSE). Simulations were run under predefined conditions (dam mortality, background mortality, detection probability, and sample size) to determine scenarios when the paired release was preferable to a single release. We demonstrate that a paired-release design provides a theoretical advantage over a single-release design only at large sample sizes and high probabilities of detection. At release numbers typical of many survival studies, paired release can result in overestimation of dam survival. Failures to meet model assumptions of a paired release may result in further overestimation of dam-related survival. Under most conditions, a single-release strategy was preferable.

  19. A Proposed Approach for Joint Modeling of the Longitudinal and Time-To-Event Data in Heterogeneous Populations: An Application to HIV/AIDS's Disease.

    PubMed

    Roustaei, Narges; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf

    2018-01-01

    In recent years, joint models have been widely used for modeling longitudinal and time-to-event data simultaneously. In this study, we proposed an approach (PA) to study longitudinal and survival outcomes simultaneously in heterogeneous populations. PA relaxes the assumption of conditional independence (CI). We also compared PA with the joint latent class model (JLCM) and a separate approach (SA) for various sample sizes (150, 300, and 600) and different association parameters (0, 0.2, and 0.5). The average bias of parameter estimation (AB-PE), average SE of parameter estimation (ASE-PE), and coverage probability of the 95% confidence interval (CP) were compared among the three approaches. In most cases, when the sample sizes increased, AB-PE and ASE-PE decreased for the three approaches, and CP got closer to the nominal level of 0.95. When there was a considerable association, PA performed better than SA and JLCM in the sense that PA had the smallest AB-PE and ASE-PE for the longitudinal submodel among the three approaches for small and moderate sample sizes. Moreover, JLCM was preferable when there was no association and the sample size was large. Finally, the evaluated approaches were applied to a real HIV/AIDS dataset for validation, and the results were compared.

  20. A comparative review of methods for comparing means using partially paired data.

    PubMed

    Guo, Beibei; Yuan, Ying

    2017-06-01

    In medical experiments with the objective of testing the equality of two means, data are often partially paired by design or because of missing data. The partially paired data represent a combination of paired and unpaired observations. In this article, we review and compare nine methods for analyzing partially paired data, including the two-sample t-test, paired t-test, corrected z-test, weighted t-test, pooled t-test, optimal pooled t-test, multiple imputation method, mixed model approach, and the test based on a modified maximum likelihood estimate. We compare the performance of these methods through extensive simulation studies that cover a wide range of scenarios with different effect sizes, sample sizes, and correlations between the paired variables, as well as true underlying distributions. The simulation results suggest that when the sample size is moderate, the test based on the modified maximum likelihood estimator is generally superior to the other approaches when the data is normally distributed and the optimal pooled t-test performs the best when the data is not normally distributed, with well-controlled type I error rates and high statistical power; when the sample size is small, the optimal pooled t-test is to be recommended when both variables have missing data and the paired t-test is to be recommended when only one variable has missing data.

  1. Reexamining Sample Size Requirements for Multivariate, Abundance-Based Community Research: When Resources are Limited, the Research Does Not Have to Be.

    PubMed

    Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F

    2015-01-01

    Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.

  2. HYPERSAMP - HYPERGEOMETRIC ATTRIBUTE SAMPLING SYSTEM BASED ON RISK AND FRACTION DEFECTIVE

    NASA Technical Reports Server (NTRS)

    DeSalvo, L. J.

    1994-01-01

    HYPERSAMP is a demonstration of an attribute sampling system developed to determine the minimum sample size required for any preselected value for consumer's risk and fraction of nonconforming. This statistical method can be used in place of MIL-STD-105E sampling plans when a minimum sample size is desirable, such as when tests are destructive or expensive. HYPERSAMP utilizes the Hypergeometric Distribution and can be used for any fraction nonconforming. The program employs an iterative technique that circumvents the obstacle presented by the factorial of a non-whole number. HYPERSAMP provides the required Hypergeometric sample size for any equivalent real number of nonconformances in the lot or batch under evaluation. Many currently used sampling systems, such as the MIL-STD-105E, utilize the Binomial or the Poisson equations as an estimate of the Hypergeometric when performing inspection by attributes. However, this is primarily because of the difficulty in calculation of the factorials required by the Hypergeometric. Sampling plans based on the Binomial or Poisson equations will result in the maximum sample size possible with the Hypergeometric. The difference in the sample sizes between the Poisson or Binomial and the Hypergeometric can be significant. For example, a lot size of 400 devices with an error rate of 1.0% and a confidence of 99% would require a sample size of 400 (all units would need to be inspected) for the Binomial sampling plan and only 273 for a Hypergeometric sampling plan. The Hypergeometric results in a savings of 127 units, a significant reduction in the required sample size. HYPERSAMP is a demonstration program and is limited to sampling plans with zero defectives in the sample (acceptance number of zero). Since it is only a demonstration program, the sample size determination is limited to sample sizes of 1500 or less. 
The Hypergeometric Attribute Sampling System demonstration code is a spreadsheet program written for IBM PC compatible computers running DOS and Lotus 1-2-3 or Quattro Pro. This program is distributed on a 5.25 inch 360K MS-DOS format diskette, and the program price includes documentation. This statistical method was developed in 1992.
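    The iterative search the abstract describes can be sketched directly from the hypergeometric distribution; this toy version (not the HYPERSAMP spreadsheet itself) reproduces the abstract's 400-unit lot example:

```python
from math import ceil, comb

def min_sample_size(lot_size: int, fraction_nonconforming: float, confidence: float) -> int:
    """Smallest zero-acceptance sample size n such that a lot with the given
    fraction nonconforming passes (zero defects found) with probability at most
    the consumer's risk (1 - confidence)."""
    defectives = ceil(lot_size * fraction_nonconforming)
    beta = 1.0 - confidence  # consumer's risk
    for n in range(1, lot_size + 1):
        # P(zero defectives in a sample of n) under the hypergeometric distribution
        p_zero = comb(lot_size - defectives, n) / comb(lot_size, n)
        if p_zero <= beta:
            return n
    return lot_size

# the abstract's example: N = 400, 1% nonconforming, 99% confidence
print(min_sample_size(400, 0.01, 0.99))  # 273
```

    The savings over the Binomial plan (which requires all 400 units here) come from the hypergeometric accounting for sampling without replacement from a finite lot.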

  3. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

    PubMed

    Oono, Ryoko

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.

  4. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

    PubMed Central

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889

  5. Using small area estimation and Lidar-derived variables for multivariate prediction of forest attributes

    Treesearch

    F. Mauro; Vicente Monleon; H. Temesgen

    2015-01-01

    Small area estimation (SAE) techniques have been successfully applied in forest inventories to provide reliable estimates for domains where the sample size is small (i.e. small areas). Previous studies have explored the use of either Area Level or Unit Level Empirical Best Linear Unbiased Predictors (EBLUPs) in a univariate framework, modeling each variable of interest...

  6. Estimation of a closed population size of tadpoles in temporary pond.

    PubMed

    Lima, M S C S; Pederassi, J; Souza, C A S

    2018-05-01

    The practice of capture-recapture to estimate abundance is well known for many animal groups; however, its application to the larval phase of anuran amphibians is incipient. We aimed to evaluate the Lincoln-Petersen estimator, the Venn diagram, and Bayes' theorem for inferring the population size of a larval-phase anurocenose in a lotic environment. The adherence of the results was evaluated using the Kolmogorov-Smirnov test. Tadpoles were marked for later recapture with eosin methylene blue. When comparing the results of the Lincoln-Petersen estimator with those of the Venn diagram and Bayes' theorem, we detected percentage differences per sampling, i.e., the proportion of sampled anuran genera is kept among the three methods, although the values are numerically different. Submitting these results to the Kolmogorov-Smirnov test revealed no significant differences. Therefore, regardless of the estimator, the measured value is adherent and estimates the total population. Together with the marking methodology, which did not change the behavior of the tadpoles, the present study helps to fill the need for more studies on the larval phase of amphibians in Brazil, especially in the semi-arid northeast.

  7. On the existence of maximum likelihood estimates for presence-only data

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.

    2015-01-01

    It is important to identify conditions for which maximum likelihood estimates are unlikely to be identifiable from presence-only data. In data sets where the maximum likelihood estimates do not exist, penalized likelihood and Bayesian methods will produce coefficient estimates, but these are sensitive to the choice of estimation procedure and prior or penalty term. When sample size is small or it is thought that habitat preferences are strong, we propose a suite of estimation procedures researchers can consider using.

  8. The distance between Mars and Venus: measuring global sex differences in personality.

    PubMed

    Del Giudice, Marco; Booth, Tom; Irwing, Paul

    2012-01-01

    Sex differences in personality are believed to be comparatively small. However, research in this area has suffered from significant methodological limitations. We advance a set of guidelines for overcoming those limitations: (a) measure personality with a higher resolution than that afforded by the Big Five; (b) estimate sex differences on latent factors; and (c) assess global sex differences with multivariate effect sizes. We then apply these guidelines to a large, representative adult sample, and obtain what is presently the best estimate of global sex differences in personality. Personality measures were obtained from a large US sample (N = 10,261) with the 16PF Questionnaire. Multigroup latent variable modeling was used to estimate sex differences on individual personality dimensions, which were then aggregated to yield a multivariate effect size (Mahalanobis D). We found a global effect size D = 2.71, corresponding to an overlap of only 10% between the male and female distributions. Even excluding the factor showing the largest univariate ES, the global effect size was D = 1.71 (24% overlap). These are extremely large differences by psychological standards. The idea that there are only minor differences between the personality profiles of males and females should be rejected as based on inadequate methodology.
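    The reported overlap figures can be reproduced from D alone, treating the multivariate difference as a standardized difference between two unit-variance normal distributions and measuring overlap relative to the joint distribution (the OVL2 convention, which appears to be the one used here):

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def overlap_joint(d: float) -> float:
    """Proportion of overlap relative to the joint distribution of two
    unit-variance normals separated by d: OVL2 = OVL / (2 - OVL)."""
    ovl = 2.0 * norm_cdf(-d / 2.0)  # overlapping coefficient
    return ovl / (2.0 - ovl)

print(round(overlap_joint(2.71), 2))  # 0.1  (the ~10% overlap reported)
print(round(overlap_joint(1.71), 2))  # 0.24 (the ~24% overlap reported)
```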

  9. The Distance Between Mars and Venus: Measuring Global Sex Differences in Personality

    PubMed Central

    Del Giudice, Marco; Booth, Tom; Irwing, Paul

    2012-01-01

    Background Sex differences in personality are believed to be comparatively small. However, research in this area has suffered from significant methodological limitations. We advance a set of guidelines for overcoming those limitations: (a) measure personality with a higher resolution than that afforded by the Big Five; (b) estimate sex differences on latent factors; and (c) assess global sex differences with multivariate effect sizes. We then apply these guidelines to a large, representative adult sample, and obtain what is presently the best estimate of global sex differences in personality. Methodology/Principal Findings Personality measures were obtained from a large US sample (N = 10,261) with the 16PF Questionnaire. Multigroup latent variable modeling was used to estimate sex differences on individual personality dimensions, which were then aggregated to yield a multivariate effect size (Mahalanobis D). We found a global effect size D = 2.71, corresponding to an overlap of only 10% between the male and female distributions. Even excluding the factor showing the largest univariate ES, the global effect size was D = 1.71 (24% overlap). These are extremely large differences by psychological standards. Significance The idea that there are only minor differences between the personality profiles of males and females should be rejected as based on inadequate methodology. PMID:22238596

  10. Sampling considerations for disease surveillance in wildlife populations

    USGS Publications Warehouse

    Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.

    2008-01-01

    Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.

  11. The relationship between offspring size and fitness: integrating theory and empiricism.

    PubMed

    Rollinson, Njal; Hutchings, Jeffrey A

    2013-02-01

    How parents divide the energy available for reproduction between size and number of offspring has a profound effect on parental reproductive success. Theory indicates that the relationship between offspring size and offspring fitness is of fundamental importance to the evolution of parental reproductive strategies: this relationship predicts the optimal division of resources between size and number of offspring, it describes the fitness consequences for parents that deviate from optimality, and its shape can predict the most viable type of investment strategy in a given environment (e.g., conservative vs. diversified bet-hedging). Many previous attempts to estimate this relationship and the corresponding value of optimal offspring size have been frustrated by a lack of integration between theory and empiricism. In the present study, we draw from C. Smith and S. Fretwell's classic model to explain how a sound estimate of the offspring size--fitness relationship can be derived with empirical data. We evaluate what measures of fitness can be used to model the offspring size--fitness curve and optimal size, as well as which statistical models should and should not be used to estimate offspring size--fitness relationships. To construct the fitness curve, we recommend that offspring fitness be measured as survival up to the age at which the instantaneous rate of offspring mortality becomes random with respect to initial investment. Parental fitness is then expressed in ecologically meaningful, theoretically defensible, and broadly comparable units: the number of offspring surviving to independence. Although logistic and asymptotic regression have been widely used to estimate offspring size-fitness relationships, the former provides relatively unreliable estimates of optimal size when offspring survival and sample sizes are low, and the latter is unreliable under all conditions. 
We recommend that the Weibull-1 model be used to estimate this curve because it provides modest improvements in prediction accuracy under experimentally relevant conditions.

  12. Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents.

    PubMed

    Hyun, Noorie; Gastwirth, Joseph L; Graubard, Barry I

    2018-03-26

    Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample weighted complex multistage-cluster designs. Sample weighted-generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) How does one weight the results of the test on each group as the sample weights will differ among observations in the same group. Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation; (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling allowing for intracluster correlation of the test results. We study 5 different grouping methods to address the weighting and cluster sampling aspects of complex designed samples. Finite sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods. Copyright © 2018 John Wiley & Sons, Ltd.
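    Setting aside the test error rates and survey weights the paper addresses, the basic one-stage group-testing prevalence estimate rests on a simple inversion: a group of size q tests negative only if all q members are negative. A minimal sketch with hypothetical counts:

```python
def group_testing_prevalence(positive_groups: int, n_groups: int, group_size: int) -> float:
    """MLE of individual prevalence from one-stage group testing,
    assuming a perfect test and equal group sizes."""
    phat_group = positive_groups / n_groups          # observed proportion of positive groups
    # P(group negative) = (1 - p)^q, so p = 1 - (1 - phat_group)^(1/q)
    return 1.0 - (1.0 - phat_group) ** (1.0 / group_size)

# hypothetical counts: 12 of 100 pools of size 10 test positive
p = group_testing_prevalence(12, 100, 10)  # ≈ 0.0127
```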

  13. Reliability of Different Mark-Recapture Methods for Population Size Estimation Tested against Reference Population Sizes Constructed from Field Data

    PubMed Central

    Grimm, Annegret; Gruber, Bernd; Henle, Klaus

    2014-01-01

    Reliable estimates of population size are fundamental in many ecological studies and biodiversity conservation. Selecting appropriate methods to estimate abundance is often very difficult, especially if data are scarce. Most studies concerning the reliability of different estimators used simulation data based on assumptions about capture variability that do not necessarily reflect conditions in natural populations. Here, we used data from an intensively studied closed population of the arboreal gecko Gehyra variegata to construct reference population sizes for assessing twelve different population size estimators in terms of bias, precision, accuracy, and their 95%-confidence intervals. Two of the reference populations reflect natural biological entities, whereas the other reference populations reflect artificial subsets of the population. Since individual heterogeneity was assumed, we tested modifications of the Lincoln-Petersen estimator, a set of models in programs MARK and CARE-2, and a truncated geometric distribution. Ranking of methods was similar across criteria. Models accounting for individual heterogeneity performed best in all assessment criteria. For populations from heterogeneous habitats without obvious covariates explaining individual heterogeneity, we recommend using the moment estimator or the interpolated jackknife estimator (both implemented in CAPTURE/MARK). If data for capture frequencies are substantial, we recommend the sample coverage or the estimating equation (both models implemented in CARE-2). Depending on the distribution of catchabilities, our proposed multiple Lincoln-Petersen and a truncated geometric distribution obtained comparably good results. The former usually resulted in a minimum population size and the latter can be recommended when there is a long tail of low capture probabilities. Models with covariates and mixture models performed poorly. 
Our approach identified suitable methods and extended options to evaluate the performance of mark-recapture population size estimators under field conditions, which is essential for selecting an appropriate method and obtaining reliable results in ecology and conservation biology, and thus for sound management. PMID:24896260

  14. Localised estimates and spatial mapping of poverty incidence in the state of Bihar in India—An application of small area estimation techniques

    PubMed Central

    Chandra, Hukum; Aditya, Kaustav; Sud, U. C.

    2018-01-01

    Poverty affects many people, but the ramifications and impacts affect all aspects of society. Information about the incidence of poverty is therefore an important parameter of the population for policy analysis and decision making. In order to provide specific, targeted solutions when addressing poverty disadvantage small area statistics are needed. Surveys are typically designed and planned to produce reliable estimates of population characteristics of interest mainly at higher geographic area such as national and state level. Sample sizes are usually not large enough to provide reliable estimates for disaggregated analysis. In many instances estimates are required for areas of the population for which the survey providing the data was unplanned. Then, for areas with small sample sizes, direct survey estimation of population characteristics based only on the data available from the particular area tends to be unreliable. This paper describes an application of small area estimation (SAE) approach to improve the precision of estimates of poverty incidence at district level in the State of Bihar in India by linking data from the Household Consumer Expenditure Survey 2011–12 of NSSO and the Population Census 2011. The results show that the district level estimates generated by SAE method are more precise and representative. In contrast, the direct survey estimates based on survey data alone are less stable. PMID:29879202

  16. Modification of the Sandwich Estimator in Generalized Estimating Equations with Correlated Binary Outcomes in Rare Event and Small Sample Settings

    PubMed Central

    Rogers, Paul; Stoner, Julie

    2016-01-01

    Regression models for correlated binary outcomes are commonly fit using a Generalized Estimating Equations (GEE) methodology. GEE uses the Liang and Zeger sandwich estimator to produce unbiased standard error estimators for regression coefficients in large sample settings even when the covariance structure is misspecified. The sandwich estimator performs optimally in balanced designs when the number of participants is large and there are few repeated measurements. The sandwich estimator is not without drawbacks: its asymptotic properties do not hold in small sample settings, where it is biased downwards and underestimates the variances. In this project, a modified form of the sandwich estimator is proposed to correct this deficiency. The performance of this new sandwich estimator is compared to the traditional Liang and Zeger estimator as well as alternative forms proposed by Morel, by Pan, and by Mancl and DeRouen. The performance of each estimator was assessed using 95% coverage probabilities for the regression coefficient estimators, computed from simulated data under various combinations of sample size and outcome prevalence with Independence (IND), Autoregressive (AR), and Compound Symmetry (CS) correlation structures. This research is motivated by investigations involving rare-event outcomes in aviation data. PMID:26998504
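
    For intuition, the sandwich idea can be shown in its simplest case: an intercept-only model with an independence working structure, where the robust variance of the mean reduces to the sum of squared residuals over n². A minimal sketch with hypothetical data, not the aviation application:

```python
def sandwich_var_mean(y):
    """Robust (sandwich) variance of a sample mean.  For an
    intercept-only GEE with an independence working structure the
    'bread' is 1/n and the 'meat' is the sum of squared residuals,
    so the estimator reduces to sum((y_i - ybar)^2) / n^2."""
    n = len(y)
    ybar = sum(y) / n
    meat = sum((yi - ybar) ** 2 for yi in y)
    return meat / n ** 2

y = [0, 0, 1, 0, 1, 1, 0, 0]  # hypothetical binary outcomes
print(round(sandwich_var_mean(y), 4))  # → 0.0293
```

    Small-sample corrections of the kind compared in the paper (e.g. Mancl and DeRouen's) inflate the residuals before forming the "meat", offsetting the downward bias described above.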

  17. Implementing Generalized Additive Models to Estimate the Expected Value of Sample Information in a Microsimulation Model: Results of Three Case Studies.

    PubMed

    Rabideau, Dustin J; Pei, Pamela P; Walensky, Rochelle P; Zheng, Amy; Parker, Robert A

    2018-02-01

    The expected value of sample information (EVSI) can help prioritize research, but its application is hampered by computational infeasibility, especially for complex models. We investigated an approach by Strong and colleagues to estimate EVSI by applying generalized additive models (GAMs) to results generated from a probabilistic sensitivity analysis (PSA). For 3 potential HIV prevention and treatment strategies, we estimated life expectancy and lifetime costs using the Cost-effectiveness of Preventing AIDS Complications (CEPAC) model, a complex patient-level microsimulation model of HIV progression. We fitted a GAM, a flexible regression model that estimates the functional form as part of the model-fitting process, to the incremental net monetary benefits obtained from the CEPAC PSA. For each case study, we calculated the expected value of partial perfect information (EVPPI) using both the conventional nested Monte Carlo approach and the GAM approach; EVSI was calculated using the GAM approach. For all 3 case studies, the GAM approach consistently gave estimates of EVPPI similar to the conventional approach, and the EVSI behaved as expected: it increased and converged to EVPPI for larger sample sizes. For each case study, generating the PSA results for the GAM approach required 3 to 4 days on a shared cluster, after which EVPPI and EVSI across a range of sample sizes were evaluated in minutes; the conventional approach required approximately 5 weeks for the EVPPI calculation alone. Estimating EVSI using the GAM approach with results from a PSA dramatically reduced the time required for a computationally intense project that would otherwise have been impractical. Using the GAM approach, we can efficiently provide policy makers with EVSI estimates, even for complex patient-level microsimulation models.
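
    The regression-based EVPPI calculation described above can be sketched for a single two-strategy comparison. Here an ordinary least-squares line stands in for the GAM (adequate only because the simulated relationship is made linear), and all inputs are simulated, not CEPAC outputs:

```python
import random

random.seed(7)
# Simulated PSA output: the incremental net monetary benefit (INB) of
# strategy B over A depends on one uncertain parameter theta, with the
# remaining parameters contributing noise.
theta = [random.gauss(0.0, 1.0) for _ in range(5000)]
inb = [2000.0 * t + random.gauss(0.0, 3000.0) for t in theta]

# Least-squares fit of E[INB | theta] (a straight line here; the real
# method fits a flexible smoother)
n = len(theta)
tbar = sum(theta) / n
ybar = sum(inb) / n
slope = (sum((t - tbar) * (y - ybar) for t, y in zip(theta, inb))
         / sum((t - tbar) ** 2 for t in theta))
intercept = ybar - slope * tbar
fitted = [intercept + slope * t for t in theta]

# EVPPI for theta: value of choosing after learning theta, minus the
# value of the best choice under current information
evppi = sum(max(f, 0.0) for f in fitted) / n - max(ybar, 0.0)
print(round(evppi))  # close to the analytic 2000/sqrt(2*pi), about 798
```

    The time saving reported in the abstract comes from this structure: the expensive microsimulation runs happen once to produce the PSA sample, after which refitting the regression for different parameters or sample sizes is nearly free.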

  18. Comparison of Methods for Analyzing Left-Censored Occupational Exposure Data

    PubMed Central

    Huynh, Tran; Ramachandran, Gurumurthy; Banerjee, Sudipto; Monteiro, Joao; Stenzel, Mark; Sandler, Dale P.; Engel, Lawrence S.; Kwok, Richard K.; Blair, Aaron; Stewart, Patricia A.

    2014-01-01

    The National Institute for Environmental Health Sciences (NIEHS) is conducting an epidemiologic study (GuLF STUDY) to investigate the health of the workers and volunteers who participated from April to December of 2010 in the response and cleanup of the oil release after the Deepwater Horizon explosion in the Gulf of Mexico. The exposure assessment component of the study involves analyzing thousands of personal monitoring measurements that were collected during this effort. A substantial portion of these data has values reported by the analytic laboratories to be below the limits of detection (LOD). A simulation study was conducted to evaluate three established methods for analyzing data with censored observations to estimate the arithmetic mean (AM), geometric mean (GM), geometric standard deviation (GSD), and the 95th percentile (X0.95) of the exposure distribution: the maximum likelihood (ML) estimation, the β-substitution, and the Kaplan–Meier (K-M) methods. Each method was challenged with computer-generated exposure datasets drawn from lognormal and mixed lognormal distributions with sample sizes (N) varying from 5 to 100, GSDs ranging from 2 to 5, and censoring levels ranging from 10 to 90%, with single and multiple LODs. Using relative bias and relative root mean squared error (rMSE) as the evaluation metrics, the β-substitution method generally performed as well or better than the ML and K-M methods in most simulated lognormal and mixed lognormal distribution conditions. The ML method was suitable for large sample sizes (N ≥ 30) up to 80% censoring for lognormal distributions with small variability (GSD = 2–3). The K-M method generally provided accurate estimates of the AM when the censoring was <50% for lognormal and mixed distributions. The accuracy and precision of all methods decreased under high variability (GSD = 4 and 5) and small to moderate sample sizes (N < 20) but the β-substitution was still the best of the three methods. 
When using the ML method, practitioners are cautioned to be aware of different ways of estimating the AM as they could lead to biased interpretation. A limitation of the β-substitution method is the absence of a confidence interval for the estimate. More research is needed to develop methods that could improve the estimation accuracy for small sample sizes and high percent censored data and also provide uncertainty intervals. PMID:25261453
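
    The bias that these methods are designed to avoid can be illustrated with a small simulation. This sketch uses the common LOD/√2 substitution rule rather than the β-substitution method itself, and all parameter values are illustrative:

```python
import math
import random

random.seed(1)
GM, GSD = 1.0, 3.0                       # illustrative exposure profile
mu, sigma = math.log(GM), math.log(GSD)
data = [random.lognormvariate(mu, sigma) for _ in range(10000)]

LOD = 1.0                                # ~50% censoring (LOD at the GM)
censored = [x if x >= LOD else LOD / math.sqrt(2) for x in data]

am_true = math.exp(mu + sigma ** 2 / 2)  # lognormal arithmetic mean
am_full = sum(data) / len(data)          # AM with no censoring
am_sub = sum(censored) / len(censored)   # AM after LOD/sqrt(2) substitution
print(round(am_true, 2), round(am_full, 2), round(am_sub, 2))
```

    In this simulation the LOD/√2 value overshoots the conditional mean of the censored observations, so the substituted AM is biased upward; distribution-aware approaches such as β-substitution and ML exist precisely to avoid this kind of distortion.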

  19. Population Size Estimation of Men Who Have Sex with Men in Ho Chi Minh City and Nghe An Using Social App Multiplier Method.

    PubMed

    Safarnejad, Ali; Nga, Nguyen Thien; Son, Vo Hai

    2017-06-01

    This study aims to estimate the number of men who have sex with men (MSM) in Ho Chi Minh City (HCMC) and Nghe An province, Viet Nam, using a novel method of population size estimation, and to assess the feasibility of implementing the method. An innovative approach to population size estimation grounded in the principles of the multiplier method, using social-app technology and internet-based surveys, was undertaken among MSM in two regions of Viet Nam in 2015. Enumeration of active users of popular social apps for MSM in Viet Nam was conducted over 4 weeks. Subsequently, an independent online survey was conducted using respondent-driven sampling. We also conducted interviews with key informants in Nghe An and HCMC on their experience and perceptions of this method and other methods of size estimation. The population of MSM in Nghe An province was estimated to be 1765 [90% CI 1251-3150]. The population of MSM in HCMC was estimated to be 37,238 [90% CI 24,146-81,422]. These estimates correspond to 0.17% of the adult male population in Nghe An province [90% CI 0.12-0.30] and 1.35% of the adult male population in HCMC [90% CI 0.87-2.95]. Our size estimate for HCMC (1.35% [90% CI 0.87-2.95] of the adult male population) falls within the current standard practice of assuming 1-3% of the adult male population in big cities. Our size estimate for Nghe An province (0.17% [90% CI 0.12-0.30] of the adult male population) is lower than the current standard practice of assuming 0.5-1.5% of the adult male population in rural provinces. These estimates can provide valuable information for sub-national level HIV prevention program planning and evaluation. Furthermore, we believe that our results help to improve application of this population size estimation method in other regions of Viet Nam.
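
    The multiplier calculation itself is a one-liner: divide the enumerated listing count by the proportion of the independent survey sample found on that listing. The figures below are hypothetical, not the HCMC or Nghe An estimates:

```python
def multiplier_estimate(listing_count, proportion_on_listing):
    """Benchmark multiplier method: N = M / p, where M is the size of
    an enumerable listing (here, active social-app users) and p is the
    proportion of an independent survey sample found on that listing."""
    return listing_count / proportion_on_listing

# Hypothetical inputs: 4,000 active app users enumerated over 4 weeks;
# 45% of an independent RDS survey sample reported using the app
print(round(multiplier_estimate(4000, 0.45)))  # → 8889
```

    The method's validity rests on the two data sources being independent and on both referring to the same population and time window, which is why the enumeration and the survey were run as separate exercises.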

  20. A Fixed-Precision Sequential Sampling Plan for the Potato Tuberworm Moth, Phthorimaea operculella Zeller (Lepidoptera: Gelechiidae), on Potato Cultivars.

    PubMed

    Shahbi, M; Rajabpour, A

    2017-08-01

    Phthorimaea operculella Zeller is an important pest of potato in Iran. Spatial distribution and fixed-precision sequential sampling for population estimation of the pest on two potato cultivars, Arinda® and Sante®, were studied in two separate potato fields during two growing seasons (2013-2014 and 2014-2015). Spatial distribution was investigated by Taylor's power law and Iwao's patchiness. Results showed that the spatial distribution of eggs and larvae was random. In contrast to Iwao's patchiness, Taylor's power law provided a highly significant relationship between variance and mean density. Therefore, a fixed-precision sequential sampling plan was developed with Green's model at two precision levels, 0.25 and 0.1. The optimum sample size on the Arinda® and Sante® cultivars at the 0.25 precision level ranged from 151 to 813 and from 149 to 802 leaves, respectively. At the 0.1 precision level, the sample sizes ranged from 1054 to 5083 and from 1050 to 5100 leaves for the Arinda® and Sante® cultivars, respectively. The optimum sample sizes for the cultivars, despite their different resistance levels, were therefore not significantly different. According to the calculated stop lines, sampling must be continued until the cumulative number of eggs + larvae reaches 15-16 or 96-101 individuals at precision levels of 0.25 or 0.1, respectively. The performance of the sampling plan was validated by resampling analysis using Resampling for Validation of Sampling Plans software. The sampling plan provided in this study can be used to obtain a rapid estimate of the pest density with minimal effort.
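
    Green's stop line used above follows directly from Taylor's power law s² = a·m^b and the precision target D (standard error as a fraction of the mean). A sketch with illustrative coefficients for a near-random distribution, not the values fitted in this study:

```python
def green_stop_line(n, a, b, D):
    """Green's fixed-precision stop line: the cumulative count T_n at
    which sampling can stop after n sampling units, derived from
    Taylor's power law  s^2 = a * m^b  and a precision target D
    (standard error as a fraction of the mean)."""
    return (D ** 2 / a) ** (1 / (b - 2)) * n ** ((b - 1) / (b - 2))

# Illustrative Taylor coefficients for a near-random dispersion
# (b close to 1); NOT the coefficients fitted in the study
a, b = 1.0, 1.05
for D in (0.25, 0.10):
    print(D, round(green_stop_line(100, a, b, D), 1))
```

    With b near 1 the stop line is almost flat in n (roughly a/D²), which is consistent with the near-constant stopping totals reported above: 15-16 individuals at D = 0.25 and 96-101 at D = 0.1.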
