Sample records for statistical sampling study

  1. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  2. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  3. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 45 Public Welfare 1 2010-10-01 2010-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  4. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 5 2011-10-01 2011-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  5. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 45 Public Welfare 1 2011-10-01 2011-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  6. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 5 2010-10-01 2010-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  7. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  8. An audit of the statistics and the comparison with the parameter in the population

    NASA Astrophysics Data System (ADS)

    Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad

    2015-10-01

    The sufficient sample size that is needed to closely estimate the statistics for particular parameters are use to be an issue. Although sample size might had been calculated referring to objective of the study, however, it is difficult to confirm whether the statistics are closed with the parameter for a particular population. All these while, guideline that uses a p-value less than 0.05 is widely used as inferential evidence. Therefore, this study had audited results that were analyzed from various sub sample and statistical analyses and had compared the results with the parameters in three different populations. Eight types of statistical analysis and eight sub samples for each statistical analysis were analyzed. Results found that the statistics were consistent and were closed to the parameters when the sample study covered at least 15% to 35% of population. Larger sample size is needed to estimate parameter that involve with categorical variables compared with numerical variables. Sample sizes with 300 to 500 are sufficient to estimate the parameters for medium size of population.

  9. Understanding the Sampling Distribution and the Central Limit Theorem.

    ERIC Educational Resources Information Center

    Lewis, Charla P.

    The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…

  10. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    PubMed

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in the fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.

  11. Developing Sampling Frame for Case Study: Challenges and Conditions

    ERIC Educational Resources Information Center

    Ishak, Noriah Mohd; Abu Bakar, Abu Yazid

    2014-01-01

    Due to statistical analysis, the issue of random sampling is pertinent to any quantitative study. Unlike quantitative study, the elimination of inferential statistical analysis, allows qualitative researchers to be more creative in dealing with sampling issue. Since results from qualitative study cannot be generalized to the bigger population,…

  12. [A comparison of convenience sampling and purposive sampling].

    PubMed

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.

  13. Statistical considerations for agroforestry studies

    Treesearch

    James A. Baldwin

    1993-01-01

    Statistical topics that related to agroforestry studies are discussed. These included study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and very much improved histogram display is described.

  14. How Large Should a Statistical Sample Be?

    ERIC Educational Resources Information Center

    Menil, Violeta C.; Ye, Ruili

    2012-01-01

    This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C[superscript ++] program, which depends on population size, degree of precision or error level, and confidence…

  15. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  16. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.

  17. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  18. [Statistical prediction methods in violence risk assessment and its application].

    PubMed

    Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song

    2013-06-01

    It is an urgent global problem how to improve the violence risk assessment. As a necessary part of risk assessment, statistical methods have remarkable impacts and effects. In this study, the predicted methods in violence risk assessment from the point of statistics are reviewed. The application of Logistic regression as the sample of multivariate statistical model, decision tree model as the sample of data mining technique, and neural networks model as the sample of artificial intelligence technology are all reviewed. This study provides data in order to contribute the further research of violence risk assessment.

  19. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.

  20. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  1. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.

  2. Qualitative Meta-Analysis on the Hospital Task: Implications for Research

    ERIC Educational Resources Information Center

    Noll, Jennifer; Sharma, Sashi

    2014-01-01

    The "law of large numbers" indicates that as sample size increases, sample statistics become less variable and more closely estimate their corresponding population parameters. Different research studies investigating how people consider sample size when evaluating the reliability of a sample statistic have found a wide range of…

  3. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  4. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  5. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size.

    PubMed

    Heidel, R Eric

    2016-01-01

    Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.

  6. Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 Peer Group Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Meeker, Bradley

    Comparative financial information derived from a national sample of 516 two-year colleges is presented in this report for fiscal year 1992-93, including statistics for the national sample and for six peer groups. The report's nine sections focus on: (1) introductory information about the study's background, objectives, and sample; the National…

  7. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power

    PubMed Central

    Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%–155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%–71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power. PMID:28479943

  8. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power.

    PubMed

    Miciak, Jeremy; Taylor, W Pat; Stuebing, Karla K; Fletcher, Jack M; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%-155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%-71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power.

  9. An introduction to medical statistics for health care professionals: Hypothesis tests and estimation.

    PubMed

    Thomas, Elaine

    2005-01-01

    This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.

  10. Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2017-01-01

    In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…

  11. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    PubMed

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less then or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signal misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  12. Multiple category-lot quality assurance sampling: a new classification system with application to schistosomiasis control.

    PubMed

    Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello

    2012-01-01

    Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.

  13. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement.

    PubMed

    Austin, Peter C

    2007-11-01

    I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.

  14. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyle, Jennifer E.; Casey, Cameron P.; Stratton, Kelly G.

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules weremore » identified in both the serum and DBS samples using gas chromatograph-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating degradation occurs in the DBS samples affecting quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types indicating that lipid quantitation was more stable across the sample types.« less

  15. Statistical literacy and sample survey results

    NASA Astrophysics Data System (ADS)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  16. A Feasibility Study of the Collection of Unscheduled Maintenance Data Using Statistical Sampling Techniques.

    DTIC Science & Technology

    1985-09-01

    TECHNIQUES THESIS Robert A. Heinlein Captain, USAF AFIT/GLM/LSM/855-32.- _ DTIC MU’noN ’ST.,TEMENT A A-ZELECTE Approved lt public teleo*I Al \\ Z #&N0V21...343" A FEASIBILITY STUDY OF THE COLLECTION OF UNSCHEDULED MAINTENANCE DATA USING STrATISTICAL SAMPLING TECHNIQUES THESIS L .9 Robe-t A. Heinlein...a AFIT/GLM/LSM/85S-32 A FEASIBILITY STUDY OF THE COLLECTION OF UNSCHEDULED MAINTENANCE DATA USING STATISTICAL SAMPLING TECHNIQUES THESIS

  17. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.

  18. Valid statistical inference methods for a case-control study with missing data.

    PubMed

    Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun

    2018-04-01

    The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.

  19. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Does size matter? Statistical limits of paleomagnetic field reconstruction from small rock specimens

    NASA Astrophysics Data System (ADS)

    Berndt, Thomas; Muxworthy, Adrian R.; Fabian, Karl

    2016-01-01

    As samples of ever decreasing sizes are being studied paleomagnetically, care has to be taken that the underlying assumptions of statistical thermodynamics (Maxwell-Boltzmann statistics) are being met. Here we determine how many grains and how large a magnetic moment a sample needs to have to be able to accurately record an ambient field. It is found that for samples with a thermoremanent magnetic moment larger than 10-11Am2 the assumption of a sufficiently large number of grains is usually given. Standard 25 mm diameter paleomagnetic samples usually contain enough magnetic grains such that statistical errors are negligible, but "single silicate crystal" works on, for example, zircon, plagioclase, and olivine crystals are approaching the limits of what is physically possible, leading to statistic errors in both the angular deviation and paleointensity that are comparable to other sources of error. The reliability of nanopaleomagnetic imaging techniques capable of resolving individual grains (used, for example, to study the cloudy zone in meteorites), however, is questionable due to the limited area of the material covered.

  1. The Math Problem: Advertising Students' Attitudes toward Statistics

    ERIC Educational Resources Information Center

    Fullerton, Jami A.; Kendrick, Alice

    2013-01-01

    This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…

  2. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.

  3. Standard deviation and standard error of the mean

    PubMed Central

    In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923

  4. A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

    PubMed

    Tang, Liansheng Larry; Balakrishnan, N

    2011-01-01

    The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.

  5. Community College Low-Income and Minority Student Completion Study: Descriptive Statistics from the 1992 High School Cohort

    ERIC Educational Resources Information Center

    Bailey, Thomas; Jenkins, Davis; Leinbach, Timothy

    2005-01-01

    This report summarizes statistics on access and attainment in higher education, focusing particularly on community college students, using data from the National Education Longitudinal Study of 1988 (NELS:88), which follows a nationally representative sample of individuals who were eighth graders in the spring of 1988. A sample of these…

  6. It's all relative: ranking the diversity of aquatic bacterial communities.

    PubMed

    Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H

    2008-09-01

    The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.

  7. Statistical Power in Meta-Analysis

    ERIC Educational Resources Information Center

    Liu, Jin

    2015-01-01

    Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less

  9. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs fall close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  10. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  11. Multiple Category-Lot Quality Assurance Sampling: A New Classification System with Application to Schistosomiasis Control

    PubMed Central

    Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello

    2012-01-01

    Background Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principle Findings Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333

  12. Managing Clustered Data Using Hierarchical Linear Modeling

    ERIC Educational Resources Information Center

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…

  13. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    NASA Astrophysics Data System (ADS)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km2 to 238 km2 in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and impact urbanisation on hydrological response.

  14. Image correlation and sampling study

    NASA Technical Reports Server (NTRS)

    Popp, D. J.; Mccormack, D. S.; Sedwick, J. L.

    1972-01-01

    The development of analytical approaches for solving image correlation and image sampling of multispectral data is discussed. Relevant multispectral image statistics which are applicable to image correlation and sampling are identified. The general image statistics include intensity mean, variance, amplitude histogram, power spectral density function, and autocorrelation function. The translation problem associated with digital image registration and the analytical means for comparing commonly used correlation techniques are considered. General expressions for determining the reconstruction error for specific image sampling strategies are developed.

  15. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case–control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771

  16. In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.

    PubMed

    Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat

    2014-11-01

    The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Preservice Secondary Mathematics Teachers' Statistical Knowledge: A Snapshot of Strengths and Weaknesses

    ERIC Educational Resources Information Center

    Lovett, Jennifer N.; Lee, Hollylynne S.

    2017-01-01

    Amid the implementation of new curriculum standard regarding statistics and new recommendations for preservice secondary mathematics teachers [PSMTs] to teach statistics, there is a need to examine the current state of PSMTs' common statistical knowledge. This study reports on the statistical knowledge 217 PSMTs from a purposeful sample of 18…

  18. [Practical aspects regarding sample size in clinical research].

    PubMed

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    The knowledge of the right sample size let us to be sure if the published results in medical papers had a suitable design and a proper conclusion according to the statistics analysis. To estimate the sample size we must consider the type I error, type II error, variance, the size of the effect, significance and power of the test. To decide what kind of mathematics formula will be used, we must define what kind of study we have, it means if its a prevalence study, a means values one or a comparative one. In this paper we explain some basic topics of statistics and we describe four simple samples of estimation of sample size.

  19. Breaking Free of Sample Size Dogma to Perform Innovative Translational Research

    PubMed Central

    Bacchetti, Peter; Deeks, Steven G.; McCune, Joseph M.

    2011-01-01

    Innovative clinical and translational research is often delayed or prevented by reviewers’ expectations that any study performed in humans must be shown in advance to have high statistical power. This supposed requirement is not justifiable and is contradicted by the reality that increasing sample size produces diminishing marginal returns. Studies of new ideas often must start small (sometimes even with an N of 1) because of cost and feasibility concerns, and recent statistical work shows that small sample sizes for such research can produce more projected scientific value per dollar spent than larger sample sizes. Renouncing false dogma about sample size would remove a serious barrier to innovation and translation. PMID:21677197

  20. An Accurate Methodology to detect Leaching of Nickel and Chromium Ions in the Initial Phase of Orthodontic Treatment: An in vivo Study.

    PubMed

    Kumar, R Vinoth; Rajvikram, N; Rajakumar, P; Saravanan, R; Deepak, V Arun; Vijaykumar, V

    2016-03-01

    The aim of this study was to evaluate the release of nickel and chromium ions in human saliva during fixed orthodontic therapy. Ten patients with Angle's Class-I malocclusion with bimaxillary protrusion without any metal restorations or crowns and with all the permanent teeth were selected. Five male patients and five female patients in the age group range of 14 to 23 years were scheduled for orthodontic treatment with first premolar extraction. Saliva samples were collected in three stages: sample 1, before orthodontic treatment; sample 2, after 10 days of bonding sample; and sample 3, after 1 month of bonding. The samples were analyzed for the following metals nickel and chromium using inductively coupled plasma optical emission spectrometry (ICP-OES). The levels of nickel and chromium were statistically significant, while nickel showed a gradual increase in the first 10 days and a decline thereafter. Chromium showed a gradual increase and was statistically significant on the 30th day. There was greatest release of ions during the first 10 days and a gradual decline thereafter. Control group had traces of nickel and chromium. While comparing levels of nickel in saliva, there was a significant rise from baseline to 10th and 30th-day sample, which was statistically significant. While comparing 10th day to that of 30th day, there was no statistical significance. The levels of chromium ion in the saliva were more in 30th day, and when comparing 10th-day sample with 30th day, there was statistical significance. Nickel and chromium levels were well within the permissible levels. However, some hypersensitive individuals may be allergic to this minimal permissible level.

  1. Statistical analysis of arsenic contamination in drinking water in a city of Iran and its modeling using GIS.

    PubMed

    Sadeghi, Fatemeh; Nasseri, Simin; Mosaferi, Mohammad; Nabizadeh, Ramin; Yunesian, Masud; Mesdaghinia, Alireza

    2017-05-01

    In this research, probable arsenic contamination in drinking water in the city of Ardabil was studied in 163 samples during four seasons. In each season, sampling was carried out randomly in the study area. Results were analyzed statistically applying SPSS 19 software, and the data was also modeled by Arc GIS 10.1 software. The maximum permissible arsenic concentration in drinking water defined by the World Health Organization and Iranian national standard is 10 μg/L. Statistical analysis showed 75, 88, 47, and 69% of samples in autumn, winter, spring, and summer, respectively, had concentrations higher than the national standard. The mean concentrations of arsenic in autumn, winter, spring, and summer were 19.89, 15.9, 10.87, and 14.6 μg/L, respectively, and the overall average in all samples through the year was 15.32 μg/L. Although GIS outputs indicated that the concentration distribution profiles changed in four consecutive seasons, variance analysis of the results showed that statistically there is no significant difference in arsenic levels in four seasons.

  2. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power

    ERIC Educational Resources Information Center

    Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated…

  3. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by i ntegrating individual level ge notype data and s ummary s tatistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohns Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% ( ±0.4% ) to 69.4% ( ±0.1% ) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. Analysis of Statistical Methods Currently used in Toxicology Journals

    PubMed Central

    Na, Jihye; Yang, Hyeri

    2014-01-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health. PMID:25343012

  5. Analysis of Statistical Methods Currently used in Toxicology Journals.

    PubMed

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health.

  6. Monitoring and identification of spatiotemporal landscape changes in multiple remote sensing images by using a stratified conditional Latin hypercube sampling approach and geostatistical simulation.

    PubMed

    Lin, Yu-Pin; Chu, Hone-Jay; Huang, Yu-Long; Tang, Chia-Hsi; Rouhani, Shahrokh

    2011-06-01

    This study develops a stratified conditional Latin hypercube sampling (scLHS) approach for multiple, remotely sensed, normalized difference vegetation index (NDVI) images. The objective is to sample, monitor, and delineate spatiotemporal landscape changes, including spatial heterogeneity and variability, in a given area. The scLHS approach, which is based on the variance quadtree technique (VQT) and the conditional Latin hypercube sampling (cLHS) method, selects samples in order to delineate landscape changes from multiple NDVI images. The images are then mapped for calibration and validation by using sequential Gaussian simulation (SGS) with the scLHS selected samples. Spatial statistical results indicate that in terms of their statistical distribution, spatial distribution, and spatial variation, the statistics and variograms of the scLHS samples resemble those of multiple NDVI images more closely than those of cLHS and VQT samples. Moreover, the accuracy of simulated NDVI images based on SGS with scLHS samples is significantly better than that of simulated NDVI images based on SGS with cLHS samples and VQT samples, respectively. However, the proposed approach efficiently monitors the spatial characteristics of landscape changes, including the statistics, spatial variability, and heterogeneity of NDVI images. In addition, SGS with the scLHS samples effectively reproduces spatial patterns and landscape changes in multiple NDVI images.

  7. Methodological quality of behavioural weight loss studies: a systematic review

    PubMed Central

    Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.

    2018-01-01

    Summary This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults and associations between quality and statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January, 2009 and December, 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature of behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Identified for improvement include utilization of more rigorous statistical approaches to loss to follow up and better fidelity reporting. PMID:27071775

  8. MSW Students' Perceptions of Relevance and Application of Statistics: Implications for Field Education

    ERIC Educational Resources Information Center

    Davis, Ashley; Mirick, Rebecca G.

    2015-01-01

    Many social work students feel anxious when taking a statistics course. Their attitudes, beliefs, and behaviors after learning statistics are less known. However, such information could help instructors support students' ongoing development of statistical knowledge. With a sample of MSW students (N = 101) in one program, this study examined…

  9. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.

    PubMed

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.

  10. How Attitudes towards Statistics Courses and the Field of Statistics Predicts Statistics Anxiety among Undergraduate Social Science Majors: A Validation of the Statistical Anxiety Scale

    ERIC Educational Resources Information Center

    O'Bryant, Monique J.

    2017-01-01

    The aim of this study was to validate an instrument that can be used by instructors or social scientist who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) was examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…

  11. The large sample size fallacy.

    PubMed

    Lantz, Björn

    2013-06-01

    Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.

  12. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  13. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and Proportions, Chi square Test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area. PMID:19876474

  14. Got power? A systematic review of sample size adequacy in health professions education research.

    PubMed

    Cook, David A; Hatala, Rose

    2015-03-01

    Many education research studies employ small samples, which in turn lowers statistical power. We re-analyzed the results of a meta-analysis of simulation-based education to determine study power across a range of effect sizes, and the smallest effect that could be plausibly excluded. We systematically searched multiple databases through May 2011, and included all studies evaluating simulation-based education for health professionals in comparison with no intervention or another simulation intervention. Reviewers working in duplicate abstracted information to calculate standardized mean differences (SMD's). We included 897 original research studies. Among the 627 no-intervention-comparison studies the median sample size was 25. Only two studies (0.3%) had ≥80% power to detect a small difference (SMD > 0.2 standard deviations) and 136 (22%) had power to detect a large difference (SMD > 0.8). 110 no-intervention-comparison studies failed to find a statistically significant difference, but none excluded a small difference and only 47 (43%) excluded a large difference. Among 297 studies comparing alternate simulation approaches the median sample size was 30. Only one study (0.3%) had ≥80% power to detect a small difference and 79 (27%) had power to detect a large difference. Of the 128 studies that did not detect a statistically significant effect, 4 (3%) excluded a small difference and 91 (71%) excluded a large difference. In conclusion, most education research studies are powered only to detect effects of large magnitude. For most studies that do not reach statistical significance, the possibility of large and important differences still exists.

  15. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics F st statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

  16. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965

  17. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  18. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  19. Progressive statistics for studies in sports medicine and exercise science.

    PubMed

    Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri

    2009-01-01

    Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.

  20. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the layer F2- foF2( t) were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.

  1. Statistical scaling of geometric characteristics in stochastically generated pore microstructures

    DOE PAGES

    Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee

    2015-05-21

    In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent at which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, wemore » rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.« less

  2. Satellite Sampling and Retrieval Errors in Regional Monthly Rain Estimates from TMI AMSR-E, SSM/I, AMSU-B and the TRMM PR

    NASA Technical Reports Server (NTRS)

    Fisher, Brad; Wolff, David B.

    2010-01-01

    Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscure the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25 resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite and the regional climatology. The most significant result from this study found that each of the satellites incurred negative longterm oceanic retrieval biases of 10 to 30%.

  3. Development and Validation of an Instrument to Measure Indonesian Pre-Service Teachers' Conceptions of Statistics

    ERIC Educational Resources Information Center

    Idris, Khairiani; Yang, Kai-Lin

    2017-01-01

    This article reports the results of a mixed-methods approach to develop and validate an instrument to measure Indonesian pre-service teachers' conceptions of statistics. First, a phenomenographic study involving a sample of 44 participants uncovered six categories of conceptions of statistics. Second, an instrument of conceptions of statistics was…

  4. Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Meeker, Bradley

    This report provides comparative information derived from a national sample of 516 public two-year colleges, highlighting financial statistics for fiscal year, 1992-93. This report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and statistics presented in a…

  5. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...

  6. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...

  7. Ambiguities and conflicting results: the limitations of the kappa statistic in establishing the interrater reliability of the Irish nursing minimum data set for mental health: a discussion paper.

    PubMed

    Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan

    2008-04-01

    In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogenous sample and the recommended kw statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data was analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.

  8. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.

  9. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  10. Statistical inference involving binomial and negative binomial parameters.

    PubMed

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  11. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer chris.spencer@well.ox.ac.uk Supplementary data are available at Bioinformatics online.

  12. Introductory Statistics Students' Conceptual Understanding of Study Design and Conclusions

    NASA Astrophysics Data System (ADS)

    Fry, Elizabeth Brondos

    Recommended learning goals for students in introductory statistics courses include the ability to recognize and explain the key role of randomness in designing studies and in drawing conclusions from those studies involving generalizations to a population or causal claims (GAISE College Report ASA Revision Committee, 2016). The purpose of this study was to explore introductory statistics students' understanding of the distinct roles that random sampling and random assignment play in study design and the conclusions that can be made from each. A study design unit lasting two and a half weeks was designed and implemented in four sections of an undergraduate introductory statistics course based on modeling and simulation. The research question that this study attempted to answer is: How does introductory statistics students' conceptual understanding of study design and conclusions (in particular, unbiased estimation and establishing causation) change after participating in a learning intervention designed to promote conceptual change in these areas? In order to answer this research question, a forced-choice assessment called the Inferences from Design Assessment (IDEA) was developed as a pretest and posttest, along with two open-ended assignments, a group quiz and a lab assignment. Quantitative analysis of IDEA results and qualitative analysis of the group quiz and lab assignment revealed that overall, students' mastery of study design concepts significantly increased after the unit, and the great majority of students successfully made the appropriate connections between random sampling and generalization, and between random assignment and causal claims. However, a small, but noticeable portion of students continued to demonstrate misunderstandings, such as confusion between random sampling and random assignment.

  13. The study of combining Latin Hypercube Sampling method and LU decomposition method (LULHS method) for constructing spatial random field

    NASA Astrophysics Data System (ADS)

    WANG, P. T.

    2015-12-01

    Groundwater modeling requires to assign hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. Hydrogeological property is assumed to be a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistics sampling plays an important role in the efficiency of modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the the stratified random procedure from LHS and the simulation by using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were develpoed. The simulation efficiency and spatial correlation of LULHS are compared to the other three different simulation methods. The results show that for the conditional simulation and unconditional simulation, LULHS method is more efficient in terms of computational effort. Less realizations are required to achieve the required statistical accuracy and spatial correlation.

  14. Measuring University Students' Approaches to Learning Statistics: An Invariance Study

    ERIC Educational Resources Information Center

    Chiesi, Francesca; Primi, Caterina; Bilgin, Ayse Aysin; Lopez, Maria Virginia; del Carmen Fabrizio, Maria; Gozlu, Sitki; Tuan, Nguyen Minh

    2016-01-01

    The aim of the current study was to provide evidence that an abbreviated version of the Approaches and Study Skills Inventory for Students (ASSIST) was invariant across different languages and educational contexts in measuring university students' learning approaches to statistics. Data were collected on samples of university students attending…

  15. 34 CFR Appendix A to Subpart N of... - Sample Default Prevention Plan

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... relevant default prevention statistics, including a statistical analysis of the borrowers who default on...'s delinquency status by obtaining reports from data managers and FFEL Program lenders. 5. Enhance... academic study. III. Statistics for Measuring Progress 1. The number of students enrolled at your...

  16. The Effects of Finite Sampling on State Assessment Sample Requirements. NAEP Validity Studies. Working Paper Series.

    ERIC Educational Resources Information Center

    Chromy, James R.

    This study addressed statistical techniques that might ameliorate some of the sampling problems currently facing states with small populations participating in State National Assessment of Educational Progress (NAEP) assessments. The study explored how the application of finite population correction factors to the between-school component of…

  17. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  18. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression

    PubMed Central

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China’s regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271

  19. A normative inference approach for optimal sample sizes in decisions from experience

    PubMed Central

    Ostwald, Dirk; Starke, Ludger; Hertwig, Ralph

    2015-01-01

    “Decisions from experience” (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the “sampling paradigm,” which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide from which they would prefer to draw from in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the “optimal” sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical study, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for DFE. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE. PMID:26441720

  20. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu

  1. Visual Sample Plan Version 7.0 User's Guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Newburn, Lisa LN; Hathaway, John E.

    2014-03-01

    User's guide for VSP 7.0 This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination.more » The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.« less

  2. Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

    PubMed

    Lu, Kaifeng

    2016-05-01

    We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  3. Approximate sample size formulas for the two-sample trimmed mean test with unequal variances.

    PubMed

    Luh, Wei-Ming; Guo, Jiin-Huarng

    2007-05-01

    Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified alpha and the power (1-beta), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.

  4. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    ERIC Educational Resources Information Center

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  5. An empirical determination of the minimum number of measurements needed to estimate the mean random vitrinite reflectance of disseminated organic matter

    USGS Publications Warehouse

    Barker, C.E.; Pawlewicz, M.J.

    1993-01-01

    In coal samples, published recommendations based on statistical methods suggest 100 measurements are needed to estimate the mean random vitrinite reflectance (Rv-r) to within ??2%. Our survey of published thermal maturation studies indicates that those using dispersed organic matter (DOM) mostly have an objective of acquiring 50 reflectance measurements. This smaller objective size in DOM versus that for coal samples poses a statistical contradiction because the standard deviations of DOM reflectance distributions are typically larger indicating a greater sample size is needed to accurately estimate Rv-r in DOM. However, in studies of thermal maturation using DOM, even 50 measurements can be an unrealistic requirement given the small amount of vitrinite often found in such samples. Furthermore, there is generally a reduced need for assuring precision like that needed for coal applications. Therefore, a key question in thermal maturation studies using DOM is how many measurements of Rv-r are needed to adequately estimate the mean. Our empirical approach to this problem is to compute the reflectance distribution statistics: mean, standard deviation, skewness, and kurtosis in increments of 10 measurements. This study compares these intermediate computations of Rv-r statistics with a final one computed using all measurements for that sample. Vitrinite reflectance was measured on mudstone and sandstone samples taken from borehole M-25 in the Cerro Prieto, Mexico geothermal system which was selected because the rocks have a wide range of thermal maturation and a comparable humic DOM with depth. The results of this study suggest that after only 20-30 measurements the mean Rv-r is generally known to within 5% and always to within 12% of the mean Rv-r calculated using all of the measured particles. Thus, even in the worst case, the precision after measuring only 20-30 particles is in good agreement with the general precision of one decimal place recommended for mean Rv-r measurements on DOM. The coefficient of variation (V = standard deviation/mean) is proposed as a statistic to indicate the reliability of the mean Rv-r estimates made at n ??? 20. This preliminary study suggests a V 0.2 suggests an unreliable mean in such small samples. ?? 1993.

  6. [Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].

    PubMed

    Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna

    2008-01-01

    The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but there is a difficulty deriving from the lack of a sampling frame. However, the sample obtained is not a random representative one in statistical terms of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates to the target population but is not identical to it.

  7. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    PubMed

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.

  8. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

  9. Statistical characteristics of the spatial distribution of territorial contamination by radionuclides from the Chernobyl accident

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arutyunyan, R.V.; Bol`shov, L.A.; Vasil`ev, S.K.

    1994-06-01

    The objective of this study was to clarify a number of issues related to the spatial distribution of contaminants from the Chernobyl accident. The effects of local statistics were addressed by collecting and analyzing (for Cesium 137) soil samples from a number of regions, and it was found that sample activity differed by a factor of 3-5. The effect of local non-uniformity was estimated by modeling the distribution of the average activity of a set of five samples for each of the regions, with the spread in the activities for a {+-}2 range being equal to 25%. The statistical characteristicsmore » of the distribution of contamination were then analyzed and found to be a log-normal distribution with the standard deviation being a function of test area. All data for the Bryanskaya Oblast area were analyzed statistically and were adequately described by a log-normal function.« less

  10. An investigative comparison of purging and non-purging groundwater sampling methods in Karoo aquifer monitoring wells

    NASA Astrophysics Data System (ADS)

    Gomo, M.; Vermeulen, D.

    2015-03-01

    An investigation was conducted to statistically compare the influence of non-purging and purging groundwater sampling methods on analysed inorganic chemistry parameters and calculated saturation indices. Groundwater samples were collected from 15 monitoring wells drilled in Karoo aquifers before and after purging for the comparative study. For the non-purging method, samples were collected from groundwater flow zones located in the wells using electrical conductivity (EC) profiling. The two data sets of non-purged and purged groundwater samples were analysed for inorganic chemistry parameters at the Institute of Groundwater Studies (IGS) laboratory of the Free University in South Africa. Saturation indices for mineral phases that were found in the data base of PHREEQC hydrogeochemical model were calculated for each data set. Four one-way ANOVA tests were conducted using Microsoft excel 2007 to investigate if there is any statistically significant difference between: (1) all inorganic chemistry parameters measured in the non-purged and purged groundwater samples per each specific well, (2) all mineral saturation indices calculated for the non-purged and purged groundwater samples per each specific well, (3) individual inorganic chemistry parameters measured in the non-purged and purged groundwater samples across all wells and (4) Individual mineral saturation indices calculated for non-purged and purged groundwater samples across all wells. For all the ANOVA tests conducted, the calculated alpha values (p) are greater than 0.05 (significance level) and test statistic (F) is less than the critical value (Fcrit) (F < Fcrit). The results imply that there was no statistically significant difference between the two data sets. With a 95% confidence, it was therefore concluded that the variance between groups was rather due to random chance and not to the influence of the sampling methods (tested factor). It is therefore be possible that in some hydrogeologic conditions, non-purged groundwater samples might be just as representative as the purged ones. The findings of this study can provide an important platform for future evidence oriented research investigations to establish the necessity of purging prior to groundwater sampling in different aquifer systems.

  11. Writing to Learn Statistics in an Advanced Placement Statistics Course

    ERIC Educational Resources Information Center

    Northrup, Christian Glenn

    2012-01-01

    This study investigated the use of writing in a statistics classroom to learn if writing provided a rich description of problem-solving processes of students as they solved problems. Through analysis of 329 written samples provided by students, it was determined that writing provided a rich description of problem-solving processes and enabled…

  12. Artificial Intelligence Approach to Support Statistical Quality Control Teaching

    ERIC Educational Resources Information Center

    Reis, Marcelo Menezes; Paladini, Edson Pacheco; Khator, Suresh; Sommer, Willy Arno

    2006-01-01

    Statistical quality control--SQC (consisting of Statistical Process Control, Process Capability Studies, Acceptance Sampling and Design of Experiments) is a very important tool to obtain, maintain and improve the Quality level of goods and services produced by an organization. Despite its importance, and the fact that it is taught in technical and…

  13. Power of tests for comparing trend curves with application to national immunization survey (NIS).

    PubMed

    Zhao, Zhen

    2011-02-28

    To develop statistical tests for comparing trend curves of study outcomes between two socio-demographic strata across consecutive time points, and compare statistical power of the proposed tests under different trend curves data, three statistical tests were proposed. For large sample size with independent normal assumption among strata and across consecutive time points, the Z and Chi-square test statistics were developed, which are functions of outcome estimates and the standard errors at each of the study time points for the two strata. For small sample size with independent normal assumption, the F-test statistic was generated, which is a function of sample size of the two strata and estimated parameters across study period. If two trend curves are approximately parallel, the power of Z-test is consistently higher than that of both Chi-square and F-test. If two trend curves cross at low interaction, the power of Z-test is higher than or equal to the power of both Chi-square and F-test; however, at high interaction, the powers of Chi-square and F-test are higher than that of Z-test. The measurement of interaction of two trend curves was defined. These tests were applied to the comparison of trend curves of vaccination coverage estimates of standard vaccine series with National Immunization Survey (NIS) 2000-2007 data. Copyright © 2011 John Wiley & Sons, Ltd.

  14. Slowdowns in diversification rates from real phylogenies may not be real.

    PubMed

    Cusimano, Natalie; Renner, Susanne S

    2010-07-01

    Studies of diversification patterns often find a slowing in lineage accumulation toward the present. This seemingly pervasive pattern of rate downturns has been taken as evidence for adaptive radiations, density-dependent regulation, and metacommunity species interactions. The significance of rate downturns is evaluated with statistical tests (the gamma statistic and Monte Carlo constant rates (MCCR) test; birth-death likelihood models and Akaike Information Criterion [AIC] scores) that rely on null distributions, which assume that the included species are a random sample of the entire clade. Sampling in real phylogenies, however, often is nonrandom because systematists try to include early-diverging species or representatives of previous intrataxon classifications. We studied the effects of biased sampling, structured sampling, and random sampling by experimentally pruning simulated trees (60 and 150 species) as well as a completely sampled empirical tree (58 species) and then applying the gamma statistic/MCCR test and birth-death likelihood models/AIC scores to assess rate changes. For trees with random species sampling, the true model (i.e., the one fitting the complete phylogenies) could be inferred in most cases. Oversampling deep nodes, however, strongly biases inferences toward downturns, with simulations of structured and biased sampling suggesting that this occurs when sampling percentages drop below 80%. The magnitude of the effect and the sensitivity of diversification rate models is such that a useful rule of thumb may be not to infer rate downturns from real trees unless they have >80% species sampling.

  15. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating the • number of samples required to achieve a specified confidence in characterization and clearance decisions • confidence in making characterization and clearance decisions for a specified number of samples for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that amore » decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phrase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm2) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results 2. estimating the true surface concentration corresponding to a surface sample 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.« less

  16. Replicating studies in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Judd, Charles M; Kenny, David A

    2015-05-01

    In a direct replication, the typical goal is to reproduce a prior experimental result with a new but comparable sample of participants in a high-powered replication study. Often in psychology, the research to be replicated involves a sample of participants responding to a sample of stimuli. In replicating such studies, we argue that the same criteria should be used in sampling stimuli as are used in sampling participants. Namely, a new but comparable sample of stimuli should be used to ensure that the original results are not due to idiosyncrasies of the original stimulus sample, and the stimulus sample must often be enlarged to ensure high statistical power. In support of the latter point, we discuss the fact that in experiments involving samples of stimuli, statistical power typically does not approach 1 as the number of participants goes to infinity. As an example of the importance of sampling new stimuli, we discuss the bygone literature on the risky shift phenomenon, which was almost entirely based on a single stimulus sample that was later discovered to be highly unrepresentative. We discuss the use of both resampled and expanded stimulus sets, that is, stimulus samples that include the original stimuli plus new stimuli. © The Author(s) 2015.

  17. Seven ways to increase power without increasing N.

    PubMed

    Hansen, W B; Collins, L M

    1994-01-01

    Many readers of this monograph may wonder why a chapter on statistical power was included. After all, by now the issue of statistical power is in many respects mundane. Everyone knows that statistical power is a central research consideration, and certainly most National Institute on Drug Abuse grantees or prospective grantees understand the importance of including a power analysis in research proposals. However, there is ample evidence that, in practice, prevention researchers are not paying sufficient attention to statistical power. If they were, the findings observed by Hansen (1992) in a recent review of the prevention literature would not have emerged. Hansen (1992) examined statistical power based on 46 cohorts followed longitudinally, using nonparametric assumptions given the subjects' age at posttest and the numbers of subjects. Results of this analysis indicated that, in order for a study to attain 80-percent power for detecting differences between treatment and control groups, the difference between groups at posttest would need to be at least 8 percent (in the best studies) and as much as 16 percent (in the weakest studies). In order for a study to attain 80-percent power for detecting group differences in pre-post change, 22 of the 46 cohorts would have needed relative pre-post reductions of greater than 100 percent. Thirty-three of the 46 cohorts had less than 50-percent power to detect a 50-percent relative reduction in substance use. These results are consistent with other review findings (e.g., Lipsey 1990) that have shown a similar lack of power in a broad range of research topics. Thus, it seems that, although researchers are aware of the importance of statistical power (particularly of the necessity for calculating it when proposing research), they somehow are failing to end up with adequate power in their completed studies. This chapter argues that the failure of many prevention studies to maintain adequate statistical power is due to an overemphasis on sample size (N) as the only, or even the best, way to increase statistical power. It is easy to see how this overemphasis has come about. Sample size is easy to manipulate, has the advantage of being related to power in a straight-forward way, and usually is under the direct control of the researcher, except for limitations imposed by finances or subject availability. Another option for increasing power is to increase the alpha used for hypothesis-testing but, as very few researchers seriously consider significance levels much larger than the traditional .05, this strategy seldom is used. Of course, sample size is important, and the authors of this chapter are not recommending that researchers cease choosing sample sizes carefully. Rather, they argue that researchers should not confine themselves to increasing N to enhance power. It is important to take additional measures to maintain and improve power over and above making sure the initial sample size is sufficient. The authors recommend two general strategies. One strategy involves attempting to maintain the effective initial sample size so that power is not lost needlessly. The other strategy is to take measures to maximize the third factor that determines statistical power: effect size.

  18. Sampling and Data Analysis for Environmental Microbiology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murray, Christopher J.

    2001-06-01

    A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis ofmore » the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.« less

  19. Simulation Study Using a New Type of Sample Variance

    NASA Technical Reports Server (NTRS)

    Howe, D. A.; Lainson, K. J.

    1996-01-01

    We evaluate with simulated data a new type of sample variance for the characterization of frequency stability. The new statistic (referred to as TOTALVAR and its square root TOTALDEV) is a better predictor of long-term frequency variations than the present sample Allan deviation. The statistical model uses the assumption that a time series of phase or frequency differences is wrapped (periodic) with overall frequency difference removed. We find that the variability at long averaging times is reduced considerably for the five models of power-law noise commonly encountered with frequency standards and oscillators.

  20. Comparative Financial Statistics for Public Two-Year Colleges: FY 1995 Peer Groups Sample.

    ERIC Educational Resources Information Center

    Meeker, Bradley

    Comparative financial information derived from a national sample of 405 two-year colleges is presented in this report for fiscal year 1994-95, including data for the national sample and for 6groups of peer institutions. The first section provides introductory information on the annual study, discussing the study sample and the use of study…

  1. A re-evaluation of a case-control model with contaminated controls for resource selection studies

    Treesearch

    Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski

    2013-01-01

    A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...

  2. [The research protocol III. Study population].

    PubMed

    Arias-Gómez, Jesús; Villasís-Keever, Miguel Ángel; Miranda-Novales, María Guadalupe

    2016-01-01

    The study population is defined as a set of cases, determined, limited, and accessible, that will constitute the subjects for the selection of the sample, and must fulfill several characteristics and distinct criteria. The objectives of this manuscript are focused on specifying each one of the elements required to make the selection of the participants of a research project, during the elaboration of the protocol, including the concepts of study population, sample, selection criteria and sampling methods. After delineating the study population, the researcher must specify the criteria that each participant has to comply. The criteria that include the specific characteristics are denominated selection or eligibility criteria. These criteria are inclusion, exclusion and elimination, and will delineate the eligible population. The sampling methods are divided in two large groups: 1) probabilistic or random sampling and 2) non-probabilistic sampling. The difference lies in the employment of statistical methods to select the subjects. In every research, it is necessary to establish at the beginning the specific number of participants to be included to achieve the objectives of the study. This number is the sample size, and can be calculated or estimated with mathematical formulas and statistic software.

  3. Extracting Hydrologic Understanding from the Unique Space-time Sampling of the Surface Water and Ocean Topography (SWOT) Mission

    NASA Astrophysics Data System (ADS)

    Nickles, C.; Zhao, Y.; Beighley, E.; Durand, M. T.; David, C. H.; Lee, H.

    2017-12-01

    The Surface Water and Ocean Topography (SWOT) satellite mission is jointly developed by NASA, the French space agency (CNES), with participation from the Canadian and UK space agencies to serve both the hydrology and oceanography communities. The SWOT mission will sample global surface water extents and elevations (lakes/reservoirs, rivers, estuaries, oceans, sea and land ice) at a finer spatial resolution than is currently possible enabling hydrologic discovery, model advancements and new applications that are not currently possible or likely even conceivable. Although the mission will provide global cover, analysis and interpolation of the data generated from the irregular space/time sampling represents a significant challenge. In this study, we explore the applicability of the unique space/time sampling for understanding river discharge dynamics throughout the Ohio River Basin. River network topology, SWOT sampling (i.e., orbit and identified SWOT river reaches) and spatial interpolation concepts are used to quantify the fraction of effective sampling of river reaches each day of the three-year mission. Streamflow statistics for SWOT generated river discharge time series are compared to continuous daily river discharge series. Relationships are presented to transform SWOT generated streamflow statistics to equivalent continuous daily discharge time series statistics intended to support hydrologic applications using low-flow and annual flow duration statistics.

  4. The (mis)reporting of statistical results in psychology journals.

    PubMed

    Bakker, Marjan; Wicherts, Jelte M

    2011-09-01

    In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.

  5. A measure of the signal-to-noise ratio of microarray samples and studies using gene correlations.

    PubMed

    Venet, David; Detours, Vincent; Bersini, Hugues

    2012-01-01

    The quality of gene expression data can vary dramatically from platform to platform, study to study, and sample to sample. As reliable statistical analysis rests on reliable data, determining such quality is of the utmost importance. Quality measures to spot problematic samples exist, but they are platform-specific, and cannot be used to compare studies. As a proxy for quality, we propose a signal-to-noise ratio for microarray data, the "Signal-to-Noise Applied to Gene Expression Experiments", or SNAGEE. SNAGEE is based on the consistency of gene-gene correlations. We applied SNAGEE to a compendium of 80 large datasets on 37 platforms, for a total of 24,380 samples, and assessed the signal-to-noise ratio of studies and samples. This allowed us to discover serious issues with three studies. We show that signal-to-noise ratios of both studies and samples are linked to the statistical significance of the biological results. We showed that SNAGEE is an effective way to measure data quality for most types of gene expression studies, and that it often outperforms existing techniques. Furthermore, SNAGEE is platform-independent and does not require raw data files. The SNAGEE R package is available in BioConductor.

  6. Technology Tips: Sample Too Small? Probably Not!

    ERIC Educational Resources Information Center

    Strayer, Jeremy F.

    2013-01-01

    Statistical studies are referenced in the news every day, so frequently that people are sometimes skeptical of reported results. Often, no matter how large a sample size researchers use in their studies, people believe that the sample size is too small to make broad generalizations. The tasks presented in this article use simulations of repeated…

  7. Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

    PubMed Central

    Song, Rui; Kosorok, Michael R.; Cai, Jianwen

    2009-01-01

    Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107

  8. US Geological Survey nutrient preservation experiment : experimental design, statistical analysis, and interpretation of analytical results

    USGS Publications Warehouse

    Patton, Charles J.; Gilroy, Edward J.

    1999-01-01

    Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.

  9. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  10. Confidence crisis of results in biomechanics research.

    PubMed

    Knudson, Duane

    2017-11-01

    Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.

  11. Preparing for the first meeting with a statistician.

    PubMed

    De Muth, James E

    2008-12-15

    Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.

  12. Statistical scaling of pore-scale Lagrangian velocities in natural porous media.

    PubMed

    Siena, M; Guadagnini, A; Riva, M; Bijeljic, B; Pereira Nunes, J P; Blunt, M J

    2014-08-01

    We investigate the scaling behavior of sample statistics of pore-scale Lagrangian velocities in two different rock samples, Bentheimer sandstone and Estaillades limestone. The samples are imaged using x-ray computer tomography with micron-scale resolution. The scaling analysis relies on the study of the way qth-order sample structure functions (statistical moments of order q of absolute increments) of Lagrangian velocities depend on separation distances, or lags, traveled along the mean flow direction. In the sandstone block, sample structure functions of all orders exhibit a power-law scaling within a clearly identifiable intermediate range of lags. Sample structure functions associated with the limestone block display two diverse power-law regimes, which we infer to be related to two overlapping spatially correlated structures. In both rocks and for all orders q, we observe linear relationships between logarithmic structure functions of successive orders at all lags (a phenomenon that is typically known as extended power scaling, or extended self-similarity). The scaling behavior of Lagrangian velocities is compared with the one exhibited by porosity and specific surface area, which constitute two key pore-scale geometric observables. The statistical scaling of the local velocity field reflects the behavior of these geometric observables, with the occurrence of power-law-scaling regimes within the same range of lags for sample structure functions of Lagrangian velocity, porosity, and specific surface area.

  13. Statistical analysis of environmental monitoring data: does a worst case time for monitoring clean rooms exist?

    PubMed

    Cundell, A M; Bean, R; Massimore, L; Maier, C

    1998-01-01

    To determine the relationship between the sampling time of the environmental monitoring, i.e., viable counts, in aseptic filling areas and the microbial count and frequency of alerts for air, surface and personnel microbial monitoring, statistical analyses were conducted on 1) the frequency of alerts versus the time of day for routine environmental sampling conducted in calendar year 1994, and 2) environmental monitoring data collected at 30-minute intervals during routine aseptic filling operations over two separate days in four different clean rooms with multiple shifts and equipment set-ups at a parenteral manufacturing facility. Statistical analyses showed, except for one floor location that had significantly higher number of counts but no alert or action level samplings in the first two hours of operation, there was no relationship between the number of counts and the time of sampling. Further studies over a 30-day period at the floor location showed no relationship between time of sampling and microbial counts. The conclusion reached in the study was that there is no worst case time for environmental monitoring at that facility and that sampling any time during the aseptic filling operation will give a satisfactory measure of the microbial cleanliness in the clean room during the set-up and aseptic filling operation.

  14. Remineralization Property of an Orthodontic Primer Containing a Bioactive Glass with Silver and Zinc

    PubMed Central

    Lee, Seung-Min; Kim, In-Ryoung; Park, Bong-Soo; Ko, Ching-Chang; Son, Woo-Sung; Kim, Yong-Il

    2017-01-01

    White spot lesions (WSLs) are irreversible damages in orthodontic treatment due to excessive etching or demineralization by microorganisms. In this study, we conducted a mechanical and cell viability test to examine the antibacterial properties of 0.2% and 1% bioactive glass (BAG) and silver-doped and zinc-doped BAGs in a primer and evaluated their clinical applicability to prevent WSLs. The microhardness statistically significantly increased in the adhesive-containing BAG, while the other samples showed no statistically significant difference compared with the control group. The shear bond strength of all samples increased compared with that of the control group. The cell viability of the control and sample groups was similar within 24 h, but decreased slightly over 48 h. All samples showed antibacterial properties. Regarding remineralization property, the group containing 0.2% of the samples showed remineralization properties compared with the control group, but was not statistically significant; further, the group containing 1% of the samples showed a significant difference compared with the control group. Among them, the orthodontic bonding primer containing 1% silver-doped BAG showed the highest remineralization property. The new orthodontic bonding primer used in this study showed an antimicrobial effect, chemical remineralization effect, and WSL prevention as well as clinically applicable properties, both physically and biologically. PMID:29088092

  15. A First Assignment to Create Student Buy-In in an Introductory Business Statistics Course

    ERIC Educational Resources Information Center

    Newfeld, Daria

    2016-01-01

    This paper presents a sample assignment to be administered after the first two weeks of an introductory business focused statistics course in order to promote student buy-in. This assignment integrates graphical displays of data, descriptive statistics and cross-tabulation analysis through the lens of a marketing analysis study. A marketing sample…

  16. Introducing 3D U-statistic method for separating anomaly from background in exploration geochemical data with associated software development

    NASA Astrophysics Data System (ADS)

    Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir

    2016-03-01

    The U-statistic method is one of the most important structural methods to separate the anomaly from the background. It considers the location of samples and carries out the statistical analysis of the data without judging from a geochemical point of view and tries to separate subpopulations and determine anomalous areas. In the present study, to use U-statistic method in three-dimensional (3D) condition, U-statistic is applied on the grade of two ideal test examples, by considering sample Z values (elevation). So far, this is the first time that this method has been applied on a 3D condition. To evaluate the performance of 3D U-statistic method and in order to compare U-statistic with one non-structural method, the method of threshold assessment based on median and standard deviation (MSD method) is applied on the two example tests. Results show that the samples indicated by U-statistic method as anomalous are more regular and involve less dispersion than those indicated by the MSD method. So that, according to the location of anomalous samples, denser areas of them can be determined as promising zones. Moreover, results show that at a threshold of U = 0, the total error of misclassification for U-statistic method is much smaller than the total error of criteria of bar {x}+n× s. Finally, 3D model of two test examples for separating anomaly from background using 3D U-statistic method is provided. The source code for a software program, which was developed in the MATLAB programming language in order to perform the calculations of the 3D U-spatial statistic method, is additionally provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.

  17. Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.

    ERIC Educational Resources Information Center

    Ojeda, Mario Miguel; Sahai, Hardeo

    2002-01-01

    Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…

  18. Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Cirino, Anna Marie

    This report provides comparative financial information derived from a national sample of 503 public two-year colleges. The report includes space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the national sample; and statistics presented in various formats, including tables,…

  19. Learning Model and Form of Assesment toward the Inferensial Statistical Achievement by Controlling Numeric Thinking Skills

    ERIC Educational Resources Information Center

    Widiana, I. Wayan; Jampel, I. Nyoman

    2016-01-01

    This study aimed to find out the effect of learning model and form of assessment toward inferential statistical achievement after controlling numeric thinking skills. This study was quasi experimental study with 130 students as the sample. The data analysis used ANCOVA. After controlling numeric thinking skills, the result of this study show that:…

  20. Possession experiences in dissociative identity disorder: a preliminary study.

    PubMed

    Ross, Colin A

    2011-01-01

    Dissociative trance disorder, which includes possession experiences, was introduced as a provisional diagnosis requiring further study in the Diagnostic and Statistical Manual of Mental Disorders (4th ed.). Consideration is now being given to including possession experiences within dissociative identity disorder (DID) in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.), which is due to be published in 2013. In order to provide empirical data relevant to the relationship between DID and possession states, I analyzed data on the prevalence of trance, possession states, sleepwalking, and paranormal experiences in 3 large samples: patients with DID from North America; psychiatric outpatients from Shanghai, China; and a general population sample from Winnipeg, Canada. Trance, sleepwalking, paranormal, and possession experiences were much more common in the DID patients than in the 2 comparison samples. The study is preliminary and exploratory in nature because the samples were not matched in any way.

  1. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

    PubMed Central

    Chen, Yi-Hau

    2017-01-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336

  2. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    PubMed

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA.

  3. Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

    PubMed

    Chan, Kwun Chuen Gary; Qin, Jing

    2015-10-01

    Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information are available from backward recurrence times and are frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposal statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607

  5. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555

  6. Statistical methods used in the public health literature and implications for training of public health professionals

    PubMed Central

    Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190

  7. Statistical methods used in the public health literature and implications for training of public health professionals.

    PubMed

    Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.

  8. U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR SAMPLING WEIGHT CALCULATION (IIT-A-9.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the procedures undertaken to calculate sampling weights. The sampling weights are needed to obtain weighted statistics of the study data. This SOP uses data that have been properly coded and certified with appropriate QA/QC procedures by th...

  9. 7 CFR 52.38c - Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...

  10. 7 CFR 52.38b - Statistical sampling procedures for on-line inspection by attributes of processed fruits and...

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...

  11. 7 CFR 52.38b - Statistical sampling procedures for on-line inspection by attributes of processed fruits and...

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...

  12. 7 CFR 52.38c - Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...

  13. 75 FR 38871 - Proposed Collection; Comment Request for Revenue Procedure 2004-29

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-06

    ... comments concerning Revenue Procedure 2004-29, Statistical Sampling in Sec. 274 Context. DATES: Written... Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling in Sec...: Revenue Procedure 2004-29 prescribes the statistical sampling methodology by which taxpayers under...

  14. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...

  15. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    PubMed

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. CAN'T MISS--conquer any number task by making important statistics simple. Part 1. Types of variables, mean, median, variance, and standard deviation.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 1, presents basic information about data including a classification system that describes the four major types of variables: continuous quantitative variable, discrete quantitative variable, ordinal categorical variable (including the binomial variable), and nominal categorical variable. A histogram is a graph that displays the frequency distribution for a continuous variable. The article also demonstrates how to calculate the mean, median, standard deviation, and variance for a continuous variable.

  17. Four hundred or more participants needed for stable contingency table estimates of clinical prediction rule performance.

    PubMed

    Kent, Peter; Boyle, Eleanor; Keating, Jennifer L; Albert, Hanne B; Hartvigsen, Jan

    2017-02-01

    To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. An analysis of three pre-existing sets of large cohort data (n = 4,062-8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400-600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. A Monte Carlo Study of Levene's Test of Homogeneity of Variance: Empirical Frequencies of Type I Error in Normal Distributions.

    ERIC Educational Resources Information Center

    Neel, John H.; Stallings, William M.

    An influential statistics test recommends a Levene text for homogeneity of variance. A recent note suggests that Levene's test is upwardly biased for small samples. Another report shows inflated Alpha estimates and low power. Neither study utilized more than two sample sizes. This Monte Carlo study involved sampling from a normal population for…

  19. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168

  20. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z -score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z -scores: high content with a high z -score (HH), high content with a low z -score (HL), low content with a high z -score (LH), and low content with a low z -score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.

  1. ADEQUACY OF VISUALLY CLASSIFIED PARTICLE COUNT STATISTICS FROM REGIONAL STREAM HABITAT SURVEYS

    EPA Science Inventory

    Streamlined sampling procedures must be used to achieve a sufficient sample size with limited resources in studies undertaken to evaluate habitat status and potential management-related habitat degradation at a regional scale. At the same time, these sampling procedures must achi...

  2. 75 FR 53738 - Proposed Collection; Comment Request for Rev. Proc. 2007-35

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-01

    ... Revenue Procedure Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES... through the Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling...: This revenue procedure provides for determining when statistical sampling may be used in purposes of...

  3. Comparative study of dental cephalometric patterns of Japanese-Brazilian, Caucasian and Mongoloid patients

    PubMed Central

    Sathler, Renata; Pinzan, Arnaldo; Fernandes, Thais Maria Freire; de Almeida, Renato Rodrigues; Henriques, José Fernando Castanha

    2014-01-01

    Introduction The objective of this study was to identify the patterns of dental variables of adolescent Japanese-Brazilian descents with normal occlusion, and also to compare them with a similar Caucasian and Mongoloid sample. Methods Lateral cephalometric radiographs were used to compare the groups: Caucasian (n = 40), Japanese-Brazilian (n = 32) and Mongoloid (n = 33). The statistical tests used were one-way ANOVA and ANCOVA. The cephalometric measurements used followed the analyses of Steiner, Tweed and McNamara Jr. Results Statistical differences (P < 0.05) indicated a smaller interincisal angle and overbite for the Japanese-Brazilian sample, when compared to the Caucasian sample, although with similar values to the Mongoloid group. Conclusion The dental patterns found for the Japanese-Brazilian descents were, in general, more similar to those of the Mongoloid sample. PMID:25279521

  4. Analysis of the Einstein sample of early-type galaxies

    NASA Technical Reports Server (NTRS)

    Eskridge, Paul B.; Fabbiano, Giuseppina

    1993-01-01

    The EINSTEIN galaxy catalog contains x-ray data for 148 early-type (E and SO) galaxies. A detailed analysis of the global properties of this sample are studied. By comparing the x-ray properties with other tracers of the ISM, as well as with observables related to the stellar dynamics and populations of the sample, we expect to determine more clearly the physical relationships that determine the evolution of early-type galaxies. Previous studies with smaller samples have explored the relationships between x-ray luminosity (L(sub x)) and luminosities in other bands. Using our larger sample and the statistical techniques of survival analysis, a number of these earlier analyses were repeated. For our full sample, a strong statistical correlation is found between L(sub X) and L(sub B) (the probability that the null hypothesis is upheld is P less than 10(exp -4) from a variety of rank correlation tests. Regressions with several algorithms yield consistent results.

  5. Comparative forensic soil analysis of New Jersey state parks using a combination of simple techniques with multivariate statistics.

    PubMed

    Bonetti, Jennifer; Quarino, Lawrence

    2014-05-01

    This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.

  6. Exploring the Impact of Varying Levels of Augmented Reality to Teach Probability and Sampling with a Mobile Device

    ERIC Educational Resources Information Center

    Conley, Quincy

    2013-01-01

    Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach statistics. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile…

  7. Student and Teacher Factors as Predictors of Statistics Achievement in Federal School of Statistics Ibadan

    ERIC Educational Resources Information Center

    Adetona, Abel Adekanmi

    2017-01-01

    The study aimed at assessing how students and teachers factor taken together influence students' achievement in Statistics as well as their relative contribution to the prediction. Two research questions were raised and purposive sampling was adopted to select national diploma year 2 students since they are already in their final level in the…

  8. Using public control genotype data to increase power and decrease cost of case-control genetic association studies.

    PubMed

    Ho, Lindsey A; Lange, Ethan M

    2010-12-01

    Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease given the heavy burden of a multiple test correction that is necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remains prohibitive for many scientific investigators anxious to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage that preserves many of the advantages inherent when using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease cost of performing a GWA study while controlling the type-I error rate that can be inflated when using public controls due to differences in ancestry and batch genotype effects.

  9. Probabilistic reasoning under time pressure: an assessment in Italian, Spanish and English psychology undergraduates

    NASA Astrophysics Data System (ADS)

    Agus, M.; Hitchcott, P. K.; Penna, M. P.; Peró-Cebollero, M.; Guàrdia-Olmos, J.

    2016-11-01

    Many studies have investigated the features of probabilistic reasoning developed in relation to different formats of problem presentation, showing that it is affected by various individual and contextual factors. Incomplete understanding of the identity and role of these factors may explain the inconsistent evidence concerning the effect of problem presentation format. Thus, superior performance has sometimes been observed for graphically, rather than verbally, presented problems. The present study was undertaken to address this issue. Psychology undergraduates without any statistical expertise (N = 173 in Italy; N = 118 in Spain; N = 55 in England) were administered statistical problems in two formats (verbal-numerical and graphical-pictorial) under a condition of time pressure. Students also completed additional measures indexing several potentially relevant individual dimensions (statistical ability, statistical anxiety, attitudes towards statistics and confidence). Interestingly, a facilitatory effect of graphical presentation was observed in the Italian and Spanish samples but not in the English one. Significantly, the individual dimensions predicting statistical performance also differed between the samples, highlighting a different role of confidence. Hence, these findings confirm previous observations concerning problem presentation format while simultaneously highlighting the importance of individual dimensions.

  10. Testing the non-unity of rate ratio under inverse sampling.

    PubMed

    Tang, Man-Lai; Liao, Yi Jie; Ng, Hong Keung Tony; Chan, Ping Shing

    2007-08-01

    Inverse sampling is considered to be a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio and conditional score statistics. Three methods (the asymptotic, conditional exact, and Mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, asymptotic score and conditional score tests are preferable for their actual type I error rates are well controlled around the pre-chosen nominal level, and their powers are comparatively the largest. The exact version of Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is expected and fluctuation of sizes around the pre-chosen nominal level are allowed, then the Mid-P version of Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study. (c) 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  11. Sampling benthic macroinvertebrates in a large flood-plain river: Considerations of study design, sample size, and cost

    USGS Publications Warehouse

    Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.

    1998-01-01

    Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm(2)) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpsons) increased as sample area per site increased. The backwater area had higher diversity than the channel border area. The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.

  12. Methods for flexible sample-size design in clinical trials: Likelihood, weighted, dual test, and promising zone approaches.

    PubMed

    Shih, Weichung Joe; Li, Gang; Wang, Yining

    2016-03-01

    Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as part of the more general category of adaptive designs that utilize interim data, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical value. The weighted method adjusts the test statistic with given weights rather than the critical value. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be greater than the unadjusted critical value. The promising zone approach uses the likelihood ratio statistic with the unadjusted value and other constraints. All four methods preserve the type-I error rate. In this paper we explore their properties and compare their relationships and merits. We show that the sample size rules for the dual test are in conflict with the rules of the promising zone approach. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements. We also prove that under mild conditions, the likelihood ratio test still preserves the type-I error rate when the actual sample size is larger than the re-calculated one. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.

  14. Samples in applied psychology: over a decade of research in review.

    PubMed

    Shen, Winny; Kiger, Thomas B; Davies, Stacy E; Rasch, Rena L; Simon, Kara M; Ones, Deniz S

    2011-09-01

    This study examines sample characteristics of articles published in Journal of Applied Psychology (JAP) from 1995 to 2008. At the individual level, the overall median sample size over the period examined was approximately 173, which is generally adequate for detecting the average magnitude of effects of primary interest to researchers who publish in JAP. Samples using higher units of analyses (e.g., teams, departments/work units, and organizations) had lower median sample sizes (Mdn ≈ 65), yet were arguably robust given typical multilevel design choices of JAP authors despite the practical constraints of collecting data at higher units of analysis. A substantial proportion of studies used student samples (~40%); surprisingly, median sample sizes for student samples were smaller than working adult samples. Samples were more commonly occupationally homogeneous (~70%) than occupationally heterogeneous. U.S. and English-speaking participants made up the vast majority of samples, whereas Middle Eastern, African, and Latin American samples were largely unrepresented. On the basis of study results, recommendations are provided for authors, editors, and readers, which converge on 3 themes: (a) appropriateness and match between sample characteristics and research questions, (b) careful consideration of statistical power, and (c) the increased popularity of quantitative synthesis. Implications are discussed in terms of theory building, generalizability of research findings, and statistical power to detect effects. PsycINFO Database Record (c) 2011 APA, all rights reserved

  15. 78 FR 43002 - Proposed Collection; Comment Request for Revenue Procedure 2004-29

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-18

    ... comments concerning statistical sampling in Sec. 274 Context. DATES: Written comments should be received on... INFORMATION: Title: Statistical Sampling in Sec. 274 Contest. OMB Number: 1545-1847. Revenue Procedure Number: Revenue Procedure 2004-29. Abstract: Revenue Procedure 2004-29 prescribes the statistical sampling...

  16. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 42 Public Health 5 2014-10-01 2014-10-01 false Statistical sampling. 1003.133 Section 1003.133 Public Health OFFICE OF INSPECTOR GENERAL-HEALTH CARE, DEPARTMENT OF HEALTH AND HUMAN SERVICES OIG AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting...

  17. Outcomes of Zika virus infection during pregnancy: contributions to the debate on the efficiency of cohort studies.

    PubMed

    Duarte, Elisabeth Carmen; Garcia, Leila Posenato; de Araújo, Wildo Navegantes; Velez, Maria P

    2017-12-02

    Zika infection during pregnancy (ZIKVP) is known to be associated with adverse outcomes. Studies on this matter involve both rare outcomes and rare exposures and methodological choices are not straightforward. Cohort studies will surely offer more robust evidences, but their efficiency must be enhanced. We aim to contribute to the debate on sample selection strategies in cohort studies to assess outcomes associated with ZKVP. A study can be statistically more efficient than another if its estimates are more accurate (precise and valid), even if the studies involve the same number of subjects. Sample size and specific design strategies can enhance or impair the statistical efficiency of a study, depending on how the subjects are distributed in subgroups pertinent to the analysis. In most ZIKVP cohort studies to date there is an a priori identification of the source population (pregnant women, regardless of their exposure status) which is then sampled or included in its entirety (census). Subsequently, the group of pregnant women is classified according to exposure (presence or absence of ZIKVP), respecting the exposed:unexposed ratio in the source population. We propose that the sample selection be done from the a priori identification of groups of pregnant women exposed and unexposed to ZIKVP. This method will allow for an oversampling (even 100%) of the pregnant women with ZKVP and a optimized sampling from the general population of pregnant women unexposed to ZIKVP, saving resources in the unexposed group and improving the expected number of incident cases (outcomes) overall. We hope that this proposal will broaden the methodological debate on the improvement of statistical power and protocol harmonization of cohort studies that aim to evaluate the association between Zika infection during pregnancy and outcomes for the offspring, as well as those with similar objectives.

  18. A Comparison of EPI Sampling, Probability Sampling, and Compact Segment Sampling Methods for Micro and Small Enterprises

    PubMed Central

    Chao, Li-Wei; Szrek, Helena; Peltzer, Karl; Ramlagan, Shandir; Fleming, Peter; Leite, Rui; Magerman, Jesswill; Ngwenya, Godfrey B.; Pereira, Nuno Sousa; Behrman, Jere

    2011-01-01

    Finding an efficient method for sampling micro- and small-enterprises (MSEs) for research and statistical reporting purposes is a challenge in developing countries, where registries of MSEs are often nonexistent or outdated. This lack of a sampling frame creates an obstacle in finding a representative sample of MSEs. This study uses computer simulations to draw samples from a census of businesses and non-businesses in the Tshwane Municipality of South Africa, using three different sampling methods: the traditional probability sampling method, the compact segment sampling method, and the World Health Organization’s Expanded Programme on Immunization (EPI) sampling method. Three mechanisms by which the methods could differ are tested, the proximity selection of respondents, the at-home selection of respondents, and the use of inaccurate probability weights. The results highlight the importance of revisits and accurate probability weights, but the lesser effect of proximity selection on the samples’ statistical properties. PMID:22582004

  19. Intraocular and systemic levels of vascular endothelial growth factor in advanced cases of retinopathy of prematurity

    PubMed Central

    Velez-Montoya, Raul; Clapp, Carmen; Rivera, Jose Carlos; Garcia-Aguirre, Gerardo; Morales-Cantón, Virgilio; Fromow-Guerra, Jans; Guerrero-Naranjo, Jose Luis; Quiroz-Mercado, Hugo

    2010-01-01

    Purpose: To measure vitreous, aqueous, subretinal fluid and plasma levels of vascular endothelial growth factor in late stages of retinopathy of prematurity. Methods: Interventional study. We enrolled patients with clinical diagnoses of bilateral stage V retinopathy of prematurity, confirmed by b-scan ultrasound and programmed for vitrectomy. During surgery we took samples from blood, aqueous, vitreous, and subretinal fluids. The vascular endothelial growth factor concentration in each sample was measured by ELISA reaction. A control sample of aqueous, vitreous and blood was taken from patients with congenital cataract programmed for phacoemulsification. For statistical analysis, a Mann–Whitney and a Wilcoxon W test was done with a significant P value of 0.05. Results: We took samples of 16 consecutive patients who met the inclusion criteria. The vascular endothelial growth factor levels in the study group were: aqueous, 76.81 ± 61.89 pg/mL; vitreous, 118.53 ± 65.87 pg/mL; subretinal fluid, 1636.58 ± 356.47 pg/mL; and plasma, 74.64 ± 43.94 pg/mL. There was a statistical difference between the study and the control group (P < 0.001) in the aqueous and vitreous samples. Conclusion: Stage 5 retinopathy of prematurity has elevated intraocular levels of vascular endothelial growth factor, which remains high despite severe retinal lesion. There was no statistical difference in plasma levels of the molecule between the control and study group. PMID:20856587

  20. Comparative Financial Statistics for Public Two-Year Colleges: FY 1994 Peer Groups Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Meeker, Bradley

    Comparative financial information derived from a national sample of 427 two-year colleges is presented in this report for fiscal year 1993-94, including data for the national sample and 6 groups of peer institutions. The first section provides introductory information on the annual study, reviewing the objectives of the study and potential uses of…

  1. Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.

    ERIC Educational Resources Information Center

    Breithaupt, Krista; Zumbo, Bruno D.

    2002-01-01

    Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…

  2. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.

  3. 78 FR 63568 - Proposed Collection; Comment Request for Rev. Proc. 2007-35

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-24

    ... Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES: Written comments... . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling for purposes of Section 199. OMB Number: 1545-2072... statistical sampling may be used in purposes of section 199, which provides a deduction for income...

  4. Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks

    DOE PAGES

    Karpinets, Tatiana V.; Gopalakrishnan, Vancheswaran; Wargo, Jennifer; ...

    2018-03-07

    Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putativemore » species and of the samples respectively, using a known computational technique, Association networks (Anets) developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.« less

  5. Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karpinets, Tatiana V.; Gopalakrishnan, Vancheswaran; Wargo, Jennifer

    Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putativemore » species and of the samples respectively, using a known computational technique, Association networks (Anets) developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.« less

  6. Determination of polarimetric parameters of honey by near-infrared transflectance spectroscopy.

    PubMed

    García-Alvarez, M; Ceresuela, S; Huidobro, J F; Hermida, M; Rodríguez-Otero, J L

    2002-01-30

    NIR transflectance spectroscopy was used to determine polarimetric parameters (direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides) and sucrose in honey. In total, 156 honey samples were collected during 1992 (45 samples), 1995 (56 samples), and 1996 (55 samples). Samples were analyzed by NIR spectroscopy and polarimetric methods. Calibration (118 samples) and validation (38 samples) sets were made up; honeys from the three years were included in both sets. Calibrations were performed by modified partial least-squares regression and scatter correction by standard normal variation and detrend methods. For direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides, good statistics (bias, SEV, and R(2)) were obtained for the validation set, and no statistically (p = 0.05) significant differences were found between instrumental and polarimetric methods for these parameters. Statistical data for sucrose were not as good as those of the other parameters. Therefore, NIR spectroscopy is not an effective method for quantitative analysis of sucrose in these honey samples. However, NIR spectroscopy may be an acceptable method for semiquantitative evaluation of sucrose for honeys, such as those in our study, containing up to 3% of sucrose. Further work is necessary to validate the uncertainty at higher levels.

  7. State Estimates of Disability in America. Disability Statistics Report 3.

    ERIC Educational Resources Information Center

    LaPlante, Mitchell P.

    This study presents and discusses existing data on disability by state, from the 1980 and 1990 censuses, the Current Population Survey (CPS), and the National Health Interview Survey (NHIS). The study used direct methods for states with large sample sizes and synthetic estimates for states with low sample sizes. The study's highlighted findings…

  8. Sampling errors for satellite-derived tropical rainfall - Monte Carlo study using a space-time stochastic model

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.; Abdullah, A.; Martin, Russell L.; North, Gerald R.

    1990-01-01

    Estimates of monthly average rainfall based on satellite observations from a low earth orbit will differ from the true monthly average because the satellite observes a given area only intermittently. This sampling error inherent in satellite monitoring of rainfall would occur even if the satellite instruments could measure rainfall perfectly. The size of this error is estimated for a satellite system being studied at NASA, the Tropical Rainfall Measuring Mission (TRMM). First, the statistical description of rainfall on scales from 1 to 1000 km is examined in detail, based on rainfall data from the Global Atmospheric Research Project Atlantic Tropical Experiment (GATE). A TRMM-like satellite is flown over a two-dimensional time-evolving simulation of rainfall using a stochastic model with statistics tuned to agree with GATE statistics. The distribution of sampling errors found from many months of simulated observations is found to be nearly normal, even though the distribution of area-averaged rainfall is far from normal. For a range of orbits likely to be employed in TRMM, sampling error is found to be less than 10 percent of the mean for rainfall averaged over a 500 x 500 sq km area.

  9. GASP cloud- and particle-encounter statistics and their application to LPC aircraft studies. Volume 1: Analysis and conclusions

    NASA Technical Reports Server (NTRS)

    Jasperson, W. H.; Nastrom, G. D.; Davis, R. E.; Holdeman, J. D.

    1984-01-01

    Summary studies are presented for the entire cloud observation archieve from the NASA Global Atmospheric Sampling Program (GASP). Studies are also presented for GASP particle concentration data gathered concurrently with the cloud observations. Cloud encounters are shown on about 15 percent of the data samples overall, but the probability of cloud encounter is shown to vary significantly with altitude, latitude, and distance from the tropopause. Several meteorological circulation features are apparent in the latitudinal distribution of cloud cover, and the cloud encounter statistics are shown to be consistent with the classical mid-latitude cyclone model. Observations of clouds spaced more closely than 90 minutes are shown to be statistically dependent. The statistics for cloud and particle encounter are utilized to estimate the frequency of cloud encounter on long range airline routes, and to assess the probability and extent of laminar flow loss due to cloud or particle encounter by aircraft utilizing laminar flow control (LFC). It is shown that the probability of extended cloud encounter is too low, of itself, to make LFC impractical.

  10. Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing.

    PubMed

    Williamson, Graham R

    2003-11-01

    This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.

  11. Laser Velocimeter Measurements and Analysis in Turbulent Flows with Combustion. Part 2.

    DTIC Science & Technology

    1983-07-01

    sampling error for 63 this sample size. Mean velocities and turbulence intensi- ties were found to be statistically accurate to ± 1 % and 13%, respectively...Although the statist - ical error was found to be rather small (± 1 % for mean velo- cities and 13% for turbulence intensities), there can be additional...34Computational and Experimental Study of a Captive Annular Eddy," Journal of Fluid Mechanics, Vol. 28, pt. 1 , pp. 43-63, 12 April, 1967. 152 REFERENCES (con’d

  12. Further developments in cloud statistics for computer simulations

    NASA Technical Reports Server (NTRS)

    Chang, D. T.; Willand, J. H.

    1972-01-01

    This study is a part of NASA's continued program to provide global statistics of cloud parameters for computer simulation. The primary emphasis was on the development of the data bank of the global statistical distributions of cloud types and cloud layers and their applications in the simulation of the vertical distributions of in-cloud parameters such as liquid water content. These statistics were compiled from actual surface observations as recorded in Standard WBAN forms. Data for a total of 19 stations were obtained and reduced. These stations were selected to be representative of the 19 primary cloud climatological regions defined in previous studies of cloud statistics. Using the data compiled in this study, a limited study was conducted of the hemogeneity of cloud regions, the latitudinal dependence of cloud-type distributions, the dependence of these statistics on sample size, and other factors in the statistics which are of significance to the problem of simulation. The application of the statistics in cloud simulation was investigated. In particular, the inclusion of the new statistics in an expanded multi-step Monte Carlo simulation scheme is suggested and briefly outlined.

  13. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting.This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  14. Learning to Reason from Samples

    ERIC Educational Resources Information Center

    Ben-Zvi, Dani; Bakker, Arthur; Makar, Katie

    2015-01-01

    The goal of this article is to introduce the topic of "learning to reason from samples," which is the focus of this special issue of "Educational Studies in Mathematics" on "statistical reasoning." Samples are data sets, taken from some wider universe (e.g., a population or a process) using a particular procedure…

  15. Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.

    2009-02-18

    The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statisticalmore » sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.« less

  16. Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.

    PubMed

    Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng

    2015-01-01

    Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.

  17. Disentangling Vulnerabilities from Outcomes: Distinctions between Trait Affect and Depressive Symptoms in Adolescent and Adult Samples

    PubMed Central

    Harding, Kaitlin A.; Willey, Brittany; Ahles, Joshua; Mezulis, Amy

    2016-01-01

    Background Trait negative affect and trait positive affect are affective vulnerabilities to depressive symptoms in adolescence and adulthood. While trait affect and the state affect characteristic of depressive symptoms are proposed to be theoretically distinct, no studies have established that these constructs are statistically distinct. Therefore, the purpose of the current study was to determine whether the trait affect (e.g. temperament dimensions) that predicts depressive symptoms and the state affect characteristic of depressive symptoms are statistically distinct among early adolescents and adults. We hypothesized that trait negative affect, trait positive affect, and depressive symptoms would represent largely distinct factors in both samples. Method Participants were 268 early adolescents (53.73% female) and 321 young adults (70.09% female) who completed self-report measures of demographic information, trait affect, and depressive symptoms. Results Principal axis factoring with oblique rotation for both samples indicated distinct adolescent factor loadings and overlapping adult factor loadings. Confirmatory factor analyses in both samples supported distinct but related relationships between trait NA, trait PA, and depressive symptoms. Limitations Study limitations include our cross-sectional design that prevented examination of self-reported fluctuations in trait affect and depressive symptoms and the unknown potential effects of self-report biases among adolescents and adults. Conclusions Findings support existing theoretical distinctions between adolescent constructs but highlight a need to revise or remove items to distinguish measurements of adult trait affect and depressive symptoms. Adolescent trait affect and depressive symptoms are statistically distinct, but adult trait affect and depressive symptoms statistically overlap and warrant further consideration. PMID:27085163

  18. Evidence for plant-derived xenomiRs based on a large-scale analysis of public small RNA sequencing data from human samples.

    PubMed

    Zhao, Qi; Liu, Yuanning; Zhang, Ning; Hu, Menghan; Zhang, Hao; Joshi, Trupti; Xu, Dong

    2018-01-01

    In recent years, an increasing number of studies have reported the presence of plant miRNAs in human samples, which resulted in a hypothesis asserting the existence of plant-derived exogenous microRNA (xenomiR). However, this hypothesis is not widely accepted in the scientific community due to possible sample contamination and the small sample size with lack of rigorous statistical analysis. This study provides a systematic statistical test that can validate (or invalidate) the plant-derived xenomiR hypothesis by analyzing 388 small RNA sequencing data from human samples in 11 types of body fluids/tissues. A total of 166 types of plant miRNAs were found in at least one human sample, of which 14 plant miRNAs represented more than 80% of the total plant miRNAs abundance in human samples. Plant miRNA profiles were characterized to be tissue-specific in different human samples. Meanwhile, the plant miRNAs identified from microbiome have an insignificant abundance compared to those from humans, while plant miRNA profiles in human samples were significantly different from those in plants, suggesting that sample contamination is an unlikely reason for all the plant miRNAs detected in human samples. This study also provides a set of testable synthetic miRNAs with isotopes that can be detected in situ after being fed to animals.

  19. Sample size in psychological research over the past 30 years.

    PubMed

    Marszalek, Jacob M; Barber, Carolyn; Kohlhart, Julie; Holmes, Cooper B

    2011-04-01

    The American Psychological Association (APA) Task Force on Statistical Inference was formed in 1996 in response to a growing body of research demonstrating methodological issues that threatened the credibility of psychological research, and made recommendations to address them. One issue was the small, even dramatically inadequate, size of samples used in studies published by leading journals. The present study assessed the progress made since the Task Force's final report in 1999. Sample sizes reported in four leading APA journals in 1955, 1977, 1995, and 2006 were compared using nonparametric statistics, while data from the last two waves were fit to a hierarchical generalized linear growth model for more in-depth analysis. Overall, results indicate that the recommendations for increasing sample sizes have not been integrated in core psychological research, although results slightly vary by field. This and other implications are discussed in the context of current methodological critique and practice.

  20. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T2 test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647

  1. Limitations of Poisson statistics in describing radioactive decay.

    PubMed

    Sitek, Arkadiusz; Celler, Anna M

    2015-12-01

    The assumption that nuclear decays are governed by Poisson statistics is an approximation. This approximation becomes unjustified when data acquisition times longer than or even comparable with the half-lives of the radioisotope in the sample are considered. In this work, the limits of the Poisson-statistics approximation are investigated. The formalism for the statistics of radioactive decay based on binomial distribution is derived. The theoretical factor describing the deviation of variance of the number of decays predicated by the Poisson distribution from the true variance is defined and investigated for several commonly used radiotracers such as (18)F, (15)O, (82)Rb, (13)N, (99m)Tc, (123)I, and (201)Tl. The variance of the number of decays estimated using the Poisson distribution is significantly different than the true variance for a 5-minute observation time of (11)C, (15)O, (13)N, and (82)Rb. Durations of nuclear medicine studies often are relatively long; they may be even a few times longer than the half-lives of some short-lived radiotracers. Our study shows that in such situations the Poisson statistics is unsuitable and should not be applied to describe the statistics of the number of decays in radioactive samples. However, the above statement does not directly apply to counting statistics at the level of event detection. Low sensitivities of detectors which are used in imaging studies make the Poisson approximation near perfect. Copyright © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  2. Statistical Design for Biospecimen Cohort Size in Proteomics-based Biomarker Discovery and Verification Studies

    PubMed Central

    Skates, Steven J.; Gillette, Michael A.; LaBaer, Joshua; Carr, Steven A.; Anderson, N. Leigh; Liebler, Daniel C.; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L.; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Boja, Emily S.

    2014-01-01

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC), with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance, and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step towards building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research. PMID:24063748

  3. Statistical design for biospecimen cohort size in proteomics-based biomarker discovery and verification studies.

    PubMed

    Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S

    2013-12-06

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.

  4. MID-ATLANTIC COASTAL STREAMS STUDY: STATISTICAL DESIGN FOR REGIONAL ASSESSMENT AND LANDSCAPE MODEL DEVELOPMENT

    EPA Science Inventory

    A network of stream-sampling sites was developed for the Mid-Atlantic Coastal Plain (New Jersey through North Carolina) a collaborative study between the U.S. Environmental Protection Agency and the U.S. Geological Survey. A stratified random sampling with unequal weighting was u...

  5. What Good Are Statistics that Don't Generalize?

    ERIC Educational Resources Information Center

    Shaffer, David Williamson; Serlin, Ronald C.

    2004-01-01

    Quantitative and qualitative inquiry are sometimes portrayed as distinct and incompatible paradigms for research in education. Approaches to combining qualitative and quantitative research typically "integrate" the two methods by letting them co-exist independently within a single research study. Here we describe intra-sample statistical analysis…

  6. Understanding Statistical Variation: A Response to Sharma

    ERIC Educational Resources Information Center

    Farmer, Jim

    2008-01-01

    In this article, the author responds to the paper "Exploring pre-service teachers' understanding of statistical variation: Implications for teaching and research" by Sashi Sharma (see EJ779107). In that paper, Sharma described a study "designed to investigate pre-service teachers' acknowledgment of variation in sampling and distribution…

  7. The effect of substrate composition and storage time on urine specific gravity in dogs.

    PubMed

    Steinberg, E; Drobatz, K; Aronson, L

    2009-10-01

    The purpose of this study is to evaluate the effects of substrate composition and storage time on urine specific gravity in dogs. A descriptive cohort study of 15 dogs. The urine specific gravity of free catch urine samples was analysed during a 5-hour time period using three separate storage methods; a closed syringe, a diaper pad and non-absorbable cat litter. The urine specific gravity increased over time in all three substrates. The syringe sample had the least change from baseline and the diaper sample had the greatest change from baseline. The urine specific gravity for the litter and diaper samples had a statistically significant increase from the 1-hour to the 5-hour time point. The urine specific gravity from canine urine stored either on a diaper or in a non-absorbable litter increased over time. Although the change was found to be statistically significant over the 5-hour study period it is unlikely to be clinically significant.

  8. Statistical considerations for plot design, sampling procedures, analysis, and quality assurance of ozone injury studies

    Treesearch

    Michael Arbaugh; Larry Bednar

    1996-01-01

    The sampling methods used to monitor ozone injury to ponderosa and Jeffrey pines depend on the objectives of the study, geographic and genetic composition of the forest, and the source and composition of air pollutant emissions. By using a standardized sampling methodology, it may be possible to compare conditions within local areas more accurately, and to apply the...

  9. Adventures in Uncertainty: An Empirical Investigation of the Use of a Taylor's Series Approximation for the Assessment of Sampling Errors in Educational Research.

    ERIC Educational Resources Information Center

    Wilson, Mark

    This study investigates the accuracy of the Woodruff-Causey technique for estimating sampling errors for complex statistics. The technique may be applied when data are collected by using multistage clustered samples. The technique was chosen for study because of its relevance to the correct use of multivariate analyses in educational survey…

  10. Sampling methods to the statistical control of the production of blood components.

    PubMed

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Air Combat Training: Good Stick Index Validation. Final Report for Period 3 April 1978-1 April 1979.

    ERIC Educational Resources Information Center

    Moore, Samuel B.; And Others

    A study was conducted to investigate and statistically validate a performance measuring system (the Good Stick Index) in the Tactical Air Command Combat Engagement Simulator I (TAC ACES I) Air Combat Maneuvering (ACM) training program. The study utilized a twelve-week sample of eighty-nine student pilots to statistically validate the Good Stick…

  12. Fine-scale landscape genetics of the American badger (Taxidea taxus): disentangling landscape effects and sampling artifacts in a poorly understood species

    PubMed Central

    Kierepka, E M; Latch, E K

    2016-01-01

    Landscape genetics is a powerful tool for conservation because it identifies landscape features that are important for maintaining genetic connectivity between populations within heterogeneous landscapes. However, using landscape genetics in poorly understood species presents a number of challenges, namely, limited life history information for the focal population and spatially biased sampling. Both obstacles can reduce power in statistics, particularly in individual-based studies. In this study, we genotyped 233 American badgers in Wisconsin at 12 microsatellite loci to identify alternative statistical approaches that can be applied to poorly understood species in an individual-based framework. Badgers are protected in Wisconsin owing to an overall lack in life history information, so our study utilized partial redundancy analysis (RDA) and spatially lagged regressions to quantify how three landscape factors (Wisconsin River, Ecoregions and land cover) impacted gene flow. We also performed simulations to quantify errors created by spatially biased sampling. Statistical analyses first found that geographic distance was an important influence on gene flow, mainly driven by fine-scale positive spatial autocorrelations. After controlling for geographic distance, both RDA and regressions found that Wisconsin River and Agriculture were correlated with genetic differentiation. However, only Agriculture had an acceptable type I error rate (3–5%) to be considered biologically relevant. Collectively, this study highlights the benefits of combining robust statistics and error assessment via simulations and provides a method for hypothesis testing in individual-based landscape genetics. PMID:26243136

  13. General statistical considerations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eberhardt, L L; Gilbert, R O

    From NAEG plutonium environmental studies program meeting; Las Vegas, Nevada, USA (2 Oct 1973). The high sampling variability encountered in environmental plutonium studies along with high analytical costs makes it very important that efficient soil sampling plans be used. However, efficient sampling depends on explicit and simple statements of the objectives of the study. When there are multiple objectives it may be difficult to devise a wholly suitable sampling scheme. Sampling for long-term changes in plutonium concentration in soils may also be complex and expensive. Further attention to problems associated with compositing samples is recommended, as is the consistent usemore » of random sampling as a basic technique. (auth)« less

  14. Determination of variables in the prediction of strontium distribution coefficients for selected sediments

    USGS Publications Warehouse

    Pace, M.N.; Rosentreter, J.J.; Bartholomay, R.C.

    2001-01-01

    Idaho State University and the US Geological Survey, in cooperation with the US Department of Energy, conducted a study to determine and evaluate strontium distribution coefficients (Kds) of subsurface materials at the Idaho National Engineering and Environmental Laboratory (INEEL). The Kds were determined to aid in assessing the variability of strontium Kds and their effects on chemical transport of strontium-90 in the Snake River Plain aquifer system. Data from batch experiments done to determine strontium Kds of five sediment-infill samples and six standard reference material samples were analyzed by using multiple linear regression analysis and the stepwise variable-selection method in the statistical program, Statistical Product and Service Solutions, to derive an equation of variables that can be used to predict strontium Kds of sediment-infill samples. The sediment-infill samples were from basalt vesicles and fractures from a selected core at the INEEL; strontium Kds ranged from ???201 to 356 ml g-1. The standard material samples consisted of clay minerals and calcite. The statistical analyses of the batch-experiment results showed that the amount of strontium in the initial solution, the amount of manganese oxide in the sample material, and the amount of potassium in the initial solution are the most important variables in predicting strontium Kds of sediment-infill samples.

  15. Designing image segmentation studies: Statistical power, sample size and reference standard quality.

    PubMed

    Gibson, Eli; Hu, Yipeng; Huisman, Henkjan J; Barratt, Dean C

    2017-12-01

    Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and on the quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  16. Comparison of two modalities: a novel technique, 'chromohysteroscopy', and blind endometrial sampling for the evaluation of abnormal uterine bleeding.

    PubMed

    Alay, Asli; Usta, Taner A; Ozay, Pinar; Karadugan, Ozgur; Ates, Ugur

    2014-05-01

    The objective of this study was to compare classical blind endometrial tissue sampling with hysteroscopic biopsy sampling following methylene blue dyeing in premenopausal and postmenopausal patients with abnormal uterine bleeding. A prospective case-control study was carried out in the Office Hysteroscopy Unit. Fifty-four patients with complaints of abnormal uterine bleeding were evaluated. Data of 38 patients were included in the statistical analysis. Three groups were compared by examining samples obtained through hysteroscopic biopsy before and after methylene blue dyeing, and classical blind endometrial tissue sampling. First, uterine cavity was evaluated with office hysteroscopy. Methylene blue dye was administered through the hysteroscopic inlet. Tissue samples were obtained from stained and non-stained areas. Blind endometrial sampling was performed in the same patients immediately after the hysteroscopy procedure. The results of hysteroscopic biopsy from methylene blue stained and non-stained areas and blind biopsy were compared. No statistically significant differences were determined in the comparison of biopsy samples obtained from methylene-blue stained, non-stained areas and blind biopsy (P > 0.05). We suggest that chromohysteroscopy is not superior to endometrial sampling in cases of abnormal uterine bleeding. Further studies with greater sample sizes should be performed to assess the validity of routine use of endometrial dyeing. © 2014 The Authors. Journal of Obstetrics and Gynaecology Research © 2014 Japan Society of Obstetrics and Gynecology.

  17. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. Pocket guide to transportation, 1999

    DOT National Transportation Integrated Search

    1998-12-01

    Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...

  19. Pocket guide to transportation, 2009

    DOT National Transportation Integrated Search

    2009-01-01

    Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...

  20. Pocket guide to transportation, 2013.

    DOT National Transportation Integrated Search

    2013-01-01

    Abstract Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, ...

  1. Pocket guide to transportation, 2010

    DOT National Transportation Integrated Search

    2010-01-01

    Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...

  2. Comparative Financial Statistics for Public Two-Year Colleges: FY 1992 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Cirino, Anna Marie

    This report, the 15th in an annual series, provides comparative information derived from a national sample of 544 public two-year colleges, highlighting financial statistics for fiscal year 1991-92. The report offers space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the…

  3. The relation between statistical power and inference in fMRI

    PubMed Central

    Wager, Tor D.; Yarkoni, Tal

    2017-01-01

    Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial—especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20–30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches. PMID:29155843

  4. The Seismic risk perception in Italy deduced by a statistical sample

    NASA Astrophysics Data System (ADS)

    Crescimbene, Massimo; La Longa, Federica; Camassi, Romano; Pino, Nicola Alessandro; Pessina, Vera; Peruzza, Laura; Cerbara, Loredana; Crescimbene, Cristiana

    2015-04-01

    In 2014 EGU Assembly we presented the results of a web a survey on the perception of seismic risk in Italy. The data were derived from over 8,500 questionnaires coming from all Italian regions. Our questionnaire was built by using the semantic differential method (Osgood et al. 1957) with a seven points Likert scale. The questionnaire is inspired the main theoretical approaches of risk perception (psychometric paradigm, cultural theory, etc.) .The results were promising and seem to clearly indicate an underestimation of seismic risk by the italian population. Based on these promising results, the DPC has funded our research for the second year. In 2015 EGU Assembly we present the results of a new survey deduced by an italian statistical sample. The importance of statistical significance at national scale was also suggested by ISTAT (Italian Statistic Institute), considering the study as of national interest, accepted the "project on the perception of seismic risk" as a pilot study inside the National Statistical System (SISTAN), encouraging our RU to proceed in this direction. The survey was conducted by a company specialised in population surveys using the CATI method (computer assisted telephone interview). Preliminary results will be discussed. The statistical support was provided by the research partner CNR-IRPPS. This research is funded by Italian Civil Protection Department (DPC).

  5. [Application of statistics on chronic-diseases-relating observational research papers].

    PubMed

    Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua

    2012-09-01

    To study the application of statistics on Chronic-diseases-relating observational research papers which were recently published in the Chinese Medical Association Magazines, with influential index above 0.5. Using a self-developed criterion, two investigators individually participated in assessing the application of statistics on Chinese Medical Association Magazines, with influential index above 0.5. Different opinions reached an agreement through discussion. A total number of 352 papers from 6 magazines, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rate of clear statement on the following contents as: research objectives, t target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%. The correct rates of description on quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates on correctly expressing the results, on statistical inference methods related to quantitative, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions could directly response to the research objectives. However, 69.60% of the papers did not mention the exact names of the study design, statistically, that the papers were using. 11.14% of the papers were in lack of further statement on the exclusion criteria. Percentage of the papers that could clearly explain the sample size estimation only taking up as 5.16%. Only 24.21% of the papers clearly described the variable value assignment. Regarding the introduction on statistical conduction and on database methods, the rate was only 24.15%. 18.75% of the papers did not express the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for the aspect of statistical inference, the rate of description on statistical testing prerequisite was only 24.12% while 9.94% papers did not even employ the statistical inferential method that should be used. The main deficiencies on the application of Statistics used in papers related to Chronic-diseases-related observational research were as follows: lack of sample-size determination, variable value assignment description not sufficient, methods on statistics were not introduced clearly or properly, lack of consideration for pre-requisition regarding the use of statistical inferences.

  6. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237

  7. Trends in study design and the statistical methods employed in a leading general medicine journal.

    PubMed

    Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

    2018-02-01

    Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.

  8. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    Difference between “Clinical Significance and Statistical Significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many but again pointed out here as philosophy of “statistical hypothesis testing” is sometimes unnecessarily criticized mainly due to failure in considering such distinction. Randomized controlled trials are also wrongly criticized similarly. Some scientific method may not be applicable in some peculiar/particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making and the field of “decision analysis” is very much an integral part of science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands meaning of the word “confidence” as used in context of confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations like chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is the difference found in random sample(s) is due to chance” and therefore limitation of relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  9. Statistical Symbolic Execution with Informed Sampling

    NASA Technical Reports Server (NTRS)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with in- formed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate symbolic execution with informed sampling can give meaningful results under the same time and memory limits.

  10. The Accuracy of Estimated Total Test Statistics. Final Report.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…

  11. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    NASA Astrophysics Data System (ADS)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions in 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to their types and rice cultivars using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified by SRM 1568a Rice Flour. All elements were found to be in a good agreement with the certified values. The precisions in term of %RSD were lower than 7%. The LODs were obtained in range of 0.01 to 29 mg kg-1. The concentration of 9 elements distributed in Thai rice samples was evaluated and used as chemical indicators to identify the type of rice samples. The result found that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations in Thai jasmine rice samples are significantly different but there was no evidence that Al is significantly different from concentration in Sung Yod rice samples at 95% confidence interval. Our results may provide preliminary information for discrimination of rice samples and may be useful database of Thai rice.

  12. The effect of sampling rate on observed statistics in a correlated random walk

    PubMed Central

    Rosser, G.; Fletcher, A. G.; Maini, P. K.; Baker, R. E.

    2013-01-01

    Tracking the movement of individual cells or animals can provide important information about their motile behaviour, with key examples including migrating birds, foraging mammals and bacterial chemotaxis. In many experimental protocols, observations are recorded with a fixed sampling interval and the continuous underlying motion is approximated as a series of discrete steps. The size of the sampling interval significantly affects the tracking measurements, the statistics computed from observed trajectories, and the inferences drawn. Despite the widespread use of tracking data to investigate motile behaviour, many open questions remain about these effects. We use a correlated random walk model to study the variation with sampling interval of two key quantities of interest: apparent speed and angle change. Two variants of the model are considered, in which reorientations occur instantaneously and with a stationary pause, respectively. We employ stochastic simulations to study the effect of sampling on the distributions of apparent speeds and angle changes, and present novel mathematical analysis in the case of rapid sampling. Our investigation elucidates the complex nature of sampling effects for sampling intervals ranging over many orders of magnitude. Results show that inclusion of a stationary phase significantly alters the observed distributions of both quantities. PMID:23740484

  13. Some limit theorems for ratios of order statistics from uniform random variables.

    PubMed

    Xu, Shou-Fang; Miao, Yu

    2017-01-01

    In this paper, we study the ratios of order statistics based on samples drawn from uniform distribution and establish some limit properties such as the almost sure central limit theorem, the large deviation principle, the Marcinkiewicz-Zygmund law of large numbers and complete convergence.

  14. Monte Carlo Simulation for Perusal and Practice.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.; Robey, Randall R.

    The meaningful investigation of many problems in statistics can be solved through Monte Carlo methods. Monte Carlo studies can help solve problems that are mathematically intractable through the analysis of random samples from populations whose characteristics are known to the researcher. Using Monte Carlo simulation, the values of a statistic are…

  15. Constructing Sample Space with Combinatorial Reasoning: A Mixed Methods Study

    ERIC Educational Resources Information Center

    McGalliard, William A., III.

    2012-01-01

    Recent curricular developments suggest that students at all levels need to be statistically literate and able to efficiently and accurately make probabilistic decisions. Furthermore, statistical literacy is a requirement to being a well-informed citizen of society. Research also recognizes that the ability to reason probabilistically is supported…

  16. Adaptive Perfectionism, Maladaptive Perfectionism and Statistics Anxiety in Graduate Psychology Students

    ERIC Educational Resources Information Center

    Comerchero, Victoria; Fortugno, Dominick

    2013-01-01

    The current study examined if correlations between statistics anxiety and dimensions of perfectionism (adaptive and maladaptive) were present amongst a sample of psychology graduate students (N = 96). Results demonstrated that scores on the APS-R Discrepancy scale, corresponding to maladaptive perfectionism, correlated with higher levels of…

  17. Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture

    PubMed Central

    Greene, Casey S.; Penrod, Nadia M.; Williams, Scott M.; Moore, Jason H.

    2009-01-01

    Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 with replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects where a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions. PMID:19503614

  18. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  19. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  20. Statistical power as a function of Cronbach alpha of instrument questionnaire items.

    PubMed

    Heo, Moonseong; Kim, Namhee; Faith, Myles S

    2015-10-14

    In countless number of clinical trials, measurements of outcomes rely on instrument questionnaire items which however often suffer measurement error problems which in turn affect statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume fixed true score variance assumption as opposed to usual fixed total variance assumption. That assumption is critical and practically relevant to show that smaller measurement errors are inversely associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and samples size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power. Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation strutures reflecting more real world situations would be a valuable future study.

  1. NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR SAMPLING WEIGHT CALCULATION (IIT-A-9.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the procedures undertaken to calculate sampling weights. The sampling weights are needed to obtain weighted statistics of the NHEXAS data. This SOP uses data that have been properly coded and certified with appropriate QA/QC procedures by t...

  2. The Importance of Introductory Statistics Students Understanding Appropriate Sampling Techniques

    ERIC Educational Resources Information Center

    Menil, Violeta C.

    2005-01-01

    In this paper the author discusses the meaning of sampling, the reasons for sampling, the Central Limit Theorem, and the different techniques of sampling. Practical and relevant examples are given to make the appropriate sampling techniques understandable to students of Introductory Statistics courses. With a thorough knowledge of sampling…

  3. A time to be born: Variation in the hour of birth in a rural population of Northern Argentina.

    PubMed

    Chaney, Carlye; Goetz, Laura G; Valeggia, Claudia

    2018-04-17

    The present study aimed at investigating the timing of birth across the day in a rural population of indigenous and nonindigenous women in the province of Formosa, Argentina in order to explore the variation in patterns in a non-Western setting. This study utilized birth record data transcribed from delivery room records at a rural hospital in the province of Formosa, northern Argentina. The sample included data for Criollo, Wichí, and Toba/Qom women (n = 2421). Statistical analysis was conducted using directional statistics to identify a mean sample direction. Chi-square tests for homogeneity were also used to test for statistical significant differences between hours of the day. The mean sample direction was 81.04°, which equates to 5:24 AM when calculated as time on a 24-hr clock. Chi-squared analyses showed a statistically significant peak in births between 12:00 and 4:00 AM. Birth counts generally declined throughout the day until a statistically significant trough around 5:00 PM. This pattern may be associated with the circadian rhythms of hormone release, particularly melatonin, on a proximate level. At the ultimate level, giving birth in the early hours of the morning may have been selected to time births when the mother could benefit from the predator protection and support provided by her social group as well as increased mother-infant bonding from a more peaceful environment. © 2018 Wiley Periodicals, Inc.

  4. [A strategic family medicine model for controlling borderline and mild arterial hypertension].

    PubMed

    Uzcátegui Contreras, D; Granadillo Vera, D; Salinas, P J; Alvarez, N

    1999-10-31

    To research on the relationship of the patient and his/her family as a non-pharmacological factor for blood hypertension. To determine whether a hyposodic, hypocaloric, hypofat, and hypocholesterolemic diet decreases the blood tension. To determine whether physical exercises in the patient and his/her family help to reduce the hypertension. To observe whether the psychological therapy of muscles relaxation helps to reduce the hypertension. To evaluate in the sample of families, the experience of each member, as well as their suggestions and complaints about the programme. To design the strategic model to control the blood tension by ambulatory means. Controlled intervention study, descriptive, non-randomized, prospective. PLACEMENT: Primary care. Study group of 10 patients, 10 wives, and 12 children, and control group of 10 patients excluding family members. With both groups (study and control) there were meetings every 15 days for 6 months according to an established schedule. In the meetings there were given talks, pamphlets, physical exercises, muscles relaxation therapy, all about blood hypertension. There were questionnaires before and after each activity. MEASURING AND MAIN RESULTS: In both groups (study and control) there was a statistically significant (t < 0.01) reduction in the weight. The blood systolic tension decreased in both positions, seated and standing, in the study group (difference statistically significant) but not so in the control group, although there was a non-significant difference (decrease of 1.5 mmHg) in the seated position. The diastolic tension decreased significantly in the study group both in seated and standing positions, not so in the control group. The study sample showed that systolic tension seated and standing had a statistically significant reduction in the study group but not so in the control group. The weight had statistically significant reduction in both study and control groups. Total cholesterol had statistically significant decrease in the study group but not in the control group. HDL-C had statistically significant reduction in the study group; in the control group there was a decrease but not statistically significant. The triglycerides did not decrease statistically significant in any of the groups (study and control).

  5. Using the Bootstrap Method to Evaluate the Critical Range of Misfit for Polytomous Rasch Fit Statistics.

    PubMed

    Seol, Hyunsoo

    2016-06-01

    The purpose of this study was to apply the bootstrap procedure to evaluate how the bootstrapped confidence intervals (CIs) for polytomous Rasch fit statistics might differ according to sample sizes and test lengths in comparison with the rule-of-thumb critical value of misfit. A total of 25 simulated data sets were generated to fit the Rasch measurement and then a total of 1,000 replications were conducted to compute the bootstrapped CIs under each of 25 testing conditions. The results showed that rule-of-thumb critical values for assessing the magnitude of misfit were not applicable because the infit and outfit mean square error statistics showed different magnitudes of variability over testing conditions and the standardized fit statistics did not exactly follow the standard normal distribution. Further, they also do not share the same critical range for the item and person misfit. Based on the results of the study, the bootstrapped CIs can be used to identify misfitting items or persons as they offer a reasonable alternative solution, especially when the distributions of the infit and outfit statistics are not well known and depend on sample size. © The Author(s) 2016.

  6. Sampling designs for contaminant temporal trend analyses using sedentary species exemplified by the snails Bellamya aeruginosa and Viviparus viviparus.

    PubMed

    Yin, Ge; Danielsson, Sara; Dahlberg, Anna-Karin; Zhou, Yihui; Qiu, Yanling; Nyberg, Elisabeth; Bignert, Anders

    2017-10-01

    Environmental monitoring typically assumes samples and sampling activities to be representative of the population being studied. Given a limited budget, an appropriate sampling strategy is essential to support detecting temporal trends of contaminants. In the present study, based on real chemical analysis data on polybrominated diphenyl ethers in snails collected from five subsites in Tianmu Lake, computer simulation is performed to evaluate three sampling strategies by the estimation of required sample size, to reach a detection of an annual change of 5% with a statistical power of 80% and 90% with a significant level of 5%. The results showed that sampling from an arbitrarily selected sampling spot is the worst strategy, requiring much more individual analyses to achieve the above mentioned criteria compared with the other two approaches. A fixed sampling site requires the lowest sample size but may not be representative for the intended study object e.g. a lake and is also sensitive to changes of that particular sampling site. In contrast, sampling at multiple sites along the shore each year, and using pooled samples when the cost to collect and prepare individual specimens are much lower than the cost for chemical analysis, would be the most robust and cost efficient strategy in the long run. Using statistical power as criterion, the results demonstrated quantitatively the consequences of various sampling strategies, and could guide users with respect of required sample sizes depending on sampling design for long term monitoring programs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    NASA Technical Reports Server (NTRS)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.

  8. Fatty acid, cholesterol, vitamin, and mineral content of cooked beef cuts from a national study

    USDA-ARS?s Scientific Manuscript database

    The U.S. Department of Agriculture (USDA) provides foundational nutrient data for U.S. and international databases. For currency of retail beef data in USDA’s database, a nationwide comprehensive study obtained samples by primal categories using a statistically based sampling plan, resulting in 72 ...

  9. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  10. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  11. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. To introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis and to conduct the implementation in statistical software R to reproduce the results. Additionally, the implementation in R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighted. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  12. Expression of Vascular Endothelial Growth Factor in Odontogenic Cysts: Is There Any Impression on Clinical Outcome?

    PubMed

    Sadri, Donia; Farhadi, Sareh; Shahabi, Zahra; Sarshar, Samaneh

    2016-01-01

    The recent scientific reports have shown that angiogenesis can affect biological behavior of pathologic lesions. Regarding unique clinical outcome of Odontogenic keratocyst (OKC), the present study was aimed to compare angiogenesis in Odontogenic keratocyst and Dentigerous cyst (DC). In this experimental study, tissue sections of 46 samples of OKC and DC were stained through immunohistochemical method using Vascular Endothelial Growth Factor (VEGF) antibody. VEGF expression was evaluated in epithelial cells, fibroblasts and endothelial cells. The average percentage of stained cells in any samples was categorized to 3 groups as follows: SCORE 0: 10% of cells or less are positive. SCORE 1: 10 to 50% of cells are positive. SCORE 2: more than 50% of cells are positive. Mann-U-Whitney, T-test and chi-square was used for statistical analysis. The average of VEGF expression in 24 samples of DC was 20.2% and in 22 samples of OKC was 52.6%, respectively. The average of VEGF expression in these two cysts had statistical significant differences. (PV= 0.045). There was significant statistical differences between two cysts in the terms of VEGF SCORE (PV= 0.000). OKC samples had significantly higher SCORE for the purpose of VEGF incidence than DC. Also, there were no differences between VEGF expression in epithelial cells of two cysts (PV= 0.268) there were significant statistical differences between two cysts in terms of endothelial cell staining. The endothelial cell staining was significantly higher in OKC than DC (PV= 0.037%). Regarding higher expression of Vascular Endothelial Growth factor in OKC than DC, it seems that angiogenesis may have great impression on clinical outcome of OKC.

  13. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant.

    PubMed

    Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein

    2012-12-17

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area.

  14. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant

    PubMed Central

    2012-01-01

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area. PMID:23369182

  15. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development.

    PubMed

    Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S

    2016-11-01

    There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.

  16. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145

  17. Risk management in inpatient units in the Czech Republic from the point of view of nurses in leadership positions.

    PubMed

    Prokešová, Radka; Brabcová, Iva; Pokojová, Radka; Bártlová, Sylva

    2016-12-01

    The goal of this study was to assess specific features of risk management from the point of view of nurses in leadership positions in inpatient units in Czech hospitals. The study was performed using a quantitative research strategy, i.e., a questionnaire. The data sample was analyzed using SPSS v. 23.0. Pearson's chi-square and analysis of adjusted residues were used for identifying the existence associations of nominal and/or ordinal quantities. 315 nurses in leadership positions working in inpatient units of Czech hospitals were included in the sample. The sample was created using random selection by means of quotas. Based on the study results, statistically significant relations between the respondents' education and the utilization of methods to identify risks were identified. Furthermore, statistically significant relationships were found between a nurse's functional role within the system and regular analysis and evaluation of risks and between the type of the healthcare facility and the degree of patient involvement in risk management. The study found statistically significant correlations that can be used to increase the effectiveness of risk management in inpatient units of Czech hospitals. From this perspective, the fact that patient involvement in risk management was only reported by 37.8% of respondents seems to be the most notable problem.

  18. Evaluation of pesticide residues of organochlorine in vegetables and fruits in Qatar: statistical analysis.

    PubMed

    Al-Shamary, Noora M; Al-Ghouti, Mohammad A; Al-Shaikh, Ismail; Al-Meer, Saeed H; Ahmad, Talaat A

    2016-03-01

    The study aimed to examine the residues of organochlorines pesticides (OCPs) in vegetables and fruits in Qatar. A total of 127 samples was studied. Ninety percent of the imported samples recorded residues above the maximum residue levels (MRLs). The most frequently detected OCP in the samples was heptachlor (found in 75 samples). In the comparisons between the washed and unwashed samples, no significant differences were observed (P > 0.05). However, the effect of washing process with tap water depended on the type of vegetables and fruits.

  19. Statistical methods for conducting agreement (comparison of clinical tests) and precision (repeatability or reproducibility) studies in optometry and ophthalmology.

    PubMed

    McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad

    2011-07-01

    The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates. Ophthalmic & Physiological Optics © 2011 The College of Optometrists.

  20. Evaluation of asbestos levels in two schools before and after asbestos removal. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karaffa, M.A.; Chesson, J.; Russell, J.

    This report presents a statistical evaluation of airborne asbestos data collected at two schools before and after removal of asbestos-containing material (ACM). Although the monitoring data are not totally consistent with new Asbestos Hazard Emergency Response Act (AHERA) requirements and recent EPA guidelines, the study evaluates these historical data by standard statistical methods to determine if abated work areas meet proposed clearance criteria. The objectives of this statistical analysis were to compare (1) airborne asbestos levels indoors after removal with levels outdoors, (2) airborne asbestos levels before and after removal of asbestos, and (3) static sampling and aggressive sampling ofmore » airborne asbestos. The results of this evaluation indicated the following: the effect of asbestos removal on indoor air quality is unpredictable; the variability in fiber concentrations among different sampling sites within the same building indicates the need to treat different sites as separate areas for the purpose of clearance; and aggressive sampling is appropriate for clearance testing because it captures more entrainable asbestos structures. Aggressive sampling lowers the chance of declaring a worksite clean when entrainable asbestos is still present.« less

  1. Dermatoglyphic features in patients with multiple sclerosis

    PubMed Central

    Sabanciogullari, Vedat; Cevik, Seyda; Karacan, Kezban; Bolayir, Ertugrul; Cimen, Mehmet

    2014-01-01

    Objective: To examine dermatoglyphic features to clarify implicated genetic predisposition in the etiology of multiple sclerosis (MS). Methods: The study was conducted between January and December 2013 in the Departments of Anatomy, and Neurology, Cumhuriyet University School of Medicine, Sivas, Turkey. The dermatoglyphic data of 61 patients, and a control group consisting of 62 healthy adults obtained with a digital scanner were transferred to a computer environment. The ImageJ program was used, and atd, dat, adt angles, a-b ridge count, sample types of all fingers, and ridge counts were calculated. Results: In both hands of the patients with MS, the a-b ridge count and ridge counts in all fingers increased, and the differences in these values were statistically significant. There was also a statistically significant increase in the dat angle in both hands of the MS patients. On the contrary, there was no statistically significant difference between the groups in terms of dermal ridge samples, and the most frequent sample in both groups was the ulnar loop. Conclusions: Aberrations in the distribution of dermatoglyphic samples support the genetic predisposition in MS etiology. Multiple sclerosis susceptible individuals may be determined by analyzing dermatoglyphic samples. PMID:25274586

  2. The effect of rare variants on inflation of the test statistics in case-control analyses.

    PubMed

    Pirie, Ailith; Wood, Angela; Lush, Michael; Tyrer, Jonathan; Pharoah, Paul D P

    2015-02-20

    The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself particularly in the analysis of rare variants. We compared the properties of the three most commonly used association tests: the likelihood ratio test, the Wald test and the score test when testing rare variants for association using simulated data. We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for tests of variants with less than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic which may mask the presence of population structure.

  3. CALIPSO Observations of Near-Cloud Aerosol Properties as a Function of Cloud Fraction

    NASA Technical Reports Server (NTRS)

    Yang, Weidong; Marshak, Alexander; Varnai, Tamas; Wood, Robert

    2015-01-01

    This paper uses spaceborne lidar data to study how near-cloud aerosol statistics of attenuated backscatter depend on cloud fraction. The results for a large region around the Azores show that: (1) far-from-cloud aerosol statistics are dominated by samples from scenes with lower cloud fractions, while near-cloud aerosol statistics are dominated by samples from scenes with higher cloud fractions; (2) near-cloud enhancements of attenuated backscatter occur for any cloud fraction but are most pronounced for higher cloud fractions; (3) the difference in the enhancements for different cloud fractions is most significant within 5km from clouds; (4) near-cloud enhancements can be well approximated by logarithmic functions of cloud fraction and distance to clouds. These findings demonstrate that if variability in cloud fraction across the scenes used to composite aerosol statistics are not considered, a sampling artifact will affect these statistics calculated as a function of distance to clouds. For the Azores-region dataset examined here, this artifact occurs mostly within 5 km from clouds, and exaggerates the near-cloud enhancements of lidar backscatter and color ratio by about 30. This shows that for accurate characterization of the changes in aerosol properties with distance to clouds, it is important to account for the impact of changes in cloud fraction.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  5. Latent spatial models and sampling design for landscape genetics

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.

    2016-01-01

    We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.

  6. Reporting of various methodological and statistical parameters in negative studies published in prominent Indian Medical Journals: a systematic review.

    PubMed

    Charan, J; Saxena, D

    2014-01-01

    Biased negative studies not only reflect poor research effort but also have an impact on 'patient care' as they prevent further research with similar objectives, leading to potential research areas remaining unexplored. Hence, published 'negative studies' should be methodologically strong. All parameters that may help a reader to judge validity of results and conclusions should be reported in published negative studies. There is a paucity of data on reporting of statistical and methodological parameters in negative studies published in Indian Medical Journals. The present systematic review was designed with an aim to critically evaluate negative studies published in prominent Indian Medical Journals for reporting of statistical and methodological parameters. Systematic review. All negative studies published in 15 Science Citation Indexed (SCI) medical journals published from India were included in present study. Investigators involved in the study evaluated all negative studies for the reporting of various parameters. Primary endpoints were reporting of "power" and "confidence interval." Power was reported in 11.8% studies. Confidence interval was reported in 15.7% studies. Majority of parameters like sample size calculation (13.2%), type of sampling method (50.8%), name of statistical tests (49.1%), adjustment of multiple endpoints (1%), post hoc power calculation (2.1%) were reported poorly. Frequency of reporting was more in clinical trials as compared to other study designs and in journals having impact factor more than 1 as compared to journals having impact factor less than 1. Negative studies published in prominent Indian medical journals do not report statistical and methodological parameters adequately and this may create problems in the critical appraisal of findings reported in these journals by its readers.

  7. Ground-water quality and effects of poultry confined animal feeding operations on shallow ground water, upper Shoal Creek basin, Southwest Missouri, 2000

    USGS Publications Warehouse

    Mugel, Douglas N.

    2002-01-01

    Forty-seven wells and 8 springs were sampled in May, October, and November 2000 in the upper Shoal Creek Basin, southwest Missouri, to determine if nutrient concentrations and fecal bacteria densities are increasing in the shallow aquifer as a result of poultry confined animal feeding operations (CAFOs). Most of the land use in the basin is agricultural, with cattle and hay production dominating; the number of poultry CAFOs has increased in recent years. Poultry waste (litter) is used as a source of nutrients on pasture land as much as several miles away from poultry barns.Most wells in the sample network were classified as ?P? wells, which were open only or mostly to the Springfield Plateau aquifer and where poultry litter was applied to a substantial acreage within 0.5 mile of the well both in spring 2000 and in several previous years; and ?Ag? wells, which were open only or mostly to the Springfield Plateau aquifer and which had limited or no association with poultry CAFOs. Water-quality data from wells and springs were grouped for statistical purposes as P1, Ag1, and Sp1 (May 2000 samples) and P2, Ag2, and Sp2 (October or November 2000 samples). The results of this study do not indicate that poultry CAFOs are affecting the shallow ground water in the upper Shoal Creek Basin with respect to nutrient concentrations and fecal bacteria densities. Statistical tests do not indicate that P wells sampled in spring 2000 have statistically larger concentrations of nitrite plus nitrate or fecal indicator bacteria densities than Ag wells sampled during the same time, at a 95-percent confidence level. Instead, the Ag wells had statistically larger concentrations of nitrite plus nitrate and fecal coliform bacteria densities than the P wells.The results of this study do not indicate seasonal variations from spring 2000 to fall 2000 in the concentrations of nutrients or fecal indicator bacteria densities from well samples. Statistical tests do not indicate statistically significant differences at a 95-percent confidence level for nitrite plus nitrate concentrations or fecal indicator bacteria densities between either P wells sampled in spring and fall 2000, or Ag wells sampled in spring and fall 2000. However, analysis of samples from springs shows that fecal streptococcus bacteria densities were statistically smaller in fall 2000 than in spring 2000 at a 95-percent confidence level.Nitrite plus nitrate concentrations in spring 2000 samples ranged from less than the detection level [0.02 mg/L (milligram per liter) as nitrogen] to 18 mg/L as nitrogen. Seven samples from three wells had nitrite plus nitrate concentrations at or larger than the maximum contaminant level (MCL) of 10 mg/L as nitrogen. The median nitrite plus nitrate concentrations were 0.28 mg/L as nitrogen for P1 samples, 4.6 mg/L as nitrogen for Ag1 samples, and 3.9 mg/L as nitrogen for Sp1 samples.Fecal coliform bacteria were detected in 1 of 25 P1 samples and 5 of 15 Ag1 samples. Escherichia coli (E. coli) bacteria were detected in 3 of 24 P1 samples and 1 of 13 Ag1 samples. Fecal streptococcus bacteria were detected in 8 of 25 P1 samples and 6 of 15 Ag1 samples. Bacteria densities in samples from wells ranged from less than 1 to 81 col/100 mL (colonies per 100 milliliters) of fecal coliform, less than 1 to 140 col/100 mL of E. coli, and less than 1 to 130 col/100 mL of fecal streptococcus. Fecal indicator bacteria densities in samples from springs were substantially larger than in samples from wells. In Sp1 samples, bacteria densities ranged from 12 to 3,300 col/100 mL of fecal coliform, 40 to 2,700 col/100 mL of E. coli, and 42 to 3,100 col/100 mL of fecal streptococcus.

  8. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metalsmore » magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.« less

  9. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE PAGES

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James; ...

    2018-02-01

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metalsmore » magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.« less

  10. Arsenic health risk assessment in drinking water and source apportionment using multivariate statistical techniques in Kohistan region, northern Pakistan.

    PubMed

    Muhammad, Said; Tahir Shah, M; Khan, Sardar

    2010-10-01

    The present study was conducted in Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from the springs, streams and Indus river and analyzed for physical parameters, anions, cations and arsenic (As(3+), As(5+) and arsenic total). The water quality in Kohistan region was evaluated by comparing the physio-chemical parameters with permissible limits set by Pakistan environmental protection agency and world health organization. Most of the studied parameters were found within their respective permissible limits. However in some samples, the iron and arsenic concentrations exceeded their permissible limits. For health risk assessment of arsenic, the average daily dose, hazards quotient (HQ) and cancer risk were calculated by using statistical formulas. The values of HQ were found >1 in the samples collected from Jabba, Dubair, while HQ values were <1 in rest of the samples. This level of contamination should have low chronic risk and medium cancer risk when compared with US EPA guidelines. Furthermore, the inter-dependence of physio-chemical parameters and pollution load was also calculated by using multivariate statistical techniques like one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principle component analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.

  11. Citizenship among a Sample of Hearing and Hearing Impaired Kindergarten's Children in Al-Riyadh Saudi Arabia "Comparative Study"

    ERIC Educational Resources Information Center

    Turkestani, Maryam Hafez; Bahatheg, Raja' Omar

    2015-01-01

    This study aimed at identifying statistically significant differences in citizenship between Saudi hearing and hearing impaired children. The study sample consisted of (167) hearing and (42) hearing impaired children at public kindergartens in Al-Riyadh city, (82) of whom were males and (127) were female children. Data was collected using…

  12. Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics

    PubMed Central

    Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.

    2016-01-01

    In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and number of items in the subtests was varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053

  13. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    PubMed

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. Finally, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a change that requires a less precise substitution in this subset of participants.A score function that is based on the U-statistic can address these issues of 1) intercurrent SCE's and 2) response variable ascertainments that use different measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.

  14. GASP cloud- and particle-encounter statistics and their application to LFC aircraft studies. Volume 2: Appendixes

    NASA Technical Reports Server (NTRS)

    Jasperson, W. H.; Nastron, G. D.; Davis, R. E.; Holdeman, J. D.

    1984-01-01

    Summary studies are presented for the entire cloud observation archive from the NASA Global Atmospheric Sampling Program (GASP). Studies are also presented for GASP particle-concentration data gathered concurrently with the cloud observations. Cloud encounters are shown on about 15 percent of the data samples overall, but the probability of cloud encounter is shown to vary significantly with altitude, latitude, and distance from the tropopause. Several meteorological circulation features are apparent in the latitudinal distribution of cloud cover, and the cloud-encounter statistics are shown to be consistent with the classical mid-latitude cyclone model. Observations of clouds spaced more closely than 90 minutes are shown to be statistically dependent. The statistics for cloud and particle encounter are utilized to estimate the frequency of cloud encounter on long-range airline routes, and to assess the probability and extent of laminaar flow loss due to cloud or particle encounter by aircraft utilizing laminar flow control (LFC). It is shown that the probability of extended cloud encounter is too low, of itself, to make LFC impractical. This report is presented in two volumes. Volume I contains the narrative, analysis, and conclusions. Volume II contains five supporting appendixes.

  15. Differentiating gold nanorod samples using particle size and shape distributions from transmission electron microscope images

    NASA Astrophysics Data System (ADS)

    Grulke, Eric A.; Wu, Xiaochun; Ji, Yinglu; Buhr, Egbert; Yamamoto, Kazuhiro; Song, Nam Woong; Stefaniak, Aleksandr B.; Schwegler-Berry, Diane; Burchett, Woodrow W.; Lambert, Joshua; Stromberg, Arnold J.

    2018-04-01

    Size and shape distributions of gold nanorod samples are critical to their physico-chemical properties, especially their longitudinal surface plasmon resonance. This interlaboratory comparison study developed methods for measuring and evaluating size and shape distributions for gold nanorod samples using transmission electron microscopy (TEM) images. The objective was to determine whether two different samples, which had different performance attributes in their application, were different with respect to their size and/or shape descriptor distributions. Touching particles in the captured images were identified using a ruggedness shape descriptor. Nanorods could be distinguished from nanocubes using an elongational shape descriptor. A non-parametric statistical test showed that cumulative distributions of an elongational shape descriptor, that is, the aspect ratio, were statistically different between the two samples for all laboratories. While the scale parameters of size and shape distributions were similar for both samples, the width parameters of size and shape distributions were statistically different. This protocol fulfills an important need for a standardized approach to measure gold nanorod size and shape distributions for applications in which quantitative measurements and comparisons are important. Furthermore, the validated protocol workflow can be automated, thus providing consistent and rapid measurements of nanorod size and shape distributions for researchers, regulatory agencies, and industry.

  16. Ichthyoplankton abundance and variance in a large river system concerns for long-term monitoring

    USGS Publications Warehouse

    Holland-Bartels, Leslie E.; Dewey, Michael R.; Zigler, Steven J.

    1995-01-01

    System-wide spatial patterns of ichthyoplankton abundance and variability were assessed in the upper Mississippi and lower Illinois rivers to address the experimental design and statistical confidence in density estimates. Ichthyoplankton was sampled from June to August 1989 in primary milieus (vegetated and non-vegated backwaters and impounded areas, main channels and main channel borders) in three navigation pools (8, 13 and 26) of the upper Mississippi River and in a downstream reach of the Illinois River. Ichthyoplankton densities varied among stations of similar aquatic landscapes (milieus) more than among subsamples within a station. An analysis of sampling effort indicated that the collection of single samples at many stations in a given milieu type is statistically and economically preferable to the collection of multiple subsamples at fewer stations. Cluster analyses also revealed that stations only generally grouped by their preassigned milieu types. Pilot studies such as this can define station groupings and sources of variation beyond an a priori habitat classification. Thus the minimum intensity of sampling required to achieve a desired statistical confidence can be identified before implementing monitoring efforts.

  17. Students' Appreciation of Expectation and Variation as a Foundation for Statistical Understanding

    ERIC Educational Resources Information Center

    Watson, Jane M.; Callingham, Rosemary A.; Kelly, Ben A.

    2007-01-01

    This study presents the results of a partial credit Rasch analysis of in-depth interview data exploring statistical understanding of 73 school students in 6 contextual settings. The use of Rasch analysis allowed the exploration of a single underlying variable across contexts, which included probability sampling, representation of temperature…

  18. Nonparametric Bayesian predictive distributions for future order statistics

    Treesearch

    Richard A. Johnson; James W. Evans; David W. Green

    1999-01-01

    We derive the predictive distribution for a specified order statistic, determined from a future random sample, under a Dirichlet process prior. Two variants of the approach are treated and some limiting cases studied. A practical application to monitoring the strength of lumber is discussed including choices of prior expectation and comparisons made to a Bayesian...

  19. Testing homogeneity of proportion ratios for stratified correlated bilateral data in two-arm randomized clinical trials.

    PubMed

    Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai

    2014-11-10

    Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least square estimate and logarithmic transformation with Mantel-Haenszel estimate are recommended as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.

  20. Decisions from Experience: Why Small Samples?

    ERIC Educational Resources Information Center

    Hertwig, Ralph; Pleskac, Timothy J.

    2010-01-01

    In many decisions we cannot consult explicit statistics telling us about the risks involved in our actions. In lieu of such data, we can arrive at an understanding of our dicey options by sampling from them. The size of the samples that we take determines, ceteris paribus, how good our choices will be. Studies of decisions from experience have…

  1. A Guerilla Guide to Common Problems in ‘Neurostatistics’: Essential Statistical Topics in Neuroscience

    PubMed Central

    Smith, Paul F.

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855

  2. A Guerilla Guide to Common Problems in 'Neurostatistics': Essential Statistical Topics in Neuroscience.

    PubMed

    Smith, Paul F

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.

  3. High risk human papillomavirus in the periodontium : A case control study.

    PubMed

    Shipilova, Anna; Dayakar, Manjunath Mundoor; Gupta, Dinesh

    2017-01-01

    Human papilloma viruses (HPVs) are small DNA viruses that have been identified in periodontal pocket as well as gingival sulcus. High risk HPVs are also associated with a subset of head and neck carcinomas. It is thought that the periodontium could be a reservoir for HPV. 1. Detection of Human Papilloma virus (HPV) in periodontal pocket as well as gingival of patients having localized chronic periodontitis and gingival sulcus of periodontally healthy subjects. 2. Quantitative estimation of E6 and E7 mRNA in subjects showing presence of HPV3. To assess whether periodontal pocket is a reservoir for HPV. This case-control study included 30 subjects with localized chronic Periodontitis (cases) and 30 periodontally healthy subjects (controls). Two samples were taken from cases, one from periodontal pocket and one from gingival sulcus and one sample was taken from controls. Samples were collected in the form of pocket scrapings and gingival sulcus scrapings from cases and controls respectively. These samples were sent in storage media for identification and estimation of E6/E7 mRNA of HPV using in situ hybridization and flow cytometry. Statistical analysis was done by using, mean, percentage and Chi Square test. A statistical package SPSS version 13.0 was used to analyze the data. P value < 0.05 was considered as statistically significant. pocket samples as well as sulcus samples for both cases and controls were found to contain HPV E6/E7 mRNAInterpretation and. Presence of HPV E6/E7 mRNA in periodontium supports the hypothesis that periodontal tissues serve as a reservoir for latent HPV and there may be a synergy between oral cancer, periodontitis and HPV. However prospective studies are required to further explore this link.

  4. Sample Size Requirements for Studies of Treatment Effects on Beta-Cell Function in Newly Diagnosed Type 1 Diabetes

    PubMed Central

    Lachin, John M.; McGee, Paula L.; Greenbaum, Carla J.; Palmer, Jerry; Gottlieb, Peter; Skyler, Jay

    2011-01-01

    Preservation of -cell function as measured by stimulated C-peptide has recently been accepted as a therapeutic target for subjects with newly diagnosed type 1 diabetes. In recently completed studies conducted by the Type 1 Diabetes Trial Network (TrialNet), repeated 2-hour Mixed Meal Tolerance Tests (MMTT) were obtained for up to 24 months from 156 subjects with up to 3 months duration of type 1 diabetes at the time of study enrollment. These data provide the information needed to more accurately determine the sample size needed for future studies of the effects of new agents on the 2-hour area under the curve (AUC) of the C-peptide values. The natural log(), log(+1) and square-root transformations of the AUC were assessed. In general, a transformation of the data is needed to better satisfy the normality assumptions for commonly used statistical tests. Statistical analysis of the raw and transformed data are provided to estimate the mean levels over time and the residual variation in untreated subjects that allow sample size calculations for future studies at either 12 or 24 months of follow-up and among children 8–12 years of age, adolescents (13–17 years) and adults (18+ years). The sample size needed to detect a given relative (percentage) difference with treatment versus control is greater at 24 months than at 12 months of follow-up, and differs among age categories. Owing to greater residual variation among those 13–17 years of age, a larger sample size is required for this age group. Methods are also described for assessment of sample size for mixtures of subjects among the age categories. Statistical expressions are presented for the presentation of analyses of log(+1) and transformed values in terms of the original units of measurement (pmol/ml). Analyses using different transformations are described for the TrialNet study of masked anti-CD20 (rituximab) versus masked placebo. These results provide the information needed to accurately evaluate the sample size for studies of new agents to preserve C-peptide levels in newly diagnosed type 1 diabetes. PMID:22102862

  5. Sample size requirements for studies of treatment effects on beta-cell function in newly diagnosed type 1 diabetes.

    PubMed

    Lachin, John M; McGee, Paula L; Greenbaum, Carla J; Palmer, Jerry; Pescovitz, Mark D; Gottlieb, Peter; Skyler, Jay

    2011-01-01

    Preservation of β-cell function as measured by stimulated C-peptide has recently been accepted as a therapeutic target for subjects with newly diagnosed type 1 diabetes. In recently completed studies conducted by the Type 1 Diabetes Trial Network (TrialNet), repeated 2-hour Mixed Meal Tolerance Tests (MMTT) were obtained for up to 24 months from 156 subjects with up to 3 months duration of type 1 diabetes at the time of study enrollment. These data provide the information needed to more accurately determine the sample size needed for future studies of the effects of new agents on the 2-hour area under the curve (AUC) of the C-peptide values. The natural log(x), log(x+1) and square-root (√x) transformations of the AUC were assessed. In general, a transformation of the data is needed to better satisfy the normality assumptions for commonly used statistical tests. Statistical analysis of the raw and transformed data are provided to estimate the mean levels over time and the residual variation in untreated subjects that allow sample size calculations for future studies at either 12 or 24 months of follow-up and among children 8-12 years of age, adolescents (13-17 years) and adults (18+ years). The sample size needed to detect a given relative (percentage) difference with treatment versus control is greater at 24 months than at 12 months of follow-up, and differs among age categories. Owing to greater residual variation among those 13-17 years of age, a larger sample size is required for this age group. Methods are also described for assessment of sample size for mixtures of subjects among the age categories. Statistical expressions are presented for the presentation of analyses of log(x+1) and √x transformed values in terms of the original units of measurement (pmol/ml). Analyses using different transformations are described for the TrialNet study of masked anti-CD20 (rituximab) versus masked placebo. These results provide the information needed to accurately evaluate the sample size for studies of new agents to preserve C-peptide levels in newly diagnosed type 1 diabetes.

  6. Statistical Considerations of Food Allergy Prevention Studies.

    PubMed

    Bahnson, Henry T; du Toit, George; Lack, Gideon

    Clinical studies to prevent the development of food allergy have recently helped reshape public policy recommendations on the early introduction of allergenic foods. These trials are also prompting new research, and it is therefore important to address the unique design and analysis challenges of prevention trials. We highlight statistical concepts and give recommendations that clinical researchers may wish to adopt when designing future study protocols and analysis plans for prevention studies. Topics include selecting a study sample, addressing internal and external validity, improving statistical power, choosing alpha and beta, analysis innovations to address dilution effects, and analysis methods to deal with poor compliance, dropout, and missing data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Effects of sampling interval on spatial patterns and statistics of watershed nitrogen concentration

    USGS Publications Warehouse

    Wu, S.-S.D.; Usery, E.L.; Finn, M.P.; Bosch, D.D.

    2009-01-01

    This study investigates how spatial patterns and statistics of a 30 m resolution, model-simulated, watershed nitrogen concentration surface change with sampling intervals from 30 m to 600 m for every 30 m increase for the Little River Watershed (Georgia, USA). The results indicate that the mean, standard deviation, and variogram sills do not have consistent trends with increasing sampling intervals, whereas the variogram ranges remain constant. A sampling interval smaller than or equal to 90 m is necessary to build a representative variogram. The interpolation accuracy, clustering level, and total hot spot areas show decreasing trends approximating a logarithmic function. The trends correspond to the nitrogen variogram and start to level at a sampling interval of 360 m, which is therefore regarded as a critical spatial scale of the Little River Watershed. Copyright ?? 2009 by Bellwether Publishing, Ltd. All right reserved.

  8. Control of oral malodour by dentifrices measured by gas chromatography.

    PubMed

    Newby, Evelyn E; Hickling, Jenneth M; Hughes, Francis J; Proskin, Howard M; Bosma, Marylynn P

    2008-04-01

    To evaluate the effect of toothpaste treatments on levels of oral volatile sulphur compounds (VSCs) measured by gas chromatography in two clinical studies. These were blinded, randomised, controlled, crossover studies with 16 (study A) or 20 (study B) healthy volunteers between the ages of 19-54. Study A: breath samples were collected at baseline, immediately and lhr after brushing. Four dentifrices (Zinc A, Zinc B, commercially available triclosan dentifrice and zinc free control) were evaluated. Study B: breath samples were collected at baseline, immediately, 1, 2, 3 and 7 hours after treatment. Subjects consumed a light breakfast then provided an additional breath sample between baseline assessment and treatment. Two dentifrices (gel-to-foam and a commercially available triclosan dentrifrice) were evaluated. Breath samples were collected in syringes and analysed for VSCs (hydrogen sulphide, methyl mercaptan and Total VSCs) utilising gas chromatography (GC) with flame photometric detection. Study A: immediately after treatment, a statistically significant reduction in VSCs from baseline was observed for Zinc A product only. A statistically significant reduction in VSCs from baseline was observed after 1 hour for all products. Both zinc products exhibited a significantly greater reduction from baseline VSCs than Colgate Total and Control at all time points. Study B: a statistically significant reduction in VSCs from baseline was observed at all time points for both products. The gel-to-foam product exhibited significantly greater reduction from baseline Total VSC concentration than Colgate Total at all time points from 1 hour post-treatment. Control of oral malodour by toothpaste treatment, evaluated as VSC levels using GC, has been demonstrated. Zinc is effective at reducing VSCs and the efficacy of zinc is formulation dependent. A gel-to-foam dentifrice was more effective at reducing VSCs than Colgate Total up to 7 hours.

  9. PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS

    EPA Science Inventory

    A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. he focus here is on real ...

  10. Statistical distribution sampling

    NASA Technical Reports Server (NTRS)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  11. A comprehensive review of arsenic levels in the semiconductor manufacturing industry.

    PubMed

    Park, Donguk; Yang, Haengsun; Jeong, Jeeyeon; Ha, Kwonchul; Choi, Sangjun; Kim, Chinyon; Yoon, Chungsik; Park, Dooyong; Paek, Domyung

    2010-11-01

    This paper presents a summary of arsenic level statistics from air and wipe samples taken from studies conducted in fabrication operations. The main objectives of this study were not only to describe arsenic measurement data but also, through a literature review, to categorize fabrication workers in accordance with observed arsenic levels. All airborne arsenic measurements reported were included in the summary statistics for analysis of the measurement data. The arithmetic mean was estimated assuming a lognormal distribution from the geometric mean and the geometric standard deviation or the range. In addition, weighted arithmetic means (WAMs) were calculated based on the number of measurements reported for each mean. Analysis of variance (ANOVA) was employed to compare arsenic levels classified according to several categories such as the year, sampling type, location sampled, operation type, and cleaning technique. Nine papers were found reporting airborne arsenic measurement data from maintenance workers or maintenance areas in semiconductor chip-making plants. A total of 40 statistical summaries from seven articles were identified that represented a total of 423 airborne arsenic measurements. Arsenic exposure levels taken during normal operating activities in implantation operations (WAM = 1.6 μg m⁻³, no. of samples = 77, no. of statistical summaries = 2) were found to be lower than exposure levels of engineers who were involved in maintenance works (7.7 μg m⁻³, no. of samples = 181, no. of statistical summaries = 19). The highest level (WAM = 218.6 μg m⁻³) was associated with various maintenance works performed inside an ion implantation chamber. ANOVA revealed no significant differences in the WAM arsenic levels among the categorizations based on operation and sampling characteristics. Arsenic levels (56.4 μg m⁻³) recorded during maintenance works performed in dry conditions were found to be much higher than those from maintenance works in wet conditions (0.6 μg m⁻³). Arsenic levels from wipe samples in process areas after maintenance activities ranged from non-detectable to 146 μg cm⁻², indicating the potential for dispersion into the air and hence inhalation. We conclude that workers who are regularly or occasionally involved in maintenance work have higher potential for occupational exposure than other employees who are in charge of routine production work. In addition, fabrication workers can be classified into two groups based on the reviewed arsenic exposure levels: operators with potential for low levels of exposure and maintenance engineers with high levels of exposure. These classifications could be used as a basis for a qualitative ordinal ranking of exposure in an epidemiological study.

  12. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    USDA-ARS?s Scientific Manuscript database

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  13. Professional School Counseling (PSC) Publication Pattern Review: A Meta-Study of Author and Article Characteristics from the First 15 Years

    ERIC Educational Resources Information Center

    Erford, Bradley T.; Giguere, Monica; Glenn, Kacie; Ciarlone, Hallie

    2015-01-01

    Patterns of articles published in "Professional School Counseling" (PSC) from the first 15 volumes were reviewed in this meta-study. Author characteristics (e.g., sex, employment setting, nation of domicile) and article characteristics (e.g., topic, type, design, sample, sample size, participant type, statistical procedures and…

  14. Testing how voluntary participation requirements in an environmental study affect the planned random sample design outcomes: implications for the predictions of values and their uncertainty.

    NASA Astrophysics Data System (ADS)

    Ander, Louise; Lark, Murray; Smedley, Pauline; Watts, Michael; Hamilton, Elliott; Fletcher, Tony; Crabbe, Helen; Close, Rebecca; Studden, Mike; Leonardi, Giovanni

    2015-04-01

    Random sampling design is optimal in order to be able to assess outcomes, such as the mean of a given variable across an area. However, this optimal sampling design may be compromised to an unknown extent by unavoidable real-world factors: the extent to which the study design can still be considered random, and the influence this may have on the choice of appropriate statistical data analysis is examined in this work. We take a study which relied on voluntary participation for the sampling of private water tap chemical composition in England, UK. This study was designed and implemented as a categorical, randomised study. The local geological classes were grouped into 10 types, which were considered to be most important in likely effects on groundwater chemistry (the source of all the tap waters sampled). Locations of the users of private water supplies were made available to the study group from the Local Authority in the area. These were then assigned, based on location, to geological groups 1 to 10 and randomised within each group. However, the permission to collect samples then required active, voluntary participation by householders and thus, unlike many environmental studies, could not always follow the initial sample design. Impediments to participation ranged from 'willing but not available' during the designated sampling period, to a lack of response to requests to sample (assumed to be wholly unwilling or unable to participate). Additionally, a small number of unplanned samples were collected via new participants making themselves known to the sampling teams, during the sampling period. Here we examine the impact this has on the 'random' nature of the resulting data distribution, by comparison with the non-participating known supplies. We consider the implications this has on choice of statistical analysis methods to predict values and uncertainty at un-sampled locations.

  15. Environmental monitoring study of linear alkylbenzene sulfonates and insoluble soap in Spanish sewage sludge samples.

    PubMed

    Cantarero, Samuel; Zafra-Gómez, Alberto; Ballesteros, Oscar; Navalón, Alberto; Reis, Marco S; Saraiva, Pedro M; Vílchez, José L

    2011-01-01

    In this work we present a monitoring study of linear alkylbenzene sulfonates (LAS) and insoluble soap performed on Spanish sewage sludge samples. This work focuses on finding statistical relations between LAS concentrations and insoluble soap in sewage sludge samples and variables related to wastewater treatment plants such as water hardness, population and treatment type. It is worth to mention that 38 samples, collected from different Spanish regions, were studied. The statistical tool we used was Principal Component Analysis (PC), in order to reduce the number of response variables. The analysis of variance (ANOVA) test and a non-parametric test such as the Kruskal-Wallis test were also studied through the estimation of the p-value (probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true) in order to study possible relations between the concentration of both analytes and the rest of variables. We also compared LAS and insoluble soap behaviors. In addition, the results obtained for LAS (mean value) were compared with the limit value proposed by the future Directive entitled "Working Document on Sludge". According to the results, the mean obtained for soap and LAS was 26.49 g kg(-1) and 6.15 g kg(-1) respectively. It is worth noting that LAS mean was significantly higher than the limit value (2.6 g kg(-1)). In addition, LAS and soap concentrations depend largely on water hardness. However, only LAS concentration depends on treatment type.

  16. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper covers the performed evaluation of frequency with which the statistical methods were applied in analyzed works having been published in six selected, national medical journals in the years 1988-1992. For analysis the following journals were chosen, namely: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. Appropriate number of works up to the average in the remaining medical journals was randomly selected from respective volumes of Pol. Tyg. Lek. The studies did not include works wherein the statistical analysis was not implemented, which referred both to national and international publications. That exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographies, reports from scientific congresses, as well as papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish the mode of finding out a suitable sample in respective studies, differentiating two categories: random and target selections. Attention was also paid to the presence of control sample in the individual works. In the analysis attention was also focussed on the existence of sample characteristics, setting up three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). Analysis was accomplished with regard to the rate of employing statistical methods in analyzed works in relevant volumes of six selected, national medical journals for the years 1988-1992, simultaneously determining the number of works, in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as most important methods of mathematical statistics such as parametric tests of significance, analysis of variance (in single and dual classifications). non-parametric tests of significance, correlation and regression. The works, in which use was made of either multiple correlation or multiple regression or else more complex methods of studying the relationship for two or more numbers of variables, were incorporated into the works whose statistical methods were constituted by correlation and regression as well as other methods, e.g. statistical methods being used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables) factor analysis conducted by Jacobi-Hotellng's method, taxonomic methods and others. On the basis of the performed studies it has been established that the frequency of employing statistical methods in the six selected national, medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), and they generally were almost similar to the frequency provided in English language medical journals. On a whole, no significant differences were disclosed in the frequency of applied statistical methods (Tab. 4) as well as in frequency of random tests (Tab. 3) in the analyzed works, appearing in the medical journals in respective years 1988-1992. The most frequently used statistical methods in analyzed works for 1988-1992 were the measures of position 44.2-55.6% and measures of dispersion 32.5-38.5% as well as parametric tests of significance 26.3-33.1% of the works analyzed (Tab. 4). For the purpose of increasing the frequency and reliability of the used statistical methods, the didactics should be widened in the field of biostatistics at medical studies and postgraduation training designed for physicians and scientific-didactic workers.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less

  18. Inference with viral quasispecies diversity indices: clonal and NGS approaches.

    PubMed

    Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep

    2014-04-15

    Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods-CCSS and NGS-to guarantee statistically conforming conclusions as free of bias as possible. josep.gregori@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. Application of Digital Image Analysis to Determine Pancreatic Islet Mass and Purity in Clinical Islet Isolation and Transplantation

    PubMed Central

    Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr

    2015-01-01

    Pancreatic islet mass, represented by islet equivalent (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain IEQ, the sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system developed using the software of Image Pro Plus for islet mass and purity assessment. Application of the DIA allows to better comply with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in determination of IEQ and purity by manual counting method and DIA. IEQ count showed statistically significant correlations between the manual method and DIA in all sample comparisons (r >0.819 and p < 0.0001). Statistically significant difference in IEQ between both methods was found only in High purity 100μL sample group (p = 0.029). As far as purity determination, statistically significant differences between manual assessment and DIA measurement was found in High and Low purity 100μL samples (p<0.005), In addition, islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique in determination of IEQ and purity. Islet sample preserved as a digital image and results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islets counting and purity estimation. PMID:24806436

  20. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  1. FIR statistics of paired galaxies

    NASA Technical Reports Server (NTRS)

    Sulentic, Jack W.

    1990-01-01

    Much progress has been made in understanding the effects of interaction on galaxies (see reviews in this volume by Heckman and Kennicutt). Evidence for enhanced emission from galaxies in pairs first emerged in the radio (Sulentic 1976) and optical (Larson and Tinsley 1978) domains. Results in the far infrared (FIR) lagged behind until the advent of the Infrared Astronomy Satellite (IRAS). The last five years have seen numerous FIR studies of optical and IR selected samples of interacting galaxies (e.g., Cutri and McAlary 1985; Joseph and Wright 1985; Kennicutt et al. 1987; Haynes and Herter 1988). Despite all of this work, there are still contradictory ideas about the level and, even, the reality of an FIR enhancement in interacting galaxies. Much of the confusion originates in differences between the galaxy samples that were studied (i.e., optical morphology and redshift coverage). Here, the authors report on a study of the FIR detection properties for a large sample of interacting galaxies and a matching control sample. They focus on the distance independent detection fraction (DF) statistics of the sample. The results prove useful in interpreting the previously published work. A clarification of the phenomenology provides valuable clues about the physics of the FIR enhancement in galaxies.

  2. Explorations in statistics: the log transformation.

    PubMed

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.

  3. Multi-pulse multi-delay (MPMD) multiple access modulation for UWB

    DOEpatents

    Dowla, Farid U.; Nekoogar, Faranak

    2007-03-20

    A new modulation scheme in UWB communications is introduced. This modulation technique utilizes multiple orthogonal transmitted-reference pulses for UWB channelization. The proposed UWB receiver samples the second order statistical function at both zero and non-zero lags and matches the samples to stored second order statistical functions, thus sampling and matching the shape of second order statistical functions rather than just the shape of the received pulses.

  4. Critical Thinking Skills of U.S. Air Force Senior and Intermediate Developmental Education Students

    DTIC Science & Technology

    2016-02-16

    SAASS), and Air War College (AWC). T-tests indicated no statistically significant difference in the CT skills of the sample of ACSC and AWC students...hypothesis that there was no statistically significant difference in the CT skills of IDE and SDE students. SAASS, as a more selective advanced studies...potential to develop CT skills, concluding, “students in the experimental group performed at a statistically significantly higher level than students in

  5. A sampling plan for conduit-flow karst springs: Minimizing sampling cost and maximizing statistical utility

    USGS Publications Warehouse

    Currens, J.C.

    1999-01-01

    Analytical data for nitrate and triazines from 566 samples collected over a 3-year period at Pleasant Grove Spring, Logan County, KY, were statistically analyzed to determine the minimum data set needed to calculate meaningful yearly averages for a conduit-flow karst spring. Results indicate that a biweekly sampling schedule augmented with bihourly samples from high-flow events will provide meaningful suspended-constituent and dissolved-constituent statistics. Unless collected over an extensive period of time, daily samples may not be representative and may also be autocorrelated. All high-flow events resulting in a significant deflection of a constituent from base-line concentrations should be sampled. Either the geometric mean or the flow-weighted average of the suspended constituents should be used. If automatic samplers are used, then they may be programmed to collect storm samples as frequently as every few minutes to provide details on the arrival time of constituents of interest. However, only samples collected bihourly should be used to calculate averages. By adopting a biweekly sampling schedule augmented with high-flow samples, the need to continuously monitor discharge, or to search for and analyze existing data to develop a statistically valid monitoring plan, is lessened.Analytical data for nitrate and triazines from 566 samples collected over a 3-year period at Pleasant Grove Spring, Logan County, KY, were statistically analyzed to determine the minimum data set needed to calculate meaningful yearly averages for a conduit-flow karst spring. Results indicate that a biweekly sampling schedule augmented with bihourly samples from high-flow events will provide meaningful suspended-constituent and dissolved-constituent statistics. Unless collected over an extensive period of time, daily samples may not be representative and may also be autocorrelated. All high-flow events resulting in a significant deflection of a constituent from base-line concentrations should be sampled. Either the geometric mean or the flow-weighted average of the suspended constituents should be used. If automatic samplers are used, then they may be programmed to collect storm samples as frequently as every few minutes to provide details on the arrival time of constituents of interest. However, only samples collected bihourly should be used to calculate averages. By adopting a biweekly sampling schedule augmented with high-flow samples, the need to continuously monitor discharge, or to search for and analyze existing data to develop a statistically valid monitoring plan, is lessened.

  6. Computerized EEG analysis for studying the effect of drugs on the central nervous system.

    PubMed

    Rosadini, G; Cavazza, B; Rodriguez, G; Sannita, W G; Siccardi, A

    1977-11-01

    Samples of our experience in quantitative pharmaco-EEG are reviewed to discuss and define its applicability and limits. Simple processing systems, such as the computation of Hjorth's descriptors, are useful for on-line monitoring of drug-induced EEG modifications which are evident also at the visual visual analysis. Power spectral analysis is suitable to identify and quantify EEG effects not evident at the visual inspection. It demonstrated how the EEG effects of compounds in a long-acting formulation vary according to the sampling time and the explored cerebral area. EEG modifications not detected by power spectral analysis can be defined by comparing statistically (F test) the spectral values of the EEG from a single lead at the different samples (longitudinal comparison), or the spectral values from different leads at any sample (intrahemispheric comparison). The presently available procedures of quantitative pharmaco-EEG are effective when applied to the study of mutltilead EEG recordings in a statistically significant sample of population. They do not seem reliable in the monitoring of directing of neuropyschiatric therapies in single patients, due to individual variability of drug effects.

  7. Study design and statistical analysis of data in human population studies with the micronucleus assay.

    PubMed

    Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

    2011-01-01

    The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.

  8. Quantitative Methods for Analysing Joint Questionnaire Data: Exploring the Role of Joint in Force Design

    DTIC Science & Technology

    2015-08-01

    the nine questions. The Statistical Package for the Social Sciences ( SPSS ) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS . P432. 11 IBM SPSS Statistics . (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research

  9. A multicenter study of viable PCR using propidium monoazide to detect Legionella in water samples.

    PubMed

    Scaturro, Maria; Fontana, Stefano; Dell'eva, Italo; Helfer, Fabrizia; Marchio, Michele; Stefanetti, Maria Vittoria; Cavallaro, Mario; Miglietta, Marilena; Montagna, Maria Teresa; De Giglio, Osvalda; Cuna, Teresa; Chetti, Leonarda; Sabattini, Maria Antonietta Bucci; Carlotti, Michela; Viggiani, Mariagabriella; Stenico, Alberta; Romanin, Elisa; Bonanni, Emma; Ottaviano, Claudio; Franzin, Laura; Avanzini, Claudio; Demarie, Valerio; Corbella, Marta; Cambieri, Patrizia; Marone, Piero; Rota, Maria Cristina; Bella, Antonino; Ricci, Maria Luisa

    2016-07-01

    Legionella quantification in environmental samples is overestimated by qPCR. Combination with a viable dye, such as Propidium monoazide (PMA), could make qPCR (named then vPCR) very reliable. In this multicentre study 717 artificial water samples, spiked with fixed concentrations of Legionella and interfering bacterial flora, were analysed by qPCR, vPCR and culture and data were compared by statistical analysis. A heat-treatment at 55 °C for 10 minutes was also performed to obtain viable and not-viable bacteria. When data of vPCR were compared with those of culture and qPCR, statistical analysis showed significant differences (P < 0.001). However, although the heat-treatment caused an abatement of CFU/mL ≤1 to 1 log10 unit, the comparison between untreated and heat-treated samples analysed by vPCR highlighted non-significant differences (P > 0.05). Overall this study provided a good experimental reproducibility of vPCR but also highlighted limits of PMA in the discriminating capability of dead and live bacteria, making vPCR not completely reliable. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Statistical classification techniques for engineering and climatic data samples

    NASA Technical Reports Server (NTRS)

    Temple, E. C.; Shipman, J. R.

    1981-01-01

    Fisher's sample linear discriminant function is modified through an appropriate alteration of the common sample variance-covariance matrix. The alteration consists of adding nonnegative values to the eigenvalues of the sample variance covariance matrix. The desired results of this modification is to increase the number of correct classifications by the new linear discriminant function over Fisher's function. This study is limited to the two-group discriminant problem.

  11. [Formal sample size calculation and its limited validity in animal studies of medical basic research].

    PubMed

    Mayer, B; Muche, R

    2013-01-01

    Animal studies are highly relevant for basic medical research, although their usage is discussed controversially in public. Thus, an optimal sample size for these projects should be aimed at from a biometrical point of view. Statistical sample size calculation is usually the appropriate methodology in planning medical research projects. However, required information is often not valid or only available during the course of an animal experiment. This article critically discusses the validity of formal sample size calculation for animal studies. Within the discussion, some requirements are formulated to fundamentally regulate the process of sample size determination for animal experiments.

  12. Public and patient involvement in quantitative health research: A statistical perspective.

    PubMed

    Hannigan, Ailish

    2018-06-19

    The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.

  13. Mycology of chronic suppurative otitis media-cholesteatoma disease: An evaluative study.

    PubMed

    Singh, Gautam Bir; Solo, Medozhanuo; Kaur, Ravinder; Arora, Rubeena; Kumar, Sunil

    To detect the prevalence of fungus in chronic suppurative otitis media-cholesteatoma disease and to evaluate its clinical significance. Prospective observational study conducted in a sample size of 46 patients at a tertiary care university teaching hospital. Forty six patients suffering from chronic suppurative otitis media-cholesteatoma disease were recruited in this prospective study. Data was duly recorded. Cholesteatoma sample was procured at the time of mastoid surgery and microbiologically analysed for fungal infestation. Clinical correlation to fungus infestation of cholesteatoma was statistically analysed. Out of the recruited 46 patients, post-operatively cholesteatoma was seen in 40 cases only. Seventeen i.e. 42.5% of these cases had fungal colonization of cholesteatoma. Further a statistically significant correlation between persistent otorrhoea and fungal infestation of cholesteatoma was observed. Three cases of fungal otomastoiditis were also recorded in this study, but a statistically significant correlation between complications and fungus infestation of cholesteatoma could not be clearly established. There is fungal colonization of cholesteatoma which is pathogenic and can cause persistent otorrhoea. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Statistical Analysis Techniques for Small Sample Sizes

    NASA Technical Reports Server (NTRS)

    Navard, S. E.

    1984-01-01

    The small sample sizes problem which is encountered when dealing with analysis of space-flight data is examined. Because of such a amount of data available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on considerations needed to choose the most appropriate test for a given type of analysis.

  15. Characterizing microstructural features of biomedical samples by statistical analysis of Mueller matrix images

    NASA Astrophysics Data System (ADS)

    He, Honghui; Dong, Yang; Zhou, Jialing; Ma, Hui

    2017-03-01

    As one of the salient features of light, polarization contains abundant structural and optical information of media. Recently, as a comprehensive description of polarization property, the Mueller matrix polarimetry has been applied to various biomedical studies such as cancerous tissues detections. In previous works, it has been found that the structural information encoded in the 2D Mueller matrix images can be presented by other transformed parameters with more explicit relationship to certain microstructural features. In this paper, we present a statistical analyzing method to transform the 2D Mueller matrix images into frequency distribution histograms (FDHs) and their central moments to reveal the dominant structural features of samples quantitatively. The experimental results of porcine heart, intestine, stomach, and liver tissues demonstrate that the transformation parameters and central moments based on the statistical analysis of Mueller matrix elements have simple relationships to the dominant microstructural properties of biomedical samples, including the density and orientation of fibrous structures, the depolarization power, diattenuation and absorption abilities. It is shown in this paper that the statistical analysis of 2D images of Mueller matrix elements may provide quantitative or semi-quantitative criteria for biomedical diagnosis.

  16. Sampling studies to estimate the HIV prevalence rate in female commercial sex workers.

    PubMed

    Pascom, Ana Roberta Pati; Szwarcwald, Célia Landmann; Barbosa Júnior, Aristides

    2010-01-01

    We investigated sampling methods being used to estimate the HIV prevalence rate among female commercial sex workers. The studies were classified according to the adequacy or not of the sample size to estimate HIV prevalence rate and according to the sampling method (probabilistic or convenience). We identified 75 studies that estimated the HIV prevalence rate among female sex workers. Most of the studies employed convenience samples. The sample size was not adequate to estimate HIV prevalence rate in 35 studies. The use of convenience sample limits statistical inference for the whole group. It was observed that there was an increase in the number of published studies since 2005, as well as in the number of studies that used probabilistic samples. This represents a large advance in the monitoring of risk behavior practices and HIV prevalence rate in this group.

  17. The stability of hydrogen ion and specific conductance in filtered wet-deposition samples stored at ambient temperatures

    USGS Publications Warehouse

    Gordon, J.D.; Schroder, L.J.; Morden-Moore, A. L.; Bowersox, V.C.

    1995-01-01

    Separate experiments by the U.S. Geological Survey (USGS) and the Illinois State Water Survey Central Analytical Laboratory (CAL) independently assessed the stability of hydrogen ion and specific conductance in filtered wet-deposition samples stored at ambient temperatures. The USGS experiment represented a test of sample stability under a diverse range of conditions, whereas the CAL experiment was a controlled test of sample stability. In the experiment by the USGS, a statistically significant (?? = 0.05) relation between [H+] and time was found for the composited filtered, natural, wet-deposition solution when all reported values are included in the analysis. However, if two outlying pH values most likely representing measurement error are excluded from the analysis, the change in [H+] over time was not statistically significant. In the experiment by the CAL, randomly selected samples were reanalyzed between July 1984 and February 1991. The original analysis and reanalysis pairs revealed that [H+] differences, although very small, were statistically different from zero, whereas specific-conductance differences were not. Nevertheless, the results of the CAL reanalysis project indicate there appears to be no consistent, chemically significant degradation in sample integrity with regard to [H+] and specific conductance while samples are stored at room temperature at the CAL. Based on the results of the CAL and USGS studies, short-term (45-60 day) stability of [H+] and specific conductance in natural filtered wet-deposition samples that are shipped and stored unchilled at ambient temperatures was satisfactory.

  18. Empirical likelihood-based tests for stochastic ordering

    PubMed Central

    BARMI, HAMMOU EL; MCKEAGUE, IAN W.

    2013-01-01

    This paper develops an empirical likelihood approach to testing for the presence of stochastic ordering among univariate distributions based on independent random samples from each distribution. The proposed test statistic is formed by integrating a localized empirical likelihood statistic with respect to the empirical distribution of the pooled sample. The asymptotic null distribution of this test statistic is found to have a simple distribution-free representation in terms of standard Brownian bridge processes. The approach is used to compare the lengths of rule of Roman Emperors over various historical periods, including the “decline and fall” phase of the empire. In a simulation study, the power of the proposed test is found to improve substantially upon that of a competing test due to El Barmi and Mukerjee. PMID:23874142

  19. Supervised classification in the presence of misclassified training data: a Monte Carlo simulation study in the three group case.

    PubMed

    Bolin, Jocelyn Holden; Finch, W Holmes

    2014-01-01

    Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.

  20. Use of missing data methods in longitudinal studies: the persistence of bad practices in developmental psychology.

    PubMed

    Jelicić, Helena; Phelps, Erin; Lerner, Richard M

    2009-07-01

    Developmental science rests on describing, explaining, and optimizing intraindividual changes and, hence, empirically requires longitudinal research. Problems of missing data arise in most longitudinal studies, thus creating challenges for interpreting the substance and structure of intraindividual change. Using a sample of reports of longitudinal studies obtained from three flagship developmental journals-Child Development, Developmental Psychology, and Journal of Research on Adolescence-we examined the number of longitudinal studies reporting missing data and the missing data techniques used. Of the 100 longitudinal studies sampled, 57 either reported having missing data or had discrepancies in sample sizes reported for different analyses. The majority of these studies (82%) used missing data techniques that are statistically problematic (either listwise deletion or pairwise deletion) and not among the methods recommended by statisticians (i.e., the direct maximum likelihood method and the multiple imputation method). Implications of these results for developmental theory and application, and the need for understanding the consequences of using statistically inappropriate missing data techniques with actual longitudinal data sets, are discussed.

  1. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.

  2. The Probability of Obtaining Two Statistically Different Test Scores as a Test Index

    ERIC Educational Resources Information Center

    Muller, Jorg M.

    2006-01-01

    A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…

  3. The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups

    ERIC Educational Resources Information Center

    Pero-Cebollero, Maribel; Guardia-Olmos, Joan

    2013-01-01

    In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…

  4. Exploring Pre-Service Teachers' Understanding of Statistical Variation: Implications for Teaching and Research

    ERIC Educational Resources Information Center

    Sharma, Sashi

    2007-01-01

    Concerns about the importance of variation in statistics education and a lack of research in this topic led to a preliminary study which explored pre-service teachers' ideas in this area. The teachers completed a written questionnaire about variation in sampling and distribution contexts. Responses were categorised in relation to a framework that…

  5. Simulation of Wind Profile Perturbations for Launch Vehicle Design

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    2004-01-01

    Ideally, a statistically representative sample of measured high-resolution wind profiles with wavelengths as small as tens of meters is required in design studies to establish aerodynamic load indicator dispersions and vehicle control system capability. At most potential launch sites, high- resolution wind profiles may not exist. Representative samples of Rawinsonde wind profiles to altitudes of 30 km are more likely to be available from the extensive network of measurement sites established for routine sampling in support of weather observing and forecasting activity. Such a sample, large enough to be statistically representative of relatively large wavelength perturbations, would be inadequate for launch vehicle design assessments because the Rawinsonde system accurately measures wind perturbations with wavelengths no smaller than 2000 m (1000 m altitude increment). The Kennedy Space Center (KSC) Jimsphere wind profiles (150/month and seasonal 2 and 3.5-hr pairs) are the only adequate samples of high resolution profiles approx. 150 to 300 m effective resolution, but over-sampled at 25 m intervals) that have been used extensively for launch vehicle design assessments. Therefore, a simulation process has been developed for enhancement of measured low-resolution Rawinsonde profiles that would be applicable in preliminary launch vehicle design studies at launch sites other than KSC.

  6. Groundwater-quality data for the Sierra Nevada study unit, 2008: Results from the California GAMA program

    USGS Publications Warehouse

    Shelton, Jennifer L.; Fram, Miranda S.; Munday, Cathy M.; Belitz, Kenneth

    2010-01-01

    Groundwater quality in the approximately 25,500-square-mile Sierra Nevada study unit was investigated in June through October 2008, as part of the Priority Basin Project of the Groundwater Ambient Monitoring and Assessment (GAMA) Program. The GAMA Priority Basin Project is being conducted by the U.S. Geological Survey (USGS) in cooperation with the California State Water Resources Control Board (SWRCB). The Sierra Nevada study was designed to provide statistically robust assessments of untreated groundwater quality within the primary aquifer systems in the study unit, and to facilitate statistically consistent comparisons of groundwater quality throughout California. The primary aquifer systems (hereinafter, primary aquifers) are defined by the depth of the screened or open intervals of the wells listed in the California Department of Public Health (CDPH) database of wells used for public and community drinking-water supplies. The quality of groundwater in shallower or deeper water-bearing zones may differ from that in the primary aquifers; shallow groundwater may be more vulnerable to contamination from the surface. In the Sierra Nevada study unit, groundwater samples were collected from 84 wells (and springs) in Lassen, Plumas, Butte, Sierra, Yuba, Nevada, Placer, El Dorado, Amador, Alpine, Calaveras, Tuolumne, Madera, Mariposa, Fresno, Inyo, Tulare, and Kern Counties. The wells were selected on two overlapping networks by using a spatially-distributed, randomized, grid-based approach. The primary grid-well network consisted of 30 wells, one well per grid cell in the study unit, and was designed to provide statistical representation of groundwater quality throughout the entire study unit. The lithologic grid-well network is a secondary grid that consisted of the wells in the primary grid-well network plus 53 additional wells and was designed to provide statistical representation of groundwater quality in each of the four major lithologic units in the Sierra Nevada study unit: granitic, metamorphic, sedimentary, and volcanic rocks. One natural spring that is not used for drinking water was sampled for comparison with a nearby primary grid well in the same cell. Groundwater samples were analyzed for organic constituents (volatile organic compounds [VOC], pesticides and pesticide degradates, and pharmaceutical compounds), constituents of special interest (N-nitrosodimethylamine [NDMA] and perchlorate), naturally occurring inorganic constituents (nutrients, major ions, total dissolved solids, and trace elements), and radioactive constituents (radium isotopes, radon-222, gross alpha and gross beta particle activities, and uranium isotopes). Naturally occurring isotopes and geochemical tracers (stable isotopes of hydrogen and oxygen in water, stable isotopes of carbon, carbon-14, strontium isotopes, and tritium), and dissolved noble gases also were measured to help identify the sources and ages of the sampled groundwater. Three types of quality-control samples (blanks, replicates, and samples for matrix spikes) each were collected at approximately 10 percent of the wells sampled for each analysis, and the results for these samples were used to evaluate the quality of the data for the groundwater samples. Field blanks rarely contained detectable concentrations of any constituent, suggesting that contamination from sample collection, handling, and analytical procedures was not a significant source of bias in the data for the groundwater samples. Differences between replicate samples were within acceptable ranges, with few exceptions. Matrix-spike recoveries were within acceptable ranges for most compounds. This study did not attempt to evaluate the quality of water delivered to consumers; after withdrawal from the ground, groundwater typically is treated, disinfected, or blended with other waters to maintain water quality. Regulatory benchmarks apply to finished drinking water that is served to the consumer, not to untre

  7. Remote sensing-aided systems for snow qualification, evapotranspiration estimation, and their application in hydrologic models

    NASA Technical Reports Server (NTRS)

    Korram, S.

    1977-01-01

    The design of general remote sensing-aided methodologies was studied to provide the estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is Feather River Watershed (780,000 hectares), Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All the relevent and available information types needed in the estimation process are being defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in quantification of all the required parameters. The physical and statistical models for both snow quantification and evapotranspiration estimation was developed. These models use the information obtained by aerial and ground data through appropriate statistical sampling design.

  8. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    The groundwater samples from Rapur area were collected from different sites to evaluate the major ion chemistry. The large number of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted two important clusters viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS) which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), accounted for 36.2% of the total variance, was high positive loading in EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.

  9. Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy

    NASA Technical Reports Server (NTRS)

    Nousek, John A.; Shue, David R.

    1989-01-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  10. Data-optimized source modeling with the Backwards Liouville Test–Kinetic method

    DOE PAGES

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  11. Estimating the mass variance in neutron multiplicity counting-A comparison of approaches

    NASA Astrophysics Data System (ADS)

    Dubi, C.; Croft, S.; Favalli, A.; Ocherashvili, A.; Pedersen, B.

    2017-12-01

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α , n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  12. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubi, C.; Croft, S.; Favalli, A.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  13. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE PAGES

    Dubi, C.; Croft, S.; Favalli, A.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  14. Correlation between Na/K ratio and electron densities in blood samples of breast cancer patients.

    PubMed

    Topdağı, Ömer; Toker, Ozan; Bakırdere, Sezgin; Bursalıoğlu, Ertuğrul Osman; Öz, Ersoy; Eyecioğlu, Önder; Demir, Mustafa; İçelli, Orhan

    2018-05-31

    The main purpose of this study was to investigate the relationship between the electron densities and Na/K ratio which has important role in breast cancer disease. Determinations of sodium and potassium concentrations in blood samples performed with inductive coupled plasma-atomic emission spectrometry. Electron density values of blood samples were determined via ZXCOM. Statistical analyses were performed for electron densities and Na/K ratio including Kolmogorov-Smirnov normality tests, Spearman's rank correlation test and Mann-Whitney U test. It was found that the electron densities significantly differ between control and breast cancer groups. In addition, statistically significant positive correlation was found between the electron density and Na/K ratios in breast cancer group.

  15. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  16. Resampling-based methods in single and multiple testing for equality of covariance/correlation matrices.

    PubMed

    Yang, Yang; DeGruttola, Victor

    2012-06-22

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

  17. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    PubMed

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.

  18. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  19. The Role of the Sampling Distribution in Understanding Statistical Inference

    ERIC Educational Resources Information Center

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  20. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    ERIC Educational Resources Information Center

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  1. Biostatistics primer: part I.

    PubMed

    Overholser, Brian R; Sowinski, Kevin M

    2007-12-01

    Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.

  2. A study of personal and area airborne asbestos concentrations during asbestos abatement: a statistical evaluation of fibre concentration data.

    PubMed

    Lange, J H; Lange, P R; Reinhard, T K; Thomulka, K W

    1996-08-01

    Data were collected and analysed on airborne concentrations of asbestos generated by abatement of different asbestos-containing materials using various removal practices. Airborne concentrations of asbestos are dramatically variable among the types of asbestos-containing material being abated. Abatement practices evaluated in this study were removal of boiler/pipe insulation in a crawl space, ceiling tile, transite, floor tile/mastic with traditional methods, and mastic removal with a high-efficiency particulate air filter blast track (shot-blast) machine. In general, abatement of boiler and pipe insulation produces the highest airborne fibre levels, while abatement of floor tile and mastic was observed to be the lowest. A comparison of matched personal and area samples was not significantly different, and exhibited a good correlation using regression analysis. After adjusting data for outliers, personal sample fibre concentrations were greater than area sample fibre concentrations. Statistical analysis and sample distribution of airborne asbestos concentrations appear to be best represented in a logarithmic form. Area sample fibre concentrations were shown in this study to have a larger variability than personal measurements. Evaluation of outliers in fibre concentration data and the ability of these values to skew sample populations is presented. The use of personal and area samples in determining exposure, selecting personal protective equipment and its historical relevance as related to future abatement projects is discussed.

  3. Characterization of Surface Water and Groundwater Quality in the Lower Tano River Basin Using Statistical and Isotopic Approach.

    NASA Astrophysics Data System (ADS)

    Edjah, Adwoba; Stenni, Barbara; Cozzi, Giulio; Turetta, Clara; Dreossi, Giuliano; Tetteh Akiti, Thomas; Yidana, Sandow

    2017-04-01

    Adwoba Kua- Manza Edjaha, Barbara Stennib,c,Giuliano Dreossib, Giulio Cozzic, Clara Turetta c,T.T Akitid ,Sandow Yidanae a,eDepartment of Earth Science, University of Ghana Legon, Ghana West Africa bDepartment of Enviromental Sciences, Informatics and Statistics, Ca Foscari University of Venice, Italy cInstitute for the Dynamics of Environmental Processes, CNR, Venice, Italy dDepartment of Nuclear Application and Techniques, Graduate School of Nuclear and Allied Sciences University of Ghana Legon This research is part of a PhD research work "Hydrogeological Assessment of the Lower Tano river basin for sustainable economic usage, Ghana, West - Africa". In this study, the researcher investigated surface water and groundwater quality in the Lower Tano river basin. This assessment was based on some selected sampling sites associated with mining activities, and the development of oil and gas. Statistical approach was applied to characterize the quality of surface water and groundwater. Also, water stable isotopes, which is a natural tracer of the hydrological cycle was used to investigate the origin of groundwater recharge in the basin. The study revealed that Pb and Ni values of the surface water and groundwater samples exceeded the WHO standards for drinking water. In addition, water quality index (WQI), based on physicochemical parameters(EC, TDS, pH) and major ions(Ca2+, Na+, Mg2+, HCO3-,NO3-, CL-, SO42-, K+) exhibited good quality water for 60% of the sampled surface water and groundwater. Other statistical techniques, such as Heavy metal pollution index (HPI), degree of contamination (Cd), and heavy metal evaluation index (HEI), based on trace element parameters in the water samples, reveal that 90% of the surface water and groundwater samples belong to high level of pollution. Principal component analysis (PCA) also suggests that the water quality in the basin is likely affected by rock - water interaction and anthropogenic activities (sea water intrusion). This was confirm by further statistical analysis (cluster analysis and correlation matrix) of the water quality parameters. Spatial distribution of water quality parameters, trace elements and the results obtained from the statistical analysis was determined by geographical information system (GIS). In addition, the isotopic analysis of the sampled surface water and groundwater revealed that most of the surface water and groundwater were of meteoric origin with little or no isotopic variations. It is expected that outcomes of this research will form a baseline for making appropriate decision on water quality management by decision makers in the Lower Tano river Basin. Keywords: Water stable isotopes, Trace elements, Multivariate statistics, Evaluation indices, Lower Tano river basin.

  4. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples resultsmore » [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL{sub 95%}) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL{sub 95%}) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL{sub 95%} was based entirely on the six current scrape sample results (each averaged across three analytical determinations).« less

  5. STATISTICAL ANALYSIS OF TANK 19F FLOOR SAMPLE RESULTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples resultsmore » to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).« less

  6. Got Power? A Systematic Review of Sample Size Adequacy in Health Professions Education Research

    ERIC Educational Resources Information Center

    Cook, David A.; Hatala, Rose

    2015-01-01

    Many education research studies employ small samples, which in turn lowers statistical power. We re-analyzed the results of a meta-analysis of simulation-based education to determine study power across a range of effect sizes, and the smallest effect that could be plausibly excluded. We systematically searched multiple databases through May 2011,…

  7. The Mediatory Role of Exercise Self-Regulation in the Relationship between Personality Traits and Anger Management of Athletes

    ERIC Educational Resources Information Center

    Shahbazzadeh, Somayeh; Beliad, Mohammad Reza

    2017-01-01

    This study investigates the mediatory role of exercise self-regulation role in the relationship between personality traits and anger management among athletes. The statistical population of this study includes all athlete students of Shar-e Ghods College, among which 260 people were selected as sample using random sampling method. In addition, the…

  8. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260

  9. Statistical power calculations for mixed pharmacokinetic study designs using a population approach.

    PubMed

    Kloprogge, Frank; Simpson, Julie A; Day, Nicholas P J; White, Nicholas J; Tarning, Joel

    2014-09-01

    Simultaneous modelling of dense and sparse pharmacokinetic data is possible with a population approach. To determine the number of individuals required to detect the effect of a covariate, simulation-based power calculation methodologies can be employed. The Monte Carlo Mapped Power method (a simulation-based power calculation methodology using the likelihood ratio test) was extended in the current study to perform sample size calculations for mixed pharmacokinetic studies (i.e. both sparse and dense data collection). A workflow guiding an easy and straightforward pharmacokinetic study design, considering also the cost-effectiveness of alternative study designs, was used in this analysis. Initially, data were simulated for a hypothetical drug and then for the anti-malarial drug, dihydroartemisinin. Two datasets (sampling design A: dense; sampling design B: sparse) were simulated using a pharmacokinetic model that included a binary covariate effect and subsequently re-estimated using (1) the same model and (2) a model not including the covariate effect in NONMEM 7.2. Power calculations were performed for varying numbers of patients with sampling designs A and B. Study designs with statistical power >80% were selected and further evaluated for cost-effectiveness. The simulation studies of the hypothetical drug and the anti-malarial drug dihydroartemisinin demonstrated that the simulation-based power calculation methodology, based on the Monte Carlo Mapped Power method, can be utilised to evaluate and determine the sample size of mixed (part sparsely and part densely sampled) study designs. The developed method can contribute to the design of robust and efficient pharmacokinetic studies.

  10. Factors That Explain the Attitude towards Statistics in High-School Students: Empirical Evidence at Technological Study Center of the Sea in Veracruz, Mexico

    ERIC Educational Resources Information Center

    Rojas-Kramer, Carlos; Limón-Suárez, Enrique; Moreno-García, Elena; García-Santillán, Arturo

    2018-01-01

    The aim of this paper was to analyze attitude towards statistics in high-school students using the SATS scale designed by Auzmendi (1992). The sample was 200 students from the sixth semester of the afternoon shift, who were enrolled in technical careers from the Technological Study Center of the Sea (Centro de Estudios Tecnológicos del Mar 07…

  11. Immunohistochemical analysis of stromal fibrocytes and myofibroblasts to envision the invasion and lymph node metastasis in oral squamous cell carcinoma.

    PubMed

    Rao, Sowmya J; Rao, Jyothi Bellur Madhava; Rao, Pp Jagadish

    2017-01-01

    Tumor cells work in close coordination with stromal elements from its stage of emergence to metastasis. The study was designed to assess the presence and distribution pattern of stromal fibrocytes and myofibroblasts in oral squamous cell carcinoma (OSCC). Possibility of using these stromal cells as a marker for invasion and lymphnode metastasis was evaluated. A total of 40 cases of OSCC consisting twenty cases of each lymph node positive (pN+) and lymph node negative (pN0) samples and ten normal oral mucosa (NOM) tissues were subjected to double immunostaining using CD34 and alpha-smooth muscle actin (α-SMA) antibodies. Stained sections were evaluated semiquantitatively. CD34 fibrocytes were seen in 70% of NOM and none of OSCC samples. α-SMA myofibroblasts were seen in 80% of OSCC and none of NOM samples. A statistically significant difference was found in fibrocyte values ( P < 0.001) and myofibroblast values ( P < 0.001) between NOM and OSCC study samples. No statistical significance in myofibroblast values between pN0 and pN+ study groups; however, their distribution pattern appreciably varied. This study suggested that fibrocytes could be used as one of the markers for early invasion. Abrupt loss of fibrocytes at the transition zone toward carcinoma and statistical significance in their values supported this inference. Heterogeneity in the distribution pattern of myofibroblasts in tumor stroma indicates that this variability may predict the tumor behavior toward nodal metastasis rather than their mere presence or absence.

  12. Statistical properties of measures of association and the Kappa statistic for assessing the accuracy of remotely sensed data using double sampling

    Treesearch

    Mohammed A. Kalkhan; Robin M. Reich; Raymond L. Czaplewski

    1996-01-01

    A Monte Carlo simulation was used to evaluate the statistical properties of measures of association and the Kappa statistic under double sampling with replacement. Three error matrices representing three levels of classification accuracy of Landsat TM Data consisting of four forest cover types in North Carolina. The overall accuracy of the five indices ranged from 0.35...

  13. Two Different Views on the World Around Us: The World of Uniformity versus Diversity.

    PubMed

    Kwon, JaeHwan; Nayakankuppam, Dhananjay

    2016-01-01

    We propose that when individuals believe in fixed traits of personality (entity theorists), they are likely to expect a world of "uniformity." As such, they easily infer a population statistic from a small sample of data with confidence. In contrast, individuals who believe in malleable traits of personality (incremental theorists) are likely to presume a world of "diversity," such that they "hesitate" to infer a population statistic from a similarly sized sample. In four laboratory experiments, we found that compared to incremental theorists, entity theorists estimated a population mean from a sample with a greater level of confidence (Studies 1a and 1b), expected more homogeneity among the entities within a population (Study 2), and perceived an extreme value to be more indicative of an outlier (Study 3). These results suggest that individuals are likely to use their implicit self-theory orientations (entity theory versus incremental theory) to see a population in general as a constitution either of homogeneous or heterogeneous entities.

  14. Prospective power calculations for the Four Lab study of a multigenerational reproductive/developmental toxicity rodent bioassay using a complex mixture of disinfection by-products in the low-response region.

    PubMed

    Dingus, Cheryl A; Teuschler, Linda K; Rice, Glenn E; Simmons, Jane Ellen; Narotsky, Michael G

    2011-10-01

    In complex mixture toxicology, there is growing emphasis on testing environmentally representative doses that improve the relevance of results for health risk assessment, but are typically much lower than those used in traditional toxicology studies. Traditional experimental designs with typical sample sizes may have insufficient statistical power to detect effects caused by environmentally relevant doses. Proper study design, with adequate statistical power, is critical to ensuring that experimental results are useful for environmental health risk assessment. Studies with environmentally realistic complex mixtures have practical constraints on sample concentration factor and sample volume as well as the number of animals that can be accommodated. This article describes methodology for calculation of statistical power for non-independent observations for a multigenerational rodent reproductive/developmental bioassay. The use of the methodology is illustrated using the U.S. EPA's Four Lab study in which rodents were exposed to chlorinated water concentrates containing complex mixtures of drinking water disinfection by-products. Possible experimental designs included two single-block designs and a two-block design. Considering the possible study designs and constraints, a design of two blocks of 100 females with a 40:60 ratio of control:treated animals and a significance level of 0.05 yielded maximum prospective power (~90%) to detect pup weight decreases, while providing the most power to detect increased prenatal loss.

  15. Prospective Power Calculations for the Four Lab Study of A Multigenerational Reproductive/Developmental Toxicity Rodent Bioassay Using A Complex Mixture of Disinfection By-Products in the Low-Response Region

    PubMed Central

    Dingus, Cheryl A.; Teuschler, Linda K.; Rice, Glenn E.; Simmons, Jane Ellen; Narotsky, Michael G.

    2011-01-01

    In complex mixture toxicology, there is growing emphasis on testing environmentally representative doses that improve the relevance of results for health risk assessment, but are typically much lower than those used in traditional toxicology studies. Traditional experimental designs with typical sample sizes may have insufficient statistical power to detect effects caused by environmentally relevant doses. Proper study design, with adequate statistical power, is critical to ensuring that experimental results are useful for environmental health risk assessment. Studies with environmentally realistic complex mixtures have practical constraints on sample concentration factor and sample volume as well as the number of animals that can be accommodated. This article describes methodology for calculation of statistical power for non-independent observations for a multigenerational rodent reproductive/developmental bioassay. The use of the methodology is illustrated using the U.S. EPA’s Four Lab study in which rodents were exposed to chlorinated water concentrates containing complex mixtures of drinking water disinfection by-products. Possible experimental designs included two single-block designs and a two-block design. Considering the possible study designs and constraints, a design of two blocks of 100 females with a 40:60 ratio of control:treated animals and a significance level of 0.05 yielded maximum prospective power (~90%) to detect pup weight decreases, while providing the most power to detect increased prenatal loss. PMID:22073030

  16. Statistical Learning is Related to Early Literacy-Related Skills

    PubMed Central

    Spencer, Mercedes; Kaschak, Michael P.; Jones, John L.; Lonigan, Christopher J.

    2015-01-01

    It has been demonstrated that statistical learning, or the ability to use statistical information to learn the structure of one’s environment, plays a role in young children’s acquisition of linguistic knowledge. Although most research on statistical learning has focused on language acquisition processes, such as the segmentation of words from fluent speech and the learning of syntactic structure, some recent studies have explored the extent to which individual differences in statistical learning are related to literacy-relevant knowledge and skills. The present study extends on this literature by investigating the relations between two measures of statistical learning and multiple measures of skills that are critical to the development of literacy—oral language, vocabulary knowledge, and phonological processing—within a single model. Our sample included a total of 553 typically developing children from prekindergarten through second grade. Structural equation modeling revealed that statistical learning accounted for a unique portion of the variance in these literacy-related skills. Practical implications for instruction and assessment are discussed. PMID:26478658

  17. FabricS: A user-friendly, complete and robust software for particle shape-fabric analysis

    NASA Astrophysics Data System (ADS)

    Moreno Chávez, G.; Castillo Rivera, F.; Sarocchi, D.; Borselli, L.; Rodríguez-Sedano, L. A.

    2018-06-01

    Shape-fabric is a textural parameter related to the spatial arrangement of elongated particles in geological samples. Its usefulness spans a range from sedimentary petrology to igneous and metamorphic petrology. Independently of the process being studied, when a material flows, the elongated particles are oriented with the major axis in the direction of flow. In sedimentary petrology this information has been used for studies of paleo-flow direction of turbidites, the origin of quartz sediments, and locating ignimbrite vents, among others. In addition to flow direction and its polarity, the method enables flow rheology to be inferred. The use of shape-fabric has been limited due to the difficulties of automatically measuring particles and analyzing them with reliable circular statistics programs. This has dampened interest in the method for a long time. Shape-fabric measurement has increased in popularity since the 1980s thanks to the development of new image analysis techniques and circular statistics software. However, the programs currently available are unreliable, old and are incompatible with newer operating systems, or require programming skills. The goal of our work is to develop a user-friendly program, in the MATLAB environment, with a graphical user interface, that can process images and includes editing functions, and thresholds (elongation and size) for selecting a particle population and analyzing it with reliable circular statistics algorithms. Moreover, the method also has to produce rose diagrams, orientation vectors, and a complete series of statistical parameters. All these requirements are met by our new software. In this paper, we briefly explain the methodology from collection of oriented samples in the field to the minimum number of particles needed to obtain reliable fabric data. We obtained the data using specific statistical tests and taking into account the degree of iso-orientation of the samples and the required degree of reliability. The program has been verified by means of several simulations performed using appropriately designed features and by analyzing real samples.

  18. Dental fear and caries in 6-12 year old children in Greece. Determination of dental fear cut-off points.

    PubMed

    Boka, V; Arapostathis, K; Karagiannis, V; Kotsanos, N; van Loveren, C; Veerkamp, J

    2017-03-01

    To present: the normative data on dental fear and caries status; the dental fear cut-off points of young children in the city of Thessaloniki, Greece. Study Design: This is a cross-sectional study with two independent study groups. A first representative sample consisted of 1484 children from 15 primary public schools of Thessaloniki. A second sample consisted of 195 randomly selected age-matched children, all patients of the Postgraduate Paediatric Dental Clinic of Aristotle University of Thessaloniki. First sample: In order to select data on dental fear and caries, dental examination took place in the classroom with disposable mirrors and a penlight. All the children completed the Dental Subscale of the Children's Fear Survey Schedule (CFSS-DS). Second sample: In order to define the cut-off points of the CFSS-DS, dental treatment of the 195 children was performed at the University Clinic. Children⁁s dental fear was assessed using the CFSS-DS and their behaviour during dental treatment was observed by one calibrated examiner using the Venham scale. Statistical analysis of the data was performed with IBM SPSS Statistics 20 at a statistical significance level of <0.05. First sample: The mean CFSS-DS score was 27.1±10.8. Age was significantly (p<0.05) related to dental fear. Mean differences between boys and girls were not significant. Caries was not correlated with dental fear. Second sample: CFSS-DS< 33 was defined as 'no dental fear', scores 33-37 as 'borderline' and scores > 37 as 'dental fear'. In the first sample, 84.6% of the children did not suffer from dental fear (CFSS-DS<33). Dental fear was correlated to age and not to caries and gender. The dental fear cut-off point for the CFSS-DS was estimated at 37 for 6-12 year old children (33-37 borderlines).

  19. A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures.

    PubMed

    Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha

    2017-01-01

    The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ 2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Entrepreneurial Intentions of Agricultural Students: Levels and Determinants

    ERIC Educational Resources Information Center

    Pouratashi, Mahtab

    2015-01-01

    Purpose: This paper examined levels and determinants of entrepreneurial intentions amongst agricultural students. Methodology: The statistical population comprised students in colleges of agriculture at University of Tehran. By use of a random sampling method, a sample of 120 students participated in the study. The instrument for data collection…

  1. Data Interpretation: Using Probability

    ERIC Educational Resources Information Center

    Drummond, Gordon B.; Vowler, Sarah L.

    2011-01-01

    Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…

  2. Predicting Adolescent and Adult Antisocial Behavior among Adjudicated Delinquent Females

    ERIC Educational Resources Information Center

    Cernkovich, Stephen A.; Lanctot, Nadine; Giordano, Peggy C.

    2008-01-01

    Studies identifying the mechanisms underlying the causes and consequences of antisocial behavior among female delinquents as they transit to adulthood are scarce and have important limitations: Most are based on official statistics, they typically are restricted to normative samples, and rarely do they gather prospective data from samples of…

  3. MID-ATLANTIC COASTAL STREAMS STUDY: STATISTICAL DESIGN FOR REGIONAL ASSESSMENT AND LANDSCAPE MODEL DEVELOPMENT

    EPA Science Inventory

    A network of stream-sampling sites was developed for the Mid-Atlantic Coastal Plain (New Jersey through North Carolina) as part of collaborative research between the U.S. Environmental Protection Agency and the U.S. Geological Survey. A stratified random sampling with unequal wei...

  4. Properties of different selection signature statistics and a new strategy for combining them.

    PubMed

    Ma, Y; Ding, X; Qanbari, S; Weigend, S; Zhang, Q; Simianer, H

    2015-11-01

    Identifying signatures of recent or ongoing selection is of high relevance in livestock population genomics. From a statistical perspective, determining a proper testing procedure and combining various test statistics is challenging. On the basis of extensive simulations in this study, we discuss the statistical properties of eight different established selection signature statistics. In the considered scenario, we show that a reasonable power to detect selection signatures is achieved with high marker density (>1 SNP/kb) as obtained from sequencing, while rather small sample sizes (~15 diploid individuals) appear to be sufficient. Most selection signature statistics such as composite likelihood ratio and cross population extended haplotype homozogysity have the highest power when fixation of the selected allele is reached, while integrated haplotype score has the highest power when selection is ongoing. We suggest a novel strategy, called de-correlated composite of multiple signals (DCMS) to combine different statistics for detecting selection signatures while accounting for the correlation between the different selection signature statistics. When examined with simulated data, DCMS consistently has a higher power than most of the single statistics and shows a reliable positional resolution. We illustrate the new statistic to the established selective sweep around the lactase gene in human HapMap data providing further evidence of the reliability of this new statistic. Then, we apply it to scan selection signatures in two chicken samples with diverse skin color. Our analysis suggests that a set of well-known genes such as BCO2, MC1R, ASIP and TYR were involved in the divergent selection for this trait.

  5. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  6. [An investigation of the statistical power of the effect size in randomized controlled trials for the treatment of patients with type 2 diabetes mellitus using Chinese medicine].

    PubMed

    Ma, Li-Xin; Liu, Jian-Ping

    2012-01-01

    To investigate whether the power of the effect size was based on adequate sample size in randomized controlled trials (RCTs) for the treatment of patients with type 2 diabetes mellitus (T2DM) using Chinese medicine. China Knowledge Resource Integrated Database (CNKI), VIP Database for Chinese Technical Periodicals (VIP), Chinese Biomedical Database (CBM), and Wangfang Data were systematically recruited using terms like "Xiaoke" or diabetes, Chinese herbal medicine, patent medicine, traditional Chinese medicine, randomized, controlled, blinded, and placebo-controlled. Limitation was set on the intervention course > or = 3 months in order to identify the information of outcome assessement and the sample size. Data collection forms were made according to the checking lists found in the CONSORT statement. Independent double data extractions were performed on all included trials. The statistical power of the effects size for each RCT study was assessed using sample size calculation equations. (1) A total of 207 RCTs were included, including 111 superiority trials and 96 non-inferiority trials. (2) Among the 111 superiority trials, fasting plasma glucose (FPG) and glycosylated hemoglobin HbA1c (HbA1c) outcome measure were reported in 9% and 12% of the RCTs respectively with the sample size > 150 in each trial. For the outcome of HbA1c, only 10% of the RCTs had more than 80% power. For FPG, 23% of the RCTs had more than 80% power. (3) In the 96 non-inferiority trials, the outcomes FPG and HbA1c were reported as 31% and 36% respectively. These RCTs had a samples size > 150. For HbA1c only 36% of the RCTs had more than 80% power. For FPG, only 27% of the studies had more than 80% power. The sample size for statistical analysis was distressingly low and most RCTs did not achieve 80% power. In order to obtain a sufficient statistic power, it is recommended that clinical trials should establish clear research objective and hypothesis first, and choose scientific and evidence-based study design and outcome measurements. At the same time, calculate required sample size to ensure a precise research conclusion.

  7. Dental hygiene students' perceptions of distance learning: do they change over time?

    PubMed

    Sledge, Rhonda; Vuk, Jasna; Long, Susan

    2014-02-01

    The University of Arkansas for Medical Sciences dental hygiene program established a distant site where the didactic curriculum was broadcast via interactive video from the main campus to the distant site, supplemented with on-line learning via Blackboard. This study compared the perceptions of students towards distance learning as they progressed through the 21 month curriculum. Specifically, the study sought to answer the following questions: Is there a difference in the initial perceptions of students on the main campus and at the distant site toward distance learning? Do students' perceptions change over time with exposure to synchronous distance learning over the course of the curriculum? All 39 subjects were women between the ages of 20 and 35 years. Of the 39 subjects, 37 were Caucasian and 2 were African-American. A 15-question Likert scale survey was administered at 4 different periods during the 21 month program to compare changes in perceptions toward distance learning as students progressed through the program. An independent sample t-test and ANOVA were utilized for statistical analysis. At the beginning of the program, independent samples t-test revealed that students at the main campus (n=34) perceived statistically significantly higher effectiveness of distance learning than students at the distant site (n=5). Repeated measures of ANOVA revealed that perceptions of students at the main campus on effectiveness and advantages of distance learning statistically significantly decreased whereas perceptions of students at distant site statistically significantly increased over time. Distance learning in the dental hygiene program was discussed, and replication of the study with larger samples of students was recommended.

  8. Assessing Understanding of Sampling Distributions and Differences in Learning amongst Different Learning Styles

    ERIC Educational Resources Information Center

    Beeman, Jennifer Leigh Sloan

    2013-01-01

    Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…

  9. Item Analysis Appropriate for Domain-Referenced Classroom Testing. (Project Technical Report Number 1).

    ERIC Educational Resources Information Center

    Nitko, Anthony J.; Hsu, Tse-chi

    Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…

  10. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    ERIC Educational Resources Information Center

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…

  11. Reliability and One-Year Stability of the PIN3 Neighborhood Environmental Audit in Urban and Rural Neighborhoods.

    PubMed

    Porter, Anna K; Wen, Fang; Herring, Amy H; Rodríguez, Daniel A; Messer, Lynne C; Laraia, Barbara A; Evenson, Kelly R

    2018-06-01

    Reliable and stable environmental audit instruments are needed to successfully identify the physical and social attributes that may influence physical activity. This study described the reliability and stability of the PIN3 environmental audit instrument in both urban and rural neighborhoods. Four randomly sampled road segments in and around a one-quarter mile buffer of participants' residences from the Pregnancy, Infection, and Nutrition (PIN3) study were rated twice, approximately 2 weeks apart. One year later, 253 of the year 1 sampled roads were re-audited. The instrument included 43 measures that resulted in 73 item scores for calculation of percent overall agreement, kappa statistics, and log-linear models. For same-day reliability, 81% of items had moderate to outstanding kappa statistics (kappas ≥ 0.4). Two-week reliability was slightly lower, with 77% of items having moderate to outstanding agreement using kappa statistics. One-year stability had 68% of items showing moderate to outstanding agreement using kappa statistics. The reliability of the audit measures was largely consistent when comparing urban to rural locations, with only 8% of items exhibiting significant differences (α < 0.05) by urbanicity. The PIN3 instrument is a reliable and stable audit tool for studies assessing neighborhood attributes in urban and rural environments.

  12. Prevalence of diseases and statistical power of the Japan Nurses' Health Study.

    PubMed

    Fujita, Toshiharu; Hayashi, Kunihiko; Katanoda, Kota; Matsumura, Yasuhiro; Lee, Jung Su; Takagi, Hirofumi; Suzuki, Shosuke; Mizunuma, Hideki; Aso, Takeshi

    2007-10-01

    The Japan Nurses' Health Study (JNHS) is a long-term, large-scale cohort study investigating the effects of various lifestyle factors and healthcare habits on the health of Japanese women. Based on currently limited statistical data regarding the incidence of disease among Japanese women, our initial sample size was tentatively set at 50,000 during the design phase. The actual number of women who agreed to participate in follow-up surveys was approximately 18,000. Taking into account the actual sample size and new information on disease frequency obtained during the baseline component, we established the prevalence of past diagnoses of target diseases, predicted their incidence, and calculated the statistical power for JNHS follow-up surveys. For all diseases except ovarian cancer, the prevalence of a past diagnosis increased markedly with age, and incidence rates could be predicted based on the degree of increase in prevalence between two adjacent 5-yr age groups. The predicted incidence rate for uterine myoma, hypercholesterolemia, and hypertension was > or =3.0 (per 1,000 women, per year), while the rate of thyroid disease, hepatitis, gallstone disease, and benign breast tumor was predicted to be > or =1.0. For these diseases, the statistical power to detect risk factors with a relative risk of 1.5 or more within ten years, was 70% or higher.

  13. [A Review on the Use of Effect Size in Nursing Research].

    PubMed

    Kang, Hyuncheol; Yeon, Kyupil; Han, Sang Tae

    2015-10-01

    The purpose of this study was to introduce the main concepts of statistical testing and effect size and to provide researchers in nursing science with guidance on how to calculate the effect size for the statistical analysis methods mainly used in nursing. For t-test, analysis of variance, correlation analysis, regression analysis which are used frequently in nursing research, the generally accepted definitions of the effect size were explained. Some formulae for calculating the effect size are described with several examples in nursing research. Furthermore, the authors present the required minimum sample size for each example utilizing G*Power 3 software that is the most widely used program for calculating sample size. It is noted that statistical significance testing and effect size measurement serve different purposes, and the reliance on only one side may be misleading. Some practical guidelines are recommended for combining statistical significance testing and effect size measure in order to make more balanced decisions in quantitative analyses.

  14. Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.

    PubMed

    Yin, Guosheng; Ma, Yanyuan

    2013-01-01

    The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.

  15. Drinking water quality assessment.

    PubMed

    Aryal, J; Gautam, B; Sapkota, N

    2012-09-01

    Drinking water quality is the great public health concern because it is a major risk factor for high incidence of diarrheal diseases in Nepal. In the recent years, the prevalence rate of diarrhoea has been found the highest in Myagdi district. This study was carried out to assess the quality of drinking water from different natural sources, reservoirs and collection taps at Arthunge VDC of Myagdi district. A cross-sectional study was carried out using random sampling method in Arthunge VDC of Myagdi district from January to June,2010. 84 water samples representing natural sources, reservoirs and collection taps from the study area were collected. The physico-chemical and microbiological analysis was performed following standards technique set by APHA 1998 and statistical analysis was carried out using SPSS 11.5. The result was also compared with national and WHO guidelines. Out of 84 water samples (from natural source, reservoirs and tap water) analyzed, drinking water quality parameters (except arsenic and total coliform) of all water samples was found to be within the WHO standards and national standards.15.48% of water samples showed pH (13) higher than the WHO permissible guideline values. Similarly, 85.71% of water samples showed higher Arsenic value (72) than WHO value. Further, the statistical analysis showed no significant difference (P<0.05) of physico-chemical parameters and total coliform count of drinking water for collection taps water samples of winter (January, 2010) and summer (June, 2010). The microbiological examination of water samples revealed the presence of total coliform in 86.90% of water samples. The results obtained from physico-chemical analysis of water samples were within national standard and WHO standards except arsenic. The study also found the coliform contamination to be the key problem with drinking water.

  16. Sampling of prenatal and postnatal offspring from individual rat dams enhances animal use without compromising development

    NASA Technical Reports Server (NTRS)

    Alberts, J. R.; Burden, H. W.; Hawes, N.; Ronca, A. E.

    1996-01-01

    To assess prenatal and postnatal developmental status in the offspring of a group of animals, it is typical to examine fetuses from some of the dams as well as infants born to the remaining dams. Statistical limitations often arise, particularly when the animals are rare or especially precious, because all offspring of the dam represent only a single statistical observation; littermates are not independent observations (biologically or statistically). We describe a study in which pregnant laboratory rats were laparotomized on day 7 of gestation (GD7) to ascertain the number and distribution of uterine implantation sites and were subjected to a simulated experience on a 10-day space shuttle flight. After the simulated landing on GD18, rats were unilaterally hysterectomized, thus providing a sample of fetuses from 10 independent uteruses, followed by successful vaginal delivery on GD22, yielding postnatal samples from 10 uteruses. A broad profile of maternal and offspring morphologic and physiologic measures indicated that these novel sampling procedures did not compromise maternal well-being and maintained normal offspring development and function. Measures included maternal organ weights and hormone concentrations, offspring body size, growth, organ weights, sexual differentiation, and catecholamine concentrations.

  17. Prevalence of hepatitis-C virus genotypes and potential transmission risks in Malakand Khyber Pakhtunkhwa, Pakistan.

    PubMed

    Nazir, Nausheen; Jan, Muhammad Rasul; Ali, Amjad; Asif, Muhammad; Idrees, Muhammad; Nisar, Mohammad; Zahoor, Muhammad; Abd El-Salam, Naser M

    2017-08-22

    Hepatitis C virus (HCV) is a leading cause of chronic liver disease and frequently progresses towards liver cirrhosis and Hepatocellular Carcinoma (HCC). This study aimed to determine the prevalence of HCV genotypes and their association with possible transmission risks in the general population of Malakand Division. Sum of 570 serum samples were collected during March 2011 to January 2012 from suspected patients visited to different hospitals of Malakand. The suspected sera were tested using qualitative PCR and were then subjected to molecular genotype specific assay. Quantitative PCR was also performed for determination of pre-treatment viral load in confirmed positive patients. Out of 570 serum samples 316 sera were seen positive while 254 sera were found negative using qualitative PCR. The positive samples were then subjected to genotyping assay out of 316, type-specific PCR fragments were seen in 271 sera while 45 samples were found untypable genotypes. Genotype 3a was seen as a predominant genotype (63.3%) with a standard error of ±2.7%. Cramer's V statistic and Liklihood-Ratio statistical procedures are used to measure the strength and to test the association, respectively, between the dependent variable, genotype, and explanatory variables (e.g. gender, risk, age and area/districts). The dependent variable, genotype, is observed statistically significant association with variable risk factors. This implies that the genotype is highly dependent on how the patient was infected. In contrast, the other covariates, for example, gender, age, and district (area) no statistical significant association are observed. The association between gender-age indicates that the mean age of female was older by 10.5 ± 2.3 years with 95% confidence level using t-statistic. It was concluded from the present study that the predominant genotype was 3a in the infected population of Malakand. This study also highlights the high prevalence rate of untypable genotypes which an important issue of health care setup in Malakand and create complications in therapy of infected patients. Major mode of HCV transmission is multiple uses and re-uses of needles/injections. ISRCTN ISRCTN73824458. Registered: 28 September 2014.

  18. Power-up: A Reanalysis of 'Power Failure' in Neuroscience Using Mixture Modeling

    PubMed Central

    Wood, John

    2017-01-01

    Recently, evidence for endemically low statistical power has cast neuroscience findings into doubt. If low statistical power plagues neuroscience, then this reduces confidence in the reported effects. However, if statistical power is not uniformly low, then such blanket mistrust might not be warranted. Here, we provide a different perspective on this issue, analyzing data from an influential study reporting a median power of 21% across 49 meta-analyses (Button et al., 2013). We demonstrate, using Gaussian mixture modeling, that the sample of 730 studies included in that analysis comprises several subcomponents so the use of a single summary statistic is insufficient to characterize the nature of the distribution. We find that statistical power is extremely low for studies included in meta-analyses that reported a null result and that it varies substantially across subfields of neuroscience, with particularly low power in candidate gene association studies. Therefore, whereas power in neuroscience remains a critical issue, the notion that studies are systematically underpowered is not the full story: low power is far from a universal problem. SIGNIFICANCE STATEMENT Recently, researchers across the biomedical and psychological sciences have become concerned with the reliability of results. One marker for reliability is statistical power: the probability of finding a statistically significant result given that the effect exists. Previous evidence suggests that statistical power is low across the field of neuroscience. Our results present a more comprehensive picture of statistical power in neuroscience: on average, studies are indeed underpowered—some very seriously so—but many studies show acceptable or even exemplary statistical power. We show that this heterogeneity in statistical power is common across most subfields in neuroscience. This new, more nuanced picture of statistical power in neuroscience could affect not only scientific understanding, but potentially policy and funding decisions for neuroscience research. PMID:28706080

  19. Genome-wide association study of bipolar disorder in European American and African American individuals.

    PubMed

    Smith, E N; Bloss, C S; Badner, J A; Barrett, T; Belmonte, P L; Berrettini, W; Byerley, W; Coryell, W; Craig, D; Edenberg, H J; Eskin, E; Foroud, T; Gershon, E; Greenwood, T A; Hipolito, M; Koller, D L; Lawson, W B; Liu, C; Lohoff, F; McInnis, M G; McMahon, F J; Mirel, D B; Murray, S S; Nievergelt, C; Nurnberger, J; Nwulia, E A; Paschall, J; Potash, J B; Rice, J; Schulze, T G; Scheftner, W; Panganiban, C; Zaitlen, N; Zandi, P P; Zöllner, S; Schork, N J; Kelsoe, J R

    2009-08-01

    To identify bipolar disorder (BD) genetic susceptibility factors, we conducted two genome-wide association (GWA) studies: one involving a sample of individuals of European ancestry (EA; n=1001 cases; n=1033 controls), and one involving a sample of individuals of African ancestry (AA; n=345 cases; n=670 controls). For the EA sample, single-nucleotide polymorphisms (SNPs) with the strongest statistical evidence for association included rs5907577 in an intergenic region at Xq27.1 (P=1.6 x 10(-6)) and rs10193871 in NAP5 at 2q21.2 (P=9.8 x 10(-6)). For the AA sample, SNPs with the strongest statistical evidence for association included rs2111504 in DPY19L3 at 19q13.11 (P=1.5 x 10(-6)) and rs2769605 in NTRK2 at 9q21.33 (P=4.5 x 10(-5)). We also investigated whether we could provide support for three regions previously associated with BD, and we showed that the ANK3 region replicates in our sample, along with some support for C15Orf53; other evidence implicates BD candidate genes such as SLITRK2. We also tested the hypothesis that BD susceptibility variants exhibit genetic background-dependent effects. SNPs with the strongest statistical evidence for genetic background effects included rs11208285 in ROR1 at 1p31.3 (P=1.4 x 10(-6)), rs4657247 in RGS5 at 1q23.3 (P=4.1 x 10(-6)), and rs7078071 in BTBD16 at 10q26.13 (P=4.5 x 10(-6)). This study is the first to conduct GWA of BD in individuals of AA and suggests that genetic variations that contribute to BD may vary as a function of ancestry.

  20. Orphan therapies: making best use of postmarket data.

    PubMed

    Maro, Judith C; Brown, Jeffrey S; Dal Pan, Gerald J; Li, Lingling

    2014-08-01

    Postmarket surveillance of the comparative safety and efficacy of orphan therapeutics is challenging, particularly when multiple therapeutics are licensed for the same orphan indication. To make best use of product-specific registry data collected to fulfill regulatory requirements, we propose the creation of a distributed electronic health data network among registries. Such a network could support sequential statistical analyses designed to detect early warnings of excess risks. We use a simulated example to explore the circumstances under which a distributed network may prove advantageous. We perform sample size calculations for sequential and non-sequential statistical studies aimed at comparing the incidence of hepatotoxicity following initiation of two newly licensed therapies for homozygous familial hypercholesterolemia. We calculate the sample size savings ratio, or the proportion of sample size saved if one conducted a sequential study as compared to a non-sequential study. Then, using models to describe the adoption and utilization of these therapies, we simulate when these sample sizes are attainable in calendar years. We then calculate the analytic calendar time savings ratio, analogous to the sample size savings ratio. We repeat these analyses for numerous scenarios. Sequential analyses detect effect sizes earlier or at the same time as non-sequential analyses. The most substantial potential savings occur when the market share is more imbalanced (i.e., 90% for therapy A) and the effect size is closest to the null hypothesis. However, due to low exposure prevalence, these savings are difficult to realize within the 30-year time frame of this simulation for scenarios in which the outcome of interest occurs at or more frequently than one event/100 person-years. We illustrate a process to assess whether sequential statistical analyses of registry data performed via distributed networks may prove a worthwhile infrastructure investment for pharmacovigilance.

  1. Anthropometric Characteristics of Columbia, South Carolina, Youth Baseball Players and Dixie Youth World Series Players

    ERIC Educational Resources Information Center

    French, Karen E.; Spurgeon, John H.; Nevett, Michael E.

    2007-01-01

    The purpose of this study was to compare measures of body size in two samples of youth baseball players with normative data from the United States National Center for Health Statistics (NCHS) growth charts. One sample of youth baseball players participated in a local little league. The second sample of youth baseball players were members of eight…

  2. Overweight and Obesity: Prevalence and Correlates in a Large Clinical Sample of Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Zuckerman, Katharine E.; Hill, Alison P.; Guion, Kimberly; Voltolina, Lisa; Fombonne, Eric

    2014-01-01

    Autism Spectrum Disorders (ASDs) and childhood obesity (OBY) are rising public health concerns. This study aimed to evaluate the prevalence of overweight (OWT) and OBY in a sample of 376 Oregon children with ASD, and to assess correlates of OWT and OBY in this sample. We used descriptive statistics, bivariate, and focused multivariate analyses to…

  3. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    PubMed

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.

  4. Office for Civil Rights Survey Redesign: A Feasibility Survey. Contractor Report. Statistical Analysis Report.

    ERIC Educational Resources Information Center

    Mansfield, Wendy; Farris, Elizabeth

    This report provides results of a Fast Response Survey System (FRSS) study conducted by the National Center for Education Statistics for the Office for Civil Rights (OCR). The OCR wanted input for their decision-making process on possible modifications to their biennial survey of a national sample of public school districts (PSDs). The survey, the…

  5. Shoulder strength value differences between genders and age groups.

    PubMed

    Balcells-Diaz, Eudald; Daunis-I-Estadella, Pepus

    2018-03-01

    The strength of a normal shoulder differs according to gender and decreases with age. Therefore, the Constant score, which is a shoulder function measurement tool that allocates 25% of the final score to strength, differs from the absolute values but likely reflects a normal shoulder. To compare group results, a normalized Constant score is needed, and the first step to achieving normalization involves statistically establishing the gender differences and age-related decline. In this investigation, we sought to verify the gender difference and age-related decline in strength. We obtained a randomized representative sample of the general population in a small to medium-sized Spanish city. We then invited this population to participate in our study, and we measured their shoulder strength. We performed a statistical analysis with a power of 80% and a P value < .05. We observed a statistically significant difference between the genders and a statistically significant decline with age. To the best of our knowledge, this is the first investigation to study a representative sample of the general population from which conclusions can be drawn regarding Constant score normalization. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.

  6. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?

    PubMed Central

    2010-01-01

    Background Pseudoreplication occurs when observations are not statistically independent, but treated as if they are. This can occur when there are multiple observations on the same subjects, when samples are nested or hierarchically organised, or when measurements are correlated in time or space. Analysis of such data without taking these dependencies into account can lead to meaningless results, and examples can easily be found in the neuroscience literature. Results A single issue of Nature Neuroscience provided a number of examples and is used as a case study to highlight how pseudoreplication arises in neuroscientific studies, why the analyses in these papers are incorrect, and appropriate analytical methods are provided. 12% of papers had pseudoreplication and a further 36% were suspected of having pseudoreplication, but it was not possible to determine for certain because insufficient information was provided. Conclusions Pseudoreplication can undermine the conclusions of a statistical analysis, and it would be easier to detect if the sample size, degrees of freedom, the test statistic, and precise p-values are reported. This information should be a requirement for all publications. PMID:20074371

  7. Assessment of sampling stability in ecological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.

    1988-01-01

    A simulation study was undertaken to assess the sampling stability of the variable loadings in linear discriminant function analysis. A factorial design was used for the factors of multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. A review of 60 published studies and 142 individual analyses indicated that sample sizes in ecological studies often have met that requirement. However, individual group sample sizes frequently were very unequal, and checks of assumptions usually were not reported. The authors recommend that ecologists obtain group sample sizes that are at least three times as large as the number of variables measured.

  8. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance

    PubMed Central

    2018-01-01

    ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868

  9. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance.

    PubMed

    Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid

    2018-05-01

    To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. Copyright © 2018 Shakeri et al.

  10. Regression modeling of particle size distributions in urban storm water: advancements through improved sample collection methods

    USGS Publications Warehouse

    Fienen, Michael N.; Selbig, William R.

    2012-01-01

    A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.

  11. Optimizing Integrated Terminal Airspace Operations Under Uncertainty

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle; Xue, Min; Zelinski, Shannon

    2014-01-01

    In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed geneticalgorithm- based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine jobshop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-ofconcept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.

  12. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  13. Considerations for the design, analysis and presentation of in vivo studies.

    PubMed

    Ranstam, J; Cook, J A

    2017-03-01

    To describe, explain and give practical suggestions regarding important principles and key methodological challenges in the study design, statistical analysis, and reporting of results from in vivo studies. Pre-specifying endpoints and analysis, recognizing the common underlying assumption of statistically independent observations, performing sample size calculations, and addressing multiplicity issues are important parts of an in vivo study. A clear reporting of results and informative graphical presentations of data are other important parts. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

  14. Geospatial techniques for developing a sampling frame of watersheds across a region

    USGS Publications Warehouse

    Gresswell, Robert E.; Bateman, Douglas S.; Lienkaemper, George; Guy, T.J.

    2004-01-01

    Current land-management decisions that affect the persistence of native salmonids are often influenced by studies of individual sites that are selected based on judgment and convenience. Although this approach is useful for some purposes, extrapolating results to areas that were not sampled is statistically inappropriate because the sampling design is usually biased. Therefore, in recent investigations of coastal cutthroat trout (Oncorhynchus clarki clarki) located above natural barriers to anadromous salmonids, we used a methodology for extending the statistical scope of inference. The purpose of this paper is to apply geospatial tools to identify a population of watersheds and develop a probability-based sampling design for coastal cutthroat trout in western Oregon, USA. The population of mid-size watersheds (500-5800 ha) west of the Cascade Range divide was derived from watershed delineations based on digital elevation models. Because a database with locations of isolated populations of coastal cutthroat trout did not exist, a sampling frame of isolated watersheds containing cutthroat trout had to be developed. After the sampling frame of watersheds was established, isolated watersheds with coastal cutthroat trout were stratified by ecoregion and erosion potential based on dominant bedrock lithology (i.e., sedimentary and igneous). A stratified random sample of 60 watersheds was selected with proportional allocation in each stratum. By comparing watershed drainage areas of streams in the general population to those in the sampling frame and the resulting sample (n = 60), we were able to evaluate the how representative the subset of watersheds was in relation to the population of watersheds. Geospatial tools provided a relatively inexpensive means to generate the information necessary to develop a statistically robust, probability-based sampling design.

  15. Analysis of statistical misconception in terms of statistical reasoning

    NASA Astrophysics Data System (ADS)

    Maryati, I.; Priatna, N.

    2018-05-01

    Reasoning skill is needed for everyone to face globalization era, because every person have to be able to manage and use information from all over the world which can be obtained easily. Statistical reasoning skill is the ability to collect, group, process, interpret, and draw conclusion of information. Developing this skill can be done through various levels of education. However, the skill is low because many people assume that statistics is just the ability to count and using formulas and so do students. Students still have negative attitude toward course which is related to research. The purpose of this research is analyzing students’ misconception in descriptive statistic course toward the statistical reasoning skill. The observation was done by analyzing the misconception test result and statistical reasoning skill test; observing the students’ misconception effect toward statistical reasoning skill. The sample of this research was 32 students of math education department who had taken descriptive statistic course. The mean value of misconception test was 49,7 and standard deviation was 10,6 whereas the mean value of statistical reasoning skill test was 51,8 and standard deviation was 8,5. If the minimal value is 65 to state the standard achievement of a course competence, students’ mean value is lower than the standard competence. The result of students’ misconception study emphasized on which sub discussion that should be considered. Based on the assessment result, it was found that students’ misconception happen on this: 1) writing mathematical sentence and symbol well, 2) understanding basic definitions, 3) determining concept that will be used in solving problem. In statistical reasoning skill, the assessment was done to measure reasoning from: 1) data, 2) representation, 3) statistic format, 4) probability, 5) sample, and 6) association.

  16. Sample size determination for estimating antibody seroconversion rate under stable malaria transmission intensity.

    PubMed

    Sepúlveda, Nuno; Drakeley, Chris

    2015-04-03

    In the last decade, several epidemiological studies have demonstrated the potential of using seroprevalence (SP) and seroconversion rate (SCR) as informative indicators of malaria burden in low transmission settings or in populations on the cusp of elimination. However, most of studies are designed to control ensuing statistical inference over parasite rates and not on these alternative malaria burden measures. SP is in essence a proportion and, thus, many methods exist for the respective sample size determination. In contrast, designing a study where SCR is the primary endpoint, is not an easy task because precision and statistical power are affected by the age distribution of a given population. Two sample size calculators for SCR estimation are proposed. The first one consists of transforming the confidence interval for SP into the corresponding one for SCR given a known seroreversion rate (SRR). The second calculator extends the previous one to the most common situation where SRR is unknown. In this situation, data simulation was used together with linear regression in order to study the expected relationship between sample size and precision. The performance of the first sample size calculator was studied in terms of the coverage of the confidence intervals for SCR. The results pointed out to eventual problems of under or over coverage for sample sizes ≤250 in very low and high malaria transmission settings (SCR ≤ 0.0036 and SCR ≥ 0.29, respectively). The correct coverage was obtained for the remaining transmission intensities with sample sizes ≥ 50. Sample size determination was then carried out for cross-sectional surveys using realistic SCRs from past sero-epidemiological studies and typical age distributions from African and non-African populations. For SCR < 0.058, African studies require a larger sample size than their non-African counterparts in order to obtain the same precision. The opposite happens for the remaining transmission intensities. With respect to the second sample size calculator, simulation unravelled the likelihood of not having enough information to estimate SRR in low transmission settings (SCR ≤ 0.0108). In that case, the respective estimates tend to underestimate the true SCR. This problem is minimized by sample sizes of no less than 500 individuals. The sample sizes determined by this second method highlighted the prior expectation that, when SRR is not known, sample sizes are increased in relation to the situation of a known SRR. In contrast to the first sample size calculation, African studies would now require lesser individuals than their counterparts conducted elsewhere, irrespective of the transmission intensity. Although the proposed sample size calculators can be instrumental to design future cross-sectional surveys, the choice of a particular sample size must be seen as a much broader exercise that involves weighting statistical precision with ethical issues, available human and economic resources, and possible time constraints. Moreover, if the sample size determination is carried out on varying transmission intensities, as done here, the respective sample sizes can also be used in studies comparing sites with different malaria transmission intensities. In conclusion, the proposed sample size calculators are a step towards the design of better sero-epidemiological studies. Their basic ideas show promise to be applied to the planning of alternative sampling schemes that may target or oversample specific age groups.

  17. Statistical characterization of a large geochemical database and effect of sample size

    USGS Publications Warehouse

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompasses 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reasons relate to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests, and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histogram, stem-and-leaf, and probability plots are recommended for rough judgement of probability distribution if needed. ?? 2005 Elsevier Ltd. All rights reserved.

  18. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    PubMed

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situation such as sample size calculation due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. The sample size formulae using a simple proportion of agreement instead of a kappa statistic and nomograms to eliminate the inconvenience of using a mathematical formula were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Surrogate and clinical endpoints for studies in peripheral artery occlusive disease: Are statistics the brakes?

    PubMed

    Waliszewski, Matthias W; Redlich, Ulf; Breul, Victor; Tautenhahn, Jörg

    2017-04-30

    The aim of this review is to present the available clinical and surrogate endpoints that may be used in future studies performed in patients with peripheral artery occlusive disease (PAOD). Importantly, we describe statistical limitations of the most commonly used endpoints and offer some guidance with respect to study design for a given sample size. The proposed endpoints may be used in studies using surgical or interventional revascularization and/or drug treatments. Considering recently published study endpoints and designs, the usefulness of these endpoints for reimbursement is evaluated. Based on these potential study endpoints and patient sample size estimates with different non-inferiority or tests for difference hypotheses, a rating relative to their corresponding reimbursement values is attempted. As regards the benefit for the patients and for the payers, walking distance and the ankle brachial index (ABI) are the most feasible endpoints in a relatively small study samples given that other non-vascular impact factors can be controlled. Angiographic endpoints such as minimal lumen diameter (MLD) do not seem useful from a reimbursement standpoint despite their intuitiveness. Other surrogate endpoints, such as transcutaneous oxygen tension measurements, have yet to be established as useful endpoints in reasonably sized studies with patients with critical limb ischemia (CLI). From a reimbursement standpoint, WD and ABI are effective endpoints for a moderate study sample size given that non-vascular confounding factors can be controlled.

  20. On the Power Functions of Test Statistics in Order Restricted Inference.

    DTIC Science & Technology

    1984-10-01

    California-Davis Actuarial Science Davis, California 95616 The University of Iowa Iowa City, Iowa 52242 *F. T. Wright Department of Mathematics and...34 SUMMARY --We study the power functions of both the likelihood ratio and con- trast statistics for detecting a totally ordered trend in a collection...samples from normal populations, Bartholomew (1959 a,b; 1961) studied the likelihood ratio tests (LRTs) for H0 versus H -H assuming in one case that

  1. FUNSTAT and statistical image representations

    NASA Technical Reports Server (NTRS)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate are outlined. ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  2. Repeated Random Sampling in Year 5

    ERIC Educational Resources Information Center

    Watson, Jane M.; English, Lyn D.

    2016-01-01

    As an extension to an activity introducing Year 5 students to the practice of statistics, the software "TinkerPlots" made it possible to collect repeated random samples from a finite population to informally explore students' capacity to begin reasoning with a distribution of sample statistics. This article provides background for the…

  3. Growth after partial cutting of ponderosa pine on permanent sample plots in eastern Oregon.

    Treesearch

    Edwin L. Mowat

    1961-01-01

    Between the years 1913 and 1938, seven sets of permanent sample plots were established on the Whitman, Malheur, Rogue River, and Deschutes National Forests in eastern and central Oregon to study the results of various methods of selection cutting in old-growth ponderosa pine stands. This report briefly describes these studies and gives statistics on board-foot growth...

  4. Hydrogeochemistry and water quality of the Kordkandi-Duzduzan plain, NW Iran: application of multivariate statistical analysis and PoS index.

    PubMed

    Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos

    2017-08-18

    Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO 3 , Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na + , K + , Ca 2+ , Mg 2+ , SO 4 2- , and Cl - ), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO 3 - , SiO 2 , and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solc, J.

    The reclamation effort typically deals with consequences of mining activity instead of being planned well before the mining. Detailed assessment of principal hydro- and geochemical processes participating in pore and groundwater chemistry evolution was carried out at three surface mine localities in North Dakota-the Fritz mine, the Indian Head mine, and the Velva mine. The geochemical model MINTEQUA2 and advanced statistical analysis coupled with traditional interpretive techniques were used to determine site-specific environmental characteristics and to compare the differences between study sites. Multivariate statistical analysis indicates that sulfate, magnesium, calcium, the gypsum saturation index, and sodium contribute the most tomore » overall differences in groundwater chemistry between study sites. Soil paste extract pH and EC measurements performed on over 3700 samples document extremely acidic soils at the Fritz mine. The number of samples with pH <5.5 reaches 80%-90% of total samples from discrete depth near the top of the soil profile at the Fritz mine. Soil samples from Indian Head and Velva do not indicate the acidity below the pH of 5.5 limit. The percentage of samples with EC > 3 mS cm{sup -1} is between 20% and 40% at the Fritz mine and below 20% for samples from Indian Head and Velva. The results of geochemical modeling indicate an increased tendency for gypsum saturation within the vadose zone, particularly within the lands disturbed by mining activity. This trend is directly associated with increased concentrations of sulfate anions as a result of mineral oxidation. Geochemical modeling, statistical analysis, and soil extract pH and EC measurements proved to be reliable, fast, and relatively cost-effective tools for the assessment of soil acidity, the extent of the oxidation zone, and the potential for negative impact on pore and groundwater chemistry.« less

  6. Artemisia supplementation differentially affects the mucosal and luminal ileal microbiota of diet-induced obese mice

    PubMed Central

    Shawna, Wicks; M., Taylor Christopher; Meng, Luo; Eugene, Blanchard IV; David, Ribnicky; T., Cefalu William; L., Mynatt Randall; A., Welsh David

    2014-01-01

    Objective The gut microbiome has been implicated in obesity and metabolic syndrome; however, most studies have focused on fecal or colonic samples. Several species of Artemisia have been reported to ameliorate insulin signaling both in vitro and in vivo. The aim of this study was to characterize the mucosal and luminal bacterial populations in the terminal ileum with or without supplementation with Artemisia extracts. Materials/Methods Following 4 weeks of supplementation with different Artemisia extracts (PMI 5011, Santa or Scopa), diet-induced obese mice were sacrificed and luminal and mucosal samples of terminal ileum were used to evaluate microbial community composition by pyrosequencing of 16S rDNA hypervariable regions. Results Significant differences in community structure and membership were observed between luminal and mucosal samples, irrespective of diet group. All Artemisia extracts increased the Bacteroidetes:Firmicutes ratio in mucosal samples. This effect was not observed in the luminal compartment. There was high inter-individual variability in the phylogenetic assessments of the ileal microbiota, limiting the statistical power of this pilot investigation. Conclusions Marked differences in bacterial communities exist dependent upon the biogeographic compartment in the terminal ileum. Future studies testing the effects of Artemisia or other botanical supplements require larger sample sizes for adequate statistical power. PMID:24985102

  7. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    PubMed

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed as summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, genotype imputation boasts a 3- to 5-fold lower root-mean-square error, and better distinguishes true associations from null ones: We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

  8. Evaluation and application of summary statistic imputation to discover new height-associated loci

    PubMed Central

    2018-01-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed as summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, genotype imputation boasts a 3- to 5-fold lower root-mean-square error, and better distinguishes true associations from null ones: We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression. PMID:29782485

  9. Association between oral lichen planus and Epstein–Barr virus in Iranian patients

    PubMed Central

    Shariati, Matin; Mokhtari, Mojgan; Masoudifar, Aria

    2018-01-01

    Background: Oral lichen planus (OLP) is a common mucocutaneous disease with malignant transformation potential. Several etiologies such as humoral, autoimmunity, and viral infections might play a role, but still there is no definite etiology for this disease. The aim of this study was to investigate the presence of Epstein–Barr virus (EBV) genome in Iranian patients with OLP as compared to people with normal mucosa. Materials and Methods: The study was carried out on a case group including 38 tissue specimens of patients with histopathological confirmation of OLP and a control group including 38 samples of healthy mucosa. All samples were examined by nested polymerase chain reaction (PCR) method to determine the DNA of EBV. Results: Twenty-two (57.9%) female samples and 16 (42.1%) male samples with OLP were randomly selected as the case group, and 20 (52.6%) female samples and 18 (47.4%) male samples with healthy mucosa as the control group. There was a statistically significant difference in the percentage of EBV positivity between the case (15.8%) and the control groups (P < 0.05); in the case group, three female samples (13.6%) and three male samples (18.8%) were infected with EBV; the difference between the genders was not statistically significant (P = 0.50). Conclusion: Results emphasized that EBV genome was significantly higher among Iranian patients with OLP so antiviral therapy might be helpful. PMID:29692821

  10. Association between oral lichen planus and Epstein-Barr virus in Iranian patients.

    PubMed

    Shariati, Matin; Mokhtari, Mojgan; Masoudifar, Aria

    2018-01-01

    Oral lichen planus (OLP) is a common mucocutaneous disease with malignant transformation potential. Several etiologies such as humoral, autoimmunity, and viral infections might play a role, but still there is no definite etiology for this disease. The aim of this study was to investigate the presence of Epstein-Barr virus (EBV) genome in Iranian patients with OLP as compared to people with normal mucosa. The study was carried out on a case group including 38 tissue specimens of patients with histopathological confirmation of OLP and a control group including 38 samples of healthy mucosa. All samples were examined by nested polymerase chain reaction (PCR) method to determine the DNA of EBV. Twenty-two (57.9%) female samples and 16 (42.1%) male samples with OLP were randomly selected as the case group, and 20 (52.6%) female samples and 18 (47.4%) male samples with healthy mucosa as the control group. There was a statistically significant difference in the percentage of EBV positivity between the case (15.8%) and the control groups ( P < 0.05); in the case group, three female samples (13.6%) and three male samples (18.8%) were infected with EBV; the difference between the genders was not statistically significant ( P = 0.50). Results emphasized that EBV genome was significantly higher among Iranian patients with OLP so antiviral therapy might be helpful.

  11. Field comparison of analytical results from discrete-depth ground water samplers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemo, D.A.; Delfino, T.A.; Gallinatti, J.D.

    1995-07-01

    Discrete-depth ground water samplers are used during environmental screening investigations to collect ground water samples in lieu of installing and sampling monitoring wells. Two of the most commonly used samplers are the BAT Enviroprobe and the QED HydroPunch I, which rely on differing sample collection mechanics. Although these devices have been on the market for several years, it was unknown what, if any, effect the differences would have on analytical results for ground water samples containing low to moderate concentrations of chlorinated volatile organic compounds (VOCs). This study investigated whether the discrete-depth ground water sampler used introduces statistically significant differencesmore » in analytical results. The goal was to provide a technical basis for allowing the two devices to be used interchangeably during screening investigations. Because this study was based on field samples, it included several sources of potential variability. It was necessary to separate differences due to sampler type from variability due to sampling location, sample handling, and laboratory analytical error. To statistically evaluate these sources of variability, the experiment was arranged in a nested design. Sixteen ground water samples were collected from eight random locations within a 15-foot by 15-foot grid. The grid was located in an area where shallow ground water was believed to be uniformly affected by VOCs. The data were evaluated using analysis of variance.« less

  12. Americium-241 in surface soil associated with the Hanford site and vicinity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Price, K.R.; Gilbert, R.O.; Gano, K.A.

    1981-05-01

    Various kinds of surface soil samples were collected and analyzed for Americium-241 (/sup 241/Am) to examine the feasibility of improving soil sample data for the Hanford Surface Environmental Surveillance Program. Results do not indicate that a major improvement would occur if procedures were changed from the current practices. Conclusions from this study are somewhat tempered by the very low levels of /sup 241/Am (< 0.10 pCi/g dry weight) detected in surface soil samples and by the fact that statistical significance depended on the type of statistical tests used. In general, the average concentration of /sup 241/Am in soil crust (0more » to 1.0 cm deep) was greater than the corresponding subsurface layer (1.0 to 2.5 cm deep), and the average concentration of /sup 241/Am in some onsite samples collected near the PUREX facility was greater than comparable samples collected 60 km upwind at an offsite location.« less

  13. An analysis of the cognitive deficit of schizophrenia based on the Piaget developmental theory.

    PubMed

    Torres, Alejandro; Olivares, Jose M; Rodriguez, Angel; Vaamonde, Antonio; Berrios, German E

    2007-01-01

    The objective of the study was to evaluate from the perspective of the Piaget developmental model the cognitive functioning of a sample of patients diagnosed with schizophrenia. Fifty patients with schizophrenia (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) and 40 healthy matched controls were evaluated by means of the Longeot Logical Thought Evaluation Scale. Only 6% of the subjects with schizophrenia reached the "formal period," and 70% remained at the "concrete operations" stage. The corresponding figures for the control sample were 25% and 15%, respectively. These differences were statistically significant. The samples were specifically differentiable on the permutation, probabilities, and pendulum tests of the scale. The Longeot Logical Thought Evaluation Scale can discriminate between subjects with schizophrenia and healthy controls.

  14. Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 Peer Groups Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Cirino, Anna Marie

    Comparative financial information, derived from two national surveys of 503 public two-year colleges, is presented in this report for fiscal year (FY) 1990-91. The report includes statistics for the national sample and six peer groups, space for colleges to compare their institutional statistics with national and peer groups, and tables, bar…

  15. The theoretical and experimental study of a material structure evolution in gigacyclic fatigue regime

    NASA Astrophysics Data System (ADS)

    Plekhov, Oleg; Naimark, Oleg; Narykova, Maria; Kadomtsev, Andrey; Betekhtin, Vladimir

    2015-10-01

    The work is devoted to the study of the metal structure evolution under gigacyclic fatigue (VHCF) regime. The study of the mechanical properties of the samples (Armco iron) with different state of life time existing was carried out on the base of the acoustic resonance method. The damage accumulation (porosity of the samples) was studied by the hydrostatic weighing method. A statistical model of damage accumulation was proposed in order to describe the damage accumulation process. The model describes the influence of the sample surface on the location of fatigue crack initiation.

  16. Association of Blastocystis subtypes with diarrhea in children

    NASA Astrophysics Data System (ADS)

    Zulfa, F.; Sari, I. P.; Kurniawan, A.

    2017-08-01

    Blastocystis hominis is an intestinal zoonotic protozoa that epidemiological surveys have shown, is highly prevalent among children and may cause chronic diarrhea. This study aimed to identify Blastocystis subtypes among children and associate those subtypes to pathology. The study’s population was children aged 6-12 years old divided into asymptomatic and symptomatic (diarrhea) groups. The asymptomatic samples were obtained from primary school students in the Bukit Duri area of South Jakarta, while the symptomatic samples were obtained from patients who visited nearby primary health centers (Puskesmas). Symptomatic stool samples were examined inParasitology Laboratory FKUI. Microscopic examination of the stool samples was performed to screen for single Blastocystic infection, followed by culture, PCR of 18S rRNA, and sequencing. In the study, 53.2% of children (n = 156) harbored intestinal parasites, Blastocysts sp. A single infection of Blastocystis sp. was present in 69 (44.23%) samples, comprised of 36 symptomatic and 33 asymptomatic participants. The Blastocystis subtypes (STs) identified in this study were STs 1-4 ST3 was the most dominant and was observed with statistically significant higher frequency in the symptomatic group. ST4 was only found in one sample in the symptomatic group. While ST1 and ST2 were found more frequently in the asymptomatic group, no statistical association was observed. ST3 is more likely to be associated with clinical symptoms than ST1 and ST2.

  17. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.

  18. A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES.

    PubMed

    Zhou, Xiang

    2017-12-01

    Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z -scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.

  19. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…

  20. An Investigation of Sample Size Splitting on ATFIND and DIMTEST

    ERIC Educational Resources Information Center

    Socha, Alan; DeMars, Christine E.

    2013-01-01

    Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

  1. Science Achievement and Students' Self-Confidence and Interest in Science: A Taiwanese Representative Sample Study

    ERIC Educational Resources Information Center

    Chang, Chun-Yen; Cheng, Wei-Ying

    2008-01-01

    The interrelationship between senior high school students' science achievement (SA) and their self-confidence and interest in science (SCIS) was explored with a representative sample of approximately 1,044 11th-grade students from 30 classes attending four high schools throughout Taiwan. Statistical analyses indicated that a statistically…

  2. Chemical studies of H chondrites. II - Weathering effects in the Victoria Land, Antarctic population and comparison of two Antarctic populations with non-Antarctic falls

    NASA Astrophysics Data System (ADS)

    Dennison, J. E.; Lipschutz, M. E.

    1987-03-01

    The authors report RNAA data for 14 siderophile, lithophile and chalcophile volatile/mobile trace elements in interior portions of 45 different H4-6 chondrites (49 samples) from Victoria Land, Antarctica and 5 H5 chondrites from the Yamato Mts., Antarctica. Relative to H5 chondrites of weathering types A and B, all elements are depleted (10 at statistically significant levels) in extensively weathered (types B/C and C) samples. Chondrites of weathering types A and B seem compositionally uncompromised and as useful as contemporary falls for trace-element studies. When data distributions for these 14 trace elements in non-Antarctic H chondrite falls and unpaired samples from Victoria Land and from the Yamato Mts. (Queen Maud Land) are compared statistically, numerous significant differences are apparent. These and other differences give ample cause to doubt that the various sample populations derive from the same parent population. The observed differences do no reflect weathering, chance or other trivial causes: a preterrestrial source must be responsible.

  3. Availability of a fetal goat tongue cell line ZZ-R 127 for isolation of Foot-and-mouth disease virus (FMDV) from clinical samples collected from animals experimentally infected with FMDV.

    PubMed

    Fukai, Katsuhiko; Onozato, Hiroyuki; Kitano, Rie; Yamazoe, Reiko; Morioka, Kazuki; Yamada, Manabu; Ohashi, Seiichi; Yoshida, Kazuo; Kanno, Toru

    2013-11-01

    The availability of the fetal goat tongue cell line ZZ-R 127 for the isolation of Foot-and-mouth disease virus (FMDV) has not been evaluated using clinical samples other than epithelial suspensions. Therefore, in the current study, the availability of ZZ-R 127 cells for the isolation of FMDV was evaluated using clinical samples (e.g., sera, nasal swabs, saliva, feces, and oropharyngeal fluids) collected from animals experimentally infected with an FMDV isolate. Virus isolation rates for the ZZ-R 127 cells were statistically higher than those for the porcine kidney cell line (IB-RS-2) in experimental infections using cattle, goats, and pigs (P < 0.01). Virus titers in the ZZ-R 127 cells were also statistically higher than those in the IB-RS-2 cells. The availability of ZZ-R 127 cells for the isolation of FMDV not only from epithelial suspensions but also from other clinical samples was confirmed in the current study.

  4. Statistical modelling as an aid to the design of retail sampling plans for mycotoxins in food.

    PubMed

    MacArthur, Roy; MacDonald, Susan; Brereton, Paul; Murray, Alistair

    2006-01-01

    A study has been carried out to assess appropriate statistical models for use in evaluating retail sampling plans for the determination of mycotoxins in food. A compound gamma model was found to be a suitable fit. A simulation model based on the compound gamma model was used to produce operating characteristic curves for a range of parameters relevant to retail sampling. The model was also used to estimate the minimum number of increments necessary to minimize the overall measurement uncertainty. Simulation results showed that measurements based on retail samples (for which the maximum number of increments is constrained by cost) may produce fit-for-purpose results for the measurement of ochratoxin A in dried fruit, but are unlikely to do so for the measurement of aflatoxin B1 in pistachio nuts. In order to produce a more accurate simulation, further work is required to determine the degree of heterogeneity associated with batches of food products. With appropriate parameterization in terms of physical and biological characteristics, the systems developed in this study could be applied to other analyte/matrix combinations.

  5. Gender differences in a clinic-referred sample of Taiwanese attention-deficit/hyperactivity disorder children.

    PubMed

    Yang, Pinchen; Jong, Yuh-Jyh; Chung, Li-Chen; Chen, Cheng-Sheng

    2004-12-01

    The purpose of this study was to examine gender differences within a clinic-referred sample of 6-11-year-old Taiwanese children with attention-deficit/hyperactivity disorder (ADHD)- combined subtype. The subjects were 21 girls with a diagnosis of ADHD from the 4th edition of the Diagnostic and Statistical Manual and 21 age-matched boys with ADHD. Comparisons were made of behavioral ratings, cognitive profiles, and vigilance/attention assessments between these two groups. The results found ADHD girls and ADHD boys to be statistically indistinguishable on nearly all measures except the subtests of block design (P = 0.016), the discrepancy between Performance Intelligence Quotient and Verbal Intelligence Quotient (P = 0.019), and the discrepancy between fluid and crystallized IQ (P = 0.041). In the study samples, ADHD girls and ADHD boys were strikingly similar on a wide range of measures. ADHD boys and girls in clinics may be expected to show more similarities than differences in treatment needs. However, these results should be interpreted with caution since data were only from clinic-referred samples.

  6. 42 CFR 1003.109 - Notice of proposed determination.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... briefly describe the statistical sampling technique utilized by the Inspector General); (3) The reason why... statistical sampling in accordance with § 1003.133 in which case the notice shall describe those claims and...

  7. 11 CFR 9036.4 - Commission review of submissions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ..., in conducting its review, may utilize statistical sampling techniques. Based on the results of its... nonmatchable and the reason that it is not matchable; or if statistical sampling is used, the estimated amount...

  8. ENVIRONMENTAL SAMPLING: A BRIEF REVIEW

    EPA Science Inventory

    Proper application of statistical principles at the outset of an environmental study can make the difference between an effective, efficient study and wasted resources. This review distills some of the thoughts current among environmental scientists from a variety of backgrounds ...

  9. Fracture resistance of Kevlar-reinforced poly(methyl methacrylate) resin: a preliminary study.

    PubMed

    Berrong, J M; Weed, R M; Young, J M

    1990-01-01

    The reinforcing effect of Kevlar fibers incorporated in processed poly(methyl methacrylate) resin samples was studied using 0% (controls), 0.5%, 1%, and 2% by weight of the added fibers. The samples were subjected to impact testing to determine fracture resistance, and sample groups were statistically compared using an ANOVA. Each reinforced sample had significantly greater fracture resistance (P less than 0.05) than the control, and no difference was found either within or between control groups. The use of reinforcing Kevlar fibers appears to enhance the fracture resistance of acrylic resin denture base materials.

  10. Is psychology suffering from a replication crisis? What does "failure to replicate" really mean?

    PubMed

    Maxwell, Scott E; Lau, Michael Y; Howard, George S

    2015-09-01

    Psychology has recently been viewed as facing a replication crisis because efforts to replicate past study findings frequently do not show the same result. Often, the first study showed a statistically significant result but the replication does not. Questions then arise about whether the first study results were false positives, and whether the replication study correctly indicates that there is truly no effect after all. This article suggests these so-called failures to replicate may not be failures at all, but rather are the result of low statistical power in single replication studies, and the result of failure to appreciate the need for multiple replications in order to have enough power to identify true effects. We provide examples of these power problems and suggest some solutions using Bayesian statistics and meta-analysis. Although the need for multiple replication studies may frustrate those who would prefer quick answers to psychology's alleged crisis, the large sample sizes typically needed to provide firm evidence will almost always require concerted efforts from multiple investigators. As a result, it remains to be seen how many of the recently claimed failures to replicate will be supported or instead may turn out to be artifacts of inadequate sample sizes and single study replications. (PsycINFO Database Record (c) 2015 APA, all rights reserved).

  11. Structural parameters of young star clusters: fractal analysis

    NASA Astrophysics Data System (ADS)

    Hetem, A.

    2017-07-01

    A unified view of star formation in the Universe demand detailed and in-depth studies of young star clusters. This work is related to our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapsed; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated to the dynamical evolution and age. The technique of error bars estimation most used in the literature is to adopt inferential methods (like bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters, in order to enhance the investigation of the cluster properties and dynamic evolution. The structural parameters were compared with fractal statistics and reveal that the clusters radial density profile show a tendency of the mean separation of the stars increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamic evolution, since the entire sample was revealed as being expanding objects, for which the substructures do not seem to have been completely erased. These results are in agreement with the simulations adopting low surface densities and supervirial conditions.

  12. Validation of Marek's disease diagnosis and monitoring of Marek's disease vaccines from samples collected in FTA cards.

    PubMed

    Cortes, Aneg L; Montiel, Enrique R; Gimeno, Isabel M

    2009-12-01

    The use of Flinders Technology Associates (FTA) filter cards to quantify Marek's disease virus (MDV) DNA for the diagnosis of Marek's disease (MD) and to monitor MD vaccines was evaluated. Samples of blood (43), solid tumors (14), and feather pulp (FP; 36) collected fresh and in FTA cards were analyzed. MDV DNA load was quantified by real-time PCR. Threshold cycle (Ct) ratios were calculated for each sample by dividing the Ct value of the internal control gene (glyceraldehyde-3-phosphate dehydrogenase) by the Ct value of the MDV gene. Statistically significant correlation (P < 0.05) within Ct ratios was detected between samples collected fresh and in FTA cards by using Pearson's correlation test. Load of serotype 1 MDV DNA was quantified in 24 FP, 14 solid tumor, and 43 blood samples. There was a statistically significant correlation between FP (r = 0.95), solid tumor (r = 0.94), and blood (r = 0.9) samples collected fresh and in FTA cards. Load of serotype 2 MDV DNA was quantified in 17 FP samples, and the correlation between samples collected fresh and in FTA cards was also statistically significant (Pearson's coefficient, r = 0.96); load of serotype 3 MDV DNA was quantified in 36 FP samples, and correlation between samples taken fresh and in FTA cards was also statistically significant (r = 0.84). MDV DNA samples extracted 3 days (t0) and 8 months after collection (t1) were used to evaluate the stability of MDV DNA in archived samples collected in FTA cards. A statistically significant correlation was found for serotype 1 (r = 0.96), serotype 2 (r = 1), and serotype 3 (r = 0.9). The results show that FTA cards are an excellent media to collect, transport, and archive samples for MD diagnosis and to monitor MD vaccines. In addition, FTA cards are widely available, inexpensive, and adequate for the shipment of samples nationally and internationally.

  13. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    PubMed Central

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885

  14. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    PubMed

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling location is a crucial decision-making process for rationalization of monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted using modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites were obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders approach outperforms FA/PCA when limited water quality and extensive watershed information is available. The available water quality dataset is limited and FA/PCA-based approach fails to identify monitoring locations with higher variation, as these multivariate statistical approaches are data-driven. The priority/hierarchy and number of sampling sites designed by modified Sanders approach are well justified by the land use practices and observed river basin characteristics of the study area.

  15. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  16. An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals.

    PubMed

    Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L

    2012-04-25

    The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.

  17. Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

    PubMed

    Webster, R J; Williams, A; Marchetti, F; Yauk, C L

    2018-07-01

    Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  18. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061

  19. Occurrence, distribution, and transport of pesticides in agricultural irrigation-return flow from four drainage basins in the Columbia Basin Project, Washington, 2002-04, and comparison with historical data

    USGS Publications Warehouse

    Wagner, Richard J.; Frans, Lonna M.; Huffman, Raegan L.

    2006-01-01

    Water-quality samples were collected from sites in four irrigation return-flow drainage basins in the Columbia Basin Project from July 2002 through October 2004. Ten samples were collected throughout the irrigation season (generally April through October) and two samples were collected during the non-irrigation season. Samples were analyzed for temperature, pH, specific conductance, dissolved oxygen, major ions, trace elements, nutrients, and a suite of 107 pesticides and pesticide metabolites (pesticide transformation products) and to document the occurrence, distribution, and pesticides transport and pesticide metabolites. The four drainage basins vary in size from 19 to 710 square miles. Percentage of agricultural cropland ranges from about 35 percent in Crab Creek drainage basin to a maximum of 75 percent in Lind Coulee drainage basin. More than 95 percent of cropland in Red Rock Coulee, Crab Creek, and Sand Hollow drainage basins is irrigated, whereas only 30 percent of cropland in Lind Coulee is irrigated. Forty-two pesticides and five metabolites were detected in samples from the four irrigation return-flow drainage basins. The most compounds detected were in samples from Sand Hollow with 37, followed by Lind Coulee with 33, Red Rock Coulee with 30, and Crab Creek with 28. Herbicides were the most frequently detected pesticides, followed by insecticides, metabolites, and fungicides. Atrazine, bentazon, diuron, and 2,4-D were the most frequently detected herbicides and chlorpyrifos and azinphos-methyl were the most frequently detected insecticides. A statistical comparison of pesticide concentrations in surface-water samples collected in the mid-1990s at Crab Creek and Sand Hollow with those collected in this study showed a statistically significant increase in concentrations for diuron and a statistically significant decrease for ethoprophos and atrazine in Crab Creek. Statistically significant increases were in concentrations of bromacil, diuron, and pendimethalin at Sand Hollow and statistically significant decreases were in concentrations of 2,6-diethylanaline, alachlor, atrazine, DCPA, and EPTC. A seasonal Kendall trend test on data from Lind Coulee indicated no statistically significant trends for any pesticide for 1994 through 2004. A comparison of pesticide concentrations detected in this study with those detected in previous U.S. Geological Survey National Water-Quality Assessment studies of the Central Columbia Plateau, Yakima River basin, and national agricultural studies indicated that concentrations in this study generally were in the middle to lower end of the concentration spectrum for the most frequently detected herbicides and insecticides, but that the overall rate of detection was near the high end. Thirty-one of the 42 herbicides, insecticides, and fungicides detected in surface-water samples were applied to the major agricultural crops in the drainage basins, and 11 of the detected pesticides are sold for residential application. Eight of the pesticides detected in surface-water samples were not reported as having any agricultural or residential use. The overall pattern of pesticide use depends on which crops are grown in each drainage basin. Drainage basins with predominantly more orchards have higher amounts of insecticides applied, whereas basins with larger percentages of field crops tend to have more herbicides applied. Pesticide usage was most similar in Crab Creek and Sand Hollow, where the largest total amounts applied were the insecticides azinphos-methyl, carbaryl, and chlorpyrifos and the herbicide EPTC. In Red Rock Coulee basin, DCPA was the most heavily applied herbicide, followed by the fungicide chlorothalonil, the herbicide EPTC, and the insecticides chlorpyrifos and azinphos-methyl. In Lind Coulee, which has a large percentage of dryland agricultural area, the herbicides 2,4-D and EPTC were applied in the largest amount, followed by the fungicide chlorothalonil. The

  20. Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains

    NASA Astrophysics Data System (ADS)

    Cofré, Rodrigo; Maldonado, Cesar

    2018-01-01

    We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.

  1. Statistical properties of relative weight distributions of four salmonid species and their sampling implications

    USGS Publications Warehouse

    Hyatt, M.W.; Hubert, W.A.

    2001-01-01

    We assessed relative weight (Wr) distributions among 291 samples of stock-to-quality-length brook trout Salvelinus fontinalis, brown trout Salmo trutta, rainbow trout Oncorhynchus mykiss, and cutthroat trout O. clarki from lentic and lotic habitats. Statistics describing Wr sample distributions varied slightly among species and habitat types. The average sample was leptokurtotic and slightly skewed to the right with a standard deviation of about 10, but the shapes of Wr distributions varied widely among samples. Twenty-two percent of the samples had nonnormal distributions, suggesting the need to evaluate sample distributions before applying statistical tests to determine whether assumptions are met. In general, our findings indicate that samples of about 100 stock-to-quality-length fish are needed to obtain confidence interval widths of four Wr units around the mean. Power analysis revealed that samples of about 50 stock-to-quality-length fish are needed to detect a 2% change in mean Wr at a relatively high level of power (beta = 0.01, alpha = 0.05).

  2. Making Sense of 'Big Data' in Provenance Studies

    NASA Astrophysics Data System (ADS)

    Vermeesch, P.

    2014-12-01

    Huge online databases can be 'mined' to reveal previously hidden trends and relationships in society. One could argue that sedimentary geology has entered a similar era of 'Big Data', as modern provenance studies routinely apply multiple proxies to dozens of samples. Just like the Internet, sedimentary geology now requires specialised statistical tools to interpret such large datasets. These can be organised on three levels of progressively higher order:A single sample: The most effective way to reveal the provenance information contained in a representative sample of detrital zircon U-Pb ages are probability density estimators such as histograms and kernel density estimates. The widely popular 'probability density plots' implemented in IsoPlot and AgeDisplay compound analytical uncertainty with geological scatter and are therefore invalid.Several samples: Multi-panel diagrams comprising many detrital age distributions or compositional pie charts quickly become unwieldy and uninterpretable. For example, if there are N samples in a study, then the number of pairwise comparisons between samples increases quadratically as N(N-1)/2. This is simply too much information for the human eye to process. To solve this problem, it is necessary to (a) express the 'distance' between two samples as a simple scalar and (b) combine all N(N-1)/2 such values in a single two-dimensional 'map', grouping similar and pulling apart dissimilar samples. This can be easily achieved using simple statistics-based dissimilarity measures and a standard statistical method called Multidimensional Scaling (MDS).Several methods: Suppose that we use four provenance proxies: bulk petrography, chemistry, heavy minerals and detrital geochronology. This will result in four MDS maps, each of which likely show slightly different trends and patterns. To deal with such cases, it may be useful to use a related technique called 'three way multidimensional scaling'. This results in two graphical outputs: an MDS map, and a map with 'weights' showing to what extent the different provenance proxies influence the horizontal and vertical axis of the MDS map. Thus, detrital data can not only inform the user about the provenance of sediments, but also about the causal relationships between the mineralogy, geochronology and chemistry.

  3. Statistical Literacy and Sample Survey Results

    ERIC Educational Resources Information Center

    McAlevey, Lynn; Sullivan, Charles

    2010-01-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In…

  4. Data processing of qualitative results from an interlaboratory comparison for the detection of “Flavescence dorée” phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology

    PubMed Central

    Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of “Flavescence dorée” (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes’ theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods. PMID:28384335

  5. Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

    PubMed

    Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes' theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods.

  6. Performance of digital RGB reflectance color extraction for plaque lesion

    NASA Astrophysics Data System (ADS)

    Hashim, Hadzli; Taib, Mohd Nasir; Jailani, Rozita; Sulaiman, Saadiah; Baba, Roshidah

    2005-01-01

    Several clinical psoriasis lesion groups are been studied for digital RGB color features extraction. Previous works have used samples size that included all the outliers lying beyond the standard deviation factors from the peak histograms. This paper described the statistical performances of the RGB model with and without removing these outliers. Plaque lesion is experimented with other types of psoriasis. The statistical tests are compared with respect to three samples size; the original 90 samples, the first size reduction by removing outliers from 2 standard deviation distances (2SD) and the second size reduction by removing outliers from 1 standard deviation distance (1SD). Quantification of data images through the normal/direct and differential of the conventional reflectance method is considered. Results performances are concluded by observing the error plots with 95% confidence interval and findings of the inference T-tests applied. The statistical tests outcomes have shown that B component for conventional differential method can be used to distinctively classify plaque from the other psoriasis groups in consistent with the error plots finding with an improvement in p-value greater than 0.5.

  7. Comparative evaluation of the accuracy of linear measurements between cone beam computed tomography and 3D microtomography.

    PubMed

    Mangione, Francesca; Meleo, Deborah; Talocco, Marco; Pecci, Raffaella; Pacifici, Luciano; Bedini, Rossella

    2013-01-01

    The aim of this study was to evaluate the influence of artifacts on the accuracy of linear measurements estimated with a common cone beam computed tomography (CBCT) system used in dental clinical practice, by comparing it with microCT system as standard reference. Ten bovine bone cylindrical samples containing one implant each, able to provide both points of reference and image quality degradation, have been scanned by CBCT and microCT systems. Thanks to the software of the two systems, for each cylindrical sample, two diameters taken at different levels, by using implants different points as references, have been measured. Results have been analyzed by ANOVA and a significant statistically difference has been found. Due to the obtained results, in this work it is possible to say that the measurements made with the two different instruments are still not statistically comparable, although in some samples were obtained similar performances and therefore not statistically significant. With the improvement of the hardware and software of CBCT systems, in the near future the two instruments will be able to provide similar performances.

  8. A method for determining the weak statistical stationarity of a random process

    NASA Technical Reports Server (NTRS)

    Sadeh, W. Z.; Koper, C. A., Jr.

    1978-01-01

    A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.

  9. Impact of Rating Scale Categories on Reliability and Fit Statistics of the Malay Spiritual Well-Being Scale using Rasch Analysis.

    PubMed

    Daher, Aqil Mohammad; Ahmad, Syed Hassan; Winn, Than; Selamat, Mohd Ikhsan

    2015-01-01

    Few studies have employed the item response theory in examining reliability. We conducted this study to examine the effect of Rating Scale Categories (RSCs) on the reliability and fit statistics of the Malay Spiritual Well-Being Scale, employing the Rasch model. The Malay Spiritual Well-Being Scale (SWBS) with the original six; three and four newly structured RSCs was distributed randomly among three different samples of 50 participants each. The mean age of respondents in the three samples ranged between 36 and 39 years old. The majority was female in all samples, and Islam was the most prevalent religion among the respondents. The predominating race was Malay, followed by Chinese and Indian. The original six RSCs indicated better targeting of 0.99 and smallest model error of 0.24. The Infit Mnsq (mean square) and Zstd (Z standard) of the six RSCs were "1.1"and "-0.1"respectively. The six RSCs achieved the highest person and item reliabilities of 0.86 and 0.85 respectively. These reliabilities yielded the highest person (2.46) and item (2.38) separation indices compared to other the RSCs. The person and item reliability and, to a lesser extent, the fit statistics, were better with the six RSCs compared to the four and three RSCs.

  10. Baldwin Effect and Additional BLR Component in AGN with Superluminal Jets

    NASA Astrophysics Data System (ADS)

    Patiño Álvarez, Víctor; Torrealba, Janet; Chavushyan, Vahram; Cruz González, Irene; Arshakian, Tigran; León Tavares, Jonathan; Popovic, Luka

    2016-06-01

    We study the Baldwin Effect (BE) in 96 core-jet blazars with optical and ultraviolet spectroscopic data from a radio-loud AGN sample obtained from the MOJAVE 2cm survey. A statistical analysis is presented of the equivalent widths W_lambda of emission lines H beta 4861, Mg II 2798, C IV 1549, and continuum luminosities at 5100, 3000, and 1350 angstroms. The BE is found statistically significant (with confidence level c.l. > 95%) in H beta and C IV emission lines, while for Mg II the trend is slightly less significant (c.l. = 94.5%). The slopes of the BE in the studied samples for H beta and Mg II are found steeper and with statistically significant difference than those of a comparison radio-quiet sample. We present simulations of the expected BE slopes produced by the contribution to the total continuum of the non-thermal boosted emission from the relativistic jet, and by variability of the continuum components. We find that the slopes of the BE between radio-quiet and radio-loud AGN should not be different, under the assumption that the broad line is only being emitted by the canonical broad line region around the black hole. We discuss that the BE slope steepening in radio AGN is due to a jet associated broad-line region.

  11. [Evaluation of the quality of Anales Españoles de Pediatría versus Medicina Clínica].

    PubMed

    Bonillo Perales, A

    2002-08-01

    To compare the scientific methodology and quality of articles published in Anales Españoles de Pediatría and Medicina Clínica. A stratified and randomized selection of 40 original articles published in 2001 in Anales Españoles de Pediatría and Medicina Clínica was made. Methodological errors in the critical analysis of original articles (21 items), epidemiological design, sample size, statistical complexity and levels of scientific evidence in both journals were compared using the chi-squared and/or Student's t-test. No differences were found between Anales Españoles de Pediatría and Medicina Clínica in the critical evaluation of original articles (p > 0.2). In original articles published in Anales Españoles de Pediatría, the designs were of lower scientific evidence (a lower proportion of clinical trials, cohort and case-control studies) (17.5 vs 42.5 %, p 0.05), sample sizes were smaller (p 0.003) and there was less statistical complexity in the results section (p 0.03). To improve the scientific quality of Anales Españoles de Pediatría, improved study designs, larger sample sizes and greater statistical complexity are required in its articles.

  12. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offering further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.

  13. 42 CFR 402.7 - Notice of proposed determination.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... and a brief description of the statistical sampling technique CMS or OIG used. (3) The reason why the... is relying upon statistical sampling to project the number and types of claims or requests for...

  14. Normal Approximations to the Distributions of the Wilcoxon Statistics: Accurate to What "N"? Graphical Insights

    ERIC Educational Resources Information Center

    Bellera, Carine A.; Julien, Marilyse; Hanley, James A.

    2010-01-01

    The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…

  15. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  16. Characterizations of linear sufficient statistics

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.

    1977-01-01

    A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.

  17. Sampling designs for HIV molecular epidemiology with application to Honduras.

    PubMed

    Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I

    2005-11-01

    Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.

  18. Sampling in Developmental Science: Situations, Shortcomings, Solutions, and Standards.

    PubMed

    Bornstein, Marc H; Jager, Justin; Putnick, Diane L

    2013-12-01

    Sampling is a key feature of every study in developmental science. Although sampling has far-reaching implications, too little attention is paid to sampling. Here, we describe, discuss, and evaluate four prominent sampling strategies in developmental science: population-based probability sampling, convenience sampling, quota sampling, and homogeneous sampling. We then judge these sampling strategies by five criteria: whether they yield representative and generalizable estimates of a study's target population, whether they yield representative and generalizable estimates of subsamples within a study's target population, the recruitment efforts and costs they entail, whether they yield sufficient power to detect subsample differences, and whether they introduce "noise" related to variation in subsamples and whether that "noise" can be accounted for statistically. We use sample composition of gender, ethnicity, and socioeconomic status to illustrate and assess the four sampling strategies. Finally, we tally the use of the four sampling strategies in five prominent developmental science journals and make recommendations about best practices for sample selection and reporting.

  19. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.

  20. [Study on correlation between ITS sequence of Arctium lappa and quality of Fructus Arctii].

    PubMed

    Xu, Liang; Dou, Deqiang; Wang, Bing; Yang, Yanyun; Kang, Tingguo

    2011-07-01

    To study the correlation between ITS sequence of Arctium lappa and Fructus Arctii quality of different origin. The samples of Fructu arctii materials were collected from 26 different producing areas. Their ITS sequence were determined after polymerase chain reaction (PCR) and quality were evaluated through the determination of arctiin content by HPLC. Genetic diversity, genotype and correlation were analyzed by ClustalX (1.81), Mage 4.0, SPSS 13.0 statistical software. ITS sequence of A. was obtained from 26 samples, and was registered in the GenBank. Corresponding arctiin content of Fructus arctii and 1000-grain weight were determined. A. lappa genotype correlated with Fructus arctii quality by statistical analysis. The research provided a foundation for revealing the molecular mechanism of Fructus arctii geoherbs.

  1. Bovine origin Staphylococcus aureus: A new zoonotic agent?

    PubMed

    Rao, Relangi Tulasi; Jayakumar, Kannan; Kumar, Pavitra

    2017-10-01

    The study aimed to assess the nature of animal origin Staphylococcus aureus strains. The study has zoonotic importance and aimed to compare virulence between two different hosts, i.e., bovine and ovine origin. Conventional polymerase chain reaction-based methods used for the characterization of S. aureus strains and chick embryo model employed for the assessment of virulence capacity of strains. All statistical tests carried on R program, version 3.0.4. After initial screening and molecular characterization of the prevalence of S. aureus found to be 42.62% in bovine origin samples and 28.35% among ovine origin samples. Meanwhile, the methicillin-resistant S. aureus prevalence is found to be meager in both the hosts. Among the samples, only 6.8% isolates tested positive for methicillin resistance. The biofilm formation quantified and the variation compared among the host. A Welch two-sample t -test found to be statistically significant, t=2.3179, df=28.103, and p=0.02795. Chicken embryo model found effective to test the pathogenicity of the strains. The study helped to conclude healthy bovines can act as S. aureus reservoirs. Bovine origin S. aureus strains are more virulent than ovine origin strains. Bovine origin strains have high probability to become zoonotic pathogen. Further, gene knock out studies may be conducted to conclude zoonocity of the bovine origin strains.

  2. Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications

    NASA Astrophysics Data System (ADS)

    Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.

    2015-12-01

    In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.

  3. Influence of light curing and sample thickness on microhardness of a composite resin

    PubMed Central

    Aguiar, Flávio HB; Andrade, Kelly RM; Leite Lima, Débora AN; Ambrosano, Gláucia MB; Lovadino, José R

    2009-01-01

    The aim of this in vitro study was to evaluate the influence of light-curing units and different sample thicknesses on the microhardness of a composite resin. Composite resin specimens were randomly prepared and assigned to nine experimental groups (n = 5): considering three light-curing units (conventional quartz tungsten halogen [QTH]: 550 mW/cm2 – 20 s; high irradiance QTH: 1160 mW/cm2 – 10 s; and light-emitting diode [LED]: 360 mW/cm2 – 40 s) and three sample thicknesses (0.5 mm, 1 mm, and 2 mm). All samples were polymerized with the light tip 8 mm away from the specimen. Knoop microhardness was then measured on the top and bottom surfaces of each sample. The top surfaces, with some exceptions, were almost similar; however, in relation to the bottom surfaces, statistical differences were found between curing units and thicknesses. In all experimental groups, the 0.5-mm-thick increments showed microhardness values statistically higher than those observed for 1- and -2-mm increments. The conventional and LED units showed higher hardness mean values and were statistically different from the high irradiance unit. In all experimental groups, microhardness mean values obtained for the top surface were higher than those observed for the bottom surface. In conclusion, higher levels of irradiance or thinner increments would help improve hybrid composite resin polymerization. PMID:23674901

  4. Evaluation of the color stability of two techniques for reproducing artificial irides after microwave polymerization

    PubMed Central

    GOIATO, Marcelo Coelho; dos SANTOS, Daniela Micheline; MORENO, Amália; GENNARI-FILHO, Humberto; PELLIZZER, Eduardo Piza

    2011-01-01

    The use of ocular prostheses for ophthalmic patients aims to rebuild facial aesthetics and provide an artificial substitute to the visual organ. Natural intemperate conditions promote discoloration of artificial irides and many studies have attempted to produce irides with greater chromatic paint durability using different paint materials. Objectives The present study evaluated the color stability of artificial irides obtained with two techniques (oil painting and digital image) and submitted to microwave polymerization. Material and Methods Forty samples were fabricated simulating ocular prostheses. Each sample was constituted by one disc of acrylic resin N1 and one disc of colorless acrylic resin with the iris interposed between the discs. The irides in brown and blue color were obtained by oil painting or digital image. The color stability was determined by a reflection spectrophotometer and measurements were taken before and after microwave polymerization. Statistical analysis of the techniques for reproducing artificial irides was performed by applying the normal data distribution test followed by 2-way ANOVA and Tukey HSD test (α=.05). Results Chromatic alterations occurred in all specimens and statistically significant differences were observed between the oil-painted samples and those obtained by digital imaging. There was no statistical difference between the brown and blue colors. Independently of technique, all samples suffered color alterations after microwave polymerization. Conclusion The digital imaging technique for reproducing irides presented better color stability after microwave polymerization. PMID:21625733

  5. An investigation into the effects of temporal resolution on hepatic dynamic contrast-enhanced MRI in volunteers and in patients with hepatocellular carcinoma

    NASA Astrophysics Data System (ADS)

    Gill, Andrew B.; Black, Richard T.; Bowden, David J.; Priest, Andrew N.; Graves, Martin J.; Lomas, David J.

    2014-06-01

    This study investigated the effect of temporal resolution on the dual-input pharmacokinetic (PK) modelling of dynamic contrast-enhanced MRI (DCE-MRI) data from normal volunteer livers and from patients with hepatocellular carcinoma. Eleven volunteers and five patients were examined at 3 T. Two sections, one optimized for the vascular input functions (VIF) and one for the tissue, were imaged within a single heart-beat (HB) using a saturation-recovery fast gradient echo sequence. The data was analysed using a dual-input single-compartment PK model. The VIFs and/or uptake curves were then temporally sub-sampled (at interval ▵t = [2-20] s) before being subject to the same PK analysis. Statistical comparisons of tumour and normal tissue PK parameter values using a 5% significance level gave rise to the same study results when temporally sub-sampling the VIFs to HB < ▵t <4 s. However, sub-sampling to ▵t > 4 s did adversely affect the statistical comparisons. Temporal sub-sampling of just the liver/tumour tissue uptake curves at ▵t ≤ 20 s, whilst using high temporal resolution VIFs, did not substantially affect PK parameter statistical comparisons. In conclusion, there is no practical advantage to be gained from acquiring very high temporal resolution hepatic DCE-MRI data. Instead the high temporal resolution could be usefully traded for increased spatial resolution or SNR.

  6. Identifying Microlensing Events in Large, Non-Uniformly Sampled Surveys: The Case of the Palomar Transient Factory

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Agueros, M. A.; Fournier, A.; Street, R.; Ofek, E.; Levitan, D. B.; PTF Collaboration

    2013-01-01

    Many current photometric, time-domain surveys are driven by specific goals such as searches for supernovae or transiting exoplanets, or studies of stellar variability. These goals in turn set the cadence with which individual fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several such sub-surveys are being conducted in parallel, leading to extremely non-uniform sampling over the survey's nearly 20,000 sq. deg. footprint. While the typical 7.26 sq. deg. PTF field has been imaged 20 times in R-band, ~2300 sq. deg. have been observed more than 100 times. We use the existing PTF data 6.4x107 light curves) to study the trade-off that occurs when searching for microlensing events when one has access to a large survey footprint with irregular sampling. To examine the probability that microlensing events can be recovered in these data, we also test previous statistics used on uniformly sampled data to identify variables and transients. We find that one such statistic, the von Neumann ratio, performs best for identifying simulated microlensing events. We develop a selection method using this statistic and apply it to data from all PTF fields with >100 observations to uncover a number of interesting candidate events. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large datasets, both of which will be useful to future wide-field, time-domain surveys such as the LSST.

  7. Identifying natural flow regimes using fish communities

    NASA Astrophysics Data System (ADS)

    Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.

    2011-10-01

    SummaryModern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.

  8. Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy

    NASA Astrophysics Data System (ADS)

    Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei

    2017-04-01

    The stock price fluctuation is studied in this paper with intrinsic time perspective. The event, directional change (DC) or overshoot, are considered as time scale of price time series. With this directional change law, its corresponding statistical properties and parameter estimation is tested in Chinese stock market. Furthermore, a directional change trading strategy is proposed for invest in the market portfolio in Chinese stock market, and both in-sample and out-of-sample performance are compared among the different method of model parameter estimation. We conclude that DC method can capture important fluctuations in Chinese stock market and gain profit due to the statistical property that average upturn overshoot size is bigger than average downturn directional change size. The optimal parameter of DC method is not fixed and we obtained 1.8% annual excess return with this DC-based trading strategy.

  9. Statistical power comparisons at 3T and 7T with a GO / NOGO task.

    PubMed

    Torrisi, Salvatore; Chen, Gang; Glen, Daniel; Bandettini, Peter A; Baker, Chris I; Reynolds, Richard; Yen-Ting Liu, Jeffrey; Leshin, Joseph; Balderston, Nicholas; Grillon, Christian; Ernst, Monique

    2018-07-15

    The field of cognitive neuroscience is weighing evidence about whether to move from standard field strength to ultra-high field (UHF). The present study contributes to the evidence by comparing a cognitive neuroscience paradigm at 3 Tesla (3T) and 7 Tesla (7T). The goal was to test and demonstrate the practical effects of field strength on a standard GO/NOGO task using accessible preprocessing and analysis tools. Two independent matched healthy samples (N = 31 each) were analyzed at 3T and 7T. Results show gains at 7T in statistical strength, the detection of smaller effects and group-level power. With an increased availability of UHF scanners, these gains may be exploited by cognitive neuroscientists and other neuroimaging researchers to develop more efficient or comprehensive experimental designs and, given the same sample size, achieve greater statistical power at 7T. Published by Elsevier Inc.

  10. Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.

    PubMed

    Yalch, Matthew M

    2016-03-01

    Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicates the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved).

  11. Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

    PubMed Central

    White, James Robert; Nagarajan, Niranjan; Pop, Mihai

    2009-01-01

    Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries are computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied to digital gene expression studies (e.g. SAGE). A web server implementation of our methods and freely available source code can be found at http://metastats.cbcb.umd.edu/. PMID:19360128

  12. Spatial Distribution of Soil Fauna In Long Term No Tillage

    NASA Astrophysics Data System (ADS)

    Corbo, J. Z. F.; Vieira, S. R.; Siqueira, G. M.

    2012-04-01

    The soil is a complex system constituted by living beings, organic and mineral particles, whose components define their physical, chemical and biological properties. Soil fauna plays an important role in soil and may reflect and interfere in its functionality. These organisms' populations may be influenced by management practices, fertilization, liming and porosity, among others. Such changes may reduce the composition and distribution of soil fauna community. Thus, this study aimed to determine the spatial variability of soil fauna in consolidated no-tillage system. The experimental area is located at Instituto Agronômico in Campinas (São Paulo, Brazil). The sampling was conducted in a Rhodic Eutrudox, under no tillage system and 302 points distributed in a 3.2 hectare area in a regular grid of 10.00 m x 10.00 m were sampled. The soil fauna was sampled with "Pitfall Traps" method and traps remained in the area for seven days. Data were analyzed using descriptive statistics to determine the main statistical moments (mean variance, coefficient of variation, standard deviation, skewness and kurtosis). Geostatistical tools were used to determine the spatial variability of the attributes using the experimental semivariogram. For the biodiversity analysis, Shannon and Pielou indexes and richness were calculated for each sample. Geostatistics has proven to be a great tool for mapping the spatial variability of groups from the soil epigeal fauna. The family Formicidae proved to be the most abundant and dominant in the study area. The parameters of descriptive statistics showed that all attributes studied showed lognormal frequency distribution for groups from the epigeal soil fauna. The exponential model was the most suited for the obtained data, for both groups of epigeal soil fauna (Acari, Araneae, Coleoptera, Formicidae and Coleoptera larva), and the other biodiversity indexes. The sampling scheme (10.00 m x 10.00 m) was not sufficient to detect the spatial variability for all groups of soil epigeal fauna found in this study.

  13. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    PubMed

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. Effect of environment and genotype on commercial maize hybrids using LC/MS-based metabolomics.

    PubMed

    Baniasadi, Hamid; Vlahakis, Chris; Hazebroek, Jan; Zhong, Cathy; Asiago, Vincent

    2014-02-12

    We recently applied gas chromatography coupled to time-of-flight mass spectrometry (GC/TOF-MS) and multivariate statistical analysis to measure biological variation of many metabolites due to environment and genotype in forage and grain samples collected from 50 genetically diverse nongenetically modified (non-GM) DuPont Pioneer commercial maize hybrids grown at six North American locations. In the present study, the metabolome coverage was extended using a core subset of these grain and forage samples employing ultra high pressure liquid chromatography (uHPLC) mass spectrometry (LC/MS). A total of 286 and 857 metabolites were detected in grain and forage samples, respectively, using LC/MS. Multivariate statistical analysis was utilized to compare and correlate the metabolite profiles. Environment had a greater effect on the metabolome than genetic background. The results of this study support and extend previously published insights into the environmental and genetic associated perturbations to the metabolome that are not associated with transgenic modification.

  15. A First Survey on the Abundance of Plastics Fragments and Particles on Two Sandy Beaches in Kuching, Sarawak, Malaysia

    NASA Astrophysics Data System (ADS)

    Noik, V. James; Mohd Tuah, P.

    2015-04-01

    Plastic fragments and particles as an emerging environmental contaminant and pollutant are gaining scientific attention in the recent decades due to the potential threats on biota. This study aims to elucidate the presence, abundance and temporal change of plastic fragments and particles from two selected beaches, namely Santubong and Trombol in Kuching on two sampling times. Morphological and polymer identification assessment on the recovered plastics was also conducted. Overall comparison statistical analysis revealed that the abundance of plastic fragments/debris on both of sampling stations were insignificantly different (p>0.05). Likewise, statistical analysis on the temporal changes on the abundance yielded no significant difference for most of the sampling sites on each respective station, except STB-S2. Morphological studies revealed physical features of plastic fragments and debris were diverse in shapes, sizes, colors and surface fatigues. FTIR fingerprinting analysis shows that polypropylene and polyethylene were the dominant plastic polymers debris on both beaches.

  16. The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations

    PubMed Central

    Städler, Thomas; Haubold, Bernhard; Merino, Carlos; Stephan, Wolfgang; Pfaffelhuber, Peter

    2009-01-01

    Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (DT) and Fu and Li's D (DFL) statistics, particularly under specieswide (range) expansions. Pooled samples yield average DT and DFL values that are generally intermediate between those of local and scattered samples. Local samples (and to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects DFL more than it does DT. Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes. PMID:19237689

  17. Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter.

    PubMed

    Lord, Dominique

    2006-07-01

    There has been considerable research conducted on the development of statistical models for predicting crashes on highway facilities. Despite numerous advancements made for improving the estimation tools of statistical models, the most common probabilistic structure used for modeling motor vehicle crashes remains the traditional Poisson and Poisson-gamma (or Negative Binomial) distribution; when crash data exhibit over-dispersion, the Poisson-gamma model is usually the model of choice most favored by transportation safety modelers. Crash data collected for safety studies often have the unusual attributes of being characterized by low sample mean values. Studies have shown that the goodness-of-fit of statistical models produced from such datasets can be significantly affected. This issue has been defined as the "low mean problem" (LMP). Despite recent developments on methods to circumvent the LMP and test the goodness-of-fit of models developed using such datasets, no work has so far examined how the LMP affects the fixed dispersion parameter of Poisson-gamma models used for modeling motor vehicle crashes. The dispersion parameter plays an important role in many types of safety studies and should, therefore, be reliably estimated. The primary objective of this research project was to verify whether the LMP affects the estimation of the dispersion parameter and, if it is, to determine the magnitude of the problem. The secondary objective consisted of determining the effects of an unreliably estimated dispersion parameter on common analyses performed in highway safety studies. To accomplish the objectives of the study, a series of Poisson-gamma distributions were simulated using different values describing the mean, the dispersion parameter, and the sample size. Three estimators commonly used by transportation safety modelers for estimating the dispersion parameter of Poisson-gamma models were evaluated: the method of moments, the weighted regression, and the maximum likelihood method. In an attempt to complement the outcome of the simulation study, Poisson-gamma models were fitted to crash data collected in Toronto, Ont. characterized by a low sample mean and small sample size. The study shows that a low sample mean combined with a small sample size can seriously affect the estimation of the dispersion parameter, no matter which estimator is used within the estimation process. The probability the dispersion parameter becomes unreliably estimated increases significantly as the sample mean and sample size decrease. Consequently, the results show that an unreliably estimated dispersion parameter can significantly undermine empirical Bayes (EB) estimates as well as the estimation of confidence intervals for the gamma mean and predicted response. The paper ends with recommendations about minimizing the likelihood of producing Poisson-gamma models with an unreliable dispersion parameter for modeling motor vehicle crashes.

  18. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  19. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  20. Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

    PubMed

    Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

    2015-01-01

    The problem for establishing noninferiority is discussed between a new treatment and a standard (control) treatment with ordinal categorical data. A measure of treatment effect is used and a method of specifying noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.

  1. Probabilistic assessment method of the non-monotonic dose-responses-Part I: Methodological approach.

    PubMed

    Chevillotte, Grégoire; Bernard, Audrey; Varret, Clémence; Ballet, Pascal; Bodin, Laurent; Roudot, Alain-Claude

    2017-08-01

    More and more studies aim to characterize non-monotonic dose response curves (NMDRCs). The greatest difficulty is to assess the statistical plausibility of NMDRCs from previously conducted dose response studies. This difficulty is linked to the fact that these studies present (i) few doses tested, (ii) a low sample size per dose, and (iii) the absence of any raw data. In this study, we propose a new methodological approach to probabilistically characterize NMDRCs. The methodology is composed of three main steps: (i) sampling from summary data to cover all the possibilities that may be presented by the responses measured by dose and to obtain a new raw database, (ii) statistical analysis of each sampled dose-response curve to characterize the slopes and their signs, and (iii) characterization of these dose-response curves according to the variation of the sign in the slope. This method allows characterizing all types of dose-response curves and can be applied both to continuous data and to discrete data. The aim of this study is to present the general principle of this probabilistic method which allows to assess the non-monotonic dose responses curves, and to present some results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Proper and Paradigmatic Metonymy as a Lens for Characterizing Student Conceptions of Distributions and Sampling

    ERIC Educational Resources Information Center

    Noll, Jennifer; Hancock, Stacey

    2015-01-01

    This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…

  3. Comparative Financial Statistics for Public Two-Year Colleges: FY 1995 National Sample.

    ERIC Educational Resources Information Center

    Meeker, Bradley

    Based on responses by 405 public two-year colleges in the United States to 2 surveys, this report provides comparative financial information for fiscal year 1994-95. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…

  4. Comparative Financial Statistics for Public Two-Year Colleges: FY 1994 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Meeker, Bradley

    Based on responses by 427 public two-year colleges in the United States to two surveys, this report provides comparative financial information for fiscal year 1993-94. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…

  5. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    USDA-ARS?s Scientific Manuscript database

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  6. People Patterns: Statistics. Environmental Module for Use in a Mathematics Laboratory Setting.

    ERIC Educational Resources Information Center

    Zastrocky, Michael; Trojan, Arthur

    This module on statistics consists of 18 worksheets that cover such topics as sample spaces, mean, median, mode, taking samples, posting results, analyzing data, and graphing. The last four worksheets require the students to work with samples and use these to compare people's responses. A computer dating service is one result of this work.…

  7. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study.

    PubMed

    Nour-Eldein, Hebatallah

    2016-01-01

    With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles.

  8. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study

    PubMed Central

    Nour-Eldein, Hebatallah

    2016-01-01

    Background: With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles. PMID:27453839

  9. Schoolchildren with Learning Difficulties Have Low Iron Status and High Anemia Prevalence

    PubMed Central

    Arcanjo, C. P. C.; Santos, P. R.

    2016-01-01

    Background. In developing countries there is high prevalence of iron deficiency anemia, which reduces cognitive performance, work performance, and endurance; it also causes learning difficulties and negative impact on development for infant population. Methods. The study concerns a case-control study; data was collected from an appropriate sample consisting of schoolchildren aged 8 years. The sample was divided into two subgroups: those with deficient initial reading skills (DIRS) (case) and those without (control). Blood samples were taken to analyze hemoglobin and serum ferritin levels. These results were then used to compare the two groups with Student's t-test. Association between DIRS and anemia was analyzed using odds ratio (OR). Results. Hemoglobin and serum ferritin levels of schoolchildren with DIRS were statistically lower when compared to those without, hemoglobin p = 0.02 and serum ferritin p = 0.04. DIRS was statistically associated with a risk of anemia with a weighted OR of 1.62. Conclusions. In this study, schoolchildren with DIRS had lower hemoglobin and serum ferritin levels when compared to those without. PMID:27703806

  10. Schoolchildren with Learning Difficulties Have Low Iron Status and High Anemia Prevalence.

    PubMed

    Arcanjo, F P N; Arcanjo, C P C; Santos, P R

    2016-01-01

    Background . In developing countries there is high prevalence of iron deficiency anemia, which reduces cognitive performance, work performance, and endurance; it also causes learning difficulties and negative impact on development for infant population. Methods . The study concerns a case-control study; data was collected from an appropriate sample consisting of schoolchildren aged 8 years. The sample was divided into two subgroups: those with deficient initial reading skills (DIRS) (case) and those without (control). Blood samples were taken to analyze hemoglobin and serum ferritin levels. These results were then used to compare the two groups with Student's t -test. Association between DIRS and anemia was analyzed using odds ratio (OR). Results . Hemoglobin and serum ferritin levels of schoolchildren with DIRS were statistically lower when compared to those without, hemoglobin p = 0.02 and serum ferritin p = 0.04. DIRS was statistically associated with a risk of anemia with a weighted OR of 1.62. Conclusions . In this study, schoolchildren with DIRS had lower hemoglobin and serum ferritin levels when compared to those without.

  11. Two Different Views on the World Around Us: The World of Uniformity versus Diversity

    PubMed Central

    Nayakankuppam, Dhananjay

    2016-01-01

    We propose that when individuals believe in fixed traits of personality (entity theorists), they are likely to expect a world of “uniformity.” As such, they easily infer a population statistic from a small sample of data with confidence. In contrast, individuals who believe in malleable traits of personality (incremental theorists) are likely to presume a world of “diversity,” such that they “hesitate” to infer a population statistic from a similarly sized sample. In four laboratory experiments, we found that compared to incremental theorists, entity theorists estimated a population mean from a sample with a greater level of confidence (Studies 1a and 1b), expected more homogeneity among the entities within a population (Study 2), and perceived an extreme value to be more indicative of an outlier (Study 3). These results suggest that individuals are likely to use their implicit self-theory orientations (entity theory versus incremental theory) to see a population in general as a constitution either of homogeneous or heterogeneous entities. PMID:27977788

  12. The Beginner's Guide to the Bootstrap Method of Resampling.

    ERIC Educational Resources Information Center

    Lane, Ginny G.

    The bootstrap method of resampling can be useful in estimating the replicability of study results. The bootstrap procedure creates a mock population from a given sample of data from which multiple samples are then drawn. The method extends the usefulness of the jackknife procedure as it allows for computation of a given statistic across a maximal…

  13. Time-cumulated visible and infrared histograms used as descriptor of cloud cover

    NASA Technical Reports Server (NTRS)

    Seze, G.; Rossow, W.

    1987-01-01

    To study the statistical behavior of clouds for different climate regimes, the spatial and temporal stability of VIS-IR bidimensional histograms is tested. Also, the effect of data sampling and averaging on the histogram shapes is considered; in particular the sampling strategy used by the International Satellite Cloud Climatology Project is tested.

  14. Determining sample size for tree utilization surveys

    Treesearch

    Stanley J. Zarnoch; James W. Bentley; Tony G. Johnson

    2004-01-01

    The U.S. Department of Agriculture Forest Service has conducted many studies to determine what proportion of the timber harvested in the South is actually utilized. This paper describes the statistical methods used to determine required sample sizes for estimating utilization ratios for a required level of precision. The data used are those for 515 hardwood and 1,557...

  15. A New Sample Size Formula for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…

  16. Examining the Relationship between Gender and Drug-Using Behaviors in Adolescents: The Use of Diagnostic Assessments and Biochemical Analyses of Urine Samples.

    ERIC Educational Resources Information Center

    James, William H.; Moore, David D.

    1999-01-01

    Examines the relationship between gender and drug use among adolescents using diagnostic assessments and biochemical analyses of urine samples. Statistical significance was found in the relationship between gender and marijuana use. The study confirms that more research is needed in this area. (Author/MKA)

  17. Followup Audit: DoD Military Treatment Facilities Continue to Miss Opportunities to Collect on Third Party Outpatient Claims

    DTIC Science & Technology

    2015-07-24

    Business Office Manual at the six MTFs reviewed. Based on the statistical sample, there were 144,930 claims worth $34.8 million that had at least...19 Parameters ______________________________________________________________________________ 19 Statistical ...the UBO Manual at the six MTFs reviewed. Based on the statistical sample, there were 144,930 claims worth $34.8 million that had at least one

  18. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture?recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture?recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  19. LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies

    PubMed Central

    Bulik-Sullivan, Brendan K.; Loh, Po-Ru; Finucane, Hilary; Ripke, Stephan; Yang, Jian; Patterson, Nick; Daly, Mark J.; Price, Alkes L.; Neale, Benjamin M.

    2015-01-01

    Both polygenicity (i.e., many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size. PMID:25642630

  20. Nurses' autonomy level in teaching hospitals and its relationship with the underlying factors.

    PubMed

    Amini, Kourosh; Negarandeh, Reza; Ramezani-Badr, Farhad; Moosaeifard, Mahdi; Fallah, Ramezan

    2015-02-01

    This study aimed to determine the autonomy level of nurses in hospitals affiliated to Zanjan University of Medical Sciences, Iran. In this descriptive cross-sectional study, 252 subjects were recruited using systematic random sampling method. The data were collected using questionnaire including Dempster Practice Behavior Scale. For data analysis, descriptive statistics and to compare the overall score and its subscales according to the demographic variables, t-test and analysis of variance test were used. The nurses in this study had medium professional autonomy. Statistical tests showed significant differences in the research sample according to age, gender, work experience, working position and place of work. The results of this study revealed that most of the nurses who participated in the study compared with western societies have lower professional autonomy. More studies are needed to determine the factors related to this difference and how we can promote Iranian nurses' autonomy. © 2013 Wiley Publishing Asia Pty Ltd.

Top