42 CFR 402.109 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...
42 CFR 402.109 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...
45 CFR 160.536 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 45 Public Welfare 1 2010-10-01 2010-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 5 2011-10-01 2011-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...
45 CFR 160.536 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 45 Public Welfare 1 2011-10-01 2011-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 5 2010-10-01 2010-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...
42 CFR 405.1064 - ALJ decisions involving statistical samples.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 2 2011-10-01 2011-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...
42 CFR 405.1064 - ALJ decisions involving statistical samples.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...
Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
This report provides comparative information derived from a national sample of 516 public two-year colleges, highlighting financial statistics for fiscal year 1992-93. This report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and statistics presented in a…
7 CFR 52.38a - Definitions of terms applicable to statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...
7 CFR 52.38a - Definitions of terms applicable to statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...
Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.
Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira
2016-01-01
Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.
Rasch fit statistics and sample size considerations for polytomous data.
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-05-29
Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.
Rasch fit statistics and sample size considerations for polytomous data
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-01-01
Background: Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods: Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results: The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion: It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID: 18510722
Understanding the Sampling Distribution and the Central Limit Theorem.
ERIC Educational Resources Information Center
Lewis, Charla P.
The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…
ERIC Educational Resources Information Center
Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani
2015-01-01
This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.
2013-04-27
This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating (1) the number of samples required to achieve a specified confidence in characterization and clearance decisions and (2) the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0; 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm²) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results; 2. estimating the true surface concentration corresponding to a surface sample; 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.
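The "X% confidence that at least Y% of the decision area does not contain detectable contamination" statement can be illustrated with a minimal sketch. This is not the report's CJR formula: it assumes purely random samples, FNR = 0, and a very large number of possible sample locations, in which case n all-negative samples give confidence 1 - Y^n. The function names are hypothetical.

```python
import math

def samples_needed(confidence, clean_fraction):
    """Smallest n such that n all-negative random samples support the stated
    confidence that at least `clean_fraction` of the area is uncontaminated.
    Assumptions (not from the report): FNR = 0, random sampling only, and a
    very large number of possible sample locations."""
    if not (0 < confidence < 1 and 0 < clean_fraction < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(clean_fraction))

def confidence_achieved(n, clean_fraction):
    """Confidence obtained from n all-negative samples under the same assumptions."""
    return 1 - clean_fraction ** n

n = samples_needed(0.95, 0.99)
print(n, round(confidence_achieved(n, 0.99), 4))
```

Adding judgment samples or allowing FNR > 0 changes the calculation, which is the subject of the report itself.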
An audit of the statistics and the comparison with the parameter in the population
NASA Astrophysics Data System (ADS)
Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad
2015-10-01
Determining the sample size needed to closely estimate the statistics for particular parameters has long been an issue. Although a sample size may have been calculated with reference to the objective of the study, it is difficult to confirm whether the statistics are close to the parameters for a particular population. Meanwhile, the guideline of a p-value less than 0.05 is widely used as inferential evidence. Therefore, this study audited results that were analyzed from various subsamples and statistical analyses and compared the results with the parameters in three different populations. Eight types of statistical analysis and eight subsamples for each statistical analysis were analyzed. The results showed that the statistics were consistent and close to the parameters when the study sample covered at least 15% to 35% of the population. A larger sample size is needed to estimate parameters that involve categorical variables than parameters that involve numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters for a medium-sized population.
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
This report provides comparative financial information derived from a national sample of 503 public two-year colleges. The report includes space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the national sample; and statistics presented in various formats, including tables,…
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...
75 FR 38871 - Proposed Collection; Comment Request for Revenue Procedure 2004-29
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-06
... comments concerning Revenue Procedure 2004-29, Statistical Sampling in Sec. 274 Context. DATES: Written... Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling in Sec...: Revenue Procedure 2004-29 prescribes the statistical sampling methodology by which taxpayers under...
Time Series Analysis Based on Running Mann Whitney Z Statistics
USDA-ARS's Scientific Manuscript database
A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
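A rough sketch of the idea described above, under stated assumptions: the abstract is truncated, so the windowing scheme (the window after each time point compared with the window before it) and the Monte Carlo normalization by random relabeling are guesses, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def mann_whitney_u(x, y):
    """U statistic for x versus y: number of pairs with x_i > y_j (ties count 0.5)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def monte_carlo_z(x, y, n_shuffles=2000, seed=0):
    """Normalize the observed U to a Z score using a null distribution of U
    built by randomly reassigning the pooled values to the two windows."""
    rng = random.Random(seed)
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    null_us = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        null_us.append(mann_whitney_u(pooled[:len(x)], pooled[len(x):]))
    return (observed - mean(null_us)) / pstdev(null_us)

def running_z(series, window):
    """Z score at each time point t: window after t versus window before t."""
    return [
        monte_carlo_z(series[t:t + window], series[t - window:t], n_shuffles=500)
        for t in range(window, len(series) - window + 1)
    ]

# A series with a step change in the middle yields a large positive Z there.
print(running_z([1, 2, 3, 4, 5, 11, 12, 13, 14, 15], window=5))
```

A window in which every value is identical would give a null standard deviation of zero, so a production version would need to guard against that case.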
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
75 FR 53738 - Proposed Collection; Comment Request for Rev. Proc. 2007-35
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-01
... Revenue Procedure Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES... through the Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling...: This revenue procedure provides for determining when statistical sampling may be used in purposes of...
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
78 FR 43002 - Proposed Collection; Comment Request for Revenue Procedure 2004-29
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-18
... comments concerning statistical sampling in Sec. 274 Context. DATES: Written comments should be received on... INFORMATION: Title: Statistical Sampling in Sec. 274 Context. OMB Number: 1545-1847. Revenue Procedure Number: Revenue Procedure 2004-29. Abstract: Revenue Procedure 2004-29 prescribes the statistical sampling...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 42 Public Health 5 2014-10-01 2014-10-01 false Statistical sampling. 1003.133 Section 1003.133 Public Health OFFICE OF INSPECTOR GENERAL-HEALTH CARE, DEPARTMENT OF HEALTH AND HUMAN SERVICES OIG AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting...
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
78 FR 63568 - Proposed Collection; Comment Request for Rev. Proc. 2007-35
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-24
... Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES: Written comments... . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling for purposes of Section 199. OMB Number: 1545-2072... statistical sampling may be used in purposes of section 199, which provides a deduction for income...
[Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].
Suzukawa, Yumi; Toyoda, Hideki
2012-04-01
This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in the fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
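The sample-splitting construction described above can be sketched roughly as follows. This is only an illustration under assumptions: the "algorithm" here is a toy rule (pick the column with the largest mean in the parameter-generating sample), the partition is systematic rather than random, the sample size is taken to be divisible by V, and the function name is hypothetical. The paper's estimator and its central limit theorem are not reproduced.

```python
from statistics import mean

def data_adaptive_estimate(data, n_splits):
    """Average, over V splits, of a target parameter that is chosen on the
    parameter-generating sample and estimated on the held-out estimation sample."""
    # Partition the rows into V equal-size sub-samples (systematic, for simplicity).
    folds = [data[v::n_splits] for v in range(n_splits)]
    fold_estimates = []
    for v, estimation_sample in enumerate(folds):
        # The complement of fold v defines the target parameter.
        generating_sample = [
            row for u, fold in enumerate(folds) if u != v for row in fold
        ]
        n_columns = len(generating_sample[0])
        # Toy data adaptive rule: target the column with the largest mean.
        chosen = max(
            range(n_columns),
            key=lambda j: mean(row[j] for row in generating_sample),
        )
        # Estimate the chosen parameter on the held-out estimation sample only.
        fold_estimates.append(mean(row[chosen] for row in estimation_sample))
    return mean(fold_estimates)

rows = [(i, i + 10) for i in range(20)]
print(data_adaptive_estimate(rows, n_splits=4))
```

Keeping the choice of parameter and its estimation on disjoint sub-samples is what lets exploratory and confirmatory steps share one data set.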
Heidel, R Eric
2016-01-01
Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
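As a minimal sketch of an a priori sample size calculation, the following fixes two of the components named above (a continuous outcome and a two-independent-groups design) and folds effect magnitude and variance into one standardized effect size. It uses the usual normal approximation, which is an assumption here and not necessarily the article's framework; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison
    of means, where effect_size is the mean difference divided by the common
    standard deviation. Normal approximation; a t-based calculation gives
    slightly larger values."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Halving the effect size roughly quadruples the required sample size, which is why the effect size assumption deserves the most scrutiny.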
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended only to comply with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation for detecting abnormal patterns and for assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and related statistical process control decisions for the purpose for which they are suggested. Copyright © 2017 Elsevier Ltd. All rights reserved.
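For sampling based on the proportion of a finite population, one common textbook calculation is the sample size for estimating a proportion with a finite population correction. The sketch below uses that standard formula as an assumption; the article may define its method differently, and the function name and default values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(population, expected_p=0.5, margin=0.05,
                               confidence=0.95):
    """Sample size to estimate a proportion within +/- margin at the given
    confidence, corrected for a finite population of `population` units."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size needed for an effectively infinite population.
    n0 = z ** 2 * expected_p * (1 - expected_p) / margin ** 2
    # Finite population correction shrinks the requirement for small populations.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

for produced in (200, 1000, 10**9):
    print(produced, sample_size_for_proportion(produced))
```

The correction matters most for small production volumes, where a fixed percentage rule can sample far too few or far too many units.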
Pocket guide to transportation, 1999
DOT National Transportation Integrated Search
1998-12-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Pocket guide to transportation, 2009
DOT National Transportation Integrated Search
2009-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Pocket guide to transportation, 2013.
DOT National Transportation Integrated Search
2013-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, ...
Pocket guide to transportation, 2010
DOT National Transportation Integrated Search
2010-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Comparative Financial Statistics for Public Two-Year Colleges: FY 1992 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
This report, the 15th in an annual series, provides comparative information derived from a national sample of 544 public two-year colleges, highlighting financial statistics for fiscal year 1991-92. The report offers space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the…
Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas
2002-01-01
Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Statistical Symbolic Execution with Informed Sampling
NASA Technical Reports Server (NTRS)
Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco
2014-01-01
Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
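The purely statistical part of this idea (Monte Carlo sampling of paths followed by Bayesian estimation of the probability of hitting a target event) can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_path` is a hypothetical stand-in for executing one sampled program path, the Beta(1, 1) prior is an arbitrary choice, and none of this reflects the Symbolic PathFinder implementation or the informed-sampling pruning step.

```python
import random


def estimate_hit_probability(run_path, n_samples, seed=0, prior_a=1.0, prior_b=1.0):
    """Sample paths and return the Beta posterior mean and variance
    of the probability that a path reaches the target event."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if run_path(rng))
    # Conjugate update: Beta(prior_a, prior_b) prior with Bernoulli observations.
    a = prior_a + hits
    b = prior_b + (n_samples - hits)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def toy_path(rng):
    # Hypothetical program: the "assert violation" fires when the input exceeds 0.9.
    return rng.random() > 0.9


mean, var = estimate_hit_probability(toy_path, 10_000)
print(f"posterior mean {mean:.3f}, posterior sd {var ** 0.5:.4f}")
```

The posterior variance shrinks roughly in proportion to 1/n, which is why a purely statistical analysis needs many samples to pin down rare events.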
The Importance of Introductory Statistics Students Understanding Appropriate Sampling Techniques
ERIC Educational Resources Information Center
Menil, Violeta C.
2005-01-01
In this paper the author discusses the meaning of sampling, the reasons for sampling, the Central Limit Theorem, and the different techniques of sampling. Practical and relevant examples are given to make the appropriate sampling techniques understandable to students of Introductory Statistics courses. With a thorough knowledge of sampling…
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
STATISTICAL SAMPLING AND DATA ANALYSIS
Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...
Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S
2016-11-01
There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
NASA Astrophysics Data System (ADS)
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km² to 238 km² in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and the impact of urbanisation on hydrological response.
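The switch from a fixed time window to a fixed flow amount can be illustrated with a small sketch. It assumes a flow series given as a constant rate per time step and a hypothetical target amount; the function name and the flow values are illustrative, not taken from the study.

```python
def inter_amount_times(flows, amount, dt=1.0):
    """Return the durations needed to accumulate each successive `amount`
    of flow, given flow rates that are constant within each step of length dt."""
    times = []
    elapsed = 0.0      # time since the last amount was completed
    needed = amount    # volume still missing for the current amount
    for q in flows:
        remaining_dt = dt
        # A high-flow step may complete several amounts within one time step.
        while q > 0 and q * remaining_dt >= needed:
            t = needed / q
            times.append(elapsed + t)
            remaining_dt -= t
            elapsed = 0.0
            needed = amount
        needed -= q * remaining_dt
        elapsed += remaining_dt
    return times


# Four low-flow steps followed by two high-flow steps (hypothetical values).
iats = inter_amount_times([1, 1, 1, 1, 4, 4], amount=2)
print(iats)
```

In this toy series the two high-flow steps produce four of the six inter-amount times, which matches the abstract's point that IAT sampling gives more weight to high flows.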
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...
Statistical distribution sampling
NASA Technical Reports Server (NTRS)
Johnson, E. S.
1975-01-01
Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.
It's all relative: ranking the diversity of aquatic bacterial communities.
Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H
2008-09-01
The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
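Ranking communities by a diversity statistic, rather than estimating total diversity, can be shown with two common indices. The sketch below uses the Shannon index and the Simpson index (taken here as one minus the sum of squared proportions) on hypothetical abundance counts; the study itself compared 10 statistics on 16S libraries, which is not reproduced here.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity, taken here as 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical abundance counts for three communities.
communities = {
    "A": [25, 25, 25, 25],
    "B": [70, 10, 10, 10],
    "C": [97, 1, 1, 1],
}


def rank(index):
    """Order community names from most to least diverse under `index`."""
    return sorted(communities, key=lambda name: index(communities[name]), reverse=True)


print(rank(shannon), rank(simpson))
```

Both indices order these toy communities the same way even though they weight rare taxa differently.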
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis
Lin, Johnny; Bentler, Peter M.
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511
Qualitative Meta-Analysis on the Hospital Task: Implications for Research
ERIC Educational Resources Information Center
Noll, Jennifer; Sharma, Sashi
2014-01-01
The "law of large numbers" indicates that as sample size increases, sample statistics become less variable and more closely estimate their corresponding population parameters. Different research studies investigating how people consider sample size when evaluating the reliability of a sample statistic have found a wide range of…
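The effect of sample size on the variability of a sample statistic can be checked exactly for a proportion. The sketch below assumes a hospital-style setting with hypothetical sizes of 15 and 45 births per day and a 50% chance of a boy at each birth, and computes the probability that more than 60% of a day's births are boys; these numbers are illustrative assumptions, not values quoted from the abstract.

```python
from math import comb


def prob_more_than_60_percent(n_births):
    """Exact probability that more than 60% of n_births fair Bernoulli trials succeed."""
    # Integer comparison 10 * k > 6 * n avoids floating-point issues at exactly 60%.
    favourable = sum(
        comb(n_births, k) for k in range(n_births + 1) if 10 * k > 6 * n_births
    )
    return favourable / 2 ** n_births


small = prob_more_than_60_percent(15)
large = prob_more_than_60_percent(45)
print(f"15 births/day: {small:.4f}   45 births/day: {large:.4f}")
```

The smaller sample deviates from the population proportion more often, as the law of large numbers predicts.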
USDA-ARS's Scientific Manuscript database
Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
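The distinction between the two quantities comes down to one division: the SEM is the sample SD divided by the square root of the sample size. A minimal sketch with hypothetical measurements:

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation (n - 1 denominator) and
    the estimated standard error of the mean."""
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem


data = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical measurements
sd, sem = sd_and_sem(data)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

Collecting more observations leaves the SD roughly unchanged but shrinks the SEM, so the two should not be used interchangeably when describing data.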
Standard deviation and standard error of the mean
In, Junyong; Lee, Sangseok
2015-01-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923
Explorations in statistics: the log transformation.
Curran-Everett, Douglas
2018-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability (the standard deviation) varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
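The claim that a log transformation equalizes standard deviations when the SD scales with the mean can be seen in an idealized case. In the sketch below the second group is exactly ten times the first (hypothetical values), so its SD is ten times larger on the raw scale and identical on the log scale.

```python
import math
import statistics

group_a = [1.0, 2.0, 4.0, 8.0]            # hypothetical observations
group_b = [10.0 * x for x in group_a]     # same shape, ten times the mean

raw_sd_a = statistics.stdev(group_a)
raw_sd_b = statistics.stdev(group_b)

# log(10 * x) = log(10) + log(x), so the spread on the log scale is unchanged.
log_sd_a = statistics.stdev([math.log(x) for x in group_a])
log_sd_b = statistics.stdev([math.log(x) for x in group_b])

print(f"raw SDs: {raw_sd_a:.3f} vs {raw_sd_b:.3f}")
print(f"log SDs: {log_sd_a:.3f} vs {log_sd_b:.3f}")
```

Real data will only satisfy the proportionality roughly, so the log-scale SDs will be similar rather than identical.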
Multi-pulse multi-delay (MPMD) multiple access modulation for UWB
Dowla, Farid U.; Nekoogar, Faranak
2007-03-20
A new modulation scheme in UWB communications is introduced. This modulation technique utilizes multiple orthogonal transmitted-reference pulses for UWB channelization. The proposed UWB receiver samples the second order statistical function at both zero and non-zero lags and matches the samples to stored second order statistical functions, thus sampling and matching the shape of second order statistical functions rather than just the shape of the received pulses.
Currens, J.C.
1999-01-01
Analytical data for nitrate and triazines from 566 samples collected over a 3-year period at Pleasant Grove Spring, Logan County, KY, were statistically analyzed to determine the minimum data set needed to calculate meaningful yearly averages for a conduit-flow karst spring. Results indicate that a biweekly sampling schedule augmented with bihourly samples from high-flow events will provide meaningful suspended-constituent and dissolved-constituent statistics. Unless collected over an extensive period of time, daily samples may not be representative and may also be autocorrelated. All high-flow events resulting in a significant deflection of a constituent from base-line concentrations should be sampled. Either the geometric mean or the flow-weighted average of the suspended constituents should be used. If automatic samplers are used, then they may be programmed to collect storm samples as frequently as every few minutes to provide details on the arrival time of constituents of interest. However, only samples collected bihourly should be used to calculate averages. By adopting a biweekly sampling schedule augmented with high-flow samples, the need to continuously monitor discharge, or to search for and analyze existing data to develop a statistically valid monitoring plan, is lessened.
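The two averages recommended for suspended constituents are straightforward to compute. The sketch below uses hypothetical concentration and discharge values and the usual definitions (concentration weighted by discharge, and the exponential of the mean log concentration); the abstract does not spell out the exact formulas used in the study.

```python
import statistics


def flow_weighted_average(concentrations, flows):
    """Average concentration weighted by the discharge at each sampling time."""
    weighted = sum(c * q for c, q in zip(concentrations, flows))
    return weighted / sum(flows)


conc = [2.0, 8.0]   # hypothetical concentrations
flow = [1.0, 3.0]   # hypothetical discharges at the same sampling times

fw_avg = flow_weighted_average(conc, flow)
geo_mean = statistics.geometric_mean(conc)
print(f"flow-weighted average = {fw_avg}, geometric mean = {geo_mean:.3f}")
```

The flow-weighted average leans toward the high-flow sample, while the geometric mean damps the influence of the largest concentration.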
2015-08-01
the nine questions. The Statistical Package for the Social Sciences (SPSS) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS. P432. 11 IBM SPSS Statistics. (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research
Statistical Analysis Techniques for Small Sample Sizes
NASA Technical Reports Server (NTRS)
Navard, S. E.
1984-01-01
The small-sample-size problem encountered when analyzing space-flight data is examined. Because so little data is available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on considerations needed to choose the most appropriate test for a given type of analysis.
[Statistical prediction methods in violence risk assessment and its application].
Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song
2013-06-01
Improving violence risk assessment is an urgent global problem. Statistical methods are a necessary part of risk assessment and have had a marked influence on it. This study reviews prediction methods for violence risk assessment from a statistical standpoint: logistic regression as an example of a multivariate statistical model, the decision tree model as an example of a data mining technique, and the neural network model as an example of an artificial intelligence technology. The review is intended to support further research on violence risk assessment.
Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy
NASA Technical Reports Server (NTRS)
Nousek, John A.; Shue, David R.
1989-01-01
Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
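The difference between the two fit statistics in the low-count regime can be seen on a toy problem. The sketch below assumes a chi-squared statistic weighted by the model variance and Cash's C statistic in its usual form, C = 2 Σ (m − n ln m), and fits a constant rate to hypothetical counts by a simple grid search rather than by the Marquardt or Powell methods compared in the paper.

```python
import math


def chi_squared(counts, model):
    """Chi-squared with model-variance weighting (an assumption; other weightings exist)."""
    return sum((n - m) ** 2 / m for n, m in zip(counts, model))


def cash_c(counts, model):
    """Cash's C statistic for Poisson-distributed counts, up to a model-independent constant."""
    return 2.0 * sum(m - n * math.log(m) for n, m in zip(counts, model))


counts = [0, 1, 2, 0, 3]                  # hypothetical low-count bins
grid = [i / 100 for i in range(1, 501)]   # candidate constant rates 0.01 .. 5.00


def best_rate(statistic):
    """Grid value of the constant rate that minimizes `statistic`."""
    return min(grid, key=lambda mu: statistic(counts, [mu] * len(counts)))


best_chi = best_rate(chi_squared)
best_c = best_rate(cash_c)
print(f"chi-squared minimum at {best_chi}, C statistic minimum at {best_c}")
```

For these counts the C statistic is minimized at the sample mean of 1.2, the Poisson maximum-likelihood value, while this chi-squared variant is minimized at a higher rate.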
The Utility of Robust Means in Statistics
ERIC Educational Resources Information Center
Goodwyn, Fara
2012-01-01
Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…
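The sensitivity of the sample mean to a single outlier, and the relative stability of robust alternatives, can be shown directly. The sketch below compares the mean with the median and a 20% trimmed mean on hypothetical data; the abstract does not say which robust estimators the paper uses, so these two are illustrative choices.

```python
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after discarding the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


clean = [9, 10, 10, 11, 10, 9, 11, 10, 10, 10]   # hypothetical scores
with_outlier = clean[:-1] + [100]                 # one value replaced by an outlier

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(label, statistics.fmean(data), statistics.median(data), round(trimmed_mean(data), 3))
```

One extreme value nearly doubles the mean here, while the median is unchanged and the trimmed mean moves only slightly.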
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Illustrating Sampling Distribution of a Statistic: Minitab Revisited
ERIC Educational Resources Information Center
Johnson, H. Dean; Evans, Marc A.
2008-01-01
Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…
STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, S.
2010-09-02
Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
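The abstract states only that the uncertainty depends on the number of samples, the average, and the standard deviation. One common way to combine those three quantities into an upper 95% confidence limit is the one-sided Student t limit, mean + t × s / √n; whether the report uses exactly this form is an assumption here, and the concentrations below are hypothetical.

```python
import math
import statistics

# One-sided 95% Student t critical values, keyed by degrees of freedom.
T_95_ONE_SIDED = {5: 2.015}


def ucl95(values):
    """Upper 95% confidence limit on the mean, assuming a one-sided t interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    t_crit = T_95_ONE_SIDED[n - 1]
    return mean + t_crit * sd / math.sqrt(n)


# Six hypothetical analyte concentrations, one per scrape sample.
concentrations = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0]
limit = ucl95(concentrations)
print(f"mean = {statistics.fmean(concentrations):.2f}, UCL95% = {limit:.3f}")
```

With only six samples the t multiplier (2.015) is noticeably larger than the normal-theory value of 1.645, which widens the limit accordingly.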
STATISTICAL ANALYSIS OF TANK 19F FLOOR SAMPLE RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, S.
2010-09-02
Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by a UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
Mohammed A. Kalkhan; Robin M. Reich; Raymond L. Czaplewski
1996-01-01
A Monte Carlo simulation was used to evaluate the statistical properties of measures of association and the Kappa statistic under double sampling with replacement. Three error matrices were used, representing three levels of classification accuracy of Landsat TM data for four forest cover types in North Carolina. The overall accuracy of the five indices ranged from 0.35...
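Overall accuracy and the Kappa statistic are both computed from an error (confusion) matrix. The sketch below uses the standard Cohen's kappa formula on a hypothetical 4 × 4 matrix for four cover types; it does not reproduce the study's matrices or its double-sampling scheme.

```python
def overall_accuracy_and_kappa(matrix):
    """Return overall accuracy and Cohen's kappa for a square error matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical error matrix: rows are classified labels, columns are reference labels.
error_matrix = [
    [20, 5, 0, 0],
    [5, 20, 0, 0],
    [0, 0, 20, 5],
    [0, 0, 5, 20],
]
accuracy, kappa = overall_accuracy_and_kappa(error_matrix)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.3f}")
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance.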
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 Peer Group Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
Comparative financial information derived from a national sample of 516 two-year colleges is presented in this report for fiscal year 1992-93, including statistics for the national sample and for six peer groups. The report's nine sections focus on: (1) introductory information about the study's background, objectives, and sample; the National…
ERIC Educational Resources Information Center
Beeman, Jennifer Leigh Sloan
2013-01-01
Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…
ERIC Educational Resources Information Center
Nitko, Anthony J.; Hsu, Tse-chi
Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
Improving Statistics Education through Simulations: The Case of the Sampling Distribution.
ERIC Educational Resources Information Center
Earley, Mark A.
This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…
Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power
Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon
2016-01-01
An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%–155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%–71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power. PMID:28479943
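The attenuation of the pretest-posttest correlation under direct range restriction can be reproduced with a short simulation. The sketch below assumes bivariate normal scores with a population correlation of 0.7 and selects cases scoring below roughly the 25th percentile on the pretest; these parameter values are illustrative assumptions, not those of the study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.7, cutoff=-0.674, seed=42):
    """Return the correlation in the full sample and in the low-pretest subsample."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        pre = rng.gauss(0, 1)
        post = rho * pre + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((pre, post))
    selected = [(x, y) for x, y in pairs if x < cutoff]  # direct truncation on the pretest
    r_full = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
    r_selected = pearson_r([x for x, _ in selected], [y for _, y in selected])
    return r_full, r_selected


r_full, r_selected = simulate()
print(f"full sample r = {r_full:.2f}, selected sample r = {r_selected:.2f}")
```

A weaker pretest-posttest correlation means the covariate explains less outcome variance, which is the mechanism behind the larger sample sizes the abstract reports.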
Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power.
Miciak, Jeremy; Taylor, W Pat; Stuebing, Karla K; Fletcher, Jack M; Vaughn, Sharon
2016-01-01
An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%-155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%-71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power.
Lin, Yu-Pin; Chu, Hone-Jay; Huang, Yu-Long; Tang, Chia-Hsi; Rouhani, Shahrokh
2011-06-01
This study develops a stratified conditional Latin hypercube sampling (scLHS) approach for multiple, remotely sensed, normalized difference vegetation index (NDVI) images. The objective is to sample, monitor, and delineate spatiotemporal landscape changes, including spatial heterogeneity and variability, in a given area. The scLHS approach, which is based on the variance quadtree technique (VQT) and the conditional Latin hypercube sampling (cLHS) method, selects samples in order to delineate landscape changes from multiple NDVI images. The images are then mapped for calibration and validation by using sequential Gaussian simulation (SGS) with the scLHS selected samples. Spatial statistical results indicate that in terms of their statistical distribution, spatial distribution, and spatial variation, the statistics and variograms of the scLHS samples resemble those of multiple NDVI images more closely than those of cLHS and VQT samples. Moreover, the accuracy of simulated NDVI images based on SGS with scLHS samples is significantly better than that of simulated NDVI images based on SGS with cLHS samples and VQT samples, respectively. However, the proposed approach efficiently monitors the spatial characteristics of landscape changes, including the statistics, spatial variability, and heterogeneity of NDVI images. In addition, SGS with the scLHS samples effectively reproduces spatial patterns and landscape changes in multiple NDVI images.
2018-01-01
ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. 
Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868
Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid
2018-05-01
Statistical literacy and sample survey results
NASA Astrophysics Data System (ADS)
McAlevey, Lynn; Sullivan, Charles
2010-10-01
Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.
Nomogram for sample size calculation on a straightforward basis for the kappa statistic.
Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo
2014-09-01
Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. The sample size formulae using a simple proportion of agreement instead of a kappa statistic and nomograms to eliminate the inconvenience of using a mathematical formula were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Statistical scaling of geometric characteristics in stochastically generated pore microstructures
Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee
2015-05-21
In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent at which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/or patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, we rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.
NASA Technical Reports Server (NTRS)
Fisher, Brad; Wolff, David B.
2010-01-01
Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscures the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25° resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite and the regional climatology. The most significant result from this study was that each of the satellites incurred negative long-term oceanic retrieval biases of 10 to 30%.
Developing Sampling Frame for Case Study: Challenges and Conditions
ERIC Educational Resources Information Center
Ishak, Noriah Mohd; Abu Bakar, Abu Yazid
2014-01-01
Due to statistical analysis, the issue of random sampling is pertinent to any quantitative study. In qualitative study, by contrast, the elimination of inferential statistical analysis allows researchers to be more creative in dealing with sampling issues. Since results from qualitative study cannot be generalized to the bigger population,…
How Large Should a Statistical Sample Be?
ERIC Educational Resources Information Center
Menil, Violeta C.; Ye, Ruili
2012-01-01
This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…
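The dependence on population size, error level, and confidence named in this abstract can be sketched as follows. This is not the authors' C++ program; it is a minimal Python sketch assuming the standard normal-approximation formula for a proportion with a finite population correction. The function name and default arguments are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    margin:     half-width of the confidence interval (e.g. 0.05)
    confidence: two-sided confidence level
    p:          anticipated proportion; 0.5 is the most conservative choice
    population: finite population size, or None for an effectively infinite one
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n = z**2 * p * (1.0 - p) / margin**2
    if population is not None:
        # Finite population correction shrinks the requirement.
        n = n / (1.0 + (n - 1.0) / population)
    return math.ceil(n)


print(sample_size_for_proportion(0.05))                   # large population
print(sample_size_for_proportion(0.05, population=1000))  # population of 1000
```

Looping this function over a grid of population sizes, margins, and confidence levels would produce tables of the kind the abstract describes.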
Repeated Random Sampling in Year 5
ERIC Educational Resources Information Center
Watson, Jane M.; English, Lyn D.
2016-01-01
As an extension to an activity introducing Year 5 students to the practice of statistics, the software "TinkerPlots" made it possible to collect repeated random samples from a finite population to informally explore students' capacity to begin reasoning with a distribution of sample statistics. This article provides background for the…
Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 Peer Groups Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
Comparative financial information, derived from two national surveys of 503 public two-year colleges, is presented in this report for fiscal year (FY) 1990-91. The report includes statistics for the national sample and six peer groups, space for colleges to compare their institutional statistics with national and peer groups, and tables, bar…
[A comparison of convenience sampling and purposive sampling].
Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien
2014-06-01
Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation, not by statistical power analysis.
Statistics for characterizing data on the periphery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
NASA Astrophysics Data System (ADS)
Sergeenko, N. P.
2017-11-01
An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.
42 CFR 1003.109 - Notice of proposed determination.
Code of Federal Regulations, 2010 CFR
2010-10-01
... briefly describe the statistical sampling technique utilized by the Inspector General); (3) The reason why... statistical sampling in accordance with § 1003.133 in which case the notice shall describe those claims and...
11 CFR 9036.4 - Commission review of submissions.
Code of Federal Regulations, 2010 CFR
2010-01-01
..., in conducting its review, may utilize statistical sampling techniques. Based on the results of its... nonmatchable and the reason that it is not matchable; or if statistical sampling is used, the estimated amount...
Cortes, Aneg L; Montiel, Enrique R; Gimeno, Isabel M
2009-12-01
The use of Flinders Technology Associates (FTA) filter cards to quantify Marek's disease virus (MDV) DNA for the diagnosis of Marek's disease (MD) and to monitor MD vaccines was evaluated. Samples of blood (43), solid tumors (14), and feather pulp (FP; 36) collected fresh and in FTA cards were analyzed. MDV DNA load was quantified by real-time PCR. Threshold cycle (Ct) ratios were calculated for each sample by dividing the Ct value of the internal control gene (glyceraldehyde-3-phosphate dehydrogenase) by the Ct value of the MDV gene. Statistically significant correlation (P < 0.05) within Ct ratios was detected between samples collected fresh and in FTA cards by using Pearson's correlation test. Load of serotype 1 MDV DNA was quantified in 24 FP, 14 solid tumor, and 43 blood samples. There was a statistically significant correlation between FP (r = 0.95), solid tumor (r = 0.94), and blood (r = 0.9) samples collected fresh and in FTA cards. Load of serotype 2 MDV DNA was quantified in 17 FP samples, and the correlation between samples collected fresh and in FTA cards was also statistically significant (Pearson's coefficient, r = 0.96); load of serotype 3 MDV DNA was quantified in 36 FP samples, and correlation between samples taken fresh and in FTA cards was also statistically significant (r = 0.84). MDV DNA samples extracted 3 days (t0) and 8 months after collection (t1) were used to evaluate the stability of MDV DNA in archived samples collected in FTA cards. A statistically significant correlation was found for serotype 1 (r = 0.96), serotype 2 (r = 1), and serotype 3 (r = 0.9). The results show that FTA cards are an excellent medium to collect, transport, and archive samples for MD diagnosis and to monitor MD vaccines. In addition, FTA cards are widely available, inexpensive, and adequate for the shipment of samples nationally and internationally.
Hyatt, M.W.; Hubert, W.A.
2001-01-01
We assessed relative weight (Wr) distributions among 291 samples of stock-to-quality-length brook trout Salvelinus fontinalis, brown trout Salmo trutta, rainbow trout Oncorhynchus mykiss, and cutthroat trout O. clarki from lentic and lotic habitats. Statistics describing Wr sample distributions varied slightly among species and habitat types. The average sample was leptokurtotic and slightly skewed to the right with a standard deviation of about 10, but the shapes of Wr distributions varied widely among samples. Twenty-two percent of the samples had nonnormal distributions, suggesting the need to evaluate sample distributions before applying statistical tests to determine whether assumptions are met. In general, our findings indicate that samples of about 100 stock-to-quality-length fish are needed to obtain confidence interval widths of four Wr units around the mean. Power analysis revealed that samples of about 50 stock-to-quality-length fish are needed to detect a 2% change in mean Wr at a relatively high level of power (beta = 0.01, alpha = 0.05).
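The "about 100 fish" figure in this abstract can be roughly reproduced with a short calculation. The sketch below assumes a normal-approximation 95% confidence interval, the standard deviation of about 10 quoted in the abstract, and that "width of four Wr units" means the total interval width (plus or minus 2). The function name is hypothetical, and the authors may have used a different procedure.

```python
import math
from statistics import NormalDist


def n_for_ci_width(sd, total_width, confidence=0.95):
    """Smallest n whose normal-approximation CI for the mean is no wider than total_width."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Total width = 2 * z * sd / sqrt(n), solved for n.
    return math.ceil((2.0 * z * sd / total_width) ** 2)


n_needed = n_for_ci_width(sd=10, total_width=4)
print(n_needed)
```

Under these assumptions the result lands just below 100, consistent with the abstract's recommendation.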
Statistical Literacy and Sample Survey Results
ERIC Educational Resources Information Center
McAlevey, Lynn; Sullivan, Charles
2010-01-01
Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In…
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
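An operating characteristic for a three-category design of this kind can be sketched with plain binomial probabilities. The decision thresholds below (d1 = 2, d2 = 7 for n = 15) are hypothetical placeholders chosen for illustration; the abstract does not state the paper's decision rules, and the function names are assumptions as well.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k positives among n sampled children."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def classification_probs(n, d1, d2, p):
    """Probabilities of classifying a school as low, middle, or high prevalence.

    Rule assumed here: low if positives <= d1, high if positives > d2,
    middle otherwise. d1 and d2 are hypothetical thresholds.
    """
    low = sum(binom_pmf(k, n, p) for k in range(0, d1 + 1))
    mid = sum(binom_pmf(k, n, p) for k in range(d1 + 1, d2 + 1))
    high = sum(binom_pmf(k, n, p) for k in range(d2 + 1, n + 1))
    return low, mid, high


# Trace the operating characteristic over a few true prevalences.
for prevalence in (0.05, 0.30, 0.80):
    low, mid, high = classification_probs(15, 2, 7, prevalence)
    print(f"p={prevalence:.2f}  low={low:.3f}  mid={mid:.3f}  high={high:.3f}")
```

Plotting the three probabilities against true prevalence gives the operating characteristic curves; misclassification is most likely near the category boundaries, which matches the abstract's remark about regions of high potential error.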
42 CFR 402.7 - Notice of proposed determination.
Code of Federal Regulations, 2010 CFR
2010-10-01
... and a brief description of the statistical sampling technique CMS or OIG used. (3) The reason why the... is relying upon statistical sampling to project the number and types of claims or requests for...
ERIC Educational Resources Information Center
Bellera, Carine A.; Julien, Marilyse; Hanley, James A.
2010-01-01
The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
Characterizations of linear sufficient statistics
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.
1977-01-01
A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.
The large sample size fallacy.
Lantz, Björn
2013-06-01
Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
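The fallacy described here is easy to demonstrate numerically. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and a known common standard deviation, so the test statistic depends only on the standardized effect size d and the per-group sample size. The function name and the chosen values (d = 0.02) are illustrative assumptions, not figures from the article.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d (known unit SD)."""
    z = d * math.sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same trivial effect, tested with a modest and a very large sample.
p_modest = two_sample_z_p_value(0.02, 100)
p_huge = two_sample_z_p_value(0.02, 1_000_000)
print(f"n=100 per group:       p = {p_modest:.3f}")
print(f"n=1,000,000 per group: p = {p_huge:.3g}")
```

The effect size is identical in both runs; only the sample size changes the p-value, which is why the article recommends reporting effect sizes alongside p-values.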
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
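The order dependence that motivates this paper can be seen in a few lines. The sketch below computes the standard Durbin-Watson statistic for the same set of residuals in two different orders; the residual values are made up for illustration and the function name is an assumption. It does not implement the paper's new spatial statistics.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e**2 for e in residuals)
    return numerator / denominator


# Identical residuals, two arbitrary orderings (illustrative values only).
ordered = [1, 2, 3, -1, -2, -3]
shuffled = [1, -1, 2, -2, 3, -3]

dw_ordered = durbin_watson(ordered)
dw_shuffled = durbin_watson(shuffled)
print(f"DW, first ordering:  {dw_ordered:.3f}")
print(f"DW, second ordering: {dw_shuffled:.3f}")
```

For cross-sectional spatial samples there is no natural sequence, so the statistic can be pushed toward either end of its 0 to 4 range simply by relabeling the observations.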
Statistical considerations for agroforestry studies
James A. Baldwin
1993-01-01
Statistical topics related to agroforestry studies are discussed. These include study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and much improved histogram display is described.
ERIC Educational Resources Information Center
Noll, Jennifer; Hancock, Stacey
2015-01-01
This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1995 National Sample.
ERIC Educational Resources Information Center
Meeker, Bradley
Based on responses by 405 public two-year colleges in the United States to 2 surveys, this report provides comparative financial information for fiscal year 1994-95. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1994 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
Based on responses by 427 public two-year colleges in the United States to two surveys, this report provides comparative financial information for fiscal year 1993-94. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…
USDA-ARS's Scientific Manuscript database
Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
People Patterns: Statistics. Environmental Module for Use in a Mathematics Laboratory Setting.
ERIC Educational Resources Information Center
Zastrocky, Michael; Trojan, Arthur
This module on statistics consists of 18 worksheets that cover such topics as sample spaces, mean, median, mode, taking samples, posting results, analyzing data, and graphing. The last four worksheets require the students to work with samples and use these to compare people's responses. A computer dating service is one result of this work.…
Efficient statistical tests to compare Youden index: accounting for contingency correlation.
Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan
2015-04-30
Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
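For orientation, the baseline calculation that the abstract above improves on can be sketched as follows. This is the simple large-sample interval that treats sensitivity and specificity as independent binomial proportions; it is not the correlation-adjusted procedure of the paper, and the counts are hypothetical.

```python
import math

def youden_index(tp, fn, tn, fp, z=1.96):
    """Youden index J = sensitivity + specificity - 1 with a Wald interval.

    The variance treats sensitivity and specificity as independent
    binomial proportions (the simplifying assumption the paper relaxes).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    j = sens + spec - 1.0
    var = sens * (1 - sens) / (tp + fn) + spec * (1 - spec) / (tn + fp)
    se = math.sqrt(var)
    return j, (j - z * se, j + z * se)

# Hypothetical 2x2 table: 100 diseased and 100 non-diseased subjects.
j, (lo, hi) = youden_index(tp=80, fn=20, tn=90, fp=10)
print(f"J = {j:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```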
2015-07-24
... the UBO Manual at the six MTFs reviewed. Based on the statistical sample, there were 144,930 claims worth $34.8 million that had at least one ...
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data.
The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H
2016-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
Does size matter? Statistical limits of paleomagnetic field reconstruction from small rock specimens
NASA Astrophysics Data System (ADS)
Berndt, Thomas; Muxworthy, Adrian R.; Fabian, Karl
2016-01-01
As samples of ever decreasing sizes are being studied paleomagnetically, care has to be taken that the underlying assumptions of statistical thermodynamics (Maxwell-Boltzmann statistics) are being met. Here we determine how many grains and how large a magnetic moment a sample needs to have to be able to accurately record an ambient field. It is found that for samples with a thermoremanent magnetic moment larger than $10^{-11}\,\mathrm{Am^2}$ the assumption of a sufficiently large number of grains usually holds. Standard 25 mm diameter paleomagnetic samples usually contain enough magnetic grains such that statistical errors are negligible, but "single silicate crystal" works on, for example, zircon, plagioclase, and olivine crystals are approaching the limits of what is physically possible, leading to statistical errors in both the angular deviation and paleointensity that are comparable to other sources of error. The reliability of nanopaleomagnetic imaging techniques capable of resolving individual grains (used, for example, to study the cloudy zone in meteorites), however, is questionable due to the limited area of the material covered.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
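A nonparametric bootstrap interval for a T/C ratio, in the spirit of the abstract above, might be sketched as follows. The assumptions here are mine, not the paper's: the T/C ratio is taken as the ratio of mean tumor volumes (treated over control), the interval is a plain percentile interval, and the tumor volumes are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def tc_ratio_bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for mean(T) / mean(C).

    Treated and control groups are resampled independently with replacement.
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))   # resample treated group
        c = rng.choices(control, k=len(control))   # resample control group
        reps.append(mean(t) / mean(c))
    reps.sort()
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return mean(treated) / mean(control), lo, hi

# Hypothetical tumor volumes (mm^3) at the end of a xenograft experiment.
treated = [100.0, 120.0, 90.0, 110.0, 80.0]
control = [200.0, 220.0, 180.0, 210.0, 190.0]
tc, lo, hi = tc_ratio_bootstrap_ci(treated, control)
print(f"T/C = {tc:.2f}, 95% percentile CI = ({lo:.2f}, {hi:.2f})")
```

With groups this small the percentile interval is only a rough guide, which is part of the motivation for the small-sample likelihood ratio statistic the abstract mentions.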
The Math Problem: Advertising Students' Attitudes toward Statistics
ERIC Educational Resources Information Center
Fullerton, Jami A.; Kendrick, Alice
2013-01-01
This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…
Chaibub Neto, Elias
2015-01-01
In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson’s sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling. PMID:26125965
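The multinomial-weight formulation described above can be illustrated for the simplest moment statistic, the sample mean. This sketch uses only the Python standard library and an explicit loop, so it shows the reweighting idea rather than the vectorized matrix implementation the paper proposes; the data are hypothetical.

```python
import random

def multinomial_counts(n, rng):
    """Draw counts from Multinomial(n; 1/n, ..., 1/n)."""
    counts = [0] * n
    for idx in rng.choices(range(n), k=n):
        counts[idx] += 1
    return counts

def bootstrap_means(data, n_boot, seed=0):
    """Bootstrap replications of the mean via multinomial weights.

    Each replication weights the observed data by the multinomial counts
    instead of building a resampled copy of the data.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        w = multinomial_counts(n, rng)
        reps.append(sum(wi * xi for wi, xi in zip(w, data)) / n)
    return reps

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]   # hypothetical observations
reps = bootstrap_means(data, n_boot=1000)
print(min(reps), max(reps))
```

Stacking the weight vectors as rows of a matrix W turns the loop into a single product W x / n, which is the step that makes the approach fast in matrix-oriented languages.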
40 CFR 90.712 - Request for public hearing.
Code of Federal Regulations, 2010 CFR
2010-07-01
... sampling plans and statistical analyses have been properly applied (specifically, whether sampling procedures and statistical analyses specified in this subpart were followed and whether there exists a basis... Clerk and will be made available to the public during Agency business hours. ...
USDA-ARS's Scientific Manuscript database
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...
ERIC Educational Resources Information Center
Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy
2006-01-01
We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
Statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference means drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test generally poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Robust functional statistics applied to Probability Density Function shape screening of sEMG data.
Boudaoud, S; Rix, H; Al Harrach, M; Marin, F
2014-01-01
Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.
Testing for independence in J×K contingency tables with complex sample survey data.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C
2015-09-01
The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.
Visual Sample Plan Version 7.0 User's Guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzke, Brett D.; Newburn, Lisa LN; Hathaway, John E.
2014-03-01
This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
NASA Astrophysics Data System (ADS)
Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir
2016-03-01
The U-statistic method is one of the most important structural methods for separating anomalies from the background. It takes the location of samples into account, carries out the statistical analysis of the data without geochemical judgment, and tries to separate subpopulations and determine anomalous areas. In the present study, to use the U-statistic method in three dimensions (3D), the U-statistic is applied to the grades of two ideal test examples, taking the sample Z values (elevation) into account. This is the first time that the method has been applied in 3D. To evaluate the performance of the 3D U-statistic method and to compare it with a non-structural method, threshold assessment based on the median and standard deviation (the MSD method) is applied to the same two test examples. Results show that the samples indicated as anomalous by the U-statistic method are more regular and less dispersed than those indicated by the MSD method, so that, based on the locations of the anomalous samples, their denser areas can be identified as promising zones. Moreover, results show that at a threshold of U = 0, the total misclassification error of the U-statistic method is much smaller than that of the criterion $\bar{x} + n \times s$. Finally, a 3D model of the two test examples for separating anomaly from background using the 3D U-statistic method is provided. The source code of a program developed in MATLAB to perform the calculations of the 3D U-spatial statistic method is also provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.
Image correlation and sampling study
NASA Technical Reports Server (NTRS)
Popp, D. J.; Mccormack, D. S.; Sedwick, J. L.
1972-01-01
The development of analytical approaches for solving image correlation and image sampling of multispectral data is discussed. Relevant multispectral image statistics which are applicable to image correlation and sampling are identified. The general image statistics include intensity mean, variance, amplitude histogram, power spectral density function, and autocorrelation function. The translation problem associated with digital image registration and the analytical means for comparing commonly used correlation techniques are considered. General expressions for determining the reconstruction error for specific image sampling strategies are developed.
Robin M. Reich; Hans T. Schreuder
2006-01-01
The sampling strategy involving both statistical and in-place inventory information is presented for the natural resources project of the Green Belt area (Cinturón Verde) in the Mexican state of Jalisco. The sampling designs used were a grid-based ground sample of 90 x 90 m plots and a two-stage stratified sample of 30 x 30 m plots. The data collected were used to...
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China’s regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
Kumar, R Vinoth; Rajvikram, N; Rajakumar, P; Saravanan, R; Deepak, V Arun; Vijaykumar, V
2016-03-01
The aim of this study was to evaluate the release of nickel and chromium ions in human saliva during fixed orthodontic therapy. Ten patients with Angle's Class-I malocclusion with bimaxillary protrusion, without any metal restorations or crowns and with all the permanent teeth, were selected. Five male patients and five female patients aged 14 to 23 years were scheduled for orthodontic treatment with first premolar extraction. Saliva samples were collected in three stages: sample 1, before orthodontic treatment; sample 2, 10 days after bonding; and sample 3, 1 month after bonding. The samples were analyzed for nickel and chromium using inductively coupled plasma optical emission spectrometry (ICP-OES). The changes in nickel and chromium levels were statistically significant; nickel showed a gradual increase in the first 10 days and a decline thereafter. Chromium showed a gradual increase and was statistically significant on the 30th day. The greatest release of ions occurred during the first 10 days, with a gradual decline thereafter. The control group had traces of nickel and chromium. Nickel levels in saliva rose significantly from baseline to the 10th-day and 30th-day samples. The difference between the 10th-day and 30th-day samples was not statistically significant. Chromium levels in saliva were higher on the 30th day, and the difference between the 10th-day and 30th-day samples was statistically significant. Nickel and chromium levels were well within the permissible levels. However, some hypersensitive individuals may be allergic to this minimal permissible level.
Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello
2012-01-01
Background Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principal Findings Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333
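As a rough illustration of an operating characteristic calculation for a three-category design, the sketch below computes the probability of each classification as a function of the true prevalence, assuming the number of positives in a sample of n children is binomial. The decision rule (low if X <= d1, medium if d1 < X <= d2, high otherwise) and the thresholds d1 = 2 and d2 = 7 are hypothetical choices for illustration, not the values used in the paper, and curtailed sampling is not modelled.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k positives among n independent observations."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def classification_probs(n, d1, d2, p):
    """Return (P(low), P(medium), P(high)) at true prevalence p.

    Assumed rule: low if X <= d1, medium if d1 < X <= d2, high if X > d2,
    where X ~ Binomial(n, p) is the number of positives observed.
    """
    low = sum(binom_pmf(k, n, p) for k in range(0, d1 + 1))
    medium = sum(binom_pmf(k, n, p) for k in range(d1 + 1, d2 + 1))
    high = 1.0 - low - medium
    return low, medium, high


# Hypothetical design: n = 15 with thresholds d1 = 2 and d2 = 7.
oc_curve = {p / 100: classification_probs(15, 2, 7, p / 100) for p in range(0, 101, 5)}
```

Plotting the three components of `oc_curve` against prevalence gives the operating characteristic curves for this assumed design; classification is least reliable near the category boundaries.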
2015-05-12
Deficiencies That Affect the Reliability of Estimates ... Statistical Precision Could Be Improved... statistical precision of improper payments estimates in seven of the DoD payment programs through the use of stratified sample designs. DoD improper...payments not subject to sampling, which made the results statistically invalid. We made a recommendation to correct this problem in a previous report;
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
valued time series from a sample. (A practical algorithm to compute the estimator is a work in progress.) Third, finitely-valued spatial processes...ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 mathematical statistics; time series; Markov chains; random...proved. Second, a statistical method is developed to estimate the memory depth of discrete-time and continuously-valued time series from a sample. (A
Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.
2009-02-18
The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statistical sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site.
Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.
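The "at least one sample from a contaminated area, at least X% of the time" criterion can be sketched under a simplifying assumption: each sample location is drawn independently and uniformly, and a fraction f of the area is contaminated. The probability that all n samples miss is then (1 - f)^n, so n must satisfy 1 - (1 - f)^n >= X. This is only an illustration of the criterion; it is not the algorithm implemented in VSP, and the function name is hypothetical.

```python
import math


def samples_needed(confidence, contaminated_fraction):
    """Smallest n with 1 - (1 - f)**n >= confidence, assuming independent,
    uniformly random sample locations and a contaminated area fraction f."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - contaminated_fraction))


# Example: 95% confidence of hitting contamination that covers 5% of the area.
n_required = samples_needed(0.95, 0.05)
```

Validation in the sense of the abstract would then amount to checking, for example by simulation on a real site, that the achieved hit rate with `n_required` samples is at least the stated confidence.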
Intuitive statistics by 8-month-old infants
Xu, Fei; Garcia, Vashti
2008-01-01
Human learners make inductive inferences based on small amounts of data: we generalize from samples to populations and vice versa. The academic discipline of statistics formalizes these intuitive statistical inferences. What is the origin of this ability? We report six experiments investigating whether 8-month-old infants are “intuitive statisticians.” Our results showed that, given a sample, the infants were able to make inferences about the population from which the sample had been drawn. Conversely, given information about the entire population of relatively small size, the infants were able to make predictions about the sample. Our findings provide evidence that infants possess a powerful mechanism for inductive learning, either using heuristics or basic principles of probability. This ability to make inferences based on samples or information about the population develops early and in the absence of schooling or explicit teaching. Human infants may be rational learners from very early in development. PMID:18378901
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Sampling and counting genome rearrangement scenarios
2015-01-01
Background Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space, and then a small number of samples is sufficient for statistical inference. Contribution In this paper, we give a mini-review of the state of the art in sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/ PMID:26452124
Apes are intuitive statisticians.
Rakoczy, Hannes; Clüver, Annette; Saucke, Liane; Stoffregen, Nicole; Gräbener, Alice; Migura, Judith; Call, Josep
2014-04-01
Inductive learning and reasoning, as we use it both in everyday life and in science, is characterized by flexible inferences based on statistical information: inferences from populations to samples and vice versa. Many forms of such statistical reasoning have been found to develop late in human ontogeny, depending on formal education and language, and to be fragile even in adults. New revolutionary research, however, suggests that even preverbal human infants make use of intuitive statistics. Here, we conducted the first investigation of such intuitive statistical reasoning with non-human primates. In a series of 7 experiments, Bonobos, Chimpanzees, Gorillas and Orangutans drew flexible statistical inferences from populations to samples. These inferences, furthermore, were truly based on statistical information regarding the relative frequency distributions in a population, and not on absolute frequencies. Intuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity. Copyright © 2014 Elsevier B.V. All rights reserved.
Managing Clustered Data Using Hierarchical Linear Modeling
ERIC Educational Resources Information Center
Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.
2012-01-01
Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…
Computer program uses Monte Carlo techniques for statistical system performance analysis
NASA Technical Reports Server (NTRS)
Wohl, D. P.
1967-01-01
Computer program with Monte Carlo sampling techniques determines the effect of a component part of a unit upon the overall system performance. It utilizes the full statistics of the disturbances and misalignments of each component to provide unbiased results through simulated random sampling.
NASA Technical Reports Server (NTRS)
Colarco, P. R.; Kahn, R. A.; Remer, L. A.; Levy, R. C.
2014-01-01
We use the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite aerosol optical thickness (AOT) product to assess the impact of reduced swath width on global and regional AOT statistics and trends. Along-track and across-track sampling strategies are employed, in which the full MODIS data set is sub-sampled with various narrow-swath (approximately 400-800 km) and single pixel width (approximately 10 km) configurations. Although view-angle artifacts in the MODIS AOT retrieval confound direct comparisons between averages derived from different sub-samples, careful analysis shows that with many portions of the Earth essentially unobserved, spatial sampling introduces uncertainty in the derived seasonal-regional mean AOT. These AOT spatial sampling artifacts comprise up to 60% of the full-swath AOT value under moderate aerosol loading, and can be as large as 0.1 in some regions under high aerosol loading. Compared to full-swath observations, narrower swath and single pixel width sampling exhibits a reduced ability to detect AOT trends with statistical significance. On the other hand, estimates of the global, annual mean AOT do not vary significantly from the full-swath values as spatial sampling is reduced. Aggregation of the MODIS data at coarse grid scales (10 deg) shows consistency in the aerosol trends across sampling strategies, with increased statistical confidence, but quantitative errors in the derived trends are found even for the full-swath data when compared to high spatial resolution (0.5 deg) aggregations. Using results of a model-derived aerosol reanalysis, we find consistency in our conclusions about a seasonal-regional spatial sampling artifact in AOT. Furthermore, the model shows that reduced spatial sampling can amount to uncertainty in computed shortwave top-of-atmosphere aerosol radiative forcing of 2-3 W m^-2. These artifacts are lower bounds, as possibly other unconsidered sampling strategies would perform less well.
These results suggest that future aerosol satellite missions having significantly less than full-swath viewing are unlikely to sample the true AOT distribution well enough to obtain the statistics needed to reduce uncertainty in aerosol direct forcing of climate.
A Bibliography of Statistical Applications in Geography, Technical Paper No. 9.
ERIC Educational Resources Information Center
Greer-Wootten, Bryn; And Others
Included in this bibliography are resource materials available to both college instructors and students on statistical applications in geographic research. Two stages of statistical development are treated in the bibliography. They are 1) descriptive statistics, in which the sample is the focus of interest, and 2) analytical statistics, in which…
Use of Statistical Heuristics in Everyday Inductive Reasoning.
ERIC Educational Resources Information Center
Nisbett, Richard E.; And Others
1983-01-01
In everyday reasoning, people use statistical heuristics (judgmental tools that are rough intuitive equivalents of statistical principles). Use of statistical heuristics is more likely when (1) sampling is clear, (2) the role of chance is clear, (3) statistical reasoning is normative for the event, or (4) the subject has had training in…
NASA Astrophysics Data System (ADS)
Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.
2010-08-01
Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s^-1. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS depend on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.
Using Electronic Data Interchange to Report Product Quality
1993-03-01
Numbers 0 31.1 ... N/U 140 SPS Sampling Parameters for Summary Statistics 0 1 N/U 150 REF...DTM Date/Time Reference 0 1 N/U 190 REF Reference Numbers 021 ... NAU 200 STA Statistics 0 1 N/U 210...Measurements 0 1 N/U 120 DTM Date/Time Reference 0 >1 N/U 130 REF Reference Numbers 0 >1 :LOOIV f-SPS N/U 140 SPS Sampling Parameters for Summary Statistics 0 1
Statistical scaling of pore-scale Lagrangian velocities in natural porous media.
Siena, M; Guadagnini, A; Riva, M; Bijeljic, B; Pereira Nunes, J P; Blunt, M J
2014-08-01
We investigate the scaling behavior of sample statistics of pore-scale Lagrangian velocities in two different rock samples, Bentheimer sandstone and Estaillades limestone. The samples are imaged using x-ray computer tomography with micron-scale resolution. The scaling analysis relies on the study of the way qth-order sample structure functions (statistical moments of order q of absolute increments) of Lagrangian velocities depend on separation distances, or lags, traveled along the mean flow direction. In the sandstone block, sample structure functions of all orders exhibit a power-law scaling within a clearly identifiable intermediate range of lags. Sample structure functions associated with the limestone block display two diverse power-law regimes, which we infer to be related to two overlapping spatially correlated structures. In both rocks and for all orders q, we observe linear relationships between logarithmic structure functions of successive orders at all lags (a phenomenon that is typically known as extended power scaling, or extended self-similarity). The scaling behavior of Lagrangian velocities is compared with the one exhibited by porosity and specific surface area, which constitute two key pore-scale geometric observables. The statistical scaling of the local velocity field reflects the behavior of these geometric observables, with the occurrence of power-law-scaling regimes within the same range of lags for sample structure functions of Lagrangian velocity, porosity, and specific surface area.
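The quantities named in this abstract can be sketched as follows: a sample structure function of order q is the mean of the q-th power of absolute increments at a given lag, and a power-law regime appears as a straight line in a log-log plot against lag. The code below is a minimal illustration on a synthetic random walk, which is only a hypothetical stand-in for a Lagrangian velocity series; the function names and the choice of lags are assumptions.

```python
import random
from math import log


def structure_function(values, lag, q):
    """Sample structure function of order q: mean of |v[i+lag] - v[i]|**q."""
    increments = [abs(values[i + lag] - values[i]) for i in range(len(values) - lag)]
    return sum(d ** q for d in increments) / len(increments)


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs), i.e. a scaling exponent."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic series (assumption): a Gaussian random walk with a fixed seed.
rng = random.Random(42)
series = [0.0]
for _ in range(5000):
    series.append(series[-1] + rng.gauss(0.0, 1.0))

lags = [1, 2, 4, 8, 16, 32]
s2 = [structure_function(series, lag, 2) for lag in lags]
s3 = [structure_function(series, lag, 3) for lag in lags]

exponent_q2 = loglog_slope(lags, s2)   # scaling of S_2 with lag
ess_slope = loglog_slope(s2, s3)       # extended self-similarity: S_3 against S_2
```

The second slope regresses the logarithm of one structure function on the logarithm of another, which is the kind of linear relationship between successive orders that the abstract refers to as extended self-similarity.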
NASA Astrophysics Data System (ADS)
Nickles, C.; Zhao, Y.; Beighley, E.; Durand, M. T.; David, C. H.; Lee, H.
2017-12-01
The Surface Water and Ocean Topography (SWOT) satellite mission is jointly developed by NASA and the French space agency (CNES), with participation from the Canadian and UK space agencies, to serve both the hydrology and oceanography communities. The SWOT mission will sample global surface water extents and elevations (lakes/reservoirs, rivers, estuaries, oceans, sea and land ice) at a finer spatial resolution than is currently possible, enabling hydrologic discovery, model advancements and new applications that are not currently possible or likely even conceivable. Although the mission will provide global coverage, analysis and interpolation of the data generated from the irregular space/time sampling represent a significant challenge. In this study, we explore the applicability of the unique space/time sampling for understanding river discharge dynamics throughout the Ohio River Basin. River network topology, SWOT sampling (i.e., orbit and identified SWOT river reaches) and spatial interpolation concepts are used to quantify the fraction of effective sampling of river reaches each day of the three-year mission. Streamflow statistics for SWOT generated river discharge time series are compared to continuous daily river discharge series. Relationships are presented to transform SWOT generated streamflow statistics to equivalent continuous daily discharge time series statistics intended to support hydrologic applications using low-flow and annual flow duration statistics.
Thomas, Elaine
2005-01-01
This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.
Some connections between importance sampling and enhanced sampling methods in molecular dynamics.
Lie, H C; Quer, J
2017-11-21
In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.
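The basic idea of importance sampling for rare events can be shown with a textbook sketch that is unrelated to molecular dynamics: estimating the tail probability P(X > 4) for a standard normal X by drawing from a shifted normal proposal and reweighting each draw with the likelihood ratio. This is not the Hartmann-Schütte or the Valsson-Parrinello method; the proposal choice, sample size and function names are assumptions made for illustration.

```python
import math
import random


def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by sampling from N(threshold, 1)
    and weighting each draw by the density ratio
    phi(x) / phi(x - shift) = exp(-shift * x + shift**2 / 2)."""
    rng = random.Random(seed)
    shift = threshold  # proposal centred on the rare region
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n_samples


def tail_probability_exact(threshold):
    """Exact Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = tail_probability_is(4.0, 100_000)
exact = tail_probability_exact(4.0)
```

Direct sampling from N(0, 1) would see an event of this size only a few times in 100,000 draws, whereas about half of the shifted draws land in the region of interest; the weights correct for the change of distribution.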
NASA Astrophysics Data System (ADS)
Pries, V. V.; Proskuriakov, N. E.
2018-04-01
To control the assembly quality of multi-element mass-produced products on automatic rotor lines, control methods with operational feedback are required. However, because the devices and systems of an automatic rotor line can fail, there is always a real probability that defective (incomplete) products enter the output process stream. Therefore, continuous sampling control of product completeness, based on statistical methods, remains an important element in managing the assembly quality of multi-element mass-produced products on automatic rotor lines. A distinctive feature of continuous sampling control of product completeness during assembly is its destructive nature: component parts cannot be returned to the process stream after sampling control, which reduces the actual productivity of the assembly equipment. Statistical procedures for continuous sampling control on automatic rotor lines therefore require sampling plans that ensure a minimum size of control samples. Comparing the limit of the average output defect level for the continuous sampling plan (CSP) with that for the automated continuous sampling plan (ACSP) shows that ACSP-1 can provide lower limit values for the average output defect level. The average sample size with the ACSP-1 plan is also smaller than with the CSP-1 plan. Thus, applying statistical methods to assembly quality management of multi-element products on automatic rotor lines, using the proposed plans and methods for continuous selective control, makes it possible to automate sampling control procedures and to ensure the required quality level of assembled products while minimizing sample size.
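For orientation, the classical CSP-1 plan referred to here has well-known closed-form characteristics, which can be sketched as below. With clearance number i, sampling fraction f and incoming fraction defective p, the average fraction inspected is AFI = f / (f + (1 - f)(1 - p)^i), and the average outgoing quality is AOQ = p(1 - AFI) under the usual assumption that defective units found are removed or replaced by good ones. The limit of the average output defect level (AOQL) is the maximum of AOQ over p. The ACSP-1 plan proposed in the paper is not reproduced; the parameter values and function names are hypothetical.

```python
def csp1_afi(p, i, f):
    """Average fraction inspected under CSP-1 (clearance number i, fraction f)."""
    q_i = (1.0 - p) ** i
    return f / (f + (1.0 - f) * q_i)


def csp1_aoq(p, i, f):
    """Average outgoing quality, assuming detected defectives are removed
    or replaced by good units."""
    return p * (1.0 - csp1_afi(p, i, f))


def csp1_aoql(i, f, steps=10_000):
    """Approximate AOQL by a grid search over incoming quality p in [0, 1]."""
    return max(csp1_aoq(k / steps, i, f) for k in range(steps + 1))


# Hypothetical plans: sample 1 unit in 10, clearance number 50 or 100.
aoql_i50 = csp1_aoql(50, 0.1)
aoql_i100 = csp1_aoql(100, 0.1)
```

A larger clearance number lowers the AOQL at the cost of more inspection, which is the trade-off between outgoing quality and sample size that the abstract discusses.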
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kyle, Jennifer E.; Casey, Cameron P.; Stratton, Kelly G.
The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating degradation occurs in the DBS samples affecting quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types indicating that lipid quantitation was more stable across the sample types.
NASA Technical Reports Server (NTRS)
Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)
2000-01-01
Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics arising from 1) evolution of the official algorithms used to process the data and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I) are analyzed.
Examples of Data Analysis with SPSS-X.
ERIC Educational Resources Information Center
MacFarland, Thomas W.
Intended for classroom use only, these unpublished notes contain computer lessons on descriptive statistics using SPSS-X Release 3.0 for VAX/UNIX. Statistical measures covered include Chi-square analysis; Spearman's rank correlation coefficient; Student's t-test with two independent samples; Student's t-test with a paired sample; One-way analysis…
Characterizing and Improving Distributed Intrusion Detection Systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hurd, Steven A; Proebstel, Elliot P.
2007-11-01
Due to ever-increasing quantities of information traversing networks, network administrators are developing greater reliance upon statistically sampled packet information as the source for their intrusion detection systems (IDS). Our research is aimed at understanding IDS performance when statistical packet sampling is used. Using the Snort IDS and a variety of data sets, we compared IDS results when an entire data set is used to the results when a statistically sampled subset of the data set is used. Generally speaking, IDS performance with statistically sampled information was shown to drop considerably even under fairly high sampling rates (such as 1:5). Acknowledgements: The authors wish to extend our gratitude to Matt Bishop and Chen-Nee Chuah of UC Davis for their guidance and support on this work. Our thanks are also extended to Jianning Mai of UC Davis and Tao Ye of Sprint Advanced Technology Labs for their generous assistance. We would also like to acknowledge our dataset sources, CRAWDAD and CAIDA, without which this work would not have been possible. Support for OC48 data collection is provided by DARPA, NSF, DHS, Cisco and CAIDA members.
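One simple way to see why detection can drop sharply even at a 1:5 sampling rate is the following back-of-the-envelope sketch. It assumes each packet is kept independently with probability 1/N and that an alert needs at least one of k attack packets to survive sampling. This is a deliberate simplification and says nothing about how Snort rules, stream reassembly or the sampling scheme in the report actually behave; the function name is hypothetical.

```python
def capture_probability(sampling_interval, attack_packets):
    """P(at least one of k attack packets is kept) under independent
    1-in-N random packet sampling."""
    keep = 1.0 / sampling_interval
    return 1.0 - (1.0 - keep) ** attack_packets


# A single-packet signature under 1:5 sampling, and a 10-packet event.
single_packet = capture_probability(5, 1)
ten_packets = capture_probability(5, 10)
```

Under these assumptions a signature that depends on one specific packet is seen only one time in five, while events spread over many packets are far more likely to leave at least one sampled trace.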
Across-cohort QC analyses of GWAS summary statistics from complex traits
Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M
2017-01-01
Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965
ERIC Educational Resources Information Center
Lovett, Jennifer N.; Lee, Hollylynne S.
2017-01-01
Amid the implementation of new curriculum standards regarding statistics and new recommendations for preservice secondary mathematics teachers [PSMTs] to teach statistics, there is a need to examine the current state of PSMTs' common statistical knowledge. This study reports on the statistical knowledge of 217 PSMTs from a purposeful sample of 18…
Asquith, William H.; Barbie, Dana L.
2014-01-01
Selected summary statistics (L-moments) and estimates of respective sampling variances were computed for the 35 streamgages lacking statistically significant trends. From the L-moments and estimated sampling variances, weighted means or regional values were computed for each L-moment. An example application is included demonstrating how the L-moments could be used to evaluate the magnitude and frequency of annual mean streamflow.
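As a rough illustration of the summary statistics named in the abstract above, the first three sample L-moments can be computed from probability-weighted moments of the sorted data. This is a minimal sketch, not the report's own procedure: the function name `sample_l_moments` is hypothetical, and the weighting of L-moments by their sampling variances is not reproduced because the abstract does not specify it.

```python
def sample_l_moments(values):
    """Return the first three sample L-moments (l1, l2, l3).

    Uses the standard unbiased probability-weighted moments b0, b1, b2
    of the ordered sample. Assumption: at least three observations.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # i is the zero-based rank, so i corresponds to (rank - 1).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0                      # location (the mean)
    l2 = 2 * b1 - b0             # scale
    l3 = 6 * b2 - 6 * b1 + b0    # unscaled skewness
    return l1, l2, l3


# Hypothetical annual mean streamflows, for illustration only.
print(sample_l_moments([1, 2, 3, 4, 5]))
```

For a symmetric sample such as the one above, the third L-moment is zero, which is the behaviour one would expect from a skewness measure.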
Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry
2004-06-01
This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.
ProUCL version 4.1.00 Documentation Downloads
ProUCL version 4.1.00 is a comprehensive statistical software package equipped with the statistical methods and graphical tools needed to address many environmental sampling and statistical issues, as described in various guidance documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arutyunyan, R.V.; Bol'shov, L.A.; Vasil'ev, S.K.
1994-06-01
The objective of this study was to clarify a number of issues related to the spatial distribution of contaminants from the Chernobyl accident. The effects of local statistics were addressed by collecting and analyzing (for Cesium 137) soil samples from a number of regions, and it was found that sample activity differed by a factor of 3-5. The effect of local non-uniformity was estimated by modeling the distribution of the average activity of a set of five samples for each of the regions, with the spread in the activities for a ±2 range being equal to 25%. The statistical characteristics of the distribution of contamination were then analyzed and found to be a log-normal distribution with the standard deviation being a function of test area. All data for the Bryanskaya Oblast area were analyzed statistically and were adequately described by a log-normal function.
Explorations in Statistics: the Bootstrap
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fourth installment of Explorations in Statistics explores the bootstrap. The bootstrap gives us an empirical approach to estimate the theoretical variability among possible values of a sample statistic such as the…
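The bootstrap idea described in the abstract above can be sketched in a few lines: resample the observed data with replacement many times, recompute the statistic each time, and use the spread of those replicates as an empirical estimate of the statistic's variability. The function name `bootstrap_se`, the number of resamples, and the example data are assumptions made for illustration; they do not come from the article.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by the nonparametric bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The standard deviation of the replicates is the bootstrap standard error.
    return statistics.stdev(replicates)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical observations
print(bootstrap_se(sample))                      # standard error of the mean
print(bootstrap_se(sample, stat=statistics.median))
```

Passing a different `stat` (the median in the second call) shows why the approach is attractive: no formula for the sampling variability of that statistic is needed.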
Applied statistics in ecology: common pitfalls and simple solutions
E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick
2013-01-01
The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...
7 CFR 800.86 - Inspection of shiplot, unit train, and lash barge grain in single lots.
Code of Federal Regulations, 2010 CFR
2010-01-01
... prescribed in the instructions. (b) Application procedure. Applications for the official inspection of... statistical acceptance sampling and inspection plan according to the provisions of this section and procedures... inspection as part of a single lot and accepted by a statistical acceptance sampling and inspection plan...
Sadeghi, Fatemeh; Nasseri, Simin; Mosaferi, Mohammad; Nabizadeh, Ramin; Yunesian, Masud; Mesdaghinia, Alireza
2017-05-01
In this research, probable arsenic contamination in drinking water in the city of Ardabil was studied in 163 samples during four seasons. In each season, sampling was carried out randomly in the study area. Results were analyzed statistically applying SPSS 19 software, and the data was also modeled by Arc GIS 10.1 software. The maximum permissible arsenic concentration in drinking water defined by the World Health Organization and Iranian national standard is 10 μg/L. Statistical analysis showed 75, 88, 47, and 69% of samples in autumn, winter, spring, and summer, respectively, had concentrations higher than the national standard. The mean concentrations of arsenic in autumn, winter, spring, and summer were 19.89, 15.9, 10.87, and 14.6 μg/L, respectively, and the overall average in all samples through the year was 15.32 μg/L. Although GIS outputs indicated that the concentration distribution profiles changed in four consecutive seasons, variance analysis of the results showed that statistically there is no significant difference in arsenic levels in four seasons.
Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna
2008-01-01
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 different types regarding body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratio and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
Lindsey, Bruce D.; Rupert, Michael G.
2012-01-01
Decadal-scale changes in groundwater quality were evaluated by the U.S. Geological Survey National Water-Quality Assessment (NAWQA) Program. Samples of groundwater collected from wells during 1988-2000 - a first sampling event representing the decade ending the 20th century - were compared on a pair-wise basis to samples from the same wells collected during 2001-2010 - a second sampling event representing the decade beginning the 21st century. The data set consists of samples from 1,236 wells in 56 well networks, representing major aquifers and urban and agricultural land-use areas, with analytical results for chloride, dissolved solids, and nitrate. Statistical analysis was done on a network basis rather than by individual wells. Although spanning slightly more or less than a 10-year period, the two-sample comparison between the first and second sampling events is referred to as an analysis of decadal-scale change based on a step-trend analysis. The 22 principal aquifers represented by these 56 networks account for nearly 80 percent of the estimated withdrawals of groundwater used for drinking-water supply in the Nation. Well networks where decadal-scale changes in concentrations were statistically significant were identified using the Wilcoxon-Pratt signed-rank test. For the statistical analysis of chloride, dissolved solids, and nitrate concentrations at the network level, more than half revealed no statistically significant change over the decadal period. However, for networks that had statistically significant changes, increased concentrations outnumbered decreased concentrations by a large margin. Statistically significant increases of chloride concentrations were identified for 43 percent of 56 networks. Dissolved solids concentrations increased significantly in 41 percent of the 54 networks with dissolved solids data, and nitrate concentrations increased significantly in 23 percent of 56 networks. 
At least one of the three - chloride, dissolved solids, or nitrate - had a statistically significant increase in concentration in 66 percent of the networks. Statistically significant decreases in concentrations were identified in 4 percent of the networks for chloride, 2 percent of the networks for dissolved solids, and 9 percent of the networks for nitrate. A larger percentage of urban land-use networks had statistically significant increases in chloride, dissolved solids, and nitrate concentrations than agricultural land-use networks. In order to assess the magnitude of statistically significant changes, the median of the differences between constituent concentrations from the first full-network sampling event and those from the second full-network sampling event was calculated using the Turnbull method. The largest median decadal increases in chloride concentrations were in networks in the Upper Illinois River Basin (67 mg/L) and in the New England Coastal Basins (34 mg/L), whereas the largest median decadal decrease in chloride concentrations was in the Upper Snake River Basin (1 mg/L). The largest median decadal increases in dissolved solids concentrations were in networks in the Rio Grande Valley (260 mg/L) and the Upper Illinois River Basin (160 mg/L). The largest median decadal decrease in dissolved solids concentrations was in the Apalachicola-Chattahoochee-Flint River Basin (6.0 mg/L). The largest median decadal increases in nitrate as nitrogen (N) concentrations were in networks in the South Platte River Basin (2.0 mg/L as N) and the San Joaquin-Tulare Basins (1.0 mg/L as N). The largest median decadal decrease in nitrate concentrations was in the Santee River Basin and Coastal Drainages (0.63 mg/L). The magnitude of change in networks with statistically significant increases typically was much larger than the magnitude of change in networks with statistically significant decreases. 
The magnitude of change was greatest for chloride in the urban land-use networks and greatest for dissolved solids and nitrate in the agricultural land-use networks. Analysis of data from all networks combined indicated statistically significant increases for chloride, dissolved solids, and nitrate. Although chloride, dissolved solids, and nitrate concentrations were typically less than the drinking-water standards and guidelines, a statistical test was used to determine whether or not the proportion of samples exceeding the drinking-water standard or guideline changed significantly between the first and second full-network sampling events. The proportion of samples exceeding the U.S. Environmental Protection Agency (USEPA) Secondary Maximum Contaminant Level for dissolved solids (500 milligrams per liter) increased significantly between the first and second full-network sampling events when evaluating all networks combined at the national level. Also, for all networks combined, the proportion of samples exceeding the USEPA Maximum Contaminant Level (MCL) of 10 mg/L as N for nitrate increased significantly. One network in the Delmarva Peninsula had a significant increase in the proportion of samples exceeding the MCL for nitrate. A subset of 261 wells was sampled every other year (biennially) to evaluate decadal-scale changes using a time-series analysis. The analysis of the biennial data set showed that changes were generally similar to the findings from the analysis of decadal-scale change that was based on a step-trend analysis. Because of the small number of wells in a network with biennial data (typically 4-5 wells), the time-series analysis is more useful for understanding water-quality responses to changes in site-specific conditions rather than as an indicator of the change for the entire network.
Approximate sample size formulas for the two-sample trimmed mean test with unequal variances.
Luh, Wei-Ming; Guo, Jiin-Huarng
2007-05-01
Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified alpha and the power (1-beta), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
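For orientation, the test statistic that these sample size formulas target can be sketched as follows. This is a minimal sketch of Yuen's two-sample trimmed mean statistic in its usual textbook form (trimmed means in the numerator, Winsorized variances in the denominator, Welch-type degrees of freedom); it does not implement the sample size formulas themselves, and the function names and the default trimming proportion of 0.2 are assumptions.

```python
import math


def _trim_stats(values, trim):
    """Return (trimmed mean, squared standard error term, effective size h)."""
    x = sorted(values)
    n = len(x)
    g = int(math.floor(trim * n))   # observations trimmed from each tail
    h = n - 2 * g                   # observations left after trimming
    trimmed_mean = sum(x[g:n - g]) / h
    # Winsorize: pull the g smallest/largest values in to the nearest kept value.
    wins = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    wmean = sum(wins) / n
    wvar = sum((v - wmean) ** 2 for v in wins) / (n - 1)
    d = (n - 1) * wvar / (h * (h - 1))
    return trimmed_mean, d, h


def yuen_t(x, y, trim=0.2):
    """Return (t statistic, approximate degrees of freedom) for Yuen's test."""
    m1, d1, h1 = _trim_stats(x, trim)
    m2, d2, h2 = _trim_stats(y, trim)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


# Hypothetical data, for illustration only.
print(yuen_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

With `trim=0` nothing is trimmed or Winsorized, and the statistic reduces to the Welch (unequal-variance) t statistic.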
Shih, Weichung Joe; Li, Gang; Wang, Yining
2016-03-01
Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as part of the more general category of adaptive designs that utilize interim data, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical value. The weighted method adjusts the test statistic with given weights rather than the critical value. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be greater than the unadjusted critical value. The promising zone approach uses the likelihood ratio statistic with the unadjusted value and other constraints. All four methods preserve the type-I error rate. In this paper we explore their properties and compare their relationships and merits. We show that the sample size rules for the dual test are in conflict with the rules of the promising zone approach. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements. We also prove that under mild conditions, the likelihood ratio test still preserves the type-I error rate when the actual sample size is larger than the re-calculated one. Copyright © 2015 Elsevier Inc. All rights reserved.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the L1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
ERIC Educational Resources Information Center
O'Bryant, Monique J.
2017-01-01
The aim of this study was to validate an instrument that can be used by instructors or social scientists who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) were examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Assaad, Houssein I; Choudhary, Pankaj K
2013-01-01
The L-statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L-statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L-statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.
A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.
Tang, Liansheng Larry; Balakrishnan, N
2011-01-01
The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
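As background for the abstract above, the regular (fixed sample size) Wilcoxon-Mann-Whitney statistic with ties can be sketched by direct pair counting. This is only the conventional statistic that the proposed random-sum version is said to include, not the random-sum statistic itself; the function names are hypothetical.

```python
def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi > yj, scoring ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def auc_estimate(x, y):
    """U / (m * n): the empirical probability that an x value exceeds a y value."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical marker values for diseased (x) and non-diseased (y) subjects.
diseased = [3, 4, 5]
healthy = [1, 2, 3]
print(mann_whitney_u(diseased, healthy), auc_estimate(diseased, healthy))
```

Dividing U by the product of the two sample sizes gives the familiar nonparametric estimate of the area under the ROC curve, which is why the statistic is natural for diagnostic accuracy data.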
NASA Technical Reports Server (NTRS)
Racette, Paul; Lang, Roger; Zhang, Zhao-Nan; Zacharias, David; Krebs, Carolyn A. (Technical Monitor)
2002-01-01
Radiometers must be periodically calibrated because the receiver response fluctuates. Many techniques exist to correct for the time-varying response of a radiometer receiver. An analytical technique has been developed that uses generalized least squares regression (LSR) to predict the performance of a wide variety of calibration algorithms. The total measurement uncertainty including the uncertainty of the calibration can be computed using LSR. The uncertainties of the calibration samples used in the regression are based upon treating the receiver fluctuations as non-stationary processes. Signals originating from the different sources of emission are treated as simultaneously existing random processes. Thus, the radiometer output is a series of samples obtained from these random processes. The samples are treated as random variables but because the underlying processes are non-stationary the statistics of the samples are treated as non-stationary. The statistics of the calibration samples depend upon the time for which the samples are to be applied. The statistics of the random variables are equated to the mean statistics of the non-stationary processes over the interval defined by the time of calibration sample and when it is applied. This analysis opens the opportunity for experimental investigation into the underlying properties of receiver non-stationarity through the use of multiple calibration references. In this presentation we will discuss the application of LSR to the analysis of various calibration algorithms, requirements for experimental verification of the theory, and preliminary results from analyzing experiment measurements.
Ocimum basilicum L.: phenolic profile and antioxidant-related activity.
Dorman, H J Damien; Hiltunen, Raimo
2010-01-01
Ocimum basilicum L. leaf material was extracted by maceration with (80:20:1 v/v/v) methanol: water: acetic acid to produce a crude extract (CE), which was further fractionated by liquid-liquid extraction to isolate light petroleum (PE), ethyl acetate (EtOAc), n-butanol (n-BuOH) and H2O-soluble sub-fractions. The total phenol and flavonoid contents of the resulting samples were estimated using colorimetric-based methods, and their iron(III) reductive and free radical scavenging activities were determined in a battery of in vitro assays. The CE and sub-fractions contained phenolic compounds and flavonoids. The samples, except for PE, gave a positive result for the presence of flavones and flavonols; however, flavanones only appeared to be present in the CE. In iron(III) reduction, CE and n-BuOH were the most potent followed by EtOAc and H2O (statistically indistinguishable, p > 0.05). However, in the ferric reducing antioxidant power assay, H2O was the most potent followed by CE and EtOAc (statistically indistinguishable, p > 0.05) and n-BuOH and PE. In 1,1-diphenyl-2-picrylhydrazyl scavenging, all the samples, except PE, were effective against this reactive nitrogen species, with CE, EtOAc and n-BuOH being the most potent (statistically indistinguishable, p > 0.05). In alkylperoxyl scavenging, all the samples, except for PE, were effective against this reactive oxygen species (ROS). In superoxide anion scavenging, all the samples were capable of scavenging this ROS with CE being the most effective, followed by n-BuOH and H2O (statistically indistinguishable, p > 0.05) and EtOAc and PE. Similarly, in hydroxyl scavenging, all the samples were capable of scavenging this ROS with CE and n-BuOH being the most effective (statistically indistinguishable, p > 0.05) followed by EtOAc and H2O (statistically indistinguishable, p > 0.05) and PE.
Radar error statistics for the space shuttle
NASA Technical Reports Server (NTRS)
Lear, W. M.
1979-01-01
Radar error statistics of C-band and S-band that are recommended for use with the ground tracking programs to process space shuttle tracking data are presented. The statistics are divided into two parts: bias error statistics, using the subscript B, and high frequency error statistics, using the subscript q. Bias errors may be slowly varying to constant. High frequency random errors (noise) are rapidly varying and may or may not be correlated from sample to sample. Bias errors were mainly due to hardware defects and to errors in correction for atmospheric refraction effects. High frequency noise was mainly due to hardware and due to atmospheric scintillation. Three types of atmospheric scintillation were identified: horizontal, vertical, and line of sight. This was the first time that horizontal and line of sight scintillations were identified.
Sample size, confidence, and contingency judgement.
Clément, Mélanie; Mercier, Pierre; Pastò, Luigi
2002-06-01
According to statistical models, the acquisition function of contingency judgement is due to confidence increasing with sample size. According to associative models, the function reflects the accumulation of associative strength on which the judgement is based. Which view is right? Thirty university students assessed the relation between a fictitious medication and a symptom of skin discoloration in conditions that varied sample size (4, 6, 8 or 40 trials) and contingency (delta P = .20, .40, .60 or .80). Confidence was also collected. Contingency judgement was lower for smaller samples, while confidence level correlated inversely with sample size. This dissociation between contingency judgement and confidence contradicts the statistical perspective.
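The contingency measure delta P used in the abstract above is conventionally the difference between the probability of the outcome when the cue is present and when it is absent. A minimal sketch under that conventional definition follows; the cell labels a, b, c, d and the example counts are assumptions, not data from the study.

```python
def delta_p(a, b, c, d):
    """Delta P for a 2x2 contingency table.

    a: cue present, outcome present   b: cue present, outcome absent
    c: cue absent,  outcome present   d: cue absent,  outcome absent
    Returns P(outcome | cue) - P(outcome | no cue).
    """
    return a / (a + b) - c / (c + d)


# Hypothetical trial counts: medication given on 10 trials, withheld on 10.
print(delta_p(8, 2, 2, 8))   # symptom on 8 of 10 medicated, 2 of 10 unmedicated
print(delta_p(5, 5, 5, 5))   # no contingency
```

Note that delta P depends only on the two conditional proportions, so the same value can arise from 4 trials or 40; that is what lets a study vary sample size and contingency independently.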
A.R. Mason; H.G. Paul
1994-01-01
Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...
Forrester, Janet E
2015-12-01
Errors in the statistical presentation and analyses of data in the medical literature remain common despite efforts to improve the review process, including the creation of guidelines for authors and the use of statistical reviewers. This article discusses common elementary statistical errors seen in manuscripts recently submitted to Clinical Therapeutics and describes some ways in which authors and reviewers can identify errors and thus correct them before publication. A nonsystematic sample of manuscripts submitted to Clinical Therapeutics over the past year was examined for elementary statistical errors. Clinical Therapeutics has many of the same errors that reportedly exist in other journals. Authors require additional guidance to avoid elementary statistical errors and incentives to use the guidance. Implementation of reporting guidelines for authors and reviewers by journals such as Clinical Therapeutics may be a good approach to reduce the rate of statistical errors. Copyright © 2015 Elsevier HS Journals, Inc. All rights reserved.
Modified Distribution-Free Goodness-of-Fit Test Statistic.
Chun, So Yeon; Browne, Michael W; Shapiro, Alexander
2018-03-01
Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
Kent, Peter; Boyle, Eleanor; Keating, Jennifer L; Albert, Hanne B; Hartvigsen, Jan
2017-02-01
To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. An analysis of three pre-existing sets of large cohort data (n = 4,062-8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400-600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance. Copyright © 2016 Elsevier Inc. All rights reserved.
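The repeated random sampling procedure described in the abstract above can be sketched on synthetic data: draw many subsamples of a given size from a large cohort, estimate sensitivity in each, and look at how much the estimates vary. Everything concrete here is an assumption for illustration (the synthetic cohort, its prevalence, sensitivity and specificity, and the function names); it is not the authors' data or code.

```python
import random
import statistics


def make_cohort(size=5000, prevalence=0.3, sens=0.8, spec=0.9, seed=1):
    """Build a synthetic cohort of (test_positive, diseased) pairs."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(size):
        diseased = rng.random() < prevalence
        positive = rng.random() < (sens if diseased else 1 - spec)
        cohort.append((positive, diseased))
    return cohort


def sensitivity_spread(cohort, n, reps=100, seed=0):
    """Standard deviation of sensitivity estimates over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = rng.sample(cohort, n)          # sampling without replacement
        results = [pos for pos, dis in sample if dis]
        if results:                             # skip samples with no diseased cases
            estimates.append(sum(results) / len(results))
    return statistics.stdev(estimates)


cohort = make_cohort()
for n in (100, 400, 2000):
    print(n, round(sensitivity_spread(cohort, n), 4))
```

In a simulation like this the spread of the estimates shrinks as the sample size grows, which is the qualitative pattern the abstract reports for real cohort data.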
Gyarmathy, V Anna; Johnston, Lisa G; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A
2014-02-01
Respondent driven sampling (RDS) and incentivized snowball sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania ("original sample") to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1-12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called "strudel effect" is discussed in the paper. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-02-01
The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore this sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the ADNI data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing them with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.
The Statistics of wood assays for preservative retention
Patricia K. Lebow; Scott W. Conklin
2011-01-01
This paper covers general statistical concepts that apply to interpreting wood assay retention values. In particular, since wood assays are typically obtained from a single composited sample, the statistical aspects, including advantages and disadvantages, of simple compositing are covered.
2013-01-01
Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver poor statistical power for detecting genuine associations as well as a high false positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case–control samples with regard to genotypic distribution at the loci in the populations under study and confers flexibility to test for genetic association in the presence of confounding factors such as population structure and non-randomness of samples. Results We implemented this novel method together with several popular methods from the GWAS literature to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that, compared to its rivals, the new method confers not only significantly improved statistical power for detecting associations but also robustness to the difficulties stemming from non-random sampling and genetic structure. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method that provides adequate estimation of LD and other population model parameters from case and control samples, eases the integration of such samples from multiple genetically divergent populations, and thus supports statistically robust and powerful GWAS analyses. On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most widely implemented test in the GWAS literature. PMID:23394771
ERIC Educational Resources Information Center
Davis, Ashley; Mirick, Rebecca G.
2015-01-01
Many social work students feel anxious when taking a statistics course. Their attitudes, beliefs, and behaviors after learning statistics are less known. However, such information could help instructors support students' ongoing development of statistical knowledge. With a sample of MSW students (N = 101) in one program, this study examined…
Chan, Kwun Chuen Gary; Qin, Jing
2015-10-01
Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times and is frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Probability sampling in legal cases: Kansas cellphone users
NASA Astrophysics Data System (ADS)
Kadane, Joseph B.
2012-10-01
Probability sampling is a standard statistical technique. This article introduces the basic ideas of probability sampling, and shows in detail how probability sampling was used in a particular legal case.
BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY
2010-01-01
This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627
ERIC Educational Resources Information Center
Nevitt, Jonathan; Hancock, Gregory R.
2001-01-01
Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…
Examples of Data Analysis with SPSS/PC+ Studentware.
ERIC Educational Resources Information Center
MacFarland, Thomas W.
Intended for classroom use only, these unpublished notes contain computer lessons on descriptive statistics with files previously created in WordPerfect 4.2 and Lotus 1-2-3 Version 1.A for the IBM PC+. The statistical measures covered include Student's t-test with two independent samples; Student's t-test with a paired sample; Chi-square analysis;…
Introduction to Sample Size Choice for Confidence Intervals Based on "t" Statistics
ERIC Educational Resources Information Center
Liu, Xiaofeng Steven; Loudermilk, Brandon; Simpson, Thomas
2014-01-01
Sample size can be chosen to achieve a specified width in a confidence interval. The probability of obtaining a narrow width given that the confidence interval includes the population parameter is defined as the power of the confidence interval, a concept unfamiliar to many practitioners. This article shows how to utilize the Statistical Analysis…
Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data
ERIC Educational Resources Information Center
Warachan, Boonyasit
2011-01-01
The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…
ERIC Educational Resources Information Center
Bailey, Thomas; Jenkins, Davis; Leinbach, Timothy
2005-01-01
This report summarizes statistics on access and attainment in higher education, focusing particularly on community college students, using data from the National Education Longitudinal Study of 1988 (NELS:88), which follows a nationally representative sample of individuals who were eighth graders in the spring of 1988. A sample of these…
Consistent Tolerance Bounds for Statistical Distributions
NASA Technical Reports Server (NTRS)
Mezzacappa, M. A.
1983-01-01
Assumption that sample comes from population with particular distribution is made with confidence C if data lie between certain bounds. These "confidence bounds" depend on C and assumption about distribution of sampling errors around regression line. Graphical test criteria using tolerance bounds are applied in industry where statistical analysis influences product development and use. Applied to evaluate equipment life.
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia
2017-01-01
This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199
Gyarmathy, V. Anna; Johnston, Lisa G.; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A.
2014-01-01
Background Respondent driven sampling (RDS) and Incentivized Snowball Sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). Methods We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania (“original sample”) to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. Results The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1 to 12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. Conclusions When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called “strudel effect” is discussed in the paper. PMID:24360650
Statistical and sampling issues when using multiple particle tracking
NASA Astrophysics Data System (ADS)
Savin, Thierry; Doyle, Patrick S.
2007-08-01
Video microscopy can be used to simultaneously track several microparticles embedded in a complex material. The trajectories are used to extract a sample of displacements at random locations in the material. From this sample, averaged quantities characterizing the dynamics of the probes are calculated to evaluate structural and/or mechanical properties of the assessed material. However, the sampling of measured displacements in heterogeneous systems is singular because the volume of observation with video microscopy is finite. By carefully characterizing the sampling design in the experimental output of the multiple particle tracking technique, we derive estimators for the mean and variance of the probes’ dynamics that are independent of the peculiar statistical characteristics. We expose stringent tests of these estimators using simulated and experimental complex systems with a known heterogeneous structure. Up to a certain fundamental limitation, which we characterize through a material degree of sampling by the embedded probe tracking, these estimators can be applied to quantify the heterogeneity of a material, providing an original and intelligible kind of information on complex fluid properties. More generally, we show that the precise assessment of the statistics in the multiple particle tracking output sample of observations is essential in order to provide accurate unbiased measurements.
Miranda de Sá, Antonio Mauricio F L; Infantosi, Antonio Fernando C; Lazarev, Vladimir V
2007-01-01
In the present work, a commonly used index for evaluating the Event-Related Synchronization and Desynchronization (ERS/ERD) in the EEG was expressed as a function of the Spectral F-Test (SFT), which is a statistical test for assessing if two sample spectra are from populations with identical theoretical spectra. The sampling distribution of SFT has been derived, allowing hence ERS/ERD to be evaluated under a statistical basis. An example of the technique was also provided in the EEG signals from 10 normal subjects during intermittent photic stimulation.
1987-08-01
out. To use each animal as its own control, arterial blood was sampled by means of chronically implanted aortic cannulas [12,13,14]. This simple...APPENDIX B: STATISTICAL METHODOLOGY The balanced design of this experiment (requiring that 25 animals from each...protocol in that, in numerous cases, samples were collected at odd intervals (invalidating the orthogonality of the design) and the number of samples taken
1985-09-01
A Feasibility Study of the Collection of Unscheduled Maintenance Data Using Statistical Sampling Techniques. Thesis. Robert A. Heinlein, Captain, USAF. AFIT/GLM/LSM/85S-32. Distribution Statement A: Approved for public release...
Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.
Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun
2016-01-01
We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.
Determination of polarimetric parameters of honey by near-infrared transflectance spectroscopy.
García-Alvarez, M; Ceresuela, S; Huidobro, J F; Hermida, M; Rodríguez-Otero, J L
2002-01-30
NIR transflectance spectroscopy was used to determine polarimetric parameters (direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides) and sucrose in honey. In total, 156 honey samples were collected during 1992 (45 samples), 1995 (56 samples), and 1996 (55 samples). Samples were analyzed by NIR spectroscopy and polarimetric methods. Calibration (118 samples) and validation (38 samples) sets were made up; honeys from the three years were included in both sets. Calibrations were performed by modified partial least-squares regression and scatter correction by standard normal variate and detrend methods. For direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides, good statistics (bias, SEV, and R(2)) were obtained for the validation set, and no statistically (p = 0.05) significant differences were found between instrumental and polarimetric methods for these parameters. Statistical data for sucrose were not as good as those of the other parameters. Therefore, NIR spectroscopy is not an effective method for quantitative analysis of sucrose in these honey samples. However, NIR spectroscopy may be an acceptable method for semiquantitative evaluation of sucrose for honeys, such as those in our study, containing up to 3% of sucrose. Further work is necessary to validate the uncertainty at higher levels.
[Practical aspects regarding sample size in clinical research].
Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S
1996-01-01
Knowing the appropriate sample size allows readers to judge whether results published in medical papers rest on a suitable design and whether the conclusions follow from the statistical analysis. To estimate the sample size, we must consider the type I error, type II error, variance, effect size, significance level, and power of the test. To decide which formula to use, we must first define the type of study, that is, whether it is a prevalence study, a study of mean values, or a comparative study. In this paper we explain some basic statistical topics and describe four simple examples of sample size estimation.
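As a rough illustration of how these ingredients combine, the sketch below uses two common textbook normal-approximation formulas: one for estimating a prevalence to within a given margin, and one for comparing two means. They are not necessarily the formulas used in the paper, and the function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_prevalence(expected_p, margin, alpha=0.05):
    """Sample size to estimate a proportion within +/- margin
    (normal approximation): n = z^2 * p * (1 - p) / margin^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin ** 2)

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size to detect a mean difference delta with common
    standard deviation sigma in a two-sided test (normal approximation):
    n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical inputs chosen only for illustration.
print(n_for_prevalence(expected_p=0.5, margin=0.05))  # 385
print(n_per_group_two_means(sigma=10, delta=5))       # 63
```

Both results are rounded up, since a fractional participant cannot be recruited. Formulas based on the t distribution give slightly larger values for small samples.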
Statistical Analysis for Collision-free Boson Sampling.
Huang, He-Liang; Zhong, Han-Sen; Li, Tan; Li, Feng-Guang; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su
2017-11-10
Boson sampling is strongly believed to be intractable for classical computers but solvable with photons in linear optics, which has attracted widespread attention as a rapid way to demonstrate quantum supremacy. However, because its solution is mathematically unverifiable, certifying the experimental results becomes a major difficulty in boson sampling experiments. Here, we develop a statistical analysis scheme to experimentally certify collision-free boson sampling. Numerical simulations are performed to show the feasibility and practicability of our scheme, and the effects of realistic experimental conditions are also considered, demonstrating that our proposed scheme is experimentally friendly. Moreover, our broad approach is expected to be generally applicable to investigating multi-particle coherent dynamics beyond boson sampling.
Xiang, Jim X
2016-01-01
Measuring a change in the existence of disease symptoms before and after a treatment is examined for statistical significance by means of the McNemar test. When comparing two treatments, Feuer and Kessler (1989) proposed a two-sample McNemar test. In this article, we show that this test usually inflates the type I error in the hypothesis testing, and propose a new two-sample McNemar test that is superior in terms of preserving type I error. We also make the connection between the two-sample McNemar test and the test statistic for the equal residual effects in a 2 × 2 crossover design. The limitations of the two-sample McNemar test are also discussed.
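For orientation, the classical one-sample McNemar statistic that the abstract starts from can be sketched as below. This is the standard uncorrected chi-square form for paired before/after data. It is neither the Feuer and Kessler two-sample test nor the new test proposed in the article, and the counts in the example are made up.

```python
import math

def mcnemar(b, c):
    """Classical McNemar test for paired binary outcomes.

    b: pairs that changed from symptom present to symptom absent
    c: pairs that changed from symptom absent to symptom present
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical discordant counts, for illustration only.
stat, p_value = mcnemar(b=15, c=5)
print(stat, round(p_value, 4))
```

Only the discordant pairs enter the statistic; pairs whose status did not change carry no information about the direction of change.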
Breaking Free of Sample Size Dogma to Perform Innovative Translational Research
Bacchetti, Peter; Deeks, Steven G.; McCune, Joseph M.
2011-01-01
Innovative clinical and translational research is often delayed or prevented by reviewers’ expectations that any study performed in humans must be shown in advance to have high statistical power. This supposed requirement is not justifiable and is contradicted by the reality that increasing sample size produces diminishing marginal returns. Studies of new ideas often must start small (sometimes even with an N of 1) because of cost and feasibility concerns, and recent statistical work shows that small sample sizes for such research can produce more projected scientific value per dollar spent than larger sample sizes. Renouncing false dogma about sample size would remove a serious barrier to innovation and translation. PMID:21677197
Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM
ERIC Educational Resources Information Center
Tong, Xiaoxiao; Bentler, Peter M.
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…
ERIC Educational Resources Information Center
Idris, Khairiani; Yang, Kai-Lin
2017-01-01
This article reports the results of a mixed-methods approach to develop and validate an instrument to measure Indonesian pre-service teachers' conceptions of statistics. First, a phenomenographic study involving a sample of 44 participants uncovered six categories of conceptions of statistics. Second, an instrument of conceptions of statistics was…
Ryberg, Karen R.; Hiemenz, Gregory
2009-01-01
The Bureau of Reclamation collected water-quality samples at 16 sites on the James River and the Arrowwood National Wildlife Refuge, N. Dak., as part of its refuge-monitoring program from 1987-93 and as part of an environmental impact statement commitment from 1999-2004. Climatic and hydrologic conditions varied greatly during both sampling periods. The first period was dominated by drought conditions, which abruptly changed to cooler and wetter conditions in 1992-93. During the second period, conditions were near normal to very wet and included higher inflow from the James River into the refuge. The two periods also differed in the sites sampled, seasons sampled, and properties and constituent concentrations measured. Summary statistics were reported separately for the two sampling periods for all physical properties and constituents. Nonparametric statistical tests were used to further analyze some of the water-quality data. During the first sampling period, 1987-93, specific conductance, turbidity, hardness, alkalinity, total dissolved solids, total suspended solids, nonvolatile suspended solids, calcium, magnesium, sodium, potassium, sulfate, chloride, phosphate, total phosphorus, total organic carbon, chlorophyll a, and arsenic were determined to have significantly different medians among the sites tested. During the second sampling period, 1999-2004, the medians of pH, sodium, chloride, barium, and boron varied significantly among sites. Sites sampled and period of record varied between the two sampling periods and the period of record varied among the sites. Also, some constituents analyzed during the first period (1987-93) were not analyzed during the second period (1999-2004), and winter sampling was done during the second sampling period only. This variability reduces the number of direct comparisons that can be made between the two periods. Three sites had complete periods of record for both sampling periods and were compared. 
Differences in variability and median concentration were identified between the two time periods. Sites representing inflow to the refuge and outflow were compared statistically for the period when data were available for both sites, 1999-2004. Of the nutrients tested - ammonia plus organic nitrogen, phosphate, and total phosphorus - no significant statistical differences were found between the inflow samples and the outflow samples. Statistically significant differences were found for pH, sulfate, chloride, barium, and manganese. Nutrients are of particular interest in the refuge because of the aquatic plant and animal life and the use of the wetland resources by waterfowl. However, the nutrient data were highly censored and there were differences in the seasonal timing of sample collection between the two sampling periods. Therefore, the nutrient data were examined graphically with stripplots that highlighted differences in the seasonal timing of sample collection and concentration differences likely related to the differences in climatic and hydrologic conditions between the two periods.
Austin, Peter C
2007-11-01
I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.
Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods
Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.
2012-01-01
Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs falls close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570
Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.
Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L
2012-12-01
Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.
The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.
ERIC Educational Resources Information Center
Wang, Lin; Fan, Xitao
Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…
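To illustrate why a cluster design inflates variance relative to simple random sampling, the sketch below uses Kish's well-known design-effect approximation, deff = 1 + (m - 1) * rho, where m is an assumed common cluster size and rho is the intraclass correlation. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def cluster_adjusted_se(srs_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a simple-random-sampling standard error by sqrt(deff)."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical survey: 25 students per classroom, intraclass correlation 0.10.
    print(design_effect(cluster_size=25, icc=0.10))
    print(cluster_adjusted_se(srs_se=0.02, cluster_size=25, icc=0.10))
```

Under these assumptions, even a modest intraclass correlation more than triples the variance, which is why treating clustered data as a simple random sample understates standard errors.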
ERIC Educational Resources Information Center
Ainley, Janet; Gould, Robert; Pratt, Dave
2015-01-01
This paper is in the form of a reflective discussion of the collection of papers in this Special Issue on "Statistical reasoning: learning to reason from samples" drawing on deliberations arising at the Seventh International Collaboration for Research on Statistical Reasoning, Thinking, and Literacy (SRTL7). It is an important part of…
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 1, presents basic information about data including a classification system that describes the four major types of variables: continuous quantitative variable, discrete quantitative variable, ordinal categorical variable (including the binomial variable), and nominal categorical variable. A histogram is a graph that displays the frequency distribution for a continuous variable. The article also demonstrates how to calculate the mean, median, standard deviation, and variance for a continuous variable.
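The descriptive statistics named in this abstract can be computed as in the following minimal sketch, which uses only Python's standard `statistics` module. The helper name `describe` and the data values are made up for illustration; the variance and standard deviation shown are the sample (n - 1) versions.

```python
import statistics


def describe(values):
    """Return mean, median, sample variance, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "stdev": statistics.stdev(values),        # square root of the above
    }


if __name__ == "__main__":
    # Hypothetical continuous measurements, e.g. waiting times in minutes.
    waiting_times = [2, 4, 4, 4, 5, 5, 7, 9]
    for name, value in describe(waiting_times).items():
        print(name, value)
```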
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wild, M.; Rouhani, S.
1995-02-01
A typical site investigation entails extensive sampling and monitoring. In the past, sampling plans have been designed on purely ad hoc bases, leading to significant expenditures and, in some cases, collection of redundant information. In many instances, sampling costs exceed the true worth of the collected data. The US Environmental Protection Agency (EPA) therefore has advocated the use of geostatistics to provide a logical framework for sampling and analysis of environmental data. Geostatistical methodology uses statistical techniques for the spatial analysis of a variety of earth-related data. The use of geostatistics was developed by the mining industry to estimate ore concentrations. The same procedure is effective in quantifying environmental contaminants in soils for risk assessments. Unlike classical statistical techniques, geostatistics offers procedures to incorporate the underlying spatial structure of the investigated field. Sample points spaced close together tend to be more similar than samples spaced further apart. This can guide sampling strategies and determine complex contaminant distributions. Geostatistical techniques can be used to evaluate site conditions on the basis of regular, irregular, random and even spatially biased samples. In most environmental investigations, it is desirable to concentrate sampling in areas of known or suspected contamination. The rigorous mathematical procedures of geostatistics allow for accurate estimates at unsampled locations, potentially reducing sampling requirements. The use of geostatistics serves as a decision-aiding and planning tool and can significantly reduce short-term site assessment costs, long-term sampling and monitoring needs, as well as lead to more accurate and realistic remedial design criteria.
Statistical approaches used to assess and redesign surface water-quality-monitoring networks.
Khalil, B; Ouarda, T B M J
2009-11-01
An up-to-date review of the statistical approaches utilized for the assessment and redesign of surface water quality monitoring (WQM) networks is presented. The main technical aspects of network design are covered in four sections, addressing monitoring objectives, water quality variables, sampling frequency and spatial distribution of sampling locations. This paper discusses various monitoring objectives and related procedures used for the assessment and redesign of long-term surface WQM networks. The appropriateness of each approach for the design, contraction or expansion of monitoring networks is also discussed. For each statistical approach, its advantages and disadvantages are examined from a network design perspective. Possible methods to overcome disadvantages and deficiencies in the statistical approaches that are currently in use are recommended.
Mugel, Douglas N.
2002-01-01
Forty-seven wells and 8 springs were sampled in May, October, and November 2000 in the upper Shoal Creek Basin, southwest Missouri, to determine if nutrient concentrations and fecal bacteria densities are increasing in the shallow aquifer as a result of poultry confined animal feeding operations (CAFOs). Most of the land use in the basin is agricultural, with cattle and hay production dominating; the number of poultry CAFOs has increased in recent years. Poultry waste (litter) is used as a source of nutrients on pasture land as much as several miles away from poultry barns. Most wells in the sample network were classified as "P" wells, which were open only or mostly to the Springfield Plateau aquifer and where poultry litter was applied to a substantial acreage within 0.5 mile of the well both in spring 2000 and in several previous years; and "Ag" wells, which were open only or mostly to the Springfield Plateau aquifer and which had limited or no association with poultry CAFOs. Water-quality data from wells and springs were grouped for statistical purposes as P1, Ag1, and Sp1 (May 2000 samples) and P2, Ag2, and Sp2 (October or November 2000 samples). The results of this study do not indicate that poultry CAFOs are affecting the shallow ground water in the upper Shoal Creek Basin with respect to nutrient concentrations and fecal bacteria densities. Statistical tests do not indicate that P wells sampled in spring 2000 have statistically larger concentrations of nitrite plus nitrate or fecal indicator bacteria densities than Ag wells sampled during the same time, at a 95-percent confidence level. Instead, the Ag wells had statistically larger concentrations of nitrite plus nitrate and fecal coliform bacteria densities than the P wells. The results of this study do not indicate seasonal variations from spring 2000 to fall 2000 in the concentrations of nutrients or fecal indicator bacteria densities from well samples.
Statistical tests do not indicate statistically significant differences at a 95-percent confidence level for nitrite plus nitrate concentrations or fecal indicator bacteria densities between either P wells sampled in spring and fall 2000, or Ag wells sampled in spring and fall 2000. However, analysis of samples from springs shows that fecal streptococcus bacteria densities were statistically smaller in fall 2000 than in spring 2000 at a 95-percent confidence level. Nitrite plus nitrate concentrations in spring 2000 samples ranged from less than the detection level [0.02 mg/L (milligram per liter) as nitrogen] to 18 mg/L as nitrogen. Seven samples from three wells had nitrite plus nitrate concentrations at or larger than the maximum contaminant level (MCL) of 10 mg/L as nitrogen. The median nitrite plus nitrate concentrations were 0.28 mg/L as nitrogen for P1 samples, 4.6 mg/L as nitrogen for Ag1 samples, and 3.9 mg/L as nitrogen for Sp1 samples. Fecal coliform bacteria were detected in 1 of 25 P1 samples and 5 of 15 Ag1 samples. Escherichia coli (E. coli) bacteria were detected in 3 of 24 P1 samples and 1 of 13 Ag1 samples. Fecal streptococcus bacteria were detected in 8 of 25 P1 samples and 6 of 15 Ag1 samples. Bacteria densities in samples from wells ranged from less than 1 to 81 col/100 mL (colonies per 100 milliliters) of fecal coliform, less than 1 to 140 col/100 mL of E. coli, and less than 1 to 130 col/100 mL of fecal streptococcus. Fecal indicator bacteria densities in samples from springs were substantially larger than in samples from wells. In Sp1 samples, bacteria densities ranged from 12 to 3,300 col/100 mL of fecal coliform, 40 to 2,700 col/100 mL of E. coli, and 42 to 3,100 col/100 mL of fecal streptococcus.
The Statistical Power of Planned Comparisons.
ERIC Educational Resources Information Center
Benton, Roberta L.
Basic principles underlying statistical power are examined; and issues pertaining to effect size, sample size, error variance, and significance level are highlighted via the use of specific hypothetical examples. Analysis of variance (ANOVA) and related methods remain popular, although other procedures sometimes have more statistical power against…
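The interplay of effect size, sample size, error variance, and significance level can be made concrete with a small calculation. The sketch below is a deliberate simplification: it computes power for a two-sided one-sample z-test with known standard deviation, not for the ANOVA planned comparisons the paper discusses. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test for a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * n ** 0.5 / sigma  # standardized true effect
    # Probability that the test statistic lands in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect of half a standard deviation at several sample sizes.
    for n in (8, 16, 32, 64):
        print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With the effect, variance, and alpha held fixed, the printed power rises with n; with a zero effect the function returns alpha, the type I error rate.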
In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.
Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat
2014-11-01
The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.
2014-01-01
Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
Micro-organism distribution sampling for bioassays
NASA Technical Reports Server (NTRS)
Nelson, B. A.
1975-01-01
Purpose of sampling distribution is to characterize sample-to-sample variation so statistical tests may be applied, to estimate error due to sampling (confidence limits) and to evaluate observed differences between samples. Distribution could be used for bioassays taken in hospitals, breweries, food-processing plants, and pharmaceutical plants.
Chao, Li-Wei; Szrek, Helena; Peltzer, Karl; Ramlagan, Shandir; Fleming, Peter; Leite, Rui; Magerman, Jesswill; Ngwenya, Godfrey B.; Pereira, Nuno Sousa; Behrman, Jere
2011-01-01
Finding an efficient method for sampling micro- and small-enterprises (MSEs) for research and statistical reporting purposes is a challenge in developing countries, where registries of MSEs are often nonexistent or outdated. This lack of a sampling frame creates an obstacle in finding a representative sample of MSEs. This study uses computer simulations to draw samples from a census of businesses and non-businesses in the Tshwane Municipality of South Africa, using three different sampling methods: the traditional probability sampling method, the compact segment sampling method, and the World Health Organization’s Expanded Programme on Immunization (EPI) sampling method. Three mechanisms by which the methods could differ are tested: the proximity selection of respondents, the at-home selection of respondents, and the use of inaccurate probability weights. The results highlight the importance of revisits and accurate probability weights, but the lesser effect of proximity selection on the samples’ statistical properties. PMID:22582004
A Statistical Analysis Plan to Support the Joint Forward Area Air Defense Test.
1984-08-02
by establishing a specific significance level prior to performing the statistical test (traditionally α levels are set at .01 or .05). What is often... undesirable increase in β. For constant α levels, the power (1 − β) of a statistical test can be increased by increasing the sample size of the test. [Ref... [Flowchart residue: perform ANOVA upon MOP "A"; differences exist among Factor "A" levels?; perform k-sample comparison test on MOP...]
tscvh R Package: Computational of the two samples test on microarray-sequencing data
NASA Astrophysics Data System (ADS)
Fajriyah, Rohmatul; Rosadi, Dedi
2017-12-01
We present a new R package, tscvh (two-samples cross-variance homogeneity). The package implements the cross-variance statistical test proposed and introduced by Fajriyah ([3] and [4]), which is based on the cross-variance concept. The test can be used as an alternative test for a significant difference between two means when the sample size is small, a situation that commonly arises in bioinformatics research. Based on the statistic's distribution, a p-value can also be provided. The package assumes homogeneity of variance between samples.
Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.
Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng
2015-01-01
Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.
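For readers unfamiliar with the procedure under scrutiny, the following is a minimal sketch of a percentile bootstrap confidence interval for the indirect effect a*b in a single-mediator model (X to M to Y). It is not the authors' R syntax; the function names, the simulated data, and the use of plain least-squares formulas are assumptions made for illustration.

```python
import random


def _cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """a*b, where a is the slope of M on X and b the slope of Y on M given X."""
    sxx, smm = _cross(x, x), _cross(m, m)
    sxm, sxy, smy = _cross(x, m), _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval from case resampling with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Simulated small sample (n = 40) with hypothetical path coefficients of 0.5.
    rng = random.Random(42)
    x = [rng.gauss(0, 1) for _ in range(40)]
    m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.5 * mi + rng.gauss(0, 1) for mi in m]
    print(indirect_effect(x, m, y), bootstrap_ci(x, m, y))
```

The mediation effect is declared significant when the interval excludes zero; the abstract's concern is how this decision rule behaves at sample sizes of 20 to 80.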
NASA Astrophysics Data System (ADS)
WANG, P. T.
2015-12-01
Groundwater modeling requires assigning hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. A hydrogeological property is assumed to follow a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistical sampling plays an important role in the efficiency of the modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the stratified random procedure from LHS with simulation by LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were developed. The simulation efficiency and spatial correlation of LULHS are compared to those of three other simulation methods. The results show that, for both conditional and unconditional simulation, the LULHS method is more efficient in terms of computational effort. Fewer realizations are required to achieve the required statistical accuracy and spatial correlation.
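The two ingredients named in this abstract, stratified Latin Hypercube draws and a triangular (Cholesky, i.e. symmetric LU) factor of the covariance matrix, can be sketched as follows. This is a simplified unconditional illustration and not the LULHS algorithm itself: the exponential covariance model, the 1-D coordinates, and all function names are assumptions, and mixing the draws through the triangular factor does not preserve exact stratification of every marginal.

```python
import math
import random
from statistics import NormalDist


def latin_hypercube(n_samples, n_dims, rng):
    """Points in the unit hypercube with exactly one point per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def correlated_fields(n_realizations, coords, corr_length, rng):
    """Unconditional Gaussian fields on 1-D grid coordinates `coords`."""
    n = len(coords)
    cov = [[math.exp(-abs(a - b) / corr_length) for b in coords] for a in coords]
    low = cholesky(cov)
    std_normal = NormalDist()
    fields = []
    for row in latin_hypercube(n_realizations, n, rng):
        # Map stratified uniforms to standard normals (clamped away from 0).
        z = [std_normal.inv_cdf(max(p, 1e-12)) for p in row]
        fields.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return fields


if __name__ == "__main__":
    realizations = correlated_fields(5, [0.0, 1.0, 2.0, 3.0], 2.0, random.Random(0))
    for field in realizations:
        print([round(v, 3) for v in field])
```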
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
NASA Technical Reports Server (NTRS)
Gardner, Adrian
2010-01-01
National Aeronautics and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internet-worked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission-destructive sampling? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve desired outputs for NASA and consumers of its data.
ERIC Educational Resources Information Center
Tabor, Josh
2010-01-01
On the 2009 AP Statistics Exam, students were asked to create a statistic to measure skewness in a distribution. This paper explores several of the most popular student responses and evaluates which statistic performs best when sampling from various skewed populations. (Contains 8 figures, 3 tables, and 4 footnotes.)
Simulation Study Using a New Type of Sample Variance
NASA Technical Reports Server (NTRS)
Howe, D. A.; Lainson, K. J.
1996-01-01
We evaluate with simulated data a new type of sample variance for the characterization of frequency stability. The new statistic (referred to as TOTALVAR and its square root TOTALDEV) is a better predictor of long-term frequency variations than the present sample Allan deviation. The statistical model uses the assumption that a time series of phase or frequency differences is wrapped (periodic) with overall frequency difference removed. We find that the variability at long averaging times is reduced considerably for the five models of power-law noise commonly encountered with frequency standards and oscillators.
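For context, the baseline against which the new statistic is compared is the sample Allan variance. The sketch below computes the standard non-overlapping Allan variance from fractional-frequency data; it does not implement TOTALVAR, whose periodic-extension step is not reproduced here. The function name and the toy series are hypothetical.

```python
def allan_variance(freq, m=1):
    """Non-overlapping Allan variance at an averaging factor of m samples."""
    n_blocks = len(freq) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the frequency data over consecutive blocks of length m.
    means = [sum(freq[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    squared_diffs = [(means[i + 1] - means[i]) ** 2 for i in range(n_blocks - 1)]
    return sum(squared_diffs) / (2 * (n_blocks - 1))


if __name__ == "__main__":
    toy = [0.0, 1.0] * 8  # alternating series, purely illustrative
    for m in (1, 2, 4):
        print(m, allan_variance(toy, m), allan_variance(toy, m) ** 0.5)
```

At long averaging times only a few block differences remain in the sum, which is the source of the large variability that the abstract says the new statistic reduces.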
Small sample estimation of the reliability function for technical products
NASA Astrophysics Data System (ADS)
Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.
2017-12-01
It is demonstrated that, in the absence of large statistical samples from failure testing of complex technical products, statistical estimation of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for implementing the moments method using only the moments at which the failures of initial elements occur.
Pace, M.N.; Rosentreter, J.J.; Bartholomay, R.C.
2001-01-01
Idaho State University and the US Geological Survey, in cooperation with the US Department of Energy, conducted a study to determine and evaluate strontium distribution coefficients (Kds) of subsurface materials at the Idaho National Engineering and Environmental Laboratory (INEEL). The Kds were determined to aid in assessing the variability of strontium Kds and their effects on chemical transport of strontium-90 in the Snake River Plain aquifer system. Data from batch experiments done to determine strontium Kds of five sediment-infill samples and six standard reference material samples were analyzed by using multiple linear regression analysis and the stepwise variable-selection method in the statistical program, Statistical Product and Service Solutions, to derive an equation of variables that can be used to predict strontium Kds of sediment-infill samples. The sediment-infill samples were from basalt vesicles and fractures from a selected core at the INEEL; strontium Kds ranged from ???201 to 356 ml g-1. The standard material samples consisted of clay minerals and calcite. The statistical analyses of the batch-experiment results showed that the amount of strontium in the initial solution, the amount of manganese oxide in the sample material, and the amount of potassium in the initial solution are the most important variables in predicting strontium Kds of sediment-infill samples.
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data
Chen, Yi-Hau
2017-01-01
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. 
We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.
Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin
2017-06-01
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, R; Bai, W
Purpose: Because of statistical noise in Monte Carlo dose calculations, effective point doses may not be accurate. Volume spheres are useful for evaluating dose in Monte Carlo plans, which have an inherent statistical uncertainty. We use a user-defined sphere volume instead of a point, sampling a sphere around the effective point and taking the dose statistics to decrease the stochastic errors. Methods: Direct dose measurements were made using a 0.125 cc Semiflex ion chamber (IC) 31010 isocentrically placed in the center of a homogeneous cylindrical sliced RW3 phantom (PTW, Germany). In the scanned CT phantom series, the sensitive volume length of the IC (6.5 mm) was delineated and the isocenter was defined as the simulation effective point. All beams were simulated in Monaco in accordance with the measured model. The simulation used a 2 mm voxel calculation grid spacing, calculated dose to medium, and requested a relative standard deviation ≤0.5%. Three different assigned IC over densities (air electron density (ED) of 0.01 g/cm3, default CT-scanned ED, and esophageal lumen ED of 0.21 g/cm3) were tested at different sampling sphere radii (2.5, 2, 1.5 and 1 mm), and the statistical doses were compared with the measured doses. Results: In the Monaco TPS, for the IC using esophageal lumen ED 0.21 g/cm3 and a sampling sphere radius of 1.5 mm, the statistical value is in the best accordance with the measured value; the absolute average percentage deviation is 0.49%. When the IC uses air ED of 0.01 g/cm3 or the default CT-scanned ED, the recommended statistical sampling sphere radius is 2.5 mm, and the percentage deviations are 0.61% and 0.70%, respectively. Conclusion: In the Monaco treatment planning system, for the ionization chamber 31010 we recommend assigning the air cavity an ED of 0.21 g/cm3 and sampling a 1.5 mm sphere volume instead of a point dose to decrease the stochastic errors. Funding Support No.C201505006.
Use Trends Indicated by Statistically Calibrated Recreational Sites in the National Forest System
Gary L. Tyre
1971-01-01
Trends in statistically sampled use of developed sites in the National Forest system indicate an average annual increase of 6.0 percent in the period 1966-69. The high variability of the measure precludes its use for projecting expected future use, but it can be important in gauging the credibility of annual use changes at both sampled and unsampled locations.
Howard Stauffer; Nadav Nur
2005-01-01
The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners in Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examine a hierarchical model...
Statistical analysis tables for truncated or censored samples
NASA Technical Reports Server (NTRS)
Cohen, A. C.; Cooley, C. G.
1971-01-01
Compilation describes characteristics of truncated and censored samples, and presents six illustrations of practical use of tables in computing mean and variance estimates for normal distribution using selected samples.
NASA Astrophysics Data System (ADS)
Phillips, Thomas J.; Gates, W. Lawrence; Arpe, Klaus
1992-12-01
The effects of sampling frequency on the first- and second-moment statistics of selected European Centre for Medium-Range Weather Forecasts (ECMWF) model variables are investigated in a simulation of "perpetual July" with a diurnal cycle included and with surface and atmospheric fields saved at hourly intervals. The shortest characteristic time scales (as determined by the e-folding time of lagged autocorrelation functions) are those of ground heat fluxes and temperatures, precipitation and runoff, convective processes, cloud properties, and atmospheric vertical motion, while the longest time scales are exhibited by soil temperature and moisture, surface pressure, and atmospheric specific humidity, temperature, and wind. The time scales of surface heat and momentum fluxes and of convective processes are substantially shorter over land than over oceans. An appropriate sampling frequency for each model variable is obtained by comparing the estimates of first- and second-moment statistics determined at intervals ranging from 2 to 24 hours with the "best" estimates obtained from hourly sampling. Relatively accurate estimation of first- and second-moment climate statistics (10% errors in means, 20% errors in variances) can be achieved by sampling a model variable at intervals that usually are longer than the bandwidth of its time series but that often are shorter than its characteristic time scale. For the surface variables, sampling at intervals that are nonintegral divisors of a 24-hour day yields relatively more accurate time-mean statistics because of a reduction in errors associated with aliasing of the diurnal cycle and higher-frequency harmonics. The superior estimates of first-moment statistics are accompanied by inferior estimates of the variance of the daily means due to the presence of systematic biases, but these probably can be avoided by defining a different measure of low-frequency variability. 
Estimates of the intradiurnal variance of accumulated precipitation and surface runoff also are strongly impacted by the length of the storage interval. In light of these results, several alternative strategies for storage of the ECMWF model variables are recommended.
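The aliasing effect described in the abstract above can be reproduced with a toy signal. The sketch below is not the ECMWF experiment: it assumes a hypothetical variable that consists only of a 24-hour cosine cycle with zero true mean, saved hourly for seven days, and compares time means obtained from different sampling intervals.

```python
import math

def time_mean(series, step):
    """Mean of every step-th element of an hourly series."""
    picked = series[::step]
    return sum(picked) / len(picked)

# Hypothetical signal: a pure diurnal cycle with zero true mean,
# stored at hourly intervals for 7 days (168 values).
signal = [math.cos(2 * math.pi * t / 24) for t in range(7 * 24)]

mean_hourly = time_mean(signal, 1)   # reference estimate from hourly data
mean_24h = time_mean(signal, 24)     # always hits the same phase of the cycle
mean_7h = time_mean(signal, 7)       # 7 h does not divide 24 h
```

Sampling every 24 hours always sees the peak of the cycle and reports a mean near 1 instead of 0. Sampling every 7 hours visits every hour of the day once over the week, so the diurnal cycle averages out.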
Slowdowns in diversification rates from real phylogenies may not be real.
Cusimano, Natalie; Renner, Susanne S
2010-07-01
Studies of diversification patterns often find a slowing in lineage accumulation toward the present. This seemingly pervasive pattern of rate downturns has been taken as evidence for adaptive radiations, density-dependent regulation, and metacommunity species interactions. The significance of rate downturns is evaluated with statistical tests (the gamma statistic and Monte Carlo constant rates (MCCR) test; birth-death likelihood models and Akaike Information Criterion [AIC] scores) that rely on null distributions, which assume that the included species are a random sample of the entire clade. Sampling in real phylogenies, however, often is nonrandom because systematists try to include early-diverging species or representatives of previous intrataxon classifications. We studied the effects of biased sampling, structured sampling, and random sampling by experimentally pruning simulated trees (60 and 150 species) as well as a completely sampled empirical tree (58 species) and then applying the gamma statistic/MCCR test and birth-death likelihood models/AIC scores to assess rate changes. For trees with random species sampling, the true model (i.e., the one fitting the complete phylogenies) could be inferred in most cases. Oversampling deep nodes, however, strongly biases inferences toward downturns, with simulations of structured and biased sampling suggesting that this occurs when sampling percentages drop below 80%. The magnitude of the effect and the sensitivity of diversification rate models is such that a useful rule of thumb may be not to infer rate downturns from real trees unless they have >80% species sampling.
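For readers unfamiliar with the gamma statistic named above, here is a minimal sketch of its usual definition from internode intervals. The function name and input layout are assumptions for illustration, and the MCCR test and the pruning experiments of the study are not reproduced.

```python
import math

def gamma_statistic(intervals):
    """Gamma statistic from internode intervals of an ultrametric tree.

    intervals[0] is g_2 (time during which 2 lineages exist),
    intervals[-1] is g_n for a tree with n tips. Requires n >= 3.
    """
    n = len(intervals) + 1
    # k * g_k for k = 2..n
    weighted = [k * g for k, g in zip(range(2, n + 1), intervals)]
    total = sum(weighted)                      # T = sum_{j=2}^{n} j * g_j
    running, acc = 0.0, 0.0
    for w in weighted[:-1]:                    # partial sums for i = 2..n-1
        running += w
        acc += running
    mean_partial = acc / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))

# Hypothetical interval sets (not data from the study)
gamma_constant = gamma_statistic([1 / 2, 1 / 3, 1 / 4, 1 / 5])  # k * g_k equal
gamma_early = gamma_statistic([0.1, 0.1, 1.0])                  # nodes near root
```

When every k * g_k is equal, as expected on average under a constant-rate pure-birth process, gamma is zero. When branching events are concentrated near the root, gamma becomes negative, which is the signature usually read as a rate downturn.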
Portillo, M C; Gonzalez, J M
2008-08-01
Molecular fingerprints of microbial communities are a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints was analyzed considering the presence of different phylotypes and their relative abundance. A method is proposed by simulating coverage of the analyzed communities as a function of sampling size applying a Cramér-von Mises statistic. Comparisons were performed by a Monte Carlo testing procedure. As an example, this procedure was used to compare several sediment samples from freshwater ponds using a relative quantitative PCR-DGGE profiling technique. The method was able to discriminate among different samples based on their molecular fingerprints, and confirmed the lack of differences between aliquots from a single sample.
Statistics 101 for Radiologists.
Anvari, Arash; Halpern, Elkan F; Samir, Anthony E
2015-10-01
Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
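As a small worked example of the accuracy measures listed above, the following sketch computes them from a 2x2 table. The counts are made up for illustration, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Basic accuracy measures from a 2x2 diagnostic table.

    tp/fn: diseased subjects with positive/negative test results,
    fp/tn: non-diseased subjects with positive/negative test results.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Likelihood ratios; LR+ is infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }

# Made-up counts: 100 diseased and 100 non-diseased subjects
m = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
```

For these counts the sensitivity is 0.90, the specificity 0.80, the accuracy 0.85, the positive likelihood ratio about 4.5 and the negative likelihood ratio about 0.125.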
Interpreting “statistical hypothesis testing” results in clinical research
Sarmukaddam, Sanjeev B.
2012-01-01
The difference between “clinical significance” and “statistical significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many but is pointed out again here because the philosophy of “statistical hypothesis testing” is sometimes unnecessarily criticized, mainly due to failure to consider this distinction. Randomized controlled trials are wrongly criticized in a similar way. That a scientific method may not be applicable in some particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making, and the field of “decision analysis” is very much an integral part of the science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands the meaning of the word “confidence” as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations, such as chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is it that the difference found in random sample(s) is due to chance?”, and therefore relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861
Computer Graphics Simulations of Sampling Distributions.
ERIC Educational Resources Information Center
Gordon, Florence S.; Gordon, Sheldon P.
1989-01-01
Describes the use of computer graphics simulations to enhance student understanding of sampling distributions that arise in introductory statistics. Highlights include the distribution of sample proportions, the distribution of the difference of sample means, the distribution of the difference of sample proportions, and the distribution of sample…
A statistical approach to selecting and confirming validation targets in -omics experiments
2012-01-01
Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
NASA Astrophysics Data System (ADS)
Gomo, M.; Vermeulen, D.
2015-03-01
An investigation was conducted to statistically compare the influence of non-purging and purging groundwater sampling methods on analysed inorganic chemistry parameters and calculated saturation indices. Groundwater samples were collected from 15 monitoring wells drilled in Karoo aquifers before and after purging for the comparative study. For the non-purging method, samples were collected from groundwater flow zones located in the wells using electrical conductivity (EC) profiling. The two data sets of non-purged and purged groundwater samples were analysed for inorganic chemistry parameters at the Institute of Groundwater Studies (IGS) laboratory of the Free University in South Africa. Saturation indices for mineral phases that were found in the database of the PHREEQC hydrogeochemical model were calculated for each data set. Four one-way ANOVA tests were conducted using Microsoft Excel 2007 to investigate whether there is any statistically significant difference between: (1) all inorganic chemistry parameters measured in the non-purged and purged groundwater samples for each specific well, (2) all mineral saturation indices calculated for the non-purged and purged groundwater samples for each specific well, (3) individual inorganic chemistry parameters measured in the non-purged and purged groundwater samples across all wells and (4) individual mineral saturation indices calculated for non-purged and purged groundwater samples across all wells. For all the ANOVA tests conducted, the calculated p-values are greater than 0.05 (the significance level) and the test statistic (F) is less than the critical value (Fcrit) (F < Fcrit). The results imply that there was no statistically significant difference between the two data sets. With 95% confidence, it was therefore concluded that the variance between groups was due to random chance rather than to the influence of the sampling methods (the tested factor). It is therefore possible that in some hydrogeologic conditions, non-purged groundwater samples might be just as representative as the purged ones. The findings of this study can provide an important platform for future evidence-oriented research investigations to establish the necessity of purging prior to groundwater sampling in different aquifer systems.
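The F < Fcrit decision rule used in the study can be sketched in a few lines. The concentrations below are made-up placeholder values, not data from the Karoo wells, and the function name is hypothetical; the critical value is the tabulated F(0.05; 1, 4) and would normally be looked up for the actual degrees of freedom.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up concentrations for one parameter (illustration only)
non_purged = [1.0, 2.0, 3.0]
purged = [2.0, 3.0, 4.0]

f_stat, df_b, df_w = one_way_anova_f(non_purged, purged)
F_CRIT_1_4 = 7.71  # tabulated critical value for alpha = 0.05, df = (1, 4)
no_significant_difference = f_stat < F_CRIT_1_4
```

Here F is 1.5 with (1, 4) degrees of freedom, which is below the critical value, so the null hypothesis of equal group means is not rejected at the 0.05 level.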
Statistical analysis of radioimmunoassay. In comparison with bioassay (in Japanese)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakano, R.
1973-01-01
Using RIA (radioimmunoassay) data, statistical procedures are described for two problems: linearization of the dose-response curve and calculation of relative potency. There are three methods for linearizing the dose-response curve of RIA. In each method, the following parameters are plotted on the horizontal and vertical axes, respectively: dose x, (B/T)^-1; c/(x + c), B/T (c: dose which makes B/T 50%); log x, logit B/T. Among them, the last method seems to be the most practical. The statistical procedures for bioassay were employed to calculate the relative potency of unknown samples compared with the standard samples from the dose-response curves of standard and unknown samples using the regression coefficient. It is desirable that relative potency be calculated by plotting more than 5 points on the standard curve and more than 2 points for unknown samples. To examine the statistical limit of precision of measurement, LH activity of gonadotropin in urine was measured, and the relative potency, precision coefficient, and upper and lower limits of relative potency at the 95% confidence limit were calculated. In addition, bioassay (by the ovarian ascorbic acid reduction method and the anterior lobe of prostate weighing method) was done on the same samples, and the precision was compared with that of RIA. In these examinations, the upper and lower limits of the relative potency at the 95% confidence limit were near each other, while in bioassay a considerable difference was observed between the upper and lower limits. The necessity of standardization and systematization of the statistical procedures for increasing the precision of RIA was pointed out. (JA)
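The logit-log linearization mentioned above can be sketched as follows. The sketch assumes an idealized curve B/T = c/(x + c), for which logit(B/T) = ln c - ln x exactly; the doses and the value of c are hypothetical, and natural logarithms are used throughout.

```python
import math

def logit(p):
    """Log-odds of a proportion p with 0 < p < 1."""
    return math.log(p / (1 - p))

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical standard curve: c is the dose at which B/T = 50%
c_true = 10.0
doses = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
bound_fraction = [c_true / (x + c_true) for x in doses]

slope, intercept = fit_line([math.log(x) for x in doses],
                            [logit(b) for b in bound_fraction])
# The dose where logit(B/T) = 0, i.e. B/T = 50%
dose_at_half = math.exp(-intercept / slope)
```

For this idealized curve the fitted slope is -1 and the dose at half-maximal binding recovers c. Real assay data scatter around the line, and a relative potency would then be read from the horizontal offset between the standard and unknown lines.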
Quantum walks: The first detected passage time problem
NASA Astrophysics Data System (ADS)
Friedman, H.; Kessler, D. A.; Barkai, E.
2017-03-01
Even after decades of research, the problem of first passage time statistics for quantum dynamics remains a challenging topic of fundamental and practical importance. Using a projective measurement approach, with a sampling time τ, we obtain the statistics of first detection events for quantum dynamics on a lattice, with the detector located at the origin. A quantum renewal equation for a first detection wave function, in terms of which the first detection probability can be calculated, is derived. This formula gives the relation between first detection statistics and the solution of the corresponding Schrödinger equation in the absence of measurement. We illustrate our results with tight-binding quantum walk models. We examine a closed system, i.e., a ring, and reveal the intricate influence of the sampling time τ on the statistics of detection, discussing the quantum Zeno effect, half dark states, revivals, and optimal detection. The initial condition modifies the statistics of a quantum walk on a finite ring in surprising ways. In some cases, the average detection time is independent of the sampling time while in others the average exhibits multiple divergences as the sampling time is modified. For an unbounded one-dimensional quantum walk, the probability of first detection decays like (time)^(-3) with superimposed oscillations, with exceptional behavior when the sampling period τ times the tunneling rate γ is a multiple of π/2. The amplitude of the power-law decay is suppressed as τ → 0 due to the Zeno effect. Our work, an extended version of our previously published paper, predicts rich physical behaviors compared with classical Brownian motion, for which the first passage probability density decays monotonically like (time)^(-3/2), as elucidated by Schrödinger in 1915.
Wente, Stephen P.
2004-01-01
Many Federal, Tribal, State, and local agencies monitor mercury in fish-tissue samples to identify sites with elevated fish-tissue mercury (fish-mercury) concentrations, track changes in fish-mercury concentrations over time, and produce fish-consumption advisories. Interpretation of such monitoring data commonly is impeded by difficulties in separating the effects of sample characteristics (species, tissues sampled, and sizes of fish) from the effects of spatial and temporal trends on fish-mercury concentrations. Without such a separation, variation in fish-mercury concentrations due to differences in the characteristics of samples collected over time or across space can be misattributed to temporal or spatial trends, and actual trends in fish-mercury concentration can be misattributed to differences in sample characteristics. This report describes a statistical model that can separate spatiotemporal and sample-characteristic effects in fish-mercury concentration data, together with a national data set (31,813 samples) for calibrating it. This model could be useful for evaluating spatial and temporal trends in fish-mercury concentrations and developing fish-consumption advisories. The observed fish-mercury concentration data and model predictions can be accessed, displayed geospatially, and downloaded via the World Wide Web (http://emmma.usgs.gov). This report and the associated web site may assist in the interpretation of large amounts of data from widespread fish-mercury monitoring efforts.
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data
Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong
2015-01-01
Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parametric space, so the standard large-sample property is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods that take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Statistical Power in Meta-Analysis
ERIC Educational Resources Information Center
Liu, Jin
2015-01-01
Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…
34 CFR Appendix A to Subpart N of... - Sample Default Prevention Plan
Code of Federal Regulations, 2010 CFR
2010-07-01
... relevant default prevention statistics, including a statistical analysis of the borrowers who default on...'s delinquency status by obtaining reports from data managers and FFEL Program lenders. 5. Enhance... academic study. III. Statistics for Measuring Progress 1. The number of students enrolled at your...
Variance approximations for assessments of classification accuracy
R. L. Czaplewski
1994-01-01
Variance approximations are derived for the weighted and unweighted kappa statistics, the conditional kappa statistic, and conditional probabilities. These statistics are useful to assess classification accuracy, such as accuracy of remotely sensed classifications in thematic maps when compared to a sample of reference classifications made in the field. Published...
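For orientation, the unweighted kappa point estimate that these variance approximations refer to can be computed from an error (confusion) matrix as sketched below. The matrix is made up for illustration, the function name is hypothetical, and the variance approximations themselves are not reproduced here.

```python
def cohen_kappa(matrix):
    """Unweighted kappa from a square error matrix.

    matrix[i][j] = number of sample units mapped as class i
    whose reference classification is class j.
    """
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: proportion on the diagonal
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals
    p_chance = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up two-class error matrix with 100 reference sample units
kappa = cohen_kappa([[40, 10],
                     [5, 45]])
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, giving a kappa of about 0.7.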
APA's Learning Objectives for Research Methods and Statistics in Practice: A Multimethod Analysis
ERIC Educational Resources Information Center
Tomcho, Thomas J.; Rice, Diana; Foels, Rob; Folmsbee, Leah; Vladescu, Jason; Lissman, Rachel; Matulewicz, Ryan; Bopp, Kara
2009-01-01
Research methods and statistics courses constitute a core undergraduate psychology requirement. We analyzed course syllabi and faculty self-reported coverage of both research methods and statistics course learning objectives to assess the concordance with APA's learning objectives (American Psychological Association, 2007). We obtained a sample of…
Mathematical Anxiety among Business Statistics Students.
ERIC Educational Resources Information Center
High, Robert V.
A survey instrument was developed to identify sources of mathematics anxiety among undergraduate business students in a statistics class. A number of statistics classes were selected at two colleges in Long Island, New York. A final sample of n=102 respondents indicated that there was a relationship between the mathematics grade in prior…
Moore, B.L.; Evaldi, R.D.
1995-01-01
Bottom sediments from 25 stream sites in Jefferson County, Ky., were analyzed for percent volatile solids and concentrations of nutrients, major metals, trace elements, miscellaneous inorganic compounds, and selected organic compounds. Statistical high outliers of the constituent concentrations analyzed for in the bottom sediments were defined as a measure of possible elevated concentrations. Statistical high outliers were determined for at least 1 constituent at each of 12 sampling sites in Jefferson County. Of the 10 stream basins sampled in Jefferson County, the Middle Fork Beargrass Basin, Cedar Creek Basin, and Harrods Creek Basin were the only three basins where a statistical high outlier was not found for any of the measured constituents. In the Pennsylvania Run Basin, total volatile solids, nitrate plus nitrite, and endrin constituents were statistical high outliers. Pond Creek was the only basin where five constituents were statistical high outliers: barium, beryllium, cadmium, chromium, and silver. Nitrate plus nitrite and copper constituents were the only statistical high outliers found in the Mill Creek Basin. In the Floyds Fork Basin, nitrate plus nitrite, phosphorus, mercury, and silver constituents were the only statistical high outliers. Ammonia was the only statistical high outlier found in the South Fork Beargrass Basin. In the Goose Creek Basin, mercury and silver constituents were the only statistical high outliers. Cyanide was the only statistical high outlier in the Muddy Fork Basin.
Systematic sampling for suspended sediment
Robert B. Thomas
1991-01-01
Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...
ERIC Educational Resources Information Center
Blair, Edward; Blair, Johnny
2015-01-01
Written for students and researchers who wish to understand the conceptual and practical aspects of sampling, this book is designed to be accessible without requiring advanced statistical training. It covers a wide range of topics, from the basics of sampling to special topics such as sampling rare populations, sampling organizational populations,…
Practical continuous-variable quantum key distribution without finite sampling bandwidth effects.
Li, Huasheng; Wang, Chao; Huang, Peng; Huang, Duan; Wang, Tao; Zeng, Guihua
2016-09-05
In a practical continuous-variable quantum key distribution system, the finite sampling bandwidth of the analog-to-digital converter employed at the receiver's side may lead to inaccurate pulse peak sampling. This in turn causes errors in parameter estimation, so that the system performance decreases and security loopholes are exposed to eavesdroppers. In this paper, we propose a novel data acquisition scheme which consists of two parts, i.e., a dynamic delay adjusting module and a statistical power feedback-control algorithm. The proposed scheme may dramatically improve the data acquisition precision of pulse peak sampling and remove the finite sampling bandwidth effects. Moreover, the optimal peak sampling position of a pulse signal can be dynamically calibrated by monitoring the change of the statistical power of the sampled data in the proposed scheme. This helps to resist some practical attacks, such as the well-known local oscillator calibration attack.
Kim, Sung-Min; Choi, Yosoon
2017-01-01
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
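The hot spot z-score and the four-group labelling described above can be sketched as follows. The sketch uses the standard form of the Getis-Ord Gi* statistic with binary fixed-distance weights that include the sample itself; the distance band, the cut-offs and the sample values are assumptions for illustration and are not taken from the Busan mine study.

```python
import math

def getis_ord_gi_star(coords, values, band):
    """Gi* z-score for every sample, using binary weights:
    w_ij = 1 if sample j lies within `band` of sample i (i itself included).
    Undefined (division by zero) if all values are equal or if the band
    covers every sample."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for xi, yi in coords:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wi * wi for wi in w)
        numerator = sum(wi * v for wi, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

def classify(value, z, value_cut, z_cut=1.96):
    """Two-letter label: content (H/L) followed by z-score (H/L)."""
    return ("H" if value >= value_cut else "L") + ("H" if z >= z_cut else "L")

# Made-up transect of 10 samples, 1 unit apart, with three high values
coords = [(float(i), 0.0) for i in range(10)]
values = [1.0] * 7 + [10.0] * 3
z = getis_ord_gi_star(coords, values, band=1.0)
labels = [classify(v, zi, value_cut=sum(values) / len(values))
          for v, zi in zip(values, z)]
```

In this toy transect the middle of the three high samples gets a z-score of 3.0 and is labelled HH, while the high sample at the edge of the cluster has a low neighbour and falls below the 1.96 cut-off, so it is labelled HL and would be flagged for a complementary survey under the scheme in the abstract.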
Kim, Sung-Min; Choi, Yosoon
2017-06-18
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
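The hot spot classification described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary fixed-distance weights for the Getis-Ord Gi* statistic, and the function names, the `radius` parameter, the content cutoff, and the 1.96 z-score cutoff are all hypothetical choices made for the example.

```python
import numpy as np

def getis_ord_gi_star(values, coords, radius):
    """Gi* z-score per sample, using binary weights (assumed) within `radius`.

    Each sample counts as its own neighbour, as Gi* requires.
    Note: the denominator is zero if every sample lies within `radius`.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = x.size
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = (dist <= radius).astype(float)
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

def classify(content, z_scores, content_cut, z_cut=1.96):
    """Label each sample HH, HL, LH or LL (first letter: content, second: z-score)."""
    labels = []
    for c, z in zip(content, z_scores):
        labels.append(("H" if c >= content_cut else "L") + ("H" if z >= z_cut else "L"))
    return labels
```

Under these assumptions, an isolated high reading surrounded by low readings comes out as HL, which matches the abstract's suggestion that HL and LH samples deserve a follow-up survey.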
Statistical characterization of a large geochemical database and effect of sample size
Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.
2005-01-01
The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompasses 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reasons relate to the presence of mixtures of subpopulations and outliers. 
Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests, and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histogram, stem-and-leaf, and probability plots are recommended for rough judgement of probability distribution if needed. © 2005 Elsevier Ltd. All rights reserved.
A normative inference approach for optimal sample sizes in decisions from experience
Ostwald, Dirk; Starke, Ludger; Hertwig, Ralph
2015-01-01
“Decisions from experience” (DFE) refers to a body of work that emerged in research on behavioral decision making over the last decade. One of the major experimental paradigms employed to study experience-based choice is the “sampling paradigm,” which serves as a model of decision making under limited knowledge about the statistical structure of the world. In this paradigm respondents are presented with two payoff distributions, which, in contrast to standard approaches in behavioral economics, are specified not in terms of explicit outcome-probability information, but by the opportunity to sample outcomes from each distribution without economic consequences. Participants are encouraged to explore the distributions until they feel confident enough to decide from which they would prefer to draw from in a final trial involving real monetary payoffs. One commonly employed measure to characterize the behavior of participants in the sampling paradigm is the sample size, that is, the number of outcome draws which participants choose to obtain from each distribution prior to terminating sampling. A natural question that arises in this context concerns the “optimal” sample size, which could be used as a normative benchmark to evaluate human sampling behavior in DFE. In this theoretical study, we relate the DFE sampling paradigm to the classical statistical decision theoretic literature and, under a probabilistic inference assumption, evaluate optimal sample sizes for DFE. In our treatment we go beyond analytically established results by showing how the classical statistical decision theoretic framework can be used to derive optimal sample sizes under arbitrary, but numerically evaluable, constraints. Finally, we critically evaluate the value of deriving optimal sample sizes under this framework as testable predictions for the experimental study of sampling behavior in DFE. PMID:26441720
Survey of Youth in Custody, 1987. Bureau of Justice Statistics Special Report.
ERIC Educational Resources Information Center
Beck, Allen J.; And Others
In 1987, the U.S. Bureau of Justice Statistics interviewed 2,621 juveniles and young adults confined in 50 long-term, state-operated institutions in 26 states. More than one-quarter of the sample were over the age of 18. The results of the survey revealed that nearly 40% of the sample were being held for a violent offense. More than 60% used drugs…
Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos
2014-04-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Vexler, Albert; Tanajian, Hovig; Hutson, Alan D
In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K-sample distributions. Recognizing that recent statistical software packages do not sufficiently address K-sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p-values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p-value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p-value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.
Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos
2014-01-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.
Westgate, Philip M; Burchett, Woodrow W
2017-03-15
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
14 CFR Sec. 19-7 - Passenger origin-destination survey.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Transportation Statistics' Director of Airline Information. (c) A statistically valid sample of light coupons... LAX Salt Lake City NorthwestOperating Carrier NorthwestTicketed Carrier Fare Code Phoenix American...
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.
Lu, Kaifeng
2016-05-01
We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
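A rough sketch of blinded re-estimation with the one-sample variance estimator is given below. It is illustrative only and does not reproduce the paper's exact-distribution results: it assumes the usual normal-approximation sample size formula for a two-sided, two-sample comparison, and the function name and default `alpha` and `power` values are hypothetical.

```python
import math
from statistics import NormalDist, variance

def blinded_reestimated_n_per_arm(pooled_interim, delta, alpha=0.05, power=0.80):
    """Re-estimate the per-arm sample size from blinded interim data.

    `pooled_interim` holds all interim outcomes with treatment labels ignored,
    so the variance is the simple one-sample estimator. The formula
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * s^2 / delta^2 is an assumed
    normal approximation, not the paper's exact derivation.
    """
    s2 = variance(pooled_interim)  # one-sample (n - 1) variance, blinded
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * s2 / delta ** 2)
```

Because the pooled estimator ignores group labels, it also absorbs any true treatment difference into the variance, which is one reason the type I error behaviour studied in the abstract is not trivial.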
Statistical Analysis of Research Data | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data. The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.
NASA Technical Reports Server (NTRS)
Hough, D. H.; Readhead, A. C. S.
1989-01-01
A complete, flux-density-limited sample of double-lobed radio quasars is defined, with nuclei bright enough to be mapped with the Mark III VLBI system. It is shown that the statistics of linear size, nuclear strength, and curvature are consistent with the assumption of random source orientations and simple relativistic beaming in the nuclei. However, these statistics are also consistent with the effects of interaction between the beams and the surrounding medium. The distribution of jet velocities in the nuclei, as measured with VLBI, will provide a powerful test of physical theories of extragalactic radio sources.
Balanced mechanical resonator for powder handling device
NASA Technical Reports Server (NTRS)
Sarrazin, Philippe C. (Inventor); Brunner, Will M. (Inventor)
2012-01-01
A system incorporating a balanced mechanical resonator and a method for vibration of a sample composed of granular material to generate motion of a powder sample inside the sample holder for obtaining improved analysis statistics, without imparting vibration to the sample holder support.
CALIPSO Observations of Near-Cloud Aerosol Properties as a Function of Cloud Fraction
NASA Technical Reports Server (NTRS)
Yang, Weidong; Marshak, Alexander; Varnai, Tamas; Wood, Robert
2015-01-01
This paper uses spaceborne lidar data to study how near-cloud aerosol statistics of attenuated backscatter depend on cloud fraction. The results for a large region around the Azores show that: (1) far-from-cloud aerosol statistics are dominated by samples from scenes with lower cloud fractions, while near-cloud aerosol statistics are dominated by samples from scenes with higher cloud fractions; (2) near-cloud enhancements of attenuated backscatter occur for any cloud fraction but are most pronounced for higher cloud fractions; (3) the difference in the enhancements for different cloud fractions is most significant within 5 km from clouds; (4) near-cloud enhancements can be well approximated by logarithmic functions of cloud fraction and distance to clouds. These findings demonstrate that if variability in cloud fraction across the scenes used to composite aerosol statistics is not considered, a sampling artifact will affect these statistics calculated as a function of distance to clouds. For the Azores-region dataset examined here, this artifact occurs mostly within 5 km from clouds, and exaggerates the near-cloud enhancements of lidar backscatter and color ratio by about 30. This shows that for accurate characterization of the changes in aerosol properties with distance to clouds, it is important to account for the impact of changes in cloud fraction.
Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan
2008-04-01
In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health, difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogeneous sample and the recommended weighted kappa (κw) statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data were analysed using the κw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of κw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of the κw statistic. Conclusions point to the interpretation of κw in tandem with percentage agreement scores. Suggestions that κw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
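The base rate problem mentioned in this abstract can be illustrated with a small sketch that computes both statistics for two raters. The linear weighting scheme, the function names, and the toy ratings are assumptions made for the example; the abstract does not state which weights the study used.

```python
import numpy as np

def percentage_agreement(rater_a, rater_b):
    """Share of cases on which both raters gave exactly the same score."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(a == b))

def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted kappa with linear agreement weights (assumed scheme).

    Ratings must be integers in 0 .. n_categories - 1.
    Returns NaN when chance agreement is 1 (no variability at all).
    """
    k = n_categories
    observed = np.zeros((k, k))
    for i, j in zip(rater_a, rater_b):
        observed[i, j] += 1
    observed /= observed.sum()
    idx = np.arange(k)
    weights = 1.0 - np.abs(idx[:, None] - idx[None, :]) / (k - 1)
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    p_observed = np.sum(weights * observed)
    p_expected = np.sum(weights * expected)
    if np.isclose(p_expected, 1.0):
        return float("nan")
    return float((p_observed - p_expected) / (1.0 - p_expected))

# Toy homogeneous sample: 30 clients, almost all scored 0 by both raters.
homogeneous_a = [0] * 28 + [1, 0]
homogeneous_b = [0] * 28 + [0, 1]
```

With these toy ratings the raters agree on 28 of 30 clients, yet kappa is close to zero because nearly all of that agreement is expected by chance when one category dominates.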
Chi-squared and C statistic minimization for low count per bin data
NASA Astrophysics Data System (ADS)
Nousek, John A.; Shue, David R.
1989-07-01
Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
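The two fit statistics compared in this abstract can be sketched as follows. This is not the authors' simulation: the toy counts, the constant-rate model, the grid search (used here in place of the Marquardt and Powell minimizers), and the choice of data-variance weighting for chi-squared are all assumptions for illustration.

```python
import numpy as np

def cash_c(counts, model):
    """Cash's C statistic for Poisson data: 2 * sum(m_i - n_i * ln m_i)."""
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    return 2.0 * np.sum(model - counts * np.log(model))

def chi2_data_variance(counts, model):
    """Chi-squared with the observed counts as variance (requires counts > 0)."""
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    return np.sum((counts - model) ** 2 / counts)

def best_constant(stat, counts, grid):
    """Grid-search the constant model level that minimizes `stat`."""
    scores = [stat(counts, np.full(len(counts), level)) for level in grid]
    return float(grid[int(np.argmin(scores))])

low_counts = [1, 2, 1, 3, 1, 2, 4, 2]   # toy data, mean = 2.0
grid = np.linspace(0.5, 5.0, 4501)      # candidate levels in steps of 0.001
level_c = best_constant(cash_c, low_counts, grid)
level_chi2 = best_constant(chi2_data_variance, low_counts, grid)
```

For a constant model, minimizing C recovers the sample mean, while this form of chi-squared is minimized at the harmonic mean of the counts, which lies below the sample mean; the gap is largest when counts per bin are small.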
Evaluation of a segment-based LANDSAT full-frame approach to crop area estimation
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.
1981-01-01
As the registration of LANDSAT full frames enters the realm of current technology, sampling methods should be examined which utilize other than the segment data used for LACIE. The effect of separating the functions of sampling for training and sampling for area estimation is examined. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of the full frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.
Analysis of Statistical Methods Currently used in Toxicology Journals
Na, Jihye; Yang, Hyeri
2014-01-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health. PMID:25343012
Analysis of Statistical Methods Currently used in Toxicology Journals.
Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min
2014-09-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
Inference with viral quasispecies diversity indices: clonal and NGS approaches.
Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep
2014-04-15
Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods (CCSS and NGS) to guarantee statistically conforming conclusions as free of bias as possible. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
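The Shannon entropy and two possible normalizations can be sketched as below. The abstract does not spell out which normalizations it compares, so dividing by the log of the number of distinct haplotypes or by the log of the number of reads are assumptions here, as are the function names and the toy input.

```python
import math
from collections import Counter

def shannon_entropy(haplotypes):
    """Shannon entropy (natural log) of the haplotype frequencies in a sample."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def normalized_entropy(haplotypes, by="haplotypes"):
    """Entropy divided by ln(distinct haplotypes) or ln(number of reads).

    Both denominators are assumed conventions for illustration.
    Returns 0.0 when the denominator would be zero.
    """
    n_reads = len(haplotypes)
    n_distinct = len(set(haplotypes))
    denom = math.log(n_distinct) if by == "haplotypes" else math.log(n_reads)
    return shannon_entropy(haplotypes) / denom if denom > 0 else 0.0
```

For example, four reads split evenly over two haplotypes give an entropy of ln 2, which normalizes to 1.0 by haplotype count but to 0.5 by read count, so the two conventions are not interchangeable when sample sizes differ.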
De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos
2014-06-01
Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
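The core of a Poisson-based readout from positive and negative replicate reactions can be sketched as follows. This is a simplified illustration and not the published assay analysis: it omits any background correction, and the Wilson interval on the negative fraction, the function names, and the default `z` value are assumptions.

```python
import math

def poisson_copies_per_reaction(n_negative, n_total):
    """Mean target copies per reaction from the fraction of negative replicates.

    Under Poisson sampling, P(negative) = exp(-lambda), so lambda = -ln(p0).
    """
    if not 0 < n_negative <= n_total:
        raise ValueError("need at least one negative reaction and n_negative <= n_total")
    return -math.log(n_negative / n_total)

def poisson_ci(n_negative, n_total, z=1.96):
    """Approximate CI for lambda via a Wilson interval on p0 (assumed method)."""
    if not 0 < n_negative <= n_total:
        raise ValueError("need at least one negative reaction and n_negative <= n_total")
    p0 = n_negative / n_total
    scale = 1 + z ** 2 / n_total
    center = (p0 + z ** 2 / (2 * n_total)) / scale
    half = z * math.sqrt(p0 * (1 - p0) / n_total + z ** 2 / (4 * n_total ** 2)) / scale
    p_low, p_high = center - half, min(center + half, 1.0)
    # A higher negative fraction means fewer copies, so the bounds swap.
    return -math.log(p_high), -math.log(p_low)
```

With 42 replicates of which 21 are negative, the estimate is ln 2, or about 0.69 copies per reaction. The interval narrows as the number of replicates grows, which is the sense in which a CI can inform the minimal number of technical replicates.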
48 CFR 31.201-6 - Accounting for unallowable costs.
Code of Federal Regulations, 2012 CFR
2012-10-01
... unbiased sample that is a reasonable representation of the sampling universe. (ii) Any large dollar value... universe from that sampled cost is also subject to the same penalty provisions. (4) Use of statistical...
48 CFR 31.201-6 - Accounting for unallowable costs.
Code of Federal Regulations, 2013 CFR
2013-10-01
... unbiased sample that is a reasonable representation of the sampling universe. (ii) Any large dollar value... universe from that sampled cost is also subject to the same penalty provisions. (4) Use of statistical...
48 CFR 31.201-6 - Accounting for unallowable costs.
Code of Federal Regulations, 2014 CFR
2014-10-01
... unbiased sample that is a reasonable representation of the sampling universe. (ii) Any large dollar value... universe from that sampled cost is also subject to the same penalty provisions. (4) Use of statistical...
48 CFR 31.201-6 - Accounting for unallowable costs.
Code of Federal Regulations, 2011 CFR
2011-10-01
... unbiased sample that is a reasonable representation of the sampling universe. (ii) Any large dollar value... universe from that sampled cost is also subject to the same penalty provisions. (4) Use of statistical...
Community Health Centers: Providers, Patients, and Content of Care
... Statistics (NCHS). NAMCS uses a multistage probability sample design involving geographic primary sampling units (PSUs), physician practices ... 05 level. To account for the complex sample design during variance estimation, all analyses were performed using ...
Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power
ERIC Educational Resources Information Center
Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon
2016-01-01
An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated…
Long-term strategy for the statistical design of a forest health monitoring system
Hans T. Schreuder; Raymond L. Czaplewski
1993-01-01
A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...
Writing to Learn Statistics in an Advanced Placement Statistics Course
ERIC Educational Resources Information Center
Northrup, Christian Glenn
2012-01-01
This study investigated the use of writing in a statistics classroom to learn if writing provided a rich description of problem-solving processes of students as they solved problems. Through analysis of 329 written samples provided by students, it was determined that writing provided a rich description of problem-solving processes and enabled…
Artificial Intelligence Approach to Support Statistical Quality Control Teaching
ERIC Educational Resources Information Center
Reis, Marcelo Menezes; Paladini, Edson Pacheco; Khator, Suresh; Sommer, Willy Arno
2006-01-01
Statistical quality control--SQC (consisting of Statistical Process Control, Process Capability Studies, Acceptance Sampling and Design of Experiments) is a very important tool to obtain, maintain and improve the Quality level of goods and services produced by an organization. Despite its importance, and the fact that it is taught in technical and…
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
Web-Based Statistical Sampling and Analysis
ERIC Educational Resources Information Center
Quinn, Anne; Larson, Karen
2016-01-01
Consistent with the Common Core State Standards for Mathematics (CCSSI 2010), the authors write that they have asked students to do statistics projects with real data. To obtain real data, their students use the free Web-based app, Census at School, created by the American Statistical Association (ASA) to help promote civic awareness among school…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Plemons, R.E.; Hopwood, W.H. Jr.; Hamilton, J.H.
For a number of years the Oak Ridge Y-12 Plant Laboratory has been analyzing coal predominately for the utilities department of the Y-12 Plant. All laboratory procedures, except a Leco sulfur method which used the Leco Instruction Manual as a reference, were written based on the ASTM coal analyses. Sulfur is analyzed at the present time by two methods, gravimetric and Leco. The laboratory has two major endeavors for monitoring the quality of its coal analyses. (1) A control program by the Plant Statistical Quality Control Department. Quality Control submits one sample for every nine samples submitted by the utilities departments and the laboratory analyzes a control sample along with the utilities samples. (2) An exchange program with the DOE Coal Analysis Laboratory in Bruceton, Pennsylvania. The Y-12 Laboratory submits to the DOE Coal Laboratory, on even numbered months, a sample that Y-12 has analyzed. The DOE Coal Laboratory submits, on odd numbered months, one of their analyzed samples to the Y-12 Plant Laboratory to be analyzed. The results of these control and exchange programs are monitored not only by laboratory personnel, but also by Statistical Quality Control personnel who provide statistical evaluations. After analysis and reporting of results, all utilities samples are retained by the laboratory until the coal contracts have been settled. The utilities departments have responsibility for the initiation and preparation of the coal samples. The samples normally received by the laboratory have been ground to 4-mesh, reduced to 0.5-gallon quantities, and sealed in air-tight containers. Sample identification numbers and a Request for Analysis are generated by the utilities departments.
Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.
Wang, Zuozhen
2018-01-01
The bootstrap technique is distribution-independent and provides an indirect way to estimate the sample size for a clinical trial from a relatively small sample. In this paper, sample size estimation by the bootstrap procedure for comparing two parallel-design arms with continuous data is presented for various test types (inequality, non-inferiority, superiority, and equivalence). Sample size calculation by mathematical formulas (under the normal distribution assumption) is also carried out for the identical data. The power difference between the two calculation methods is acceptably small for all the test types, which shows that the bootstrap procedure is a credible technique for sample size estimation. We then compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the features of the data, the nonparametric Wilcoxon test was applied to compare the two groups during bootstrap power estimation. As a result, the power estimated by the normal distribution-based formula is far larger than that estimated by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable to apply the bootstrap method for sample size calculation from the outset, and to apply to each bootstrap sample the same statistical method that will be used in the subsequent statistical analysis, provided that historical data are available that are well representative of the population to which the proposed trial is planning to extrapolate.
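A rough sketch of bootstrap power estimation with a Wilcoxon rank-sum test, as the abstract describes in outline. The abstract gives no implementation details, so everything here is an assumption for illustration: the function names, the normal approximation to the rank-sum statistic (without a tie correction to the variance), and the grid search over candidate sample sizes.

```python
import random
from statistics import NormalDist


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p value via the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected, which
    makes the test slightly conservative when ties are frequent.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bootstrap_power(pilot_a, pilot_b, n_per_group, alpha=0.05, n_boot=2000, seed=0):
    """Share of bootstrap resamples (size n_per_group per arm) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        a = rng.choices(pilot_a, k=n_per_group)  # resample with replacement
        b = rng.choices(pilot_b, k=n_per_group)
        if rank_sum_p_value(a, b) < alpha:
            rejections += 1
    return rejections / n_boot


def smallest_n(pilot_a, pilot_b, target_power=0.8, candidates=range(5, 201, 5), **kwargs):
    """First candidate per-group size whose bootstrap power meets the target."""
    for n in candidates:
        if bootstrap_power(pilot_a, pilot_b, n, **kwargs) >= target_power:
            return n
    return None
```

The key design point from the abstract is preserved: the test applied to each bootstrap sample is the same test planned for the final analysis, so the estimated power reflects that test rather than a normal-theory formula.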
Cundell, A M; Bean, R; Massimore, L; Maier, C
1998-01-01
To determine the relationship between the sampling time of environmental monitoring (i.e., viable counts) in aseptic filling areas and the microbial count and frequency of alerts for air, surface and personnel microbial monitoring, statistical analyses were conducted on 1) the frequency of alerts versus the time of day for routine environmental sampling conducted in calendar year 1994, and 2) environmental monitoring data collected at 30-minute intervals during routine aseptic filling operations over two separate days in four different clean rooms with multiple shifts and equipment set-ups at a parenteral manufacturing facility. Statistical analyses showed that, except for one floor location that had a significantly higher number of counts but no alert or action level samplings in the first two hours of operation, there was no relationship between the number of counts and the time of sampling. Further studies over a 30-day period at the floor location showed no relationship between time of sampling and microbial counts. The conclusion reached in the study was that there is no worst-case time for environmental monitoring at that facility and that sampling at any time during the aseptic filling operation will give a satisfactory measure of the microbial cleanliness in the clean room during the set-up and aseptic filling operation.
Remineralization Property of an Orthodontic Primer Containing a Bioactive Glass with Silver and Zinc
Lee, Seung-Min; Kim, In-Ryoung; Park, Bong-Soo; Ko, Ching-Chang; Son, Woo-Sung; Kim, Yong-Il
2017-01-01
White spot lesions (WSLs) are irreversible damages in orthodontic treatment due to excessive etching or demineralization by microorganisms. In this study, we conducted a mechanical and cell viability test to examine the antibacterial properties of 0.2% and 1% bioactive glass (BAG) and silver-doped and zinc-doped BAGs in a primer and evaluated their clinical applicability to prevent WSLs. The microhardness statistically significantly increased in the adhesive-containing BAG, while the other samples showed no statistically significant difference compared with the control group. The shear bond strength of all samples increased compared with that of the control group. The cell viability of the control and sample groups was similar within 24 h, but decreased slightly over 48 h. All samples showed antibacterial properties. Regarding remineralization property, the group containing 0.2% of the samples showed remineralization properties compared with the control group, but was not statistically significant; further, the group containing 1% of the samples showed a significant difference compared with the control group. Among them, the orthodontic bonding primer containing 1% silver-doped BAG showed the highest remineralization property. The new orthodontic bonding primer used in this study showed an antimicrobial effect, chemical remineralization effect, and WSL prevention as well as clinically applicable properties, both physically and biologically. PMID:29088092
NASA Astrophysics Data System (ADS)
Grulke, Eric A.; Wu, Xiaochun; Ji, Yinglu; Buhr, Egbert; Yamamoto, Kazuhiro; Song, Nam Woong; Stefaniak, Aleksandr B.; Schwegler-Berry, Diane; Burchett, Woodrow W.; Lambert, Joshua; Stromberg, Arnold J.
2018-04-01
Size and shape distributions of gold nanorod samples are critical to their physico-chemical properties, especially their longitudinal surface plasmon resonance. This interlaboratory comparison study developed methods for measuring and evaluating size and shape distributions for gold nanorod samples using transmission electron microscopy (TEM) images. The objective was to determine whether two different samples, which had different performance attributes in their application, were different with respect to their size and/or shape descriptor distributions. Touching particles in the captured images were identified using a ruggedness shape descriptor. Nanorods could be distinguished from nanocubes using an elongational shape descriptor. A non-parametric statistical test showed that cumulative distributions of an elongational shape descriptor, that is, the aspect ratio, were statistically different between the two samples for all laboratories. While the scale parameters of size and shape distributions were similar for both samples, the width parameters of size and shape distributions were statistically different. This protocol fulfills an important need for a standardized approach to measure gold nanorod size and shape distributions for applications in which quantitative measurements and comparisons are important. Furthermore, the validated protocol workflow can be automated, thus providing consistent and rapid measurements of nanorod size and shape distributions for researchers, regulatory agencies, and industry.
Evaluating the efficiency of environmental monitoring programs
Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina
2014-01-01
Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.
Evaluation of asbestos levels in two schools before and after asbestos removal. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karaffa, M.A.; Chesson, J.; Russell, J.
This report presents a statistical evaluation of airborne asbestos data collected at two schools before and after removal of asbestos-containing material (ACM). Although the monitoring data are not totally consistent with new Asbestos Hazard Emergency Response Act (AHERA) requirements and recent EPA guidelines, the study evaluates these historical data by standard statistical methods to determine if abated work areas meet proposed clearance criteria. The objectives of this statistical analysis were to compare (1) airborne asbestos levels indoors after removal with levels outdoors, (2) airborne asbestos levels before and after removal of asbestos, and (3) static sampling and aggressive sampling of airborne asbestos. The results of this evaluation indicated the following: the effect of asbestos removal on indoor air quality is unpredictable; the variability in fiber concentrations among different sampling sites within the same building indicates the need to treat different sites as separate areas for the purpose of clearance; and aggressive sampling is appropriate for clearance testing because it captures more entrainable asbestos structures. Aggressive sampling lowers the chance of declaring a worksite clean when entrainable asbestos is still present.
Song, Rui; Kosorok, Michael R.; Cai, Jianwen
2009-01-01
Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107
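For orientation, the classical single-event formula that the abstract says the new result reduces to (Schoenfeld 1983) can be sketched as below. This is not the covariate-adjusted recurrent-events formula, which the abstract does not state; the function name and default values are assumptions for the example.

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.8, allocation=0.5):
    """Required number of events for a two-sided log-rank test.

    Classical form: d = (z_{1-alpha/2} + z_{power})^2 / (p (1 - p) (log HR)^2),
    where p is the share of subjects allocated to one arm.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z = NormalDist().inv_cdf
    numerator = (z(1 - alpha / 2) + z(power)) ** 2
    denominator = allocation * (1 - allocation) * log(hazard_ratio) ** 2
    return ceil(numerator / denominator)


# A hazard ratio of 0.5 with 1:1 allocation, 5% two-sided alpha, 80% power.
print(schoenfeld_events(0.5))
```

Smaller effects (hazard ratios closer to 1) drive the required number of events up quickly because the log hazard ratio enters squared in the denominator.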
Dermatoglyphic features in patients with multiple sclerosis
Sabanciogullari, Vedat; Cevik, Seyda; Karacan, Kezban; Bolayir, Ertugrul; Cimen, Mehmet
2014-01-01
Objective: To examine dermatoglyphic features to clarify implicated genetic predisposition in the etiology of multiple sclerosis (MS). Methods: The study was conducted between January and December 2013 in the Departments of Anatomy, and Neurology, Cumhuriyet University School of Medicine, Sivas, Turkey. The dermatoglyphic data of 61 patients, and a control group consisting of 62 healthy adults obtained with a digital scanner were transferred to a computer environment. The ImageJ program was used, and atd, dat, adt angles, a-b ridge count, sample types of all fingers, and ridge counts were calculated. Results: In both hands of the patients with MS, the a-b ridge count and ridge counts in all fingers increased, and the differences in these values were statistically significant. There was also a statistically significant increase in the dat angle in both hands of the MS patients. On the contrary, there was no statistically significant difference between the groups in terms of dermal ridge samples, and the most frequent sample in both groups was the ulnar loop. Conclusions: Aberrations in the distribution of dermatoglyphic samples support the genetic predisposition in MS etiology. Multiple sclerosis susceptible individuals may be determined by analyzing dermatoglyphic samples. PMID:25274586
Acidity of fine sulfate particles at Great Smokey Mountains National Park
DOE Office of Scientific and Technical Information (OSTI.GOV)
Day, D.; Malm, W.C.; Kreidenweis, S.
1995-12-31
The acidity of ambient particles is of interest from the perspectives of human health, visibility, and ecology. This paper reports on the acidity of fine (< 2.5 μm) particles measured during August 1994 at Look Rock observation tower in Great Smokey Mountains National Park. This site is located at latitude 35° 37′ 56″, longitude 83° 56′ 32″, and at an elevation of 808 m above sea level. All samples were collected using the IMPROVE (Interagency Monitoring of Protected Visual Environments) sampler. The sampling periods included: (1) 4-hour samples collected three times daily with starting times of 8:00 AM, 12:00 noon, and 4:00 PM; (2) 12-hour samples collected twice daily with starting times of 8:00 AM and 8:00 PM (all times reported are eastern daylight savings time). The IMPROVE sampler, collecting 4-hour samples, employed a citric acid/glycerol coated annular denuder to remove ammonia gas while the 12-hour sampler did not use a citric acid denuder. The intensive monitoring effort, conducted during August 1994, showed that: (1) the fine aerosol mass is generally dominated by sulfate and its associated water; (2) there was no statistically significant difference in average sulfate concentration between the 12-hour samples nor was there a statistically significant difference in average sulfate concentration between the 4-hour samples; (3) the aerosol is highly acidic, ranging from almost pure sulfuric acid to pure ammonium bisulfate, with an average molar ammonium ion to sulfate ratio of about 0.75 which suggests the ambient sulfate aerosol was a mixture of ammonium bisulfate and sulfuric acid; and (4) there was no statistically significant diurnal variation in particle acidity nor was there a statistically significant difference in particle acidity between the 4-hour samples.
A Science and Risk-Based Pragmatic Methodology for Blend and Content Uniformity Assessment.
Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Doshi, Chetan
2018-04-01
This paper describes a pragmatic approach that can be applied in assessing powder blend and unit dosage uniformity of solid dose products at Process Design, Process Performance Qualification, and Continued/Ongoing Process Verification stages of the Process Validation lifecycle. The statistically based sampling, testing, and assessment plan was developed due to the withdrawal of the FDA draft guidance for industry "Powder Blends and Finished Dosage Units-Stratified In-Process Dosage Unit Sampling and Assessment." This paper compares the proposed Grouped Area Variance Estimate (GAVE) method with an alternate approach outlining the practicality and statistical rationalization using traditional sampling and analytical methods. The approach is designed to fit solid dose processes assuring high statistical confidence in both powder blend uniformity and dosage unit uniformity during all three stages of the lifecycle complying with ASTM standards as recommended by the US FDA.
NASA Astrophysics Data System (ADS)
Naylor, M.; Main, I. G.; Greenhough, J.; Bell, A. F.; McCloskey, J.
2009-04-01
The Sumatran Boxing Day earthquake and subsequent large events provide an opportunity to re-evaluate the statistical evidence for characteristic earthquake events in frequency-magnitude distributions. Our aims are to (i) improve intuition regarding the properties of samples drawn from power laws, (ii) illustrate using random samples how appropriate Poisson confidence intervals can both aid the eye and provide an appropriate statistical evaluation of data drawn from power-law distributions, and (iii) apply these confidence intervals to test for evidence of characteristic earthquakes in subduction-zone frequency-magnitude distributions. We find no need for a characteristic model to describe frequency magnitude distributions in any of the investigated subduction zones, including Sumatra, due to an emergent skew in residuals of power law count data at high magnitudes combined with a sample bias for examining large earthquakes as candidate characteristic events.
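One way to attach Poisson confidence intervals to binned event counts, as mentioned in aim (ii), is the exact interval obtained by inverting the Poisson distribution function. The abstract does not say which construction the authors used, so this choice, along with the function names and the bisection search, is an assumption for illustration.

```python
from math import exp, sqrt


def poisson_cdf(k, rate):
    """P(X <= k) for X ~ Poisson(rate); 0.0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-rate)
    total = term
    for i in range(1, k + 1):
        term *= rate / i
        total += term
    return total


def _rate_where_cdf_equals(k, target, upper_bound):
    """Bisection for the rate at which P(X <= k) equals target.

    Relies on the Poisson CDF being decreasing in the rate.
    """
    lo, hi = 0.0, upper_bound
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid  # CDF still too large, so the rate must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_interval(count, confidence=0.95):
    """Exact two-sided interval for the Poisson rate given one observed count.

    Intended for modest counts; exp(-rate) underflows for very large rates.
    """
    alpha = 1 - confidence
    search_limit = count + 10 * sqrt(count + 1) + 10
    if count == 0:
        lower = 0.0
    else:
        lower = _rate_where_cdf_equals(count - 1, 1 - alpha / 2, search_limit)
    upper = _rate_where_cdf_equals(count, alpha / 2, search_limit)
    return lower, upper


# Hypothetical counts of events per magnitude bin, largest magnitudes last.
for observed in [120, 35, 10, 2, 0]:
    print(observed, poisson_interval(observed))
```

The intervals are strongly asymmetric for the small counts found in the high-magnitude bins, which is why an apparent excess of large events can still be consistent with a plain power law.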
Improving Instruction Using Statistical Process Control.
ERIC Educational Resources Information Center
Higgins, Ronald C.; Messer, George H.
1990-01-01
Two applications of statistical process control to the process of education are described. Discussed are the use of prompt feedback to teachers and prompt feedback to students. A sample feedback form is provided. (CW)
How Statistics "Excel" Online.
ERIC Educational Resources Information Center
Chao, Faith; Davis, James
2000-01-01
Discusses the use of Microsoft Excel software and provides examples of its use in an online statistics course at Golden Gate University in the areas of randomness and probability, sampling distributions, confidence intervals, and regression analysis. (LRW)
Scheid, Anika; Nebel, Markus E
2012-07-09
Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. 
Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L
2014-10-15
Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/. Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
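To make the bit-rate claim concrete, here is a minimal sketch of generic least-significant-bit (LSB) embedding in 16-bit samples. It illustrates the kind of embedding the detector is tested against, not the authors' steganalysis method and not the Hide4PGP algorithm; the function names and sample values are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # figures quoted in the abstract
BITS_PER_SAMPLE = 16

# Cover bit-rate per channel, and hidden capacity when one LSB per sample is used.
cover_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 705,600 bits/s
hidden_bytes_per_second = SAMPLE_RATE_HZ // 8              # 5,512 bytes/s


def embed_lsb(samples, message):
    """Hide the bytes of `message` in the least significant bits of `samples`."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message too long for this cover signal")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bytes):
    """Read back `n_bytes` bytes from the least significant bits of `samples`."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for sample in samples[8 * b: 8 * b + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


cover = [1200, -873, 15, -32768, 32767, 0, -1, 42] * 4   # 32 hypothetical samples
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

Each sample changes by at most one quantization step, which is inaudible but, as the abstract argues, still shifts the signal's underlying statistics enough for a classifier to detect.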
Interpretation of correlations in clinical research.
Hung, Man; Bounsanga, Jerry; Voss, Maren Wright
2017-11-01
Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
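The alpha inflation mentioned in this abstract can be illustrated with a short calculation. The sketch assumes independent tests, which real correlational analyses often violate, and the function names are chosen for the example; the Bonferroni adjustment shown is one common remedy, not one the abstract specifically prescribes.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_alpha(0.05, m))
```

With ten independent correlations each tested at 0.05, the chance of at least one spurious "significant" result is roughly 40%, which is why unadjusted significance across many predictors overstates the evidence.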
Public and patient involvement in quantitative health research: A statistical perspective.
Hannigan, Ailish
2018-06-19
The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.
Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
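The equivalence described above can be sketched for the pooled-variance two-sample t test: once the sample sizes are known, t and Cohen's d are rescalings of each other. The helper names and the data are hypothetical. Mapping t to a p value additionally needs a t-distribution tail function, which is omitted here to keep the example within the standard library.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def t_statistic(x, y):
    """Equal-variance two-sample t statistic."""
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / len(x) + 1 / len(y)))


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def d_from_t(t, n1, n2):
    """Recover Cohen's d from t when the sample sizes are known."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def t_from_d(d, n1, n2):
    """Recover t from Cohen's d when the sample sizes are known."""
    return d / math.sqrt(1 / n1 + 1 / n2)


x = [5.1, 4.9, 6.0, 5.5, 5.8]   # illustrative data only
y = [4.2, 4.8, 4.0, 4.5]
t = t_statistic(x, y)
d = cohens_d(x, y)
```

Because the conversion is one-to-one for fixed sample sizes, reporting either statistic conveys the same information; only the interpretation attached to it differs.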
An evaluation of flow-stratified sampling for estimating suspended sediment loads
Robert B. Thomas; Jack Lewis
1995-01-01
Abstract - Flow-stratified sampling is a new method for sampling water quality constituents such as suspended sediment to estimate loads. As with selection-at-list-time (SALT) and time-stratified sampling, flow-stratified sampling is a statistical method requiring random sampling, and yielding unbiased estimates of load and variance. It can be used to estimate event...
Sticky trap and stem-tap sampling protocols for the Asian citrus psyllid (Hemiptera: Psyllidae)
USDA-ARS's Scientific Manuscript database
Sampling statistics were obtained to develop a sampling protocol for estimating numbers of adult Diaphorina citri in citrus using two different sampling methods: yellow sticky traps and stem–tap samples. A 4.0 ha block of mature orange trees was stratified into ten 0.4 ha strata and sampled using...
[Application of statistics on chronic-diseases-relating observational research papers].
Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua
2012-09-01
To study the application of statistics in observational research papers on chronic diseases recently published in Chinese Medical Association magazines with an influential index above 0.5. Using a self-developed criterion, two investigators independently assessed the application of statistics in these papers; differences of opinion were resolved through discussion. A total of 352 papers from 6 magazines, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rates of clear statement of research objectives, target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%, respectively. The correct rates of description of quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates of correctly expressing the results of statistical inference methods related to quantitative data, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions responded directly to the research objectives. However, 69.60% of the papers did not state the exact name of the study design used. 11.14% of the papers lacked a further statement of the exclusion criteria. Only 5.16% of the papers clearly explained the sample size estimation. Only 24.21% of the papers clearly described the variable value assignment. Only 24.15% of the papers described how the statistical analysis was conducted and which database methods were used. 18.75% of the papers did not describe the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for statistical inference, only 24.12% of the papers described the prerequisites of the statistical tests, while 9.94% of the papers did not employ the statistical inference method that should have been used. The main deficiencies in the application of statistics in observational research papers on chronic diseases were: lack of sample size determination, insufficient description of variable value assignment, statistical methods not introduced clearly or properly, and lack of consideration of the prerequisites for statistical inference.
Gordon, J.D.; Schroder, L.J.; Morden-Moore, A. L.; Bowersox, V.C.
1995-01-01
Separate experiments by the U.S. Geological Survey (USGS) and the Illinois State Water Survey Central Analytical Laboratory (CAL) independently assessed the stability of hydrogen ion and specific conductance in filtered wet-deposition samples stored at ambient temperatures. The USGS experiment represented a test of sample stability under a diverse range of conditions, whereas the CAL experiment was a controlled test of sample stability. In the experiment by the USGS, a statistically significant (α = 0.05) relation between [H+] and time was found for the composited filtered, natural, wet-deposition solution when all reported values are included in the analysis. However, if two outlying pH values most likely representing measurement error are excluded from the analysis, the change in [H+] over time was not statistically significant. In the experiment by the CAL, randomly selected samples were reanalyzed between July 1984 and February 1991. The original analysis and reanalysis pairs revealed that [H+] differences, although very small, were statistically different from zero, whereas specific-conductance differences were not. Nevertheless, the results of the CAL reanalysis project indicate there appears to be no consistent, chemically significant degradation in sample integrity with regard to [H+] and specific conductance while samples are stored at room temperature at the CAL. Based on the results of the CAL and USGS studies, short-term (45-60 day) stability of [H+] and specific conductance in natural filtered wet-deposition samples that are shipped and stored unchilled at ambient temperatures was satisfactory.
Automated sampling assessment for molecular simulations using the effective sample size
Zhang, Xin; Bhatt, Divesh; Zuckerman, Daniel M.
2010-01-01
To quantify the progress in the development of algorithms and forcefields used in molecular simulations, a general method for the assessment of the sampling quality is needed. Statistical mechanics principles suggest the populations of physical states characterize equilibrium sampling in a fundamental way. We therefore develop an approach for analyzing the variances in state populations, which quantifies the degree of sampling in terms of the effective sample size (ESS). The ESS estimates the number of statistically independent configurations contained in a simulated ensemble. The method is applicable to both traditional dynamics simulations as well as more modern (e.g., multi–canonical) approaches. Our procedure is tested in a variety of systems from toy models to atomistic protein simulations. We also introduce a simple automated procedure to obtain approximate physical states from dynamic trajectories: this allows sample–size estimation in systems for which physical states are not known in advance. PMID:21221418
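One way to read the variance-based ESS idea is through the binomial relation: if a state with population p were estimated from N independent configurations, the variance of that estimate would be p(1 - p)/N, so an observed variance can be inverted to give an effective N. The sketch below assumes this relation and assumes the variance is estimated from repeated independent runs. The function names and the choice of the minimum over states are illustrative, not taken from the authors' software.

```python
import math
from statistics import mean, variance


def effective_sample_size(population_estimates):
    """ESS for one state from repeated estimates of its population.

    Inverts var = p * (1 - p) / N_eff, the variance expected when a
    fraction p is estimated from N_eff independent configurations.
    """
    p = mean(population_estimates)
    var = variance(population_estimates)
    if var == 0:
        return math.inf  # estimates agree exactly; no variance to invert
    return p * (1 - p) / var


def overall_ess(estimates_by_state):
    """Conservative summary: the smallest per-state ESS (an assumption here)."""
    return min(effective_sample_size(e) for e in estimates_by_state.values())


# Hypothetical state populations from two independent simulations.
runs = {"state_A": [0.4, 0.6], "state_B": [0.6, 0.4]}
ess_a = effective_sample_size(runs["state_A"])
ess_all = overall_ess(runs)
```

A small ESS relative to the number of stored frames signals that the trajectory contains many correlated configurations.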
Implications of Satellite Swath Width on Global Aerosol Optical Thickness Statistics
NASA Technical Reports Server (NTRS)
Colarco, Peter; Kahn, Ralph; Remer, Lorraine; Levy, Robert; Welton, Ellsworth
2012-01-01
We assess the impact of swath width on the statistics of aerosol optical thickness (AOT) retrieved by satellite as inferred from observations made by the Moderate Resolution Imaging Spectroradiometer (MODIS). We sub-sample the year 2009 MODIS data from both the Terra and Aqua spacecraft along several candidate swaths of various widths. We find that due to spatial sampling there is an uncertainty of approximately 0.01 in the global, annual mean AOT. The sub-sampled monthly mean gridded AOT are within +/- 0.01 of the full swath AOT about 20% of the time for the narrow swath sub-samples, about 30% of the time for the moderate width sub-samples, and about 45% of the time for the widest swath considered. These results suggest that future aerosol satellite missions with only a narrow swath view may not sample the true AOT distribution sufficiently to reduce significantly the uncertainty in aerosol direct forcing of climate.
The prior statistics of object colors.
Koenderink, Jan J
2010-02-01
The prior statistics of object colors is of much interest because extensive statistical investigations of reflectance spectra reveal highly non-uniform structure in color space common to several very different databases. This common structure is due to the visual system rather than to the statistics of environmental structure. Analysis involves an investigation of the proper sample space of spectral reflectance factors and of the statistical consequences of the projection of spectral reflectances on the color solid. Even in the case of reflectance statistics that are translationally invariant with respect to the wavelength dimension, the statistics of object colors is highly non-uniform. The qualitative nature of this non-uniformity is due to trichromacy.
40 CFR 761.130 - Sampling requirements.
Code of Federal Regulations, 2010 CFR
2010-07-01
... sampling scheme and the guidance document are available on EPA's PCB Web site at http://www.epa.gov/pcb, or... § 761.125(c) (2) through (4). Using its best engineering judgment, EPA may sample a statistically valid random or grid sampling technique, or both. When using engineering judgment or random “grab” samples, EPA...
40 CFR 761.130 - Sampling requirements.
Code of Federal Regulations, 2011 CFR
2011-07-01
... sampling scheme and the guidance document are available on EPA's PCB Web site at http://www.epa.gov/pcb, or... § 761.125(c) (2) through (4). Using its best engineering judgment, EPA may sample a statistically valid random or grid sampling technique, or both. When using engineering judgment or random “grab” samples, EPA...
Using Candy Samples to Learn about Sampling Techniques and Statistical Data Evaluation
ERIC Educational Resources Information Center
Canaes, Larissa S.; Brancalion, Marcel L.; Rossi, Adriana V.; Rath, Susanne
2008-01-01
A classroom exercise for undergraduate and beginning graduate students that takes about one class period is proposed and discussed. It is an easy, interesting exercise that demonstrates important aspects of sampling techniques (sample amount, particle size, and the representativeness of the sample in relation to the bulk material). The exercise…
Reflexion on linear regression trip production modelling method for ensuring good model quality
NASA Astrophysics Data System (ADS)
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport modelling is important. For certain cases, the conventional model still has to be used, for which a good trip production model is essential. A good model can only be obtained from a good sample. Two of the basic principles of good sampling are that the sample must represent the population characteristics and must produce an acceptable error at a certain confidence level. These principles do not yet seem to be well understood or applied in trip production modelling. It is therefore necessary to investigate trip production modelling practice in Indonesia and to formulate a better modelling method for ensuring model quality. The research results are as follows. Statistics provides a method for calculating the span of a predicted value at a certain confidence level for linear regression, called the Confidence Interval of Predicted Value. Common modelling practice uses R² as the principal quality measure, while sampling practice varies and does not always conform to sampling principles. An experiment indicates that a small sample can already give an excellent R² value and that sample composition can significantly change the model. Hence, a good R² value does not always mean good model quality. These findings lead to three basic ideas for ensuring good model quality: reformulating the quality measure, the calculation procedure, and the sampling method. The quality measure is defined as having both a good R² value and a good Confidence Interval of Predicted Value. The calculation procedure must incorporate the statistical calculation method and the appropriate statistical tests. A good sampling method must incorporate random, well-distributed stratified sampling with a certain minimum number of samples. These three ideas need to be further developed and tested.
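The interval for a predicted value mentioned above can be sketched for simple linear regression. The example assumes the textbook prediction interval for a new observation, y0 ± t·s·sqrt(1 + 1/n + (x0 − x̄)²/Sxx), and takes the t critical value as an input read from a table. The data and function names are hypothetical and are not drawn from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit; returns intercept, slope, residual SD, mean of x, Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    return intercept, slope, s, x_bar, sxx


def prediction_interval(x, y, x0, t_crit):
    """Interval for a new observation at x0, given a t critical value with n - 2 df."""
    intercept, slope, s, x_bar, sxx = fit_line(x, y)
    y0 = intercept + slope * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / len(x) + (x0 - x_bar) ** 2 / sxx)
    return y0 - half_width, y0 + half_width


x = [1, 2, 3, 4, 5]                  # illustrative predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # illustrative trip counts
T_CRIT = 3.182                       # two-sided 95% t value for 3 degrees of freedom
near = prediction_interval(x, y, 3, T_CRIT)
far = prediction_interval(x, y, 10, T_CRIT)
```

The interval widens as x0 moves away from the mean of the sampled predictor values, which is one reason a high R² from a small or unbalanced sample can still hide poor predictive quality.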
Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F
2015-01-01
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.
Crans, Gerald G; Shuster, Jonathan J
2008-08-15
The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd
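To make the notion of the "size" of Fisher's exact test concrete, the sketch below enumerates every outcome of two groups of n subjects, marks those that the two-sided test rejects at level alpha, and takes the largest rejection probability over a grid of common success probabilities. The grid search and the function names are assumptions made for illustration; this is not the authors' numerical algorithm.

```python
from math import comb


def fisher_two_sided_p(x1, x2, n):
    """Two-sided FET p-value for x1/n versus x2/n successes.

    Sums the hypergeometric probabilities of all tables with the same
    total that are no more likely than the observed table.
    """
    total = x1 + x2
    denom = comb(2 * n, total)

    def prob(a):
        return comb(n, a) * comb(n, total - a) / denom

    lo, hi = max(0, total - n), min(n, total)
    p_obs = prob(x1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs * (1 + 1e-9))


def fet_size(n, alpha=0.05, grid=200):
    """Approximate the unconditional size of FET with n subjects per group."""
    reject = [(a, b) for a in range(n + 1) for b in range(n + 1)
              if fisher_two_sided_p(a, b, n) <= alpha]

    def rejection_probability(p):
        pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
        return sum(pmf[a] * pmf[b] for a, b in reject)

    # Maximize over a grid of common success probabilities under the null.
    return max(rejection_probability(i / grid) for i in range(1, grid))


size_10 = fet_size(10)
```

The gap between the returned size and the nominal alpha is the conservativeness that the adjusted significance level is meant to recover.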
Sampling and Data Analysis for Environmental Microbiology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murray, Christopher J.
2001-06-01
A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis of the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.
Mathematical background and attitudes toward statistics in a sample of Spanish college students.
Carmona, José; Martínez, Rafael J; Sánchez, Manuel
2005-08-01
To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications of possible research are discussed.
K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth
2008-01-01
Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and Proportions, Chi square Test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area. PMID:19876474
2014-12-01
both 2000 and 2007 Bureau of Justice Statistics Law Enforcement Management and Administrative Statistics surveys. These agencies incorporate most...responded to a variety of community policing and homeland security questions in both 2000 and 2007 Bureau of Justice Statistics Law Enforcement...Management and Administrative Statistics surveys. These agencies incorporate most major U.S. police departments as well as a representative sample of smaller
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
2010-03-01
Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical
Experimental Design in Clinical 'Omics Biomarker Discovery.
Forshed, Jenny
2017-11-03
This tutorial highlights some issues in the experimental design of clinical 'omics biomarker discovery, how to avoid bias and get as true quantities as possible from biochemical analyses, and how to select samples to improve the chance of answering the clinical question at issue. This includes the importance of defining clinical aim and end point, knowing the variability in the results, randomization of samples, sample size, statistical power, and how to avoid confounding factors by including clinical data in the sample selection, that is, how to avoid unpleasant surprises at the point of statistical analysis. The aim of this Tutorial is to help translational clinical and preclinical biomarker candidate research and to improve the validity and potential of future biomarker candidate findings.
Ichthyoplankton abundance and variance in a large river system concerns for long-term monitoring
Holland-Bartels, Leslie E.; Dewey, Michael R.; Zigler, Steven J.
1995-01-01
System-wide spatial patterns of ichthyoplankton abundance and variability were assessed in the upper Mississippi and lower Illinois rivers to address the experimental design and statistical confidence in density estimates. Ichthyoplankton was sampled from June to August 1989 in primary milieus (vegetated and non-vegetated backwaters and impounded areas, main channels and main channel borders) in three navigation pools (8, 13 and 26) of the upper Mississippi River and in a downstream reach of the Illinois River. Ichthyoplankton densities varied among stations of similar aquatic landscapes (milieus) more than among subsamples within a station. An analysis of sampling effort indicated that the collection of single samples at many stations in a given milieu type is statistically and economically preferable to the collection of multiple subsamples at fewer stations. Cluster analyses also revealed that stations only generally grouped by their preassigned milieu types. Pilot studies such as this can define station groupings and sources of variation beyond an a priori habitat classification. Thus the minimum intensity of sampling required to achieve a desired statistical confidence can be identified before implementing monitoring efforts.
ERIC Educational Resources Information Center
Savalei, Victoria
2010-01-01
Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often cooccur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…
A First Assignment to Create Student Buy-In in an Introductory Business Statistics Course
ERIC Educational Resources Information Center
Newfeld, Daria
2016-01-01
This paper presents a sample assignment to be administered after the first two weeks of an introductory business focused statistics course in order to promote student buy-in. This assignment integrates graphical displays of data, descriptive statistics and cross-tabulation analysis through the lens of a marketing analysis study. A marketing sample…
Properties of different selection signature statistics and a new strategy for combining them.
Ma, Y; Ding, X; Qanbari, S; Weigend, S; Zhang, Q; Simianer, H
2015-11-01
Identifying signatures of recent or ongoing selection is of high relevance in livestock population genomics. From a statistical perspective, determining a proper testing procedure and combining various test statistics is challenging. On the basis of extensive simulations in this study, we discuss the statistical properties of eight different established selection signature statistics. In the considered scenario, we show that a reasonable power to detect selection signatures is achieved with high marker density (>1 SNP/kb) as obtained from sequencing, while rather small sample sizes (~15 diploid individuals) appear to be sufficient. Most selection signature statistics, such as composite likelihood ratio and cross population extended haplotype homozygosity, have the highest power when fixation of the selected allele is reached, while integrated haplotype score has the highest power when selection is ongoing. We suggest a novel strategy, called de-correlated composite of multiple signals (DCMS), to combine different statistics for detecting selection signatures while accounting for the correlation between the different selection signature statistics. When examined with simulated data, DCMS consistently has a higher power than most of the single statistics and shows a reliable positional resolution. We apply the new statistic to the established selective sweep around the lactase gene in human HapMap data, providing further evidence of the reliability of this new statistic. Then, we apply it to scan selection signatures in two chicken samples with diverse skin color. Our analysis suggests that a set of well-known genes such as BCO2, MC1R, ASIP and TYR were involved in the divergent selection for this trait.
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
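The abstract describes the closed-form expressions without writing them out. The sketch below assumes they coincide with the standard binormal results for the area under the ROC curve: Φ(σβ/√2) when both groups share a standard deviation σ and β is the log-odds ratio per unit of the variable, and Φ((μ1 − μ0)/√(σ0² + σ1²)) when the variances differ. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def c_stat_equal_variance(sigma, log_odds_ratio):
    """c-statistic when the variable is normal with common SD in both groups."""
    return STANDARD_NORMAL.cdf(sigma * log_odds_ratio / sqrt(2))


def c_stat_unequal_variance(mu0, sigma0, mu1, sigma1):
    """c-statistic when the variable is normal with group-specific mean and SD."""
    return STANDARD_NORMAL.cdf((mu1 - mu0) / sqrt(sigma0 ** 2 + sigma1 ** 2))


# With equal variances the log-odds ratio is (mu1 - mu0) / sigma**2,
# so both expressions should agree.
mu0, mu1, sigma = 0.0, 1.0, 2.0
beta = (mu1 - mu0) / sigma ** 2
c_equal = c_stat_equal_variance(sigma, beta)
c_general = c_stat_unequal_variance(mu0, sigma, mu1, sigma)
```

The equal-variance form shows why the same odds ratio gives a higher c-statistic in a more heterogeneous population: the argument grows with the product of σ and β, not with β alone.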
Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon
2015-11-03
Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for 14-3-3β dynamic interaction network and prostate cancer glycoproteome. The software was written in C++ language and the source code is available for free through SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
A statistical comparison of two carbon fiber/epoxy fabrication techniques
NASA Technical Reports Server (NTRS)
Hodge, A. J.
1991-01-01
A statistical comparison of the compression strengths of specimens fabricated by either a platen press or an autoclave was performed on IM6/3501-6 carbon/epoxy composites of 16-ply (0,+45,90,-45)(sub S2) lay-up configuration. The samples were cured with the same parameters and processing materials. It was found that the autoclaved panels were thicker than the platen press cured samples. Two hundred samples of each type of cure process were compression tested. The autoclaved samples had an average strength of 450 MPa (65.5 ksi), while the press cured samples had an average strength of 370 MPa (54.0 ksi). A Weibull analysis of the data showed that there is only a 30% probability that the two types of cure systems yield specimens that can be considered to be from the same family.
Yang, Yang; DeGruttola, Victor
2016-01-01
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584
Yang, Yang; DeGruttola, Victor
2012-06-22
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.
[Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].
Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna
2008-01-01
The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but there is a difficulty deriving from the lack of a sampling frame. However, the sample obtained is not a random representative one in statistical terms of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates to the target population but is not identical to it.
A Story-Based Simulation for Teaching Sampling Distributions
ERIC Educational Resources Information Center
Turner, Stephen; Dabney, Alan R.
2015-01-01
Statistical inference relies heavily on the concept of sampling distributions. However, sampling distributions are difficult to teach. We present a series of short animations that are story-based, with associated assessments. We hope that our contribution can be useful as a tool to teach sampling distributions in the introductory statistics…
Sample Size Estimation: The Easy Way
ERIC Educational Resources Information Center
Weller, Susan C.
2015-01-01
This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…
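As a generic illustration of a quick sample size estimate, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2((z_{1-α/2} + z_{power}) / d)^2, where d is the standardized effect size. This textbook formula is an assumption chosen for the example and is not necessarily the approach the article presents; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Medium standardized effect (d = 0.5), alpha = 0.05, power = 0.80.
n_medium = n_per_group(0.5)  # 63 per group under this approximation
```

Only three inputs are needed (effect size, significance level, and power), and halving the effect size roughly quadruples the required sample.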
Farrell, Mary Beth
2018-06-01
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. 
Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being measured. A wide confidence interval indicates that if the experiment were repeated multiple times on other samples, the measured statistic would lie within a wide range of possibilities. The confidence interval relies on the SE. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
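The difference between simple percentage agreement and the Cohen κ-statistic can be made concrete with a short Python sketch. The two readers and their ten interpretations are hypothetical data, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers, corrected for chance."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a = Counter(reader_a)
    freq_b = Counter(reader_b)
    # Agreement expected if both readers labeled independently at their own rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical interpretations of the same 10 scans by two readers.
reader_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
reader_b = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohen_kappa(reader_a, reader_b)
```

In this example the readers agree on 8 of 10 scans (80%), but half of that agreement would be expected by chance alone, so κ is about 0.6.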
Geospatial techniques for developing a sampling frame of watersheds across a region
Gresswell, Robert E.; Bateman, Douglas S.; Lienkaemper, George; Guy, T.J.
2004-01-01
Current land-management decisions that affect the persistence of native salmonids are often influenced by studies of individual sites that are selected based on judgment and convenience. Although this approach is useful for some purposes, extrapolating results to areas that were not sampled is statistically inappropriate because the sampling design is usually biased. Therefore, in recent investigations of coastal cutthroat trout (Oncorhynchus clarki clarki) located above natural barriers to anadromous salmonids, we used a methodology for extending the statistical scope of inference. The purpose of this paper is to apply geospatial tools to identify a population of watersheds and develop a probability-based sampling design for coastal cutthroat trout in western Oregon, USA. The population of mid-size watersheds (500-5800 ha) west of the Cascade Range divide was derived from watershed delineations based on digital elevation models. Because a database with locations of isolated populations of coastal cutthroat trout did not exist, a sampling frame of isolated watersheds containing cutthroat trout had to be developed. After the sampling frame of watersheds was established, isolated watersheds with coastal cutthroat trout were stratified by ecoregion and erosion potential based on dominant bedrock lithology (i.e., sedimentary and igneous). A stratified random sample of 60 watersheds was selected with proportional allocation in each stratum. By comparing watershed drainage areas of streams in the general population to those in the sampling frame and the resulting sample (n = 60), we were able to evaluate how representative the subset of watersheds was in relation to the population of watersheds. Geospatial tools provided a relatively inexpensive means to generate the information necessary to develop a statistically robust, probability-based sampling design.
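A stratified random sample with proportional allocation, as described above, can be sketched in a few lines of Python. The stratum labels, stratum sizes, and watershed identifiers below are hypothetical and are not taken from the study; only the total sample size of 60 comes from the abstract.

```python
import random


def proportional_stratified_sample(frame, n_total, seed=0):
    """Draw a simple random sample within each stratum, sized in proportion
    to the stratum's share of the sampling frame.

    frame: dict mapping a stratum label to a list of unit identifiers.
    Note: rounding can make the stratum sizes sum to slightly more or less
    than n_total when the shares are not exact.
    """
    rng = random.Random(seed)
    population = sum(len(units) for units in frame.values())
    sample = {}
    for stratum, units in frame.items():
        n_stratum = round(n_total * len(units) / population)
        sample[stratum] = rng.sample(units, n_stratum)
    return sample


# Hypothetical frame of 300 watersheds in four ecoregion x lithology strata.
frame = {
    ("ecoregion_A", "sedimentary"): [f"ws{i:03d}" for i in range(0, 120)],
    ("ecoregion_A", "igneous"): [f"ws{i:03d}" for i in range(120, 180)],
    ("ecoregion_B", "sedimentary"): [f"ws{i:03d}" for i in range(180, 270)],
    ("ecoregion_B", "igneous"): [f"ws{i:03d}" for i in range(270, 300)],
}
sample = proportional_stratified_sample(frame, n_total=60)
```

With these hypothetical stratum sizes (120, 60, 90, and 30), the allocation is 24, 12, 18, and 6 watersheds, so each stratum keeps the same 20% sampling fraction.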
Methods for collection and analysis of aquatic biological and microbiological samples
Greeson, Phillip E.; Ehlke, T.A.; Irwin, G.A.; Lium, B.W.; Slack, K.V.
1977-01-01
Chapter A4 contains methods used by the U.S. Geological Survey to collect, preserve, and analyze waters to determine their biological and microbiological properties. Part 1 discusses biological sampling and sampling statistics. The statistical procedures are accompanied by examples. Part 2 consists of detailed descriptions of more than 45 individual methods, including those for bacteria, phytoplankton, zooplankton, seston, periphyton, macrophytes, benthic invertebrates, fish and other vertebrates, cellular contents, productivity, and bioassays. Each method is summarized, and the application, interferences, apparatus, reagents, collection, analysis, calculations, reporting of results, precision and references are given. Part 3 consists of a glossary. Part 4 is a list of taxonomic references.
NASA Technical Reports Server (NTRS)
Tolson, R. H.
1981-01-01
A technique is described for providing a means of evaluating the influence of spatial sampling on the determination of global mean total columnar ozone. A finite number of coefficients in the expansion are determined, and the truncated part of the expansion is shown to contribute an error to the estimate, which depends strongly on the spatial sampling and is relatively insensitive to data noise. First and second order statistics are derived for each term in a spherical harmonic expansion which represents the ozone field, and the statistics are used to estimate systematic and random errors in the estimates of total ozone.
2013-01-01
Background: Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods: Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results: The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions: The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
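The idea of a bootstrap confidence interval for the RV can be sketched as follows. This is a minimal illustration, not the authors' code: the data are simulated, the function names are hypothetical, a simple percentile interval is assumed, and patients are resampled jointly so that the correlation between the two measures is preserved.

```python
import random


def anova_f(scores, groups):
    """One-way ANOVA F-statistic for scores split by group label."""
    labels = sorted(set(groups))
    n = len(scores)
    grand_mean = sum(scores) / n
    ss_between = 0.0
    ss_within = 0.0
    for label in labels:
        members = [s for s, g in zip(scores, groups) if g == label]
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((s - mean) ** 2 for s in members)
    k = len(labels)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_rv_ci(comparator, reference, groups, n_boot=500, seed=1):
    """95% percentile bootstrap interval for RV = F_comparator / F_reference."""
    rng = random.Random(seed)
    n = len(groups)
    rvs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        g = [groups[i] for i in idx]
        if len(set(g)) < 2:  # F is undefined with a single group
            continue
        f_ref = anova_f([reference[i] for i in idx], g)
        f_cmp = anova_f([comparator[i] for i in idx], g)
        rvs.append(f_cmp / f_ref)
    rvs.sort()
    return rvs[int(0.025 * len(rvs))], rvs[int(0.975 * len(rvs)) - 1]


# Simulated scores for 3 clinical groups of 50 patients (hypothetical data).
data_rng = random.Random(7)
groups = [g for g in range(3) for _ in range(50)]
reference = [1.0 * g + data_rng.gauss(0, 1) for g in groups]
comparator = [0.6 * g + data_rng.gauss(0, 1) for g in groups]
ci_low, ci_high = bootstrap_rv_ci(comparator, reference, groups)
```

Under this sketch, a comparator measure would be judged significantly less valid than the reference when the upper limit of the interval falls below 1.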
Analysis of statistical misconception in terms of statistical reasoning
NASA Astrophysics Data System (ADS)
Maryati, I.; Priatna, N.
2018-05-01
Reasoning skills are needed by everyone in the era of globalization, because every person has to be able to manage and use information from all over the world, which can be obtained easily. Statistical reasoning skill is the ability to collect, group, process, interpret, and draw conclusions from information. This skill can be developed at various levels of education. However, the skill is low because many people, including students, assume that statistics is just the ability to count and use formulas. Students also still have a negative attitude toward courses related to research. The purpose of this research is to analyze students’ misconceptions in a descriptive statistics course in relation to their statistical reasoning skill. The observation was done by analyzing the results of a misconception test and a statistical reasoning skill test, and by observing the effect of students’ misconceptions on statistical reasoning skill. The sample of this research was 32 students of a math education department who had taken a descriptive statistics course. The mean score on the misconception test was 49.7 with a standard deviation of 10.6, whereas the mean score on the statistical reasoning skill test was 51.8 with a standard deviation of 8.5. If a minimum score of 65 is taken as the standard of achievement for course competence, the students’ mean scores are below the standard. The results of the misconception study highlight which subtopics should be given attention. Based on the assessment results, students’ misconceptions occur in: 1) writing mathematical sentences and symbols well, 2) understanding basic definitions, and 3) determining the concept to be used in solving a problem. In statistical reasoning skill, the assessment measured reasoning about: 1) data, 2) representation, 3) statistic format, 4) probability, 5) sample, and 6) association.
Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module
NASA Astrophysics Data System (ADS)
Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan
2017-11-01
We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.
Wellek, Stefan
2017-02-28
In current practice, the most frequently applied approach to the handling of ties in the Mann-Whitney-Wilcoxon (MWW) test is based on the conditional distribution of the sum of mid-ranks, given the observed pattern of ties. Starting from this conditional version of the testing procedure, a sample size formula was derived and investigated by Zhao et al. (Stat Med 2008). In contrast, the approach we pursue here is a nonconditional one exploiting explicit representations for the variances of and the covariance between the two U-statistics estimators involved in the Mann-Whitney form of the test statistic. The accuracy of both ways of approximating the sample sizes required for attaining a prespecified level of power in the MWW test for superiority with arbitrarily tied data is comparatively evaluated by means of simulation. The key qualitative conclusions to be drawn from these numerical comparisons are as follows: With the sample sizes calculated by means of the respective formula, both versions of the test maintain the level and the prespecified power with about the same degree of accuracy. Despite the equivalence in terms of accuracy, the sample size estimates obtained by means of the new formula are in many cases markedly lower than those calculated for the conditional test. Perhaps a still more important advantage of the nonconditional approach based on U-statistics is that it can also be adopted for noninferiority trials. Copyright © 2016 John Wiley & Sons, Ltd.
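For readers unfamiliar with how ties enter the Mann-Whitney form of the statistic, the Python sketch below counts pairs directly and gives each tied pair a weight of one half, which is equivalent to using mid-ranks. It illustrates only the test statistic, not the sample size formulas compared in the paper; the function name and data are hypothetical.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x versus sample y.

    Counts pairs (xi, yj) with xi > yj; a tied pair contributes 1/2.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical ordinal outcomes with ties within and between groups.
treatment = [3, 4, 4, 5]
control = [1, 2, 4, 4]
u = mann_whitney_u(treatment, control)
# Estimate of P(X > Y) + 0.5 * P(X = Y), the quantity tested for superiority.
p_hat = u / (len(treatment) * len(control))
```

Here U is 12 out of 16 possible pairs, so the estimated probability that a treatment observation exceeds a control observation (counting ties as half) is 0.75.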
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
Dazard, Jean-Eudes; Rao, J Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
The Asymmetry Parameter and Branching Ratio of Sigma Plus Radiative Decay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foucher, Maurice Emile
1992-05-01
We have measured the asymmetry parameter and branching ratio of the $\Sigma^+$ radiative decay. This high statistics experiment (FNAL 761) was performed in the Proton Center charged hyperon beam at Fermi National Accelerator Laboratory in Batavia, Illinois. We find for the asymmetry parameter $-0.720 \pm 0.086 \pm 0.045$, where the first error is statistical and the second is systematic. This result is based on a sample of $34754 \pm 212$ events. We find a preliminary value for the branching ratio $Br(\Sigma^+ \to p\gamma) / Br(\Sigma^+ \to p\pi^0) = (2.14 \pm 0.07 \pm 0.11) \times 10^{-3}$, where the first error is statistical and the second is systematic. This result is based on a sample of $31040 \pm 650$ events. Both results are in agreement with previous low statistics measurements.
The (mis)reporting of statistical results in psychology journals.
Bakker, Marjan; Wicherts, Jelte M
2011-09-01
In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
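The kind of consistency check described above can be sketched in Python: recompute the p value from the reported test statistic and compare it with the reported p value. To stay within the standard library, this sketch handles only a two-sided z statistic; checking t, F, or chi-square results would also need the degrees of freedom and the matching distribution. The function names, rounding rule, and example values are assumptions for illustration.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z, reported_p, decimals=2):
    """True if the recomputed p value matches the reported one after rounding."""
    return round(two_sided_p_from_z(z), decimals) == round(reported_p, decimals)


def changes_conclusion(z, reported_p, alpha=0.05):
    """True if recalculation flips the significance decision at level alpha."""
    return (two_sided_p_from_z(z) < alpha) != (reported_p < alpha)


# Hypothetical reported results.
ok = is_consistent(1.96, 0.05)            # recomputed p is about 0.05
mismatch = is_consistent(1.96, 0.01)      # reported p does not match
flipped = changes_conclusion(1.80, 0.04)  # recomputed p is about 0.07
```

The second helper corresponds to the more serious class of error in the abstract, where recalculation turns a reported significant result into a nonsignificant one, or vice versa.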
Power of tests for comparing trend curves with application to national immunization survey (NIS).
Zhao, Zhen
2011-02-28
To develop statistical tests for comparing trend curves of study outcomes between two socio-demographic strata across consecutive time points, and compare statistical power of the proposed tests under different trend curves data, three statistical tests were proposed. For large sample size with independent normal assumption among strata and across consecutive time points, the Z and Chi-square test statistics were developed, which are functions of outcome estimates and the standard errors at each of the study time points for the two strata. For small sample size with independent normal assumption, the F-test statistic was generated, which is a function of sample size of the two strata and estimated parameters across study period. If two trend curves are approximately parallel, the power of Z-test is consistently higher than that of both Chi-square and F-test. If two trend curves cross at low interaction, the power of Z-test is higher than or equal to the power of both Chi-square and F-test; however, at high interaction, the powers of Chi-square and F-test are higher than that of Z-test. The measurement of interaction of two trend curves was defined. These tests were applied to the comparison of trend curves of vaccination coverage estimates of standard vaccine series with National Immunization Survey (NIS) 2000-2007 data. Copyright © 2011 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Stefanski, Philip L.
2015-01-01
Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draw conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics® statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results derive not only from the inherent power of the software package, but also from the skill and understanding of the data analyst.
Estimating and comparing microbial diversity in the presence of sequencing errors
Chiu, Chun-Huo
2016-01-01
Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. 
This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
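The Hill numbers used above have a simple closed form, shown here as a minimal Python sketch. It computes the plug-in (observed) Hill number from raw abundance counts; it does not implement the paper's singleton correction or its rarefaction and extrapolation estimators. The function name and the example counts are hypothetical.

```python
import math


def hill_number(counts, q):
    """Plug-in Hill number (effective number of taxa) of order q."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


even = [5, 5, 5, 5]  # four equally abundant taxa
uneven = [8, 1, 1]   # one dominant taxon and two rare taxa
profile_uneven = [hill_number(uneven, q) for q in (0, 1, 2)]
```

For the even community every order gives 4 effective taxa, while for the uneven community the profile declines from 3 at q = 0 toward about 1.5 at q = 2, because higher orders put more weight on the dominant taxon.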
Using Group Projects to Assess the Learning of Sampling Distributions
ERIC Educational Resources Information Center
Neidigh, Robert O.; Dunkelberger, Jake
2012-01-01
In an introductory business statistics course, student groups used sample data to compare a set of sample means to the theoretical sampling distribution. Each group was given a production measurement with a population mean and standard deviation. The groups were also provided an excel spreadsheet with 40 sample measurements per week for 52 weeks…
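The comparison the student groups make can be mimicked with a short simulation in Python. The population mean and standard deviation below are hypothetical, and a normal population is assumed; only the design of 40 measurements per week for 52 weeks comes from the abstract.

```python
import math
import random
import statistics

# Hypothetical population parameters for one production measurement.
MU, SIGMA = 50.0, 2.0
N_PER_WEEK, N_WEEKS = 40, 52

rng = random.Random(42)  # fixed seed so the run is repeatable
weekly_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N_PER_WEEK)])
    for _ in range(N_WEEKS)
]

# Theory: sample means are centered on MU with standard error SIGMA / sqrt(n).
theoretical_se = SIGMA / math.sqrt(N_PER_WEEK)
observed_sd = statistics.stdev(weekly_means)
observed_center = statistics.fmean(weekly_means)
```

The spread of the 52 weekly means should be close to the theoretical standard error (about 0.32 for these hypothetical parameters), which is much smaller than the spread of individual measurements.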
NASA Astrophysics Data System (ADS)
He, Honghui; Dong, Yang; Zhou, Jialing; Ma, Hui
2017-03-01
As one of the salient features of light, polarization contains abundant structural and optical information of media. Recently, as a comprehensive description of polarization property, the Mueller matrix polarimetry has been applied to various biomedical studies such as cancerous tissues detections. In previous works, it has been found that the structural information encoded in the 2D Mueller matrix images can be presented by other transformed parameters with more explicit relationship to certain microstructural features. In this paper, we present a statistical analyzing method to transform the 2D Mueller matrix images into frequency distribution histograms (FDHs) and their central moments to reveal the dominant structural features of samples quantitatively. The experimental results of porcine heart, intestine, stomach, and liver tissues demonstrate that the transformation parameters and central moments based on the statistical analysis of Mueller matrix elements have simple relationships to the dominant microstructural properties of biomedical samples, including the density and orientation of fibrous structures, the depolarization power, diattenuation and absorption abilities. It is shown in this paper that the statistical analysis of 2D images of Mueller matrix elements may provide quantitative or semi-quantitative criteria for biomedical diagnosis.
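The transformation from a 2D image of one Mueller matrix element to a frequency distribution histogram and its central moments can be sketched in Python as follows. This is an illustrative sketch rather than the authors' pipeline: the function names, the bin count, the assumption that elements are normalized to the range [-1, 1], and the tiny example image are all hypothetical.

```python
def frequency_histogram(pixels, bins=10, lo=-1.0, hi=1.0):
    """Normalized frequency distribution histogram of pixel values.

    Assumes element values are normalized to lie in [lo, hi].
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in pixels:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1  # clamp edge values
    return [c / len(pixels) for c in counts]


def central_moments(pixels):
    """Mean, variance, skewness, and kurtosis of the pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n

    def moment(k):
        return sum((v - mean) ** k for v in pixels) / n

    variance = moment(2)
    skewness = moment(3) / variance ** 1.5
    kurtosis = moment(4) / variance ** 2
    return mean, variance, skewness, kurtosis


# Hypothetical 2x2 image of one normalized Mueller matrix element.
image = [[-0.5, 0.0], [0.0, 0.5]]
pixels = [v for row in image for v in row]
fdh = frequency_histogram(pixels, bins=4)
mean, variance, skewness, kurtosis = central_moments(pixels)
```

Each element image is thus reduced to a handful of numbers describing the location, width, asymmetry, and peakedness of its value distribution, which can then be compared across tissue samples.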
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar
2012-01-01
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar
2012-10-18
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
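Because the univariate tests are applied in parallel to every detected feature, some correction for multiple testing is needed. The abstract does not say which procedure the authors use; the sketch below shows the Benjamini-Hochberg false discovery rate procedure as one common choice, with a hypothetical function name and input.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per feature: True where the feature is declared
    significant under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Feature indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected
```

Each input p-value would come from a per-feature test (for example a t-test or a nonparametric alternative, depending on whether normality and homoscedasticity hold).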
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs.
The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.
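The parallel tempering scheme mentioned in this abstract, several random walks at different "temperatures" that are occasionally exchanged using the Metropolis criterion, can be illustrated with a minimal sketch. The double-well potential, step size, and function names are assumptions chosen for illustration and do not come from the report.

```python
import math
import random


def potential(x):
    # Illustrative double-well energy with minima near x = -1 and x = +1.
    return 8.0 * (x * x - 1.0) ** 2


def swap_probability(beta_a, beta_b, energy_a, energy_b):
    # Metropolis criterion for exchanging the states of two replicas.
    exponent = (beta_a - beta_b) * (energy_a - energy_b)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def parallel_tempering(betas, n_steps, step_size=0.5, seed=0):
    """Run one random walk per inverse temperature; betas[0] is the coldest.
    Returns the trajectory of the coldest replica."""
    rng = random.Random(seed)
    states = [-1.0 for _ in betas]  # every replica starts in the left well
    cold_samples = []
    for _ in range(n_steps):
        # Local Metropolis update within each replica.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.uniform(-step_size, step_size)
            delta = potential(proposal) - potential(states[i])
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                states[i] = proposal
        # Attempt one exchange between a random pair of neighbouring replicas.
        i = rng.randrange(len(betas) - 1)
        p = swap_probability(betas[i], betas[i + 1],
                             potential(states[i]), potential(states[i + 1]))
        if rng.random() < p:
            states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

A call such as `parallel_tempering([4.0, 1.0, 0.25], 5000)` lets the hot replicas cross the barrier and pass those configurations down to the cold one, which is what a single low-temperature walk struggles to do.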
NASA Astrophysics Data System (ADS)
Theodorakou, Chrysoula; Farquharson, Michael J.
2009-08-01
The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
Harris, Alexandre M.; DeGiorgio, Michael
2016-01-01
Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H∼BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H∼BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H∼BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
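For orientation, the "original unbiased estimator" that this abstract contrasts with H∼BLUE corrects only for sample size. A minimal sketch for a single locus is given below, assuming unrelated, non-inbred individuals and an input that lists every sampled allele copy. It is not the BLUE-based estimator and not the authors' BestHet script, and the function name is hypothetical.

```python
from collections import Counter


def expected_heterozygosity(allele_copies):
    """Sample-size-corrected gene diversity at one locus.

    allele_copies lists every sampled allele copy (two per diploid
    individual); relatedness and inbreeding are ignored."""
    n = len(allele_copies)
    if n < 2:
        raise ValueError("at least two allele copies are required")
    counts = Counter(allele_copies)
    # Sum of squared sample allele frequencies (sample homozygosity).
    homozygosity = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)
```

The factor n / (n - 1) is the only correction applied, which is why the abstract notes that this estimator underestimates diversity when the sample contains relatives.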
Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.
1998-01-01
Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm²) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpson's) increased as sample area per site increased. The backwater area had higher diversity than the channel border area.
The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.
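The kind of calculation behind a "sites per stratum" figure can be sketched with the textbook normal-approximation formula for comparing two means, using the alpha = 0.05 and beta = 0.20 quoted in the abstract. The study's own procedure is not described in detail here, so the formula, the function name, and the example inputs are assumptions.

```python
import math
from statistics import NormalDist


def sites_per_stratum(sigma, delta, alpha=0.05, beta=0.20):
    """Approximate number of sites needed in each of two strata to detect a
    difference delta in mean abundance, given a common standard deviation
    sigma, with a two-sided test at level alpha and power 1 - beta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

For a difference equal to one standard deviation, `sites_per_stratum(1.0, 1.0)` gives 16; halving the detectable difference roughly quadruples the requirement.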
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Sampling for mercury at subnanogram per litre concentrations for load estimation in rivers
Colman, J.A.; Breault, R.F.
2000-01-01
Estimation of constituent loads in streams requires collection of stream samples that are representative of constituent concentrations, that is, composites of isokinetic multiple verticals collected along a stream transect. An all-Teflon isokinetic sampler (DH-81) cleaned in 75 °C, 4 N HCl was tested using blank, split, and replicate samples to assess systematic and random sample contamination by mercury species. Mean mercury concentrations in field-equipment blanks were low: 0.135 ng·L⁻¹ for total mercury (ΣHg) and 0.0086 ng·L⁻¹ for monomethyl mercury (MeHg). Mean square errors (MSE) for ΣHg and MeHg duplicate samples collected at eight sampling stations were not statistically different from MSE of samples split in the laboratory, which represent the analytical and splitting error. Low field-blank concentrations and statistically equal duplicate- and split-sample MSE values indicate that no measurable contamination was occurring during sampling. Standard deviations associated with example mercury load estimations were four to five times larger, on a relative basis, than standard deviations calculated from duplicate samples, indicating that error of the load determination was primarily a function of the loading model used, not of sampling or analytical methods.
ERIC Educational Resources Information Center
Ali, Usama S.; Walker, Michael E.
2014-01-01
Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
ERIC Educational Resources Information Center
Conley, Quincy
2013-01-01
Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach statistics. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile…
ERIC Educational Resources Information Center
Trumpower, David L.
2015-01-01
Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…
Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods
2010-06-01
Naren; Vasquez-Robinet, Cecilia; Watkinson, Jonathan: "A General Probabilistic Model of the PCR Process," Applied Mathematics and Computation 182(1...September 2006. Seminar, Measuring the effect of Length biased sampling, Mathematical Sciences Section, National Security Agency, 19 September 2006...Committee on National Statistics, 9 February 2007. Invited seminar, Statistical Tests for Bullet Lead Comparisons, Department of Mathematics, Butler
ERIC Educational Resources Information Center
Adetona, Abel Adekanmi
2017-01-01
The study aimed at assessing how student and teacher factors, taken together, influence students' achievement in Statistics, as well as their relative contributions to the prediction. Two research questions were raised, and purposive sampling was adopted to select national diploma year 2 students since they are already in their final level in the…
NASA Astrophysics Data System (ADS)
Agus, M.; Hitchcott, P. K.; Penna, M. P.; Peró-Cebollero, M.; Guàrdia-Olmos, J.
2016-11-01
Many studies have investigated the features of probabilistic reasoning developed in relation to different formats of problem presentation, showing that it is affected by various individual and contextual factors. Incomplete understanding of the identity and role of these factors may explain the inconsistent evidence concerning the effect of problem presentation format. Thus, superior performance has sometimes been observed for graphically, rather than verbally, presented problems. The present study was undertaken to address this issue. Psychology undergraduates without any statistical expertise (N = 173 in Italy; N = 118 in Spain; N = 55 in England) were administered statistical problems in two formats (verbal-numerical and graphical-pictorial) under a condition of time pressure. Students also completed additional measures indexing several potentially relevant individual dimensions (statistical ability, statistical anxiety, attitudes towards statistics and confidence). Interestingly, a facilitatory effect of graphical presentation was observed in the Italian and Spanish samples but not in the English one. Significantly, the individual dimensions predicting statistical performance also differed between the samples, highlighting a different role of confidence. Hence, these findings confirm previous observations concerning problem presentation format while simultaneously highlighting the importance of individual dimensions.
Seven ways to increase power without increasing N.
Hansen, W B; Collins, L M
1994-01-01
Many readers of this monograph may wonder why a chapter on statistical power was included. After all, by now the issue of statistical power is in many respects mundane. Everyone knows that statistical power is a central research consideration, and certainly most National Institute on Drug Abuse grantees or prospective grantees understand the importance of including a power analysis in research proposals. However, there is ample evidence that, in practice, prevention researchers are not paying sufficient attention to statistical power. If they were, the findings observed by Hansen (1992) in a recent review of the prevention literature would not have emerged. Hansen (1992) examined statistical power based on 46 cohorts followed longitudinally, using nonparametric assumptions given the subjects' age at posttest and the numbers of subjects. Results of this analysis indicated that, in order for a study to attain 80-percent power for detecting differences between treatment and control groups, the difference between groups at posttest would need to be at least 8 percent (in the best studies) and as much as 16 percent (in the weakest studies). In order for a study to attain 80-percent power for detecting group differences in pre-post change, 22 of the 46 cohorts would have needed relative pre-post reductions of greater than 100 percent. Thirty-three of the 46 cohorts had less than 50-percent power to detect a 50-percent relative reduction in substance use. These results are consistent with other review findings (e.g., Lipsey 1990) that have shown a similar lack of power in a broad range of research topics. Thus, it seems that, although researchers are aware of the importance of statistical power (particularly of the necessity for calculating it when proposing research), they somehow are failing to end up with adequate power in their completed studies. 
This chapter argues that the failure of many prevention studies to maintain adequate statistical power is due to an overemphasis on sample size (N) as the only, or even the best, way to increase statistical power. It is easy to see how this overemphasis has come about. Sample size is easy to manipulate, has the advantage of being related to power in a straight-forward way, and usually is under the direct control of the researcher, except for limitations imposed by finances or subject availability. Another option for increasing power is to increase the alpha used for hypothesis-testing but, as very few researchers seriously consider significance levels much larger than the traditional .05, this strategy seldom is used. Of course, sample size is important, and the authors of this chapter are not recommending that researchers cease choosing sample sizes carefully. Rather, they argue that researchers should not confine themselves to increasing N to enhance power. It is important to take additional measures to maintain and improve power over and above making sure the initial sample size is sufficient. The authors recommend two general strategies. One strategy involves attempting to maintain the effective initial sample size so that power is not lost needlessly. The other strategy is to take measures to maximize the third factor that determines statistical power: effect size.
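The chapter's point that power depends on effect size and on the retained sample, not only on the initial N, can be made concrete with a normal-approximation power function for a two-group comparison. This is a generic textbook sketch, not the nonparametric analysis of Hansen (1992); the function name and example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def two_group_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardised mean difference (effect_size) with equal group sizes."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

With 64 subjects per group and a standardised effect of 0.5, power is about 0.81. Losing 24 subjects per group to attrition drops it to about 0.61, while raising the effect size to 0.7 at the original N lifts it to about 0.98, which mirrors the two strategies the authors recommend.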
Statistical analyses to support guidelines for marine avian sampling. Final report
Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris
2012-01-01
Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss.
3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.
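The one-sample, one-tailed Monte Carlo significance test named in component 2 can be sketched as follows. The report selects the reference distribution by maximum likelihood from a candidate set; here a Poisson reference is assumed purely for illustration, and all function names and counts are hypothetical.

```python
import math
import random


def poisson_draw(rng, mean):
    # Knuth's multiplication method; adequate for small means.
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def hotspot_p_value(observed_counts, draw_reference, n_sim=10000, seed=0):
    """One-tailed Monte Carlo p-value for the observed small-sample mean
    being larger than expected under the reference distribution."""
    rng = random.Random(seed)
    n = len(observed_counts)
    observed_mean = sum(observed_counts) / n
    at_least_as_large = 0
    for _ in range(n_sim):
        simulated_mean = sum(draw_reference(rng) for _ in range(n)) / n
        if simulated_mean >= observed_mean:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_large + 1) / (n_sim + 1)
```

A lease block surveyed four times would be tested with something like `hotspot_p_value([6, 7, 5, 8], lambda rng: poisson_draw(rng, 2.0))`. A coldspot test would count simulated means at or below the observed mean instead.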
Progressive statistics for studies in sports medicine and exercise science.
Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri
2009-01-01
Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.
Le Quellec, Sandra; Paris, Mickaël; Nougier, Christophe; Sobas, Frédéric; Rugeri, Lucia; Girard, Sandrine; Bordet, Jean-Claude; Négrier, Claude; Dargaud, Yesim
2017-05-01
Pneumatic tube system (PTS) in hospitals is commonly used for the transport of blood samples to clinical laboratories, as it is rapid and cost-effective. The aim was to compare the effects on haematology samples of a newly acquired ~2km-long PTS that links 2 hospitals with usual transport (non-pneumatic tube system, NPTS). Complete blood cell count, routine coagulation assays, platelet function tests (PFT) with light-transmission aggregometry and global coagulation assays including ROTEM® and thrombin generation assay (TGA) were performed on blood samples from 30 healthy volunteers and 9 healthy volunteers who agreed to take aspirin prior to blood sampling. The turnaround time was reduced by 31% (p<0.001) with the use of PTS. No statistically significant difference was observed for most routine haematology assays, including PFT and ROTEM® analysis. A statistically significant, but not clinically relevant, shortening of the APTT after sample transport by PTS was found (mean±SD: 30s±1.8 vs. 29.5s±2.1 for NPTS). D-dimer levels were 7.4% higher after transport through PTS but were not discordant. A statistically significant increase of thrombin generation was found in both platelet-poor and platelet-rich plasma samples after PTS transport compared to NPTS transport. PTS is suitable for the transport of samples prior to routine haematology assays including PFT, but should not be used for samples intended for thrombin generation measurement. Copyright © 2017 Elsevier Ltd. All rights reserved.
Incorporating Biological Knowledge into Evaluation of Causal Regulatory Hypothesis
NASA Technical Reports Server (NTRS)
Chrisman, Lonnie; Langley, Pat; Bay, Stephen; Pohorille, Andrew; DeVincenzi, D. (Technical Monitor)
2002-01-01
Biological data can be scarce and costly to obtain. The small number of samples available typically limits statistical power and makes reliable inference of causal relations extremely difficult. However, we argue that statistical power can be increased substantially by incorporating prior knowledge and data from diverse sources. We present a Bayesian framework that combines information from different sources and we show empirically that this lets one make correct causal inferences with small sample sizes that otherwise would be impossible.
Laser Velocimeter Measurements and Analysis in Turbulent Flows with Combustion. Part 2.
1983-07-01
sampling error for this sample size. Mean velocities and turbulence intensities were found to be statistically accurate to ±1% and 13%, respectively...Although the statistical error was found to be rather small (±1% for mean velocities and 13% for turbulence intensities), there can be additional..."Computational and Experimental Study of a Captive Annular Eddy," Journal of Fluid Mechanics, Vol. 28, pt. 1, pp. 43-63, 12 April, 1967.
GLIMMPSE Lite: Calculating Power and Sample Size on Smartphone Devices
Munjal, Aarti; Sakhadeo, Uttara R.; Muller, Keith E.; Glueck, Deborah H.; Kreidler, Sarah M.
2014-01-01
Researchers seeking to develop complex statistical applications for mobile devices face a common set of difficult implementation issues. In this work, we discuss general solutions to the design challenges. We demonstrate the utility of the solutions for a free mobile application designed to provide power and sample size calculations for univariate, one-way analysis of variance (ANOVA), GLIMMPSE Lite. Our design decisions provide a guide for other scientists seeking to produce statistical software for mobile platforms. PMID:25541688
Analysis of defect structure in silicon. Characterization of samples from UCP ingot 5848-13C
NASA Technical Reports Server (NTRS)
Natesh, R.; Guyer, T.; Stringfellow, G. B.
1982-01-01
Statistically significant quantitative structural imperfection measurements were made on samples from ubiquitous crystalline process (UCP) Ingot 5848-13C. Important trends were noticed between the measured data, cell efficiency, and diffusion length. Grain boundary substructure appears to have an important effect on the conversion efficiency of solar cells from Semix material. Quantitative microscopy measurements give statistically significant information compared to other microanalytical techniques. A surface preparation technique to obtain proper contrast of structural defects suitable for QTM analysis was perfected.
Replicating studies in which samples of participants respond to samples of stimuli.
Westfall, Jacob; Judd, Charles M; Kenny, David A
2015-05-01
In a direct replication, the typical goal is to reproduce a prior experimental result with a new but comparable sample of participants in a high-powered replication study. Often in psychology, the research to be replicated involves a sample of participants responding to a sample of stimuli. In replicating such studies, we argue that the same criteria should be used in sampling stimuli as are used in sampling participants. Namely, a new but comparable sample of stimuli should be used to ensure that the original results are not due to idiosyncrasies of the original stimulus sample, and the stimulus sample must often be enlarged to ensure high statistical power. In support of the latter point, we discuss the fact that in experiments involving samples of stimuli, statistical power typically does not approach 1 as the number of participants goes to infinity. As an example of the importance of sampling new stimuli, we discuss the bygone literature on the risky shift phenomenon, which was almost entirely based on a single stimulus sample that was later discovered to be highly unrepresentative. We discuss the use of both resampled and expanded stimulus sets, that is, stimulus samples that include the original stimuli plus new stimuli. © The Author(s) 2015.
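The claim that power does not approach 1 as the number of participants grows can be illustrated with a deliberately simplified variance model. The assumptions are: two conditions, stimuli nested within condition, every participant responding to every stimulus, random stimulus and residual effects only (no participant-by-condition variation), and a normal approximation. The model and all names are assumptions of this sketch rather than the authors' formulas.

```python
import math
from statistics import NormalDist


def replication_power(effect, var_stimulus, var_residual,
                      n_participants, n_stimuli_per_condition, alpha=0.05):
    """Approximate two-sided power for the difference between two condition
    means when stimuli are treated as a random sample."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Stimulus variance is averaged over stimuli only, so adding
    # participants cannot shrink the first term.
    variance = (2 * var_stimulus / n_stimuli_per_condition
                + 2 * var_residual / (n_participants * n_stimuli_per_condition))
    shift = effect / math.sqrt(variance)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

With an effect of 0.5, stimulus variance 0.25, residual variance 1, and 8 stimuli per condition, power stays near 0.52 even with a billion participants, whereas 64 stimuli per condition and only 20 participants give power above 0.99 under the same assumptions.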
Statistical Methods for Passive Vehicle Classification in Urban Traffic Surveillance and Control
DOT National Transportation Integrated Search
1980-01-01
A statistical approach to passive vehicle classification using the phase-shift signature from electromagnetic presence-type vehicle detectors is developed with digitized samples of the analog phase-shift signature, the problem of classifying vehicle ...
Forest wildlife habitat statistics for Maine - 1982
Robert T. Brooks; Thomas S. Frieswyk; Arthur Ritter
1986-01-01
A statistical report on the first forest wildlife habitat survey of Maine (1982). Eighty-five tables show estimates of forest area and several attributes of forest land wildlife habitat. Data are presented at two levels: state and geographic sampling unit.
40 CFR 91.512 - Request for public hearing.
Code of Federal Regulations, 2010 CFR
2010-07-01
... plans and statistical analyses have been properly applied (specifically, whether sampling procedures and statistical analyses specified in this subpart were followed and whether there exists a basis for... will be made available to the public during Agency business hours. ...
Methodological quality of behavioural weight loss studies: a systematic review
Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.
2018-01-01
Summary: This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults and associations between quality and statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January, 2009 and December, 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature of behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Areas identified for improvement include the use of more rigorous statistical approaches to loss to follow-up and better fidelity reporting. PMID:27071775
Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes
Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin
2013-01-01
One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110
Comparing geological and statistical approaches for element selection in sediment tracing research
NASA Astrophysics Data System (ADS)
Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon
2015-04-01
Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasing importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores were from mafic-derived sources and 64% (+/- 9%) were from felsic-derived sources.
The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings, with only the Lexys Creek modelling results differing significantly (35%). The statistical model with expanded elemental selection (DFA0.35) differed from the geological model by an average of 30% for all 6 models. Elemental selection for sediment fingerprinting therefore has the potential to impact modelling results. Accordingly, it is important to incorporate both robust geological and statistical approaches when selecting elements for sediment fingerprinting. For the Baroon Pocket Dam, management should focus on reducing the supply of sediments derived from felsic sources in each of the subcatchments.
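The Kruskal-Wallis H-test named in this abstract ranks all observations together and compares the rank sums of the groups. The sketch below computes the H statistic for one element across source groups; the function name and the example concentrations are hypothetical, ties receive average ranks, and the usual tie-correction factor is deliberately omitted.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; give each of them the mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_total = len(pooled)
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


if __name__ == "__main__":
    # Hypothetical concentrations of one element in three source groups.
    sources = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(kruskal_wallis_h(sources))
```

A large H relative to the chi-squared distribution with (number of groups - 1) degrees of freedom suggests the element discriminates between sources; the significance level used for that comparison is a separate choice.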
Mechanical properties of silicate glasses exposed to a low-Earth orbit
NASA Technical Reports Server (NTRS)
Wiedlocher, David E.; Tucker, Dennis S.; Nichols, Ron; Kinser, Donald L.
1992-01-01
The effects of a 5.8 year exposure to low earth orbit environment upon the mechanical properties of commercial optical fused silica, low iron soda-lime-silica, Pyrex 7740, Vycor 7913, BK-7, and the glass ceramic Zerodur were examined. Mechanical testing employed the ASTM-F-394 piston on 3-ball method in a liquid nitrogen environment. Samples were exposed on the Long Duration Exposure Facility (LDEF) in two locations. Impacts were observed on all specimens except Vycor. Weibull analysis as well as a standard statistical evaluation were conducted. The Weibull analysis revealed no differences between control samples and the two exposed samples. We thus concluded that radiation components of the Earth orbital environment did not degrade the mechanical strength of the samples examined within the limits of experimental error. The upper bound of strength degradation for meteorite impacted samples based upon statistical analysis and observation was 50 percent.
Alcañiz-Zanón, Manuela; Mompart-Penina, Anna; Guillén-Estany, Montserrat; Medina-Bustos, Antonia; Aragay-Barbany, Josep M; Brugulat-Guiteras, Pilar; Tresserras-Gaju, Ricard
2014-01-01
This article presents the genesis of the Health Survey of Catalonia (Spain, 2010-2014) with its semiannual subsamples and explains the basic characteristics of its multistage sampling design. In comparison with previous surveys, the organizational advantages of this new statistical operation include rapid data availability and the ability to continuously monitor the population. The main benefits are timeliness in the production of indicators and the possibility of introducing new topics through the supplemental questionnaire as a function of needs. Limitations consist of the complexity of the sample design and the lack of longitudinal follow-up of the sample. Suitable sampling weights for each specific subsample are necessary for any statistical analysis of micro-data. Accuracy in the analysis of territorial disaggregation or population subgroups increases if annual samples are accumulated. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.
Effects of sampling interval on spatial patterns and statistics of watershed nitrogen concentration
Wu, S.-S.D.; Usery, E.L.; Finn, M.P.; Bosch, D.D.
2009-01-01
This study investigates how spatial patterns and statistics of a 30 m resolution, model-simulated, watershed nitrogen concentration surface change as the sampling interval increases from 30 m to 600 m in 30 m steps for the Little River Watershed (Georgia, USA). The results indicate that the mean, standard deviation, and variogram sills do not have consistent trends with increasing sampling intervals, whereas the variogram ranges remain constant. A sampling interval smaller than or equal to 90 m is necessary to build a representative variogram. The interpolation accuracy, clustering level, and total hot spot areas show decreasing trends approximating a logarithmic function. The trends correspond to the nitrogen variogram and start to level at a sampling interval of 360 m, which is therefore regarded as a critical spatial scale of the Little River Watershed. Copyright © 2009 by Bellwether Publishing, Ltd. All rights reserved.
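The basic operation behind this comparison, re-sampling a gridded surface at a coarser interval and recomputing summary statistics, can be sketched as follows. The grid values and the function name are hypothetical; the study's variogram, interpolation, and hot spot analyses are not reproduced here.

```python
from statistics import mean, pstdev


def subsample_stats(grid, step):
    """Mean and standard deviation of every step-th cell in both directions."""
    values = [row[col] for row in grid[::step]
              for col in range(0, len(row), step)]
    return mean(values), pstdev(values)


if __name__ == "__main__":
    # Hypothetical 4 x 4 surface; with 30 m cells, step=2 means a 60 m interval.
    surface = [[r * 10 + c for c in range(4)] for r in range(4)]
    for step in (1, 2):
        print(step * 30, "m:", subsample_stats(surface, step))
```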
Gunter, M.E.; Singleton, E.; Bandli, B.R.; Lowers, H.A.; Meeker, G.P.
2005-01-01
Major-, minor-, and trace-element compositions, as determined by X-ray fluorescence (XRF) analysis, were obtained on 34 samples of vermiculite to ascertain whether chemical differences exist to the extent of determining the source of commercial products. The sample set included ores from four deposits, seven commercially available garden products, and insulation from four attics. The trace-element distributions of Ba, Cr, and V can be used to distinguish the Libby vermiculite samples from the garden products. In general, the overall composition of the Libby and South Carolina deposits appeared similar, but differed from the South Africa and China deposits based on simple statistical methods. Cluster analysis provided a good distinction of the four ore types, grouped the four attic samples with the Libby ore, and, with less certainty, grouped the garden samples with the South Africa ore.
How Sample Size Affects a Sampling Distribution
ERIC Educational Resources Information Center
Mulekar, Madhuri S.; Siegel, Murray H.
2009-01-01
If students are to understand inferential statistics successfully, they must have a profound understanding of the nature of the sampling distribution. Specifically, they must comprehend the determination of the expected value and standard error of a sampling distribution as well as the meaning of the central limit theorem. Many students in a high…
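A short simulation can make the expected value and standard error of a sampling distribution concrete. This is a minimal sketch, assuming a uniform(0, 1) population and a fixed random seed; neither choice comes from the article.

```python
import random
from statistics import mean, stdev


def simulate_sample_means(sample_size, n_replicates=4000, seed=0):
    """Draw repeated samples from uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [mean([rng.random() for _ in range(sample_size)])
            for _ in range(n_replicates)]


if __name__ == "__main__":
    population_sd = (1 / 12) ** 0.5  # standard deviation of uniform(0, 1)
    for n in (4, 25, 100):
        means = simulate_sample_means(n)
        print(n, round(mean(means), 3), round(stdev(means), 4),
              round(population_sd / n ** 0.5, 4))
```

The last two printed columns compare the simulated spread of the sample means with the theoretical standard error, population standard deviation divided by the square root of the sample size.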
Emery, R J
1997-03-01
Institutional radiation safety programs routinely use wipe test sampling and liquid scintillation counting analysis to indicate the presence of removable radioactive contamination. Significant volumes of liquid waste can be generated by such surveillance activities, and the subsequent disposal of these materials can sometimes be difficult and costly. In settings where large numbers of negative results are regularly obtained, the limited grouping of samples for analysis based on expected value statistical techniques is possible. To demonstrate the plausibility of the approach, single wipe samples exposed to varying amounts of contamination were analyzed concurrently with nine non-contaminated samples. Although the sample grouping inevitably leads to increased quenching with liquid scintillation counting systems, the effect did not impact the ability to detect removable contamination in amounts well below recommended action levels. Opportunities to further improve this cost effective semi-quantitative screening procedure are described, including improvements in sample collection procedures, enhancing sample-counting media contact through mixing and extending elution periods, increasing sample counting times, and adjusting institutional action levels.
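The expected-value reasoning behind grouping wipes can be sketched with a simple two-stage pooling model. It assumes (the abstract does not state this) that a positive group is followed by individual re-analysis of each wipe and that wipes are contaminated independently with the same probability; the function name and the probability used are hypothetical.

```python
def expected_analyses_per_group(group_size, p_contaminated):
    """Expected counting runs for one pooled group under two-stage pooling."""
    # The pooled vial reads positive if at least one wipe is contaminated.
    p_group_positive = 1 - (1 - p_contaminated) ** group_size
    # One pooled run, plus individual re-runs only when the pool is positive.
    return 1 + group_size * p_group_positive


if __name__ == "__main__":
    # Groups of ten, as in the demonstration; 1% contamination is hypothetical.
    print(expected_analyses_per_group(10, 0.01))
```

When the expected number of runs per group is well below the group size, pooling reduces both counting effort and the liquid scintillation waste generated.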
Barker, C.E.; Pawlewicz, M.J.
1993-01-01
In coal samples, published recommendations based on statistical methods suggest 100 measurements are needed to estimate the mean random vitrinite reflectance (Rv-r) to within ±2%. Our survey of published thermal maturation studies indicates that those using dispersed organic matter (DOM) mostly have an objective of acquiring 50 reflectance measurements. This smaller objective size in DOM versus that for coal samples poses a statistical contradiction because the standard deviations of DOM reflectance distributions are typically larger indicating a greater sample size is needed to accurately estimate Rv-r in DOM. However, in studies of thermal maturation using DOM, even 50 measurements can be an unrealistic requirement given the small amount of vitrinite often found in such samples. Furthermore, there is generally a reduced need for assuring precision like that needed for coal applications. Therefore, a key question in thermal maturation studies using DOM is how many measurements of Rv-r are needed to adequately estimate the mean. Our empirical approach to this problem is to compute the reflectance distribution statistics: mean, standard deviation, skewness, and kurtosis in increments of 10 measurements. This study compares these intermediate computations of Rv-r statistics with a final one computed using all measurements for that sample. Vitrinite reflectance was measured on mudstone and sandstone samples taken from borehole M-25 in the Cerro Prieto, Mexico geothermal system which was selected because the rocks have a wide range of thermal maturation and a comparable humic DOM with depth. The results of this study suggest that after only 20-30 measurements the mean Rv-r is generally known to within 5% and always to within 12% of the mean Rv-r calculated using all of the measured particles. 
Thus, even in the worst case, the precision after measuring only 20-30 particles is in good agreement with the general precision of one decimal place recommended for mean Rv-r measurements on DOM. The coefficient of variation (V = standard deviation/mean) is proposed as a statistic to indicate the reliability of the mean Rv-r estimates made at n ≥ 20. This preliminary study suggests that V > 0.2 indicates an unreliable mean in such small samples. © 1993.
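The incremental procedure described in this abstract can be sketched as a running summary computed every 10 measurements and compared with the full-sample mean. The function name and the reflectance values below are hypothetical, and skewness and kurtosis are left out to keep the sketch short.

```python
from statistics import mean, stdev


def running_summary(measurements, step=10):
    """Mean, coefficient of variation and % deviation from the final mean."""
    full_mean = mean(measurements)
    rows = []
    for n in range(step, len(measurements) + 1, step):
        subset = measurements[:n]
        subset_mean = mean(subset)
        rows.append({
            "n": n,
            "mean": subset_mean,
            "cv": stdev(subset) / subset_mean,  # V = standard deviation / mean
            "pct_diff": 100 * abs(subset_mean - full_mean) / full_mean,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical reflectance readings (%), not data from the study.
    readings = [0.61, 0.58, 0.66, 0.72, 0.55, 0.63, 0.69, 0.60, 0.64, 0.59,
                0.67, 0.62, 0.57, 0.70, 0.65, 0.61, 0.63, 0.68, 0.56, 0.66]
    for row in running_summary(readings):
        print(row)
```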
2005-07-01
as an access graft is addressed using statistical methods below. Graft consistency can be defined statistically as the variance associated with the...addressed using statistical methods below. Graft consistency can be defined statistically as the variance associated with the sample of grafts tested in...measured using a refractometer (Brix % method). The equilibration data are shown in Graph 1. The results suggest the following equilibration scheme: 40% v/v
NASA Technical Reports Server (NTRS)
Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.
2017-01-01
Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.
Confidence crisis of results in biomechanics research.
Knudson, Duane
2017-11-01
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
Biostatistical analysis of quantitative immunofluorescence microscopy images.
Giles, C; Albrecht, M A; Lam, V; Takechi, R; Mamo, J C
2016-12-01
Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution fitting procedure and compare several statistical tests, outlining their potential advantages/disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporate the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burden through experimental optimization. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.
Conservative Tests under Satisficing Models of Publication Bias
McCrary, Justin; Christensen, Garret; Fanelli, Daniele
2016-01-01
Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834
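The last figure in this abstract, the share of published t-statistics lying between the standard and the adjusted cutoff, is a simple count. The sketch below uses the cutoffs of 2 and 3 quoted in the abstract; the function name and the example t-values are hypothetical.

```python
def share_between_cutoffs(t_stats, standard=2.0, adjusted=3.0):
    """Fraction of t-statistics significant at `standard` but not `adjusted`."""
    in_band = [t for t in t_stats if standard < abs(t) <= adjusted]
    return len(in_band) / len(t_stats)


if __name__ == "__main__":
    # Hypothetical published t-ratios, for illustration only.
    published = [0.5, 2.1, 2.5, 3.2, -2.9, 4.0]
    print(share_between_cutoffs(published))
```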
NASA Astrophysics Data System (ADS)
Edjah, Adwoba; Stenni, Barbara; Cozzi, Giulio; Turetta, Clara; Dreossi, Giuliano; Tetteh Akiti, Thomas; Yidana, Sandow
2017-04-01
Adwoba Kua-Manza Edjah (a), Barbara Stenni (b, c), Giuliano Dreossi (b), Giulio Cozzi (c), Clara Turetta (c), T.T. Akiti (d), Sandow Yidana (e)
(a, e) Department of Earth Science, University of Ghana, Legon, Ghana, West Africa; (b) Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Italy; (c) Institute for the Dynamics of Environmental Processes, CNR, Venice, Italy; (d) Department of Nuclear Application and Techniques, Graduate School of Nuclear and Allied Sciences, University of Ghana, Legon
This research is part of a PhD research work "Hydrogeological Assessment of the Lower Tano river basin for sustainable economic usage, Ghana, West - Africa". In this study, the researcher investigated surface water and groundwater quality in the Lower Tano river basin. This assessment was based on some selected sampling sites associated with mining activities, and the development of oil and gas. A statistical approach was applied to characterize the quality of surface water and groundwater. Also, water stable isotopes, which are natural tracers of the hydrological cycle, were used to investigate the origin of groundwater recharge in the basin. The study revealed that Pb and Ni values of the surface water and groundwater samples exceeded the WHO standards for drinking water. In addition, the water quality index (WQI), based on physicochemical parameters (EC, TDS, pH) and major ions (Ca2+, Na+, Mg2+, HCO3-, NO3-, Cl-, SO42-, K+), indicated good quality water for 60% of the sampled surface water and groundwater. Other statistical techniques, such as the heavy metal pollution index (HPI), degree of contamination (Cd), and heavy metal evaluation index (HEI), based on trace element parameters in the water samples, reveal that 90% of the surface water and groundwater samples fall in the high-pollution class. Principal component analysis (PCA) also suggests that the water quality in the basin is likely affected by rock-water interaction and anthropogenic activities (sea water intrusion). This was confirmed by further statistical analysis (cluster analysis and correlation matrix) of the water quality parameters. The spatial distribution of water quality parameters, trace elements and the results obtained from the statistical analysis was determined using a geographical information system (GIS). In addition, the isotopic analysis of the sampled surface water and groundwater revealed that most of the surface water and groundwater were of meteoric origin with little or no isotopic variations. It is expected that outcomes of this research will form a baseline for making appropriate decisions on water quality management by decision makers in the Lower Tano river basin. Keywords: Water stable isotopes, Trace elements, Multivariate statistics, Evaluation indices, Lower Tano river basin.
Mueller, Amy V; Hemond, Harold F
2016-05-18
Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+) in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. 
Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of statistical representativeness to natural water samples. The accuracy of ANN results improves monotonically with the statistical representativeness of the training set (error decreases by ∼5×), while the expanded neural network architecture contributes a further factor of 2-3.5 decrease in error when trained with the most representative sample set. Results using the most statistically accurate set of training samples (which retain environmentally relevant ion concentrations but avoid the potential interference of humic acids) demonstrated accurate, unbiased quantification of nitrate and ammonium at natural environmental levels (±20% down to <10 μM), as well as the major ions Na(+), K(+), Ca(2+), Mg(2+), Cl(-), and SO4(2-), in unprocessed samples. These results show promise for the development of new in situ instrumentation for the support of scientific field work.
Lociciro, S; Esseiva, P; Hayoz, P; Dujourdy, L; Besacier, F; Margot, P
2008-05-20
Harmonisation and optimization of analytical and statistical methodologies were carried out between two forensic laboratories (Lausanne, Switzerland and Lyon, France) in order to provide drug intelligence for cross-border cocaine seizures. Part I dealt with the optimization of the analytical method and its robustness. This second part investigates statistical methodologies that will provide reliable comparison of cocaine seizures analysed on two different gas chromatographs interfaced with flame ionisation detectors (GC-FIDs) in two distinct laboratories. Sixty-six statistical combinations (ten data pre-treatments followed by six different distance measurements and correlation coefficients) were applied. One pre-treatment (N+S: area of each peak is divided by its standard deviation calculated from the whole data set) followed by the Cosine or Pearson correlation coefficients was found to be the best statistical compromise for optimal discrimination of linked and non-linked samples. Centralising the analyses in a single laboratory is no longer required to compare samples seized in different countries. This allows collaboration while preserving jurisdictional control over data.
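The retained combination, a pre-treatment followed by a Cosine or Pearson coefficient, can be sketched as below. The "S" step follows the abstract (each peak area divided by that peak's standard deviation over the whole data set); reading "N" as normalising each profile to unit total area is an assumption, and the function names and peak areas are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def normalise_and_scale(profiles):
    """Assumed N step (unit total area), then S step (divide by column SD)."""
    normalised = [[area / sum(profile) for area in profile]
                  for profile in profiles]
    # A peak with zero spread over the data set would divide by zero here.
    column_sd = [pstdev(column) for column in zip(*normalised)]
    return [[area / sd for area, sd in zip(profile, column_sd)]
            for profile in normalised]


def cosine(u, v):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(u, v):
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    mean_u, mean_v = mean(u), mean(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


if __name__ == "__main__":
    # Hypothetical peak areas for three seizures and three target compounds.
    seizures = [[120.0, 30.0, 50.0], [118.0, 33.0, 49.0], [60.0, 90.0, 50.0]]
    treated = normalise_and_scale(seizures)
    print(cosine(treated[0], treated[1]), cosine(treated[0], treated[2]))
```

Pairs of seizures with a coefficient close to 1 would be candidates for a link; the decision threshold is a separate calibration step not covered by this sketch.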
Statistical benchmark for BosonSampling
NASA Astrophysics Data System (ADS)
Walschaers, Mattia; Kuipers, Jack; Urbina, Juan-Diego; Mayer, Klaus; Tichy, Malte Christopher; Richter, Klaus; Buchleitner, Andreas
2016-03-01
Boson samplers (set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes) promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects.
Output statistics of laser anemometers in sparsely seeded flows
NASA Technical Reports Server (NTRS)
Edwards, R. V.; Jensen, A. S.
1982-01-01
It is noted that until very recently, research on this topic concentrated on the particle arrival statistics and the influence of the optical parameters on them. Little attention has been paid to the influence of subsequent processing on the measurement statistics. There is also controversy over whether the effects of the particle statistics can be measured. It is shown here that some of the confusion derives from a lack of understanding of the experimental parameters that are to be controlled or known. A rigorous framework is presented for examining the measurement statistics of such systems. To provide examples, two problems are then addressed. The first has to do with a sample and hold processor, the second with what is called a saturable processor. The sample and hold processor converts the output to a continuous signal by holding the last reading until a new one is obtained. The saturable system is one where the maximum processable rate is arrived at by the dead time of some unit in the system. At high particle rates, the processed rate is determined through the dead time.
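The saturable processor can be illustrated with a dead-time filter. This sketch assumes a non-extending dead time, meaning arrivals during the dead period are simply dropped and do not prolong it; the abstract does not specify this, and the function name and arrival times are hypothetical.

```python
def apply_dead_time(arrival_times, dead_time):
    """Keep only arrivals at least `dead_time` after the last accepted one."""
    accepted = []
    for t in sorted(arrival_times):
        if not accepted or t - accepted[-1] >= dead_time:
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    # Hypothetical arrivals every 1 ms for 1 s, with a 50 ms dead time.
    dense_arrivals = range(1000)
    processed = apply_dead_time(dense_arrivals, 50)
    print(len(processed), "measurements processed out of", len(dense_arrivals))
```

With arrivals far denser than the dead time allows, the processed count is set by the dead time alone, matching the statement that at high particle rates the processed rate is determined through the dead time.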
OCT Amplitude and Speckle Statistics of Discrete Random Media.
Almasian, Mitra; van Leeuwen, Ton G; Faber, Dirk J
2017-11-01
Speckle, amplitude fluctuations in optical coherence tomography (OCT) images, contains information on sub-resolution structural properties of the imaged sample. Speckle statistics could therefore be utilized in the characterization of biological tissues. However, a rigorous theoretical framework relating OCT speckle statistics to structural tissue properties has yet to be developed. As a first step, we present a theoretical description of OCT speckle, relating the OCT amplitude variance to size and organization for samples of discrete random media (DRM). Starting the calculations from the size and organization of the scattering particles, we analytically find expressions for the OCT amplitude mean, amplitude variance, the backscattering coefficient and the scattering coefficient. We assume fully developed speckle and verify the validity of this assumption by experiments on controlled samples of silica microspheres suspended in water. We show that the OCT amplitude variance is sensitive to sub-resolution changes in size and organization of the scattering particles. Experimentally determined and theoretically calculated optical properties are compared and in good agreement.
Particle-sampling statistics in laser anemometers: Sample-and-hold systems and saturable systems
NASA Technical Reports Server (NTRS)
Edwards, R. V.; Jensen, A. S.
1983-01-01
The effect of the data-processing system on the particle statistics obtained with laser anemometry of flows containing suspended particles is examined. Attention is given to the sample and hold processor, a pseudo-analog device which retains the last measurement until a new measurement is made, followed by time-averaging of the data. The second system considered features a dead time, i.e., a saturable system with a significant reset time with storage in a data buffer. It is noted that the saturable system operates independent of the particle arrival rate. The probabilities of a particle arrival in a given time period are calculated for both processing systems. It is shown that the system outputs are dependent on the mean particle flow rate, the flow correlation time, and the flow statistics, indicating that the particle density affects both systems. The results are significant for instances of good correlation between the particle density and velocity, such as occurs near the edge of a jet.
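The sample-and-hold step described in this abstract can be illustrated with a minimal sketch. It assumes a list of particle arrival times with one reading per arrival, and an observation window ending at `t_end`; the function names and the numbers are hypothetical and are not taken from the report. The sketch only shows how time-averaging a held signal weights each reading by how long it is held, which is why the result can differ from the plain average of the readings.

```python
def sample_and_hold_mean(times, values, t_end):
    """Time average of a signal that holds each reading until the next arrival."""
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equally long")
    total = 0.0
    for i, (t, v) in enumerate(zip(times, values)):
        # The reading is held until the next arrival, or until the window ends.
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += v * (t_next - t)
    return total / (t_end - times[0])


def arithmetic_mean(values):
    """Plain average of the individual readings, ignoring hold times."""
    return sum(values) / len(values)


# Hypothetical arrival times (s) and readings (arbitrary units).
times = [0.0, 1.0, 2.0]
readings = [10.0, 20.0, 30.0]
held = sample_and_hold_mean(times, readings, t_end=5.0)
plain = arithmetic_mean(readings)
print(held, plain)
```

In this toy input the last reading is held three times longer than the others, so the held average is pulled toward it.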
Assessment of variations in thermal cycle life data of thermal barrier coated rods
NASA Astrophysics Data System (ADS)
Hendricks, R. C.; McDonald, G.
An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry, and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.
Assessment of variations in thermal cycle life data of thermal barrier coated rods
NASA Technical Reports Server (NTRS)
Hendricks, R. C.; McDonald, G.
1981-01-01
An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry, and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.
ADEQUACY OF VISUALLY CLASSIFIED PARTICLE COUNT STATISTICS FROM REGIONAL STREAM HABITAT SURVEYS
Streamlined sampling procedures must be used to achieve a sufficient sample size with limited resources in studies undertaken to evaluate habitat status and potential management-related habitat degradation at a regional scale. At the same time, these sampling procedures must achi...
Analysis of the Einstein sample of early-type galaxies
NASA Technical Reports Server (NTRS)
Eskridge, Paul B.; Fabbiano, Giuseppina
1993-01-01
The EINSTEIN galaxy catalog contains x-ray data for 148 early-type (E and S0) galaxies. The global properties of this sample are analyzed in detail. By comparing the x-ray properties with other tracers of the ISM, as well as with observables related to the stellar dynamics and populations of the sample, we expect to determine more clearly the physical relationships that determine the evolution of early-type galaxies. Previous studies with smaller samples have explored the relationships between x-ray luminosity (L(sub X)) and luminosities in other bands. Using our larger sample and the statistical techniques of survival analysis, a number of these earlier analyses were repeated. For our full sample, a strong statistical correlation is found between L(sub X) and L(sub B) (the probability that the null hypothesis is upheld is P less than 10(exp -4)) from a variety of rank correlation tests. Regressions with several algorithms yield consistent results.
Using GIS to generate spatially balanced random survey designs for natural resource applications.
Theobald, David M; Stevens, Don L; White, Denis; Urquhart, N Scott; Olsen, Anthony R; Norman, John B
2007-07-01
Sampling of a population is frequently required to understand trends and patterns in natural resource management because financial and time constraints preclude a complete census. A rigorous probability-based survey design specifies where to sample so that inferences from the sample apply to the entire population. Probability survey designs should be used in natural resource and environmental management situations because they provide the mathematical foundation for statistical inference. Development of long-term monitoring designs demands survey designs that achieve statistical rigor and are efficient but remain flexible to inevitable logistical or practical constraints during field data collection. Here we describe an approach to probability-based survey design, called the Reversed Randomized Quadrant-Recursive Raster, based on the concept of spatially balanced sampling and implemented in a geographic information system. This provides environmental managers a practical tool to generate flexible and efficient survey designs for natural resource applications. Factors commonly used to modify sampling intensity, such as categories, gradients, or accessibility, can be readily incorporated into the spatially balanced sample design.
The t-test: An Influential Inferential Tool in Chaplaincy and Other Healthcare Research.
Jankowski, Katherine R B; Flannelly, Kevin J; Flannelly, Laura T
2018-01-01
The t-test developed by William S. Gosset (also known as Student's t-test and the two-sample t-test) is commonly used to compare one sample mean on a measure with another sample mean on the same measure. The outcome of the t-test is used to draw inferences about how different the samples are from each other. It is probably one of the most frequently relied upon statistics in inferential research. It is easy to use: a researcher can calculate the statistic with three simple tools: paper, pen, and a calculator. A computer program can quickly calculate the t-test for large samples. The ease of use can result in the misuse of the t-test. This article discusses the development of the original t-test, basic principles of the t-test, two additional types of t-tests (the one-sample t-test and the paired t-test), and recommendations about what to consider when using the t-test to draw inferences in research.
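As a minimal sketch of the hand calculation this abstract alludes to, the pooled-variance two-sample t statistic can be computed in a few lines. The equal-variance (pooled) form is assumed here, which the abstract does not specify, and the two data sets are hypothetical. The sketch returns only the statistic and its degrees of freedom; converting these to a p-value would need a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance (assumes equal population variances).
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical scores for two independent groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, df = pooled_t(group_a, group_b)
print(t_stat, df)
```

For these inputs both sample variances are 2.5 and the means differ by 1, so the standard error works out to 1 and the statistic to about -1 on 8 degrees of freedom.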
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2, and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
Arthur, W J; Markham, O D
1984-04-01
Polonium-210 concentrations were determined for soil, vegetation and small mammal tissues collected at a solid radioactive waste disposal area, near a phosphate ore processing plant and at two rural areas in southeastern Idaho. Polonium concentrations in media sampled near the radioactive waste disposal facility were equal to or less than values from rural area samples, indicating that disposal of solid radioactive waste at the Idaho National Engineering Laboratory Site has not resulted in increased environmental levels of polonium. Concentrations of 210Po in soils, deer mice hide and carcass samples collected near the phosphate processing plant were statistically (P less than or equal to 0.05) greater than the other sampling locations; however, the mean 210Po concentration in soils and small mammal tissues from sampling areas near the phosphate plant were only four and three times greater, respectively, than control values. No statistical (P greater than 0.05) difference was observed for 210Po concentrations in vegetation among any of the sampling locations.
Comparison between the Laser-Badal and Vernier Optometers.
1988-09-01
naval aviators (SNAs). We also measured dark vergence in the same sample of SNAs. THE FINDINGS There was no statistically significant difference found...relatively inexperienced operator. 7. The difference between mean scores on the vernier and laser-Badal optometers was statistically significant...thus indicating that test results were reliable within instruments. TABLE 1. Test and Retest Statistics. Measure Mean SD n t-value Dark vergence
Quasi-Monochromatic Visual Environments and the Resting Point of Accommodation
1988-01-01
accommodation. No statistically significant differences were revealed to support the possibility of color mediated differential regression to resting...discussed with respect to the general findings of the total sample as well as the specific behavior of individual participants. The summarized statistics ...remaining ten varied considerably with respect to the averaged trends reported in the above descriptive statistics as well as with respect to precision
5 CFR 532.215 - Establishments included in regular appropriated fund surveys.
Code of Federal Regulations, 2010 CFR
2010-01-01
... in surveys shall be selected under standard probability sample selection procedures. In areas with... establishment list drawn under statistical sampling procedures. [55 FR 46142, Nov. 1, 1990] ...
Voids and constraints on nonlinear clustering of galaxies
NASA Technical Reports Server (NTRS)
Vogeley, Michael S.; Geller, Margaret J.; Park, Changbom; Huchra, John P.
1994-01-01
Void statistics of the galaxy distribution in the Center for Astrophysics Redshift Survey provide strong constraints on galaxy clustering in the nonlinear regime, i.e., on scales R equal to or less than 10/h Mpc. Computation of high-order moments of the galaxy distribution requires a sample that (1) densely traces the large-scale structure and (2) covers sufficient volume to obtain good statistics. The CfA redshift survey densely samples structure on scales equal to or less than 10/h Mpc and has sufficient depth and angular coverage to approach a fair sample on these scales. In the nonlinear regime, the void probability function (VPF) for CfA samples exhibits apparent agreement with hierarchical scaling (such scaling implies that the N-point correlation functions for N greater than 2 depend only on pairwise products of the two-point function xi(r)). However, simulations of cosmological models show that this scaling in redshift space does not necessarily imply such scaling in real space, even in the nonlinear regime; peculiar velocities cause distortions which can yield erroneous agreement with hierarchical scaling. The underdensity probability measures the frequency of 'voids' with density rho less than 0.2 rho-bar. This statistic reveals a paucity of very bright galaxies (L greater than L asterisk) in the 'voids.' Underdensities are equal to or greater than 2 sigma more frequent in bright galaxy samples than in samples that include fainter galaxies. Comparison of void statistics of CfA samples with simulations of a range of cosmological models favors models with Gaussian primordial fluctuations and Cold Dark Matter (CDM)-like initial power spectra. Biased models tend to produce voids that are too empty. We also compare these data with three specific models of the Cold Dark Matter cosmogony: an unbiased, open universe CDM model (omega = 0.4, h = 0.5) provides a good match to the VPF of the CfA samples. Biasing of the galaxy distribution in the 'standard' CDM model (omega = 1, b = 1.5; see below for definitions) and nonzero cosmological constant CDM model (omega = 0.4, h = 0.6, lambda(sub 0) = 0.6, b = 1.3) produce voids that are too empty. All three simulations match the observed VPF and underdensity probability for samples of very bright (M less than M asterisk = -19.2) galaxies, but produce voids that are too empty when compared with samples that include fainter galaxies.
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
High throughput nonparametric probability density estimation
Farmer, Jenny
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
The relation between statistical power and inference in fMRI
Wager, Tor D.; Yarkoni, Tal
2017-01-01
Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial—especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20–30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches. PMID:29155843
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.
In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.
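The bootstrap method named in this abstract can be sketched generically as follows. This is not the authors' procedure: the per-cycle values are synthetic, the statistic is a plain mean rather than a mass estimate from factorial moments, and the function name `bootstrap_se` is hypothetical. The sketch resamples the data with replacement and takes the spread of the recomputed statistic as its uncertainty, then compares that with the textbook standard error of the mean.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, stat, reps=2000, seed=0):
    """Bootstrap standard error of stat(data) using resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on each resample of the same size as the data.
    estimates = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    return stdev(estimates)


# Synthetic stand-in for per-cycle measurements (assumption, not real data).
data_rng = random.Random(1)
cycles = [data_rng.gauss(100.0, 10.0) for _ in range(50)]

se_boot = bootstrap_se(cycles, mean)
se_formula = stdev(cycles) / len(cycles) ** 0.5
print(se_boot, se_formula)
```

For a simple mean the two numbers should be close; the bootstrap becomes more useful when the statistic is a nonlinear function of the data and no simple formula is available.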
Latent spatial models and sampling design for landscape genetics
Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.
2016-01-01
We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.
"Magnitude-based inference": a statistical review.
Welsh, Alan H; Knight, Emma J
2015-04-01
We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.
Oregon ground-water quality and its relation to hydrogeological factors; a statistical approach
Miller, T.L.; Gonthier, J.B.
1984-01-01
An appraisal of Oregon ground-water quality was made using existing data accessible through the U.S. Geological Survey computer system. The data available for about 1,000 sites were separated by aquifer units and hydrologic units. Selected statistical moments were described for 19 constituents including major ions. About 96 percent of all sites in the data base were sampled only once. The sample data were classified by aquifer unit and hydrologic unit and analysis of variance was run to determine if significant differences exist between the units within each of these two classifications for the same 19 constituents on which statistical moments were determined. Results of the analysis of variance indicated both classification variables performed about the same, but aquifer unit did provide more separation for some constituents. Samples from the Rogue River basin were classified by location within the flow system and type of flow system. The samples were then analyzed using analysis of variance on 14 constituents to determine if there were significant differences between subsets classified by flow path. Results of this analysis were not definitive, but classification as to the type of flow system did indicate potential for segregating water-quality data into distinct subsets. (USGS)
A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...
The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, High Resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when either non-detects are treated as 0 value or assigned a value of 1/2 the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs at both non-detect equal 0 and 1/2 detection limit.
NASA Technical Reports Server (NTRS)
Alberts, J. R.; Burden, H. W.; Hawes, N.; Ronca, A. E.
1996-01-01
To assess prenatal and postnatal developmental status in the offspring of a group of animals, it is typical to examine fetuses from some of the dams as well as infants born to the remaining dams. Statistical limitations often arise, particularly when the animals are rare or especially precious, because all offspring of the dam represent only a single statistical observation; littermates are not independent observations (biologically or statistically). We describe a study in which pregnant laboratory rats were laparotomized on day 7 of gestation (GD7) to ascertain the number and distribution of uterine implantation sites and were subjected to a simulated experience on a 10-day space shuttle flight. After the simulated landing on GD18, rats were unilaterally hysterectomized, thus providing a sample of fetuses from 10 independent uteruses, followed by successful vaginal delivery on GD22, yielding postnatal samples from 10 uteruses. A broad profile of maternal and offspring morphologic and physiologic measures indicated that these novel sampling procedures did not compromise maternal well-being and maintained normal offspring development and function. Measures included maternal organ weights and hormone concentrations, offspring body size, growth, organ weights, sexual differentiation, and catecholamine concentrations.
NASA Technical Reports Server (NTRS)
Bell, Thomas L.; Abdullah, A.; Martin, Russell L.; North, Gerald R.
1990-01-01
Estimates of monthly average rainfall based on satellite observations from a low earth orbit will differ from the true monthly average because the satellite observes a given area only intermittently. This sampling error inherent in satellite monitoring of rainfall would occur even if the satellite instruments could measure rainfall perfectly. The size of this error is estimated for a satellite system being studied at NASA, the Tropical Rainfall Measuring Mission (TRMM). First, the statistical description of rainfall on scales from 1 to 1000 km is examined in detail, based on rainfall data from the Global Atmospheric Research Project Atlantic Tropical Experiment (GATE). A TRMM-like satellite is flown over a two-dimensional time-evolving simulation of rainfall using a stochastic model with statistics tuned to agree with GATE statistics. The distribution of sampling errors found from many months of simulated observations is found to be nearly normal, even though the distribution of area-averaged rainfall is far from normal. For a range of orbits likely to be employed in TRMM, sampling error is found to be less than 10 percent of the mean for rainfall averaged over a 500 x 500 sq km area.
Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael
2017-11-01
Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. We aim to introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis and to present their implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Sadri, Donia; Farhadi, Sareh; Shahabi, Zahra; Sarshar, Samaneh
2016-01-01
Recent scientific reports have shown that angiogenesis can affect the biological behavior of pathologic lesions. Given the unique clinical outcome of odontogenic keratocyst (OKC), the present study aimed to compare angiogenesis in odontogenic keratocyst and dentigerous cyst (DC). In this experimental study, tissue sections of 46 samples of OKC and DC were stained by an immunohistochemical method using Vascular Endothelial Growth Factor (VEGF) antibody. VEGF expression was evaluated in epithelial cells, fibroblasts and endothelial cells. The average percentage of stained cells in each sample was categorized into 3 groups as follows: SCORE 0: 10% of cells or less are positive. SCORE 1: 10 to 50% of cells are positive. SCORE 2: more than 50% of cells are positive. The Mann-Whitney U test, t-test and chi-square test were used for statistical analysis. The average VEGF expression was 20.2% in the 24 samples of DC and 52.6% in the 22 samples of OKC. The difference in average VEGF expression between the two cysts was statistically significant (PV = 0.045). There was also a statistically significant difference between the two cysts in terms of VEGF SCORE (PV = 0.000): OKC samples had a significantly higher VEGF SCORE than DC samples. There was no difference in VEGF expression in epithelial cells between the two cysts (PV = 0.268), but endothelial cell staining was significantly higher in OKC than in DC (PV = 0.037). Given the higher expression of VEGF in OKC than in DC, it seems that angiogenesis may have a considerable influence on the clinical outcome of OKC.
Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr
2012-01-01
Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
Sample sizes and model comparison metrics for species distribution models
B.B. Hanberry; H.S. He; D.C. Dey
2012-01-01
Species distribution models use small samples to produce continuous distribution maps. The question of how small a sample can be to produce an accurate model generally has been answered based on comparisons to maximum sample sizes of 200 observations or fewer. In addition, model comparisons often are made with the kappa statistic, which has become controversial....
Sampling methods for amphibians in streams in the Pacific Northwest.
R. Bruce Bury; Paul Stephen Corn
1991-01-01
Methods describing how to sample aquatic and semiaquatic amphibians in small streams and headwater habitats in the Pacific Northwest are presented. We developed a technique that samples 10-meter stretches of selected streams, which was adequate to detect presence or absence of amphibian species and provided sample sizes statistically sufficient to compare abundance of...
ERIC Educational Resources Information Center
Office of Student Financial Assistance (ED), Washington, DC.
A manual on sampling is presented to assist audit and program reviewers, project officers, managers, and program specialists of the U.S. Office of Student Financial Assistance (OSFA). For each of the following types of samples, definitions and examples are provided, along with information on advantages and disadvantages: simple random sampling,…
Pedagogical Simulation of Sampling Distributions and the Central Limit Theorem
ERIC Educational Resources Information Center
Hagtvedt, Reidar; Jones, Gregory Todd; Jones, Kari
2007-01-01
Students often find the fact that a sample statistic is a random variable very hard to grasp. Even more mysterious is why a sample mean should become ever more Normal as the sample size increases. This simulation tool is meant to illustrate the process, thereby giving students some intuitive grasp of the relationship between a parent population…
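A simulation of the kind this abstract describes might look like the sketch below. It is not the authors' tool: the exponential parent population, the function names, and the use of skewness as a rough measure of non-Normality are assumptions chosen for illustration.

```python
import random
import statistics

def skewness(xs):
    """Population skewness: third central moment divided by sd cubed."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n from a skewed Exponential(1) parent."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

for n in (2, 10, 50):
    means = sample_means(n)
    # The spread shrinks and the skewness moves toward 0 as n grows
    print(n, round(statistics.pstdev(means), 3), round(skewness(means), 3))
```

Each sample mean is itself one draw of a random variable; repeating the draw many times makes its distribution visible, and its skewness shrinks toward zero as the sample size increases.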
Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz
2015-03-01
FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
A time to be born: Variation in the hour of birth in a rural population of Northern Argentina.
Chaney, Carlye; Goetz, Laura G; Valeggia, Claudia
2018-04-17
The present study aimed at investigating the timing of birth across the day in a rural population of indigenous and nonindigenous women in the province of Formosa, Argentina in order to explore the variation in patterns in a non-Western setting. This study utilized birth record data transcribed from delivery room records at a rural hospital in the province of Formosa, northern Argentina. The sample included data for Criollo, Wichí, and Toba/Qom women (n = 2421). Statistical analysis was conducted using directional statistics to identify a mean sample direction. Chi-square tests for homogeneity were also used to test for statistically significant differences between hours of the day. The mean sample direction was 81.04°, which equates to 5:24 AM when calculated as time on a 24-hr clock. Chi-squared analyses showed a statistically significant peak in births between 12:00 and 4:00 AM. Birth counts generally declined throughout the day until a statistically significant trough around 5:00 PM. This pattern may be associated with the circadian rhythms of hormone release, particularly melatonin, on a proximate level. At the ultimate level, giving birth in the early hours of the morning may have been selected to time births when the mother could benefit from the predator protection and support provided by her social group as well as increased mother-infant bonding from a more peaceful environment. © 2018 Wiley Periodicals, Inc.
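The conversion between a mean direction and a clock time reported in this abstract can be reproduced with a short sketch. The function names are hypothetical, and the mapping of 24 hours onto 360° starting at midnight is an assumption consistent with the figures quoted (81.04° giving 5:24 AM).

```python
import math

def mean_direction_deg(hours):
    """Circular mean of times of day (in hours), returned in degrees [0, 360)."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360

def degrees_to_clock(deg):
    """Convert a direction in degrees to (hour, minute) on a 24-hour clock."""
    total_minutes = round(deg / 360 * 24 * 60) % (24 * 60)
    return divmod(total_minutes, 60)

print(degrees_to_clock(81.04))                          # (5, 24)
print(degrees_to_clock(mean_direction_deg([23, 1])))    # (0, 0)
```

The second call shows why directional statistics are used: births at 23:00 and 01:00 average to midnight on the circle, whereas the arithmetic mean of 23 and 1 would wrongly give noon.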
Identifying natural flow regimes using fish communities
NASA Astrophysics Data System (ADS)
Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.
2011-10-01
Modern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.
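The two building blocks named in this abstract, a trailing moving average and a point-biserial correlation, can be sketched as below. The function names and the toy numbers are hypothetical; the point-biserial coefficient is computed as the Pearson correlation between a 0/1 presence indicator and a continuous statistic, which is its standard definition.

```python
import math

def moving_average(values, window=4):
    """Trailing mean over the current value and the (window - 1) before it."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def point_biserial(presence, x):
    """Pearson correlation between a 0/1 indicator and a continuous variable."""
    n = len(x)
    mp = sum(presence) / n
    mx = sum(x) / n
    cov = sum((p - mp) * (v - mx) for p, v in zip(presence, x))
    var_p = sum((p - mp) ** 2 for p in presence)
    var_x = sum((v - mx) ** 2 for v in x)
    return cov / math.sqrt(var_p * var_x)

annual_stat = [1, 2, 3, 4, 5]           # hypothetical yearly hydrologic statistic
print(moving_average(annual_stat))       # MA(4) values: [2.5, 3.5]
print(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]))
```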
Sawmill simulation: concepts and computer use
Hugh W. Reynolds; Charles J. Gatchell
1969-01-01
Product specifications were fed into a computer so that the yield of products from the same sample of logs could be determined for simulated sawing methods. Since different sawing patterns were tested on the same sample, variation among log samples was eliminated; hence, the statistical conclusions are very precise.
Williamson, Graham R
2003-11-01
This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.
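A randomization test of the general kind proposed in this paper can be sketched as follows. This is a generic Monte Carlo version for a difference in means between two groups, with hypothetical function names and data; the paper itself may recommend other variants.

```python
import random
import statistics

def randomization_test(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo randomization p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-assign group labels at random
        diff = abs(statistics.fmean(pooled[:len(a)]) -
                   statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0
    return (hits + 1) / (n_perm + 1)

print(randomization_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

The reference distribution comes from re-shuffling the observed data, so the p-value rests on random assignment of labels rather than on the assumption that cases were randomly sampled from a population.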
ERIC Educational Resources Information Center
James, David E.; Schraw, Gregory; Kuch, Fred
2015-01-01
We present an equation, derived from standard statistical theory, that can be used to estimate sampling margin of error for student evaluations of teaching (SETs). We use the equation to examine the effect of sample size, response rates and sample variability on the estimated sampling margin of error, and present results in four tables that allow…
Design of portable ultraminiature flow cytometers for medical diagnostics
NASA Astrophysics Data System (ADS)
Leary, James F.
2018-02-01
Design of portable microfluidic flow/image cytometry devices for measurements in the field (e.g. initial medical diagnostics) requires careful design in terms of power requirements and weight to allow for realistic portability. True portability with high-throughput microfluidic systems also requires sampling systems without the need for sheath hydrodynamic focusing both to avoid the need for sheath fluid and to enable higher volumes of actual sample, rather than sheath/sample combinations. Weight/power requirements dictate use of super-bright LEDs with top-hat excitation beam architectures and very small silicon photodiodes or nanophotonic sensors that can both be powered by small batteries. Signal-to-noise characteristics can be greatly improved by appropriately pulsing the LED excitation sources and sampling and subtracting noise in between excitation pulses. Microfluidic cytometry also requires judicious use of small sample volumes and appropriate statistical sampling by microfluidic cytometry or imaging for adequate statistical significance to permit real-time (typically in less than 15 minutes) initial medical decisions for patients in the field. This is not something conventional cytometry traditionally worries about, but is very important for development of small, portable microfluidic devices with small-volume throughputs. It also provides a more reasonable alternative to conventional tubes of blood when sampling geriatric and newborn patients for whom a conventional peripheral blood draw can be problematical. Instead one or two drops of blood obtained by pin-prick should be able to provide statistically meaningful results for use in making real-time medical decisions without the need for blood fractionation, which is not realistic in the doctor's office or field.
Anomaly detection for machine learning redshifts applied to SDSS galaxies
NASA Astrophysics Data System (ADS)
Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen
2015-10-01
We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed 'anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E
2013-11-15
Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (available via CRAN) provides functionality and data to perform the methods in this article.
Sathler, Renata; Pinzan, Arnaldo; Fernandes, Thais Maria Freire; de Almeida, Renato Rodrigues; Henriques, José Fernando Castanha
2014-01-01
Introduction The objective of this study was to identify the patterns of dental variables of adolescent Japanese-Brazilian descents with normal occlusion, and also to compare them with a similar Caucasian and Mongoloid sample. Methods Lateral cephalometric radiographs were used to compare the groups: Caucasian (n = 40), Japanese-Brazilian (n = 32) and Mongoloid (n = 33). The statistical tests used were one-way ANOVA and ANCOVA. The cephalometric measurements used followed the analyses of Steiner, Tweed and McNamara Jr. Results Statistical differences (P < 0.05) indicated a smaller interincisal angle and overbite for the Japanese-Brazilian sample, when compared to the Caucasian sample, although with similar values to the Mongoloid group. Conclusion The dental patterns found for the Japanese-Brazilian descents were, in general, more similar to those of the Mongoloid sample. PMID:25279521
Standard reference water samples for rare earth element determinations
Verplanck, P.L.; Antweiler, Ronald C.; Nordstrom, D. Kirk; Taylor, Howard E.
2001-01-01
Standard reference water samples (SRWS) were collected from two mine sites, one near Ophir, CO, USA and the other near Redding, CA, USA. The samples were filtered, preserved, and analyzed for rare earth element (REE) concentrations (La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu) by inductively coupled plasma-mass spectrometry (ICP-MS). These two samples were acid mine waters with elevated concentrations of REEs (0.45-161 μg/L). Seventeen international laboratories participated in a 'round-robin' chemical analysis program, which made it possible to evaluate the data by robust statistical procedures that are insensitive to outliers. The resulting most probable values are reported. Ten to 15 of the participants also reported values for Ba, Y, and Sc. Field parameters, major ion, and other trace element concentrations, not subject to statistical evaluation, are provided.
Dodge, Kent A.; Hornberger, Michelle I.; Dyke, Jessica
2006-01-01
Water, bed sediment, and biota were sampled in streams from Butte to below Missoula as part of a long-term monitoring program, conducted in cooperation with the U.S. Environmental Protection Agency, to characterize aquatic resources in the upper Clark Fork basin of western Montana. Sampling sites were located on the Clark Fork, six major tributaries, and three smaller tributaries. Water-quality samples were collected periodically at 18 sites during October 2004 through September 2005 (water year 2005). Bed-sediment and biological samples were collected once in August 2005. The primary constituents analyzed were trace elements associated with tailings from historical mining and smelting activities. This report summarizes the results of water-quality, bed-sediment, and biota samples collected in water year 2005 and provides statistical summaries of data collected since 1985. Water-quality data for samples collected periodically from streams include concentrations of selected major ions, trace elements, and suspended sediment. Daily values of suspended-sediment concentration and suspended-sediment discharge were determined for three sites. Bed-sediment data include trace-element concentrations in the fine-grained fraction. Biological data include trace-element concentrations in whole-body tissue of aquatic benthic insects. Quality-assurance data are reported for analytical results of water, bed sediment, and biota. Statistical summaries of water-quality, bed-sediment, and biological data are provided for the period of record since 1985 for each site.
Sample size and power considerations in network meta-analysis
2012-01-01
Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327
NASA Astrophysics Data System (ADS)
Kucharik, C.; Roth, J.
2002-12-01
The threat of global climate change has provoked policy-makers to consider plausible strategies to slow the accumulation of greenhouse gases, especially carbon dioxide, in the atmosphere. One such idea involves the sequestration of atmospheric carbon (C) in degraded agricultural soils as part of the Conservation Reserve Program (CRP). While the potential for significant C sequestration in CRP grassland ecosystems has been demonstrated, the paired-site sampling approach traditionally used to quantify soil C changes has not been evaluated with robust statistical analysis. In this study, 14 paired CRP (> 8 years old) and cropland sites in Dane County, Wisconsin (WI) were used to assess whether a paired-site sampling design could detect statistically significant differences (ANOVA) in mean soil organic C and total nitrogen (N) storage. We compared surface (0 to 10 cm) bulk density, and sampled soils (0 to 5, 5 to 10, and 10 to 25 cm) for textural differences and chemical analysis of organic matter (OM), soil organic C (SOC), total N, and pH. The CRP contributed to lowering soil bulk density by 13% (p < 0.0001) and increased SOC and OM storage (kg m⁻²) by 13 to 17% in the 0 to 5 cm layer (p = 0.1). We tested the statistical power associated with ANOVA for measured soil properties, and calculated minimum detectable differences (MDD). We concluded that 40 to 65 paired sites and soil sampling in 5 cm increments near the surface were needed to achieve an 80% confidence level (α = 0.05; β = 0.20) in soil C and N sequestration rates. Because soil C and total N storage was highly variable among these sites (CVs > 20%), only a 23 to 29% change in existing total organic C and N pools could be reliably detected. While C and N sequestration (247 kg C ha⁻¹ yr⁻¹ and 17 kg N ha⁻¹ yr⁻¹) may be occurring and confined to the surface 5 cm as part of the WI CRP, our sampling design did not statistically support the desired 80% confidence level. We conclude that the use of statistical power analysis is essential to ensure a high level of confidence in soil C and N sequestration rates that are quantified using paired plots.
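The relationship between the number of paired sites and the minimum detectable difference can be illustrated with a textbook approximation. The sketch below assumes a two-sided test on paired differences with a normal approximation; it is not the ANOVA-based calculation used in the study, and the function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_sum(alpha=0.05, power=0.80):
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def minimum_detectable_difference(n_pairs, sd_diff, alpha=0.05, power=0.80):
    """Smallest mean paired difference detectable with the given power."""
    return z_sum(alpha, power) * sd_diff / math.sqrt(n_pairs)

def required_pairs(delta, sd_diff, alpha=0.05, power=0.80):
    """Number of pairs needed to detect a mean paired difference of `delta`."""
    return math.ceil((z_sum(alpha, power) * sd_diff / delta) ** 2)

# Differences expressed in units of their own standard deviation (hypothetical)
print(round(minimum_detectable_difference(14, sd_diff=1.0), 2))
print(required_pairs(delta=0.4, sd_diff=1.0))
```

Because the detectable difference shrinks only with the square root of the number of pairs, halving it requires roughly four times as many paired sites, which is in line with the abstract's point that highly variable sites call for a much larger design.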
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Statistical Methods and Tools for UXO Characterization (SERDP Final Technical Report)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulsipher, Brent A.; Gilbert, Richard O.; Wilson, John E.
2004-11-15
The Strategic Environmental Research and Development Program (SERDP) issued a statement of need for FY01 titled Statistical Sampling for Unexploded Ordnance (UXO) Site Characterization that solicited proposals to develop statistically valid sampling protocols for cost-effective, practical, and reliable investigation of sites contaminated with UXO; protocols that could be validated through subsequent field demonstrations. The SERDP goal was the development of a sampling strategy for which a fraction of the site is initially surveyed by geophysical detectors to confidently identify clean areas and subsections (target areas, TAs) that had elevated densities of anomalous geophysical detector readings that could indicate the presence of UXO. More detailed surveys could then be conducted to search the identified TAs for UXO. SERDP funded three projects: those proposed by the Pacific Northwest National Laboratory (PNNL) (SERDP Project No. UXO 1199), Sandia National Laboratory (SNL), and Oak Ridge National Laboratory (ORNL). The projects were closely coordinated to minimize duplication of effort and facilitate use of shared algorithms where feasible. This final report for PNNL Project 1199 describes the methods developed by PNNL to address SERDP's statement-of-need for the development of statistically-based geophysical survey methods for sites where 100% surveys are unattainable or cost prohibitive.
Weighted statistical parameters for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rimoldini, Lorenzo
2014-01-01
Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
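To make the robustness claim concrete, the univariate two-sample Hodges-Lehmann shift estimator (the median of all pairwise differences) can be sketched as below. The article works with multivariate extensions and test statistics built on such estimators; this one-dimensional version, its function name, and the toy data are simplifications for illustration only.

```python
import statistics

def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i (two-sample location shift)."""
    return statistics.median([b - a for a in x for b in y])

control = [1, 2, 3]
treated = [2, 3, 400]   # one gross outlier in the second sample

print(hodges_lehmann_shift(control, treated))                   # 1
print(statistics.fmean(treated) - statistics.fmean(control))    # 133.0
```

A single outlier moves the difference in means from about 1 to 133, while the Hodges-Lehmann estimate stays at 1.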
Gene coexpression measures in large heterogeneous samples using count statistics.
Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan
2014-11-18
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr
2015-01-01
Pancreatic islet mass, represented by islet equivalent (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain IEQ, the sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system, developed using Image Pro Plus software, for islet mass and purity assessment. Applying DIA allows better compliance with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in determination of IEQ and purity by the manual counting method and DIA. IEQ count showed statistically significant correlations between the manual method and DIA in all sample comparisons (r > 0.819 and p < 0.0001). A statistically significant difference in IEQ between the two methods was found only in the high-purity 100 μL sample group (p = 0.029). For purity determination, statistically significant differences between manual assessment and DIA measurement were found in the high- and low-purity 100 μL samples (p < 0.005). In addition, islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between the manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique for determination of IEQ and purity. Islet samples preserved as digital images and results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islet counting and purity estimation. PMID:24806436
Influence of the processing route of porcelain/Ti-6Al-4V interfaces on shear bond strength.
Toptan, Fatih; Alves, Alexandra C; Henriques, Bruno; Souza, Júlio C M; Coelho, Rui; Silva, Filipe S; Rocha, Luís A; Ariza, Edith
2013-04-01
This study aims at evaluating the two-fold effect of initial surface conditions and dental porcelain-to-Ti-6Al-4V alloy joining processing route on the shear bond strength. Porcelain-to-Ti-6Al-4V samples were processed by conventional furnace firing (porcelain-fused-to-metal) and hot pressing. Prior to the processing, Ti-6Al-4V cylinders were prepared by three different surface treatments: polishing, alumina blasting, or silica blasting. Within the firing process, polished and alumina blasted samples were subjected to two different cooling rates: air cooling and a slower cooling rate (65°C/min). Metal/porcelain bond strength was evaluated by shear bond test. The data were analyzed using one-way ANOVA followed by Tukey's test (p<0.05). Before and after shear bond tests, metallic surfaces and metal/ceramic interfaces were examined by Field Emission Gun Scanning Electron Microscope (FEG-SEM) equipped with Energy Dispersive X-Ray Spectroscopy (EDS). Shear bond strength values of the porcelain-to-Ti-6Al-4V alloy interfaces ranged from 27.1±8.9MPa for porcelain fused to polished samples up to 134.0±43.4MPa for porcelain fused to alumina blasted samples. According to the statistical analysis, no significant differences were found in the shear bond strength values for different cooling rates. Processing method was statistically significant only for the polished samples, and airborne particle abrasion was statistically significant only for the fired samples. The type of the blasting material did not cause a statistically significant difference in the shear bond strength values. Shear bond strength of dental porcelain to Ti-6Al-4V alloys can be significantly improved through controlled conditions of surface treatments and processing methods. Copyright © 2013 Elsevier Ltd. All rights reserved.
Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J
2017-01-01
Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. 
We demonstrate that integrating conservation and management objectives with rigorous statistical design and analyses ensures reliable knowledge about bird populations that is relevant and integral to bird conservation at multiple scales. PMID:29065128
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been realized.
It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
A modified weighted function method for parameter estimation of Pearson type three distribution
NASA Astrophysics Data System (ADS)
Liang, Zhongmin; Hu, Yiming; Li, Binquan; Yu, Zhongbo
2014-04-01
In this paper, an unconventional method called Modified Weighted Function (MWF) is presented as an alternative to the conventional moment estimation of a probability distribution function. The aim of MWF is to reduce the estimation of the coefficient of variation (CV) and coefficient of skewness (CS) from the original higher-moment computations to first-order moment calculations. The estimators for CV and CS of the Pearson type three distribution function (PE3) were derived by weighting the moments of the distribution with two weight functions, which were constructed by combining two negative exponential-type functions. The selection of these weight functions was based on two considerations: (1) to relate the weight functions to sample size in order to reflect the relationship between the quantity of sample information and the role of the weight function and (2) to allocate more weight to data close to medium-tail positions in a sample series ranked in ascending order. A Monte-Carlo experiment was conducted to simulate a large number of samples upon which the statistical properties of MWF were investigated. For the PE3 parent distribution, results of MWF were compared to those of the original Weighted Function (WF) and Linear Moments (L-M). The results indicate that MWF was superior to WF and slightly better than L-M in terms of statistical unbiasedness and effectiveness. In addition, the robustness of MWF, WF, and L-M was compared by designing a Monte-Carlo experiment in which samples are obtained from the Log-Pearson type three distribution (LPE3), the three parameter Log-Normal distribution (LN3), and the Generalized Extreme Value distribution (GEV), respectively, but all treated as samples from the PE3 distribution. The results show that in terms of statistical unbiasedness, no one method possesses an overwhelming advantage among MWF, WF, and L-M, while in terms of statistical effectiveness, MWF is superior to WF and L-M.
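The abstract above does not give the MWF weight functions, so they are not reproduced here. As a rough illustration of the baseline it is compared against, the following sketch runs a small Monte-Carlo experiment on the conventional moment estimators of CV and CS for a PE3 parent. The parameter values, sample size, and function names (`sample_pe3`, `moment_cv_cs`, `monte_carlo`) are assumptions chosen for illustration, and PE3 is sampled as a shifted gamma distribution with positive skew.

```python
import random
import statistics


def sample_pe3(n, mean, cv, cs, rng):
    """Draw n values from a Pearson type III distribution (shifted gamma, cs > 0)."""
    sigma = mean * cv
    shape = 4.0 / cs ** 2          # gamma shape, since skewness = 2 / sqrt(shape)
    scale = sigma * cs / 2.0       # gamma scale, since variance = shape * scale**2
    loc = mean - shape * scale     # shift so that the mean matches
    return [loc + rng.gammavariate(shape, scale) for _ in range(n)]


def moment_cv_cs(xs):
    """Conventional moment estimates of CV and CS (adjusted sample skewness)."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    cv = s / m
    cs = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)
    return cv, cs


def monte_carlo(n=30, reps=2000, mean=100.0, cv=0.5, cs=1.0, seed=1):
    """Average the moment estimates over many simulated samples (illustrative values)."""
    rng = random.Random(seed)
    estimates = [moment_cv_cs(sample_pe3(n, mean, cv, cs, rng)) for _ in range(reps)]
    mean_cv = statistics.fmean([e[0] for e in estimates])
    mean_cs = statistics.fmean([e[1] for e in estimates])
    return mean_cv, mean_cs


if __name__ == "__main__":
    print(monte_carlo())
```

Comparing the averaged estimates with the true CV and CS passed to `monte_carlo` gives a simple view of the bias of the moment estimators at a given sample size; an MWF, WF, or L-M estimator could be swapped in for `moment_cv_cs` to reproduce the kind of comparison the abstract describes.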
A Civilian/Military Trauma Institute: National Trauma Coordinating Center
2015-12-01
zip codes was used in “proximity to violence” analysis. Data were analyzed using SPSS (version 20.0, SPSS Inc., Chicago, IL). Multivariable linear...number of adverse events and serious events was not statistically higher in one group, the incidence of deep venous thrombosis (DVT) was statistically ...subjects the lack of statistical difference on multivariate analysis may be related to an underpowered sample size. It was recommended that the
Reliability of a Measure of Institutional Discrimination against Minorities
1979-12-01
samples are presented. The first is based upon classical statistical theory and the second derives from a series of computer-generated Monte Carlo...Institutional racism and sexism. Englewood Cliffs, N. J.: Prentice-Hall, Inc., 1978. Hays, W. L. and Winkler, R. L. Statistics: probability, inference... statistical measure of the e of institutional discrimination are discussed. Two methods of dealing with the problem of reliability of the measure in small
Measuring Microaggression and Organizational Climate Factors in Military Units
2011-04-01
i.e., items) to accurately assess what we intend for them to measure. To assess construct and convergent validity, the author assessed the statistical ...sample indicated both convergent and construct validity of the microaggression scale. Table 5 presents these statistics. Measuring Microaggressions...models. As shown in Table 7, the measurement models had acceptable fit indices. That is, the Chi-square statistics were at their minimum; although the
The Seismic risk perception in Italy deduced by a statistical sample
NASA Astrophysics Data System (ADS)
Crescimbene, Massimo; La Longa, Federica; Camassi, Romano; Pino, Nicola Alessandro; Pessina, Vera; Peruzza, Laura; Cerbara, Loredana; Crescimbene, Cristiana
2015-04-01
At the 2014 EGU Assembly we presented the results of a web survey on the perception of seismic risk in Italy. The data were derived from over 8,500 questionnaires coming from all Italian regions. Our questionnaire was built using the semantic differential method (Osgood et al. 1957) with a seven-point Likert scale. The questionnaire is inspired by the main theoretical approaches to risk perception (psychometric paradigm, cultural theory, etc.). The results were promising and seem to clearly indicate an underestimation of seismic risk by the Italian population. Based on these promising results, the DPC has funded our research for a second year. At the 2015 EGU Assembly we present the results of a new survey derived from an Italian statistical sample. The importance of statistical significance at national scale was also suggested by ISTAT (Italian National Institute of Statistics), which, considering the study to be of national interest, accepted the "project on the perception of seismic risk" as a pilot study inside the National Statistical System (SISTAN), encouraging our RU to proceed in this direction. The survey was conducted by a company specialised in population surveys using the CATI method (computer assisted telephone interview). Preliminary results will be discussed. The statistical support was provided by the research partner CNR-IRPPS. This research is funded by the Italian Civil Protection Department (DPC).
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
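The contrast between the "naive" group-level t-test and a test that uses within-subject variances can be sketched as follows. This is a minimal illustration and not the authors' exact procedure: it assumes a fixed-effect setting (no between-subject variability in the true effect) and uses inverse-variance weighting of the subject means, which is one common way to exploit per-subject summary statistics. All function names and simulation values are hypothetical.

```python
import math
import random
import statistics


def subject_summary(trials):
    """Per-subject summary: (mean, estimated variance of that mean)."""
    n = len(trials)
    return statistics.fmean(trials), statistics.variance(trials) / n


def naive_t(means):
    """One-sample t statistic computed on the subject means only."""
    n = len(means)
    return statistics.fmean(means) / (statistics.stdev(means) / math.sqrt(n))


def inverse_variance_z(summaries):
    """Pooled effect and z statistic, weighting each subject by 1 / var(mean)."""
    weights = [1.0 / var for _, var in summaries]
    total = sum(weights)
    pooled = sum(w * m for w, (m, _) in zip(weights, summaries)) / total
    return pooled, pooled * math.sqrt(total)


def simulate(effect=0.5, n_trials=50, seed=0):
    """Simulate subjects that share one true effect but differ in noise level."""
    rng = random.Random(seed)
    noise_sds = [0.5, 1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0, 5.0]
    data = [[rng.gauss(effect, sd) for _ in range(n_trials)] for sd in noise_sds]
    return [subject_summary(trials) for trials in data]


if __name__ == "__main__":
    summaries = simulate()
    print("naive t:", naive_t([m for m, _ in summaries]))
    print("pooled effect, z:", inverse_variance_z(summaries))
```

In this sketch, noisy subjects receive little weight in the pooled estimate, whereas the naive t-test treats every subject mean as equally reliable. If the true effect varies between subjects, a random-effects variant would be needed instead.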
NASA Astrophysics Data System (ADS)
Jaya Christiyan, K. G.; Chandrasekhar, U.; Mathivanan, N. Rajesh; Venkateswarlu, K.
2018-02-01
3D printing was successfully used to fabricate samples of Polylactic Acid (PLA). Processing parameters such as lay-up speed, lay-up thickness, and printing nozzle diameter were varied. All samples were tested for flexural strength using a three-point load test. A statistical mathematical model was developed to correlate the processing parameters with flexural strength. The results clearly demonstrated that the lay-up thickness and nozzle diameter influenced flexural strength significantly, whereas lay-up speed hardly influenced the flexural strength.
Empirical likelihood-based tests for stochastic ordering
El Barmi, Hammou; McKeague, Ian W.
2013-01-01
This paper develops an empirical likelihood approach to testing for the presence of stochastic ordering among univariate distributions based on independent random samples from each distribution. The proposed test statistic is formed by integrating a localized empirical likelihood statistic with respect to the empirical distribution of the pooled sample. The asymptotic null distribution of this test statistic is found to have a simple distribution-free representation in terms of standard Brownian bridge processes. The approach is used to compare the lengths of rule of Roman Emperors over various historical periods, including the “decline and fall” phase of the empire. In a simulation study, the power of the proposed test is found to improve substantially upon that of a competing test due to El Barmi and Mukerjee. PMID:23874142
Learning to Reason from Samples
ERIC Educational Resources Information Center
Ben-Zvi, Dani; Bakker, Arthur; Makar, Katie
2015-01-01
The goal of this article is to introduce the topic of "learning to reason from samples," which is the focus of this special issue of "Educational Studies in Mathematics" on "statistical reasoning." Samples are data sets, taken from some wider universe (e.g., a population or a process) using a particular procedure…
VizieR Online Data Catalog: LVL global optical photometry (Cook+, 2014)
NASA Astrophysics Data System (ADS)
Cook, D. O.; Dale, D. A.; Johnson, B. D.; van Zee, L.; Lee, J. C.; Kennicutt, R. C.; Calzetti, D.; Staudaher, S. M.; Engelbracht, C. W.
2015-05-01
The LVL sample consists of 258 of our nearest galaxy neighbours reflecting a statistically complete, representative sample of the local Universe. The sample selection and description are detailed in Dale et al. (2009ApJ...703..517D, Cat. J/ApJ/703/517). (4 data files).
VizieR Online Data Catalog: LVL SEDs and physical properties (Cook+, 2014)
NASA Astrophysics Data System (ADS)
Cook, D. O.; Dale, D. A.; Johnson, B. D.; van Zee, L.; Lee, J. C.; Kennicutt, R. C.; Calzetti, D.; Staudaher, S. M.; Engelbracht, C. W.
2015-05-01
The LVL sample consists of 258 of our nearest galaxy neighbours reflecting a statistically complete, representative sample of the local Universe. The sample selection and description are detailed in Dale et al. (2009ApJ...703..517D, Cat. J/ApJ/703/517). (1 data file).
Some Tests of Randomness with Applications
1981-02-01
freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given, Graphical tests, as...sample from a gamma distribution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate
[A Review on the Use of Effect Size in Nursing Research].
Kang, Hyuncheol; Yeon, Kyupil; Han, Sang Tae
2015-10-01
The purpose of this study was to introduce the main concepts of statistical testing and effect size and to provide researchers in nursing science with guidance on how to calculate the effect size for the statistical analysis methods mainly used in nursing. For the t-test, analysis of variance, correlation analysis, and regression analysis, which are used frequently in nursing research, the generally accepted definitions of the effect size were explained. Some formulae for calculating the effect size are described with several examples in nursing research. Furthermore, the authors present the required minimum sample size for each example utilizing G*Power 3 software, which is the most widely used program for calculating sample size. It is noted that statistical significance testing and effect size measurement serve different purposes, and that reliance on only one of them may be misleading. Some practical guidelines are recommended for combining statistical significance testing and effect size measures in order to make more balanced decisions in quantitative analyses.
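For the two-sample t-test case, the link between effect size and minimum sample size can be sketched as below. This is an illustrative sketch, not the article's worked examples: `cohens_d` uses the usual pooled-standard-deviation definition, and `n_per_group` uses the normal approximation to the two-sided test, so it can give slightly smaller values than exact t-based software such as G*Power. The data in the usage lines are made up.

```python
import math
from statistics import NormalDist, fmean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (fmean(group_a) - fmean(group_b)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Hypothetical scores for two groups of five participants each.
    print(cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7]))
    # Medium effect (d = 0.5), alpha = 0.05, power = 0.80.
    print(n_per_group(0.5))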
Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.
Yin, Guosheng; Ma, Yanyuan
2013-01-01
The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.
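The procedure described above can be sketched for a normal model as follows. This is only a sketch under stated assumptions: the normal model, the fixed bin edges, and the function names are illustrative choices, and the reference distribution and its degrees of freedom are not computed here because the abstract does not specify them.

```python
import math
import random
from statistics import NormalDist


def normal_mle(xs):
    """Maximum likelihood estimates (mean, sd with divisor n) for a normal model."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma


def pearson_statistic(xs, edges, mu, sigma):
    """Pearson statistic: observed counts from xs, expected counts from N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    cuts = [-math.inf] + list(edges) + [math.inf]
    n = len(xs)
    stat = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        observed = sum(lo < x <= hi for x in xs)
        p_lo = 0.0 if lo == -math.inf else dist.cdf(lo)
        p_hi = 1.0 if hi == math.inf else dist.cdf(hi)
        expected = n * (p_hi - p_lo)
        stat += (observed - expected) ** 2 / expected
    return stat


def bootstrap_pearson(xs, edges, rng):
    """Fit the model to one bootstrap resample, then bin the ORIGINAL data."""
    resample = rng.choices(xs, k=len(xs))
    mu_star, sigma_star = normal_mle(resample)
    return pearson_statistic(xs, edges, mu_star, sigma_star)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print(bootstrap_pearson(data, [-1.0, 0.0, 1.0], rng))
```

The only change relative to the ordinary Pearson test with plug-in MLE is the argument passed to `normal_mle`: the bootstrap resample rather than the original data.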
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have attracted a great deal of attention in clinical trial communities. The literature contains many statistical methods to deal with the added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs, which allow modification of sample size or related statistical information, and adaptive selection designs, which allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications
NASA Astrophysics Data System (ADS)
Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.
2015-12-01
In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.
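The two sampling techniques named above can be sketched as follows. The distributions used here are assumptions for illustration only and are not the Smith and Duller model: the zenith angle is drawn by inverse transform from a density proportional to cos²θ·sinθ on 0-90°, and the energy is drawn by acceptance-rejection from a toy power-law spectrum on 1-100 GeV. The function names and the spectral index are hypothetical.

```python
import math
import random


def sample_zenith(rng):
    """Inverse transform sampling for a density proportional to cos^2(t) * sin(t).

    On [0, pi/2] the CDF is F(t) = 1 - cos^3(t), so t = acos((1 - u) ** (1/3)).
    """
    u = rng.random()
    return math.acos((1.0 - u) ** (1.0 / 3.0))


def toy_spectrum(energy_gev):
    """Toy unnormalised energy spectrum (assumed power law, illustration only)."""
    return energy_gev ** -2.7


def sample_energy(rng, e_min=1.0, e_max=100.0):
    """Acceptance-rejection with a uniform proposal on [e_min, e_max]."""
    f_max = toy_spectrum(e_min)  # the toy spectrum is decreasing, so its maximum is at e_min
    while True:
        candidate = rng.uniform(e_min, e_max)
        if rng.random() * f_max <= toy_spectrum(candidate):
            return candidate


def lookup_table(n, seed=0):
    """Build a table of (zenith angle in degrees, energy in GeV) pairs."""
    rng = random.Random(seed)
    return [(math.degrees(sample_zenith(rng)), sample_energy(rng)) for _ in range(n)]


if __name__ == "__main__":
    for angle, energy in lookup_table(5):
        print(f"{angle:6.2f} deg  {energy:7.2f} GeV")
```

A uniform proposal is simple but rejects most candidates for a steep spectrum; a proposal that follows the spectrum more closely, or a Metropolis-Hastings chain as mentioned in the abstract, would reduce the computation time.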
Testing the non-unity of rate ratio under inverse sampling.
Tang, Man-Lai; Liao, Yi Jie; Ng, Hong Keung Tony; Chan, Ping Shing
2007-08-01
Inverse sampling is considered to be a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio and conditional score statistics. Three methods (the asymptotic, conditional exact, and Mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, asymptotic score and conditional score tests are preferable because their actual type I error rates are well controlled around the pre-chosen nominal level, and their powers are comparatively the largest. The exact version of the Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is expected and fluctuation of sizes around the pre-chosen nominal level is allowed, then the Mid-P version of the Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study. (c) 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Influence of light curing and sample thickness on microhardness of a composite resin
Aguiar, Flávio HB; Andrade, Kelly RM; Leite Lima, Débora AN; Ambrosano, Gláucia MB; Lovadino, José R
2009-01-01
The aim of this in vitro study was to evaluate the influence of light-curing units and different sample thicknesses on the microhardness of a composite resin. Composite resin specimens were randomly prepared and assigned to nine experimental groups (n = 5): considering three light-curing units (conventional quartz tungsten halogen [QTH]: 550 mW/cm² – 20 s; high irradiance QTH: 1160 mW/cm² – 10 s; and light-emitting diode [LED]: 360 mW/cm² – 40 s) and three sample thicknesses (0.5 mm, 1 mm, and 2 mm). All samples were polymerized with the light tip 8 mm away from the specimen. Knoop microhardness was then measured on the top and bottom surfaces of each sample. The top surfaces, with some exceptions, were almost similar; however, in relation to the bottom surfaces, statistical differences were found between curing units and thicknesses. In all experimental groups, the 0.5-mm-thick increments showed microhardness values statistically higher than those observed for 1- and 2-mm increments. The conventional and LED units showed higher hardness mean values and were statistically different from the high irradiance unit. In all experimental groups, microhardness mean values obtained for the top surface were higher than those observed for the bottom surface. In conclusion, higher levels of irradiance or thinner increments would help improve hybrid composite resin polymerization. PMID:23674901
Preparing for the first meeting with a statistician.
De Muth, James E
2008-12-15
Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. 
When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.
2012-01-01
Background: When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods: An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Results: Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions: The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
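The equal-variance result can be checked numerically with a short simulation. The abstract states only that the c-statistic depends on the product of the standard deviation and the log-odds ratio; the explicit form used below, c = Φ(σβ/√2), is the standard binormal result (it follows from AUC = Φ((μ1 − μ0)/√(2σ²)) with log-odds ratio β = (μ1 − μ0)/σ²) and is an assumption here rather than a quotation from the paper. Function names and simulation settings are illustrative.

```python
import math
import random
from statistics import NormalDist


def predicted_c(sigma, log_odds_ratio):
    """Binormal, equal-variance prediction: c = Phi(sigma * beta / sqrt(2))."""
    return NormalDist().cdf(sigma * log_odds_ratio / math.sqrt(2.0))


def empirical_c(cases, controls):
    """Empirical c-statistic: share of case-control pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def simulate(sigma=1.0, log_odds_ratio=1.0, n=400, seed=0):
    """Draw cases and controls from two normals whose means differ by beta * sigma^2."""
    rng = random.Random(seed)
    shift = log_odds_ratio * sigma ** 2
    controls = [rng.gauss(0.0, sigma) for _ in range(n)]
    cases = [rng.gauss(shift, sigma) for _ in range(n)]
    return empirical_c(cases, controls), predicted_c(sigma, log_odds_ratio)


if __name__ == "__main__":
    print(simulate())
```

Changing `sigma` while holding `log_odds_ratio` fixed shows the point made in the conclusions: the same odds ratio corresponds to different discriminative ability depending on how heterogeneous the explanatory variable is.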
Précis of statistical significance: rationale, validity, and utility.
Chow, S L
1998-04-01
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
Code of Federal Regulations, 2011 CFR
2011-01-01
.... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...
Code of Federal Regulations, 2010 CFR
2010-01-01
.... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...