Chen, Feinian; Curran, Patrick J.; Bollen, Kenneth A.; Kirby, James; Paxton, Pamela
2009-01-01
This article is an empirical evaluation of the choice of fixed cutoff points in assessing the root mean square error of approximation (RMSEA) test statistic as a measure of goodness-of-fit in Structural Equation Models. Using simulation data, the authors first examine whether there is any empirical evidence for the use of a universal cutoff, and then compare the practice of using the point estimate of the RMSEA alone versus that of using it jointly with its related confidence interval. The results of the study demonstrate that there is little empirical support for the use of .05 or any other value as a universal cutoff to determine adequate model fit, regardless of whether the point estimate is used alone or jointly with the confidence interval. The authors' analyses suggest that to achieve a certain level of power or Type I error rate, the choice of cutoff values depends on model specifications, degrees of freedom, and sample size. PMID:19756246
An Investigation of the Sample Performance of Two Nonnormality Corrections for RMSEA
ERIC Educational Resources Information Center
Brosseau-Liard, Patricia E.; Savalei, Victoria; Li, Libo
2012-01-01
The root mean square error of approximation (RMSEA) is a popular fit index in structural equation modeling (SEM). Typically, RMSEA is computed using the normal theory maximum likelihood (ML) fit function. Under nonnormality, the uncorrected sample estimate of the ML RMSEA tends to be inflated. Two robust corrections to the sample ML RMSEA have…
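The sample RMSEA discussed above can be computed directly from a model chi-square statistic. A minimal sketch of the standard point-estimate formula (note this is an illustration, not the robust corrections the article studies; some software uses N rather than N − 1 in the denominator):

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a chi-square model-fit statistic.

    chi2: model chi-square test statistic (e.g., from ML estimation)
    df:   model degrees of freedom
    n:    sample size
    """
    # RMSEA = sqrt(max((chi2 - df) / (df * (n - 1)), 0))
    return math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))

# A model whose chi-square equals its degrees of freedom has RMSEA 0:
print(rmsea(30.0, 30, 500))            # 0.0
print(round(rmsea(75.0, 30, 500), 4))  # excess misfit spread over df and n
```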
ERIC Educational Resources Information Center
Huberty, Carl J.
An approach to statistical testing, which combines Neyman-Pearson hypothesis testing and Fisher significance testing, is recommended. The use of P-values in this approach is discussed in some detail. The author also discusses some problems which are often found in introductory statistics textbooks. The problems involve the definitions of…
Statistical Significance Testing.
ERIC Educational Resources Information Center
McLean, James E., Ed.; Kaufman, Alan S., Ed.
1998-01-01
The controversy about the use or misuse of statistical significance testing has become the major methodological issue in educational research. This special issue contains three articles that explore the controversy, three commentaries on these articles, an overall response, and three rejoinders by the first three authors. They are: (1)…
(Errors in statistical tests)³.
Phillips, Carl V; MacLehose, Richard F; Kaufman, Jay S
2008-07-14
In 2004, Garcia-Berthou and Alcaraz published "Incongruence between test statistics and P values in medical papers," a critique of statistical errors that received a tremendous amount of attention. One of their observations was that the final reported digit of p-values in articles published in the journal Nature departed substantially from the uniform distribution that they suggested should be expected. In 2006, Jeng critiqued that critique, observing that the statistical analysis of those terminal digits had been based on comparing the actual distribution to a uniform continuous distribution, when digits obviously are discretely distributed. Jeng corrected the calculation and reported statistics that did not so clearly support the claim of a digit preference. However delightful it may be to read a critique of statistical errors in a critique of statistical errors, we nevertheless found several aspects of the whole exchange to be quite troubling, prompting our own meta-critique of the analysis. The previous discussion emphasized statistical significance testing. But there are various reasons to expect departure from the uniform distribution in terminal digits of p-values, so that simply rejecting the null hypothesis is not terribly informative. Much more importantly, Jeng found that the original p-value of 0.043 should have been 0.086, and suggested this represented an important difference because it was on the other side of 0.05. Among the most widely reiterated (though often ignored) tenets of modern quantitative research methods is that we should not treat statistical significance as a bright line test of whether we have observed a phenomenon. Moreover, it sends the wrong message about the role of statistics to suggest that a result should be dismissed because of limited statistical precision when it is so easy to gather more data. In response to these limitations, we gathered more data to improve the statistical precision, and analyzed the actual pattern of the…
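The discrete-uniform re-analysis Jeng applied can be illustrated with a chi-square goodness-of-fit test on terminal-digit counts. The counts below are invented for illustration, not the digit frequencies from the actual papers:

```python
import numpy as np
from scipy import stats

# Hypothetical counts of terminal digits 0-9 harvested from reported p-values:
observed = np.array([14, 9, 11, 10, 8, 12, 7, 13, 9, 7])

# Chi-square goodness-of-fit against a uniform *discrete* distribution
# (scipy's default expected frequencies are uniform over the categories).
result = stats.chisquare(observed)
print(result.statistic, result.pvalue)  # small statistic: no digit preference
```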
Statistics and Hypothesis Testing in Biology.
ERIC Educational Resources Information Center
Maret, Timothy J.; Ziemba, Robert E.
1997-01-01
Suggests that early in their education students be taught to use basic statistical tests as rigorous methods of comparing experimental results with scientific hypotheses. Stresses that students learn how to use statistical tests in hypothesis-testing by applying them in actual hypothesis-testing situations. To illustrate, uses questions such as…
Statistical analysis of fatigue tests
NASA Astrophysics Data System (ADS)
Olsson, Karl Erik
1992-07-01
The ultimate aim of fatigue design, the minimization of the life cycle cost of the product, is discussed. The key is the load-strength model. Load and strength are described by distribution functions. Here the strength distribution is dealt with. Because of the 'weakest link theory' the three parameter Weibull distribution is the proper choice. With test results from a current project, the power of the Weibull analysis is demonstrated and some comments made about the traditional 'standard deviation of log cycles' approach. A Weibull transformation of rare occurrence, with the capability of separating different failure modes, is presented. Low stress level spectrum test results influenced by the 'fatigue limit' are easily separated. The difference between constant and variable amplitude test results is negligible. In the plane of the two Weibull parameters, shape and standardized location, it is possible to give a general view of component strength variation, from roller bearing life and fatigue strength of welds to yield strength.
Nonparametric Statistics Test Software Package.
1983-09-01
of elements less than or equal to the grand median; it is arranged according to aggregate variable indices. NB is the count of elements greater than… guarantee that the design is balanced, and insert a filler value in place of missing data values when entering the data set. Each test will… names on more than one line (card image) of the data or options file.
2009 GED Testing Program Statistical Report
ERIC Educational Resources Information Center
GED Testing Service, 2010
2010-01-01
The "2009 GED[R] Testing Program Statistical Report" is the 52nd annual report in the program's 68-year history of providing a second opportunity for adults without a high school credential to earn their jurisdiction's GED credential. The report provides candidate demographic and GED Test performance statistics as well as historical…
Quantum Statistical Testing of a QRNG Algorithm
Humble, Travis S; Pooser, Raphael C; Britt, Keith A
2013-01-01
We present the algorithmic design of a quantum random number generator, the subsequent synthesis of a physical design, and its verification using quantum statistical testing. We also describe how quantum statistical testing can be used to diagnose channel noise in quantum key distribution (QKD) protocols.
A statistical test to show negligible trend
Philip M. Dixon; Joseph H.K. Pechmann
2005-01-01
The usual statistical tests of trend are inappropriate for demonstrating the absence of trend. This is because failure to reject the null hypothesis of no trend does not prove that null hypothesis. The appropriate statistical method is based on an equivalence test. The null hypothesis is that the trend is not zero, i.e., outside an a priori specified equivalence region...
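The equivalence-test logic described in this abstract can be sketched as two one-sided tests (TOST) on a regression slope: declare the trend negligible only if both one-sided tests reject. The function name `tost_trend` and the equivalence margin `delta` are illustrative choices, not the authors' notation:

```python
import numpy as np
from scipy import stats

def tost_trend(x, y, delta):
    """Two one-sided tests (TOST) that a linear trend is negligible,
    i.e. that the true slope lies inside (-delta, +delta).

    Returns the TOST p-value: small values support a negligible trend."""
    res = stats.linregress(x, y)
    df = len(x) - 2
    t_lower = (res.slope + delta) / res.stderr  # H0: slope <= -delta
    t_upper = (res.slope - delta) / res.stderr  # H0: slope >= +delta
    p_lower = stats.t.sf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    return max(p_lower, p_upper)

rng = np.random.default_rng(0)
x = np.arange(50.0)
y = 10 + rng.normal(0, 1, 50)       # synthetic series with no real trend
p = tost_trend(x, y, delta=0.1)     # small p: trend is demonstrably negligible
print(p)
```

Note the reversed logic relative to an ordinary trend test: here rejection (small p) is the evidence *for* absence of a meaningful trend.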
The insignificance of statistical significance testing
Johnson, Douglas H.
1999-01-01
Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Applications of Statistical Tests in Hand Surgery
Song, Jae W.; Haas, Ann; Chung, Kevin C.
2015-01-01
During the nineteenth century, with the emergence of public health as a goal to improve hygiene and conditions of the poor, statistics established itself as a distinct scientific field important for critically interpreting studies of public health concerns. During the twentieth century, statistics began to evolve mathematically and methodologically with hypothesis testing and experimental design. Today, the design of medical experiments centers around clinical trials and observational studies, and with the use of statistics, the collected data are summarized, weighed, and presented to direct both physicians and the public towards Evidence-Based Medicine. Having a basic understanding of statistics is mandatory in evaluating the validity of published literature and applying it to patient care. In this review, we aim to apply a practical approach in discussing basic statistical tests by providing a guide to choosing the correct statistical test along with examples relevant to hand surgery research. PMID:19969193
Dealing with Assumptions Underlying Statistical Tests
ERIC Educational Resources Information Center
Wells, Craig S.; Hintze, John M.
2007-01-01
The validity of a hypothesis test is partly determined by whether the assumptions underlying the test are satisfied. In the past, a preliminary analysis of the data has been suggested prior to the use of the statistical test. In this article, the authors describe several limitations of preliminary tests (e.g., influence on significance levels).…
Teaching Statistics in Language Testing Courses
ERIC Educational Resources Information Center
Brown, James Dean
2013-01-01
The purpose of this article is to examine the literature on teaching statistics for useful ideas that teachers of language testing courses can draw on and incorporate into their teaching toolkits as they see fit. To those ends, the article addresses eight questions: What is known generally about teaching statistics? Why are students so anxious…
Statistics Test Questions: Content and Trends
ERIC Educational Resources Information Center
Salcedo, Audy
2014-01-01
This study presents the results of the analysis of a group of teacher-made test questions for statistics courses at the university level. Teachers were asked to submit tests they had used in their previous two semesters. Ninety-seven tests containing 978 questions were gathered and classified according to the SOLO taxonomy (Biggs & Collis,…
Binomial test statistics using Psi functions
Bowman, Kimiko o
2007-01-01
For the negative binomial model (probability generating function (p + 1 − pt)^(−k)) a logarithmic derivative is the Psi function difference ψ(k + x) − ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a data base so there exists a comparison available between theory and application. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
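For integer counts the digamma difference above reduces to a finite sum via the recurrence ψ(k + 1) = ψ(k) + 1/k, which makes it cheap to compute and easy to check numerically. A small sketch using SciPy (the example values of k and x are arbitrary):

```python
from scipy.special import psi  # digamma function

def psi_diff(k: float, x: int) -> float:
    """psi(k + x) - psi(k), the digamma difference appearing in the
    negative binomial score-type statistic."""
    return psi(k + x) - psi(k)

# For integer x the digamma recurrence gives an exact finite sum:
#   psi(k + x) - psi(k) == sum_{j=0}^{x-1} 1 / (k + j)
k, x = 2.5, 4
print(psi_diff(k, x))
print(sum(1.0 / (k + j) for j in range(x)))  # same value
```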
Graphic presentation of the simplest statistical tests
NASA Astrophysics Data System (ADS)
Georgiev, Tsvetan B.
This paper graphically presents well-known tests for a change in the population mean and standard deviation, for the comparison of population means and standard deviations, and for the significance of correlation and regression coefficients. The critical bounds and criteria for variability with statistical guarantee P = 95% and P = 99% are presented as functions of the number of data points n. The graphs also give fast visual solutions of the direct problem (estimating the confidence interval for specified P and n) as well as of the reverse problem (estimating the n necessary to achieve a desired statistical guarantee for the result). The aim of the work is to present the simplest statistical tests in comprehensible and convenient graphs that are always at hand. The graphs may be useful in investigations of time series in astronomy, geophysics, ecology, etc., as well as in education.
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discusses several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion of the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
2006 GED Testing Program Statistical Report
ERIC Educational Resources Information Center
GED Testing Service, 2007
2007-01-01
The 2006 GED[R] Testing Program Statistical Report is the 49th annual report in the program's 65-year history of providing a second opportunity to adults without a high school diploma to earn their jurisdiction's General Educational Development (GED) credential, and, as a result, advance their educational, personal, and professional aspirations.…
2007 GED Testing Program Statistical Report
ERIC Educational Resources Information Center
GED Testing Service, 2008
2008-01-01
The "2007 GED[R] Testing Program Statistical Report" is the 50th annual report in the program's 66-year history of providing a second opportunity for adults without a high school diploma to earn their jurisdiction's GED credential, and, as a result, advance their educational, personal, and professional aspirations. Section I, "Who…
Tests of Statistical Significance Made Sound
ERIC Educational Resources Information Center
Haig, Brian D.
2017-01-01
This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis…
Statistical tests for recessive lethal-carriers.
Hamilton, M A; Haseman, J K
1979-08-01
This paper presents a statistical method for testing whether a male mouse is a recessive lethal-carrier. The analysis is based on a back-cross experiment in which the male mouse is mated with some of his daughters. The numbers of total implantations and intrauterine deaths in each litter are recorded. It is assumed that, conditional on the number of total implantations, the number of intrauterine deaths follows a binomial distribution. Using computer-simulated experimentation it is shown that the proposed statistical method, which is sensitive to the pattern of intrauterine death rates, is more powerful than a test based only on the total number of implant deaths. The proposed test requires relatively simple calculations and can be used for a wide range of values of total implantations and background implant mortality rates. For computer-simulated experiments, there was no practical difference between the empirical error rate and the nominal error rate.
Mechanical Impact Testing: A Statistical Measurement
NASA Technical Reports Server (NTRS)
Engel, Carl D.; Herald, Stephen D.; Davis, S. Eddie
2005-01-01
In the decades since the 1950s, when NASA first developed mechanical impact testing of materials, researchers have continued efforts to gain a better understanding of the chemical, mechanical, and thermodynamic nature of the phenomenon. The impact mechanism is a real combustion ignition mechanism that must be understood in the design of an oxygen system. The use of data from this test method has been questioned due to the lack of a clear method for applying the data and the variability found between tests, material batches, and facilities. This effort explores a large database that has accumulated over a number of years and characterizes its overall nature. Moreover, testing was performed to determine the statistical nature of the test procedure and to help establish sample-size guidelines for material characterization. The current method of determining a pass/fail criterion, based on light emission, sound report, or material charring, is questioned.
Assumptions of Statistical Tests: What Lies Beneath.
Jupiter, Daniel C
We have discussed many statistical tests and tools in this series of commentaries, and while we have mentioned the underlying assumptions of the tests, we have not explored them in detail. We stop to look at some of the assumptions of the t-test and linear regression, justify and explain them, mention what can go wrong when the assumptions are not met, and suggest some solutions in this case. Copyright © 2017 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Statistical tests for prediction of lignite quality
C.J. Kolovos
2007-06-15
Domestic lignite from large open pit mines worked by bucket wheel excavators is the main fuel for electricity generation in Greece. Lignite from one or more mines may arrive at any power plant stockyard, and the resulting mixture constitutes the lignite fuel fed to the power plant. The fuel is sampled at regular time intervals, and these samples are treated as observations of spatial random variables. The aim was to form and statistically test many small sample populations. Statistical tests on the values of the humidity content, the ash-water free content, and the lower heating value of the lignite fuel indicated that the sample values form a normal population. The Kolmogorov-Smirnov goodness-of-fit test was applied to the sample distributions over a three-year period and different power plants of the Kozani-Ptolemais area, western Macedonia, Greece. The normal distribution hypothesis can be widely accepted for forecasting the distribution of values of the basic quality characteristics, even for a small number of samples.
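A hedged sketch of the kind of Kolmogorov-Smirnov normality check the abstract describes. The data are synthetic stand-ins for fuel-quality measurements; the numbers are illustrative, not the paper's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Stand-in for a small sample of lignite quality measurements
# (e.g. lower heating value in MJ/kg):
sample = rng.normal(loc=5.6, scale=0.4, size=40)

# Kolmogorov-Smirnov goodness-of-fit test against a normal distribution.
# Caveat: estimating the mean and sd from the same sample makes the classic
# KS p-value conservative (the Lilliefors correction addresses this).
stat, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(stat, p)  # large p: no evidence against normality
```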
Kepler Planet Detection Metrics: Statistical Bootstrap Test
NASA Technical Reports Server (NTRS)
Jenkins, Jon M.; Burke, Christopher J.
2016-01-01
This document describes the data produced by the Statistical Bootstrap Test over the final three Threshold Crossing Event (TCE) deliveries to NExScI: SOC 9.1 (Q1-Q16) (Tenenbaum et al. 2014), SOC 9.2 (Q1-Q17), aka DR24 (Seader et al. 2015), and SOC 9.3 (Q1-Q17), aka DR25 (Twicken et al. 2016). The last few years have seen significant improvements in the SOC science data processing pipeline, leading to higher quality light curves and more sensitive transit searches. The statistical bootstrap analysis results presented here and the numerical results archived at NASA's Exoplanet Science Institute (NExScI) bear witness to these software improvements. This document attempts to introduce and describe the main features and differences between these three data sets as a consequence of the software changes.
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
A Statistical Perspective on Highly Accelerated Testing
Thomas, Edward V.
2015-02-01
Highly accelerated life testing has been heavily promoted at Sandia (and elsewhere) as a means to rapidly identify product weaknesses caused by flaws in the product's design or manufacturing process. During product development, a small number of units are forced to fail at high stress. The failed units are then examined to determine the root causes of failure. The identification of the root causes of product failures exposed by highly accelerated life testing can instigate changes to the product's design and/or manufacturing process that result in a product with increased reliability. It is widely viewed that this qualitative use of highly accelerated life testing (often associated with the acronym HALT) can be useful. However, highly accelerated life testing has also been proposed as a quantitative means for "demonstrating" the reliability of a product where unreliability is associated with loss of margin via an identified and dominating failure mechanism. It is assumed that the dominant failure mechanism can be accelerated by changing the level of a stress factor that is assumed to be related to the dominant failure mode. In extreme cases, a minimal number of units (often from a pre-production lot) are subjected to a single highly accelerated stress relative to normal use. If no (or, sufficiently few) units fail at this high stress level, some might claim that a certain level of reliability has been demonstrated (relative to normal use conditions). Underlying this claim are assumptions regarding the level of knowledge associated with the relationship between the stress level and the probability of failure. The primary purpose of this document is to discuss (from a statistical perspective) the efficacy of using accelerated life testing protocols (and, in particular, "highly accelerated" protocols) to make quantitative inferences concerning the performance of a product (e.g., reliability) when in fact there is lack-of-knowledge and uncertainty concerning the…
Statistical reasoning in clinical trials: hypothesis testing.
Kelen, G D; Brown, C G; Ashton, J
1988-01-01
Hypothesis testing is based on certain statistical and mathematical principles that allow investigators to evaluate data by making decisions based on the probability or implausibility of observing the results obtained. However, classic hypothesis testing has its limitations, and probabilities mathematically calculated are inextricably linked to sample size. Furthermore, the meaning of the p value frequently is misconstrued as indicating that the findings are also of clinical significance. Finally, hypothesis testing allows for four possible outcomes, two of which are errors that can lead to erroneous adoption of certain hypotheses: 1. The null hypothesis is rejected when, in fact, it is false. 2. The null hypothesis is rejected when, in fact, it is true (type I or alpha error). 3. The null hypothesis is conceded when, in fact, it is true. 4. The null hypothesis is conceded when, in fact, it is false (type II or beta error). The implications of these errors, their relation to sample size, the interpretation of negative trials, and strategies related to the planning of clinical trials will be explored in a future article in this journal.
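The four outcomes above, and the link between error rates and sample size, can be made concrete with a small Monte Carlo sketch. The sample sizes, effect size, and repetition count below are illustrative choices, not from the article:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 30, 2000, 0.05

# Type I error: both groups share the same mean; count false rejections.
null_rej = 0
for _ in range(reps):
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        null_rej += 1

# Power (1 - type II error): a real difference of 0.8 SD between groups.
alt_rej = 0
for _ in range(reps):
    a = rng.normal(0, 1, n)
    b = rng.normal(0.8, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        alt_rej += 1

print("type I error rate ~", null_rej / reps)   # close to alpha = 0.05
print("power             ~", alt_rej / reps)    # roughly 0.87 at this n and effect
```

Rerunning with smaller n shows the power dropping while the type I rate stays near alpha, which is exactly the sample-size dependence the abstract warns about.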
Explorations in Statistics: Hypothesis Tests and P Values
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of "Explorations in Statistics" delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what…
Sanabria, Federico; Killeen, Peter R.
2008-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose p_rep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766
Statistical Tests of Galactic Dynamo Theory
NASA Astrophysics Data System (ADS)
Chamandy, Luke; Shukurov, Anvar; Taylor, A. Russ
2016-12-01
Mean-field galactic dynamo theory is the leading theory to explain the prevalence of regular magnetic fields in spiral galaxies, but its systematic comparison with observations is still incomplete and fragmentary. Here we compare predictions of mean-field dynamo models to observational data on magnetic pitch angle and the strength of the mean magnetic field. We demonstrate that a standard α²Ω dynamo model produces pitch angles of the regular magnetic fields of nearby galaxies that are reasonably consistent with available data. The dynamo estimates of the magnetic field strength are generally within a factor of a few of the observational values. Reasonable agreement between theoretical and observed pitch angles generally requires the turbulent correlation time τ to be in the range 10-20 Myr, in agreement with standard estimates. Moreover, good agreement also requires that the ratio of the ionized gas scale height to root-mean-square turbulent velocity increases with radius. Our results thus widen the possibilities to constrain interstellar medium parameters using observations of magnetic fields. This work is a step toward systematic statistical tests of galactic dynamo theory. Such studies are becoming more and more feasible as larger data sets are acquired using current and up-and-coming instruments.
Statistical tests of simple earthquake cycle models
NASA Astrophysics Data System (ADS)
DeVries, Phoebe M. R.; Evans, Eileen L.
2016-12-01
A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities ηM < 4.0 × 10^19 Pa s and ηM > 4.6 × 10^20 Pa s) but cannot reject models on the basis of transient Kelvin viscosity ηK. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
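A minimal sketch of the two-sample Kolmogorov-Smirnov comparison the abstract relies on, with synthetic stand-in samples rather than the authors' fault observations (the distributions and sizes are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical observed vs. model-predicted slip-rate-like quantities
# for a set of 15 faults (lognormal is an arbitrary illustrative choice):
observed = rng.lognormal(mean=0.0, sigma=0.3, size=15)
predicted = rng.lognormal(mean=1.0, sigma=0.3, size=15)

# Two-sample Kolmogorov-Smirnov test: are the two samples consistent with
# having been drawn from the same distribution?  A model whose predictions
# yield a small p-value is rejected at that significance level.
stat, p = stats.ks_2samp(observed, predicted)
print(stat, p)  # large KS distance, tiny p: reject this "model"
```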
Controversies around the Role of Statistical Tests in Experimental Research.
ERIC Educational Resources Information Center
Batanero, Carmen
2000-01-01
Describes the logic of statistical testing in the Fisher and Neyman-Pearson approaches. Reviews some common misinterpretations of basic concepts behind statistical tests. Analyzes the philosophical and psychological issues that can contribute to these misinterpretations. Suggests possible ways in which statistical education might contribute to the…
Statistical significance testing and clinical trials.
Krause, Merton S
2011-09-01
The efficacy of treatments is better expressed for clinical purposes in terms of these treatments' outcome distributions and their overlapping rather than in terms of the statistical significance of these distributions' mean differences, because clinical practice is primarily concerned with the outcome of each individual client rather than with the mean of the variety of outcomes in any group of clients. Reports of the obtained outcome distributions for the comparison groups of all competently designed and executed randomized clinical trials should be publicly available no matter what the statistical significance of the mean differences among these groups, because all of these studies' outcome distributions provide clinically useful information about the efficacy of the treatments compared.
ERIC Educational Resources Information Center
Mittag, Kathleen C.; Thompson, Bruce
2000-01-01
Surveyed AERA members regarding their perceptions of: statistical issues and statistical significance testing; the general linear model; stepwise methods; score reliability; type I and II errors; sample size; statistical probabilities as exclusive measures of effect size; p values as direct measures of result value; and p values evaluating…
Mantel-Haenszel test statistics for correlated binary data.
Zhang, J; Boos, D D
1997-12-01
This paper proposes two new Mantel-Haenszel test statistics for correlated binary data in 2 x 2 tables that are asymptotically valid in both sparse data (many strata) and large-strata limiting models. Monte Carlo experiments show that the statistics compare favorably to previously proposed test statistics, especially for 5-25 small to moderate-sized strata. Confidence intervals are also obtained and compared to those from the test of Liang (1985, Biometrika 72, 678-682).
Assessing Statistical Aspects of Test Fairness with Structural Equation Modelling
ERIC Educational Resources Information Center
Kline, Rex B.
2013-01-01
Test fairness and test bias are not synonymous concepts. Test bias refers to statistical evidence that the psychometrics or interpretation of test scores depend on group membership, such as gender or race, when such differences are not expected. A test that is grossly biased may be judged to be unfair, but test fairness concerns the broader, more…
An overall statistic for testing symmetry in social interactions.
Leiva, David; Solanas, Antonio; Salafranca, Lluís
2008-11-01
The present work focuses on the skew-symmetry index as a measure of social reciprocity. This index is based on the correspondence between the amount of behaviour that individuals address toward their partners and what they receive in return. Although the skew-symmetry index enables researchers to describe social groups, statistical inferential tests are required. This study proposes an overall statistical technique for testing symmetry in experimental conditions, calculating the skew-symmetry statistic (Phi) at group level. Sampling distributions for the skew-symmetry statistic were estimated by means of a Monte Carlo simulation to allow researchers to make statistical decisions. Furthermore, this study will allow researchers to choose the optimal experimental conditions for carrying out their research, as the power of the statistical test was estimated. This statistical test could be used in experimental social psychology studies in which researchers may control the group size and the number of interactions within dyads.
Statistical Analysis of Multiple Choice Testing
2001-04-01
the question to help determine poor distractors (incorrect answers). However, Attali and Fraenkel show that while it is sound to use the Rpbis...heavily on question difficulty. Attali and Fraenkel say that the Biserial is usually preferred as a criterion measure for the correct alternative...pubs/mcq/scpre.html, p.6. Renckly, Thomas R. Test Analysis & Development System (TAD) version 5.49. CD-ROM (1990-2000). Attali, Yigal
A Note on Measurement Scales and Statistical Testing
ERIC Educational Resources Information Center
Meijer, Rob R.; Oosterloo, Sebie J.
2008-01-01
In elementary books on applied statistics (e.g., Siegel, 1988; Agresti, 1990) and books on research methodology in psychology and personality assessment (e.g., Aiken, 1999), it is often suggested that the choice of a statistical test and the choice of statistical operations should be determined by the level of measurement of the data. Although…
Multiple comparisons and nonparametric statistical tests on a programmable calculator.
Hurwitz, A
1987-03-01
Calculator programs are provided for statistical tests for comparing groups of data. These tests can be applied when t-tests are inappropriate, as for multiple comparisons, or for evaluating groups of data that are not distributed normally or have unequal variances. The programs, designed to run on the least expensive Hewlett-Packard programmable scientific calculator, Model HP-11C, should place these statistical tests within easy reach of most students and investigators.
Explorations in statistics: hypothesis tests and P values.
Curran-Everett, Douglas
2009-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
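The definition of a P value quoted in this abstract (the proportion of possible test-statistic values at least as extreme as the observed one, if the null hypothesis is true) can be illustrated with a small simulation. The coin-flip setup and all numbers below are illustrative, not drawn from the article:

```python
import random

random.seed(1)

# Illustrative experiment: 16 heads observed in 20 flips.
# Null hypothesis: the coin is fair (p = 0.5).
observed, n_flips = 16, 20

# Simulate the null distribution of the test statistic (number of heads)
# and count how often it is at least as extreme as the observed value.
trials = 100_000
extreme = sum(
    sum(random.random() < 0.5 for _ in range(n_flips)) >= observed
    for _ in range(trials)
)
p_value = extreme / trials
print(p_value)  # close to the exact one-sided binomial value, about 0.0059
```

The simulated proportion approximates the exact binomial tail probability P(X ≥ 16), which is what the abstract's question ("what proportion of possible values of the test statistic are at least as extreme as the one I got?") asks for.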
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance
ERIC Educational Resources Information Center
Gwet, Kilem L.
2016-01-01
This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…
Statistical properties of the USP dissolution test with pooled samples.
Saccone, Carlos D; Meneces, Nora S; Tessore, Julio
2005-01-01
The Monte Carlo simulation method is used to study the statistical properties of the USP pooled dissolution test. In this paper, the statistical behavior of the dissolution test for pooled samples is studied, including: a) the operating characteristic curve showing the probability of passing the test versus the mean amount dissolved, b) the influence of measurement uncertainty on the result of the test, c) an analysis of the dependence of the statistical behavior on the underlying distribution of the individual amounts dissolved, d) a comparison of the statistical behavior of the unit dissolution test versus the pooled dissolution test, e) the average number of stages needed to reach a decision presented as a function of parameters of the lot, f) the relative influence of the three stages of the test on the probability of acceptance.
Nonparametric statistical testing of EEG- and MEG-data.
Maris, Eric; Oostenveld, Robert
2007-08-15
In this paper, we show how ElectroEncephaloGraphic (EEG) and MagnetoEncephaloGraphic (MEG) data can be analyzed statistically using nonparametric techniques. Nonparametric statistical tests offer complete freedom to the user with respect to the test statistic by means of which the experimental conditions are compared. This freedom provides a straightforward way to solve the multiple comparisons problem (MCP), and it allows the user to incorporate biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the statistical test. The paper is written for two audiences: (1) empirical neuroscientists looking for the most appropriate data analysis method, and (2) methodologists interested in the theoretical concepts behind nonparametric statistical tests. For the empirical neuroscientist, a large part of the paper is written in a tutorial-like fashion, enabling neuroscientists to construct their own statistical test, maximizing the sensitivity to the expected effect. And for the methodologist, it is explained why the nonparametric test is formally correct. This means that we formulate a null hypothesis (identical probability distribution in the different experimental conditions) and show that the nonparametric test controls the false alarm rate under this null hypothesis.
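The nonparametric logic this abstract describes (posit identical probability distributions across conditions, then test by exchanging condition labels) can be sketched as a minimal permutation test. The two small data vectors are invented for illustration:

```python
import random

random.seed(2)

# Invented measurements for two experimental conditions.
a = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2]
b = [1.6, 1.8, 2.0, 1.5, 1.9, 1.7]

# Test statistic: difference of condition means.
observed = sum(a) / len(a) - sum(b) / len(b)

# Under the null hypothesis the condition labels are exchangeable, so the
# null distribution is built by reshuffling labels and recomputing.
pooled = a + b
n_perm = 10_000
hits = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = sum(pooled[: len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
    if diff >= observed:
        hits += 1
p_value = hits / n_perm  # one-sided Monte Carlo P value
print(p_value)
```

Because the reshuffling itself generates the reference distribution, no distributional assumption beyond exchangeability is needed; this is the property that lets the cluster-based tests in the paper use arbitrary, biophysically motivated test statistics.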
ERIC Educational Resources Information Center
Sanabria, Federico; Killeen, Peter R.
2007-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level "p," is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners…
Misuse of statistical tests in Archives of Clinical Neuropsychology publications.
Schatz, Philip; Jay, Kristin A; McComb, Jason; McLaughlin, Jason R
2005-12-01
This article reviews the (mis)use of statistical tests in neuropsychology research studies published in the Archives of Clinical Neuropsychology in the years 1990-1992 and 1996-2000, and 2001-2004, prior to, commensurate with the internet-based and paper-based release, and following the release of the American Psychological Association's Task Force on Statistical Inference. The authors focused on four statistical errors: inappropriate use of null hypothesis tests, inappropriate use of P-values, neglect of effect size, and inflation of Type I error rates. Despite the recommendations of the Task Force on Statistical Inference published in 1999, the present study recorded instances of these statistical errors both pre- and post-APA's report, with only the reporting of effect size increasing after the release of the report. Neuropsychologists involved in empirical research should be better aware of the limitations and boundaries of hypothesis testing as well as the theoretical aspects of research methodology.
The Use of Meta-Analytic Statistical Significance Testing
ERIC Educational Resources Information Center
Polanin, Joshua R.; Pigott, Terri D.
2015-01-01
Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…
The Importance of Teaching Power in Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Olinsky, Alan; Schumacher, Phyllis; Quinn, John
2012-01-01
In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…
Advances in Testing the Statistical Significance of Mediation Effects
ERIC Educational Resources Information Center
Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.
2006-01-01
P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…
Chi-Square Statistics, Tests of Hypothesis and Technology.
ERIC Educational Resources Information Center
Rochowicz, John A.
The use of technology such as computers and programmable calculators enables students to find p-values and conduct tests of hypotheses in many different ways. Comprehension and interpretation of a research problem become the focus for statistical analysis. This paper describes how to calculate chi-square statistics and p-values for statistical…
A Statistical Test for Comparing Nonnested Covariance Structure Models.
ERIC Educational Resources Information Center
Levy, Roy; Hancock, Gregory R.
While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…
BIAZA statistics guidelines: toward a common application of statistical tests for zoo research.
Plowman, Amy B
2008-05-01
Zoo research presents many statistical challenges, mostly arising from the need to work with small sample sizes. Efforts to overcome these often lead to the misuse of statistics including pseudoreplication, inappropriate pooling, assumption violation or excessive Type II errors because of using tests with low power to avoid assumption violation. To tackle these issues and make some general statistical recommendations for zoo researchers, the Research Group of the British and Irish Association of Zoos and Aquariums (BIAZA) conducted a workshop. Participants included zoo-based researchers, university academics with zoo interests and three statistical experts. The result was a BIAZA publication Zoo Research Guidelines: Statistics for Typical Zoo Datasets (Plowman [2006] Zoo research guidelines: statistics for zoo datasets. London: BIAZA), which provides advice for zoo researchers on study design and analysis to ensure appropriate and rigorous use of statistics. The main recommendations are: (1) that many typical zoo investigations should be conducted as single case/small N randomized designs, analyzed with randomization tests, (2) that when comparing complete time budgets across conditions in behavioral studies, G tests and their derivatives are the most appropriate statistical tests and (3) that in studies involving multiple dependent and independent variables there are usually no satisfactory alternatives to traditional parametric tests and, despite some assumption violations, it is better to use these tests with careful interpretation, than to lose information through not testing at all. The BIAZA guidelines were recommended by American Association of Zoos and Aquariums (AZA) researchers at the AZA Annual Conference in Tampa, FL, September 2006, and are free to download from www.biaza.org.uk. Zoo Biol 27:226-233, 2008. (c) 2008 Wiley-Liss, Inc.
Bootstrapping Selected Item Statistics from a Student-Made Test.
ERIC Educational Resources Information Center
Burroughs, Monte
This study applied nonparametric bootstrapping to test null hypotheses for selected statistics (KR-20, difficulty, and discrimination) derived from a student-made test. The test, administered to 21 students enrolled in a graduate-level educational assessment class, contained 42 items, 33 of which were analyzed. Random permutations of the data…
Common pitfalls in statistical analysis: The perils of multiple testing
Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc
2016-01-01
Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478
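The amplification of false-positive probability described here, and the simplest correction for it, reduce to a line of arithmetic (α and the number of tests below are illustrative):

```python
# With m independent tests each run at level alpha, the probability of at
# least one false positive is 1 - (1 - alpha)**m, not alpha.
alpha, m = 0.05, 20
familywise = 1 - (1 - alpha) ** m
print(round(familywise, 3))  # 0.642

# Bonferroni correction: run each test at alpha / m, which keeps the
# family-wise error rate at or below the nominal alpha.
bonferroni = 1 - (1 - alpha / m) ** m
print(round(bonferroni, 3))  # 0.049
```

Twenty uncorrected tests thus carry nearly a two-in-three chance of at least one spurious "significant" finding, which is the peril the article addresses.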
Statistical Significance Testing Should Be Discontinued in Mathematics Education Research.
ERIC Educational Resources Information Center
Menon, Rama
1993-01-01
Discusses five common myths about statistical significance testing (SST), the possible erroneous and harmful contributions of SST to educational research, and suggested alternatives to SST for mathematics education research. (Contains 61 references.) (MKR)
Evaluation of Multi-parameter Test Statistics for Multiple Imputation.
Liu, Yu; Enders, Craig K
2017-01-01
In ordinary least squares (OLS) regression, researchers often are interested in knowing whether a set of parameters is different from zero. With complete data, this could be achieved using the gain in prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-the-art missing data strategies, there are several different analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type 1 error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples, etc.). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1998-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1999-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1997-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Multiple statistical tests: Lessons from a d20
Madan, Christopher R.
2016-01-01
Statistical analyses are often conducted with α = .05. When multiple statistical tests are conducted, this procedure needs to be adjusted to compensate for the otherwise inflated Type I error. In tabletop gaming, it is sometimes desirable to roll a 20-sided die (or 'd20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (a Type I error) at least once across repeated, independent statistical tests. PMID:27347382
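The analogy in this abstract, assuming independent rolls and independent tests, comes down to the same one-line formula (a hypothetical worked check, not taken from the article):

```python
# Chance of any specific face on a fair d20.
p = 1 / 20  # 0.05, conveniently the conventional alpha

# Roll twice, keep the greater: a specific outcome (say, a natural 20)
# appears at least once with probability 1 - (19/20)**2.
p_two_rolls = 1 - (1 - p) ** 2
print(round(p_two_rolls, 4))  # 0.0975

# Same formula for m independent tests at alpha = .05: the chance of at
# least one Type I error grows quickly with m.
for m in (5, 14):
    print(m, round(1 - (1 - p) ** m, 3))
```

At five tests the at-least-one-error probability is already about 0.226, and by fourteen tests it exceeds one half.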
Rabinowitz, Daniel
2003-05-01
The focus of this work is the TDT-type and family-based test statistics used for adjusting for potential confounding due to population heterogeneity or misspecified allele frequencies. A variety of heuristics have been used to motivate and derive these statistics, and the statistics have been developed for a variety of analytic goals. There appears to be no general theoretical framework, however, that may be used to evaluate competing approaches. Furthermore, there is no framework to guide the development of efficient TDT-type and family-based methods for analytic goals for which methods have not yet been proposed. The purpose of this paper is to present a theoretical framework that serves both to identify the information which is available to methods that are immune to confounding due to population heterogeneity or misspecified allele frequencies, and to inform the construction of efficient unbiased tests in novel settings. The development relies on the existence of a characterization of the null hypothesis in terms of a completely specified conditional distribution of transmitted genotypes. An important observation is that, with such a characterization, when the conditioning event is unobserved or incomplete, there is statistical information that cannot be exploited by any exact conditional test. The main technical result of this work is an approach to computing test statistics for local alternatives that exploit all of the available statistical information. Copyright 2003 Wiley-Liss, Inc.
Statistical Measures of Integrity in Online Testing: Empirical Study
ERIC Educational Resources Information Center
Wielicki, Tom
2016-01-01
This paper reports on longitudinal study regarding integrity of testing in an online format as used by e-learning platforms. Specifically, this study explains whether online testing, which implies an open book format is compromising integrity of assessment by encouraging cheating among students. Statistical experiment designed for this study…
Your Chi-Square Test Is Statistically Significant: Now What?
ERIC Educational Resources Information Center
Sharpe, Donald
2015-01-01
Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
Statistical significance test for transition matrices of atmospheric Markov chains
NASA Technical Reports Server (NTRS)
Vautard, Robert; Mo, Kingtse C.; Ghil, Michael
1990-01-01
Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
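A minimal sketch of the Monte Carlo approach this abstract describes, using an invented regime sequence (the labels and counts carry no meteorological meaning):

```python
import random

random.seed(3)

# Invented sequence of flow-regime labels standing in for daily maps.
seq = [random.choice("ABC") for _ in range(500)]

def transitions(s, a, b):
    """Count a -> b transitions in a label sequence."""
    return sum(1 for x, y in zip(s, s[1:]) if x == a and y == b)

observed = transitions(seq, "A", "B")

# Monte Carlo null: shuffling the sequence preserves regime frequencies
# but destroys temporal structure; recount transitions each time.
null_counts = []
for _ in range(2000):
    shuffled = seq[:]
    random.shuffle(shuffled)
    null_counts.append(transitions(shuffled, "A", "B"))

# One-sided Monte Carlo P value: how often the null produces at least as
# many A -> B transitions as observed.
p_value = sum(c >= observed for c in null_counts) / len(null_counts)
print(p_value)
```

As the abstract notes, this shuffling-based reference distribution remains valid even when some regime clusters are small, where closed-form approximations for transition-matrix elements break down.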
Revisit the 21-day cumulative irritation test - statistical considerations.
Zhang, Paul; Li, Qing
2017-03-01
The 21-day cumulative irritation test is widely used for evaluating the irritation potential of topical skin-care products. This test consists of clinician's assessment of skin reaction of the patch sites and a classification system to categorize the test product's irritation potential. A new classification system is proposed which enables us to control the estimation error and provides a statistical confidence with regard to the repeatability of the classification.
Statistical Approach to the Operational Testing of Space Fence
2015-07-01
size of satellite tracks needed to estimate metric accuracy for individual TEARR parameters, which we calculate via chi-square hypothesis tests on...decisions. Index Terms: Analysis of variance, Operational testing, Least squares methods, Phased array radar, Satellite tracking, SATCAT, Space...Institute for Defense Analyses. Statistical Approach to the Operational Testing of Space Fence. Daniel L. Pechkis, Nelson S
[Clinical research IV. Relevancy of the statistical test chosen].
Talavera, Juan O; Rivas-Ruiz, Rodolfo
2011-01-01
When we look at the difference between two therapies or the association of a risk factor or prognostic indicator with its outcome, we need to evaluate the accuracy of the result. This assessment is based on a judgment that uses information about the study design and statistical management of the information. This paper specifically mentions the relevance of the statistical test selected. Statistical tests are chosen mainly from two characteristics: the objective of the study and the type of variables. The objective can be divided into three groups of tests: a) those in which you want to show differences between groups or within a group before and after a maneuver, b) those that seek to show the relationship (correlation) between variables, and c) those that aim to predict an outcome. The types of variables are divided into two: quantitative (continuous and discontinuous) and qualitative (ordinal and dichotomous). For example, if we seek to demonstrate differences in age (quantitative variable) among patients with systemic lupus erythematosus (SLE) with and without neurological disease (two groups), the appropriate test is the "Student t test for independent samples." But if the comparison concerns the frequency of females (binomial variable), then the appropriate statistical test is the χ² test.
Distributions of Hardy-Weinberg equilibrium test statistics.
Rohlfs, R V; Weir, B S
2008-11-01
It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy-Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy-Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case-control association studies and Hardy-Weinberg equilibrium (HWE) testing for data quality control.
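The exact HWE test discussed above can be implemented directly by enumerating the discrete conditional distribution of heterozygote counts given the allele counts. The sketch below is an illustration of that standard conditional-enumeration approach (the function name and interface are ours, not code from the paper; exact factorials keep it accurate but limit it to moderate sample sizes).

```python
from math import factorial

def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test: sum the probabilities of all heterozygote
    counts no more likely than the observed one, conditional on allele counts.
    Suitable for moderate n; factorials grow quickly."""
    n = n_aa + n_ab + n_bb          # genotyped individuals
    n_a = 2 * n_aa + n_ab           # copies of allele a among 2n alleles

    def prob(h):                    # P(h heterozygotes | n, n_a) under HWE
        haa = (n_a - h) // 2
        hbb = n - h - haa
        multinom = factorial(n) // (factorial(haa) * factorial(h) * factorial(hbb))
        return multinom * 2 ** h * factorial(n_a) * factorial(2 * n - n_a) / factorial(2 * n)

    # heterozygote counts with the same parity as n_a are the only valid ones
    hs = range(n_a % 2, min(n_a, 2 * n - n_a) + 1, 2)
    probs = {h: prob(h) for h in hs}
    p_obs = probs[n_ab]
    return sum(p for p in probs.values() if p <= p_obs + 1e-12)
```

Because the support is discrete, the resulting p-values take only a finite set of values, which is exactly the phenomenon the abstract warns continuous approximations can mishandle.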
ROTS: An R package for reproducibility-optimized statistical testing.
Suomi, Tomi; Seyednasrollah, Fatemeh; Jaakkola, Maria K; Faux, Thomas; Elo, Laura L
2017-05-01
Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
Shukla, R.; Yu Daohai; Fulk, F.
1995-12-31
Short-term toxicity tests with aquatic organisms are a valuable measurement tool in assessing the toxicity of effluents, environmental samples, and single chemicals. Toxicity tests are currently used in a wide range of US EPA regulatory activities, including effluent discharge compliance. In the current approach for determining the No Observed Effect Concentration, an effluent concentration is presumed safe if there is no statistically significant difference between toxicant response and control response. The conclusion of a safe concentration may arise because the concentration truly is safe, or, alternatively, because the statistical test lacks the power to detect an effect that does exist. We discuss results of research on a new statistical approach whose basis is to move away from demonstrating no difference and toward demonstrating equivalence. The concept of observed confidence distributions, first suggested by Cox, is proposed as a measure of the strength of evidence for practically equivalent responses between a given effluent concentration and the control. The research included determining intervals of practically equivalent responses as a function of the variability of the control response. The approach is illustrated using reproductive data from tests with Ceriodaphnia dubia and survival and growth data from tests with fathead minnow. The data are from the US EPA's National Reference Toxicant Database.
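The equivalence-testing idea in this abstract (demonstrating equivalence rather than the absence of a difference) is commonly operationalized as two one-sided tests (TOST). The sketch below is a generic illustration of that framing, not the authors' observed-confidence-distribution method; the function name and the Welch-style standard error are our choices.

```python
import math
from statistics import mean, stdev

def tost_equivalence(x, y, margin):
    """Two one-sided tests (TOST) sketch: treatment and control are declared
    practically equivalent when both one-sided statistics exceed their critical
    values, i.e. the difference is confidently inside (-margin, +margin).
    Critical-value lookup is omitted; this returns the two statistics only."""
    nx, ny = len(x), len(y)
    d = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / nx + stdev(y) ** 2 / ny)  # Welch standard error
    t_lower = (d + margin) / se   # tests H0: true difference <= -margin
    t_upper = (margin - d) / se   # tests H0: true difference >= +margin
    return t_lower, t_upper
```

The contrast with the NOEC approach is that a failure to reject "no difference" is no longer treated as evidence of safety; equivalence must be demonstrated affirmatively.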
Innovative role of statistics in acid rain performance testing
Warren-Hicks, W.; Etchison, T.; Lieberman, E.R.
1995-12-31
Title IV of the Clean Air Act Amendments (CAAAs) of 1990 mandated that affected electric utilities reduce sulfur dioxide (SO₂) and nitrogen oxide (NOₓ) emissions, the primary precursors of acidic deposition, and included an innovative market-based SO₂ regulatory program. A central element of the Acid Rain Program is the requirement that affected utility units install continuous emissions monitoring systems (CEMS). This paper describes how the Acid Rain Regulations incorporated statistical procedures into the performance tests for CEMS, and how statistical analysis was used to assess the appropriateness, stringency, and potential impact of various performance tests and standards considered for inclusion in the Acid Rain Regulations. Described here is the statistical analysis used to set a relative accuracy standard, establish the calculation procedures for filling in missing data when a monitor malfunctions, and evaluate the performance tests applied to petitions for alternative monitoring systems. The paper concludes that the statistical evaluations of proposed provisions of the Acid Rain Regulations resulted in the adoption of performance tests and standards that were scientifically substantiated, workable, and effective.
Conducting tests for statistically significant differences using forest inventory data
James A. Westfall; Scott A. Pugh; John W. Coulston
2013-01-01
Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...
Statistical Studies on Sequential Probability Ratio Test for Radiation Detection
Warnick Kernan, Ding Yuan, et al.
2007-07-01
A Sequential Probability Ratio Test (SPRT) algorithm helps to increase the reliability and speed of radiation detection. The algorithm is further improved to reduce spatial gaps and false alarms. SPRT, using a Last-in-First-Elected-Last-Out (LIFELO) technique, reduces the error between the measured radiation and the resulting alarm. Statistical analysis quantifies the reduction in spatial error and false alarms.
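A generic Wald SPRT for count data illustrates the core of the approach: accumulate a log-likelihood ratio observation by observation and stop as soon as it crosses either threshold. This is the textbook construction, not the authors' improved LIFELO variant; the rates and error levels below are placeholders.

```python
import math

def sprt_poisson(counts, lam0, lam1, alpha=0.001, beta=0.01):
    """Wald SPRT deciding between background rate lam0 and source rate
    lam1 > lam0 for a stream of Poisson counts. Returns a decision
    ('alarm', 'clear', or 'continue') and the number of samples used."""
    a = math.log((1 - beta) / alpha)   # upper threshold -> declare source
    b = math.log(beta / (1 - alpha))   # lower threshold -> declare background
    llr = 0.0
    for i, k in enumerate(counts, 1):
        # Poisson log-likelihood ratio contribution of one observed count k
        llr += k * math.log(lam1 / lam0) - (lam1 - lam0)
        if llr >= a:
            return "alarm", i
        if llr <= b:
            return "clear", i
    return "continue", len(counts)
```

The sequential stopping rule is what buys the speed the abstract mentions: strong evidence in either direction ends the measurement early instead of waiting for a fixed dwell time.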
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-01-01
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
Huang, Tzu-Hsueh; Ning, Xinghai; Wang, Xiaojian; Murthy, Niren; Tzeng, Yih-Ling; Dickson, Robert M
2015-02-03
Flow cytometry holds promise to accelerate antibiotic susceptibility determinations; however, without robust multidimensional statistical analysis, general discrimination criteria have remained elusive. In this study, a new statistical method, probability binning signature quadratic form (PB-sQF), was developed and applied to analyze flow cytometric data of bacterial responses to antibiotic exposure. Both sensitive lab strains (Escherichia coli and Pseudomonas aeruginosa) and a multidrug resistant, clinically isolated strain (E. coli) were incubated with the bacteria-targeted dye, maltohexaose-conjugated IR786, and each of many bactericidal or bacteriostatic antibiotics to identify changes induced around the corresponding minimum inhibitory concentrations (MIC). The antibiotic-induced damage was monitored by flow cytometry after 1-h incubation through forward scatter, side scatter, and fluorescence channels. The 3-dimensional differences between the flow cytometric data of the no-antibiotic treated bacteria and the antibiotic-treated bacteria were characterized by PB-sQF into a 1-dimensional linear distance. A 99% confidence level was established by statistical bootstrapping for each antibiotic-bacteria pair. For the susceptible E. coli strain, statistically significant increments from this 99% confidence level were observed from 1/16x MIC to 1x MIC for all the antibiotics. The same increments were recorded for P. aeruginosa, which has been reported to cause difficulty in flow-based viability tests. For the multidrug resistant E. coli, significant distances from control samples were observed only when an effective antibiotic treatment was utilized. Our results suggest that a rapid and robust antimicrobial susceptibility test (AST) can be constructed by statistically characterizing the differences between sample and control flow cytometric populations, even in a label-free scheme with scattered light alone. These distances vs paired controls coupled with rigorous
Compound p-value statistics for multiple testing procedures
Habiger, Joshua D.; Peña, Edsel A.
2014-01-01
Many multiple testing procedures make use of the p-values from the individual pairs of hypothesis tests, and are valid if the p-value statistics are independent and uniformly distributed under the null hypotheses. However, it has recently been shown that these types of multiple testing procedures are inefficient since such p-values do not depend upon all of the available data. This paper provides tools for constructing compound p-value statistics, which are those that depend upon all of the available data, but still satisfy the conditions of independence and uniformity under the null hypotheses. Several examples are provided, including a class of compound p-value statistics for testing location shifts. It is demonstrated, both analytically and through simulations, that multiple testing procedures tend to reject more false null hypotheses when applied to these compound p-values rather than the usual p-values, and at the same time still guarantee the desired type I error rate control. The compound p-values are used to analyze a real microarray data set and allow for more rejected null hypotheses. PMID:25076800
Mean-squared-displacement statistical test for fractional Brownian motion
NASA Astrophysics Data System (ADS)
Sikora, Grzegorz; Burnecki, Krzysztof; Wyłomańska, Agnieszka
2017-03-01
Anomalous diffusion in crowded fluids, e.g., in the cytoplasm of living cells, is a frequent phenomenon. A common tool by which the anomalous diffusion of a single particle can be classified is the time-averaged mean square displacement (TAMSD). A classical mechanism leading to anomalous diffusion is fractional Brownian motion (FBM). Validating such a process for single-particle tracking data is of great interest to experimentalists. In this paper we propose a rigorous statistical test for FBM based on the TAMSD. To this end we analyze the distribution of the TAMSD statistic, which is given by a generalized chi-squared distribution. Next, we study the power of the test by means of Monte Carlo simulations. We show that the test is very sensitive to changes in the Hurst parameter. Moreover, it can easily distinguish between two models of subdiffusion: FBM and the continuous-time random walk.
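The TAMSD statistic at the center of the test is straightforward to compute from a trajectory. Below is a minimal sketch of the estimator itself (ours, for illustration; the paper's contribution is the distribution theory and the test built on top of it). For FBM with Hurst exponent H, the expected TAMSD scales as the lag raised to the power 2H, which is what makes the statistic informative about H.

```python
def tamsd(x, lag):
    """Time-averaged mean square displacement of a 1-D trajectory x at a
    given integer lag: average of (x[i+lag] - x[i])**2 over the trajectory."""
    n = len(x)
    if not 1 <= lag < n:
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    return sum((x[i + lag] - x[i]) ** 2 for i in range(n - lag)) / (n - lag)
```

For a ballistic (linear) trajectory the TAMSD grows as lag², while for ordinary Brownian motion it grows linearly in the lag; subdiffusive FBM sits in between.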
Validated intraclass correlation statistics to test item performance models.
Courrieu, Pierre; Brand-D'abrescia, Muriele; Peereman, Ronald; Spieler, Daniel; Rey, Arnaud
2011-03-01
A new method, with an application program in Matlab code, is proposed for testing item performance models on empirical databases. This method uses data intraclass correlation statistics as expected correlations against which one compares simple functions of the correlations between model predictions and observed item performance. The method rests on a data population model whose validity for the considered data is suitably tested, and it has been verified for three behavioural measure databases. Contrary to usual model selection criteria, this method provides an effective way of testing for under-fitting and over-fitting, answering the usually neglected question: "does this model suitably account for these data?"
The Probability of Obtaining Two Statistically Different Test Scores as a Test Index
ERIC Educational Resources Information Center
Muller, Jorg M.
2006-01-01
A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…
Asymptotics of Bonferroni for Dependent Normal Test Statistics.
Proschan, Michael A; Shaw, Pamela A
2011-07-01
The Bonferroni adjustment is sometimes used to control the familywise error rate (FWE) when the number of comparisons is huge. In genome wide association studies, researchers compare cases to controls with respect to thousands of single nucleotide polymorphisms. It has been claimed that the Bonferroni adjustment is only slightly conservative if the comparisons are nearly independent. We show that the veracity of this claim depends on how one defines "nearly." Specifically, if the test statistics' pairwise correlations converge to 0 as the number of tests tend to ∞, the conservatism of the Bonferroni procedure depends on their rate of convergence. The type I error rate of Bonferroni can tend to 0 or 1 - exp(-α) ≈ α, depending on that rate. We show using elementary probability theory what happens to the distribution of the number of errors when using Bonferroni, as the number of dependent normal test statistics gets large. We also use the limiting behavior of Bonferroni to shed light on properties of other commonly used test statistics.
Statistical testing against baseline was common in dental research.
Koletsi, Despina; Madahar, Arun; Fleming, Padhraig S; Pandis, Nikolaos
2015-07-01
To assess the prevalence of within-group comparisons with baseline in a subset of leading dental journals and to explore possible associations with a range of study characteristics, including journal and study design. Thirty consecutive issues of five leading dental journals were electronically searched. The conduct and reporting of statistical analyses with respect to comparisons against baseline, and the manner in which the results were interpreted, were assessed. Descriptive statistics were obtained, and the chi-square test and Fisher's exact test were used to test the association between trial characteristics and overall study interpretation. A total of 184 studies were included, with the highest proportion published in the Journal of Endodontics (n = 84, 46%) and most involving a single center (n = 157, 85%). Overall, 43 studies (23%) based the interpretation of their outcomes solely on comparisons against baseline. Inappropriate use of baseline testing was less likely in interventional studies (P < 0.001). Comparisons with baseline appear to be common in both observational and interventional research studies in dentistry. Enhanced conduct and reporting of statistical tests are required to ensure that inferences from research studies are appropriate and informative. Copyright © 2015 Elsevier Inc. All rights reserved.
A critique of statistical hypothesis testing in clinical research
Raha, Somik
2011-01-01
Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs for all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from practical and philosophical perspectives. In the philosophical critique, the two main worldviews of probability are the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview, requiring the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning under different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability to an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not of any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing, utilizing data from a past clinical trial that studied the effect of Aspirin on heart attacks in a sample population of doctors. Because a major reason for the prevalence of RCTs in academia is legislation requiring their use, the ethics of legislating the use of statistical methods for clinical research is also examined. PMID:22022152
Statistical tests on clustered global earthquake synthetic data sets
NASA Astrophysics Data System (ADS)
Daub, Eric G.; Trugman, Daniel T.; Johnson, Paul A.
2015-08-01
We study the ability of statistical tests to identify nonrandom features of earthquake catalogs, with a focus on the global earthquake record since 1900. We construct four types of synthetic data sets containing varying strengths of clustering, with each data set containing on average 10,000 events over 100 years with magnitudes above M = 6. We apply a suite of statistical tests to each synthetic realization in order to evaluate the ability of each test to identify the sequences of events as nonrandom. Our results show that detection ability is dependent on the quantity of data, the nature of the type of clustering, and the specific signal used in the statistical test. Data sets that exhibit a stronger variation in the seismicity rate are generally easier to identify as nonrandom for a given background rate. We also show that we can address this problem in a Bayesian framework, with the clustered data sets as prior distributions. Using this new Bayesian approach, we can place quantitative bounds on the range of possible clustering strengths that are consistent with the global earthquake data. At M = 7, we can estimate 99th percentile confidence bounds on the number of triggered events, with an upper bound of 20% of the catalog for global aftershock sequences, with a stronger upper bound on the fraction of triggered events of 10% for long-term event clusters. At M = 8, the bounds are less strict due to the reduced number of events. However, our analysis shows that other types of clustering could be present in the data that we are unable to detect. Our results aid in the interpretation of the results of statistical tests on earthquake catalogs, both worldwide and regionally.
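A simple example of the kind of clustering statistic such test suites include is the index of dispersion of window counts, which is near 1 for a Poisson process and inflated when events cluster in time. This is a generic illustration of the idea, not one of the specific tests used in the study; the data below are synthetic and the seed is arbitrary.

```python
import random

def dispersion_index(event_times, n_windows, t_max):
    """Index of dispersion: variance/mean of event counts in equal time
    windows over [0, t_max]. ~1 for a homogeneous Poisson process;
    substantially >1 suggests temporal clustering."""
    counts = [0] * n_windows
    for t in event_times:
        i = min(max(int(t / t_max * n_windows), 0), n_windows - 1)
        counts[i] += 1
    m = sum(counts) / n_windows
    v = sum((c - m) ** 2 for c in counts) / (n_windows - 1)
    return v / m

rng = random.Random(7)
poisson_like = [rng.uniform(0, 100) for _ in range(1000)]              # unclustered
clustered = [rng.gauss(c, 0.5) for c in (10, 50, 90) for _ in range(300)]  # bursts
```

As the abstract emphasizes, a single statistic like this has limited power against some clustering types, which is why a suite of complementary tests is applied.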
Statistical comparison of similarity tests applied to speech production data
NASA Astrophysics Data System (ADS)
Kollia, H.; Jorgenson, Jay; Saint Fleur, Rose; Foster, Kevin
2004-05-01
Statistical analysis of data variability in speech production research has traditionally been addressed under the assumption of normally distributed error terms. The correct and valid application of statistical procedures requires a thorough investigation of the assumptions that underlie the methodology. In previous work [Kollia and Jorgenson, J. Acoust. Soc. Am. 102 (1997); 109 (2002)], it was shown that the error terms of speech production data in a linear regression can be modeled accurately using a quadratic probability distribution, rather than a normal distribution as is frequently assumed. The measurement used in the earlier Kollia-Jorgenson work involved the classical Kolmogorov-Smirnov statistical test. In the present work, the authors further explore the problem of analyzing the error terms arising from linear regression using a variety of known statistical tests, including, but not limited to, the chi-square, Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, skewness and kurtosis, and Durbin tests. Our study complements a similar study by Shapiro, Wilk, and Chen [J. Am. Stat. Assoc. (1968)]. [Partial support provided by PSC-CUNY and NSF to Jay Jorgenson.]
LATTE - Linking Acoustic Tests and Tagging Using Statistical Estimation
2013-09-30
...after C. Harris' maternity leave. WORK COMPLETED: The project started in April 2010. Task 3 (data processing) is now essentially complete, and task 4... Marques et al., "An overview of LATTE: Linking Acoustic Tests and Tagging using statistical Estimation: Modelling the Behaviour of Beaked Whales in..." ...geometric (Figure 2). [Figure 2: conceptual description of the dive cycle of a beaked whale considering 7 behavioural states.]
Statistical tests of ARIES data. [very long base interferometry geodesy
NASA Technical Reports Server (NTRS)
Musman, S.
1982-01-01
Statistical tests are performed on Project ARIES preliminary baseline measurements in the Southern California triangle formed by the Jet Propulsion Laboratory, the Owens Valley Radio Observatory, and the Goldstone tracking complex during 1976-1980. In addition to conventional one-dimensional tests, a two-dimensional test which allows for an arbitrary correlation between errors in individual components is formulated using the Hotelling statistic. On two out of three baselines the mean rate of change in baseline vector is statistically significant. Apparent motions on all three baselines are consistent with a pure shear with north-south compression and east-west expansion of 1 × 10⁻⁷ per year. The ARIES measurements are consistent with the USGS geodolite networks in Southern California and the SAFE laser satellite ranging experiment. All three experiments are consistent with a 6 cm/year motion between the Pacific and North American Plates and a band of diffuse shear 300 km wide, except that the corresponding rotation of the entire triangle is not found.
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By applying a unique statistical analysis to the available ground-test shock data, the NASA Lewis Research Center found that shock-induced vibrations would not be as great as the previously specified levels of Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. This methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was then used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of lognormally transformed test data was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
Rare-Variant Association Analysis: Study Designs and Statistical Tests
Lee, Seunggeung; Abecasis, Gonçalo R.; Boehnke, Michael; Lin, Xihong
2014-01-01
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions. PMID:24995866
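A minimal example of the burden-test idea mentioned above: collapse each subject's rare variants in a gene or region into a single carrier indicator, then compare carrier proportions between cases and controls. This is an illustrative sketch using a simple two-proportion z-test; real burden tests typically use weighted variant scores and regression frameworks that adjust for covariates.

```python
from statistics import NormalDist

def burden_test(carrier_cases, n_cases, carrier_controls, n_controls):
    """Burden-style comparison of rare-variant carrier proportions between
    cases and controls via a pooled two-proportion z-test. Returns the z
    statistic and a two-sided p-value (normal approximation)."""
    p1 = carrier_cases / n_cases
    p2 = carrier_controls / n_controls
    p = (carrier_cases + carrier_controls) / (n_cases + n_controls)  # pooled
    se = (p * (1 - p) * (1 / n_cases + 1 / n_controls)) ** 0.5
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

The review's contrast between burden and variance-component tests is visible even here: collapsing assumes most rare variants push risk in the same direction, which is exactly the assumption variance-component tests relax.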
A Statistical Test for Detecting Answer Copying on Multiple-Choice Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.; Sotaridona, Leonardo
2004-01-01
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at…
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user interface, a well test analysis model, and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. Given all the input parameters, the forward simulation yields pressure and flow rate. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to estimate the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
Shaikh, Masood Ali
2016-04-01
Statistical tests help infer meaningful conclusions from studies conducted and data collected. This descriptive study analyzed the type of statistical tests used and the statistical software utilized for analysis reported in the original articles published in 2014 by the three Medline-indexed journals of Pakistan. Cumulatively, 466 original articles were published in 2014. The most frequently reported statistical tests for original articles in all three journals were bivariate parametric and non-parametric tests, i.e., tests involving comparisons between two groups, e.g., the chi-square test, the t-test, and various types of correlation. Cumulatively, 201 (43.1%) articles used these tests. SPSS was the primary choice for statistical analysis, as it was exclusively used in 374 (80.3%) original articles. There has been a substantial increase in the number of articles published, and in the sophistication of the statistical tests used, in the Pakistani Medline-indexed journals in 2014 compared to 2007.
DERIVATION OF A TEST STATISTIC FOR EMPHYSEMA QUANTIFICATION
Vegas-Sanchez-Ferrero, Gonzalo; Washko, George; Rahaghi, Farbod N.; Ledesma-Carbayo, Maria J.; Estépar, R. San José
2016-01-01
Density masking is the de facto quantitative imaging phenotype for emphysema that is widely used by the clinical community. Density masking defines the burden of emphysema by a fixed threshold, usually between −910 HU and −950 HU, that has been experimentally validated with histology. In this work, we formalize emphysema quantification by means of statistical inference. We show that a non-central Gamma distribution is a good approximation for the local distribution of image intensities in normal and emphysematous tissue. We then propose a test statistic in terms of the sample mean of a truncated non-central Gamma random variable. Our results show that this approach is well suited for the detection of emphysema and superior to standard density masking. The statistical method was tested on a dataset of 1337 samples obtained from 9 different scanner models in subjects with COPD. Results showed an increase of 17% when compared to the density masking approach, and an overall accuracy of 94.09%. PMID:27974952
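The baseline that this paper improves on, density masking, is just the percentage of lung voxels at or below a fixed Hounsfield-unit threshold (often called LAA%). A minimal sketch, with toy data:

```python
import numpy as np

def density_mask_pct(hu, threshold=-950):
    """Percentage of lung voxels at or below the HU threshold (LAA%),
    the standard density-masking emphysema score described above."""
    hu = np.asarray(hu)
    return 100.0 * np.mean(hu <= threshold)

# Toy slice: mostly normal lung tissue near -850 HU plus an emphysematous patch.
hu = np.full(1000, -850.0)
hu[:150] = -970.0
print(density_mask_pct(hu))  # 15.0
```

The paper's contribution is to replace this fixed-threshold rule with a hypothesis test on the local intensity distribution (non-central Gamma), which is what yields the reported accuracy gain over the threshold alone.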
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good-fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but the convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypotheses is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
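A quick descriptive counterpart to the test proposed in this abstract: manifest monotonicity predicts that the proportion answering item i correctly is nondecreasing in the restscore (total score minus item i). The sketch below merely counts adjacent decreases in that empirical curve; it is a sanity check, not the order-constrained likelihood ratio test of the paper, and the function name is an assumption.

```python
import numpy as np

def manifest_monotonicity_violations(X, item):
    """Estimate P(X_item = 1 | restscore) for one item of an
    (n_persons, n_items) 0/1 response matrix X and count the number
    of adjacent decreases in that curve (0 = no observed violation)."""
    X = np.asarray(X)
    rest = X.sum(axis=1) - X[:, item]
    props = np.array([X[rest == r, item].mean() for r in np.unique(rest)])
    return int(np.sum(np.diff(props) < 0)), props

# Guttman-like toy data: harder items answered only by higher-ability persons.
X = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]])
violations, props = manifest_monotonicity_violations(X, 0)
```

In practice restscore groups are merged to stabilize the proportions, and an observed decrease is then assessed for significance, which is where the paper's simulated likelihood ratio p-value comes in.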
Statistical testing of association between menstruation and migraine.
Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G
2015-02-01
To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are needed, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To the best of our knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operating characteristic curve analysis. Quick-reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.
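The core computation named in this abstract, Fisher's exact test with a mid-p correction on a 2 × 2 table, can be sketched directly: the mid-p value subtracts half the probability of the observed table from the exact p-value, reducing the test's conservatism. The table counts below are illustrative, not taken from the study.

```python
from scipy import stats

def fisher_midp(table):
    """Two-sided Fisher exact test with mid-p correction for a 2x2
    contingency table (e.g. migraine attack vs. perimenstrual window).
    Sketch: mid-p = exact p minus half the probability of the observed
    table under the hypergeometric null."""
    (a, b), (c, d) = table
    n = a + b + c + d
    _, p_exact = stats.fisher_exact(table)
    p_obs = stats.hypergeom.pmf(a, n, a + b, a + c)  # P(observed table)
    return p_exact - 0.5 * p_obs

# Toy diary counts: attacks vs. no-attack days, inside vs. outside the window.
table = [[8, 2], [3, 7]]
p_mid = fisher_midp(table)
```

Because half the observed table's probability is removed, the mid-p value is always strictly smaller than the exact p-value, which is the intended correction for the discreteness of small diary samples.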
Schmidt, Robert L; Chute, Deborah J; Colbert-Getz, Jorie M; Firpo-Betancourt, Adolfo; James, Daniel S; Karp, Julie K; Miller, Douglas C; Milner, Danny A; Smock, Kristi J; Sutton, Ann T; Walker, Brandon S; White, Kristie L; Wilson, Andrew R; Wojcik, Eva M; Yared, Marwan A; Factor, Rachel E
2017-02-01
Statistical literacy can be defined as understanding the statistical tests and terminology needed for the design, analysis, and conclusions of original research or laboratory testing. Little is known about the statistical literacy of clinical or anatomic pathologists. To determine the statistical methods most commonly used in pathology studies from the literature and to assess familiarity and knowledge level of these statistical tests by pathology residents and practicing pathologists. The most frequently used statistical methods were determined by a review of 1100 research articles published in 11 pathology journals during 2015. Familiarity with statistical methods was determined by a survey of pathology trainees and practicing pathologists at 9 academic institutions in which pathologists were asked to rate their knowledge of the methods identified by the focused review of the literature. We identified 18 statistical tests that appear frequently in published pathology studies. On average, pathologists reported a knowledge level between "no knowledge" and "basic knowledge" of most statistical tests. Knowledge of tests was higher for more frequently used tests. Greater statistical knowledge was associated with a focus on clinical pathology versus anatomic pathology, having had a statistics course, having an advanced degree other than an MD degree, and publishing research. Statistical knowledge was not associated with length of pathology practice. An audit of pathology literature reveals that knowledge of about 12 statistical tests would be sufficient to provide statistical literacy for pathologists. On average, most pathologists report they can interpret commonly used tests but are unable to perform them. Most pathologists indicated that they would benefit from additional statistical training.
Quantum Statistical Testing of a Quantum Random Number Generator
Humble, Travis S
2014-01-01
The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.
Quantum statistical testing of a quantum random number generator
NASA Astrophysics Data System (ADS)
Humble, Travis S.
2014-10-01
The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.
Statistical characteristics of mechanical heart valve cavitation in accelerated testing.
Wu, Changfu; Hwang, Ned H C; Lin, Yu-Kweng M
2004-07-01
Cavitation damage has been observed on mechanical heart valves (MHVs) undergoing accelerated testing. Cavitation itself can be modeled as a stochastic process, as it varies from beat to beat of the testing machine. This in-vitro study was undertaken to investigate the statistical characteristics of MHV cavitation. A 25-mm St. Jude Medical bileaflet MHV (SJM 25) was tested in an accelerated tester at various pulse rates, ranging from 300 to 1,000 bpm, with stepwise increments of 100 bpm. A miniature pressure transducer was placed near a leaflet tip on the inflow side of the valve, to monitor regional transient pressure fluctuations at instants of valve closure. The pressure trace associated with each beat was passed through a 70 kHz high-pass digital filter to extract the high-frequency oscillation (HFO) components resulting from the collapse of cavitation bubbles. Three intensity-related measures were calculated for each HFO burst: its time span; its local root-mean-square (LRMS) value; and the area enveloped by the absolute value of the HFO pressure trace and the time axis, referred to as cavitation impulse. These were treated as stochastic processes, of which the first-order probability density functions (PDFs) were estimated for each test rate. Both the LRMS value and the cavitation impulse were lognormally distributed, and the time span was normally distributed. These distribution laws were consistent at different test rates. The present investigation was directed at understanding MHV cavitation as a stochastic process. The results provide a basis for establishing further the statistical relationship between cavitation intensity and time-evolving cavitation damage on MHV surfaces. These data are required to assess and compare the performance of MHVs of different designs.
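The signal-processing pipeline in this abstract (70 kHz high-pass filtering of a closure-event pressure trace, then intensity measures on the HFO burst) can be sketched as below. The filter order, sampling rate, and synthetic trace are assumptions for illustration; the study's transducer data are not reproduced.

```python
import numpy as np
from scipy import signal

def hfo_measures(pressure, fs, cutoff=70e3):
    """Extract the high-frequency oscillation (HFO) component of one
    closure-event pressure trace with a zero-phase high-pass filter and
    compute two of the intensity measures described above: the local
    root-mean-square (LRMS) value and the cavitation impulse
    (area under |HFO| vs. time)."""
    sos = signal.butter(4, cutoff, btype="highpass", fs=fs, output="sos")
    hfo = signal.sosfiltfilt(sos, pressure)
    lrms = np.sqrt(np.mean(hfo ** 2))
    impulse = np.sum(np.abs(hfo)) / fs   # rectangular-rule area
    return lrms, impulse

fs = 1e6                                  # 1 MHz sampling (assumed)
t = np.arange(0, 2e-3, 1 / fs)
# Toy trace: low-frequency closure transient plus a 100 kHz cavitation burst.
trace = np.sin(2 * np.pi * 5e3 * t) + 0.2 * np.sin(2 * np.pi * 1e5 * t)
lrms, impulse = hfo_measures(trace, fs)
```

Repeating this per beat yields samples of the three measures, whose empirical PDFs can then be checked against lognormal and normal laws as the study does.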
ERIC Educational Resources Information Center
Ervin, Nancy S.
This study examined how accurately deltas (statistics measuring item difficulty) established on pretest populations reflect the deltas obtained from final-form populations, and hence the utility of pretest deltas for constructing final (operational) test forms that meet developed statistical specifications. Data were examined from five subject…
Statistical tests for power-law cross-correlated processes.
Podobnik, Boris; Jiang, Zhi-Qiang; Zhou, Wei-Xing; Stanley, H Eugene
2011-12-01
For stationary time series, the cross-covariance and the cross-correlation as functions of time lag n serve to quantify the similarity of two time series. The latter measure is also used to assess whether the cross-correlations are statistically significant. For nonstationary time series, the analogous measures are detrended cross-correlations analysis (DCCA) and the recently proposed detrended cross-correlation coefficient, ρ(DCCA)(T,n), where T is the total length of the time series and n the window size. For ρ(DCCA)(T,n), we numerically calculated the Cauchy inequality -1 ≤ ρ(DCCA)(T,n) ≤ 1. Here we derive -1 ≤ ρ(DCCA)(T,n) ≤ 1 for a standard variance-covariance approach and for a detrending approach. For overlapping windows, we find the range of ρ(DCCA) within which the cross-correlations become statistically significant. For overlapping windows we numerically determine (and for nonoverlapping windows we derive) that the standard deviation of ρ(DCCA)(T,n) tends with increasing T to 1/T. Using ρ(DCCA)(T,n) we show that the Chinese financial market's tendency to follow the U.S. market is extremely weak. We also propose an additional statistical test that can be used to quantify the existence of cross-correlations between two power-law correlated time series.
Statistical tests for power-law cross-correlated processes
NASA Astrophysics Data System (ADS)
Podobnik, Boris; Jiang, Zhi-Qiang; Zhou, Wei-Xing; Stanley, H. Eugene
2011-12-01
For stationary time series, the cross-covariance and the cross-correlation as functions of time lag n serve to quantify the similarity of two time series. The latter measure is also used to assess whether the cross-correlations are statistically significant. For nonstationary time series, the analogous measures are detrended cross-correlations analysis (DCCA) and the recently proposed detrended cross-correlation coefficient, ρDCCA(T,n), where T is the total length of the time series and n the window size. For ρDCCA(T,n), we numerically calculated the Cauchy inequality -1≤ρDCCA(T,n)≤1. Here we derive -1≤ρDCCA(T,n)≤1 for a standard variance-covariance approach and for a detrending approach. For overlapping windows, we find the range of ρDCCA within which the cross-correlations become statistically significant. For overlapping windows we numerically determine—and for nonoverlapping windows we derive—that the standard deviation of ρDCCA(T,n) tends with increasing T to 1/T. Using ρDCCA(T,n) we show that the Chinese financial market's tendency to follow the U.S. market is extremely weak. We also propose an additional statistical test that can be used to quantify the existence of cross-correlations between two power-law correlated time series.
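The coefficient ρDCCA(T,n) described in the two records above can be computed directly: integrate both (mean-centered) series, linearly detrend them in boxes of size n, and normalize the detrended covariance by the two detrended variances. The sketch below uses nonoverlapping boxes for simplicity (the paper also treats overlapping windows); function name and test series are assumptions.

```python
import numpy as np

def rho_dcca(x, y, n):
    """Detrended cross-correlation coefficient rho_DCCA(T, n):
    DCCA covariance of the integrated series, detrended linearly in
    nonoverlapping boxes of size n, normalized by the two DFA variances.
    Satisfies -1 <= rho_DCCA <= 1."""
    X = np.cumsum(x - np.mean(x))
    Y = np.cumsum(y - np.mean(y))
    t = np.arange(n)
    f2xy = f2xx = f2yy = 0.0
    for b in range(len(X) // n):
        xs, ys = X[b * n:(b + 1) * n], Y[b * n:(b + 1) * n]
        rx = xs - np.polyval(np.polyfit(t, xs, 1), t)  # remove local trend
        ry = ys - np.polyval(np.polyfit(t, ys, 1), t)
        f2xy += np.mean(rx * ry)
        f2xx += np.mean(rx * rx)
        f2yy += np.mean(ry * ry)
    return f2xy / np.sqrt(f2xx * f2yy)

rng = np.random.default_rng(2)
z = rng.standard_normal(1000)                 # shared component
x = z + 0.5 * rng.standard_normal(1000)
y = z + 0.5 * rng.standard_normal(1000)
r = rho_dcca(x, y, 20)
```

Significance is then judged by comparing r against the null distribution of ρDCCA for uncorrelated series of the same T and n, which is the additional test the paper proposes.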
[Statistical tests in medical research: traditional methods vs. multivariate NPC permutation tests].
Arboretti, Rosa; Bordignon, Paolo; Corain, Livio; Palermo, Giuseppe; Pesarin, Fortunato; Salmaso, Luigi
2015-01-01
Within medical research, a useful statistical tool is hypothesis testing in terms of the so-called null hypothesis, that the treatment has no effect, and the alternative hypothesis, that the treatment has some effect. By controlling the risks of wrong decisions, empirical data are used to possibly reject the null hypothesis in favour of the alternative, thereby demonstrating the efficacy of a treatment of interest. Multivariate permutation tests based on the nonparametric combination (NPC) method provide an innovative, robust, and effective hypothesis-testing solution to many real problems that are commonly encountered in medical research when multiple endpoints are observed. This paper discusses the various approaches to hypothesis testing and the main advantages of NPC tests, which require much less stringent assumptions than traditional statistical tests. Moreover, their results may be extended to the reference population even in cases of selection bias, that is, non-random sampling. In this work, we review and discuss some basic testing procedures along with the theoretical and practical relevance of NPC tests, showing their effectiveness in medical research. Within nonparametric methods, NPC tests represent the current "frontier" of statistical research, yet they are already widely available in the practice of analysis of clinical data.
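A minimal NPC-style sketch, under stated assumptions: per-endpoint permutation statistics (here, absolute differences of means) are combined with Fisher's combining function, reusing the same permutations so that the dependence between endpoints is preserved. This is a toy version of the method the abstract describes, with illustrative data and names.

```python
import numpy as np

def npc_fisher(group_a, group_b, n_perm=500, seed=0):
    """Nonparametric combination (NPC) global test for several endpoints.
    group_a, group_b: (n_subjects, n_endpoints) arrays.
    Returns the global permutation p-value."""
    rng = np.random.default_rng(seed)
    data = np.vstack([group_a, group_b])
    na, n = len(group_a), len(data)

    def endpoint_stats(idx):
        # |difference of endpoint means| between relabeled groups
        return np.abs(data[idx[:na]].mean(axis=0) - data[idx[na:]].mean(axis=0))

    obs = endpoint_stats(np.arange(n))
    perm = np.array([endpoint_stats(rng.permutation(n)) for _ in range(n_perm)])

    def partial_p(row):
        # per-endpoint permutation p-values (shared permutation distribution)
        return (1 + np.sum(perm >= row, axis=0)) / (1 + n_perm)

    t_obs = -2 * np.sum(np.log(partial_p(obs)))             # Fisher combination
    t_perm = np.array([-2 * np.sum(np.log(partial_p(row))) for row in perm])
    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)

rng = np.random.default_rng(3)
treated = rng.normal(0.5, 1, size=(25, 2))   # small shift on both endpoints
control = rng.normal(0.0, 1, size=(25, 2))
p_global = npc_fisher(treated, control)
```

Because the combination is computed within each permutation, no assumption about the correlation between endpoints is needed, which is the key relaxation over classical multivariate tests that the abstract emphasizes.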
Biostatistics Series Module 7: The Statistics of Diagnostic Tests
Hazra, Avijit; Gogtay, Nithya
2017-01-01
optimum cutoff – the one lying on the “elbow” of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally thought to be a more robust measure than simple percent agreement calculation since kappa takes into account the agreement occurring by chance. PMID:28216720
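Cohen's kappa, as defined in the abstract above, corrects observed agreement for the agreement expected by chance from the marginal totals. A minimal computation from a confusion matrix (the counts below are illustrative):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix of two raters (or two
    diagnostic tests) over the same categories:
    kappa = (p_observed - p_chance) / (1 - p_chance)."""
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    p_obs = np.trace(m) / n                                   # observed agreement
    p_chance = np.sum(m.sum(axis=0) * m.sum(axis=1)) / n ** 2  # chance agreement
    return (p_obs - p_chance) / (1 - p_chance)

# Two tests classifying 100 patients as positive/negative.
k = cohens_kappa([[40, 10], [5, 45]])
print(round(k, 3))  # 0.7
```

Here the raw percent agreement is 85%, but chance alone would yield 50% agreement from these margins, so kappa = 0.7, which is why kappa is considered the more robust measure.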
Biostatistics Series Module 7: The Statistics of Diagnostic Tests.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
- the one lying on the "elbow" of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally thought to be a more robust measure than simple percent agreement calculation since kappa takes into account the agreement occurring by chance.
Statistical Tests of Taylor's Hypothesis: An Application to Precipitation Fields
NASA Astrophysics Data System (ADS)
Murthi, A.; Li, B.; Bowman, K.; North, G.; Genton, M.; Sherman, M.
2009-05-01
The Taylor Hypothesis (TH) as applied to rainfall is a proposition about the space-time covariance structure of the rainfall field. Specifically, it supposes that if a spatio-temporal precipitation field with a stationary covariance Cov(r, τ) in both space r and time τ moves with a constant velocity v, then the temporal covariance at time lag τ is equal to the spatial covariance at space lag v τ, that is, Cov(0, τ) = Cov(v τ, 0). Qualitatively this means that the field evolves slowly in time relative to the advective time scale, which is often referred to as the 'frozen field' hypothesis. Of specific interest is whether there is a cut-off or decorrelation time scale for which the TH holds for a given mean flow velocity v. In this study the validity of the TH is tested for precipitation fields using high-resolution gridded NEXRAD radar reflectivity data produced by the WSI Corporation, employing two different statistical approaches. The first method is based upon rigorous hypothesis testing, while the second is based on a simple correlation analysis, which neglects possible dependencies in the correlation estimates. We use radar reflectivity values from the southeastern United States with an approximate horizontal resolution of 4 km x 4 km and a temporal resolution of 15 minutes. During the 4-day period from 2 to 5 May 2002, substantial precipitation occurs in the region of interest, and the motion of the precipitation systems is approximately uniform. The results of both statistical methods suggest that the TH might hold for the shortest space and time scales resolved by the data (4 km and 15 minutes), but that it does not hold for longer periods or larger spatial scales. Also, the simple correlation analysis tends to overestimate the statistical significance by failing to account for correlations between the covariance estimates.
Brannath, Werner; Bretz, Frank; Maurer, Willi; Sarkar, Sanat
2009-12-01
The two-sided Simes test is known to control the type I error rate with bivariate normal test statistics. For one-sided hypotheses, control of the type I error rate requires that the correlation between the bivariate normal test statistics is non-negative. In this article, we introduce a trimmed version of the one-sided weighted Simes test for two hypotheses which rejects if (i) the one-sided weighted Simes test rejects and (ii) both p-values are below one minus the respective weighted Bonferroni adjusted level. We show that the trimmed version controls the type I error rate at nominal significance level α if (i) the common distribution of test statistics is point symmetric and (ii) the two-sided weighted Simes test at level 2α controls the level. These assumptions apply, for instance, to bivariate normal test statistics with arbitrary correlation. In a simulation study, we compare the power of the trimmed weighted Simes test with the power of the weighted Bonferroni test and the untrimmed weighted Simes test. An additional result of this article ensures type I error rate control of the usual weighted Simes test under a weak version of the positive regression dependence condition for the case of two hypotheses. This condition is shown to apply to the two-sided p-values of one- or two-sample t-tests for bivariate normal endpoints with arbitrary correlation and to the corresponding one-sided p-values if the correlation is non-negative. The Simes test for such types of bivariate t-tests has not been considered before. According to our main result, the trimmed version of the weighted Simes test then also applies to the one-sided bivariate t-test with arbitrary correlation.
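The two-part rejection rule stated in this abstract can be written out directly for two hypotheses: (i) the weighted Simes condition compares each ordered p-value with α times the cumulative weight of the hypotheses with the smallest p-values, and (ii) the trim requires each p-value to lie below one minus its weighted Bonferroni level. A sketch of the rule as described, with the function name as an assumption:

```python
def trimmed_weighted_simes(p1, p2, w1, w2, alpha=0.05):
    """Trimmed one-sided weighted Simes test for two hypotheses
    (weights w1 + w2 = 1). Rejects the intersection null iff the
    weighted Simes test rejects AND both p-values are below one
    minus the respective weighted Bonferroni adjusted level."""
    ordered = sorted([(p1, w1), (p2, w2)], key=lambda pw: pw[0])
    # (i) weighted Simes: p_(j) <= alpha * (cumulative weight up to j)
    cum_weight = 0.0
    simes_rejects = False
    for p, w in ordered:
        cum_weight += w
        if p <= alpha * cum_weight:
            simes_rejects = True
    # (ii) trimming condition
    trimmed_ok = (p1 < 1 - w1 * alpha) and (p2 < 1 - w2 * alpha)
    return simes_rejects and trimmed_ok
```

For equal weights and α = 0.05, (0.02, 0.5) rejects, but (0.02, 0.99) does not: the second p-value exceeds 1 − 0.025 = 0.975, so the trim vetoes the Simes rejection. This trimming is exactly what buys type I error control under arbitrary (including negative) correlation.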
Statistical Tests of the PTHA Poisson Assumption for Submarine Landslides
NASA Astrophysics Data System (ADS)
Geist, E. L.; Chaytor, J. D.; Parsons, T.; Ten Brink, U. S.
2012-12-01
We demonstrate that a sequence of dated mass transport deposits (MTDs) can provide information to statistically test whether or not submarine landslides associated with these deposits conform to a Poisson model of occurrence. Probabilistic tsunami hazard analysis (PTHA) most often assumes Poissonian occurrence for all sources, with an exponential distribution of return times. Using dates that define the bounds of individual MTDs, we first describe likelihood and Monte Carlo methods of parameter estimation for a suite of candidate occurrence models (Poisson, lognormal, gamma, Brownian Passage Time). In addition to age-dating uncertainty, both methods incorporate uncertainty caused by the open time intervals: i.e., before the first and after the last event to the present. Accounting for these open intervals is critical when there are a small number of observed events. The optimal occurrence model is selected according to both the Akaike Information Criterion (AIC) and Akaike's Bayesian Information Criterion (ABIC). In addition, the likelihood ratio test can be performed on occurrence models from the same family: e.g., the gamma model relative to the exponential model of return time distribution. Parameter estimation, model selection, and hypothesis testing are performed on data from two IODP holes in the northern Gulf of Mexico that penetrated a total of 14 MTDs, some of which are correlated between the two holes. Each of these events has been assigned an age based on microfossil zonations and magnetostratigraphic datums. Results from these sites indicate that the Poisson assumption is likely valid. However, parameter estimation results using the likelihood method for one of the sites suggest that the events may have occurred quasi-periodically. Methods developed in this study provide tools with which one can determine both the rate of occurrence and the statistical validity of the Poisson assumption when submarine landslides are included in PTHA.
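The model-selection step described here, comparing an exponential (Poisson) return-time model against a gamma alternative by AIC, can be sketched as follows. This toy version ignores the open intervals and age-dating uncertainty the study handles, and the simulated inter-event times are illustrative.

```python
import numpy as np
from scipy import stats

def compare_return_time_models(intervals):
    """Fit exponential and gamma models to inter-event times by maximum
    likelihood and compare them with AIC = 2k - 2*log-likelihood.
    The gamma family nests the exponential (shape = 1)."""
    # Exponential: one parameter (mean return time).
    scale = np.mean(intervals)
    ll_exp = np.sum(stats.expon.logpdf(intervals, scale=scale))
    aic_exp = 2 * 1 - 2 * ll_exp
    # Gamma: two parameters (shape, scale), location fixed at zero.
    shape, _, g_scale = stats.gamma.fit(intervals, floc=0)
    ll_gam = np.sum(stats.gamma.logpdf(intervals, shape, loc=0, scale=g_scale))
    aic_gam = 2 * 2 - 2 * ll_gam
    return {"exponential": aic_exp, "gamma": aic_gam}

rng = np.random.default_rng(4)
intervals = rng.exponential(scale=5000.0, size=14)  # 14 simulated MTD gaps, years
aics = compare_return_time_models(intervals)
best = min(aics, key=aics.get)
```

A gamma shape parameter well above 1, when favored by AIC, is the quasi-periodic signature the abstract mentions; a likelihood ratio test between the nested pair gives the formal hypothesis test.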
ERIC Educational Resources Information Center
Bolt, Daniel M.; Gierl, Mark J.
2006-01-01
Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Due to the many forms of DIF that can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present.…
Jones, P L; Swain, W T; Trammell, C J
1999-01-01
When a population is too large for exhaustive study, as is the case for all possible uses of a software system, a statistically correct sample must be drawn as a basis for inferences about the population. A Markov chain usage model is an engineering formalism that represents the population of possible uses for which a product is to be tested. In statistical testing of software based on a Markov chain usage model, the rich body of analytical results available for Markov chains provides numerous insights that can be used in both product development and test planning. A usage model is based on specifications rather than code, so insights that result from model building can inform product decisions in the early stages of a project when the opportunity to prevent problems is the greatest. Statistical testing based on a usage model provides a sound scientific basis for quantifying the reliability of software.
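Sampling test cases from a Markov chain usage model amounts to walking the chain from its start state to its terminal state. The sketch below uses a made-up usage model of a login-based application; the states and transition probabilities are illustrative assumptions, not from the paper.

```python
import numpy as np

def generate_test_case(states, P, start, stop, rng):
    """Draw one usage scenario (test case) from a Markov chain usage
    model: walk the chain from the start state until the absorbing
    stop state is reached."""
    seq = [start]
    s = start
    while s != stop:
        s = rng.choice(len(states), p=P[s])
        seq.append(s)
    return [states[i] for i in seq]

states = ["Start", "Login", "Browse", "Purchase", "Exit"]
P = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],   # Start -> Login
    [0.0, 0.0, 0.8, 0.0, 0.2],   # Login -> Browse or Exit
    [0.0, 0.0, 0.3, 0.4, 0.3],   # Browse loops, Purchase, or Exit
    [0.0, 0.0, 0.5, 0.0, 0.5],   # Purchase -> Browse or Exit
    [0.0, 0.0, 0.0, 0.0, 1.0],   # Exit is absorbing
])
rng = np.random.default_rng(5)
case = generate_test_case(states, P, 0, 4, rng)
```

The same transition matrix also yields the analytical quantities the paper alludes to, e.g., the stationary visitation rates and the expected test-case length, via standard absorbing-chain algebra.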
Comparison of Statistical Methods for Detector Testing Programs
Rennie, John Alan; Abhold, Mark
2016-10-14
A typical goal for any detector testing program is to ascertain not only the performance of the detector systems under test, but also the confidence that systems accepted using that testing program’s acceptance criteria will exceed a minimum acceptable performance (which is usually expressed as the minimum acceptable success probability, p). A similar problem often arises in statistics, where we would like to ascertain the fraction, p, of a population of items that possess a property that may take one of two possible values. Typically, the problem is approached by drawing a fixed sample of size n, with the number of items out of n that possess the desired property, x, being termed successes. The sample mean gives an estimate of the population mean p ≈ x/n, although usually it is desirable to accompany such an estimate with a statement concerning the range within which p may fall and the confidence associated with that range. Procedures for establishing such ranges and confidence limits are described in detail by Clopper, Brown, and Agresti for two-sided symmetric confidence intervals.
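The interval construction referenced at the end of this abstract (the Clopper-Pearson "exact" interval, one of the procedures described by Clopper, Brown, and Agresti) can be computed from beta-distribution quantiles. The detector counts below are illustrative.

```python
from scipy import stats

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided confidence interval for the
    success probability p after observing x successes in n trials,
    via beta-distribution quantiles."""
    a = 1 - conf
    lo = stats.beta.ppf(a / 2, x, n - x + 1) if x > 0 else 0.0
    hi = stats.beta.ppf(1 - a / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

# Example acceptance question: 48 of 50 detectors passed; does the 95%
# interval for p stay above a minimum acceptable success probability?
lo, hi = clopper_pearson(48, 50)
```

If a program's minimum acceptable p lies below `lo`, the sample supports acceptance at the stated confidence; the Brown et al. and Agresti-Coull intervals trade the exact interval's conservatism for shorter length.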
Transfer of drug dissolution testing by statistical approaches: Case study
AL-Kamarany, Mohammed Amood; EL Karbane, Miloud; Ridouan, Khadija; Alanazi, Fars K.; Hubert, Philippe; Cherrah, Yahia; Bouklouze, Abdelaziz
2011-01-01
The analytical transfer is a complete process that consists in transferring an analytical procedure from a sending laboratory to a receiving laboratory, after having experimentally demonstrated that the latter also masters the procedure, in order to avoid problems in the future. Method transfer is now commonplace during the life cycle of an analytical method in the pharmaceutical industry. No official guideline exists for a transfer methodology in pharmaceutical analysis, and the regulatory wording on transfer is more ambiguous than that for validation. Therefore, in this study, gauge repeatability and reproducibility (R&R) studies, together with other appropriate multivariate statistics, were successfully applied to the transfer of the dissolution test of diclofenac sodium, as a case study, from a sending laboratory A (an accredited laboratory) to a receiving laboratory B. The HPLC method for the determination of the percent release of diclofenac sodium in solid pharmaceutical forms (one the innovator product and the other a generic) was validated using the accuracy profile (total error) approach in the sending laboratory A. The results showed that the receiving laboratory B masters the dissolution test process, using the same HPLC analytical procedure developed in laboratory A. In conclusion, if the sender used total error to validate its analytical method, the dissolution test can be successfully transferred without the receiving laboratory B having to repeat the analytical method validation, and the state of the pharmaceutical analysis method should be maintained to ensure the same reliable results in the receiving laboratory. PMID:24109204
Testing Punctuated Equilibrium Theory Using Evolutionary Activity Statistics
NASA Astrophysics Data System (ADS)
Woodberry, O. G.; Korb, K. B.; Nicholson, A. E.
The Punctuated Equilibrium hypothesis (Eldredge and Gould, 1972) asserts that most evolutionary change occurs during geologically rapid speciation events, with species exhibiting stasis most of the time. Punctuated Equilibrium is a natural extension of Mayr's theories on peripatric speciation via the founder effect (Mayr, 1963; Eldredge and Gould, 1972), which associates changes in diversity with a population bottleneck. That is, while the formation of a founder bottleneck brings an initial loss of genetic variation, it may subsequently result in the emergence of a child species distinctly different from its parent species. In this paper we adapt Bedau's evolutionary activity statistics (Bedau and Packard, 1991) to test these effects in an ALife simulation of speciation. We find a relative increase in evolutionary activity during speciation events, indicating that punctuation is occurring.
A test statistic for the affected-sib-set method.
Lange, K
1986-07-01
This paper discusses generalizations of the affected-sib-pair method. First, the requirement that sib identity-by-descent relations be known unambiguously is relaxed by substituting sib identity-by-state relations. This permits affected sibs to be used even when their parents are unavailable for typing. In the limit of an infinite number of marker alleles each of infinitesimal population frequency, the identity-by-state relations coincide with the usual identity-by-descent relations. Second, a weighted pairs test statistic is proposed that covers affected sib sets of size greater than two. These generalizations make the affected-sib-pair method a more powerful technique for detecting departures from independent segregation of disease and marker phenotypes. A sample calculation suggests such a departure for tuberculoid leprosy and the HLA D locus.
Ergodicity testing for anomalous diffusion: Small sample statistics
NASA Astrophysics Data System (ADS)
Janczura, Joanna; Weron, Aleksander
2015-04-01
The analysis of trajectories recorded in experiments often requires calculating time averages instead of ensemble averages. According to the Boltzmann hypothesis, they are equivalent only under the assumption of ergodicity. In this paper, we implement tools that allow us to study ergodic properties. This analysis is conducted for two classes of anomalous diffusion processes: fractional Brownian motion and the subordinated Ornstein-Uhlenbeck process. We show that only the first of them is ergodic. We demonstrate this by applying rigorous statistical methods: mean square displacement, confidence intervals, and the dynamical functional test. Our methodology is universal and can be implemented for the analysis of many kinds of experimental data, not only when a large sample is available but also when only a few trajectories are recorded.
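The ergodicity check described in the abstract, comparing time averages with ensemble averages of the mean square displacement (MSD), can be sketched for the simplest ergodic case: ordinary Brownian motion, the H = 1/2 limit of fractional Brownian motion. The trajectory counts, step sizes, and lag below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate ordinary Brownian motion, the H = 1/2 (ergodic) case of
# fractional Brownian motion: cumulative sums of Gaussian increments.
n_traj, n_steps, dt = 200, 1000, 1.0
paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_traj, n_steps)), axis=1)

def time_averaged_msd(x, lag):
    """Time-averaged mean square displacement of one trajectory at a given lag."""
    disp = x[lag:] - x[:-lag]
    return np.mean(disp ** 2)

lag = 10
# Ensemble average: <x(t)^2> across trajectories at time t = lag * dt.
ensemble_msd = np.mean(paths[:, lag - 1] ** 2)
# Time average: computed along each trajectory, then averaged for display.
time_msd = np.mean([time_averaged_msd(p, lag) for p in paths])

# For an ergodic process both estimates converge to the same value (lag * dt).
print(ensemble_msd, time_msd)
```

For a non-ergodic process such as the subordinated Ornstein-Uhlenbeck process studied in the paper, the two estimates would systematically disagree.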
New statistical PDFs: Predictions and tests up to LHC energies
NASA Astrophysics Data System (ADS)
Soffer, Jacques; Bourrely, Claude
2017-03-01
The quantum statistical parton distributions approach proposed more than a decade ago is revisited by considering a larger set of recent and accurate Deep Inelastic Scattering experimental results. It enables us to improve the description of the data by means of a new determination of the parton distributions. This global next-to-leading order QCD analysis leads to a good description of several structure functions, involving unpolarized parton distributions and helicity distributions, in a broad range of x and Q², and in terms of a rather small number of free parameters. There are several challenging issues, in particular the behavior of d̄(x)/ū(x) at large x, a possible large positive gluon helicity distribution, etc. The predictions of this theoretical approach will be tested for single-jet production and charge asymmetry in W± production in p̄p and pp collisions up to LHC energies, using recent data and also for forthcoming experimental results.
Statistical methods for the blood beryllium lymphocyte proliferation test
Frome, E.L.; Smith, M.H.; Littlefield, L.G.
1996-10-01
The blood beryllium lymphocyte proliferation test (BeLPT) is a modification of the standard lymphocyte proliferation test that is used to identify persons who may have chronic beryllium disease. A major problem in the interpretation of BeLPT test results is outlying data values among the replicate well counts (approximately 7%). A log-linear regression model is used to describe the expected well counts for each set of Be exposure conditions, and the variance of the well counts is proportional to the square of the expected count. Two outlier-resistant regression methods are used to estimate stimulation indices (SIs) and the coefficient of variation. The first approach uses least absolute values (LAV) on the log of the well counts as a method for estimation; the second approach uses a resistant regression version of maximum quasi-likelihood estimation. A major advantage of these resistant methods is that they make it unnecessary to identify and delete outliers. These two new methods for the statistical analysis of the BeLPT data and the current outlier rejection method are applied to 173 BeLPT assays. We strongly recommend the LAV method for routine analysis of the BeLPT. Outliers are important when trying to identify individuals with beryllium hypersensitivity, since these individuals typically have large positive SI values. A new method for identifying large SIs using combined data from the nonexposed group and the beryllium workers is proposed. The log(SI)s are described with a Gaussian distribution with location and scale parameters estimated using resistant methods. This approach is applied to the test data and results are compared with those obtained from the current method. 24 refs., 9 figs., 8 tabs.
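As a minimal illustration of why the least-absolute-values approach resists outliers, note that the LAV fit of a constant to the log counts is simply their median, whereas the least-squares fit is their mean. The replicate counts below are hypothetical, not BeLPT data.

```python
import numpy as np

# Hypothetical replicate well counts: nine consistent replicates plus one outlier.
counts = np.array([1200, 1150, 1250, 1180, 1220, 1190, 1240, 1160, 1210, 9000])
log_counts = np.log(counts)

# The LAV fit of a constant is the median of the log counts: the outlier
# barely moves it.
lav_estimate = np.exp(np.median(log_counts))

# The least-squares fit of a constant is the mean of the log counts
# (the geometric mean of the raw counts), which the outlier drags upward.
ls_estimate = np.exp(np.mean(log_counts))

print(lav_estimate, ls_estimate)
```

The LAV estimate stays near the bulk of the replicates (about 1205 here) while the least-squares estimate is pulled well above them, which is the motivation for resistant methods: no outlier identification or deletion step is needed.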
Statistical tests of additional plate boundaries from plate motion inversions
NASA Technical Reports Server (NTRS)
Stein, S.; Gordon, R. G.
1984-01-01
The application of the F-ratio test, a standard statistical technique, to the results of relative plate motion inversions has been investigated. The method tests whether the improvement in fit of the model to the data resulting from the addition of another plate to the model is greater than that expected purely by chance. This approach appears to be useful in determining whether additional plate boundaries are justified. Previous results have been confirmed favoring separate North American and South American plates with a boundary located between 30 N and the equator. Using Chase's global relative motion data, it is shown that in addition to separate West African and Somalian plates, separate West Indian and Australian plates, with a best-fitting boundary between 70 E and 90 E, can be resolved. These results are generally consistent with the observation that the Indian plate's internal deformation extends somewhat westward of the Ninetyeast Ridge. The relative motion pole is similar to Minster and Jordan's and predicts the NW-SE compression observed in earthquake mechanisms near the Ninetyeast Ridge.
A statistical design for testing apomictic diversification through linkage analysis.
Zeng, Yanru; Hou, Wei; Song, Shuang; Feng, Sisi; Shen, Lin; Xia, Guohua; Wu, Rongling
2014-03-01
The capacity of apomixis to generate maternal clones through seed reproduction has made it a useful characteristic for the fixation of heterosis in plant breeding. It has been observed that apomixis displays pronounced intra- and interspecific diversification, but the genetic mechanisms underlying this diversification remain elusive, obstructing the exploitation of this phenomenon in practical breeding programs. By capitalizing on molecular information in mapping populations, we describe and assess a statistical design that deploys linkage analysis to estimate and test the pattern and extent of apomictic differences at various levels from genotypes to species. The design is based on two reciprocal crosses between two individuals each chosen from a hermaphrodite or monoecious species. A multinomial distribution likelihood is constructed by combining marker information from two crosses. The EM algorithm is implemented to estimate the rate of apomixis and test its difference between two plant populations or species as the parents. The design is validated by computer simulation. A real data analysis of two reciprocal crosses between hickory (Carya cathayensis) and pecan (C. illinoensis) demonstrates the utilization and usefulness of the design in practice. The design provides a tool to address fundamental and applied questions related to the evolution and breeding of apomixis.
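The EM estimation step can be illustrated with a deliberately simplified model, not the authors' full multinomial likelihood over two reciprocal crosses: suppose a heterozygous Aa mother produces either maternal clones (apomixis, rate v) or selfed sexual offspring, so that only the Aa offspring class mixes the two origins. The model and all counts here are hypothetical.

```python
# Toy model (hypothetical, much simpler than the paper's two-cross design):
# an Aa mother produces apomictic offspring (maternal clones, all Aa) with
# rate v, or selfed sexual offspring (AA : Aa : aa in 1/4 : 1/2 : 1/4)
# with rate 1 - v. Only the Aa class is a mixture of the two origins.
n_AA, n_Aa, n_aa = 12, 70, 18          # hypothetical offspring genotype counts
n = n_AA + n_Aa + n_aa

v = 0.5                                 # initial guess for the apomixis rate
for _ in range(200):
    # E-step: posterior probability that an Aa offspring is a maternal clone.
    p_apo = v / (v + (1 - v) * 0.5)
    # M-step: expected number of apomictic offspring over the total.
    v = n_Aa * p_apo / n

print(round(v, 4))  # converges to the closed-form MLE, 0.4, for these counts
```

The AA and aa classes can only arise sexually, so they pin down 1 - v; the EM iterations split the ambiguous Aa class between the two origins until the estimate stabilizes.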
Development and testing of improved statistical wind power forecasting methods.
Mendes, J.; Bessa, R.J.; Keko, H.; Sumaili, J.; Miranda, V.; Ferreira, C.; Gama, J.; Botterud, A.; Zhou, Z.; Wang, J.
2011-12-06
Wind power forecasting (WPF) provides important inputs to power system operators and electricity market participants. It is therefore not surprising that WPF has attracted increasing interest within the electric power industry. In this report, we document our research on improving statistical WPF algorithms for point, uncertainty, and ramp forecasting. Below, we provide a brief introduction to the research presented in the following chapters. For a detailed overview of the state-of-the-art in wind power forecasting, we refer to [1]. Our related work on the application of WPF in operational decisions is documented in [2]. Point forecasts of wind power are highly dependent on the training criteria used in the statistical algorithms that are used to convert weather forecasts and observational data to a power forecast. In Chapter 2, we explore the application of information theoretic learning (ITL) as opposed to the classical minimum square error (MSE) criterion for point forecasting. In contrast to the MSE criterion, ITL criteria do not assume a Gaussian distribution of the forecasting errors. We investigate to what extent ITL criteria yield better results. In addition, we analyze time-adaptive training algorithms and how they enable WPF algorithms to cope with non-stationary data and, thus, to adapt to new situations without requiring additional offline training of the model. We test the new point forecasting algorithms on two wind farms located in the U.S. Midwest. Although there have been advancements in deterministic WPF, a single-valued forecast cannot provide information on the dispersion of observations around the predicted value. We argue that it is essential to generate, together with (or as an alternative to) point forecasts, a representation of the wind power uncertainty. Wind power uncertainty representation can take the form of probabilistic forecasts (e.g., probability density function, quantiles), risk indices (e.g., prediction risk index) or scenarios
A Unifying Framework for Teaching Nonparametric Statistical Tests
ERIC Educational Resources Information Center
Bargagliotti, Anna E.; Orrison, Michael E.
2014-01-01
Increased importance is being placed on statistics at both the K-12 and undergraduate level. Research divulging effective methods to teach specific statistical concepts is still widely sought after. In this paper, we focus on best practices for teaching topics in nonparametric statistics at the undergraduate level. To motivate the work, we…
McDonald, Janie; Gerard, Patrick D; McMahan, Christopher S; Schucany, William R
2016-12-01
Clustered binary data occur frequently in many application areas. When analyzing data of this form, ignoring key features, such as the intracluster correlation, may lead to inaccurate inference; e.g., inflated Type I error rates. For clustered binary data, Gerard and Schucany (2007) proposed an exact test for examining whether the marginal probability of a response differs from 0.5, which is the null hypothesis considered in the classic sign test. This new test maintains the specified Type I error rate and has more power, when compared to both the classic sign and permutation tests. The test statistic proposed by these authors equally weights the observed data from each cluster, regardless of whether the clusters are of equal size. To further improve the performance of the Gerard and Schucany test, a weighted test statistic is proposed and two weighting schemes are investigated. Seeking to further improve the performance of the proposed test, empirical Bayes estimates of the cluster level success probabilities are utilized. These adaptations lead to five new tests, each of which is shown through simulation studies to be superior to the Gerard and Schucany (2007) test. The proposed tests are further illustrated using data from a chemical repellency trial.
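A sketch of the general idea, weighting per-cluster sign contributions and calibrating against a cluster-level permutation null, follows. This is not the authors' exact statistic, weighting schemes, or exact test; the data and the size-based weights are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical clustered binary data: each array is one cluster's responses.
clusters = [np.array(c) for c in ([1, 1, 0], [1, 1, 1, 1], [0, 1],
                                  [1, 0, 1, 1, 0], [1, 1])]

def weighted_sign_stat(clusters, weights):
    """Weighted sum of centered cluster success proportions (H0: p = 0.5)."""
    return sum(w * (c.mean() - 0.5) for w, c in zip(weights, clusters))

# One candidate weighting scheme: weight each cluster by its size.
weights = [len(c) for c in clusters]
obs = weighted_sign_stat(clusters, weights)

# Permutation null: flip each cluster's responses (0 <-> 1) with probability
# 1/2, which is a symmetry of the null that the success probability is 0.5.
n_perm = 5000
null = []
for _ in range(n_perm):
    flips = rng.integers(0, 2, size=len(clusters))
    flipped = [1 - c if f else c for c, f in zip(clusters, flips)]
    null.append(weighted_sign_stat(flipped, weights))
p_value = np.mean(np.abs(null) >= abs(obs))

print(obs, p_value)
```

Because the permutation operates on whole clusters, the intracluster correlation is preserved under the null, which is the feature that naive observation-level tests ignore.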
Strong gravitational lensing statistics as a test of cosmogonic scenarios
NASA Technical Reports Server (NTRS)
Cen, Renyue; Gott, J. Richard, III; Ostriker, Jeremiah P.; Turner, Edwin L.
1994-01-01
Gravitational lensing statistics can provide a direct and powerful test of cosmic structure formation theories. Since lensing tests, directly, the magnitude of the nonlinear mass density fluctuations on lines of sight to distant objects, no issues of 'bias' (of mass fluctuations with respect to galaxy density fluctuations) exist here, although lensing observations provide their own ambiguities of interpretation. We develop numerical techniques for generating model density distributions with the very large spatial dynamic range required by lensing considerations and for identifying regions of the simulations capable of multiple image lensing in a conservative and computationally efficient way that should be accurate for splittings significantly larger than 3 seconds. Applying these techniques to existing standard Cold dark matter (CDM) (Omega = 1) and Primeval Baryon Isocurvature (PBI) (Omega = 0.2) simulations (normalized to the Cosmic Background Explorer Satellite (COBE) amplitude), we find that the CDM model predicts large splitting (greater than 8 seconds) lensing events roughly an order-of-magnitude more frequently than the PBI model. Under the reasonable but idealized assumption that lensing structures can be modeled as singular isothermal spheres (SIS), the predictions can be directly compared to observations of lensing events in quasar samples. Several large splitting (Delta Theta is greater than 8 seconds) cases are predicted in the standard CDM model (the exact number being dependent on the treatment of amplification bias), whereas none is observed. In a formal sense, the comparison excludes the CDM model at high confidence (essentially for the same reason that CDM predicts excessive small-scale cosmic velocity dispersions). A very rough assessment of a low-density but flat CDM model (Omega = 0.3, Lambda/3H(sup 2 sub 0) = 0.7) indicates a far lower and probably acceptable level of lensing. The PBI model is consistent with, but not strongly tested by, the
A statistical framework for testing modularity in multidimensional data.
Márquez, Eladio J
2008-10-01
Modular variation of multivariate traits results from modular distribution of effects of genetic and epigenetic interactions among those traits. However, statistical methods rarely detect truly modular patterns, possibly because the processes that generate intramodular associations may overlap spatially. Methodologically, this overlap may cause multiple patterns of modularity to be equally consistent with observed covariances. To deal with this indeterminacy, the present study outlines a framework for testing a priori hypotheses of modularity in which putative modules are mathematically represented as multidimensional subspaces embedded in the data. Model expectations are computed by subdividing the data into arrays of variables, and intermodular interactions are represented by overlapping arrays. Covariance structures are thus modeled as the outcome of complex and nonorthogonal intermodular interactions. This approach is demonstrated by analyzing mandibular modularity in nine rodent species. A total of 620 models are fit to each species, and the most strongly supported are heuristically modified to improve their fit. Five modules common to all species are identified, which approximately map to the developmental modules of the mandible. Within species, these modules are embedded within larger "super-modules," suggesting that these conserved modules act as building blocks from which covariation patterns are built.
ERIC Educational Resources Information Center
Wainer, Howard; And Others
Four researchers at the Educational Testing Service describe what they consider some of the most vexing research problems they face. While these problems are not completely statistical, they all have major statistical components. Following the introduction (section 1), in section 2, "Problems with the Simultaneous Estimation of Many True…
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
Testing the Dark Energy with Gravitational Lensing Statistics
NASA Astrophysics Data System (ADS)
Cao, Shuo; Covone, Giovanni; Zhu, Zong-Hong
2012-08-01
We study the redshift distribution of two samples of early-type gravitational lenses, extracted from a larger collection of 122 systems, to constrain the cosmological constant in the ΛCDM model and the parameters of a set of alternative dark energy models (XCDM, Dvali-Gabadadze-Porrati, and Ricci dark energy models), in a spatially flat universe. The likelihood is maximized for ΩΛ = 0.70 ± 0.09 when considering the sample excluding the Sloan Lens ACS systems (known to be biased toward large image-separation lenses) and no-evolution, and ΩΛ = 0.81 ± 0.05 when limiting to gravitational lenses with image separation Δθ > 2'' and no-evolution. In both cases, results accounting for galaxy evolution are consistent within 1σ. The present test supports the accelerated expansion, by excluding the null hypothesis (i.e., ΩΛ = 0) at more than 4σ, regardless of the chosen sample and assumptions on the galaxy evolution. A comparison between competitive world models is performed by means of the Bayesian information criterion. This shows that the simplest cosmological constant model, which has only one free parameter, is still preferred by the available data on the redshift distribution of gravitational lenses. We perform an analysis of the possible systematic effects, finding that the systematic errors due to sample incompleteness, galaxy evolution, and model uncertainties approximately equal the statistical errors, with present-day data. We find that the largest sources of systematic errors are the dynamical normalization and the high-velocity cutoff factor, followed by the faint-end slope of the velocity dispersion function.
Statistical tests for taxonomic distinctiveness from observations of monophyly.
Rosenberg, Noah A
2007-02-01
The observation of monophyly for a specified set of genealogical lineages is often used to place the lineages into a distinctive taxonomic entity. However, it is sometimes possible that monophyly of the lineages can occur by chance as an outcome of the random branching of lineages within a single taxon. Thus, especially for small samples, an observation of monophyly for a set of lineages--even if strongly supported statistically--does not necessarily indicate that the lineages are from a distinctive group. Here I develop a test of the null hypothesis that monophyly is a chance outcome of random branching. I also compute the sample size required so that the probability of chance occurrence of monophyly of a specified set of lineages lies below a prescribed tolerance. Under the null model of random branching, the probability that monophyly of the lineages in an index group occurs by chance is substantial if the sample is highly asymmetric, that is, if only a few of the sampled lineages are from the index group, or if only a few lineages are external to the group. If sample sizes are similar inside and outside the group of interest, however, chance occurrence of monophyly can be rejected at stringent significance levels (P < 10^(-5)) even for quite small samples (approximately 20 total lineages). For a fixed total sample size, rejection of the null hypothesis of random branching in a single taxon occurs at the most stringent level if samples of nearly equal size inside and outside the index group--with a slightly greater size within the index group--are used. Similar results apply, with smaller sample sizes needed, when reciprocal monophyly of two groups, rather than monophyly of a single group, is of interest. The results suggest minimal sample sizes required for inferences to be made about taxonomic distinctiveness from observations of monophyly.
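The null model can be explored by simulation: build random-joining (coalescent-style) trees and count how often a specified index group comes out monophyletic by chance. For n = 4 lineages with an index group of size 2, a direct case analysis of the first two coalescences gives a chance probability of 2/9, which the sketch below reproduces; the sample size and index set are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def monophyletic(n, index, rng):
    """Simulate one random-joining (coalescent) tree on n lineages and
    report whether the index set ends up monophyletic."""
    groups = [frozenset([i]) for i in range(n)]
    index = frozenset(index)
    while len(groups) > 1:
        i, j = rng.choice(len(groups), size=2, replace=False)
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        if merged == index:
            return True   # the index lineages united before mixing with outsiders
        if merged & index and merged - index:
            return False  # an index lineage merged with an outsider first
    return False

n, index, trials = 4, {0, 1}, 20000
p_hat = np.mean([monophyletic(n, index, rng) for _ in range(trials)])
print(p_hat)  # ≈ 2/9 ≈ 0.22 for n = 4 and an index group of size 2
```

A chance probability this large illustrates the paper's point: for small, asymmetric samples, observed monophyly alone is weak evidence of taxonomic distinctiveness.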
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Statistical distributions of site and background soil samples often do not meet the assumptions of statistical tests. This is true even of non-parametric tests. This paper evaluates several statistical tests over a variety of cases involving realistic population distribution scen...
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
The Geometry of Probability, Statistics, and Test Theory.
ERIC Educational Resources Information Center
Zimmerman, Donald W.; Zumbo, Bruno D.
2001-01-01
Presents a model of tests and measurement that identifies test scores with Hilbert space vectors and true and error components of scores with linear operators. This geometric point of view brings to light relations among elementary concepts in test theory, including reliability, validity, and parallel tests. (Author/SLD)
Stork, LeAnna M.; Gennings, Chris; Carchman, Richard; Carter, Jr., Walter H.; Pounds, Joel G.; Mumtaz, Moiz
2006-12-01
Several assumptions, defined and undefined, are used in the toxicity assessment of chemical mixtures. In scientific practice, mixture components in the low-dose region, particularly subthreshold doses, are often assumed to behave additively (i.e., zero interaction) based on heuristic arguments. This assumption has important implications in the practice of risk assessment, but has not been experimentally tested. We have developed methodology to test for additivity in the sense of Berenbaum (Advances in Cancer Research, 1981), based on the statistical equivalence testing literature where the null hypothesis of interaction is rejected for the alternative hypothesis of additivity when data support the claim. The implication of this approach is that conclusions of additivity are made with a false positive rate controlled by the experimenter. The claim of additivity is based on prespecified additivity margins, which are chosen using expert biological judgment such that small deviations from additivity, which are not considered to be biologically important, are not statistically significant. This approach is in contrast to the usual hypothesis-testing framework that assumes additivity in the null hypothesis and rejects when there is significant evidence of interaction. In this scenario, failure to reject may be due to lack of statistical power, making the claim of additivity problematic. The proposed method is illustrated in a mixture of five organophosphorus pesticides that were experimentally evaluated alone and at relevant mixing ratios. Motor activity was assessed in adult male rats following acute exposure. Four low-dose mixture groups were evaluated. Evidence of additivity is found in three of the four low-dose mixture groups. The proposed method tests for additivity of the whole mixture and does not take into account subset interactions (e.g., synergistic, antagonistic) that may have occurred and cancelled each other out.
Interpretation of Statistical Significance Testing: A Matter of Perspective.
ERIC Educational Resources Information Center
McClure, John; Suen, Hoi K.
1994-01-01
This article compares three models that have been the foundation for approaches to the analysis of statistical significance in early childhood research--the Fisherian and the Neyman-Pearson models (both considered "classical" approaches), and the Bayesian model. The article concludes that all three models have a place in the analysis of research…
Ensuring Positiveness of the Scaled Difference Chi-Square Test Statistic
ERIC Educational Resources Information Center
Satorra, Albert; Bentler, Peter M.
2010-01-01
A scaled difference test statistic T[tilde][subscript d] that can be computed from standard software of structural equation models (SEM) by hand calculations was proposed in Satorra and Bentler (Psychometrika 66:507-514, 2001). The statistic T[tilde][subscript d] is asymptotically equivalent to the scaled difference test statistic T[bar][subscript…
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.
ERIC Educational Resources Information Center
Breunig, Nancy A.
Despite the increasing criticism of statistical significance testing by researchers, particularly since the publication of the 1994 American Psychological Association style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
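The sampling distribution at the heart of this logic is easy to make concrete by simulation: draw many samples of size n from a population, compute the mean of each, and watch the spread of those means shrink like 1/sqrt(n). The uniform population and sample sizes below are illustrative choices, not from the article.

```python
import random
import statistics

def sampling_dist_of_mean(n, n_samples=5000, seed=7):
    """Standard deviation of the sample mean across repeated samples
    from a Uniform(0, 1) population (population sigma = sqrt(1/12) ~ 0.2887)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(n_samples)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    # each value should be close to 0.2887 / sqrt(n)
    print(n, round(sampling_dist_of_mean(n), 3))
```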
Statistical algorithms for a comprehensive test ban treaty discrimination framework
Foote, N.D.; Anderson, D.N.; Higbee, K.T.; Miller, N.E.; Redgate, T.; Rohay, A.C.; Hagedorn, D.N.
1996-10-01
Seismic discrimination is the process of identifying a candidate seismic event as an earthquake or explosion using information from seismic waveform features (seismic discriminants). In the CTBT setting, low energy seismic activity must be detected and identified. A defensible CTBT discrimination decision requires an understanding of false-negative (declaring an event to be an earthquake given it is an explosion) and false-positive (declaring an event to be an explosion given it is an earthquake) rates. These rates are derived from a statistical discrimination framework. A discrimination framework can be as simple as a single statistical algorithm or it can be a mathematical construct that integrates many different types of statistical algorithms and CTBT technologies. In either case, the result is the identification of an event and the numerical assessment of the accuracy of an identification, that is, false-negative and false-positive rates. In Anderson et al., eight statistical discrimination algorithms are evaluated relative to their ability to give results that effectively contribute to a decision process and to be interpretable with physical (seismic) theory. These algorithms can be discrimination frameworks individually or components of a larger framework. The eight algorithms are linear discrimination (LDA), quadratic discrimination (QDA), variably regularized discrimination (VRDA), flexible discrimination (FDA), logistic discrimination, K-th nearest neighbor (KNN), kernel discrimination, and classification and regression trees (CART). In this report, the performance of these eight algorithms, as applied to regional seismic data, is documented. Based on the findings in Anderson et al. and this analysis, CART is an appropriate algorithm for an automated CTBT setting.
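A minimal sketch of the discrimination setting and its two error rates, with hypothetical one-dimensional feature values and class means (none of these numbers are from the report): for equal class variances and equal priors, linear discrimination in one dimension reduces to a midpoint threshold, and the false-positive/false-negative rates are simply the misclassification fractions within each class.

```python
# With equal-variance Gaussian classes and equal priors, 1-D LDA
# reduces to thresholding the feature at the midpoint of the class means.
def classify(feature, mu_quake, mu_explosion):
    threshold = (mu_quake + mu_explosion) / 2.0
    return "explosion" if feature > threshold else "earthquake"

# Hypothetical labeled events: (discriminant value, true class)
events = [(3.1, "earthquake"), (2.8, "earthquake"), (5.2, "explosion"),
          (4.9, "explosion"), (4.6, "earthquake"), (3.4, "explosion")]
mu_q, mu_x = 3.0, 5.0   # hypothetical class means from training data

# False positive: earthquake declared an explosion.
false_pos = sum(1 for f, lab in events
                if lab == "earthquake" and classify(f, mu_q, mu_x) == "explosion")
# False negative: explosion declared an earthquake.
false_neg = sum(1 for f, lab in events
                if lab == "explosion" and classify(f, mu_q, mu_x) == "earthquake")
print(false_pos, false_neg)   # 1 1
```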
Evaluating clinical significance: incorporating robust statistics with normative comparison tests.
van Wieringen, Katrina; Cribbie, Robert A
2014-05-01
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann-Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann-Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann-Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann-Welch tests, and the power of the Schuirmann-Yuen was substantially greater than that of the Schuirmann or Schuirmann-Welch tests when distributions were skewed or outliers were present. The Schuirmann-Yuen test is recommended for assessing clinical significance with normative comparisons.
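The two-one-sided-tests logic behind the Schuirmann family can be sketched as follows. This is a normal-approximation version for illustration only; the Schuirmann-Yuen variant studied in the article would replace the mean difference and its standard error with trimmed-mean counterparts, and the margins and inputs below are invented.

```python
from statistics import NormalDist

def tost_equivalence(diff, se, margin, alpha=0.05):
    """Schuirmann's two one-sided tests, in its equivalent confidence-
    interval form: declare equivalence if the (1 - 2*alpha) CI for the
    difference lies entirely inside (-margin, +margin).
    Normal approximation used for simplicity."""
    z = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value
    lower = diff - z * se
    upper = diff + z * se
    return -margin < lower and upper < margin

# Illustrative inputs: observed difference, its SE, equivalence margin.
print(tost_equivalence(0.1, 0.2, 0.5))   # True  -> equivalent to the norm
print(tost_equivalence(0.4, 0.2, 0.5))   # False -> equivalence not shown
```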
Statistics of sampling for microbiological testing of foodborne pathogens
USDA-ARS?s Scientific Manuscript database
Despite the many recent advances in protocols for testing for pathogens in foods, a number of challenges still exist. For example, the microbiological safety of food cannot be completely ensured by testing because microorganisms are not evenly distributed throughout the food. Therefore, since it i...
Statistical Revisions in the Washington Pre-College Testing Program.
ERIC Educational Resources Information Center
Beanblossom, Gary F.; And Others
The Washington Pre-College (WPC) program decided, in fall 1967, to inaugurate in April 1968 the testing of high school students during the spring of their junior year. The advantages of this shift from senior year testing were to provide guidance data for earlier, more extensive use in high school and to make these data available to colleges at…
Estimating Statistical Power When Making Adjustments for Multiple Tests
ERIC Educational Resources Information Center
Porter, Kristin E.
2016-01-01
In recent years, there has been increasing focus on the issue of multiple hypotheses testing in education evaluation studies. In these studies, researchers are typically interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time or across multiple treatment groups. When…
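A rough sketch of the kind of power calculation at issue, using a two-sided one-sample z-test and a plain Bonferroni adjustment (the simplest multiplicity correction; the effect size, sample size, and test count below are illustrative, and the article's setting is more general):

```python
from statistics import NormalDist

def power_bonferroni(effect, n, m_tests, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test for a
    standardized effect size `effect` with n observations, after a
    Bonferroni adjustment for m_tests hypotheses."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - (alpha / m_tests) / 2)   # adjusted critical value
    shift = effect * n ** 0.5                        # noncentrality
    return 1 - nd.cdf(z_crit - shift) + nd.cdf(-z_crit - shift)

print(round(power_bonferroni(0.3, 100, 1), 2))    # 0.85  (no adjustment)
print(round(power_bonferroni(0.3, 100, 10), 2))   # 0.58  (power drops)
```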
Statistical tests for closure of plate motion circuits
NASA Technical Reports Server (NTRS)
Gordon, Richard G.; Stein, Seth; DeMets, Charles; Argus, Donald F.
1987-01-01
Two methods of testing plate motion circuit closure, one based on a chi-square test and the second on an F-ratio test, are described and evaluated. The chi-square test is used to evaluate goodness of fit, and it is assumed that the assigned errors are accurate estimates of the true errors in the data. The F-ratio test is used to compare variances of distributions, and it is assumed that the relative values of assigned error are accurate. The two methods are applied to the data of Minster and Jordan (1978) on the motion of the three plates that meet at the Galapagos Triple Junction, and the motion of the three plates that meet at the Indian Ocean Triple Junction. It is noted that the F-ratio plate circuit closure test is more useful than the chi-square test for identifying systematic misfits in data because the chi-square test overestimates the errors of plate motion data.
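The two statistics can be sketched directly; the residuals, assigned errors, and degrees of freedom below are illustrative placeholders, not the Minster-Jordan data:

```python
def chi_square_stat(residuals, sigmas):
    """Goodness-of-fit statistic: squared residuals in units of the
    assigned errors. Behaves as a chi-square variate only if the
    assigned errors are accurate estimates of the true errors."""
    return sum((r / s) ** 2 for r, s in zip(residuals, sigmas))

def f_ratio(chi2_a, dof_a, chi2_b, dof_b):
    """Ratio of reduced chi-squares of two fits; here only the
    *relative* sizes of the assigned errors need to be accurate."""
    return (chi2_a / dof_a) / (chi2_b / dof_b)

print(chi_square_stat([1.0, -2.0, 0.5], [1.0, 2.0, 0.5]))  # 3.0
print(f_ratio(12.0, 6, 4.0, 4))                            # 2.0
```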
Kotliar, K E; Lanzl, I M
2016-10-01
The use and understanding of statistics are very important for biomedical research and for clinical practice. This is particularly true for assessing the possibilities of different diagnostic and therapy options in the field of glaucoma. The apparent complexity and counterintuitiveness of statistics, along with its cautious acceptance by many physicians, may be the cause of conscious and unconscious manipulation in data representation and interpretation. The aim of this article is a comprehensible clarification of some typical errors in the handling of medical statistical data. Using two hypothetical examples from glaucoma diagnostics (the presentation of the effect of a hypotensive drug and the interpretation of the results of a diagnostic test), typical statistical applications and sources of error are analyzed in detail and discussed. Mechanisms of data manipulation and incorrect data interpretation are elucidated, and typical sources of error in statistical analysis and data presentation are explained. The practical examples analyzed demonstrate the need to understand the basics of statistics and to be able to apply them correctly. A lack of basic knowledge or half-knowledge of medical statistics can lead to misunderstandings, confusion and wrong decisions in medical research and also in clinical practice.
New heterogeneous test statistics for the unbalanced fixed-effect nested design.
Guo, Jiin-Huarng; Billard, L; Luh, Wei-Ming
2011-05-01
When the underlying variances are unknown and/or unequal, using the conventional F test is problematic in the two-factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander-Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed-effect two-stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation. ©2010 The British Psychological Society.
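Welch-type statistics replace the pooled variance with per-group variances and an approximate (Satterthwaite) degrees of freedom. A minimal two-sample version of this building block, which the nested-design statistics generalize, might look like the following (the data are invented):

```python
def welch_t(x, y):
    """Welch's t statistic and Satterthwaite df for two samples with
    possibly unequal variances -- the heterogeneous-variance building
    block behind Welch/Alexander-Govern style procedures."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # unbiased variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / se2 ** 0.5
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))   # -1.897 5.88
```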
A statistical approach to nondestructive testing of laser welds
Duncan, H.A.
1983-07-01
A statistical analysis of the data obtained from a relatively new nondestructive technique for laser welding is presented. In this technique, information about the quality of the welded joint is extracted from the high-intensity plume generated from the welded materials. The detected plume is processed to give a numerical value associated with material vaporization and, consequently, weld quality. Optimum thresholds for the region in which a weld can be considered acceptable are determined based on the Neyman-Pearson criterion and Bayes rule.
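A Neyman-Pearson threshold choice can be sketched as follows, assuming purely for illustration that the plume score of acceptable welds is Gaussian: fix the tolerable false-reject rate for good welds and solve for the decision threshold. The mean, spread, and rate below are hypothetical.

```python
from statistics import NormalDist

def np_threshold(mu_good, sigma, alpha):
    """Decision threshold t such that a Gaussian(mu_good, sigma) plume
    score falls below t with probability alpha -- i.e., the false-reject
    rate for acceptable welds is fixed at alpha (Neyman-Pearson style)."""
    return NormalDist(mu_good, sigma).inv_cdf(alpha)

t = np_threshold(10.0, 1.0, 0.05)   # hypothetical score distribution
print(round(t, 2))                  # 8.36
```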
Choosing statistical tests: part 12 of a series on evaluation of scientific publications.
du Prel, Jean-Baptist; Röhrig, Bernd; Hommel, Gerhard; Blettner, Maria
2010-05-01
The interpretation of scientific articles often requires an understanding of the methods of inferential statistics. This article informs the reader about frequently used statistical tests and their correct application. The most commonly used statistical tests were identified through a selective literature search on the methodology of medical research publications. These tests are discussed in this article, along with a selection of other standard methods of inferential statistics. Readers who are acquainted not just with descriptive methods, but also with Pearson's chi-square test, Fisher's exact test, and Student's t test will be able to interpret a large proportion of medical research articles. Criteria are presented for choosing the proper statistical test to be used out of the most frequently applied tests. An algorithm and a table are provided to facilitate the selection of the appropriate test.
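Of the tests named, Fisher's exact test is the one most often worked through by hand; a self-contained 2×2 version (two-sided, summing all tables with the same margins that are no more probable than the observed one, one common convention for the two-sided p) can be sketched as:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):  # P(top-left cell = x) under fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))   # tolerance for float ties

print(round(fisher_exact_2x2(3, 1, 1, 3), 4))   # 0.4857
```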
ERIC Educational Resources Information Center
Monterde-i-Bort, Hector; Frias-Navarro, Dolores; Pascual-Llobell, Juan
2010-01-01
The empirical study we present here deals with a pedagogical issue that has not been thoroughly explored up until now in our field. Previous empirical studies in other sectors have identified the opinions of researchers about this topic, showing that completely unacceptable interpretations have been made of significance tests and other statistical…
ERIC Educational Resources Information Center
Monterde-i-Bort, Hector; Frias-Navarro, Dolores; Pascual-Llobell, Juan
2010-01-01
The empirical study we present here deals with a pedagogical issue that has not been thoroughly explored up until now in our field. Previous empirical studies in other sectors have identified the opinions of researchers about this topic, showing that completely unacceptable interpretations have been made of significance tests and other statistical…
COMBAT: A Combined Association Test for Genes Using Summary Statistics.
Wang, Minghui; Huang, Jianfei; Liu, Yiyuan; Ma, Li; Potash, James B; Han, Shizhong
2017-09-06
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Traditional analysis of GWAS typically examines one marker at a time, usually single nucleotide polymorphisms (SNPs), to identify individual variants associated with a disease. However, due to the small effect sizes of common variants, the power to detect individual risk variants is generally low. As a complementary approach to SNP-level analysis, a variety of gene-based association tests have been proposed. However, the power of existing gene-based tests is often dependent on the underlying genetic models, and it is not known a priori which test is optimal. Here we propose a combined association test (COMBAT) for genes, which incorporates strengths from existing gene-based tests and shows higher overall performance than any individual test. Our method does not require raw genotype or phenotype data, but needs only SNP-level p-values and correlations between SNPs from ancestry-matched samples. Extensive simulations showed that COMBAT has an appropriate type I error rate, maintains higher power across a wide range of genetic models, and is more robust than any individual gene-based test. We further demonstrated the superior performance of COMBAT over several other gene-based tests through reanalysis of the meta-analytic results of GWAS for bipolar disorder. Our method allows for the more powerful application of gene-based analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available. Copyright © 2017, Genetics.
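COMBAT itself combines correlated gene-based tests, but the generic idea of pooling SNP-level p-values is easy to sketch with Fisher's method for independent tests; the independence assumption is exactly what correlation-aware methods like COMBAT relax, so this is a simplified illustration, not the paper's algorithm.

```python
from math import exp, log

def fisher_combine(pvals):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under
    H0 for k independent tests. For even df the survival function has a
    closed form: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    k = len(pvals)
    x = -2.0 * sum(log(p) for p in pvals)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i       # (x/2)^i / i!, built up iteratively
        total += term
    return exp(-x / 2) * total

print(round(fisher_combine([0.05]), 4))       # 0.05 (single test unchanged)
print(round(fisher_combine([0.1, 0.1]), 4))   # 0.0561
```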
Statistical tests between competing hypotheses of Hox cluster evolution.
Lanfear, Robert; Bromham, Lindell
2008-10-01
The Hox genes encode transcription factors that play vital roles in the anterior-posterior patterning of all bilaterian phyla studied to date. Additionally, the gain of Hox genes by duplication has been widely implicated as a driving force in the evolution of animal body plans. Because of this, reconstructing the evolution of the Hox cluster has been the focus of intense research interest. It has been commonly assumed that an ancestral four-gene ProtoHox cluster was duplicated early in animal evolution to give rise to the Hox and ParaHox clusters. However, this hypothesis has recently been called into question, and a number of alternative hypotheses of Hox and ParaHox gene evolution have been proposed. Here, we present the first statistical comparisons of current hypotheses of Hox and ParaHox gene evolution. We use two statistical methods that represent two different approaches to the treatment of phylogenetic uncertainty. In the first method, we estimate the maximum-likelihood tree for each hypothesis and compare these trees to one another using a parametric bootstrapping approach. In the second method, we use Bayesian phylogenetics to estimate the posterior distribution of trees, then we calculate the support for each hypothesis from this distribution. The results of both methods are largely congruent. We find that we are able to reject five out of the eight current hypotheses of Hox and ParaHox gene evolution that we consider. We conclude that the ProtoHox cluster is likely to have contained either three or four genes but that there is insufficient phylogenetic signal in the homeodomains to distinguish between these alternatives.
Leiva, David; Solanas, Antonio; Salafranca, Lluís
2008-05-01
In the present article, we focus on two indices that quantify directionality and skew-symmetrical patterns in social interactions as measures of social reciprocity: the directional consistency (DC) and skew-symmetry indices. Although both indices enable researchers to describe social groups, most studies require statistical inferential tests. The main aims of the present study are first, to propose an overall statistical technique for testing null hypotheses regarding social reciprocity in behavioral studies, using the DC and skew-symmetry statistics (Phi) at group level; and second, to compare both statistics in order to allow researchers to choose the optimal measure depending on the conditions. In order to allow researchers to make statistical decisions, statistical significance for both statistics has been estimated by means of a Monte Carlo simulation. Furthermore, this study will enable researchers to choose the optimal observational conditions for carrying out their research, since the power of the statistical tests has been estimated.
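The Monte Carlo approach to significance used for both indices can be sketched generically: simulate the statistic under the null many times and take the proportion of simulated values at least as extreme as the observed one. The statistic, null sampler, and numbers below are placeholders, not the DC or skew-symmetry computation itself.

```python
import random

def monte_carlo_pvalue(observed, stat_fn, null_sampler, n_sims=10000, seed=1):
    """Estimated P(stat >= observed) under the null distribution defined
    by null_sampler; the +1 correction keeps the estimate away from an
    impossible p of exactly zero."""
    rng = random.Random(seed)
    hits = sum(stat_fn(null_sampler(rng)) >= observed for _ in range(n_sims))
    return (hits + 1) / (n_sims + 1)

# Toy null: statistic = max of 5 uniforms; exact P(max >= 0.99) = 1 - 0.99**5
p = monte_carlo_pvalue(0.99, max, lambda rng: [rng.random() for _ in range(5)])
print(p)   # close to 1 - 0.99**5, i.e. about 0.049
```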
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Misapplication of a Statistical Test: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Weigel, Robert S.
2011-02-01
In his Forum, P. Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) argues that “the strong dependence of p values on sample size makes them uninterpretable” with an example where p values in a hypothesis test using Pearson's chi-square statistic differed by a factor of 10^16 when the sample size decreased tenfold. The data were a sequence of magnitude 4 or larger earthquake events (N = 118,415) spanning 3654 days [U.S. Geological Survey, 2010]. There are two problems with the analysis. First, Vermeesch applied the chi-square test to data with statistical properties that are inconsistent with those assumed in the derivation of the chi-square test. Second, he made an assumption that, using a straightforward calculation, can be shown to be inconsistent with the data. I address here only problems related to the application of statistics without reference to any additional physical processes that may also need to be addressed before statistical analysis is performed (e.g., the physics of how aftershocks are related to main earthquakes).
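The sample-size dependence being debated is easy to reproduce: for a fixed proportional deviation from the null, Pearson's chi-square statistic grows linearly with N, so the p-value collapses as N grows. The two-bin counts below are an illustrative example, not the earthquake data.

```python
def chi2_stat(observed, expected):
    """Pearson's chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Same 52%/48% split tested against a 50/50 null at two sample sizes:
for n in (1_000, 100_000):
    obs = [0.52 * n, 0.48 * n]
    exp = [0.50 * n, 0.50 * n]
    # the statistic scales linearly with n for a fixed relative deviation
    print(n, chi2_stat(obs, exp))
```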
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
A Statistical Test of Uniformity in Solar Cycle Indices
NASA Technical Reports Server (NTRS)
Hathaway, David H.
2012-01-01
Several indices are used to characterize the solar activity cycle. Key among these are: the International Sunspot Number, the Group Sunspot Number, Sunspot Area, and 10.7 cm Radio Flux. A valuable aspect of these indices is the length of the record -- many decades and many (different) 11-year cycles. However, this valuable length-of-record attribute has an inherent problem in that it requires many different observers and observing systems. This can lead to non-uniformity in the datasets and subsequent erroneous conclusions about solar cycle behavior. The sunspot numbers are obtained by counting sunspot groups and individual sunspots on a daily basis. This suggests that the day-to-day and month-to-month variations in these numbers should follow Poisson Statistics and be proportional to the square-root of the sunspot numbers themselves. Examining the historical records of these indices indicates that this is indeed the case - even with Sunspot Area and 10.7 cm Radio Flux. The ratios of the RMS variations to the square-root of the indices themselves are relatively constant with little variation over the phase of each solar cycle or from small to large solar cycles. There are, however, important step-like changes in these ratios associated with changes in observer and/or observer system. Here we show how these variations can be used to construct more uniform datasets.
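The Poisson signature described, variance equal to the mean so that RMS variation is proportional to the square root of the index, can be checked with a toy simulation (a stdlib-only Knuth sampler; the rates are synthetic, not sunspot data):

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's multiplicative method for Poisson sampling
    (fine for the modest rates used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
for lam in (10, 100):
    xs = [poisson_sample(lam, rng) for _ in range(20_000)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # variance/mean ratio near 1  =>  RMS variation ~ sqrt(mean)
    print(lam, round(var / mean, 2))
```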
Testing of hypotheses about altitude decompression sickness by statistical analyses
NASA Technical Reports Server (NTRS)
Van Liew, H. D.; Burkard, M. E.; Conkin, J.; Powell, M. R. (Principal Investigator)
1996-01-01
This communication extends a statistical analysis of forced-descent decompression sickness at altitude in exercising subjects (J Appl Physiol 1994; 76:2726-2734) with a data subset having an additional explanatory variable, rate of ascent. The original explanatory variables for risk-function analysis were environmental pressure of the altitude, duration of exposure, and duration of pure-O2 breathing before exposure; the best fit was consistent with the idea that instantaneous risk increases linearly as altitude exposure continues. Use of the new explanatory variable improved the fit of the smaller data subset, as indicated by log likelihood. Also, with ascent rate accounted for, replacement of the term for linear accrual of instantaneous risk by a term for rise and then decay made a highly significant improvement upon the original model (log likelihood increased by 37 log units). The authors conclude that a more representative data set and removal of the variability attributable to ascent rate allowed the rise-and-decay mechanism, which is expected from theory and observations, to become manifest.
Statistical analysis of shard and canister glass correlation test
Pulsipher, B.
1990-12-01
The vitrification facility at West Valley, New York will be used to incorporate nuclear waste into a vitrified waste form. Waste Acceptance Preliminary Specifications (WAPS) will be used to determine the acceptability of the waste form product. These specifications require chemical characterization of the waste form produced. West Valley Nuclear Services (WVNS) intends to characterize canister contents by obtaining shard samples from the top of the canisters prior to final sealing. A study was conducted to determine whether shard samples taken from the top of canisters filled with vitrified nuclear waste could be considered representative and therefore used to characterize the elemental composition of the entire canister contents. Three canisters produced during the SF-12 melter run conducted at WVNS were thoroughly sampled by core drilling at several axial and radial locations and by obtaining shard samples from the top of the canisters. Chemical analyses were performed and the resulting data were statistically analyzed by Pacific Northwest Laboratory (PNL). If one can assume that the process controls employed by WVNS during the SF-12 run are representative of those to be employed during future melter runs, shard samples can be used to characterize the canister contents. However, if batch-to-batch variations cannot be controlled to the acceptable levels observed from the SF-12 data, the representativeness of shard samples will be in question. The estimates of process and within-canister variations provided herein will prove valuable in determining the required frequency and number of shard samples to meet waste form qualification objectives.
Experimental tests of a statistical mechanics of static granular media
NASA Astrophysics Data System (ADS)
Schröter, Matthias
2005-11-01
In 1989 Edwards and Oakeshott proposed a statistical mechanics theory of static granular materials described by a temperature-like state variable named compactivity [1]. We have made the first measurement of the compactivity of a granular material [2]. We have examined a granular column driven by flow pulses and have found that the system explores its phase space of mechanically stable configurations in a history-independent way. The system quickly approaches a steady state; the volume fluctuations about this steady state are Gaussian. The mean volume fraction can be varied by changing the flow rate of the pulses. We calculate the compactivity from the standard deviation of the volume fluctuations [3]. This talk will address the following two questions: (a) Are compactivity values measured with our "thermometer" different from values one might measure with a "thermometer" based on the grain volume distribution [4]? (b) Can compactivity be a control parameter of granular systems, for example, in size segregation in binary granular mixtures? [1] Edwards and Oakeshott, Physica A 157, 1080 (1989). [2] Schröter, Goldman, and Swinney, Phys. Rev. E 71, 030301 (2005). [3] Nowak, Knight, Ben-Naim, Jaeger, and Nagel, Phys. Rev. E 57, 1971 (1998). [4] Edwards, Brujić, and Makse, in Unifying Concepts in Granular Media and Glasses, edited by Coniglio et al. (Elsevier, Amsterdam, 2004)
Testing the DGP model with gravitational lensing statistics
NASA Astrophysics Data System (ADS)
Zhu, Zong-Hong; Sereno, M.
2008-09-01
Aims: The self-accelerating braneworld model (DGP) appears to provide a simple alternative to the standard ΛCDM cosmology to explain the current cosmic acceleration, which is strongly indicated by measurements of type Ia supernovae, as well as other concordant observations. Methods: We investigate observational constraints on this scenario provided by gravitational-lensing statistics using the Cosmic Lens All-Sky Survey (CLASS) lensing sample. Results: We show that a substantial part of the parameter space of the DGP model agrees well with the radio-source gravitational lensing sample. Conclusions: In the flat case, Ω_K = 0, the likelihood is maximized, L = L_max, for Ω_M = 0.30 (+0.19, -0.11). If we relax the prior on Ω_K, the likelihood peaks at (Ω_M, Ω_{r_c}) ≃ (0.29, 0.12), slightly in the region of open models. The confidence contours are, however, elongated such that we are unable to discard any of the closed, flat or open models.
LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation
2015-09-30
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. FINAL REPORT LATTE – Linking Acoustic Tests and Tagging using...whales (some from animals exposed to acoustic stimuli); (2) medium-term satellite tagging studies of individual whales (some from data collected...during navy exercises); and (3) long-term passive acoustic monitoring from bottom-mounted hydrophones (much collected during exercises). All data came
Development and performances of a high statistics PMT test facility
NASA Astrophysics Data System (ADS)
Maximiliano Mollo, Carlos
2016-04-01
For almost a century, photomultipliers have been the main sensors for photon detection in nuclear and astroparticle physics experiments. In recent years the search for cosmic neutrinos has given rise to enormous experiments (Antares, Kamiokande, Super-Kamiokande, etc.) and even kilometer-scale experiments such as IceCube and the future KM3NeT. A very large volume neutrino telescope like KM3NeT requires several hundred thousand photomultipliers. The performance of the telescope depends directly on the performance of each PMT, so it is mandatory to measure the characteristics of every single sensor. The characterization of a PMT normally requires more than 8 hours, mostly because of the darkening step. This means that it is not feasible to measure the parameters of each PMT of a neutrino telescope without a system able to test more than one PMT simultaneously. For this application, we have designed, developed, and built a system able to measure the main characteristics of 62 3-inch photomultipliers simultaneously, allowing two measurement sessions per day. In this work, we describe the design constraints and how they have been satisfied. Finally, we show the performance of the system and the first results from the few thousand PMTs tested so far.
Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM
ERIC Educational Resources Information Center
Tong, Xiaoxiao; Bentler, Peter M.
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…
Normality Tests for Statistical Analysis: A Guide for Non-Statisticians
Ghasemi, Asghar; Zahediasl, Saleh
2012-01-01
Statistical errors are common in the scientific literature: about 50% of published articles have at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to provide an overview of how to check for normality in statistical analysis using SPSS. PMID:23843808
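The commentary above works in SPSS; as an illustration only (the data and sample sizes below are invented), the same Shapiro-Wilk normality check looks like this in Python with SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical samples: one genuinely normal, one right-skewed.
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=50, scale=5, size=200)
skewed_sample = rng.exponential(scale=5, size=200)

# Shapiro-Wilk: the null hypothesis is that the sample comes from a
# normal distribution, so a small p-value is evidence of non-normality.
for name, x in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p = stats.shapiro(x)
    print(f"{name}: W={w:.3f}, p={p:.4g}")
```

As the commentary notes, such tests should be read alongside graphical checks (histograms, Q-Q plots) rather than in isolation.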
The Power of Statistical Tests for Moderators in Meta-Analysis
ERIC Educational Resources Information Center
Hedges, Larry V.; Pigott, Therese D.
2004-01-01
Calculation of the statistical power of statistical tests is important in planning and interpreting the results of research studies, including meta-analyses. It is particularly important in moderator analyses in meta-analysis, which are often used as sensitivity analyses to rule out moderator effects but also may have low statistical power. This…
Giant Magellan Telescope site testing: PWV statistics and calibration
NASA Astrophysics Data System (ADS)
Thomas-Osip, Joanna E.; Prieto, Gabriel; McWilliam, Andrew; Phillips, Mark M.; McCarthy, Patrick; Johns, Matt; Querel, Richard; Naylor, David
2010-07-01
Cerro Las Campanas, located at Las Campanas Observatory (LCO) in Chile, has been selected as the site for the Giant Magellan Telescope. We report results obtained since the commencement, in 2005, of a systematic site-testing survey of potential GMT sites at LCO. Atmospheric precipitable water vapor (PWV) adversely impacts mid-IR astronomy through reduced transparency and increased background. Prior to the GMT site-testing effort, little was known about the PWV characteristics at LCO; therefore, a multi-pronged approach was used to determine the fraction of the time suitable for mid-IR observations. High-time-resolution monitoring was achieved with an Infrared Radiometer for Millimeter Astronomy (IRMA) from the University of Lethbridge, deployed at LCO since September 2007. Absolute calibrations via the robust Brault method (described in Thomas-Osip et al.1) are provided by the Magellan Inamori Kyocera Echelle (MIKE), mounted on the Clay 6.5-m telescope, at a cadence of several times per month. We find that conditions suitable for mid-IR astronomy (PWV < 1.5 mm) are concentrated in the southern winter and spring months: nearly 40% of clear time during these seasons has PWV < 1.5 mm, and approximately 10% of these nights meet our PWV requirement for the entire night.
NASA Technical Reports Server (NTRS)
Purves, L.; Strang, R. F.; Dube, M. P.; Alea, P.; Ferragut, N.; Hershfeld, D.
1983-01-01
The software and procedures of a system of programs used to generate a report of the statistical correlation between NASTRAN modal analysis results and physical tests results from modal surveys are described. Topics discussed include: a mathematical description of statistical correlation, a user's guide for generating a statistical correlation report, a programmer's guide describing the organization and functions of individual programs leading to a statistical correlation report, and a set of examples including complete listings of programs, and input and output data.
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic, used so prominently in the educational landscape, render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
A Review of Post-1994 Literature on Whether Statistical Significance Tests Should Be Banned.
ERIC Educational Resources Information Center
Sullivan, Jeremy R.
This paper summarizes the literature regarding statistical significance testing with an emphasis on: (1) the post-1994 literature in various disciplines; (2) alternatives to statistical significance testing; and (3) literature exploring why researchers have demonstrably failed to be influenced by the 1994 American Psychological Association…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
ERIC Educational Resources Information Center
Norris, John M.
2015-01-01
Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…
The Historical Growth of Statistical Significance Testing in Psychology--and Its Future Prospects.
ERIC Educational Resources Information Center
Hubbard, Raymond; Ryan, Patricia A.
2000-01-01
Examined the historical growth in the popularity of statistical significance testing using a random sample of data from 12 American Psychological Association journals. Results replicate and extend findings from a study that used only one such journal. Discusses the role of statistical significance testing and the use of replication and…
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
On the Correct Use of Statistical Tests: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Sornette, D.; Pisarenko, V. F.
2011-02-01
Taking the distribution of global seismicity over weekdays as an illustration, Pieter Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) in his Forum presented an argument in which a standard chi-square test is found to be so sensitively dependent on the sample size that probabilities of earthquake occurrence from these tests are uninterpretable. He suggests that statistical tests used in the geosciences to “make deductions more ‘objective’” are at best useless, if not misleading. In complete contradiction, we affirm that statistical tests, if they are used properly, are always informative. Vermeesch's error is to assume that one can shrink the data set by dividing both the number of earthquakes in each weekday bin and the sample size by 10 inside the chi-square test. Instead, Vermeesch should have taken 10% of the original data set and then again grouped it into 7 days. Without doing this, it was inevitable that Vermeesch reached his erroneous conclusion.
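The scaling argument at the heart of this comment can be checked numerically. The weekday counts below are invented for illustration (not taken from the seismicity catalog): dividing every bin by 10 divides the chi-square statistic by exactly 10 and inflates the p-value, whereas a genuine 10% subsample preserves the null distribution of the test.

```python
import numpy as np
from scipy import stats

# Hypothetical weekday earthquake counts (Mon..Sun), invented for illustration.
counts = np.array([1150.0, 980.0, 1005.0, 875.0, 1010.0, 990.0, 1000.0])

chi2_full, p_full = stats.chisquare(counts)         # test against a uniform week
chi2_tenth, p_tenth = stats.chisquare(counts / 10)  # Vermeesch's operation

# The chi-square statistic scales linearly with sample size, so dividing
# every bin by 10 divides the statistic by 10 and changes the conclusion,
# even though the bin *proportions* are identical.
print(f"full data: chi2={chi2_full:.1f}, p={p_full:.2g}")
print(f"counts/10: chi2={chi2_tenth:.2f}, p={p_tenth:.2g}")

# The legitimate smaller data set is a genuine 10% random subsample,
# regrouped into 7 days, which preserves the null distribution.
rng = np.random.default_rng(1)
days = np.repeat(np.arange(7), counts.astype(int))
sub = rng.choice(days, size=len(days) // 10, replace=False)
chi2_sub, p_sub = stats.chisquare(np.bincount(sub, minlength=7))
print(f"10% subsample: chi2={chi2_sub:.2f}, p={p_sub:.2g}")
```

With these invented counts the full data set rejects uniformity while the counts-divided-by-10 version does not, which is exactly the sample-size sensitivity the comment attributes to misuse of the test rather than to the test itself.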
The Effects of Repeated Cooperative Testing in an Introductory Statistics Course.
ERIC Educational Resources Information Center
Giraud, Gerald; Enders, Craig
Cooperative testing seems a logical complement to cooperative learning, but it is counter to traditional testing procedures and is viewed by some as an opportunity for cheating and freeloading on the efforts of other test takers. This study examined the practice of cooperative testing in introductory statistics. Findings indicate that students had…
Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka
2015-01-01
The problem of establishing noninferiority between a new treatment and a standard (control) treatment is discussed for ordinal categorical data. A measure of treatment effect is used, and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed, where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of existing ones, and the results show that the proposed test statistics are better in terms of deviation from the nominal level and power.
Hu, Simin; Rao, J. Sunil
2007-01-01
In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene’s contribution to the joint discriminability when this gene is included into a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis testing for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and gene selection methods. Real data examples show the proposed gene selection methods can select a compact gene subset which can not only be used to build high quality cancer classifiers but also show biological relevance. PMID:19455233
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
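A minimal sketch of the bootstrap procedure described above, assuming synthetic data rather than the cloud-object measurements: compute a distance between the two histograms (Euclidean, one of the three distances compared in the abstract), then resample from the pooled data under the null hypothesis that both samples share one distribution.

```python
import numpy as np

def euclidean(h1, h2):
    return np.sqrt(np.sum((h1 - h2) ** 2))

def bootstrap_histogram_test(x, y, bins, n_boot=1000, seed=0):
    """Bootstrap p-value for the distance between two histograms, resampling
    from the pooled data under the null hypothesis that both samples come
    from the same distribution (a simplified sketch of the procedure)."""
    rng = np.random.default_rng(seed)
    hx, _ = np.histogram(x, bins=bins, density=True)
    hy, _ = np.histogram(y, bins=bins, density=True)
    observed = euclidean(hx, hy)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_boot):
        bx = rng.choice(pooled, size=len(x), replace=True)
        by = rng.choice(pooled, size=len(y), replace=True)
        d = euclidean(np.histogram(bx, bins=bins, density=True)[0],
                      np.histogram(by, bins=bins, density=True)[0])
        exceed += d >= observed
    return observed, (exceed + 1) / (n_boot + 1)

rng = np.random.default_rng(42)
bins = np.linspace(-4, 5, 19)
_, p_same = bootstrap_histogram_test(rng.normal(0, 1, 500), rng.normal(0, 1, 500), bins)
_, p_diff = bootstrap_histogram_test(rng.normal(0, 1, 500), rng.normal(1, 1, 500), bins)
print(p_same, p_diff)
```

Swapping `euclidean` for a Jeffries-Matusita or Kuiper distance changes only the distance function, which is why the abstract can compare the three within one framework.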
Evaluation of heart failure biomarker tests: a survey of statistical considerations.
De, Arkendra; Meier, Kristen; Tang, Rong; Li, Meijuan; Gwise, Thomas; Gomatam, Shanti; Pennello, Gene
2013-08-01
Biomarkers assessing cardiovascular function can encompass a wide range of biochemical or physiological measurements. Medical tests that measure biomarkers are typically evaluated for measurement validation and clinical performance in the context of their intended use. General statistical principles for the evaluation of medical tests are discussed in this paper in the context of heart failure. Statistical aspects of study design and analysis to be considered while assessing the quality of measurements and the clinical performance of tests are highlighted. A discussion of statistical considerations for specific clinical uses is also provided. The remarks in this paper mainly focus on methods and considerations for statistical evaluation of medical tests from the perspective of bias and precision. With such an evaluation of performance, healthcare professionals could have information that leads to a better understanding on the strengths and limitations of tests related to heart failure.
New Statistics for Testing Differential Expression of Pathways from Microarray Data
NASA Astrophysics Data System (ADS)
Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao
Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: linear combination test, quadratic test and de-correlation test to identify differentially expressed pathways from gene expression profile. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common in two datasets. The pathways we found are meaningful to uncover the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in functional analysis of gene expression data.
Testing for phylogenetic signal in biological traits: the ubiquity of cross-product statistics.
Pavoine, Sandrine; Ricotta, Carlo
2013-03-01
Tests for phylogenetic signal are used to evaluate rates of evolution, to establish tests of correlation between two traits, or to investigate the degree to which the phylogeny of a species assemblage is predictive of a trait value. Being based on different approaches, these tests are generally thought to possess quite different statistical performances. In this article, we show that the Blomberg et al. K and K*, the Abouheif index, Moran's I, and the Mantel correlation are all based on a cross-product statistic, and are thus all related to each other when associated with a permutation test of phylogenetic signal. What changes is only the way phylogenetic and trait similarities (or dissimilarities) among the tips of a phylogeny are computed. The definitions of the phylogenetic and trait-based (dis)similarities among tips thus determine the performance of the tests. We briefly discuss the biological and statistical consequences (in terms of power and Type I error) of the observed relatedness among the statistics used to test for phylogenetic signal. The Blomberg et al. K* statistic appears to be one of the most efficient approaches to test for phylogenetic signal. When branch lengths are not available or not accurate, Abouheif's Cmean statistic is a powerful alternative to K*.
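The shared cross-product form can be sketched generically. The toy "phylogeny" below (a chain of tips with proximity decaying by distance) is invented for illustration; the named indices differ only in how the weight matrix W and the trait transform are built, which a permutation test renders comparable.

```python
import numpy as np

def cross_product_stat(W, x):
    """Generic cross-product statistic sum_{i != j} W_ij z_i z_j with z the
    standardized trait. Moran's I, Abouheif's Cmean, and the Mantel
    correlation differ only in how W and z are constructed (and in
    normalizing constants, which a permutation test makes irrelevant)."""
    z = (x - x.mean()) / x.std()
    return z @ W @ z

def permutation_test(W, x, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    W = W.copy()
    np.fill_diagonal(W, 0.0)  # exclude the i == j terms
    obs = cross_product_stat(W, x)
    null = np.array([cross_product_stat(W, rng.permutation(x))
                     for _ in range(n_perm)])
    # one-sided: large positive values indicate phylogenetic signal
    return obs, (np.sum(null >= obs) + 1) / (n_perm + 1)

# Hypothetical example: 30 tips along a chain, proximity decaying with
# distance, and a trait that tracks the chain -> strong signal expected.
idx = np.arange(30)
W = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
trait = idx + np.random.default_rng(1).normal(0, 2, size=30)
obs, p = permutation_test(W, trait)
print(f"statistic = {obs:.2f}, permutation p = {p:.3f}")
```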
Ensuring Positiveness of the Scaled Difference Chi-square Test Statistic.
Satorra, Albert; Bentler, Peter M
2010-06-01
A scaled difference test statistic T̃_d that can be computed by hand from the standard output of structural equation modeling (SEM) software was proposed in Satorra and Bentler (2001). The statistic T̃_d is asymptotically equivalent to the scaled difference test statistic T̄_d introduced in Satorra (2000), which requires more involved computations beyond the standard output of SEM software. The test statistic T̃_d has been widely used in practice, but in some applications it is negative due to negativity of its associated scaling correction. Using the implicit function theorem, this note develops an improved scaling correction leading to a new scaled difference statistic T̄_d that avoids negative chi-square values.
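The hand calculation from Satorra and Bentler (2001) that this note improves on can be sketched as follows; the numeric inputs are invented for illustration, not taken from the paper.

```python
def scaled_difference_chisq(T0, T1, c0, c1, d0, d1):
    """Hand calculation of the Satorra-Bentler (2001) scaled difference test.
    T0, T1: unscaled chi-square statistics of the more and less restricted
    models, with d0 > d1 degrees of freedom; c0, c1: their scaling
    corrections (T0/c0 and T1/c1 are the scaled statistics SEM software
    reports). Returns the scaled difference statistic and its scaling
    correction c_d. If c_d <= 0, the statistic is unusable -- the failure
    mode the note's improved correction is designed to avoid."""
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    return (T0 - T1) / cd, cd

# Illustrative (invented) numbers:
Td, cd = scaled_difference_chisq(T0=110.0, T1=85.0, c0=1.20, c1=1.10, d0=30, d1=25)
print(f"scaled difference = {Td:.3f} on {30 - 25} df (c_d = {cd:.3f})")
```

The difference is referred to a chi-square distribution on d0 − d1 degrees of freedom; when sampling variability pushes c_d negative, so is the "chi-square" value, which is the problem addressed here.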
Gao, Xin
2006-06-15
The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with small replication, so permutation-based methods are needed to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution: a proportion of the statistics are generated from the null distribution of no differential gene expression, while the remainder are generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. The construction of a proper null statistic that better approximates the null distribution of the F-statistic is therefore of great importance to permutation-based multiple testing in microarray data analysis. In this paper, we extend the idea of constructing null statistics based on pairwise differences, which cancel out the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed, and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate unbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. It has been shown that our proposed method has better control of the FDR and higher power than the standard permutation method to detect differentially expressed genes because of the better
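For context, the baseline being improved on is the plain label-permutation F-test; the sketch below shows only that baseline for a single gene with invented data (the paper's pairwise-difference null statistic is not reproduced here).

```python
import numpy as np
from scipy import stats

def permutation_f_test(groups, n_perm=2000, seed=0):
    """Baseline label-permutation null for the one-way F statistic of one
    gene. (The paper goes further: it builds a null statistic from
    within-group pairwise differences so that genuinely differentially
    expressed genes do not contaminate a permutation null pooled across
    genes; that construction is not reproduced in this sketch.)"""
    rng = np.random.default_rng(seed)
    data = np.concatenate(groups)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    f_obs = stats.f_oneway(*groups).statistic
    exceed = 0
    for _ in range(n_perm):
        perm = np.split(rng.permutation(data), cuts)
        exceed += stats.f_oneway(*perm).statistic >= f_obs
    return f_obs, (exceed + 1) / (n_perm + 1)

# Invented expression values: three treatment groups, one shifted.
rng = np.random.default_rng(2)
g1, g2, g3 = rng.normal(0, 1, 10), rng.normal(0, 1, 10), rng.normal(2, 1, 10)
f_obs, p = permutation_f_test([g1, g2, g3])
print(f"F = {f_obs:.2f}, permutation p = {p:.4f}")
```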
Mnemonic Aids during Tests: Worthless Frivolity or Effective Tool in Statistics Education?
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.; Gorman, Jennifer
2012-01-01
Researchers have explored many pedagogical approaches in an effort to assist students in finding understanding and comfort in required statistics courses. This study investigates the impact of mnemonic aids used during tests on students' statistics course performance in particular. In addition, the present study explores several hypotheses that…
ERIC Educational Resources Information Center
Mittag, Kathleen C
A national survey of a stratified random sample of members of the American Educational Research Association was undertaken to explore perceptions of contemporary statistical issues, and especially of statistical significance tests. The 225 actual respondents were found to be reasonably representative of the population from which the sample was…
ERIC Educational Resources Information Center
LeMire, Steven D.
2010-01-01
This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…
Evaluation of Small-Sample Statistics that Test Whether Variables Measure the Same Trait.
ERIC Educational Resources Information Center
Rasmussen, Jeffrey Lee
1988-01-01
The performance of five small-sample statistics--proposed by F. M. Lord; W. Kristof; Q. McNemar; R. A. Forsyth and L. S. Feldt; and J. P. Braden--that test whether two variables measure the same trait except for measurement error was studied. Effects of non-normality were investigated. The McNemar statistic was most powerful. (TJH)
Selecting the most appropriate inferential statistical test for your quantitative research study.
Bettany-Saltikov, Josette; Whittaker, Victoria Jane
2014-06-01
To discuss the issues and processes relating to the selection of the most appropriate statistical test. A review of the basic research concepts together with a number of clinical scenarios is used to illustrate this. Quantitative nursing research generally features the use of empirical data which necessitates the selection of both descriptive and statistical tests. Different types of research questions can be answered by different types of research designs, which in turn need to be matched to a specific statistical test(s). Discursive paper. This paper discusses the issues relating to the selection of the most appropriate statistical test and makes some recommendations as to how these might be dealt with. When conducting empirical quantitative studies, a number of key issues need to be considered. Considerations for selecting the most appropriate statistical tests are discussed and flow charts provided to facilitate this process. When nursing clinicians and researchers conduct quantitative research studies, it is crucial that the most appropriate statistical test is selected to enable valid conclusions to be made. © 2013 John Wiley & Sons Ltd.
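The kind of flow chart the paper provides can be caricatured in code. The mapping below is a common textbook simplification (invented here, not the paper's actual charts), and real selection must also weigh sample size, variance homogeneity, and study design:

```python
def choose_test(outcome, n_groups, paired, normal):
    """Toy flow-chart helper for picking an inferential test.
    outcome: 'continuous' or 'categorical'; n_groups: number of groups;
    paired: whether observations are paired/repeated; normal: whether a
    continuous outcome is plausibly normal. Illustrative only."""
    if outcome == "categorical":
        return "McNemar's test" if paired else "chi-square test"
    if n_groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "independent t-test" if normal else "Mann-Whitney U test"
    if paired:
        return "repeated-measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"

print(choose_test("continuous", 2, paired=False, normal=True))
print(choose_test("continuous", 3, paired=False, normal=False))
```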
The Use of Person-Fit Statistics To Analyze Placement Tests.
ERIC Educational Resources Information Center
Dodeen, Hamzeh
Person fit is a statistical index that can be used as a direct measure to assess test accuracy by analyzing the response pattern of examinees and identifying those who misfit the testing model. This misfitting is a source of inaccuracy in estimating an individual's ability, and it decreases the expected criterion-related validity of the test being…
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and the Satorra-Bentler mean-scaled statistic were developed under the presumption of nonnormality in the factors and errors. This paper finds a new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of the Satorra-Bentler statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness in small samples. A simple simulation study shows that this third-moment-adjusted statistic asymptotically performs on par with previously proposed methods, and at very small sample sizes offers superior Type I error rates under a properly specified model. Data from Mardia, Kent, and Bibby's study of students tested for their ability in five content areas, either open or closed book, are used to illustrate the real-world performance of this statistic.
A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis
Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.
2014-01-01
Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489
Test Analysis Program Evaluation: Item Statistics as Feedback to Test Developers
1990-10-01
The evaluation indicates that a computerized test analysis program can be used to identify questionable test items and help ensure Signal School tests are adequate to validate lessons and courses. (RWJ)
Lombard, Martani J; Steyn, Nelia P; Charlton, Karen E; Senekal, Marjanne
2015-04-22
Several statistical tests are currently applied to evaluate the validity of dietary intake assessment methods. However, they provide information on different facets of validity. There is also no consensus on the types and combinations of tests that should be applied to reflect acceptable validity for intakes. We aimed to 1) conduct a review to identify the tests and interpretation criteria used where dietary assessment methods were validated against a reference method and 2) illustrate the value of, and challenges that arise in, interpretation of outcomes of multiple statistical tests in assessment of validity using a test data set. An in-depth literature review was undertaken to identify the range of statistical tests used in the validation of quantitative food frequency questionnaires (QFFQs). Four databases were accessed to search for statistical methods and interpretation criteria used in papers focusing on relative validity. The identified tests and interpretation criteria were applied to a data set obtained using a QFFQ and four repeated 24-hour recalls from 47 adults (18-65 years) residing in rural Eastern Cape, South Africa. 102 studies were screened and 60 were included. Six statistical tests were identified: five with one set of interpretation criteria and one with two sets of criteria, resulting in seven possible validity interpretation outcomes. Twenty-one different combinations of these tests were identified, with the majority including three or fewer tests. The correlation coefficient was the most commonly used (as a single test or in combination with one or more other tests). Results of our application and interpretation of multiple statistical tests to assess the validity of energy, macronutrient, and selected micronutrient estimates illustrate that for most of the nutrients considered, some outcomes support validity, while others do not. One to three statistical tests may not be sufficient to provide comprehensive insights into various facets of validity. Results of our
Statistical Power of Randomization Tests Used with Multiple-Baseline Designs.
ERIC Educational Resources Information Center
Ferron, John; Sentovich, Chris
2002-01-01
Estimated statistical power for three randomization tests used with multiple-baseline designs using Monte Carlo methods. For an effect size of 0.5, none of the tests provided an adequate level of power, and for an effect size of 1.0, power was adequate for the Koehler-Levin test and the Marascuilo-Busk test only when the series length was long and…
ERIC Educational Resources Information Center
McArthur, David; Chou, Chih-Ping
Diagnostic testing confronts several challenges at once, among which are issues of test interpretation and immediate modification of the test itself in response to the interpretation. Several methods are available for administering and evaluating a test in real-time, towards optimizing the examiner's chances of isolating a persistent pattern of…
Weighted pedigree-based statistics for testing the association of rare variants.
Shugart, Yin Yao; Zhu, Yun; Guo, Wei; Xiong, Momiao
2012-11-24
With the advent of next-generation sequencing (NGS) technologies, researchers are now generating a deluge of data on high-dimensional genomic variations, whose analysis is likely to reveal rare variants involved in the complex etiology of disease. Standing in the way of such discoveries, however, is the fact that statistics for rare variants are currently designed for use with population-based data. In this paper, we introduce a pedigree-based statistic specifically designed to test for rare variants in family-based data. The additional power of pedigree-based statistics stems from the fact that while rare variants related to diseases or traits of interest occur only infrequently in populations, such variants are enriched in families with multiple affected individuals. Note that while the proposed statistic can be applied with and without statistical weighting, our simulations show that its power increases when weighting (WSS and VT) is applied. Our working hypothesis was that, since rare variants are concentrated in families with multiple affected individuals, pedigree-based statistics should detect rare variants more powerfully than population-based statistics. To evaluate how well our new pedigree-based statistics perform in association studies, we develop a general framework for sequence-based association studies capable of handling data from pedigrees of various types and also from unrelated individuals. In short, we developed a procedure for transforming population-based statistics into tests for family-based associations. Furthermore, we modify two existing tests, the weighted sum-square test and the variable-threshold test, and apply both to our family-based collapsing methods. We demonstrate that the new family-based tests are more powerful than the corresponding population-based tests and generate a reasonable type I error rate. To demonstrate feasibility, we apply the newly developed tests to a pedigree-based GWAS data set from the Framingham Heart
Shaikh, Masood Ali
2017-09-01
Assessment of research articles in terms of study designs used, statistical tests applied and the use of statistical analysis programmes helps determine the research activity profile and trends in the country. In this descriptive study, all original articles published by the Journal of Pakistan Medical Association (JPMA) and the Journal of the College of Physicians and Surgeons Pakistan (JCPSP) in the year 2015 were reviewed in terms of study designs used, application of statistical tests, and the use of statistical analysis programmes. JPMA and JCPSP published 192 and 128 original articles, respectively, in the year 2015. Results of this study indicate that the cross-sectional study design, bivariate inferential statistical analysis entailing comparisons between two variables/groups, and the statistical software programme SPSS were the most common study design, inferential statistical analysis, and statistical analysis software programme, respectively. These results echo previously published assessments of these two journals for the year 2014.
A simulation study for comparing testing statistics in response-adaptive randomization
2010-01-01
Background Response-adaptive randomizations are able to assign more patients in a comparative clinical trial to the tentatively better treatment. However, due to the adaptation in patient allocation, the samples to be compared are no longer independent. At large sample sizes, many asymptotic properties of test statistics derived for independent sample comparison are still applicable in adaptive randomization provided that the patient allocation ratio converges to an appropriate target asymptotically. However, the small sample properties of commonly used test statistics in response-adaptive randomization are not fully studied. Methods Simulations are systematically conducted to characterize the statistical properties of eight test statistics in six response-adaptive randomization methods at six allocation targets with sample sizes ranging from 20 to 200. Since adaptive randomization is usually not recommended for sample sizes of less than 30, the present paper focuses on the case with a sample of 30 to give general recommendations with regard to test statistics for contingency tables in response-adaptive randomization at small sample sizes. Results Among all asymptotic test statistics, Cook's correction to the chi-square test (TMC) is the best at attaining the nominal size of the hypothesis test. Williams' correction to the log-likelihood ratio test (TML) gives a slightly inflated type I error and higher power as compared with TMC, but it is more robust against imbalance in patient allocation. TMC and TML are usually the two test statistics with the highest power in different simulation scenarios. When focusing on TMC and TML, the generalized drop-the-loser urn (GDL) and sequential estimation-adjusted urn (SEU) have the best ability to attain the correct size of the hypothesis test, respectively. Among all sequential methods that can target different allocation ratios, GDL has the lowest variation and the highest overall power at all allocation ratios. The performance of
EFFECT OF ERROR OF MEASUREMENT ON THE POWER OF STATISTICAL TESTS. FINAL REPORT.
ERIC Educational Resources Information Center
CLEARY, T.A.; LINN, ROBERT L.
The purpose of this research was to study the effect of error of measurement upon the power of statistical tests. Attention was focused on the F-test of the single-factor analysis of variance. Formulas were derived to show the relationship between the noncentrality parameters for analyses using true scores and those using observed scores. The…
Evaluating Two Models of Collaborative Tests in an Online Introductory Statistics Course
ERIC Educational Resources Information Center
Björnsdóttir, Auðbjörg; Garfield, Joan; Everson, Michelle
2015-01-01
This study explored the use of two different types of collaborative tests in an online introductory statistics course. A study was designed and carried out to investigate three research questions: (1) What is the difference in students' learning between using consensus and non-consensus collaborative tests in the online environment?, (2) What is…
Nonclassicality tests by classical bounds on the statistics of multiple outcomes
Luis, Alfredo
2010-08-15
We derive simple practical tests revealing the quantum nature of states by the violation of classical upper bounds on the statistics of multiple outcomes of an observable. These criteria can be expressed in terms of the Kullback-Leibler divergence (or relative entropy). Nonclassicality tests for multiple outcomes can be satisfied by states that do not fulfill the corresponding single-outcome criteria.
ERIC Educational Resources Information Center
White, Desley
2015-01-01
Two practical activities are described, which aim to support critical thinking about statistics as they concern multiple outcomes testing. Formulae are presented in Microsoft Excel spreadsheets, which are used to calculate the inflation of error associated with the quantity of tests performed. This is followed by a decision-making exercise, where…
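The error-inflation arithmetic behind such a spreadsheet exercise is straightforward and translates directly into a few lines of code; the alpha level and test counts below are arbitrary examples:

```python
# Familywise error inflation: the probability of at least one false positive
# among m independent tests, each run at level alpha.
def familywise_error(alpha, m):
    return 1 - (1 - alpha) ** m

def bonferroni(alpha, m):
    """Bonferroni-adjusted per-test level (conservative, simple)."""
    return alpha / m

def sidak(alpha, m):
    """Sidak-adjusted per-test level (exact under independence)."""
    return 1 - (1 - alpha) ** (1 / m)

inflated = familywise_error(0.05, 20)   # about 0.64 for 20 tests at alpha = .05
```

Running 20 tests at the nominal .05 level inflates the chance of at least one spurious "significant" result to roughly 64%, which is the point the decision-making exercise is built around.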
What Are Null Hypotheses? The Reasoning Linking Scientific and Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Lawson, Anton E.
2008-01-01
We should dispense with use of the confusing term "null hypothesis" in educational research reports. To explain why the term should be dropped, the nature of, and relationship between, scientific and statistical hypothesis testing is clarified by explication of (a) the scientific reasoning used by Gregor Mendel in testing specific…
The Comparability of the Statistical Characteristics of Test Items Generated by Computer Algorithms.
ERIC Educational Resources Information Center
Meisner, Richard; And Others
This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16…
Using Multiple DIF Statistics with the Same Items Appearing in Different Test Forms.
ERIC Educational Resources Information Center
Kubiak, Anna T.; Cowell, William R.
A procedure used to average several Mantel-Haenszel delta difference values for an item is described and evaluated. The differential item functioning (DIF) procedure used by the Educational Testing Service (ETS) is based on the Mantel-Haenszel statistical technique for studying matched groups. It is standard procedure at ETS to analyze test items…
ERIC Educational Resources Information Center
Thompson, Bruce
This paper evaluates the logic underlying various criticisms of statistical significance testing and makes specific recommendations for scientific and editorial practice that might better increase the knowledge base. Reliance on the traditional hypothesis testing model has led to a major bias against nonsignificant results and to misinterpretation…
ERIC Educational Resources Information Center
Luh, Wei-Ming; Guo, Jiin-Huarng
2002-01-01
Used Johnson's transformation (N. Johnson, 1978) with approximate test statistics to test the homogeneity of simple linear regression slopes in the presence of nonnormality and Type I, Type II or complete heteroscedasticity. Computer simulations show that the proposed techniques can control Type I error under various circumstances. (SLD)
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis
ERIC Educational Resources Information Center
Lin, Johnny; Bentler, Peter M.
2012-01-01
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra Bentler's…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Statistical Techniques for Criterion-Referenced Tests. Final Report. October, 1976-October, 1977.
ERIC Educational Resources Information Center
Wilcox, Rand R.
Three statistical problems related to criterion-referenced testing are investigated: estimation of the likelihood of a false-positive or false-negative decision with a mastery test, estimation of true scores in the Compound Binomial Error Model, and comparison of the examinees to a control. Two methods for estimating the likelihood of…
Exact Statistical Tests for Heterogeneity of Frequencies Based on Extreme Values
Wu, Chih-Chieh; Grimson, Roger C.; Shete, Sanjay
2014-01-01
Sophisticated statistical analyses of incidence frequencies are often required for various epidemiologic and biomedical applications. Among the most commonly applied methods is Pearson's χ2 test, which is structured to detect non-specific anomalous patterns of frequencies and is useful for testing the significance for incidence heterogeneity. However, the Pearson's χ2 test is not efficient for assessing the significance of frequency in a particular cell (or class) to be attributed to chance alone. We recently developed statistical tests for detecting temporal anomalies of disease cases based on maximum and minimum frequencies; these tests are actually designed to test of significance for a particular high or low frequency. We show that our proposed methods are more sensitive and powerful for testing extreme cell counts than is the Pearson's χ2 test. We elucidated and illustrated the differences in sensitivity among our tests and the Pearson's χ2 test by analyzing a data set of Langerhans cell histiocytosis cases and its hypothetical sets. We also computed and compared the statistical power of these methods using various sets of cell numbers and alternative frequencies. Our study will provide investigators with useful guidelines for selecting the appropriate tests for their studies. PMID:25558124
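A rough illustration of the contrast the authors draw: Pearson's χ² responds to overall heterogeneity, while a maximum-frequency statistic targets a single anomalous cell. The monthly counts below are hypothetical, and the Monte Carlo tail probability is a stand-in for the authors' exact extreme-value tests:

```python
import random

random.seed(1)

# Hypothetical monthly case counts; H0: cases fall uniformly across 12 months.
counts = [3, 5, 2, 4, 6, 3, 14, 4, 5, 3, 4, 2]
n, k = sum(counts), len(counts)
expected = n / k

# Pearson's chi-square statistic: sensitive to non-specific heterogeneity.
chi2 = sum((o - expected) ** 2 / expected for o in counts)

# Monte Carlo tail probability of the *maximum* cell count under uniformity:
# a max-frequency statistic is more sensitive to one extreme cell.
def p_max(observed_max, n, k, n_sim=2000):
    hits = 0
    for _ in range(n_sim):
        cells = [0] * k
        for _ in range(n):
            cells[random.randrange(k)] += 1
        if max(cells) >= observed_max:
            hits += 1
    return hits / n_sim

p_extreme = p_max(max(counts), n, k)
```

Here both approaches flag the July spike (χ² ≈ 24.6 exceeds the 5% critical value of 19.68 for 11 df), but with a milder spike the max-count test can reject while the diffuse χ² statistic does not, which is the sensitivity difference the paper quantifies.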
Steffen, Jason H.; Ford, Eric B.; Rowe, Jason F.; Fabrycky, Daniel C.; Holman, Matthew J.; Welsh, William F.; Borucki, William J.; Batalha, Natalie M.; Bryson, Steve; Caldwell, Douglas A.; Ciardi, David R.
2012-01-01
We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through Quarter six (Q6) of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.
Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan
2016-01-01
The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
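The ANOVA step of such a workflow reduces to a familiar computation. A from-scratch sketch on hypothetical measurements from three test conditions (the values and group sizes are illustrative only):

```python
# One-way ANOVA F statistic computed from scratch for three test conditions.
groups = [
    [10.1, 9.8, 10.3, 10.0],
    [11.2, 11.0, 11.5, 11.1],
    [10.6, 10.4, 10.9, 10.5],
]
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# F = (between-group mean square) / (within-group mean square).
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

For these numbers F ≈ 29, far above the 5% critical value of about 4.26 for (2, 9) degrees of freedom, so the condition effect would be judged significant; DOE and RSM then build on exactly this decomposition.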
Wang, Q.; Denton, D.L.; Shukla, R.
2000-01-01
As a follow-up to the recommendations of the September 1995 SETAC Pellston Workshop on Whole Effluent Toxicity (WET) on test methods and appropriate endpoints, this paper discusses the applications and statistical properties of using a statistical criterion of minimum significant difference (MSD). The authors examined the upper limits of acceptable MSDs as an acceptance criterion in the case of normally distributed data. The implications of this approach are examined in terms of the false negative rate as well as the false positive rate. Results indicated that the proposed approach has reasonable statistical properties. Reproductive data from short-term chronic WET tests with Ceriodaphnia dubia were used to demonstrate the applications of the proposed approach. The data were collected by the North Carolina Department of Environment, Health, and Natural Resources (Raleigh, NC, USA) as part of their National Pollutant Discharge Elimination System program.
Canton, S.P.
1994-12-31
Past studies have shown considerable variability in whole effluent toxicity tests in terms of LC50s and NOECs from reference toxicant tests. However, this approach cannot differentiate variability in the test organisms themselves from their variable response to a toxicant. A database of control treatments in chronic WET tests was constructed, allowing evaluation of the mean performance of the WET test organisms Ceriodaphnia dubia and Pimephales promelas not subjected to chemical stress. Surrogate test series were then constructed by randomly selecting replicates from this control database. These surrogate test series were analyzed using standard EPA statistical procedures to determine NOECs for survival and both NOECs and IC25s for reproduction and growth. Since NOECs have a significance level (p) of 0.05, it follows that approximately 5% of the tests could "fail" simply due to chance, and this was, in fact, the case for these surrogate tests. The IC25 statistic is a linear interpolation technique, with 95% confidence intervals calculated through a bootstrap method; it does not have a statistical check for significance. With the IC25 statistic, 10.5% of the Ceriodaphnia tests indicated toxicity (i.e., an IC25 of less than 100% "effluent"), while this increased to 37% for fathead minnows. There appear to be fundamental flaws in the calculation of the IC25 statistic and its confidence intervals, as currently provided in EPA documentation. Until these flaws are addressed, it is recommended that this method not be used in the analysis of chronic toxicity data.
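The bootstrap confidence interval used with the IC25 can be sketched generically. Here a percentile bootstrap for the mean of hypothetical control reproduction counts; this is an illustration of the resampling idea, not the EPA's exact ICp interpolation procedure:

```python
import random

random.seed(7)

# Hypothetical control-replicate reproduction counts (young per female)
# from a chronic Ceriodaphnia dubia test.
control = [22, 25, 19, 24, 27, 21, 23, 26, 20, 24]

def bootstrap_ci(data, stat, n_boot=2000, level=0.95):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    estimates = sorted(
        stat([random.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = estimates[int(((1 - level) / 2) * n_boot)]
    hi = estimates[int((1 - (1 - level) / 2) * n_boot) - 1]
    return lo, hi

mean = lambda xs: sum(xs) / len(xs)
low, high = bootstrap_ci(control, mean)
```

Because the percentile interval is built purely from resampled statistics, it carries no significance test of its own, which is the gap in the IC25 procedure the abstract criticizes.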
Green, John; Wheeler, James R
2013-11-15
Solvents are often used to aid test item preparation in aquatic ecotoxicity experiments. This paper discusses the practical, statistical and regulatory considerations. The selection of the appropriate control (if a solvent is used) for statistical analysis is investigated using a database of 141 responses (endpoints) from 71 experiments. The advantages and disadvantages of basing the statistical analysis of treatment effects on the water control alone, the solvent control alone, combined controls, or a conditional strategy of combining controls when not statistically significantly different are tested. The latter two approaches are shown to have distinct advantages. It is recommended that this approach continue to be the standard used for regulatory and research aquatic ecotoxicology studies. However, wherever technically feasible, a solvent should not be employed, or at least its concentration should be minimized. Copyright © 2013 Elsevier B.V. All rights reserved.
Modified H-statistic with adaptive Winsorized mean in two groups test
NASA Astrophysics Data System (ADS)
Teh, Kian Wooi; Abdullah, Suhaida; Yahaya, Sharipah Soaad Syed; Yusof, Zahayu Md
2014-06-01
The t-test is a commonly used test statistic for comparing two independent groups. Its computation is simple, yet it is powerful for normally distributed data with equal variances. In real-life data, however, these conditions are often not met, and violation of the assumptions (normality and equal variances) has a devastating effect on the t-test's control of the Type I error rate while also reducing statistical power. In this study, therefore, the adaptive Winsorized mean with hinge estimator in the H-statistic (AWM-H) is proposed. The H-statistic is a robust statistic able to handle nonnormality when comparing independent groups. This procedure originally used the Modified One-step M (MOM) estimator, which employs a trimming process. In the AWM-H procedure, the MOM estimator is replaced with the adaptive Winsorized mean (AWM) as the central tendency measure of the test. The Winsorization process is based on the hinge estimator HQ or HQ1. Overall results showed that the proposed method performed better than the original method and the classical method, especially under heavy-tailed distributions.
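Winsorization, the central ingredient of the AWM, can be sketched as follows. The sample and trimming depth are illustrative, and this is plain symmetric Winsorization rather than the adaptive hinge-based (HQ/HQ1) version the study proposes:

```python
def winsorized_mean(data, k):
    """Mean after replacing the k smallest values with the (k+1)-th smallest
    and the k largest values with the (k+1)-th largest (symmetric)."""
    xs = sorted(data)
    n = len(xs)
    xs = [xs[k]] * k + xs[k:n - k] + [xs[n - k - 1]] * k
    return sum(xs) / n

sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]   # one extreme outlier
plain = sum(sample) / len(sample)           # pulled upward by the outlier
robust = winsorized_mean(sample, 1)         # the outlier's influence is capped
```

Unlike trimming (as in the MOM estimator), Winsorization keeps the sample size intact by pulling extreme values in rather than discarding them.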
Multiple phenotype association tests using summary statistics in genome-wide association studies.
Liu, Zhonghua; Lin, Xihong
2017-06-26
In this article, we study jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analyses from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes that account for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
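The idea of accounting for between-phenotype correlation can be sketched with a simple quadratic-form test on two z-scores. The z-values and correlation below are hypothetical, and this plain chi-square combination is a simplification, not the authors' mixed-model statistic:

```python
# Hypothetical per-phenotype GWAS summary z-scores for one variant, and the
# between-phenotype correlation estimated from genome-wide summary statistics.
z = [2.1, 2.4]
r = 0.5                        # correlation between the two phenotype statistics

# Quadratic-form test T = z' * Sigma^{-1} * z, chi-square with 2 df under H0,
# which accounts for the correlation instead of treating the tests as independent.
det = 1 - r * r
inv = [[1 / det, -r / det], [-r / det, 1 / det]]
t_stat = sum(z[i] * inv[i][j] * z[j] for i in range(2) for j in range(2))

# Naive sum of squared z-scores, ignoring the correlation, for contrast.
t_naive = z[0] ** 2 + z[1] ** 2
```

With positively correlated statistics the naive sum overstates the evidence (here 10.17 versus the corrected 6.84, against a 5% chi-square critical value of 5.99 for 2 df), which is why the correlation matrix must enter the test.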
Statistical studies of animal response data from USF toxicity screening test method
NASA Technical Reports Server (NTRS)
Hilado, C. J.; Machado, A. M.
1978-01-01
Statistical examination of animal response data obtained using Procedure B of the USF toxicity screening test method indicates that the data deviate only slightly from a normal or Gaussian distribution. This slight departure from normality is not expected to invalidate conclusions based on theoretical statistics. Comparison of times to staggering, convulsions, collapse, and death as endpoints shows that time to death appears to be the most reliable endpoint because it offers the lowest probability of missed observations and premature judgements.
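A crude version of such a normality check screens the sample's skewness and excess kurtosis, both of which should be near zero for Gaussian data. The simulated "time to death" values below are hypothetical, not the USF data:

```python
import math, random

random.seed(3)

# Simulated times to death (seconds) under an assumed Gaussian model.
times = [random.gauss(120, 15) for _ in range(500)]

n = len(times)
m = sum(times) / n
s = math.sqrt(sum((x - m) ** 2 for x in times) / n)

# Standardized third and fourth moments: ~0 for a normal distribution.
skew = sum(((x - m) / s) ** 3 for x in times) / n
excess_kurtosis = sum(((x - m) / s) ** 4 for x in times) / n - 3
```

Small departures of these moments from zero, as the abstract reports for the animal response data, generally do not invalidate conclusions drawn from normal-theory statistics.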
A New Test of the Statistical Nature of the Brightest Cluster Galaxies
NASA Astrophysics Data System (ADS)
Lin, Yen-Ting; Ostriker, Jeremiah P.; Miller, Christopher J.
2010-06-01
A novel statistic is proposed to examine the hypothesis that all cluster galaxies are drawn from the same luminosity distribution (LD). In such a "statistical model" of galaxy LD, the brightest cluster galaxies (BCGs) are simply the statistical extreme of the galaxy population. Using a large sample of nearby clusters, we show that BCGs in high luminosity clusters (e.g., L_tot ≳ 4 × 10^11 h_70^-2 L_sun) are unlikely (probability ≤ 3 × 10^-4) to be drawn from the LD defined by all red cluster galaxies more luminous than M_r = -20. On the other hand, BCGs in less luminous clusters are consistent with being the statistical extreme. Applying our method to the second brightest galaxies, we show that they are consistent with being the statistical extreme, which implies that the BCGs are also distinct from non-BCG luminous, red, cluster galaxies. We point out some issues with the interpretation of the classical tests proposed by Tremaine & Richstone (TR) that are designed to examine the statistical nature of BCGs, investigate the robustness of both our statistical test and those of TR against difficulties in photometry of galaxies of large angular size, and discuss the implication of our findings on surveys that use the luminous red galaxies to measure the baryon acoustic oscillation features in the galaxy power spectrum.
Statistical hypothesis testing by weak-value amplification: Proposal and evaluation
NASA Astrophysics Data System (ADS)
Susa, Yuki; Tanaka, Saki
2015-07-01
We study the detection capability of weak-value amplification on the basis of statistical hypothesis testing. We propose a testing method that is reasonable in both the physical and statistical senses, and find that weak measurement with a large weak value has the advantage of increasing the detection power and reducing the possibility of missing the presence of an interaction. We enhance the physical understanding of the weak value and mathematically establish the significance of weak-value amplification. Our present work overcomes the critical dilemma of weak-value amplification, namely that the larger the amplification, the smaller the number of data points, because statistical hypothesis testing works even for a small number of data points. This contrasts with parameter estimation by weak-value amplification in the literature, which requires a large number of data points.
[Tests of statistical significance in three biomedical journals: a critical review].
Sarria Castro, Madelaine; Silva Ayçaguer, Luis Carlos
2004-05-01
To describe the use of conventional tests of statistical significance and the current trends shown by their use in three biomedical journals read in Spanish-speaking countries. All descriptive or explanatory original articles published in the five-year period of 1996 through 2000 were reviewed in three journals: Revista Cubana de Medicina General Integral [Cuban Journal of Comprehensive General Medicine], Revista Panamericana de Salud Pública/Pan American Journal of Public Health, and Medicina Clínica [Clinical Medicine] (which is published in Spain). In the three journals reviewed, various shortcomings were found in the use of hypothesis tests based on P values, along with limited use of the new tools that have been suggested in their place: confidence intervals (CIs) and Bayesian inference. The basic findings of our research were: minimal use of CIs, either as a complement to significance tests or as the only statistical tool; mentions of a small sample size as a possible explanation for the lack of statistical significance; predominant use of rigid alpha values; a lack of uniformity in the presentation of results; and improper reference in the research conclusions to the results of hypothesis tests. Our results indicate a lack of compliance by authors and editors with accepted standards for the use of tests of statistical significance. The findings also highlight that the stagnant use of these tests continues to be common practice in the scientific literature.
New advances in methodology for statistical tests useful in geostatistical studies
Borgman, L.E.
1988-05-01
Methodology for statistical tests of hypotheses pertaining to various aspects of geostatistical investigations has been slow to develop. The correlated nature of the data precludes most classical tests and makes the design of new tests difficult. Recent studies have led to modifications of the classical t test which allow for the intercorrelation. In addition, results for certain nonparametric tests have been obtained. The conclusions of these studies provide a variety of new tools for the geostatistician in deciding questions of significant differences and magnitudes.
Influence of the test medium on azithromycin and erythromycin regression statistics.
Barry, A L; Fuchs, P C
1991-10-01
Azithromycin and erythromycin disk test results were compared to MIC values obtained in six different media. One hundred isolates were tested in triplicate, and geometric mean MICs were plotted against arithmetic mean zone diameters and regression statistics calculated. The test media evaluated did not markedly influence MIC values, but incubation in 5-7% CO2 resulted in a two- to four-fold decrease in the activity of both drugs. For testing Haemophilus influenzae and other species that need to be tested in 5-7% CO2, interpretive breakpoints for the macrolides and azalides should be modified to compensate for the anticipated decrease in activity.
A new model test in high energy physics in frequentist and Bayesian statistical formalisms
NASA Astrophysics Data System (ADS)
Kamenshchikov, A.
2017-01-01
Testing a new physical model against observed experimental data is a typical problem in modern experiments of high energy physics (HEP). The problem may be approached with two alternative statistical formalisms, frequentist and Bayesian, both of which are widespread in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are applied to test a new model. The juxtaposed results demonstrate the consistency of the two formalisms in this setting. The effect of systematic-uncertainty treatment in the statistical analysis is also considered.
Optimization and statistical evaluation of dissolution tests for indinavir sulfate capsules.
Carvalho-Silva, B; Moreira-Campos, L M; Nunan, E A; Vianna-Soares, C D; Araujo-Alves, B L; Cesar, I C; Pianetti, G A
2004-11-01
An optimization, statistically based on Student's t-test, to establish dissolution test conditions for indinavir sulfate capsules is presented. Three dissolution media, including that reported in the United States Pharmacopeial Forum, and two apparatus, paddle and basket, were applied. Two different indinavir sulfate capsules, products A and B, were evaluated. For a reliable statistical analysis, eighteen capsules were assayed in each condition based on the combination of dissolution medium and apparatus. All tested media were statistically equivalent (P > 0.05) for both drug products when the paddle apparatus was employed at a stirring speed of 50 rpm. The use of the basket apparatus at a stirring speed of 50 rpm caused a significant decrease in the percent of drug released for product B (P < 0.05). The best dissolution conditions tested for products A and B were applied to evaluate capsule dissolution profiles. Twelve dosage units were assayed, and the dissolution efficiency concept was used for each condition to obtain results with statistical significance (P > 0.05). Optimal conditions for the dissolution test were 900 ml of 0.1 M hydrochloric acid as dissolution medium, basket at 100 rpm stirring speed, and ultraviolet detection at 260 nm.
Rivoirard, Romain; Duplay, Vianney; Oriol, Mathieu; Tinquaut, Fabien; Chauvin, Franck; Magne, Nicolas; Bourmaud, Aurelie
2016-01-01
Quality of reporting for Randomized Clinical Trials (RCTs) in oncology has been analyzed in several systematic reviews, but in this setting there is a paucity of data on outcome definitions and on the consistency of reporting of statistical tests in RCTs and Observational Studies (OBS). The objective of this review was to describe those two reporting aspects for OBS and RCTs in oncology. From a list of 19 medical journals, three were retained for analysis after a random selection: British Medical Journal (BMJ), Annals of Oncology (AoO) and British Journal of Cancer (BJC). All original articles published between March 2009 and March 2014 were screened. Only studies whose main outcome was accompanied by a corresponding statistical test were included in the analysis. Studies based on censored data were excluded. The primary outcome was to assess quality of reporting for the description of the primary outcome measure in RCTs and of the variables of interest in OBS. A logistic regression was performed to identify covariates of studies potentially associated with concordance of tests between the Methods and Results sections. 826 studies were included in the review, of which 698 were OBS. Variables were described in the Methods section for all OBS, and the primary endpoint was clearly detailed in the Methods section for 109 RCTs (85.2%). 295 OBS (42.2%) and 43 RCTs (33.6%) had perfect agreement between the statistical tests reported in the Methods and Results sections. In multivariable analysis, the variable "number of included patients in study" was associated with test consistency: the adjusted odds ratio (aOR) for the third group compared to the first was aOR Grp3 = 0.52 [0.31-0.89] (P value = 0.009). Variables in OBS and primary endpoints in RCTs are reported and described with high frequency. However, consistency of statistical tests between the Methods and Results sections of OBS is not always observed. We therefore encourage authors and peer reviewers to verify the consistency of statistical tests in oncology studies.
Lee, Chaeyoung
2012-11-01
Epistasis, which may explain a large portion of the phenotypic variation for complex economic traits of animals, has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions obtained by a Gibbs sampler. A simulation study was conducted to provide statistical powers under various unbalanced designs using this method. Data were simulated by combined designs of number of loci, within-genotype variance, and sample size in unbalanced designs with or without null combined-genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example of obtaining empirical statistical power estimates with a given sample size was provided under unbalanced designs. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.
Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.
Kim, Yuneung; Lim, Johan; Park, DoHwan
2015-11-01
In this paper, we study a nonparametric procedure to test independence of bivariate interval-censored data, for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To this end, we propose a score-based modification of Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic in terms of the expected numbers of concordant and discordant pairs of data. The performance of the modified approach is illustrated by simulation studies and an application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b).
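For readers unfamiliar with the statistic being modified, classical Kendall's tau for fully observed pairs can be computed by direct pair counting. The sketch below is the textbook (uncensored) version, not the authors' score-based modification, which replaces the observed concordant/discordant counts with their expected values under interval censoring:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Classical Kendall's tau: (concordant - discordant) pair count,
    normalized by the total number of pairs (ties ignored for brevity)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Perfectly concordant data gives tau = 1.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # -> 1.0
```

Under interval censoring, the orderings inside the sign comparison are not observed directly, which is precisely why the paper substitutes expected counts.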
Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.
Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas
2016-11-14
Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
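Two of the norm statistics discussed (percentile ranks and Z-scores) are simple to compute from a norm-group sample. The sketch below pairs them with a bootstrap standard error as a generic stand-in; the paper's analytic standard errors under the multinomial assumption are not reproduced here, and the norm-group scores are hypothetical:

```python
import random
import statistics

def z_score(score, sample):
    """Z-score of a raw score relative to the norm group."""
    return (score - statistics.mean(sample)) / statistics.stdev(sample)

def percentile_rank(score, sample):
    """Percentage of norm-group scores at or below the given score."""
    return 100 * sum(s <= score for s in sample) / len(sample)

def bootstrap_se(stat, score, sample, n_boot=2000, seed=1):
    """Bootstrap standard error of a norm statistic (illustrative,
    not the paper's multinomial-based derivation)."""
    rng = random.Random(seed)
    reps = [stat(score, rng.choices(sample, k=len(sample)))
            for _ in range(n_boot)]
    return statistics.stdev(reps)

norms = [12, 15, 15, 18, 20, 21, 22, 25, 27, 30]  # hypothetical norm group
print(percentile_rank(21, norms))                  # -> 60.0
print(round(bootstrap_se(percentile_rank, 21, norms), 2))
```

With a realistically large norm sample, the bootstrap and analytic standard errors should agree closely; the point of the paper is that the analytic version is available in closed form.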
Kably Ambe, Alberto; Ruíz Anguas, Julián; Pérez-Avellá, Andrea Baptista; Carballo Mondragón, Esperanza; Karchmer Krivitzky, Samuel
2003-01-01
To determine whether a statistical correlation can be established between the variables observed during the transference test performed before ovarian stimulation and those obtained during the actual embryo transference. Clinical retrospective study. Ninety-four female patients included in the IVF-ET program were studied, and a transference test previous to ovarian stimulation was performed. The following parameters were considered in this test: hysterometry, type of catheter, degree of difficulty, and person performing it. A total of 117 embryonic transferences were carried out, and the same parameters observed during the test were measured. The chi-square test was used for the statistical analysis. The most commonly used kind of catheter was the Cook Soft Pass (n = 92), and a statistically significant correlation was observed with the one used during the actual transference (n = 94). Concerning the degree of difficulty, the transference test was complicated in 4.2% of the cases, difficult in 31.6%, and easy in 64.2%, presenting a statistical correlation with the actual transference, where the procedure was complicated in 1.7% of the patients, difficult in 32.4%, and easy in 65.8%. In addition, with the intention of ruling out the medical factor, we tried to have the same person perform the transference test and the actual procedure, and a statistically significant correlation was also observed. The pregnancy rate per transference was 27.4%. When an association between the difficulty of an actual transference and pregnancy rates was examined, no statistical association between an easy or a difficult transference was observed. Embryonic transference is a fundamental phase of IVF-ET programs. However, a variety of factors can influence this procedure, some medical and others mechanical. Our results allow us to conclude that there is a statistical correlation between the variables observed during the actual embryonic transference and the test, so the
NASA Technical Reports Server (NTRS)
Colvin, E. L.; Emptage, M. R.
1992-01-01
The breaking load test provides quantitative stress corrosion cracking data by determining the residual strength of tension specimens that have been exposed to corrosive environments. Eight laboratories have participated in a cooperative test program under the auspices of ASTM Committee G-1 to evaluate the new test method. All eight laboratories were able to distinguish between three tempers of aluminum alloy 7075. The statistical analysis procedures that were used in the test program do not work well in all situations. An alternative procedure using Box-Cox transformations shows a great deal of promise. An ASTM standard method has been drafted which incorporates the Box-Cox procedure.
Statistical tests for the Gaussian nature of primordial fluctuations through CBR experiments
Luo, X. (NASA/Fermilab Astrophysics Center, Fermi National Accelerator Laboratory, Batavia, Illinois 60510-0500)
1994-04-15
Information about the physical processes that generate the primordial fluctuations in the early Universe can be gained by testing the Gaussian nature of the fluctuations through cosmic microwave background radiation (CBR) temperature anisotropy experiments. One of the crucial aspects of density perturbations produced by the standard inflation scenario is that they are Gaussian, whereas seeds produced by topological defects left over from an early cosmic phase transition tend to be non-Gaussian. To carry out this test, sophisticated statistical tools are required. In this paper, we discuss several such statistical tools, including multivariate skewness and kurtosis, Euler-Poincaré characteristics, the three-point temperature correlation function, and Hotelling's T² statistic defined through bispectral estimates of a one-dimensional data set. The effect of noise present in the current data is discussed in detail and the COBE 53 GHz data set is analyzed. Our analysis shows that, on the large angular scales to which COBE is sensitive, the statistics are probably Gaussian. On small angular scales, the importance of Hotelling's T² statistic is stressed, and the minimum sample size required to test Gaussianity is estimated. Although the current data set available from various experiments at half-degree scales is still too small, improvement of the data set by roughly a factor of 2 will be enough to test Gaussianity statistically. On the arc minute scale, we analyze the recent RING data through bispectral analysis, and the result indicates a possible deviation from Gaussianity. Effects of point sources are also discussed. It is pointed out that the Gaussianity problem can be resolved in the near future by ground-based or balloon-borne experiments.
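The lowest-order moment statistics mentioned in that abstract are easy to sketch. The example below computes sample skewness and excess kurtosis (both zero for a Gaussian) from a plain 1-D list standing in for temperature values; it is a toy diagnostic, not the multivariate versions or the bispectral machinery the paper develops:

```python
import math
import statistics

def skewness(xs):
    """Sample skewness: the third standardized central moment."""
    n, m = len(xs), statistics.fmean(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    """Fourth standardized central moment minus 3 (zero for a Gaussian)."""
    n, m = len(xs), statistics.fmean(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / (n * s2 ** 2) - 3

data = [-2, -1, -1, 0, 0, 0, 1, 1, 2]  # symmetric toy sample
print(skewness(data))                   # -> 0.0 (symmetric data)
print(excess_kurtosis(data))
```

Nonzero values of either statistic, once compared against their sampling distributions, flag a departure from Gaussianity.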
ERIC Educational Resources Information Center
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
ERIC Educational Resources Information Center
Ho, Andrew D.; Yu, Carol C.
2015-01-01
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micerri similarly showed that the normality assumption is met rarely in educational and psychological…
ERIC Educational Resources Information Center
Dorman, Jeffrey P.
2009-01-01
This article discusses the effect of clustering on statistical tests conducted with school environment data. Because most school environment studies involve the collection of data from teachers nested within schools, the hierarchical nature to these data cannot be ignored. In particular, this article considers the influence of intraschool…
A statistical comparison of impact and ambient testing results from the Alamosa Canyon Bridge
Doebling, S.W.; Farrar, C.R.; Cornwell, P.
1996-12-31
In this paper, the modal properties of the Alamosa Canyon Bridge obtained using ambient data are compared to those obtained from impact hammer vibration tests. Using ambient sources of excitation to determine the modal characteristics of large civil engineering structures is desirable for several reasons. The forced vibration testing of such structures generally requires a large amount of specialized equipment and trained personnel making the tests quite expensive. Also, an automated health monitoring system for a large civil structure will most likely use ambient excitation. A modal identification procedure based on a statistical Monte Carlo analysis using the Eigensystem Realization Algorithm is used to compute the modal parameters and their statistics. The results show that for most of the measured modes, the differences between the modal frequencies of the ambient and hammer data sets are statistically significant. However, the differences between the corresponding damping ratio results are not statistically significant. Also, one of the modes identified from the hammer test data was not identifiable from the ambient data set.
An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic
ERIC Educational Resources Information Center
Maeda, Hotaka; Zhang, Bo
2017-01-01
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.
ERIC Educational Resources Information Center
Liu, Tung; Stone, Courtenay C.
1999-01-01
Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…
Statistical forecasts and tests for small interplate repeating earthquakes along the Japan Trench
NASA Astrophysics Data System (ADS)
Okada, Masami; Uchida, Naoki; Aoki, Shigeki
2012-08-01
Earthquake predictability is a fundamental problem of seismology. Using a sophisticated model, a Bayesian approach with lognormal distribution on the renewal process, we theoretically formulated a method to calculate the conditional probability of a forthcoming recurrent event and forecast the probabilities of small interplate repeating earthquakes along the Japan Trench. The numbers of forecast sequences for 12 months were 93 for July 2006 to June 2007, 127 for 2008, 145 for 2009, and 163 for 2010. Forecasts except for 2006-07 were posted on a web site for impartial testing. Consistencies of the probabilities with catalog data of two early experiments were so good that they were statistically accepted. However, the 2009 forecasts were rejected by the statistical tests, mainly due to a large slow slip event on the plate boundary triggered by two events with M 7.0 and M 6.9. All 365 forecasts of the three experiments were statistically accepted by consistency tests. Comparison tests and the relative/receiver operating characteristic confirm that our model has significantly higher performance in probabilistic forecast than the exponential distribution model on the Poisson process. Therefore, we conclude that the occurrence of microrepeaters is statistically dependent on elapsed time since the last event and is not random in time.
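The conditional probability at the heart of a lognormal renewal forecast (the chance of the next event within a forecast window, given the elapsed quiet time) can be written directly from the lognormal CDF. The sketch below uses hypothetical recurrence parameters (median 5 years, sigma = 0.3), not values fitted to the Japan Trench catalog, and omits the Bayesian treatment of parameter uncertainty used in the paper:

```python
import math

def lognorm_cdf(x, mu, sigma):
    """CDF of a lognormal recurrence-time distribution."""
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

def conditional_prob(t, dt, mu, sigma):
    """P(next event within dt years | no event for t years) under a
    lognormal renewal model: (F(t+dt) - F(t)) / (1 - F(t))."""
    F_t = lognorm_cdf(t, mu, sigma)
    return (lognorm_cdf(t + dt, mu, sigma) - F_t) / (1 - F_t)

# Hypothetical repeater: median recurrence exp(mu) = 5 years, sigma = 0.3.
mu, sigma = math.log(5.0), 0.3
print(round(conditional_prob(t=4.0, dt=1.0, mu=mu, sigma=sigma), 3))
```

Because the probability is conditioned on elapsed time, it rises as the sequence approaches its typical recurrence interval, which is exactly the time dependence the consistency tests in the paper are probing.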
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample-size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.
ERIC Educational Resources Information Center
Deegear, James
This paper summarizes the literature regarding statistical significance testing, with an emphasis on recent literature in various disciplines and on literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…
Connecting Science and Mathematics: The Nature of Scientific and Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Lawson, Anton E.; Oehrtman, Michael; Jensen, Jamie
2008-01-01
Confusion persists concerning the roles played by scientific hypotheses and predictions in doing science. This confusion extends to the nature of scientific and statistical hypothesis testing. The present paper utilizes the "If/and/then/Therefore" pattern of hypothetico-deductive (HD) reasoning to explicate the nature of both scientific and…
A Short-Cut Statistic for Item Analysis of Mastery Tests: A Comparison of Three Procedures.
ERIC Educational Resources Information Center
Subkoviak, Michael J.; Harris, Deborah J.
This study examined three statistical methods for selecting items for mastery tests. One is the pretest-posttest method due to Cox and Vargas (1966); it is computationally simple, but has a number of serious limitations. The second is a latent trait method recommended by van der Linden (1981); it is computationally complex, but has a number of…
Identifying Local Dependence with a Score Test Statistic Based on the Bifactor Logistic Model
ERIC Educational Resources Information Center
Liu, Yang; Thissen, David
2012-01-01
Local dependence (LD) refers to the violation of the local independence assumption of most item response models. Statistics that indicate LD between a pair of items on a test or questionnaire that is being fitted with an item response model can play a useful diagnostic role in applications of item response theory. In this article, a new score test…
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.
Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J
2015-07-01
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
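As an illustration of how a Bayes factor can be built from marginal test statistics alone, here is a minimal sketch of a Wakefield-style approximate Bayes factor for a single SNP. This is a standard approximation in the same family as the methods above, not the CAVIARBF implementation, and the prior effect variance W = 0.04 is an arbitrary choice for the example:

```python
import math

def approx_bayes_factor(z, V, W=0.04):
    """Wakefield-style approximate Bayes factor (association vs. null)
    from a marginal z-statistic, its variance V, and a prior effect
    variance W. Larger values favor association."""
    r = W / (V + W)
    return math.sqrt(V / (V + W)) * math.exp(z * z * r / 2)

# A strong marginal signal (z = 5) yields a far larger Bayes factor
# than a weak one (z = 0.5) at the same sampling variance.
print(approx_bayes_factor(5.0, V=0.01) > approx_bayes_factor(0.5, V=0.01))  # -> True
```

Multi-SNP methods such as CAVIARBF extend this idea by modeling the joint multivariate normal distribution of the marginal statistics through the SNP correlation matrix.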
Alphas and Asterisks: The Development of Statistical Significance Testing Standards in Sociology
ERIC Educational Resources Information Center
Leahey, Erin
2005-01-01
In this paper, I trace the development of statistical significance testing standards in sociology by analyzing data from articles published in two prestigious sociology journals between 1935 and 2000. I focus on the role of two key elements in the diffusion literature, contagion and rationality, as well as the role of institutional factors. I…
Diagnosing Skills of Statistical Hypothesis Testing Using the Rule Space Method
ERIC Educational Resources Information Center
Im, Seongah; Yin, Yue
2009-01-01
This study illustrated the use of the Rule Space Method to diagnose students' proficiencies in skills and knowledge of statistical hypothesis testing. Participants included 96 undergraduate and graduate students, of whom 94 were classified into one or more of the knowledge states identified by the rule space analysis. Analysis at the level of…
Statistical Significance Testing in "Educational and Psychological Measurement" and Other Journals.
ERIC Educational Resources Information Center
Daniel, Larry G.
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results. Critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and…
Detecting trends in raptor counts: power and type I error rates of various statistical tests
Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.
1996-01-01
We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
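The simulation design described there (exponential counts with 40% CV lognormal sampling error, tested by linear regression on the log scale) can be sketched with a short Monte Carlo. This is a stdlib approximation of that setup, not the authors' code: it covers only one of the seven tests, and the critical value t(0.05, 8 df) ≈ 1.860 is hardcoded for the n = 10 case:

```python
import math
import random

def slope_t(ts, ys):
    """t-statistic of the OLS slope for y regressed on t."""
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    sxx = sum((t - tbar) ** 2 for t in ts)
    b = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / sxx
    a = ybar - b * tbar
    sse = sum((y - a - b * t) ** 2 for t, y in zip(ts, ys))
    return b / math.sqrt(sse / (n - 2) / sxx)

def power(trend, n_years=10, cv=0.4, reps=1000, t_crit=1.860, seed=7):
    """Rejection rate of an upper-tailed log-scale trend test.
    t_crit is t(0.05, 8 df), matching n_years = 10."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))  # lognormal sd from the CV
    ts = list(range(n_years))
    hits = 0
    for _ in range(reps):
        # log-counts: baseline 100 birds, multiplicative trend, lognormal error
        ys = [math.log(100) + math.log(1 + trend) * t + rng.gauss(0, sigma)
              for t in ts]
        if slope_t(ts, ys) > t_crit:
            hits += 1
    return hits / reps

print(power(0.05))  # power against a +5%/year trend
print(power(0.0))   # empirical type I error, near the nominal 0.05
```

Replacing `rng.gauss` with autocorrelated noise reproduces the alpha-inflation effect reported in the study.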
Schoenberg, Mike R; Dawson, Kyra A; Duff, Kevin; Patton, Doyle; Scott, James G; Adams, Russell L
2006-10-01
The Rey Auditory Verbal Learning Test [RAVLT; Rey, A. (1941). L'examen psychologique dans les cas d'encéphalopathie traumatique. Archives de Psychologie, 28, 21] is a commonly used neuropsychological measure that assesses verbal learning and memory. Normative data have been compiled [Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles, CA: Western Psychological Services]. When assessing an individual suspected of neurological dysfunction, useful comparisons include the extent that the patient deviates from healthy peers and also how closely the subject's performance matches those with known brain injury. This study provides the means and S.D.'s of 392 individuals with documented neurological dysfunction [closed head TBI (n=68), neoplasms (n=57), stroke (n=47), Dementia of the Alzheimer's type (n=158), and presurgical epilepsy left seizure focus (n=28), presurgical epilepsy right seizure focus (n=34)] and 122 patients with no known neurological dysfunction and psychiatric complaints. Patients were stratified into three age groups, 16-35, 36-59, and 60-88. Data were provided for trials I-V, List B, immediate recall, 30-min delayed recall, and recognition. Classification characteristics of the RAVLT using [Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles, CA: Western Psychological Services] meta-norms found the RAVLT to best distinguish patients suspected of Alzheimer's disease from the psychiatric comparison group.
Testing statistical self-similarity in the topology of river networks
Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.
2010-01-01
Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
Reproducibility-optimized test statistic for ranking genes in microarray studies.
Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero
2008-01-01
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. Because previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows consistently good performance under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression but could be extended to a wide range of other applications as well.
ERIC Educational Resources Information Center
Smith, A. Delany; Henson, Robin K.
This paper addresses the state of the art regarding the use of statistical significance tests (SSTs). How social science research will be conducted in the future is impacted directly by current debates regarding hypothesis testing. This paper: (1) briefly explicates the current debate on hypothesis testing; (2) reviews the newly published report…
NASA Astrophysics Data System (ADS)
Coelho, Carlos A.; Marques, Filipe J.
2013-09-01
In this paper the authors combine the equicorrelation and equivariance test introduced by Wilks [13] with the likelihood ratio test (l.r.t.) for independence of groups of variables to obtain the l.r.t. of block equicorrelation and equivariance. This test, or its single-block version, may find applications in many areas, such as psychology, education, medicine, and genetics, and is important "in many tests of multivariate analysis, e.g. in MANOVA, Profile Analysis, Growth Curve analysis, etc" [12, 9]. By decomposing the overall hypothesis into the hypothesis of independence of groups of variables and the hypothesis of equicorrelation and equivariance, we are able to obtain the expressions for the overall l.r.t. statistic and its moments. From these we obtain a suitable factorization of the characteristic function (c.f.) of the logarithm of the l.r.t. statistic, which enables us to develop highly manageable and precise near-exact distributions for the test statistic.
Obuchowski, Nancy A; Buckler, Andrew; Kinahan, Paul; Chen-Mayer, Heather; Petrick, Nicholas; Barboriak, Daniel P; Bullen, Jennifer; Barnhart, Huiman; Sullivan, Daniel C
2016-04-01
A major initiative of the Quantitative Imaging Biomarker Alliance is to develop standards-based documents called "Profiles," which describe one or more technical performance claims for a given imaging modality. The term "actor" denotes any entity (device, software, or person) whose performance must meet certain specifications for the claim to be met. The objective of this paper is to present the statistical issues in testing actors' conformance with the specifications. In particular, we present the general rationale and interpretation of the claims, the minimum requirements for testing whether an actor achieves the performance requirements, the study designs used for testing conformity, and the statistical analysis plan. We use three examples to illustrate the process: apparent diffusion coefficient in solid tumors measured by MRI, change in Perc 15 as a biomarker for the progression of emphysema, and percent change in solid tumor volume by computed tomography as a biomarker for lung cancer progression.
Rudd, James; Moore, Jason H; Urbanowicz, Ryan J
2013-11-01
Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative in large and complex real-world applications such as genetic epidemiology, where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real-world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross-validation becomes more feasible to complete on a single multi-core workstation. We test our Python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
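The external parallelization strategy described above — independent permutation runs farmed out one per CPU core, followed by an ordinary permutation p-value — can be sketched with Python's standard multiprocessing module. The `one_permutation_run` function below is a hypothetical cheap stand-in for a full LCS training run, not the authors' implementation.

```python
import multiprocessing as mp
import numpy as np

def one_permutation_run(seed):
    """Hypothetical stand-in for one independent learning run on
    label-permuted data; returns that run's test statistic."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 10))
    y = rng.integers(0, 2, size=200)             # permuted labels: no real signal
    # Mock statistic: strongest absolute feature/label correlation.
    return max(abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1]))

def permutation_null(n_perms, n_procs):
    """Farm out independent permutation runs, one process per run."""
    with mp.Pool(processes=n_procs) as pool:
        return pool.map(one_permutation_run, range(n_perms))

if __name__ == "__main__":
    null_stats = permutation_null(n_perms=40, n_procs=4)
    observed = 0.30                              # statistic on the unpermuted data
    # Permutation p-value with the standard +1 correction.
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + len(null_stats))
```

Because each permutation run is independent, the speedup is close to linear until the number of worker processes exceeds the number of physical cores, matching the abstract's observation.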
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.
Bandyopadhyay, Sanghamitra; Mallik, Saurav; Mukhopadhyay, Anirban
2014-01-01
DNA microarray is a powerful technology that can simultaneously determine the levels of thousands of transcripts (generated, for example, from genes/miRNAs) across different experimental conditions or tissue samples. The goal of differential expression analysis is to identify the transcripts whose expressions change significantly across different types of samples or experimental conditions. A number of statistical testing methods are available for this purpose. In this paper, we provide a comprehensive survey of different parametric and non-parametric testing methodologies for identifying differential expression from microarray data sets. The performances of the different testing methods have been compared based on some real-life miRNA and mRNA expression data sets. For validating the resulting differentially expressed miRNAs, the outcomes of each test are checked against the information available for each miRNA in the standard miRNA database PhenomiR 2.0. Subsequently, we have prepared different simulated data sets of different sample sizes (from 10 to 100 per group/population), and the power of each test has been calculated individually. The comparative simulation study may help formulate robust and comprehensive judgements about the performance of each test on the basis of assumptions about the data distribution. Finally, a list of advantages and limitations of the different statistical tests has been provided, along with indications of some areas where further studies are required.
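The parametric/nonparametric distinction surveyed above reduces, per transcript, to a choice of two-sample test. A minimal illustration with scipy, on one simulated transcript (the group means, sizes, and seed are assumptions for the example, not data from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated log-expression of one transcript in two groups of samples.
control = rng.normal(loc=5.0, scale=1.0, size=20)
treated = rng.normal(loc=6.5, scale=1.0, size=20)   # mean shifted upward

# Parametric choice: Welch's t-test (no equal-variance assumption).
t_stat, t_p = stats.ttest_ind(control, treated, equal_var=False)

# Nonparametric choice: Mann-Whitney U (Wilcoxon rank-sum) test.
u_stat, u_p = stats.mannwhitneyu(control, treated, alternative="two-sided")
```

In a genome-wide analysis the chosen test is applied to every transcript and the resulting p-values are then corrected for multiple testing, a step omitted here.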
Bandyopadhyay, Sanghamitra; Mallik, Saurav; Mukhopadhyay, Anirban
2013-11-15
DNA microarray is a powerful technology that can simultaneously determine the levels of thousands of transcripts (e.g. genes/miRNAs) across different experimental conditions or tissue samples. The goal of differential expression analysis is to identify the transcripts whose expressions change significantly across different types of samples or experimental conditions. A number of statistical testing methods are available for this purpose. In this article, we provide a comprehensive survey of different parametric and nonparametric testing methodologies for identifying differential expression from microarray datasets. The performances of the different testing methods have been compared based on some real-life miRNA and mRNA expression data sets. For validating the resulting differentially expressed miRNAs, the outcomes of each test are checked against the information available for each miRNA in the standard miRNA database PhenomiR 2.0. Subsequently, we have prepared different simulated datasets of different sample sizes (from 10 to 100 per group/population), and the power of each test has been calculated individually. The comparative simulation study may help formulate robust and comprehensive judgements about the performance of each test on the basis of assumptions about the data distribution. Finally, a list of advantages and limitations of the different statistical tests has been provided, along with indications of some areas where further studies are required.
A new design for high-throughput peel tests: statistical analysis and example
NASA Astrophysics Data System (ADS)
Chiche, Arnaud; Zhang, Wenhua; Stafford, Christopher M.; Karim, Alamgir
2005-01-01
The peel test is one of the most common techniques for investigating the properties of pressure-sensitive adhesives (PSAs). As the demand increases for combinatorial tools to rapidly test material performance, designing a high-throughput peel test is a critical improvement to this well-established technique. A glaring drawback to adapting conventional peel tests to study combinatorial specimens is the lack of the sufficient statistical information that is the foundation of this type of measurement. For example, using a continuous gradient of sample properties or test conditions in the peel direction implies that each data point (force) corresponds to a given test condition, thus preventing the average force from being calculated for a given condition. The aim of this paper is both to highlight the potential problems and limitations of a high-throughput peel test and to suggest simple experimental solutions to these problems based on a statistical analysis of the data. The effect of the peel rate on the peel force is used to illustrate our approach.
An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.
Kim, Junghi; Bai, Yun; Pan, Wei
2015-12-01
We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics, without access to individual-level genotype and phenotype data. Because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data for most published GWASs, and because multiple correlated traits have often been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
Statistical Analysis of Electromigration in Cu Interconnects With Multi-link Test Structures
Justison, P.; Kawasaki, H.; Gall, M.; Thrasher, S.; Hauschildt, M.; Hernandez, R.; Capasso, C.; Ho, P.S.
2004-12-08
The continual downward scaling of devices and increases in drive current have required an ever-shrinking interconnect pitch and higher current densities. To overcome both the higher signal delay and reliability concerns, new metallization technologies such as Cu interconnects and low-k interlevel dielectrics have been developed. The implementation of inlaid Cu interconnects introduces a new set of material systems and geometries, which results in new mass transport and failure mechanisms under electromigration (EM). This study focuses on the characterization and understanding of electromigration-induced failures in advanced, 0.13 μm technology node Cu interconnects. Statistically based methodologies, using multi-link test structures, were developed and used to further understand the reliability of these advanced interconnects. Single-inlaid structures designed to test both the upper and lower interfaces associated with a Cu via were used to understand the role of void formation and interconnect geometry in EM behavior. These statistical methodologies were also applied to EM tests on dual-inlaid test structures in order to understand the impact of a continuous via-metal connection on void formation, including the potential for multiple failure modes. Dual-inlaid integrations of varying maturity levels were examined to highlight the advantages of the statistically based methodology in determining extrinsic failure modes as well as in increasing the confidence of EM lifetime prediction.
Modeling the Test-Retest Statistics of a Localization Experiment in the Full Horizontal Plane.
Morsnowski, André; Maune, Steffen
2016-10-01
Two approaches to modeling the test-retest statistics of a localization experiment, one based on the Gaussian distribution and one on surrogate data, are introduced. Their efficiency is investigated using different measures describing directional hearing ability. A localization experiment in the full horizontal plane is a challenging task for hearing-impaired patients. In clinical routine, we use this experiment to evaluate the progress of our cochlear implant (CI) recipients. Listening and time effort limit the reproducibility. The localization experiment consists of a circle of 12 loudspeakers placed in an anechoic room, a "camera silens". In darkness, HSM sentences are presented at 65 dB pseudo-erratically from all 12 directions with five repetitions. This experiment is modeled by a set of Gaussian distributions with different standard deviations added to a perfect estimator, as well as by surrogate data. Five repetitions per direction are used to produce surrogate data distributions for the sensation directions. To investigate the statistics, we retrospectively use the data of 33 CI patients with 92 pairs of test-retest measurements from the same day. The first model does not take inversions into account (i.e., permutations of the direction from back to front and vice versa are not considered), although they are common for hearing-impaired persons, particularly in the rear hemisphere. The second model considers these inversions but does not work with all measures. The introduced models successfully describe the test-retest statistics of directional hearing. However, since they perform differently on the investigated measures, no general recommendation can be provided. The presented test-retest statistics enable pair-test comparisons for localization experiments.
Score statistic to test for genetic correlation for proband-family design.
el Galta, R; van Duijn, C M; van Houwelingen, J C; Houwing-Duistermaat, J J
2005-07-01
In genetic epidemiological studies informative families are often oversampled to increase the power of a study. For a proband-family design, where relatives of probands are sampled, we derive the score statistic to test for clustering of binary and quantitative traits within families due to genetic factors. The derived score statistic is robust to ascertainment scheme. We considered correlation due to unspecified genetic effects and/or due to sharing alleles identical by descent (IBD) at observed marker locations in a candidate region. A simulation study was carried out to study the distribution of the statistic under the null hypothesis in small data-sets. To illustrate the score statistic, data from 33 families with type 2 diabetes mellitus (DM2) were analyzed. In addition to the binary outcome DM2 we also analyzed the quantitative outcome, body mass index (BMI). For both traits familial aggregation was highly significant. For DM2, also including IBD sharing at marker D3S3681 as a cause of correlation gave an even more significant result, which suggests the presence of a trait gene linked to this marker. We conclude that for the proband-family design the score statistic is a powerful and robust tool for detecting clustering of outcomes.
Ahn, Soyeon; Park, Seong Ho; Lee, Kyoung Ho
2013-05-01
Demonstrating similarity between compared groups--that is, equivalence or noninferiority of the outcome of one group to the outcome of another group--requires a different analytic approach than determining the difference between groups--that is, superiority of one group over another. Neither a statistically significant difference between groups (P < .05) nor a lack of significant difference (P ≥ .05) from conventional statistical tests provides answers about equivalence/noninferiority. Statistical testing of equivalence/noninferiority generally uses a confidence interval, where equivalence/noninferiority is claimed when the confidence interval of the difference in outcome between compared groups is within a predetermined equivalence/noninferiority margin that represents a clinically or scientifically acceptable range of differences and is typically described by Δ. The equivalence/noninferiority margin should be justified both clinically and statistically, considering the loss in the main outcome and the compensatory gain, and be chosen conservatively to avoid making a false claim of equivalence/noninferiority for an inferior outcome. Sample size estimation needs to be specified for equivalence/noninferiority design, considering Δ in addition to other general factors. The need for equivalence/noninferiority research studies is expected to increase in radiology, and a good understanding of the fundamental principles of the methodology will be helpful for conducting as well as for interpreting such studies.
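The confidence-interval rule described above — claim noninferiority only if the whole interval for the between-group difference clears the prespecified margin Δ — can be made concrete with a small sketch. The Wald interval for a difference in proportions is a simplification, and the rates, sample sizes, and margin below are hypothetical numbers, not values from the article.

```python
import math

def noninferiority_ci(p_new, p_std, n_new, n_std, margin, z=1.96):
    """Difference in proportions (new - standard) with a Wald 95% CI.
    Noninferiority (higher is better) is claimed when the lower confidence
    bound exceeds -margin. Illustrative Wald interval only; exact or
    score intervals are often preferred for small samples."""
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, lower > -margin

# Hypothetical detection rates: new technique 0.82 (n=200) vs standard 0.84
# (n=200), with a prespecified noninferiority margin of 0.10.
low, high, noninferior = noninferiority_ci(0.82, 0.84, 200, 200, margin=0.10)
```

Note that the decision depends on the interval, not on a conventional significance test: here the difference is not "significant," yet that alone would say nothing about noninferiority.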
Jenkinson, Garrett; Goutsias, John
2013-05-28
The master equation is used extensively to model chemical reaction systems with stochastic dynamics. However, and despite its phenomenological simplicity, it is not in general possible to compute the solution of this equation. Drawing exact samples from the master equation is possible but can be computationally demanding, especially when estimating high-order statistical summaries or joint probability distributions. As a consequence, one often relies on analytical approximations to the solution of the master equation or on computational techniques that draw approximate samples from this equation. Unfortunately, it is not in general possible to check whether a particular approximation scheme is valid. The main objective of this paper is to develop an effective methodology to address this problem based on statistical hypothesis testing. By drawing a moderate number of samples from the master equation, the proposed techniques use the well-known Kolmogorov-Smirnov statistic to reject the validity of a given approximation method or accept it with a certain level of confidence. Our approach is general enough to deal with any master equation and can be used to test the validity of any analytical approximation method or any approximate sampling technique of interest. A number of examples, based on the Schlögl model of chemistry and the SIR model of epidemiology, clearly illustrate the effectiveness and potential of the proposed statistical framework.
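The core idea — compare samples from the exact process against samples from the candidate approximation with a two-sample Kolmogorov-Smirnov test — can be sketched as follows. The distributions below are stand-ins (a Poisson "exact" sampler versus a rounded Gaussian "approximation"), not the Schlögl or SIR models from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

# Stand-in for exact master-equation samples (steady-state copy numbers).
exact = rng.poisson(lam=100, size=n)

# Stand-in for an approximate sampler (a rounded Gaussian approximation).
approx = np.rint(rng.normal(loc=100.0, scale=10.0, size=n))

# Two-sample Kolmogorov-Smirnov test: a small p-value rejects the
# approximation; a large one means it cannot be distinguished at this n.
ks_stat, p_value = stats.ks_2samp(exact, approx)
```

A failure to reject is evidence only at the given sample size; the paper's framework makes this acceptance "with a certain level of confidence" explicit.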
Rivoirard, Romain; Duplay, Vianney; Oriol, Mathieu; Tinquaut, Fabien; Chauvin, Franck; Magne, Nicolas; Bourmaud, Aurelie
2016-01-01
Background Quality of reporting for Randomized Clinical Trials (RCTs) in oncology has been analyzed in several systematic reviews, but there is a paucity of data on outcome definitions and on the consistency of reporting of statistical tests in RCTs and Observational Studies (OBS). The objective of this review was to describe those two reporting aspects for OBS and RCTs in oncology. Methods From a list of 19 medical journals, three were retained for analysis after a random selection: British Medical Journal (BMJ), Annals of Oncology (AoO) and British Journal of Cancer (BJC). All original articles published between March 2009 and March 2014 were screened. Only studies whose main outcome was accompanied by a corresponding statistical test were included in the analysis. Studies based on censored data were excluded. The primary outcome was to assess quality of reporting for the description of the primary outcome measure in RCTs and of the variables of interest in OBS. A logistic regression was performed to identify covariates of studies potentially associated with concordance of tests between the Methods and Results sections. Results 826 studies were included in the review, and 698 were OBS. Variables were described in the Methods section for all OBS studies, and the primary endpoint was clearly detailed in the Methods section for 109 RCTs (85.2%). 295 OBS (42.2%) and 43 RCTs (33.6%) had perfect agreement of the reported statistical test between the Methods and Results sections. In multivariable analysis, the variable "number of included patients in study" was associated with test consistency: the aOR (adjusted Odds Ratio) for the third group compared to the first group was aOR Grp3 = 0.52 [0.31–0.89] (P value = 0.009). Conclusion Variables in OBS and primary endpoints in RCTs are reported and described with high frequency. However, the consistency of statistical tests between the Methods and Results sections of OBS is not always maintained. Therefore, we encourage authors and peer reviewers to verify the consistency of reported statistical tests between the Methods and Results sections.
Portmanteau test statistics for seasonal serial correlation in time series models.
Mahdi, Esam
2016-01-01
The seasonal autoregressive moving average (SARMA) models have been widely adopted for modeling many time series encountered in economics, hydrology, meteorology, and environmental studies that exhibit strong seasonal behavior with a period s. If the model is adequate, the autocorrelations in the errors at the seasonal and the nonseasonal lags will be zero. Despite the popular use of portmanteau tests for SARMA models, diagnostic checking at the seasonal lags s, 2s, ..., ms, where m is the largest lag considered for autocorrelation and s is the seasonal period, has not yet received as much attention as it deserves. In this paper, we devise seasonal portmanteau test statistics to test whether the seasonal autocorrelations at multiples of lag s of a time series are different from zero. Simulation studies are performed to assess how well the asymptotic distributions of the proposed statistics perform in finite samples. Results suggest using the proposed tests as complementary to the classical tests found in the literature. An illustrative application is given to demonstrate the usefulness of this test.
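The general shape of such a statistic can be sketched by restricting the classical Ljung-Box sum to the seasonal lags s, 2s, ..., ms and referring it to a chi-square distribution with m degrees of freedom. This is a generic sketch under that assumption; the statistics proposed in the paper may weight the autocorrelations differently.

```python
import numpy as np
from scipy import stats

def seasonal_portmanteau(resid, s, m):
    """Ljung-Box-type statistic on residual autocorrelations at the seasonal
    lags s, 2s, ..., m*s only, compared to a chi-square with m df.
    Generic sketch; not necessarily the paper's exact statistic."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    r = resid - resid.mean()
    denom = np.sum(r ** 2)
    q = 0.0
    for k in s * np.arange(1, m + 1):
        acf_k = np.sum(r[k:] * r[:-k]) / denom   # lag-k autocorrelation
        q += acf_k ** 2 / (n - k)
    q *= n * (n + 2)
    p_value = stats.chi2.sf(q, df=m)
    return q, p_value

rng = np.random.default_rng(7)
white = rng.normal(size=400)          # adequate model: white residuals
q, p = seasonal_portmanteau(white, s=12, m=4)
```

Residuals with an unmodeled period-s component produce large autocorrelations at the seasonal lags, and the statistic rejects; white residuals should pass.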
Vera-Ruiz, Victor A; Lau, Kwok W; Robinson, John; Jermiin, Lars S
2014-01-01
Under a Markov model of evolution, recoding, or lumping, of the four nucleotides into fewer groups may permit analysis under simpler conditions but may unfortunately yield misleading results unless the evolutionary process of the recoded groups remains Markovian. If a Markov process is lumpable, then the evolutionary process of the recoded groups is Markovian. We consider stationary, reversible, and homogeneous Markov processes on two taxa and compare three tests for lumpability: one using an ad hoc test statistic, which is based on an index that is evaluated using a bootstrap approximation of its distribution; one that is based on a test proposed specifically for Markov chains; and one using a likelihood-ratio test. We show that the likelihood-ratio test is more powerful than the index test, which is more powerful than that based on the Markov chain test statistic. We also show that for stationary processes on binary trees with more than two taxa, the tests can be applied to all pairs. Finally, we show that if the process is lumpable, then estimates obtained under the recoded model agree with estimates obtained under the original model, whereas, if the process is not lumpable, then these estimates can differ substantially. We apply the new likelihood-ratio test for lumpability to two primate data sets, one with a mitochondrial origin and one with a nuclear origin. Recoding may result in biased phylogenetic estimates because the original evolutionary process is not lumpable. Accordingly, testing for lumpability should be done prior to phylogenetic analysis of recoded data.
Taroni, F; Biedermann, A; Bozza, S
2016-02-01
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to fuel extensive debate in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on so-called p-values, it is of interest to expose the discussion of this journal's decision within the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.
Statistical study of thermal fracture of ceramic materials in the water quench test
NASA Technical Reports Server (NTRS)
Rogers, Wayne P.; Emery, Ashley F.; Bradt, Richard C.; Kobayashi, Albert S.
1987-01-01
The Weibull statistical theory of fracture was applied to thermal shock of ceramics in the water quench test. Transient thermal stresses and probability of failure were calculated for a cylindrical specimen cooled by convection. The convective heat transfer coefficient was calibrated using the time to failure which was measured with an acoustic emission technique. Theoretical failure probability distributions as a function of time and quench temperature compare favorably with experimental results for three high-alumina ceramics and a glass.
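The Weibull weakest-link theory named in the abstract gives, for a uniformly stressed volume, the failure probability P_f = 1 - exp(-(σ/σ₀)^m). The sketch below evaluates this two-parameter form with hypothetical alumina-like parameters; the paper itself integrates transient thermal stresses over the specimen rather than assuming uniform stress.

```python
import math

def weibull_failure_probability(sigma, sigma_0, m):
    """Two-parameter Weibull weakest-link failure probability for a
    uniformly stressed volume: P_f = 1 - exp(-(sigma/sigma_0)**m).
    Illustrative only; transient thermal-shock stresses require
    integrating the stress field over the specimen volume."""
    return 1.0 - math.exp(-((sigma / sigma_0) ** m))

# Hypothetical parameters: characteristic strength 300 MPa, Weibull modulus 10.
pf_low = weibull_failure_probability(200.0, 300.0, 10)   # well below sigma_0
pf_char = weibull_failure_probability(300.0, 300.0, 10)  # at sigma_0: 1 - 1/e
```

The Weibull modulus m controls the scatter: a large m makes the failure probability rise steeply near the characteristic strength, which is why it is central to thermal-shock reliability predictions.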
Statistical analysis of the hen's egg test for micronucleus induction (HET-MN assay).
Hothorn, Ludwig A; Reisinger, Kerstin; Wolf, Thorsten; Poth, Albrecht; Fieblinger, Dagmar; Liebsch, Manfred; Pirow, Ralph
2013-09-18
The HET-MN assay (hen's egg test for micronucleus induction) is different from other in vitro genotoxicity assays in that it includes toxicologically important features such as absorption, distribution, metabolic activation, and excretion of the test compound. As a promising follow-up to complement existing in vitro test batteries for genotoxicity, the HET-MN is currently undergoing a formal validation. To optimize the validation, the present study describes a critical analysis of previously obtained HET-MN data to check the experimental design and to identify the most appropriate statistical procedure to evaluate treatment effects. Six statistical challenges (I-VI) of general relevance were identified, and remedies were provided which can be transferred to similarly designed test methods: a Williams-type trend test is proposed for overdispersed counts (II) by means of a square-root transformation which is robust for small sample sizes (I), variance heterogeneity (III), and possible downturn effects at high doses (IV). Due to near-to-zero or even zero-count data occurring in the negative control (V), a conditional comparison of the treatment groups against the mean of the historical controls (VI) instead of the concurrent control was proposed, which is in accordance with US-FDA recommendations. For the modified Williams-type tests, the power can be estimated depending on the magnitude and shape of the trend, the number of dose groups, and the magnitude of the MN counts in the negative control. The experimental design used previously (i.e. six eggs per dose group, scoring of 1000 cells per egg) was confirmed. The proposed approaches are easily available in the statistical computing environment R, and the corresponding R-codes are provided.
Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.
Kwak, Il-Youp; Pan, Wei
2017-01-01
To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods.
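The simplest, non-adaptive member of the family of tests the paper generalizes is the sum of squared Z-statistics across SNPs and traits. The sketch below uses invented summary statistics and assumes independent SNPs and traits (the paper's methods account for correlation and adapt the statistic's form):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GWAS summary statistics for one gene: Z-scores for
# 10 SNPs x 3 traits, generated under the null here for illustration.
Z = rng.standard_normal((10, 3))

# A simple, non-adaptive gene-level cross-trait test: the sum of squared
# Z-statistics. Assuming independent SNPs and traits, this is chi-square
# with Z.size degrees of freedom, approximated here by simulation.
stat = float((Z ** 2).sum())
df = Z.size
null = (rng.standard_normal((50000, df)) ** 2).sum(axis=1)
pval = float(np.mean(null >= stat))

print(f"sum-of-squares statistic = {stat:.2f} on {df} df, p ~ {pval:.3f}")
```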
ERIC Educational Resources Information Center
Huberty, Carl J.
1993-01-01
Twenty-eight books published from 1910 to 1949, 19 books published from 1990 to 1992, and 5 multiple edition books were reviewed to examine the presentations of statistical testing, particularly coverage of the p-value and fixed-alpha approaches. Statistical testing itself is not at fault, but some textbook presentations, testing practices, and…
Statistical correlation analysis for comparing vibration data from test and analysis
NASA Technical Reports Server (NTRS)
Butler, T. G.; Strang, R. F.; Purves, L. R.; Hershfeld, D. J.
1986-01-01
A theory was developed to compare vibration modes obtained by NASTRAN analysis with those obtained experimentally. Because many more analytical modes can be obtained than experimental modes, the analytical set was treated as expansion functions for putting both sources in comparative form. The dimensional symmetry was developed for three general cases: a nonsymmetric whole model compared with a nonsymmetric whole structural test, a symmetric analytical portion compared with a symmetric experimental portion, and an analytical symmetric portion compared with a whole experimental test. The theory was coded and a statistical correlation program was installed as a utility. The theory is established with small classical structures.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
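A power calculation for one of the simplest cases, the test of a zero correlation, can be sketched without G*Power. This is the textbook Fisher-z approximation, not G*Power's exact routine:

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05):
    """Approximate power of the two-sided test of rho = 0 against a true
    correlation r with n observations, via Fisher's z transformation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = atanh(r) * sqrt(n - 3)  # noncentrality on the z scale
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

# Example: a medium effect (r = 0.3) with n = 84 gives roughly 80% power.
print(f"power ~ {correlation_power(0.3, 84):.3f}")
```

Increasing n or r raises the power monotonically, which is the basic trade-off such programs tabulate.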
Case Studies for the Statistical Design of Experiments Applied to Powered Rotor Wind Tunnel Tests
NASA Technical Reports Server (NTRS)
Overmeyer, Austin D.; Tanner, Philip E.; Martin, Preston B.; Commo, Sean A.
2015-01-01
The application of statistical Design of Experiments (DOE) to helicopter wind tunnel testing was explored during two powered rotor wind tunnel entries during the summers of 2012 and 2013. These tests were performed jointly by the U.S. Army Aviation Development Directorate Joint Research Program Office and NASA Rotary Wing Project Office, currently the Revolutionary Vertical Lift Project, at NASA Langley Research Center located in Hampton, Virginia. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small portion of the overall tests devoted to developing case studies of the DOE approach as it applies to powered rotor testing. A 16-47 times reduction in the number of data points required was estimated by comparing the DOE approach to conventional testing methods. The average error for the DOE surface response model for the OH-58F test was 0.95 percent and 4.06 percent for drag and download, respectively. The DOE surface response model of the Active Flow Control test captured the drag within 4.1 percent of measured data. The operational differences between the two testing approaches are identified, but did not prevent the safe operation of the powered rotor model throughout the DOE test matrices.
An accurate test for homogeneity of odds ratios based on Cochran's Q-statistic.
Kulinskaya, Elena; Dollinger, Michael B
2015-06-10
A frequently used statistic for testing homogeneity in a meta-analysis of K independent studies is Cochran's Q. For a standard test of homogeneity the Q statistic is referred to a chi-square distribution with K-1 degrees of freedom. For the situation in which the effects of the studies are logarithms of odds ratios, the chi-square distribution is much too conservative for moderate size studies, although it may be asymptotically correct as the individual studies become large. Using a mixture of theoretical results and simulations, we provide formulas to estimate the shape and scale parameters of a gamma distribution to fit the distribution of Q. Simulation studies show that the gamma distribution is a good approximation to the distribution for Q. Use of the gamma distribution instead of the chi-square distribution for Q should eliminate inaccurate inferences in assessing homogeneity in a meta-analysis. (A computer program for implementing this test is provided.) This hypothesis test is competitive with the Breslow-Day test both in accuracy of level and in power.
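Cochran's Q itself is straightforward to compute; the paper's contribution is the fitted gamma reference distribution, whose shape and scale formulas are not reproduced in this sketch (the data below are invented):

```python
import numpy as np

def cochran_q(effects, variances):
    """Cochran's Q for K study effects (e.g. log odds ratios) with
    estimated within-study variances."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(w * effects) / np.sum(w)
    return float(np.sum(w * (effects - pooled) ** 2))

# Hypothetical meta-analysis: log odds ratios from K = 4 studies.
log_or = [0.2, 0.5, 0.1, 0.4]
var = [0.05, 0.08, 0.04, 0.10]
q = cochran_q(log_or, var)

# Standard practice refers Q to chi-square with K-1 = 3 df; the paper
# argues a moment-fitted gamma reference is more accurate for log odds
# ratios from moderate-sized studies.
print(f"Q = {q:.3f} on 3 df")
```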
Testing statistical significance scores of sequence comparison methods with structure similarity
Hulsen, Tim; de Vlieg, Jacob; Leunissen, Jack AM; Groenen, Peter MA
2006-01-01
Background In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons. PMID:17038163
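The Monte-Carlo Z-score idea can be sketched with a toy identity score standing in for a real Smith-Waterman score (the sequences below are invented):

```python
import random

def match_score(a, b):
    """Toy alignment score: identities between equal-length sequences
    (a stand-in for a real Smith-Waterman score)."""
    return sum(x == y for x, y in zip(a, b))

def z_score(a, b, n_shuffles=500, seed=7):
    """Monte-Carlo Z: compare the real score against the distribution of
    scores obtained after shuffling one of the sequences."""
    rng = random.Random(seed)
    real = match_score(a, b)
    chars = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        null.append(match_score(a, chars))
    mean = sum(null) / n_shuffles
    var = sum((s - mean) ** 2 for s in null) / (n_shuffles - 1)
    return (real - mean) / (var ** 0.5)

a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA"  # near-identical homolog
print(f"Z = {z_score(a, b):.1f}")
```

The shuffling loop is exactly why the abstract calls the Z-score compute-intensive: every pairwise comparison requires hundreds of re-scored shuffles.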
Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test
NASA Astrophysics Data System (ADS)
Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta
2002-05-01
The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive
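The boundary failure of the nominal chi-square reference is easy to see in a minimal simulation. The toy problem below (testing a mean constrained to be nonnegative) is a standard textbook illustration, not one of the paper's astrophysical examples:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy boundary problem: test mu = 0 against mu >= 0 for N(mu, 1) data.
# The LRT statistic is (max(0, sqrt(n) * xbar))^2; because mu = 0 sits on
# the boundary of the parameter space, its null distribution is the
# mixture 0.5*chi2_0 + 0.5*chi2_1, not chi2_1.
n, reps = 50, 20000
xbar = rng.standard_normal((reps, n)).mean(axis=1)
lrt = np.maximum(0.0, np.sqrt(n) * xbar) ** 2

# Half of the simulated statistics are exactly zero...
frac_zero = np.mean(lrt == 0.0)
# ...so the chi2_1 critical value 3.84 is exceeded ~2.5% of the time, not 5%.
frac_reject = np.mean(lrt > 3.84)

print(f"P(LRT = 0) ~ {frac_zero:.3f}, P(LRT > 3.84) ~ {frac_reject:.3f}")
```

For line detection the situation is worse because the nuisance parameters (line location) are undefined under the null, which is why the paper turns to posterior predictive p-values.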
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST. PMID:21607077
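The core point, that one significant and one non-significant result need not conflict, can be illustrated with invented numbers (the effect sizes and standard errors below are hypothetical, not the fictitious studies used in the survey):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def summary(effect, se):
    """Two-sided p-value and 95% CI for a normally distributed estimate."""
    z = effect / se
    p = 2 * (1 - nd.cdf(abs(z)))
    return p, (effect - 1.96 * se, effect + 1.96 * se)

# Two hypothetical studies with very similar results...
p1, ci1 = summary(0.40, 0.18)   # p < .05: "significant"
p2, ci2 = summary(0.32, 0.18)   # p > .05: "non-significant"
# ...whose difference is nowhere near significant.
p_diff, _ = summary(0.40 - 0.32, sqrt(0.18**2 + 0.18**2))

print(f"study 1: p={p1:.3f}, CI=({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"study 2: p={p2:.3f}, CI=({ci2[0]:.2f}, {ci2[1]:.2f})")
print(f"difference: p={p_diff:.2f}")
```

The heavily overlapping CIs make the consistency of the two studies visible at a glance, which is the interpretive advantage the authors found when respondents avoided NHST framing.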
Brown, D Andrew; Lazar, Nicole A; Datta, Gauri S; Jang, Woncheol; McDowell, Jennifer E
2014-01-01
The analysis of functional neuroimaging data often involves the simultaneous testing for activation at thousands of voxels, leading to a massive multiple testing problem. This is true whether the data analyzed are time courses observed at each voxel or a collection of summary statistics such as statistical parametric maps (SPMs). It is known that classical multiplicity corrections become strongly conservative in the presence of a massive number of tests. Some more popular approaches for thresholding imaging data, such as the Benjamini-Hochberg step-up procedure for false discovery rate control, tend to lose precision or power when the assumption of independence of the data does not hold. Bayesian approaches to large scale simultaneous inference also often rely on the assumption of independence. We introduce a spatial dependence structure into a Bayesian testing model for the analysis of SPMs. By using SPMs rather than the voxel time courses, much of the computational burden of Bayesian analysis is mitigated. Increased power is demonstrated by using the dependence model to draw inference on a real dataset collected in a fMRI study of cognitive control. The model also is shown to lead to improved identification of neural activation patterns known to be associated with eye movement tasks. © 2013.
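The Benjamini-Hochberg step-up procedure mentioned above is compact enough to sketch (the p-values are invented "voxel" values; the procedure's FDR guarantee assumes the standard independence/PRDS conditions the abstract alludes to):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: return the indices of the p-values
    declared discoveries at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    # Find the largest rank whose ordered p-value clears its threshold.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

# Toy "voxel" p-values: three small ones among noise.
p = [0.001, 0.008, 0.012, 0.20, 0.35, 0.41, 0.60, 0.77, 0.88, 0.95]
print(benjamini_hochberg(p))  # → [0, 1, 2]
```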
Statistical Tests of System Linearity Based on the Method of Surrogate Data
Hunter, N.; Paez, T.; Red-Horse, J.
1998-11-04
When dealing with measured data from dynamic systems we often make the tacit assumption that the data are generated by linear dynamics. While some systematic tests for linearity and determinism are available - for example the coherence function, the probability density function, and the bispectrum - further tests that quantify the existence and the degree of nonlinearity are clearly needed. In this paper we demonstrate a statistical test for the nonlinearity exhibited by a dynamic system excited by Gaussian random noise. We perform the usual division of the input and response time series data into blocks as required by the Welch method of spectrum estimation and search for significant relationships between a given input frequency and response at harmonics of the selected input frequency. We argue that systematic tests based on the recently developed statistical method of surrogate data readily detect significant nonlinear relationships. The paper elucidates the method of surrogate data. Typical results are illustrated for a linear single degree-of-freedom system and for a system with polynomial stiffness nonlinearity.
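The surrogate-data idea can be sketched as follows: generate surrogates that share the signal's power spectrum but have randomized Fourier phases (destroying nonlinear structure), then compare a discriminating statistic against the surrogate distribution. The statistic here is a simple third moment, not the paper's input-harmonic relationship statistic, and the quadratic system is invented:

```python
import numpy as np

rng = np.random.default_rng(3)

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but random Fourier
    phases, which destroys any nonlinear structure."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, X.size)
    phases[0] = 0.0  # keep the mean
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def nonlin_stat(x):
    """Simple nonlinearity statistic: standardized third moment."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Response of a quadratic (nonlinear) transformation of Gaussian noise.
u = rng.standard_normal(4096)
x = u + 0.4 * u ** 2

obs = abs(nonlin_stat(x))
null = [abs(nonlin_stat(phase_randomized_surrogate(x, rng))) for _ in range(200)]
pval = (1 + sum(s >= obs for s in null)) / 201

print(f"observed = {obs:.3f}, surrogate p ~ {pval:.4f}")
```

A linear system driven by Gaussian noise would give an observed statistic well inside the surrogate distribution, so the same test would (correctly) fail to reject.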
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Gershgorin, B.; Majda, A.J.
2011-02-20
A statistically exactly solvable model for passive tracers is introduced as a test model for the authors' Nonlinear Extended Kalman Filter (NEKF) as well as other filtering algorithms. The model involves a Gaussian velocity field and a passive tracer governed by the advection-diffusion equation with an imposed mean gradient. The model has direct relevance to engineering problems such as the spread of pollutants in the air or contaminants in the water as well as climate change problems concerning the transport of greenhouse gases such as carbon dioxide with strongly intermittent probability distributions consistent with the actual observations of the atmosphere. One of the attractive properties of the model is the existence of the exact statistical solution. In particular, this unique feature of the model provides an opportunity to design and test fast and efficient algorithms for real-time data assimilation based on rigorous mathematical theory for a turbulence model problem with many active spatiotemporal scales. Here, we extensively study the performance of the NEKF which uses the exact first and second order nonlinear statistics without any approximations due to linearization. The role of partial and sparse observations, the frequency of observations and the observation noise strength in recovering the true signal, its spectrum, and fat tail probability distribution are the central issues discussed here. The results of our study provide useful guidelines for filtering realistic turbulent systems with passive tracers through partial observations.
TONG, LIPING; YANG, JIE; COOPER, RICHARD S.
2010-01-01
SUMMARY We address the asymptotic and approximate distributions of a large class of test statistics with quadratic forms used in association studies. The statistics of interest take the general form D = X^T AX, where A is a general similarity matrix which may or may not be positive semi-definite, and X follows the multivariate normal distribution with mean μ and variance matrix Σ, where Σ may or may not be singular. We show that D can be written as a linear combination of independent chi-square random variables with a shift. Furthermore, its distribution can be approximated by a chi-square or the difference of two chi-square distributions. In the setting of association testing, our methods are especially useful in two situations. First, when the required significance level is much smaller than 0.05, such as in a genome scan, the estimation of p-values using permutation procedures can be challenging. Second, when an EM algorithm is required to infer haplotype frequencies from un-phased genotype data, the computation can be intensive for a permutation procedure. In either situation, an efficient and accurate estimation procedure would be useful. Our method can be applied to any quadratic form statistic and therefore should be of general interest. PMID:20529017
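The linear-combination representation can be checked numerically in the mean-zero case (the shift for nonzero μ and the paper's chi-square approximations are omitted; the matrices are invented):

```python
import numpy as np

rng = np.random.default_rng(5)

# D = X' A X with X ~ N(0, Sigma): D is distributed as a linear
# combination of independent chi-square(1) variables whose weights are
# the eigenvalues of A @ Sigma (mean-zero case).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.linalg.eigvals(A @ Sigma).real

# Check the mean: E[D] = trace(A @ Sigma) = sum of the eigenvalues.
reps = 100000
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((reps, 2)) @ L.T
D = np.einsum('ij,jk,ik->i', X, A, X)

print(f"eigenvalue sum = {lam.sum():.3f}, Monte-Carlo mean of D = {D.mean():.3f}")
```

The same eigenvalues drive the moment-matching chi-square approximations the paper develops, which replace permutation when very small p-values are needed.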
Feyissa, Daniel D; Aher, Yogesh D; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within- and between-test correlations often constitutes a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal component analysis) to analyse the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rota rod). The high-dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety-, and depression-like behavior and motor ability. The loading scores of individual rats on these different components allow an assessment and the distribution of individual features in a population of animals. The reduced number of components can also be used for statistical calculations like appropriate sample sizes for valid discriminations between experimental groups, which otherwise have to be done on each variable. Because the animals were intact, untreated and experimentally naïve, the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information on the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to gain a valid assessment.
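The dimension-reduction step can be sketched with invented battery data in which two latent traits drive six observed variables (a stand-in for the real 162-rat battery):

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical battery: 60 rats x 6 variables; the first three variables
# share an "anxiety" factor and the last three an "activity" factor.
n = 60
anxiety = rng.standard_normal(n)
activity = rng.standard_normal(n)
data = np.column_stack([
    anxiety + 0.4 * rng.standard_normal(n),
    anxiety + 0.4 * rng.standard_normal(n),
    anxiety + 0.4 * rng.standard_normal(n),
    activity + 0.4 * rng.standard_normal(n),
    activity + 0.4 * rng.standard_normal(n),
    activity + 0.4 * rng.standard_normal(n),
])

# PCA via the eigendecomposition of the correlation matrix.
Z = (data - data.mean(0)) / data.std(0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # descending order
explained = eigval / eigval.sum()

# Component scores for each rat: its position on the reduced components.
scores = Z @ eigvec[:, :2]

print(f"variance explained by first two components: {explained[:2].sum():.2f}")
```

The per-rat `scores` play the role of the loading scores the abstract describes: a low-dimensional summary of each individual's trait profile.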
Statistical auditing and randomness test of lotto k/N-type games
NASA Astrophysics Data System (ADS)
Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.
2008-11-01
One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
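The theoretical moments for a k/N draw are standard results for sampling without replacement and can be verified by simulation (lotto 6/49 is used as a concrete example):

```python
import numpy as np

rng = np.random.default_rng(2024)

# Lotto 6/49: k = 6 numbers drawn from 1..N = 49 without replacement.
# Each draw is marginally uniform on 1..N, so theory gives
#   E[X_i] = (N+1)/2,  Var[X_i] = (N^2-1)/12,  Cov[X_i, X_j] = -(N+1)/12.
N, k, reps = 49, 6, 100000

# Vectorized sampling: random keys -> random permutations -> first k.
draws = np.argsort(rng.random((reps, N)), axis=1)[:, :k] + 1

mean_theory = (N + 1) / 2        # 25.0
var_theory = (N ** 2 - 1) / 12   # 200.0
cov_theory = -(N + 1) / 12       # about -4.17

print(f"mean: {draws.mean():.2f} (theory {mean_theory})")
print(f"var:  {draws.var():.1f} (theory {var_theory})")
print(f"cov:  {np.cov(draws[:, 0], draws[:, 1])[0, 1]:.2f} (theory {cov_theory:.2f})")
```

Auditing a real lottery amounts to the reverse comparison: checking whether the historical sample moments are consistent with these hypergeometric-model values.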
A Test By Any Other Name: P-values, Bayes Factors and Statistical Inference
Stern, Hal S.
2016-01-01
The exchange between Hoitjink, van Kooten and Hulsker (in press) (HKH) and Morey, Wagenmakers, and Rouder (in press) (MWR) in this issue is focused on the use of Bayes factors for statistical inference but raises a number of more general questions about Bayesian and frequentist approaches to inference. This note addresses recent negative attention directed at p-values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye towards better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required. PMID:26881954
Tests of Mediation: Paradoxical Decline in Statistical Power as a Function of Mediator Collinearity
Beasley, T. Mark
2013-01-01
Increasing the correlation between the independent variable and the mediator (a coefficient) increases the effect size (ab) for mediation analysis; however, increasing a by definition increases collinearity in mediation models. As a result, the standard error of product tests increase. The variance inflation due to increases in a at some point outweighs the increase of the effect size (ab) and results in a loss of statistical power. This phenomenon also occurs with nonparametric bootstrapping approaches because the variance of the bootstrap distribution of ab approximates the variance expected from normal theory. Both variances increase dramatically when a exceeds the b coefficient, thus explaining the power decline with increases in a. Implications for statistical analysis and applied researchers are discussed. PMID:24954952
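The paradox can be reproduced with a back-of-envelope Sobel calculation under a standardized mediation model. The variance formulas below are standard OLS approximations (not taken from the paper): collinearity between X and M inflates Var(b̂) by 1/(1-a²) as a grows:

```python
from math import sqrt

def sobel_z(a, b, n=100):
    """Approximate Sobel z for the mediated effect a*b in a standardized
    model (X -> M path a, M -> Y path b, no direct effect).
    Var(a_hat) = (1-a^2)/n; Var(b_hat) = (1-b^2)/(n*(1-a^2)), the latter
    inflated by X-M collinearity as a grows."""
    var_a = (1 - a ** 2) / n
    var_b = (1 - b ** 2) / (n * (1 - a ** 2))
    se_ab = sqrt(a ** 2 * var_b + b ** 2 * var_a)  # first-order (Sobel) SE
    return a * b / se_ab

b = 0.3
for a in (0.3, 0.6, 0.9, 0.95):
    print(f"a = {a:.2f}: ab = {a * b:.3f}, z = {sobel_z(a, b):.2f}")
```

The printed z first rises with a and then falls once a exceeds b by enough, despite ab growing monotonically: the variance inflation eventually dominates the larger effect size.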
A statistical test for the equality of differently adjusted incidence rate ratios.
Hoffmann, Kurt; Pischon, Tobias; Schulz, Mandy; Schulze, Matthias B; Ray, Jennifer; Boeing, Heiner
2008-03-01
An incidence rate ratio (IRR) is a meaningful effect measure in epidemiology if it is adjusted for all important confounders. For evaluation of the impact of adjustment, adjusted IRRs should be compared with crude IRRs. The aim of this methodological study was to present a statistical approach for testing the equality of adjusted and crude IRRs and to derive a confidence interval for the ratio of the two IRRs. The method can be extended to compare two differently adjusted IRRs and, thus, to evaluate the effect of additional adjustment. The method runs immediately on existing software. To illustrate the application of this approach, the authors studied adjusted IRRs for two risk factors of type 2 diabetes using data from the European Prospective Investigation into Cancer and Nutrition-Potsdam Study from 2005. The statistical method described may be helpful as an additional tool for analyzing epidemiologic cohort data and for interpreting results obtained from Cox regression models with adjustment for different covariates.
Test of statistical models for gases with and without internal energy states.
NASA Technical Reports Server (NTRS)
Huang, A. B.; Hwang, P. F.
1973-01-01
The problem of nonlinear rarefied Couette flow with heat transfer has been studied for both monatomic and diatomic gases using the Boltzmann equation with the Bhatnagar-Gross-Krook type models as the governing equation and the method of discrete ordinates as a tool. The calculated results have been compared with the existing experimental data in order to test the accuracy and the applicability of the statistical models for this one-dimensional problem. The calculated density results are found to be in good agreement with available experimental data; the calculated heat flux solution for the linear case is found to always be lower than the experimental data of Teagan and Springer. The comparisons made here indicate that the statistical models are indeed reasonably accurate so that their use is justified in the type of problems investigated.
Statistical analysis of aquifer-test results for nine regional aquifers in Louisiana
Martin, Angel; Early, D.A.
1987-01-01
This report, prepared as part of the Gulf Coast Regional Aquifer-System Analysis project, presents a compilation, summarization, and statistical analysis of aquifer-test results for nine regional aquifers in Louisiana. These are, from youngest to oldest: the alluvial, Pleistocene, Evangeline, Jasper, Catahoula, Cockfield, Sparta, Carrizo, and Wilcox aquifers. Approximately 1,500 aquifer tests in U.S. Geological Survey files in Louisiana were examined and 1,001 were input to a computer file. Analysis of the aquifer test results and plots that describe aquifer hydraulic characteristics were made for each regional aquifer. Results indicate that, on the average, permeability (hydraulic conductivity) generally tends to decrease from the youngest aquifers to the oldest. The most permeable aquifers in Louisiana are the alluvial and Pleistocene aquifers; whereas, the least permeable are the Carrizo and Wilcox aquifers. (Author's abstract)
2010-01-01
Background The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality in the use of NHST and CI, both in English and Spanish language biomedical publications between 1995 and 2006, taking into account the International Committee of Medical Journal Editors recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. Methods Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical "significance" and "relevance" in study conclusions. Results Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English language publications as well as in Public Health journals; overall such use decreased from 41% in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the "significance fallacy" (to equate statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. Conclusions Overall, results of our review show some improvements in
An Application of M[subscript 2] Statistic to Evaluate the Fit of Cognitive Diagnostic Models
ERIC Educational Resources Information Center
Liu, Yanlou; Tian, Wei; Xin, Tao
2016-01-01
The fit of cognitive diagnostic models (CDMs) to response data needs to be evaluated, since CDMs might yield misleading results when they do not fit the data well. Limited-information statistic M[subscript 2] and the associated root mean square error of approximation (RMSEA[subscript 2]) in item factor analysis were extended to evaluate the fit of…
Using Relative Statistics and Approximate Disease Prevalence to Compare Screening Tests.
Samuelson, Frank; Abbey, Craig
2016-11-01
Schatzkin et al. and other authors demonstrated that the ratios of some conditional statistics, such as the true positive fraction, are equal to the ratios of unconditional statistics, such as disease detection rates. Therefore, we can calculate these ratios between two screening tests on the same population even if negative test patients are not followed with a reference procedure and the true and false negative rates are unknown. We demonstrate that this same property applies to an expected utility metric. We also demonstrate that while simple estimates of relative specificities and relative areas under ROC curves (AUC) do depend on the unknown negative rates, we can write these ratios in terms of disease prevalence, and the dependence of these ratios on a posited prevalence is often weak, particularly if that prevalence is small or the performance of the two screening tests is similar. Therefore, we can estimate relative specificity or AUC with little loss of accuracy if we use an approximate value of disease prevalence.
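A minimal sketch of this arithmetic, with illustrative counts and function names of our own (not the paper's): relative sensitivity reduces to a ratio of detection rates, so the unknown number of diseased subjects cancels, while relative specificity requires a posited prevalence, on which it depends only weakly.

```python
def relative_screening_stats(pos1, det1, pos2, det2, n, prevalence):
    """Compare two screening tests applied to the same n subjects.

    pos* = subjects called positive by each test, det* = true positives
    detected.  Relative sensitivity equals the ratio of detection rates,
    so the unknown diseased count cancels.  Specificity needs the unknown
    negative rates, which we approximate from a posited prevalence.
    """
    rel_sensitivity = (det1 / n) / (det2 / n)
    spec1 = 1 - (pos1 - det1) / (n * (1 - prevalence))
    spec2 = 1 - (pos2 - det2) / (n * (1 - prevalence))
    return rel_sensitivity, spec1 / spec2
```

Recomputing the specificity ratio under prevalences of 1% and 5% changes it only in the third decimal place here, illustrating the weak dependence the abstract describes.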
Rountree, Wes; Vandergrift, Nathan; Bainbridge, John; Sanchez, Ana M.; Denny, Thomas N.
2014-01-01
In September 2011 Duke University was awarded a contract to develop the National Institutes of Health/National Institute of Allergy and Infectious Diseases (NIH/NIAID) External Quality Assurance Program Oversight Laboratory (EQAPOL). Through EQAPOL, proficiency testing programs are administered for Interferon-γ (IFN-γ) Enzyme-linked immunosorbent spot (ELISpot), Intracellular Cytokine Staining Flow Cytometry (ICS) and Luminex-based cytokine assays. One of the charges of the EQAPOL program was to apply statistical methods to determine overall site performance. We utilized various statistical methods for each program to find the most appropriate for assessing laboratory performance using the consensus average as the target value. Accuracy ranges were calculated based on Wald-type confidence intervals, exact Poisson confidence intervals, or via simulations. Given the nature of proficiency testing data, which have repeated measures within donor/sample made across several laboratories, the use of mixed effects models with alpha adjustments for multiple comparisons was also explored. Mixed effects models were found to be the most useful method to assess laboratory performance with respect to accuracy to the consensus. Model-based approaches to the proficiency testing data in EQAPOL will continue to be utilized. Mixed effects models also provided a means of performing more complex analyses that would address secondary research questions regarding within- and between-laboratory variability as well as longitudinal analyses. PMID:24456626
Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok
2016-11-01
The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
Drug-excipient compatibility testing using a high-throughput approach and statistical design.
Wyttenbach, Nicole; Birringer, Christian; Alsenz, Jochem; Kuentz, Martin
2005-01-01
The aim of our research was to develop a miniaturized high-throughput drug-excipient compatibility test. Experiments were planned and evaluated using statistical experimental design. Binary mixtures of a drug, acetylsalicylic acid or fluoxetine hydrochloride, and of excipients commonly used in solid dosage forms were prepared at a ratio of approximately 1:100 in 96-well microtiter plates. Samples were exposed to different temperatures (40 °C/50 °C) and humidities (10%/75%) for different times (1 week/4 weeks), and chemical drug degradation was analyzed using fast-gradient high-pressure liquid chromatography (HPLC). Categorical statistical design was applied to identify the effects and interactions of time, temperature, humidity, and excipient on drug degradation. Acetylsalicylic acid was least stable in the presence of magnesium stearate, dibasic calcium phosphate, or sodium starch glycolate. Fluoxetine hydrochloride exhibited marked degradation only with lactose. Factor-interaction plots revealed that relative humidity had the strongest effect on the drug-excipient blends tested. In conclusion, the developed technique enables fast drug-excipient compatibility testing and identification of interactions. Since only 0.1 mg of drug is needed per data point, fast rational preselection of pharmaceutical additives can be performed early in solid dosage form development.
Statistical tests of a periodicity hypothesis for crater formation rate - II
NASA Astrophysics Data System (ADS)
Yabushita, S.
1996-04-01
A statistical test is made of the periodicity hypothesis for crater formation rate, using a new data set compiled by Grieve. The criterion adopted is that of Broadbent, modified so as to take into account the loss of craters with time. Small craters (diameters <=2 km) are highly concentrated near the recent epoch, and are not adequate as a data set for testing. Various subsets of the original data are subjected to the test and a period close to 30 Myr is detected. On the assumption of random distribution of crater ages, the probability of detecting such a period is calculated at 50, 73 and 64 per cent respectively for craters with
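The Broadbent-style criterion can be sketched as a grid search over trial periods and phases (a simplified illustration; the paper's modified criterion additionally corrects for the loss of craters with time, which this sketch omits):

```python
import math

def broadbent_score(ages, period, phase):
    """Normalised RMS deviation of event times from the lattice
    phase + k*period; small values suggest periodicity."""
    devs = []
    for a in ages:
        r = (a - phase) % period
        devs.append(min(r, period - r) ** 2)
    return math.sqrt(sum(devs) / len(devs)) / period

def best_period(ages, periods, n_phase=40):
    """Grid search over trial periods and phases; returns (score, period, phase).
    Under a random (uniform) age distribution the expected score is about
    1/sqrt(12) ~ 0.29, so much smaller scores hint at a real period."""
    best = None
    for p in periods:
        for i in range(n_phase):
            ph = p * i / n_phase
            s = broadbent_score(ages, p, ph)
            if best is None or s < best[0]:
                best = (s, p, ph)
    return best
```

In practice the detection probability would be calibrated by Monte Carlo over randomly distributed age sets, as the abstract does.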
A new method to test shear wave splitting: Improving statistical assessment of splitting parameters
NASA Astrophysics Data System (ADS)
Corbalan Castejon, Ana
Shear wave splitting has proved to be a very useful technique to probe for seismic anisotropy in the Earth's interior, and measurements of seismic anisotropy are perhaps the best way to constrain the strain history of the lithosphere and asthenosphere. However, existing methods of shear wave splitting analysis do not estimate uncertainty correctly, and do not allow for careful statistical modeling of anisotropy and uncertainty in complex scenarios. Consequently, the interpretation of shear wave splitting measurements has an undesirable subjective component. This study illustrates a new method to characterize shear wave splitting and the associated uncertainty based on the cross-convolution method [Menke and Levin, 2003]. This new method has been tested on synthetic data and benchmarked with data from the Pasadena, California seismic station (PAS). Synthetic tests show that the method can successfully obtain the splitting parameters from observed split shear waves. PAS results are very reasonable and consistent with previous studies [Liu et al., 1995; Ozalaybey and Savage, 1995; Polet and Kanamori, 2002]. As presented, the Menke and Levin [2003] method does not explicitly model the errors. Our method works on noisy data without any particular need for processing, it fully accounts for correlation structures in the noise, and it models the errors with a proper bootstrapping approach. Hence, the method presented here casts the analysis of shear wave splitting into a more formal statistical context, allowing for formal hypothesis testing and more nuanced interpretation of seismic anisotropy results.
A statistical test of the unified model of active galactic nuclei
NASA Astrophysics Data System (ADS)
Hong, Xiao-yu; Wan, Tong-shan
1995-02-01
A statistical test is carried out on the AGN unified model using a sample of superluminal sources. For different classes of source, the distribution of R, the luminosity ratio between the core and the extended region, and the mean angle φ̄ between the jet and the line of sight were evaluated. Correlations among R, the Lorentz factor γ, the projected size of the jet d, and the linear size of the source l were examined. It was found that R is anticorrelated with d, and γ is correlated with l. These results favor the orientation interpretation of the unified model of AGN.
ERIC Educational Resources Information Center
van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.
Person-fit research in the context of paper-and-pencil tests is reviewed, and some specific problems regarding person fit in the context of computerized adaptive testing (CAT) are discussed. Some new methods are proposed to investigate person fit in a CAT environment. These statistics are based on Statistical Process Control (SPC) theory. A…
Hickey, Graeme L; Kefford, Ben J; Dunlop, Jason E; Craig, Peter S
2008-11-01
Species sensitivity distributions (SSDs) may accurately predict the proportion of species in a community that are at hazard from environmental contaminants only if they contain sensitivity data from a large sample of species representative of the mix of species present in the locality or habitat of interest. With current widely accepted ecotoxicological methods, however, this rarely occurs. Two recent suggestions address this problem. First, use rapid toxicity tests, which are less rigorous than conventional tests, to approximate experimentally the sensitivity of many species quickly and in approximate proportion to naturally occurring communities. Second, use expert judgements regarding the sensitivity of higher taxonomic groups (e.g., orders) and Bayesian statistical methods to construct SSDs that reflect the richness (or perceived importance) of these groups. Here, we describe and analyze several models from a Bayesian perspective to construct SSDs from data derived using rapid toxicity testing, combining both rapid test data and expert opinion. We compare these new models with two frequentist approaches, Kaplan-Meier and a log-normal distribution, using a large data set on the salinity sensitivity of freshwater macroinvertebrates from Victoria (Australia). The frequentist log-normal analysis produced a SSD that overestimated the hazard to species relative to the Kaplan-Meier and Bayesian analyses. Of the Bayesian analyses investigated, the introduction of a weighting factor to account for the richness (or importance) of taxonomic groups influenced the calculated hazard to species. Furthermore, Bayesian methods allowed us to determine credible intervals representing SSD uncertainty. We recommend that rapid tests, expert judgements, and novel Bayesian statistical methods be used so that SSDs reflect communities of organisms found in nature.
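For concreteness, the frequentist log-normal SSD used as a comparator in the study can be sketched with a simple moment fit (the Bayesian models and the Kaplan-Meier estimator are more involved; function and parameter names here are ours):

```python
import math
import statistics

def lognormal_hc(toxicity_values, protection=0.95):
    """Fit a log-normal species sensitivity distribution by the moments of
    log10 toxicity values (e.g. LC50s) and return the hazardous
    concentration expected to protect the given fraction of species
    (protection=0.95 gives the conventional HC5)."""
    logs = [math.log10(v) for v in toxicity_values]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    z = statistics.NormalDist().inv_cdf(1 - protection)  # about -1.645 for HC5
    return 10 ** (mu + z * sigma)
```

With protection=0.5 this returns the geometric mean of the input sensitivities, a quick sanity check on the fit.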
Giambartolomei, Claudia; Vukcevic, Damjan; Schadt, Eric E.; Franke, Lude; Hingorani, Aroon D.; Wallace, Chris; Plagnol, Vincent
2014-01-01
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to
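The heart of such a colocalisation test can be condensed into a two-hypothesis sketch (shared versus distinct causal variants) built from per-SNP Bayes factors. The published method handles five hypotheses and derives the Bayes factors from single-SNP summary statistics; the priors below are assumed illustrative defaults, not values taken from the paper.

```python
def shared_variant_posterior(bf1, bf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior weight on 'same causal SNP for both traits' (H4-like)
    against 'two distinct causal SNPs' (H3-like), from per-SNP Bayes
    factors bf1 and bf2 for the two traits.  p1/p2/p12 are assumed prior
    probabilities that a SNP is causal for trait 1, trait 2, or both."""
    s1, s2 = sum(bf1), sum(bf2)
    s_shared = sum(a * b for a, b in zip(bf1, bf2))  # same SNP i for both traits
    s_distinct = s1 * s2 - s_shared                  # causal SNP i != SNP j
    h4 = p12 * s_shared
    h3 = p1 * p2 * s_distinct
    return h4 / (h3 + h4)
```

Two association signals peaking at the same SNP give a posterior near 1; signals peaking at different SNPs pull it down, which is the discrimination used to flag pairs such as SYPL2 versus SORT1.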
Statistical Testing of Dynamically Downscaled Rainfall Data for the East Coast of Australia
NASA Astrophysics Data System (ADS)
Parana Manage, Nadeeka; Lockart, Natalie; Willgoose, Garry; Kuczera, George
2015-04-01
This study performs a validation of the statistical properties of downscaled climate data, concentrating on the rainfall required for the hydrology predictions used in reservoir simulations. The data sets used in this study have been produced by the NARCliM (NSW/ACT Regional Climate Modelling) project, which provides a dynamically downscaled climate dataset for South-East Australia at 10 km resolution. NARCliM has used three configurations of the Weather Research and Forecasting regional climate model and four different GCMs (MIROC-medres 3.2, ECHAM5, CCCMA 3.1 and CSIRO mk3.0) from CMIP3 to perform twelve ensembles of simulations for current and future climates. In addition to the GCM-driven simulations, three control-run simulations driven by the NCEP/NCAR reanalysis for the entire period 1950-2009 have also been performed by the project. The validation has been performed in the Upper Hunter region of Australia, a semi-arid to arid region 200 kilometres north-west of Sydney. The analysis used the time series of downscaled rainfall data and ground-based measurements from selected Bureau of Meteorology rainfall stations within the study area. The initial testing of the gridded rainfall focused on the autoregressive characteristics of the time series, because reservoir performance depends on long-term average runoffs. A correlation analysis performed for fortnightly, monthly and annually averaged time resolutions shows a good statistical match between reanalysis and ground truth. The spatial variation of the statistics of the gridded rainfall series was calculated and plotted at the catchment scale. The spatial correlation analysis shows a poor agreement between NARCliM data and ground truth at each time resolution. However, the spatial variability plots show a strong link between the statistics and orography at the catchment scale.
Statistical Methods for Multivariate Meta-analysis of Diagnostic Tests: An Overview and Tutorial
Ma, Xiaoye; Nie, Lei; Cole, Stephen R.; Chu, Haitao
2013-01-01
Summary In this article, we present an overview and tutorial of statistical methods for meta-analysis of diagnostic tests under two scenarios: 1) when the reference test can be considered a gold standard; and 2) when the reference test cannot be considered a gold standard. In the first scenario, we first review the conventional summary receiver operating characteristics (ROC) approach and a bivariate approach using linear mixed models (BLMM). Both approaches require direct calculations of study-specific sensitivities and specificities. We next discuss the hierarchical summary ROC curve approach for jointly modeling positivity criteria and accuracy parameters, and the bivariate generalized linear mixed models (GLMM) for jointly modeling sensitivities and specificities. We further discuss the trivariate GLMM for jointly modeling prevalence, sensitivities and specificities, which allows us to assess the correlations among the three parameters. These approaches are based on the exact binomial distribution and thus do not require an ad hoc continuity correction. Last, we discuss a latent class random effects model for meta-analysis of diagnostic tests when the reference test itself is imperfect for the second scenario. A number of case studies with detailed annotated SAS code in procedures MIXED and NLMIXED are presented to facilitate the implementation of these approaches. PMID:23804970
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients with controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with the primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance as referring not to inference about a particular population parameter, but to (1) the sample in question, or (2) the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
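The recommended alternative, entering the nuisance variable as a regression covariate instead of "matching by nonsignificance", can be sketched via Frisch-Waugh residualisation (pure standard library; the simulated data and all names are ours):

```python
import random

def _fit(x, y):
    """Simple least-squares line y ~ a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    return my - b * mx, b

def adjusted_effect(group, outcome, nuisance):
    """Group effect on the outcome after regressing out a nuisance variable
    (Frisch-Waugh: slope between the two sets of residuals)."""
    a1, b1 = _fit(nuisance, outcome)
    a2, b2 = _fit(nuisance, group)
    ry = [o - (a1 + b1 * c) for o, c in zip(outcome, nuisance)]
    rg = [g - (a2 + b2 * c) for g, c in zip(group, nuisance)]
    return _fit(rg, ry)[1]

# Simulated example: the confound fully mediates the raw group difference.
rng = random.Random(0)
group = [0] * 200 + [1] * 200
confound = [g + rng.gauss(0, 1) for g in group]        # correlated with group
outcome = [2 * c + rng.gauss(0, 0.5) for c in confound]  # no direct group effect
naive = sum(outcome[200:]) / 200 - sum(outcome[:200]) / 200
adjusted = adjusted_effect(group, outcome, confound)
```

The naive group difference is large (driven entirely by the confound), while the covariate-adjusted effect is near zero, with no "matching by nonsignificant test" required.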
A Simple Chi-Square Statistic for Testing Homogeneity of Zero-Inflated Distributions.
Johnson, William D; Burton, Jeffrey H; Beyl, Robbie A; Romer, Jacob E
2015-10-01
Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
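A minimal sketch of such a two-part statistic, simplified to a single split at the pooled median of the non-zero values (the published version compares a fuller percentile profile; the simplification and function names are ours):

```python
def _chi2(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols))
               if rows[i] and cols[j])

def zero_inflated_chi2(groups):
    """Sum of (a) a chi-square comparing the proportion of zeros across
    groups and (b) a chi-square comparing the non-zero values split at the
    pooled median; df = 2 * (number of groups - 1) for this version."""
    zeros = [[sum(1 for v in g if v == 0), sum(1 for v in g if v != 0)]
             for g in groups]
    nonzero = sorted(v for g in groups for v in g if v != 0)
    med = nonzero[len(nonzero) // 2]
    split = [[sum(1 for v in g if v != 0 and v < med),
              sum(1 for v in g if v != 0 and v >= med)]
             for g in groups]
    return _chi2(zeros) + _chi2(split)
```

Two identical groups yield a statistic of zero; groups that differ mainly in their zero proportion are picked up by the first component even when the non-zero values look alike.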
2012-01-01
Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic (ROC) curves and tested on 11 data sets from the Gene Omnibus Database with independently verified differentially expressed genes, and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The
Diagnosis of Misalignment in Overhung Rotor using the K-S Statistic and A2 Test
NASA Astrophysics Data System (ADS)
Garikapati, Diwakar; Pacharu, RaviKumar; Munukurthi, Rama Satya Satyanarayana
2017-03-01
Vibration measurement at the bearings of rotating machinery has become a useful technique for diagnosing incipient fault conditions. In particular, vibration measurement can be used to detect unbalance in rotor, bearing failure, gear problems or misalignment between a motor shaft and coupled shaft. This is a particular problem encountered in turbines, ID fans and FD fans used for power generation. For successful fault diagnosis, it is important to adopt motor current signature analysis (MCSA) techniques capable of identifying the faults. It is also useful to develop techniques for inferring information such as the severity of fault. It is proposed that modeling the cumulative distribution function of motor current signals with respect to appropriate theoretical distributions, and quantifying the goodness of fit with the Kolmogorov-Smirnov (KS) statistic and A2 test offers a suitable signal feature for diagnosis. This paper demonstrates the successful comparison of the K-S feature and A2 test for discriminating the misalignment fault from normal function.
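The goodness-of-fit feature itself is straightforward to sketch: fit a reference distribution to the signal samples and take the KS distance between its CDF and the empirical CDF. A normal reference is used below via the standard library; the choice of reference distribution is the modelling step the paper investigates, so treat this as a generic illustration rather than the authors' exact feature.

```python
import math
import statistics

def ks_distance(samples):
    """Kolmogorov-Smirnov distance between the empirical CDF of `samples`
    and a normal distribution fitted by sample mean and standard
    deviation: a single goodness-of-fit number usable as a fault feature."""
    xs = sorted(samples)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

# Demo signals: a flat ramp versus a pure sinusoid, whose bimodal amplitude
# distribution (as a strong fault harmonic would produce) departs further
# from the fitted normal.
uniform_signal = [(i + 0.5) / 400 for i in range(400)]
sine_signal = [math.sin(2 * math.pi * (i + 0.5) / 400) for i in range(400)]
```

Signals whose amplitude distribution departs further from the reference yield larger KS distances, which is what makes the statistic usable as a discriminating feature.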
More powerful genetic association testing via a new statistical framework for integrative genomics
Zhao, Sihai D.; Cai, T. Tony; Li, Hongzhe
2015-01-01
Integrative genomics offers a promising approach to more powerful genetic association studies. The hope is that combining outcome and genotype data with other types of genomic information can lead to more powerful SNP detection. We present a new association test based on a statistical model that explicitly assumes that genetic variations affect the outcome through perturbing gene expression levels. It is shown analytically that the proposed approach can have more power to detect SNPs that are associated with the outcome through transcriptional regulation, compared to tests using the outcome and genotype data alone, and simulations show that our method is relatively robust to misspecification. We also provide a strategy for applying our approach to high-dimensional genomic data. We use this strategy to identify a potentially new association between a SNP and a yeast cell’s response to the natural product tomatidine, which standard association analysis did not detect. PMID:24975802
Escott-Price, Valentina; Ghodsi, Mansoureh; Schmidt, Karl Michael
2014-04-01
We evaluate the effect of genotyping errors on the type-I error of a general association test based on genotypes, showing that, in the presence of errors in the case and control samples, the test statistic asymptotically follows a scaled non-central χ² distribution. We give explicit formulae for the scaling factor and non-centrality parameter for the symmetric allele-based genotyping error model and for additive and recessive disease models. They show how genotyping errors can lead to a significantly higher false-positive rate, growing with sample size, compared with the nominal significance levels. The strength of this effect depends very strongly on the population distribution of the genotype, with a pronounced effect in the case of rare alleles, and a great robustness against error in the case of large minor allele frequency. We also show how these results can be used to correct p-values.
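For one degree of freedom, the resulting p-value correction is easy to illustrate, because the non-central χ² variate is the square of a shifted standard normal. The scale and non-centrality values below are illustrative stand-ins, not the paper's formulae:

```python
import math
import statistics

def ncx2_sf_1df(x, noncentrality):
    """P((Z + sqrt(noncentrality))**2 > x) for standard normal Z: the
    survival function of the non-central chi-square with 1 df."""
    nd = statistics.NormalDist()
    s, d = math.sqrt(x), math.sqrt(noncentrality)
    return (1 - nd.cdf(s - d)) + nd.cdf(-s - d)

def corrected_p(t_obs, scale, noncentrality):
    """Nominal p-value (central chi-square, 1 df) versus the p-value under
    the scaled non-central null that genotyping errors induce."""
    nominal = ncx2_sf_1df(t_obs, 0.0)
    corrected = ncx2_sf_1df(t_obs / scale, noncentrality)
    return nominal, corrected
```

With an error-induced scale above 1 and positive non-centrality, the same observed statistic is less significant than its nominal p-value suggests, matching the warning about inflated false-positive rates.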
2011-01-01
Background Many analyses of gene expression data involve hypothesis tests of an interaction term between two fixed effects, typically tested using a residual variance. In expression studies, the issue of variance heteroscedasticity has received much attention, and previous work has focused on either between-gene or within-gene heteroscedasticity. However, in a single experiment, heteroscedasticity may exist both within and between genes. Here we develop flexible shrinkage error estimators considering both between-gene and within-gene heteroscedasticity and use them to construct F-like test statistics for testing interactions, with cutoff values obtained by permutation. These permutation tests are complicated, and several permutation tests are investigated here. Results Our proposed test statistics are compared with other existing shrinkage-type test statistics through extensive simulation studies and a real data example. The results show that the choice of permutation procedures has dramatically more influence on detection power than the choice of F or F-like test statistics. When both types of gene heteroscedasticity exist, our proposed test statistics can control preselected type-I errors and are more powerful. Raw data permutation is not valid in this setting. Whether unrestricted or restricted residual permutation should be used depends on the specific type of test statistic. Conclusions The F-like test statistic that uses the proposed flexible shrinkage error estimator considering both types of gene heteroscedasticity and unrestricted residual permutation can provide a statistically valid and powerful test. Therefore, we recommend that it always be applied when testing an interaction term in the analysis of real gene expression data. PMID:22044602
Yang, Jie; Casella, George; McIntyre, Lauren M
2011-11-01
Many analyses of gene expression data involve hypothesis tests of an interaction term between two fixed effects, typically tested using a residual variance. In expression studies, the issue of variance heteroscedasticity has received much attention, and previous work has focused on either between-gene or within-gene heteroscedasticity. However, in a single experiment, heteroscedasticity may exist both within and between genes. Here we develop flexible shrinkage error estimators considering both between-gene and within-gene heteroscedasticity and use them to construct F-like test statistics for testing interactions, with cutoff values obtained by permutation. These permutation tests are complicated, and several permutation tests are investigated here. Our proposed test statistics are compared with other existing shrinkage-type test statistics through extensive simulation studies and a real data example. The results show that the choice of permutation procedures has dramatically more influence on detection power than the choice of F or F-like test statistics. When both types of gene heteroscedasticity exist, our proposed test statistics can control preselected type-I errors and are more powerful. Raw data permutation is not valid in this setting. Whether unrestricted or restricted residual permutation should be used depends on the specific type of test statistic. The F-like test statistic that uses the proposed flexible shrinkage error estimator considering both types of gene heteroscedasticity and unrestricted residual permutation can provide a statistically valid and powerful test. Therefore, we recommend that it always be applied when testing an interaction term in the analysis of real gene expression data.
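Residual permutation for an interaction can be sketched for a balanced 2x2 design (plain cell-mean contrast, unrestricted residual permutation, and no shrinkage variance estimator, so this is only a skeleton of the approach the abstract evaluates):

```python
import random

def interaction_perm_test(cells, n_perm=500, seed=0):
    """Residual-permutation p-value for the interaction in a 2x2 design.
    cells maps (i, j) -> list of replicate measurements."""
    mean = lambda v: sum(v) / len(v)
    m = {k: mean(v) for k, v in cells.items()}
    grand = mean(list(m.values()))
    row = {i: mean([m[(i, 0)], m[(i, 1)]]) - grand for i in (0, 1)}
    col = {j: mean([m[(0, j)], m[(1, j)]]) - grand for j in (0, 1)}
    # additive (null) fit: grand mean plus main effects, no interaction
    fitted = {k: grand + row[k[0]] + col[k[1]] for k in cells}
    labels = [k for k, v in cells.items() for _ in v]
    resid = [x - fitted[k] for k, v in cells.items() for x in v]
    contrast = lambda mm: mm[(1, 1)] - mm[(1, 0)] - mm[(0, 1)] + mm[(0, 0)]
    obs = abs(contrast(m))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(resid)  # unrestricted residual permutation
        sums = dict.fromkeys(cells, 0.0)
        for k, e in zip(labels, resid):
            sums[k] += fitted[k] + e
        pm = {k: sums[k] / len(cells[k]) for k in cells}
        if abs(contrast(pm)) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A flat design gives a p-value of 1, while a strong interaction in one cell is flagged; the full method replaces the plain contrast with the shrinkage-based F-like statistic.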
A clone-based statistical test for localizing disease genes using genomic mismatch scanning
Palmer, C.G.S.; Woodward, A.; Smalley, S.L.
1994-09-01
Genomic mismatch scanning (GMS) is a technique for isolating regions of DNA that are identical-by-descent (IBD) within pairs of relatives. GMS selected data are hybridized to an ordered array of DNA, e.g., metaphase chromosomes, YACs, to identify and localize enhanced region(s) of IBD across pairs of relatives affected with a trait of interest. If the trait has a genetic basis, it is reasonable to assume that the trait gene(s) will be located in these enhanced regions. Our approach to localize these enhanced regions is based on the availability of an ordered array of clones, e.g., YACs, which span the entire human genome. We use an exact binomial order statistic to develop a test for enhanced regions of IBD in sets of clones 1 cM in size selected for being biologically independent (i.e., separated by 50 cM). The test statistic is the maximum proportion of IBD pairs selected from the independent YACs within a set. Thus far, we have defined the power of the test under the alternative hypothesis of a single gene conditional on the maximum proportion IBD being located at the disease locus. As an example, for 60 grandparent-grandchild pairs, the exact power of the test with alpha=0.001 is 0.83 when the relative risk of the disease is 4.0 and the maximum proportion is at the disease locus. This method can be used in small samples and is not dependent on any specific mapping function.
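The order-statistic calculation sketched in the abstract is short to write down. Here we take the null IBD probability for grandparent-grandchild pairs to be 0.5, an assumption consistent with the design described; the names are ours:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

def max_ibd_pvalue(max_count, n_pairs, n_clones, p_null=0.5):
    """P(max of n_clones independent Binomial(n_pairs, p_null) counts is
    >= max_count): the chance that the largest observed IBD proportion
    among biologically independent clone sets reaches max_count/n_pairs
    under the no-linkage null."""
    return 1 - binom_cdf(max_count - 1, n_pairs, p_null) ** n_clones
```

The p-value shrinks rapidly as the maximum IBD count moves above the null expectation of n_pairs/2, which is what gives the test its power at stringent alpha levels such as 0.001.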
Zhang, Kai; Traskin, Mikhail; Small, Dylan S
2012-03-01
For group-randomized trials, randomization inference based on rank statistics provides robust, exact inference against nonnormal distributions. However, in a matched-pair design, the currently available rank-based statistics lose significant power compared to normal linear mixed model (LMM) test statistics when the LMM is true. In this article, we investigate and develop an optimal test statistic over all statistics in the form of the weighted sum of signed Mann-Whitney-Wilcoxon statistics under certain assumptions. This test is almost as powerful as the LMM even when the LMM is true, but it is much more powerful for heavy tailed distributions. A simulation study is conducted to examine the power.
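The weighted sum of signed Mann-Whitney-Wilcoxon statistics can be sketched as follows: compute a centered U statistic per matched pair of groups, combine with weights, and obtain the randomization distribution by flipping the sign of each pair's statistic (treatment labels within a pair are exchangeable under the null). Equal weights are used here for simplicity; the paper derives an optimal weighting.

```python
import numpy as np

rng = np.random.default_rng(1)

def centered_u(x, y):
    """Mann-Whitney U for x vs y, centered at its null mean n*m/2
    (ties counted as 1/2)."""
    x, y = np.asarray(x), np.asarray(y)
    u = (x[:, None] > y[None, :]).sum() + 0.5 * (x[:, None] == y[None, :]).sum()
    return u - len(x) * len(y) / 2.0

def matched_pair_rand_test(pairs, weights=None, n_perm=2000):
    """Two-sided randomization p-value for T = sum_i w_i * S_i, where S_i is
    the signed (centered) MWW statistic of matched pair i."""
    s = np.array([centered_u(t, c) for t, c in pairs])
    w = np.ones(len(s)) if weights is None else np.asarray(weights)
    obs = abs(w @ s)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(s)))
    perm = np.abs((signs * s) @ w)
    return (np.sum(perm >= obs) + 1) / (n_perm + 1)

# Six matched pairs of groups with a clear treatment shift (synthetic data)
pairs = [(rng.normal(size=10) + 2.0, rng.normal(size=10)) for _ in range(6)]
p_value = matched_pair_rand_test(pairs)
```

With only six pairs the sign-flip distribution has 64 support points, so the attainable p-values are coarse; this matches the exactness property claimed for rank-based randomization inference.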
Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu
2017-01-01
A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-08
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to extract weak fault features under background noise; it applies statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference (noise) signal and the original signal, and removes components of high similarity. The optimal significance level α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs with high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method.
Rasch fit statistics as a test of the invariance of item parameter estimates.
Smith, Richard M; Suh, Kyunghee K
2003-01-01
The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that in Rasch models, if the data fit the model, then the estimate of the parameter of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is an 80-item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2000 from a very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criterion advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower third of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate calibration t-values ranged from +21.00 to -7.00, with the t-test value of 41 of the 80 comparisons either larger than +5 or smaller than -5. Clearly these data do not exhibit the invariance of the item parameters expected if the data fit the model. Yet the INFIT and OUTFIT mean squares are completely insensitive to the lack of invariance in the item parameters. If the OUTFIT ZSTD from WINSTEPS was used with a critical value of | t | > 2.0, then 56 of the 60 items identified by the separate calibration t-test would be identified as misfitting. A fourth measure of misfit, the between ability-group item fit statistic, identified 69 items as misfitting when a critical value of t > 2.0 was used. Clearly relying solely on the
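The separate-calibration comparison reduces to a two-sample t statistic on each item's difficulty estimates from the two subsamples. A minimal sketch (the difficulty estimates and standard errors below are made-up values, not from the study):

```python
import math

def separate_calibration_t(b_low, se_low, b_high, se_high):
    """t statistic for the difference between one item's difficulty as
    calibrated in two independent subsamples (e.g., low- vs high-scoring
    examinees); invariance predicts values near zero."""
    return (b_low - b_high) / math.sqrt(se_low**2 + se_high**2)

# Illustrative: the item appears 0.5 logits harder in the low-scoring group
t = separate_calibration_t(1.0, 0.10, 0.5, 0.10)
```

Values well beyond |2| for many items, as reported above, indicate a failure of parameter invariance even when mean-square fit statistics look acceptable.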
Debate on GMOs Health Risks after Statistical Findings in Regulatory Tests
de Vendômois, Joël Spiroux; Cellier, Dominique; Vélot, Christian; Clair, Emilie; Mesnage, Robin; Séralini, Gilles-Eric
2010-01-01
We summarize the major points of international debate on health risk studies for the main commercialized edible GMOs. These GMOs are soy, maize and oilseed rape designed to contain new pesticide residues since they have been modified to be herbicide-tolerant (mostly to Roundup) or to produce mutated Bt toxins. The debated alimentary chronic risks may come from unpredictable insertional mutagenesis effects, metabolic effects, or from the new pesticide residues. The most detailed regulatory tests on the GMOs are three-month-long feeding trials of laboratory rats, which are biochemically assessed. The tests are not compulsory, and are not independently conducted. The test data and the corresponding results are kept secret by the companies. Our previous analyses of regulatory raw data at these levels, taking as representative examples the three GM maize varieties NK 603, MON 810, and MON 863, led us to conclude that hepatorenal toxicities were possible, and that longer testing was necessary. Our study was criticized by the company developing the GMOs in question and by the regulatory bodies, mainly over the divergent biological interpretations of statistically significant biochemical and physiological effects. We present the scientific reasons for the crucially different biological interpretations and also highlight the shortcomings in the experimental protocols designed by the company. The debate implies an enormous responsibility towards public health and is essential given the nonexistent traceability or epidemiological studies in the GMO-producing countries. PMID:20941377
A statistical framework for genome-wide scanning and testing of imprinted quantitative trait loci.
Cui, Yuehua
2007-01-07
Non-equivalent expression of alleles at a locus results in genomic imprinting. In this article, a statistical framework for genome-wide scanning and testing of imprinted quantitative trait loci (iQTL) underlying complex traits is developed based on experimental crosses of inbred line species in backcross populations. The joint likelihood function is composed of four component likelihood functions, each derived from one of four backcross families. The proposed approach models the genomic imprinting effect as a probability measure with which one can test the degree of imprinting. Simulation results show that the model is robust for identifying iQTL with various degrees of imprinting, ranging from no imprinting and partial imprinting to complete imprinting. Under various simulation scenarios, the proposed model shows consistent parameter estimation with reasonable precision and high power in testing iQTL. When a QTL shows a Mendelian effect, the proposed model also outperforms the traditional Mendelian model. An extension to incorporate maternal effects is also given. The developed model, built within the maximum likelihood framework and implemented with the EM algorithm, provides a quantitative framework for testing and estimating iQTL involved in the genetic control of complex traits.
Testing Genetic Linkage with Relative Pairs and Covariates by Quasi-Likelihood Score Statistics
Schaid, Daniel J.; Sinnwell, Jason P.; Thibodeau, Stephen N.
2007-01-01
Background/Aims Genetic linkage analysis of common diseases is complicated by the heterogeneity of genetic and environmental factors that increase disease risk, and possibly interactions among them. Most linkage methods that account for covariates are restricted to sib pairs, with the exception of the conditional logistic regression model [1] implemented in LODPAL in the S.A.G.E. software [2]. Although this model can be applied to arbitrary pedigrees, at times it can be difficult to maximize the likelihood due to model constraints, and it does not account for the dependence among the different types of relative pairs in a pedigree. Methods To overcome these limitations, we developed a new approach based on score statistics for quasi-likelihoods, implemented as weighted least squares. Our methods can be used to test three different hypotheses: (1) a test for linkage without covariates; (2) a test for linkage with covariates, and (3) a test for effects of covariates on identity by descent sharing (i.e., heterogeneity). Furthermore, our methods are robust because they account for the dependence among different relative pairs within a pedigree. Results and Conclusion: Although application of our methods to a prostate cancer linkage study did not find any critical covariates in our data, the results illustrate the utility and interpretation of our methods, and suggest, nonetheless, that our methods will be useful for a broad range of genetic linkage heterogeneity analyses. PMID:17565225
Statistical power of likelihood ratio and Wald tests in latent class models with covariates.
Gudicha, Dereje W; Schmittmann, Verena D; Vermunt, Jeroen K
2016-12-30
This paper discusses power and sample-size computation for likelihood ratio and Wald testing of the significance of covariate effects in latent class models. For both tests, asymptotic distributions can be used; that is, the test statistic can be assumed to follow a central Chi-square under the null hypothesis and a non-central Chi-square under the alternative hypothesis. Power or sample-size computation using these asymptotic distributions requires specification of the non-centrality parameter, which in practice is rarely known. We show how to calculate this non-centrality parameter using a large simulated data set from the model under the alternative hypothesis. A simulation study is conducted evaluating the adequacy of the proposed power analysis methods, determining the key study design factor affecting the power level, and comparing the performance of the likelihood ratio and Wald test. The proposed power analysis methods turn out to perform very well for a broad range of conditions. Moreover, apart from effect size and sample size, an important factor affecting the power is the class separation, implying that when class separation is low, rather large sample sizes are needed to achieve a reasonable power level.
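The large-simulated-data-set approach can be sketched as follows: compute the likelihood-ratio statistic once on a very large sample generated under the alternative, rescale it to the planned sample size to obtain the non-centrality parameter, and evaluate power from the non-central chi-square. Function and argument names here are our own, and the LR value used is arbitrary.

```python
from scipy.stats import chi2, ncx2

def lr_power(lr_large, n_large, n_planned, df, alpha=0.05):
    """Power of a likelihood-ratio test with `df` degrees of freedom.
    lr_large: LR statistic computed on one large simulated data set of
    size n_large from the alternative model; the non-centrality parameter
    for the planned study scales linearly with sample size."""
    ncp = lr_large * n_planned / n_large
    crit = chi2.ppf(1 - alpha, df)            # null critical value
    return ncx2.sf(crit, df, ncp)             # P(reject | alternative)

# Illustrative: LR of 200 on 100,000 simulated cases, planned n of 500 or 1000
power_500 = lr_power(200.0, 100000, 500, df=1)
power_1000 = lr_power(200.0, 100000, 1000, df=1)
```

Solving for the n that achieves a target power (e.g., 0.8) then gives the sample-size computation; as the abstract notes, low class separation shrinks the achievable LR per observation and thus inflates the required n.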
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster, testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where it can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. GeneTools is the first "all in one
Parker, Albert E; Hamilton, Martin A; Tomasino, Stephen F
2014-01-01
A performance standard for a disinfectant test method can be evaluated by quantifying the (Type I) pass-error rate for ineffective products and the (Type II) fail-error rate for highly effective products. This paper shows how to calculate these error rates for test methods where the log reduction in a microbial population is used as a measure of antimicrobial efficacy. The calculations can be used to assess performance standards that may require multiple tests of multiple microbes at multiple laboratories. Notably, the error rates account for among-laboratory variance of the log reductions estimated from a multilaboratory data set and the correlation among tests of different microbes conducted in the same laboratory. Performance standards that require a disinfectant product to pass all tests, or multiple tests on average, are considered. The proposed statistical methodology is flexible and allows a different acceptable outcome for each microbe tested, since, for example, variability may differ across microbes. The approach can also be applied to semiquantitative methods for which product efficacy is reported as the number of positive carriers out of a treated set and the density of the microbes on control carriers is quantified, thereby allowing a log reduction to be calculated. Therefore, using the approach described in this paper, the error rates can also be calculated for semiquantitative method performance standards specified solely in terms of the maximum allowable number of positive carriers per test. The calculations are demonstrated in a case study of the current performance standard for the semiquantitative AOAC Use-Dilution Methods for Pseudomonas aeruginosa (964.02) and Staphylococcus aureus (955.15), which allow up to one positive carrier out of a set of 60 inoculated and treated carriers in each test. A simulation study was also conducted to verify the validity of the model's assumptions and accuracy. Our approach, easily implemented
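For the simplest single-test, single-laboratory case, the probability of passing the "at most one positive carrier out of 60" standard is a binomial tail. This sketch ignores the among-laboratory variance and between-microbe correlation that the paper's full model accounts for, and the per-carrier probabilities used are illustrative.

```python
from scipy.stats import binom

def pass_probability(p_pos, n_carriers=60, max_positives=1):
    """Probability that a single test passes a standard allowing at most
    `max_positives` positive carriers out of `n_carriers`, assuming each
    treated carrier remains positive independently with probability p_pos."""
    return binom.cdf(max_positives, n_carriers, p_pos)

# Illustrative per-carrier positivity rates (hypothetical, not from the paper)
p_pass_effective = pass_probability(0.005)   # highly effective product
p_pass_poor = pass_probability(0.10)         # much less effective product
```

The Type II fail-error rate for the effective product is then 1 - p_pass_effective, and p_pass_poor is a Type I pass-error rate for the poor product; multi-test standards multiply or average such terms.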
Improved tests reveal that the accelerating moment release hypothesis is statistically insignificant
Hardebeck, J.L.; Felzer, K.R.; Michael, A.J.
2008-01-01
We test the hypothesis that accelerating moment release (AMR) is a precursor to large earthquakes, using data from California, Nevada, and Sumatra. Spurious cases of AMR can arise from data fitting because the time period, area, and sometimes magnitude range analyzed before each main shock are often optimized to produce the strongest AMR signal. Optimizing the search criteria can identify apparent AMR even if no robust signal exists. For both 1950-2006 California-Nevada M ≥ 6.5 earthquakes and the 2004 M9.3 Sumatra earthquake, we can find two contradictory patterns in the pre-main shock earthquakes by data fitting: AMR and decelerating moment release. We compare the apparent AMR found in the real data to the apparent AMR found in four types of synthetic catalogs with no inherent AMR. When spatiotemporal clustering is included in the simulations, similar AMR signals are found by data fitting in both the real and synthetic data sets even though the synthetic data sets contain no real AMR. These tests demonstrate that apparent AMR may arise from a combination of data fitting and normal foreshock and aftershock activity. In principle, data-fitting artifacts could be avoided if the free parameters were determined from scaling relationships between the duration and spatial extent of the AMR pattern and the magnitude of the earthquake that follows it. However, we demonstrate that previously proposed scaling relationships are unstable, statistical artifacts caused by the use of a minimum magnitude for the earthquake catalog that scales with the main shock magnitude. Some recent AMR studies have used spatial regions based on hypothetical stress loading patterns, rather than circles, to select the data. We show that previous tests were biased and that unbiased tests do not find this change to the method to be an improvement. The use of declustered catalogs has also been proposed to eliminate the effect of clustering but we demonstrate that this does not increase the
Tropospheric delay statistics measured by two site test interferometers at Goldstone, California
NASA Astrophysics Data System (ADS)
Morabito, David D.; D'Addario, Larry R.; Acosta, Roberto J.; Nessel, James A.
2013-12-01
Site test interferometers (STIs) have been deployed at two locations within the NASA Deep Space Network tracking complex in Goldstone, California. An STI measures the difference of atmospheric delay fluctuations over a distance comparable to the separations of microwave antennas that could be combined as phased arrays for communication and navigation. The purpose of the Goldstone STIs is to assess the suitability of Goldstone as an uplink array site and to statistically characterize atmosphere-induced phase delay fluctuations for application to future arrays. Each instrument consists of two ~1 m diameter antennas and associated electronics separated by ~200 m. The antennas continuously observe signals emitted by geostationary satellites and produce measurements of the phase difference between the received signals. The two locations at Goldstone are separated by 12.5 km and differ in elevation by 119 m. We find that their delay fluctuations are statistically similar but do not appear as shifted versions of each other, suggesting that the length scale for evolution of the turbulence pattern is shorter than the separation between instruments. We also find that the fluctuations are slightly weaker at the higher altitude site.
Ramírez, J; Górriz, J M; Segura, J C
2007-05-01
Currently, there are technology barriers inhibiting speech processing systems that work in extremely noisy conditions from meeting the demands of modern applications. These systems often require a noise reduction system working in combination with a precise voice activity detector (VAD). This paper presents statistical likelihood ratio tests formulated in terms of the integrated bispectrum of the noisy signal. The integrated bispectrum is defined as a cross spectrum between the signal and its square, and is therefore a function of a single frequency variable. It inherits the ability of higher-order statistics to detect signals in noise and brings several additional advantages: (i) its computation as a cross spectrum leads to significant computational savings, and (ii) the variance of the estimator is of the same order as that of the power spectrum estimator. The proposed approach incorporates contextual information into the decision rule, a strategy that has reported significant benefits for robust speech recognition applications. The proposed VAD is compared to the G.729, adaptive multirate, and advanced front-end standards as well as recently reported algorithms, showing a sustained advantage in speech/nonspeech detection accuracy and speech recognition performance.
NASA Astrophysics Data System (ADS)
Noel, Jean; Prieto, Juan C.; Styner, Martin
2017-03-01
Functional Analysis of Diffusion Tensor Tract Statistics (FADTTS) is a toolbox for analysis of white matter (WM) fiber tracts. It allows associating diffusion properties along major WM bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these WM tract properties. However, to use this toolbox, a user must have an intermediate knowledge in scripting languages (MATLAB). FADTTSter was created to overcome this issue and make the statistical analysis accessible to any non-technical researcher. FADTTSter is actively being used by researchers at the University of North Carolina. FADTTSter guides non-technical users through a series of steps including quality control of subjects and fibers in order to setup the necessary parameters to run FADTTS. Additionally, FADTTSter implements interactive charts for FADTTS' outputs. This interactive chart enhances the researcher experience and facilitates the analysis of the results. FADTTSter's motivation is to improve usability and provide a new analysis tool to the community that complements FADTTS. Ultimately, by enabling FADTTS to a broader audience, FADTTSter seeks to accelerate hypothesis testing in neuroimaging studies involving heterogeneous clinical data and diffusion tensor imaging. This work is submitted to the Biomedical Applications in Molecular, Structural, and Functional Imaging conference. The source code of this application is available in NITRC.
A statistical design for testing transgenerational genomic imprinting in natural human populations.
Li, Yao; Guo, Yunqian; Wang, Jianxin; Hou, Wei; Chang, Myron N; Liao, Duanping; Wu, Rongling
2011-02-25
Genomic imprinting is a phenomenon in which the same allele is expressed differently, depending on its parental origin. Such a phenomenon, also called the parent-of-origin effect, has been recognized to play a pivotal role in embryological development and pathogenesis in many species. Here we propose a statistical design for detecting imprinted loci that control quantitative traits based on a random set of three-generation families from a natural population in humans. This design provides a pathway for characterizing the effects of imprinted genes on a complex trait or disease at different generations and for testing transgenerational changes in imprinted effects. The design is integrated with population and cytogenetic principles of gene segregation and transmission from one generation to the next. The implementation of the EM algorithm within the design framework leads to the estimation of genetic parameters that define imprinted effects. A simulation study is used to investigate the statistical properties of the model and validate its utilization. This new design, coupled with increasingly used genome-wide association studies, should have an immediate implication for studying the genetic architecture of complex traits in humans.
Hierarchical searching in model-based LADAR ATR using statistical separability tests
NASA Astrophysics Data System (ADS)
DelMarco, Stephen; Sobel, Erik; Douglas, Joel
2006-05-01
In this work we investigate simultaneous object identification improvement and efficient library search for model-based object recognition applications. We develop an algorithm to provide efficient, prioritized, hierarchical searching of the object model database. A common approach to model-based object recognition chooses the object label corresponding to the best match score. However, due to corrupting effects the best match score does not always correspond to the correct object model. To address this problem, we propose a search strategy which exploits information contained in a number of representative elements of the library to drill down to a small class with high probability of containing the object. We first optimally partition the library into a hierarchic taxonomy of disjoint classes. A small number of representative elements are used to characterize each object model class. At each hierarchy level, the observed object is matched against the representative elements of each class to generate score sets. A hypothesis testing problem, using a distribution-free statistical test, is defined on the score sets and used to choose the appropriate class for a prioritized search. We conduct a probabilistic analysis of the computational cost savings, and provide a formula measuring the computational advantage of the proposed approach. We generate numerical results using match scores derived from matching highly-detailed CAD models of civilian ground vehicles used in 3-D LADAR ATR. We present numerical results showing effects on classification performance of significance level and representative element number in the score set hypothesis testing problem.
Statistical Degradation Models for Reliability Analysis in Non-Destructive Testing
NASA Astrophysics Data System (ADS)
Chetvertakova, E. S.; Chimitova, E. V.
2017-04-01
In this paper, we consider the application of statistical degradation models to reliability analysis in non-destructive testing. Such models make it possible to estimate the reliability function (the dependence of non-failure probability on time) for a fixed critical level using information from the degradation paths of tested items. The most widely used models are the gamma and Wiener degradation models, in which the gamma or normal distribution, respectively, is assumed for the degradation increments. Using computer simulation, we have analysed the accuracy of the reliability estimates obtained for the considered models. The number of increments can be enlarged by increasing the sample size (the number of tested items) or by increasing the frequency of measuring degradation. It has been shown that the sample size has a greater influence on the accuracy of the reliability estimates than the measuring frequency. Moreover, another important factor influencing the accuracy of reliability estimation is the duration over which the degradation process is observed.
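For the Wiener degradation model the reliability function has a closed form via the inverse-Gaussian first-passage time of the degradation path to the critical level. A minimal sketch (the drift, diffusion, and critical level below are arbitrary illustration values):

```python
from math import exp, sqrt
from scipy.stats import norm

def wiener_reliability(t, mu, sigma, w):
    """R(t) = P(first passage of X(s) = mu*s + sigma*B(s) to level w > 0
    occurs after t), for positive drift mu: the standard first-passage
    (inverse-Gaussian) survival formula. Note: exp(2*mu*w/sigma**2) can
    overflow when w/sigma is very large."""
    z1 = (w - mu * t) / (sigma * sqrt(t))
    z2 = (-w - mu * t) / (sigma * sqrt(t))
    return norm.cdf(z1) - exp(2 * mu * w / sigma**2) * norm.cdf(z2)

# Illustrative path: drift 1 per unit time toward a critical level of 10
r_early = wiener_reliability(5.0, mu=1.0, sigma=0.5, w=10.0)
r_late = wiener_reliability(15.0, mu=1.0, sigma=0.5, w=10.0)
```

In practice mu and sigma would be estimated from the observed degradation increments of the tested items, which is where sample size and measurement frequency enter the accuracy comparison discussed above.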
Jha, Sumit Kumar; Pullum, Laura L; Ramanathan, Arvind
2016-01-01
Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.
NASA Astrophysics Data System (ADS)
Anderson, Greg; Johnson, Hadley
1999-09-01
Over the past several years, many investigators have argued that static stress changes caused by large earthquakes influence the spatial and temporal distributions of subsequent regional seismicity, with earthquakes occurring preferentially in areas of stress increase and reduced seismicity where stress decreases. Some workers have developed quantitative methods to test for the existence of such static stress triggering, but no firm consensus has yet been reached as to the significance of these effects. We have developed a new test for static stress triggering in which we compute the change in Coulomb stress on the focal mechanism nodal planes of a set of events spanning the occurrence of a large earthquake. We compare the statistical distributions of these stress changes for events before and after the mainshock to decide if we can reject the hypothesis that these distributions are the same. Such rejection would be evidence for stress triggering. We have applied this test to the November 24, 1987, Elmore Ranch/Superstition Hills earthquake sequence and find that those post-mainshock events that experienced stress increases of at least 0.01-0.03 MPa (0.1-0.3 bar) or that occurred from 1.4 to 2.8 years after the mainshocks are consistent with having been triggered by mainshock-generated static stress changes.
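A distribution-free comparison of pre- and post-mainshock samples of this kind can be sketched with a two-sample Kolmogorov-Smirnov test. This is a generic illustration rather than the authors' exact statistic; the stress-change values below are synthetic, and the p-value uses the standard asymptotic KS approximation.

```python
import bisect
import math
import random

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic and asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:  # the ECDF difference can only change at sample points
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, max(0.0, min(1.0, p))

# synthetic Coulomb stress changes (MPa) resolved on event nodal planes
rng = random.Random(0)
pre = [rng.gauss(0.0, 0.1) for _ in range(150)]
post = [rng.gauss(0.1, 0.1) for _ in range(150)]  # shifted toward stress increase
d, p = ks_two_sample(pre, post)
```

A small p rejects the hypothesis that the pre- and post-mainshock distributions are the same, which in the framing above would count as evidence for static stress triggering.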
Experimental Test of Heisenberg's Measurement Uncertainty Relation Based on Statistical Distances
NASA Astrophysics Data System (ADS)
Ma, Wenchao; Ma, Zhihao; Wang, Hengyan; Chen, Zhihua; Liu, Ying; Kong, Fei; Li, Zhaokai; Peng, Xinhua; Shi, Mingjun; Shi, Fazhan; Fei, Shao-Ming; Du, Jiangfeng
2016-04-01
Incompatible observables can be approximated by compatible observables in joint measurement or measured sequentially, with constrained accuracy as implied by Heisenberg's original formulation of the uncertainty principle. Recently, Busch, Lahti, and Werner proposed inaccuracy trade-off relations based on statistical distances between probability distributions of measurement outcomes [P. Busch et al., Phys. Rev. Lett. 111, 160405 (2013); P. Busch et al., Phys. Rev. A 89, 012129 (2014)]. Here we reformulate their theoretical framework, derive an improved relation for qubit measurement, and perform an experimental test on a spin system. The relation reveals that the worst-case inaccuracy is tightly bounded from below by the incompatibility of target observables, and is verified by the experiment employing joint measurement in which two compatible observables designed to approximate two incompatible observables on one qubit are measured simultaneously.
Colegrave, Nick; Ruxton, Graeme D
2017-03-29
A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is carried out only if statistical testing of the data under a previous, more complicated model provides motivation for the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test the hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, we argue here that (except in highly specialized circumstances that we identify) the hoped-for improvement in statistical power will be small or non-existent, and the reliability of the statistical procedures is likely to be much reduced through deviation of Type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for the initial selection of statistical models in the light of this change in procedure.
Palazón, L; Navas, A
2017-06-01
Information on sediment contribution and transport dynamics from the contributing catchments is needed to develop management plans that tackle environmental problems related to the effects of fine sediment, such as reservoir siltation. In this respect, the fingerprinting technique is an indirect approach known to be valuable and effective for sediment source identification in river catchments. Large variability in sediment delivery was found in previous studies in the Barasona catchment (1509 km(2), Central Spanish Pyrenees). Simulation results with SWAT and fingerprinting approaches identified badlands and agricultural uses as the main contributors to sediment supply in the reservoir. In this study the <63 μm fraction of the surface reservoir sediments (top 2 cm) is investigated following the fingerprinting procedure to assess how the use of different statistical procedures affects the estimated source contributions. Three optimum composite fingerprints were selected from the same dataset to discriminate between source contributions based on land uses/land covers, by applying (1) discriminant function analysis alone, and discriminant function analysis combined (as a second step) with (2) the Kruskal-Wallis H-test or (3) principal components analysis. Source contribution results differed between the assessed options, with the greatest differences observed for option #3, the two-step process of principal components analysis and discriminant function analysis. The characteristics of the solutions from the applied mixing model and the conceptual understanding of the catchment showed that the most reliable solution was achieved with option #2, the two-step process of Kruskal-Wallis H-test and discriminant function analysis. The assessment showed the importance of the statistical procedure used to define the optimum composite fingerprint in sediment fingerprinting applications.
NASA Astrophysics Data System (ADS)
Guo, Bingjie; Bitner-Gregersen, Elzbieta Maria; Sun, Hui; Block Helmers, Jens
2013-04-01
Earlier investigations have indicated that proper prediction of nonlinear loads and responses due to nonlinear waves is important for ship safety in extreme seas. However, the nonlinear loads and responses in extreme seas have not yet been sufficiently investigated, particularly when rogue waves are considered. A question remains whether the existing linear codes can predict nonlinear loads and responses with satisfactory accuracy, and how large the deviations from linear predictions are. To address this question, response statistics have been studied based on model tests carried out with an LNG tanker in the towing tank of the Technical University of Berlin (TUB), and compared with the statistics derived from numerical simulations using the DNV code WASIM. WASIM is a potential-flow code for wave-ship interaction based on the 3D panel method, which can perform both linear and nonlinear simulation. The numerical simulations with WASIM and the model tests in extreme and rogue waves have been performed. The analysis of ship motions (heave and pitch) and bending moments, in both regular and irregular waves, is performed. The results from the linear and nonlinear simulations are compared with experimental data to indicate the impact of wave non-linearity on load and response calculations when the code based on the Rankine panel method is used. The study shows that nonlinearities may have a significant effect on extreme motions and bending moments generated by strongly nonlinear waves. The effect of water depth on ship responses is also demonstrated using numerical simulations. Uncertainties related to the results are discussed, with particular attention given to sampling variability.
Combining test statistics and models in bootstrapped model rejection: it is a balancing act
2014-01-01
Background Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually done using e.g. the chi-square test (χ²) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that typically are not fulfilled. This problem is partly alleviated by the usage of bootstrapping, a computationally heavy approach to calculate an empirical distribution. Bootstrapping also allows for a natural extension to estimation of joint distributions, but this feature has so far been little exploited. Results We herein show that simplistic combinations of bootstrapped tests, like the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found both consistent and with a higher power than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the most superior test is a 2D χ² vs. χ², where the second χ²-value comes from an additional help model, and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because the LHR has retained most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time. Conclusions We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. These results also provide a deeper
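The parametric-bootstrap ingredient above can be illustrated in a few lines. This is not the paper's 2D method: it bootstraps a single χ² statistic for a deliberately simple constant-mean model (all data and parameters are invented), showing how an empirical null distribution replaces the possibly unreliable analytical one.

```python
import random

def chi2_stat(data, model_mean, sigma):
    """Sum of squared, sigma-normalized residuals against a constant-mean model."""
    return sum(((y - model_mean) / sigma) ** 2 for y in data)

def bootstrap_rejection(data, sigma=0.5, n_boot=1000, seed=7):
    """Parametric bootstrap: refit the constant-mean model on each dataset
    simulated from the fitted model, build the empirical null distribution
    of the chi-square statistic, and return an empirical p-value."""
    rng = random.Random(seed)
    mean_hat = sum(data) / len(data)
    t_obs = chi2_stat(data, mean_hat, sigma)
    t_boot = []
    for _ in range(n_boot):
        sim = [rng.gauss(mean_hat, sigma) for _ in data]
        sim_mean = sum(sim) / len(sim)  # refit on every bootstrap sample
        t_boot.append(chi2_stat(sim, sim_mean, sigma))
    p = sum(t >= t_obs for t in t_boot) / n_boot
    return t_obs, p

# data with a trend: the constant-mean model should be rejected
rng = random.Random(3)
trend = [0.2 * i + rng.gauss(0, 0.5) for i in range(20)]
t_obs, p_value = bootstrap_rejection(trend)
```

A small empirical p-value rejects the tested model without relying on the analytical χ² distribution being valid.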
ERIC Educational Resources Information Center
Tabor, Josh
2010-01-01
On the 2009 AP® Statistics Exam, students were asked to create a statistic to measure skewness in a distribution. This paper explores several of the most popular student responses and evaluates which statistic performs best when sampling from various skewed populations. (Contains 8 figures, 3 tables, and 4 footnotes.)
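Two statistics of the kind students might propose are the standardized third moment and Pearson's second skewness coefficient. These are standard textbook definitions given for illustration; the actual exam responses are not reproduced here.

```python
def sample_skewness(xs):
    """Standardized third central moment: g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def pearson_skew(xs):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    n = len(xs)
    s = sorted(xs)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return 3 * (mean - median) / sd

right_skewed = [1, 1, 1, 2, 10]  # a long right tail
```

Both statistics return 0 for a symmetric sample and a positive value for `right_skewed`, but they can rank skewed populations differently, which is the kind of comparison the paper carries out.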
DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data
Harris, S.P.
1997-09-18
This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for the evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two- to threefold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials, which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis: drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and the fact that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
ERIC Educational Resources Information Center
Steinberg, Wendy J.
The purpose of this study was to examine the nature and degree of differences in expert versus novice knowledge structures, both before and after training, when judging the similarity of multiple-choice test items within a statistics and test theory (STT) domain. Subjects were employees of the Testing Division of the New York State Department of…
ERIC Educational Resources Information Center
Luh, Wei-Ming; Guo, Jiin-Huarng
2005-01-01
To deal with nonnormal and heterogeneous data for the one-way fixed effect analysis of variance model, the authors adopted a trimmed means method in conjunction with Hall's invertible transformation into a heteroscedastic test statistic (Alexander-Govern test or Welch test). The results of simulation experiments showed that the proposed technique…
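The trimming step can be sketched as a Yuen-Welch-type statistic: compare 20% trimmed means using winsorized variances. This is only an illustration of the trimmed-means idea; Hall's invertible transformation and the Alexander-Govern variant from the article are not implemented here.

```python
def trimmed_mean_stat(x, y, trim=0.2):
    """Yuen-Welch-type statistic comparing trimmed means of two samples,
    using winsorized variances (a sketch of the trimmed-means idea only)."""
    def pieces(v):
        s = sorted(v)
        g = int(trim * len(s))
        t = s[g:len(s) - g]                     # trimmed sample
        w = [s[g]] * g + t + [s[-g - 1]] * g    # winsorized sample
        tm = sum(t) / len(t)
        wm = sum(w) / len(w)
        wv = sum((u - wm) ** 2 for u in w) / (len(w) - 1)
        return tm, wv, len(t)
    tmx, wvx, hx = pieces(x)
    tmy, wvy, hy = pieces(y)
    dx = (len(x) - 1) * wvx / (hx * (hx - 1))
    dy = (len(y) - 1) * wvy / (hy * (hy - 1))
    return (tmx - tmy) / (dx + dy) ** 0.5

# the outlier at 100 is trimmed away, so the trimmed centers agree exactly
stat_same = trimmed_mean_stat([1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stat_shift = trimmed_mean_stat([1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
                               [11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
```

Trimming makes the statistic insensitive to the single extreme value while still detecting a genuine location shift, which is why trimmed means help with nonnormal, heterogeneous data.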
Exact statistical tests for the intersection of independent lists of genes
NATARAJAN, LOKI; PU, MINYA; MESSER, KAREN
2012-01-01
Public data repositories have enabled researchers to compare results across multiple genomic studies in order to replicate findings. A common approach is to first rank genes according to a hypothesis of interest within each study. Then, lists of the top-ranked genes within each study are compared across studies. Genes recaptured as highly ranked (usually above some threshold) in multiple studies are considered to be significant. However, this comparison strategy often remains informal, in that Type I error and false discovery rate are usually uncontrolled. In this paper, we formalize an inferential strategy for this kind of list-intersection discovery test. We show how to compute a p-value associated with a 'recaptured' set of genes, using a closed-form Poisson approximation to the distribution of the size of the recaptured set. The distribution of the test statistic depends on the rank threshold and the number of studies within which a gene must be recaptured. We use the Poisson approximation to investigate operating characteristics of the test. We give practical guidance on how to design a bioinformatic list-intersection study with prespecified control of Type I error (at the set level) and false discovery rate (at the gene level). We show how the choice of test parameters will affect the expected proportion of significant genes identified. We present a strategy for identifying the optimal choice of parameters, depending on the particular alternative hypothesis which might hold. We illustrate our methods using prostate cancer gene-expression datasets from the curated Oncomine database. PMID:23335952
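Under the null model of independent rankings, the Poisson approximation has a simple closed form: a given gene lands in the top k of all m studies with probability (k/G)^m, so the recaptured-set size is approximately Poisson with mean λ = G(k/G)^m. The sketch below applies this directly; the numbers are illustrative and not taken from the paper.

```python
import math

def list_intersection_pvalue(n_genes, threshold, n_studies, observed):
    """P-value for observing `observed` or more genes in the top-`threshold`
    of all `n_studies` rankings, via the Poisson approximation (null model:
    independent uniform rankings of `n_genes` genes)."""
    lam = n_genes * (threshold / n_genes) ** n_studies
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(observed))
    return 1.0 - p_below

# e.g. 4 genes recaptured in the top 100 of all 3 studies of 10,000 genes
p = list_intersection_pvalue(10000, 100, 3, 4)
```

With these parameters λ = 0.01, so recapturing even four genes is far beyond chance, illustrating how the rank threshold and the number of studies drive the test's operating characteristics.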
What's the best statistic for a simple test of genetic association in a case-control study?
Kuo, Chia-Ling; Feingold, Eleanor
2010-04-01
Genome-wide genetic association studies typically start with univariate statistical tests of each marker. In principle, this single-SNP scanning is statistically straightforward--the testing is done with standard methods (e.g. χ² tests, regression) that have been well studied for decades. However, a number of different tests and testing procedures can be used. In a case-control study, one can use a 1 df allele-based test, a 1 or 2 df genotype-based test, or a compound procedure that combines two or more of these statistics. Additionally, most of the tests can be performed with or without covariates included in the model. While there are a number of statistical papers that make power comparisons among subsets of these methods, none has comprehensively tackled the question of which of the methods in common use is best suited to univariate scanning in a genome-wide association study. In this paper, we consider a wide variety of realistic test procedures, and first compare the power of the different procedures to detect a single locus under different genetic models. We then address the question of whether or when it is a good idea to include covariates in the analysis. We conclude that the most commonly used approach to handle covariates--modeling covariate main effects but not interactions--is almost never a good idea. Finally, we consider the performance of the statistics in a genome scan context.
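The two basic case-control statistics can be sketched from a single genotype table. The counts below are invented for illustration; the 1-df allele-based test doubles each person into two alleles (implicitly assuming the two alleles are independent), while the 2-df genotype-based test works on the 2×3 table directly.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat

def chi2_sf(x, df):
    """Chi-square survival function, closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df 1 or 2 in this sketch")

# invented genotype counts (AA, Aa, aa) for cases and controls
geno = [[40, 100, 60],
        [60, 100, 40]]
allele = [[2 * g[0] + g[1], g[1] + 2 * g[2]] for g in geno]  # 2 alleles/person
x_geno, x_allele = pearson_chi2(geno), pearson_chi2(allele)
p_geno, p_allele = chi2_sf(x_geno, 2), chi2_sf(x_allele, 1)
```

Here both statistics equal 8.0, but the allele-based test gives the smaller p-value because it spends only one degree of freedom, illustrating why the choice between these procedures matters.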
FLAGS: A Flexible and Adaptive Association Test for Gene Sets Using Summary Statistics
Huang, Jianfei; Wang, Kai; Wei, Peng; Liu, Xiangtao; Liu, Xiaoming; Tan, Kai; Boerwinkle, Eric; Potash, James B.; Han, Shizhong
2016-01-01
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available. PMID:26773050
Statistical methods for the analysis of a screening test for chronic beryllium disease
Frome, E.L.; Neubert, R.L.; Smith, M.H.; Littlefield, L.G.; Colyer, S.P.
1994-10-01
The lymphocyte proliferation test (LPT) is a noninvasive screening procedure used to identify persons who may have chronic beryllium disease. A practical problem in the analysis of LPT well counts is the occurrence of outlying data values (approximately 7% of the time). A log-linear regression model is used to describe the expected well counts for each set of test conditions. The variance of the well counts is proportional to the square of the expected counts, and two resistant regression methods are used to estimate the parameters of interest. The first approach uses least absolute values (LAV) on the log of the well counts to estimate beryllium stimulation indices (SIs) and the coefficient of variation. The second approach uses a resistant regression version of maximum quasi-likelihood estimation. A major advantage of the resistant regression methods is that it is not necessary to identify and delete outliers. These two new methods for the statistical analysis of the LPT data and the outlier rejection method that is currently being used are applied to 173 LPT assays. The authors strongly recommend the LAV method for routine analysis of the LPT.
Test Statistics for the Identification of Assembly Neurons in Parallel Spike Trains
Picado Muiño, David; Borgelt, Christian
2015-01-01
In recent years numerous improvements have been made in multiple-electrode recordings (i.e., parallel spike-train recordings) and spike sorting, to the extent that it is now possible to monitor the activity of up to hundreds of neurons simultaneously. Due to these improvements it is now potentially possible to identify assembly activity (roughly understood as significant synchronous spiking of a group of neurons) from these recordings, which—if it can be demonstrated reliably—would significantly improve our understanding of neural activity and neural coding. However, several methodological problems remain when trying to do so and, among them, a principal one is the combinatorial explosion that one faces when considering all potential neuronal assemblies, since in principle every subset of the recorded neurons constitutes a candidate set for an assembly. We present several statistical tests to identify assembly neurons (i.e., neurons that participate in a neuronal assembly) from parallel spike trains, with the aim of reducing the set of neurons to a relevant subset and thereby easing the task of identifying neuronal assemblies in further analyses. These tests improve on those introduced in the work by Berger et al. (2010) by using additional features like spike weight or pairwise overlap and alternative ways to identify spike coincidences (e.g., by avoiding time binning, which tends to lose information). PMID:25866503
A Tool Preference Choice Method for RNA Secondary Structure Prediction by SVM with Statistical Tests
Hor, Chiou-Yi; Yang, Chang-Biau; Chang, Chia-Hung; Tseng, Chiou-Ting; Chen, Hung-Hsin
2013-01-01
The prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists. Many useful tools have been developed for this purpose, each with its individual strengths and weaknesses. Based on support vector machines (SVM), we therefore propose a tool choice method which integrates three prediction tools: pknotsRG, RNAStructure, and NUPACK. Our method first extracts features from the target RNA sequence, and adopts two information-theoretic feature selection methods for feature ranking. We propose a method to combine feature selection and classifier fusion in an incremental manner. Our test data set contains 720 RNA sequences, of which 225 pseudoknotted RNA sequences are obtained from PseudoBase and 495 nested RNA sequences are obtained from RNA SSTRAND. The method serves as a preprocessing step in analyzing RNA sequences before the RNA secondary structure prediction tools are employed. In addition, the performance of various configurations is subjected to statistical tests to examine their significance. The best base-pair accuracy achieved is 75.5%, which is obtained by the proposed incremental method and is significantly higher than the 68.8% associated with the best single predictor, pknotsRG. PMID:23641141
Létourneau, Daniel; McNiven, Andrea; Keller, Harald; Wang, An; Amin, Md Nurul; Pearce, Jim; Norrlinger, Bernhard; Jaffray, David A.
2014-12-15
Purpose: High-quality radiation therapy using highly conformal dose distributions and image-guided techniques requires optimum machine delivery performance. In this work, a monitoring system for multileaf collimator (MLC) performance, integrating semiautomated MLC quality control (QC) tests and statistical process control tools, was developed. The MLC performance monitoring system was used for almost a year on two commercially available MLC models. Control charts were used to establish MLC performance and assess the test frequency required to achieve a given level of performance. MLC-related interlocks and servicing events were recorded during the monitoring period and were investigated as indicators of MLC performance variations. Methods: The QC test developed as part of the MLC performance monitoring system uses 2D megavoltage images (acquired using an electronic portal imaging device) of 23 fields to determine the location of the leaves with respect to the radiation isocenter. The precision of the MLC performance monitoring QC test and the MLC itself was assessed by detecting the MLC leaf positions on 127 megavoltage images of a static field. After initial calibration, the MLC performance monitoring QC test was performed 3–4 times/week over a period of 10–11 months to monitor the positional accuracy of individual leaves for two different MLC models. Analysis of test results was performed using individual control charts per leaf, with control limits computed from the measurements as well as two sets of specifications of ±0.5 and ±1 mm. Out-of-specification and out-of-control leaves were automatically flagged by the monitoring system and reviewed monthly by physicists. MLC-related interlocks reported by the linear accelerator and servicing events were recorded to help identify potential causes of nonrandom MLC leaf positioning variations. Results: The precision of the MLC performance monitoring QC test and the MLC itself was within ±0.22 mm for most MLC leaves
Testing of a "smart-pebble" for measuring particle transport statistics
NASA Astrophysics Data System (ADS)
Kitsikoudis, Vasileios; Avgeris, Loukas; Valyrakis, Manousos
2017-04-01
This paper presents preliminary results from novel experiments aiming to assess coarse sediment transport statistics for a range of transport conditions, via the use of an innovative "smart-pebble" device. This device is a waterproof sphere, 7 cm in diameter, equipped with a number of sensors that provide information about the velocity, acceleration and positioning of the "smart-pebble" within the flow field. A series of specifically designed experiments are carried out to monitor the entrainment of a "smart-pebble" for fully developed, uniform, turbulent flow conditions over a hydraulically rough bed. Specifically, the bed surface is configured in three sections, each consisting of well-packed glass beads of slightly increasing size in the downstream direction. The first section has a streamwise length of L1=150 cm and a bead size of D1=15 mm, the second section has a length of L2=85 cm and a bead size of D2=22 mm, and the third bed section has a length of L3=55 cm and a bead size of D3=25.4 mm. Two cameras monitor the area of interest to provide additional information regarding the "smart-pebble" movement. Three-dimensional flow measurements are obtained with the aid of an acoustic Doppler velocimeter along a measurement grid to assess the flow forcing field. A wide range of flow rates near and above the threshold of entrainment is tested, while using four distinct densities for the "smart-pebble", which can affect its transport speed and total momentum. The acquired data are analyzed to derive Lagrangian transport statistics, and the implications of such an experiment for the transport of particles by rolling are discussed. The flow conditions for the initiation of motion, particle accelerations and equilibrium particle velocities (translating into transport rates), and statistics of particle impact and motion can be extracted from the acquired data, which can be further compared to develop meaningful insights for sediment transport
NASA Astrophysics Data System (ADS)
Roggo, Y.; Duponchel, L.; Ruckebusch, C.; Huvenne, J.-P.
2003-06-01
Near-infrared spectroscopy (NIRS) has been applied for both qualitative and quantitative evaluation of sugar beet. However, chemometric methods are numerous and a choice criterion is sometimes difficult to determine. In order to select the most accurate chemometric method, statistical tests are developed. In the first part, quantitative models, which predict the sucrose content of sugar beet, are compared. To realize a systematic study, 54 models are developed with different spectral pre-treatments (Standard Normal Variate (SNV), Detrending (D), first and second Derivative), different spectral ranges and different regression methods (Principal Component Regression (PCR), Partial Least Squares (PLS), Modified PLS (MPLS)). Analysis of variance and Fisher's tests are computed to compare bias and Standard Error of Prediction Corrected for bias (SEP(C)), respectively. The model developed with full spectra pre-treated by SNV, second derivative and MPLS methods gives accurate results: the bias is 0.008 and SEP(C) is 0.097 g of sucrose per 100 g of sample on a concentration range between 14 and 21 g/100 g. In the second part, McNemar's test is applied to compare the classification methods. The classification methods are applied to two data sets: the first concerns the disease resistance of sugar beet and the second deals with spectral differences between four spectrometers. The performances of four well-known classification methods are compared on the NIRS data: Linear Discriminant Analysis (LDA), the K Nearest Neighbors method (KNN), Simple Modeling of Class Analogy (SIMCA) and the Learning Vector Quantization neural network (LVQ). In this study, the most accurate method (SIMCA) achieves 81.9% correct classification on the disease resistance determination and 99.4% correct classification on the instrument data set.
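McNemar's test, used in the second part, operates on the discordant pairs of a paired comparison: samples that one classifier gets right and the other gets wrong. A minimal sketch with invented counts (and the common continuity correction) follows.

```python
import math

def mcnemar(disagree_ab, disagree_ba, corrected=True):
    """McNemar's test on paired classifier outcomes: b = cases classifier A got
    right and B got wrong, c = the reverse. 1-df chi-square on discordant pairs."""
    b, c = disagree_ab, disagree_ba
    num = (abs(b - c) - 1) ** 2 if corrected else (b - c) ** 2
    x = num / (b + c)
    p = math.erfc(math.sqrt(x / 2))  # chi-square(1 df) survival function
    return x, p

# invented counts: A right/B wrong on 30 samples, B right/A wrong on 10
x_big, p_big = mcnemar(30, 10)
# nearly balanced disagreements: no evidence of a difference
x_small, p_small = mcnemar(12, 10)
```

Because both classifiers are evaluated on the same samples, McNemar's test ignores the (possibly large) set of cases both classify identically and asks only whether the disagreements are lopsided.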
McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad
2011-07-01
The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates.
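The limits-of-agreement calculation is short enough to state in full: the bias is the mean paired difference, and the 95% limits are that bias ± 1.96 standard deviations of the differences. The paired readings below are hypothetical, purely for illustration.

```python
def limits_of_agreement(a, b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements
    from two instruments or clinical tests."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = (sum((d - bias) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# hypothetical paired readings from two instruments (e.g. pressure in mmHg)
inst_a = [15, 16, 14, 18, 17, 15, 16, 19, 14, 16]
inst_b = [14, 17, 13, 17, 18, 15, 15, 18, 14, 15]
bias, lower, upper = limits_of_agreement(inst_a, inst_b)
```

The clinical judgment is then whether the interval (lower, upper) is narrow enough for the two instruments to be used interchangeably, a question a correlation coefficient cannot answer.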
Mitchell, Matthew W
2015-01-01
For pathway analysis of genomic data, the most common methods involve combining p-values from individual statistical tests. However, there are several multivariate statistical methods that can be used to test whether a pathway has changed. Because of the large number of variables and the pathway sizes in genomics data, some of these statistics cannot be computed. In metabolomics data, however, the number of variables and the pathway sizes are typically much smaller, making such computations feasible. Of particular interest is being able to detect changes in pathways that may not be detected for the individual variables. We compare the performance of both the p-value methods and multivariate statistics for self-contained tests with an extensive simulation study and a human metabolomics study. Permutation tests, rather than asymptotic results, are used to assess the statistical significance of the pathways. Furthermore, both one- and two-sided alternative hypotheses are examined. In the human metabolomics study, many pathways were statistically significant, although the majority of the individual variables in the pathways were not. Overall, the p-value methods perform at least as well as the multivariate statistics for these scenarios.
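One common p-value-combination method (not necessarily the one used in the study) is Fisher's: under the null, -2 Σ ln p follows a χ² distribution with 2m degrees of freedom, whose tail has a closed form for even df. The invented p-values below show the phenomenon the abstract highlights: a pathway can be significant as a set even when no individual p-value is.

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of the chi-square distribution for even df (closed form)."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

def fisher_combined(pvalues):
    """Fisher's method: -2*sum(log p) ~ chi-square with 2m df under the null."""
    x = -2 * sum(math.log(p) for p in pvalues)
    return x, chi2_sf_even_df(x, 2 * len(pvalues))

# invented per-metabolite p-values: none significant on its own
pathway_p = [0.08, 0.12, 0.15, 0.09, 0.20]
x_set, p_set = fisher_combined(pathway_p)
```

Five individually unremarkable p-values combine to a set-level p below 0.05, which is exactly the self-contained pathway signal the study looks for.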
NASA Astrophysics Data System (ADS)
Segou, M.; Parsons, T.; Ellsworth, W. L.
2012-12-01
We implement a retrospective forecast test specific to the 1989 Loma Prieta sequence, focusing on the comparison between two realizations of the epidemic-type aftershock sequence (ETAS) model and twenty-one models based on Coulomb stress change calculations and rate-and-state theory (CRS). We find that: (1) ETAS models forecast the spatial evolution of seismicity better in the near-source region, (2) CRS models can compete with ETAS models in off-fault regions and at short periods after the mainshock, (3) adopting optimally oriented planes as receivers can improve performance for short time periods of up to a few days, whereas geologically specified planes should be used for long-term forecasting, and (4) CRS models based on shear stress perform comparably to other CRS models, with the benefit of fewer free parameters in the stress calculations. These results show that physics-based and statistical forecast models are complementary, and that future forecasts should combine ETAS and CRS models in space and time. We note that the realization in time and space of the CRS models involves a number of critical parameters ('learning'-phase seismicity rates, regional stress field, loading rates on faults), which should be retrospectively tested to improve the predictive power of physics-based models. During our experiment the forecast covers Northern California [123.0-121.3°W in longitude, 36.4-38.2°N in latitude] on a 2.5 km spatial grid within a 10-day interval following a mainshock, but here we focus on the results for the post-seismic period of the Loma Prieta earthquake. For the CRS models we consider a common learning phase (1974-1980) to ensure consistency in our comparison, and we take into account stress perturbations imparted by 9 M>5.0 earthquakes between 1980 and 1989 in Northern California, including the 1988-1989 Lake Elsman events. ETAS parameters correspond to the maximum likelihood estimations derived after
NASA Astrophysics Data System (ADS)
Taylor, Steven R.; Anderson, Dale N.
2011-02-01
In his Forum, P. Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) applied Pearson's chi-square test to a large catalog of earthquakes to test the hypothesis that earthquakes are uniformly distributed across days of the week (the formal null hypothesis that an earthquake has equal probability of occurring on any day). In his analysis this hypothesis is rejected, and he proposes that the statistical test implies that earthquakes are correlated with day of the week (with notably high seismicity on Sunday) and that the strong dependence of p-values on sample size therefore makes them uninterpretable. It is a well-known property of classical hypothesis tests that the power of a statistical test is a function of the degrees of freedom, so that a test with large degrees of freedom will always have the resolution to reject the null. Consideration of practical as well as statistical significance is essential. Selecting bins so that the chi-square test fails to reject the null hypothesis is essentially formulating a test to agree with a foregone conclusion. To the point: this data set does not exhibit uniform seismicity across time, and the statistical test is summarizing the data correctly. With proper attention to the application setting, and to the formulation of the null and alternative hypotheses, summarizing with p-values is technically sound.
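The chi-square computation at issue is easy to reproduce. The weekly counts below are invented (not Vermeesch's catalog), and the closed-form survival function used here is valid only for even degrees of freedom, which covers the df = 6 of a seven-bin test.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    assert df % 2 == 0, "closed form requires even df"
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(df // 2))

def day_of_week_test(counts):
    """Pearson chi-square test of uniform seismicity across the 7 weekdays."""
    n = sum(counts)
    expected = n / 7
    x2 = sum((o - expected) ** 2 / expected for o in counts)
    return x2, chi2_sf_even_df(x2, df=6)

# Invented daily earthquake counts, Monday..Sunday.
x2, p = day_of_week_test([1002, 988, 1015, 997, 1023, 980, 1095])
```

With this modest sample the test does not reject at the 5% level; the same relative imbalance in a catalog of hundreds of thousands of events would reject decisively, which is exactly the power-versus-degrees-of-freedom point made above.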
Chlorine-36 data at Yucca Mountain: Statistical tests of conceptual models for unsaturated-zone flow
Campbell, K.; Wolfsberg, A.; Fabryka-Martin, J.; Sweetkind, D.
2003-01-01
An extensive set of chlorine-36 (36Cl) data has been collected in the Exploratory Studies Facility (ESF), an 8-km-long tunnel at Yucca Mountain, Nevada, for the purpose of developing and testing conceptual models of flow and transport in the unsaturated zone (UZ) at this site. At several locations, the measured values of 36Cl/Cl ratios for salts leached from rock samples are high enough to provide strong evidence that at least a small component of bomb-pulse 36Cl, fallout from atmospheric testing of nuclear devices in the 1950s and 1960s, was measured, implying that some fraction of the water traveled from the ground surface through 200-300 m of unsaturated rock to the level of the ESF during the last 50 years. These data are analyzed here using a formal statistical approach based on log-linear models to evaluate alternative conceptual models for the distribution of such fast flow paths. The most significant determinant of the presence of bomb-pulse 36Cl in a sample from the welded Topopah Spring unit (TSw) is the structural setting from which the sample was collected. Our analysis generally supports the conceptual model that a fault that cuts through the nonwelded Paintbrush tuff unit (PTn) that overlies the TSw is required in order for bomb-pulse 36Cl to be transmitted to the sample depth in less than 50 years. Away from PTn-cutting faults, the ages of water samples at the ESF appear to be a strong function of the thickness of the nonwelded tuff between the ground surface and the ESF, due to slow matrix flow in that unit.
Chen, David; Shah, Anup; Nguyen, Hien; Loo, Dorothy; Inder, Kerry L; Hill, Michelle M
2014-09-05
The utility of high-throughput quantitative proteomics to identify differentially abundant proteins en masse relies on suitable and accessible statistical methodology, which remains mostly an unmet need. We present a free web-based tool, called Quantitative Proteomics p-value Calculator (QPPC), designed for accessibility and usability by proteomics scientists and biologists. Being an online tool, there is no requirement for software installation. Furthermore, QPPC accepts generic peptide ratio data generated by any mass spectrometer and database search engine. Importantly, QPPC utilizes the permutation test that we recently found to be superior to other methods for analysis of peptide ratios because it does not assume normal distributions [1]. QPPC assists the user in selecting significantly altered proteins based on numerical fold change, or standard deviation from the mean or median, together with the permutation p-value. Output is in the form of comma-separated values files, along with graphical visualization using volcano plots and histograms. We evaluate the optimal parameters for use of QPPC, including the permutation level and the effect of outlier and contaminant peptides on p-value variability. The optimal parameters defined are deployed as defaults for the web tool at http://qppc.di.uq.edu.au/.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Dorssaeler, Alain Van; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-03-01
This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, testing new algorithms for label-free quantitative analysis, or evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier PXD001819. Starting from some raw files of this dataset, we also provide here some processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and processed data are merged in a single file to facilitate the evaluation and illustration of software tools results for the detection of variant proteins with different absolute expression levels and fold change values.
Fermentation tube test statistics for direct water sampling and comments on the Thomas formula.
Nawalany, M; Loga, M
2010-09-01
This article describes a new interpretation of the Fermentation Tube Test (FTT) performed on water samples drawn from natural waters polluted by faecal bacteria. A novel general procedure to calculate the Most Probable Number of bacteria (MPN) in natural waters has been derived for the FTT for both direct and independent repetitive multiple water sampling. The generalization, based on solving the newly proposed equation, allows consideration of any a priori frequency distribution g(n) of bacterial concentration in the analysed water, as opposed to the unbounded uniform a priori distribution g(n) assumed in the standard procedures of the Standard Methods of Examining Water and Wastewater and ISO 8199:1988. A statistical analysis of the Thomas formula is also presented. It is demonstrated that the Thomas formula is highly inaccurate. The authors propose, therefore, to remove the Thomas formula from the Standard Methods of Examining Water and Wastewater and ISO 8199:1988 altogether and replace it with a solution of the proposed generalized equation.
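For context, the Thomas formula being criticized is, in its common textbook form (assumed here, not quoted from the article), MPN per 100 mL = 100·P / √(V_neg · V_total), where P is the number of positive tubes, V_neg the sample volume in negative tubes, and V_total the sample volume in all tubes:

```python
import math

def thomas_mpn(positives, ml_negative, ml_total):
    """Thomas's approximate MPN per 100 mL (the formula the article argues
    should be retired): MPN = 100 * P / sqrt(V_neg * V_total)."""
    return 100.0 * positives / math.sqrt(ml_negative * ml_total)

# Hypothetical 15-tube series: 5 x 10 mL, 5 x 1 mL, 5 x 0.1 mL,
# with 4/5, 2/5, and 1/5 positive tubes respectively.
P = 4 + 2 + 1                        # positive tubes
V_total = 5 * 10 + 5 * 1 + 5 * 0.1   # sample volume in all tubes (55.5 mL)
V_neg = 1 * 10 + 3 * 1 + 4 * 0.1     # sample volume in negative tubes (13.4 mL)
mpn = thomas_mpn(P, V_neg, V_total)
```

The formula's appeal is that it needs no tables, but it is exactly this closed-form shortcut that the authors show to be highly inaccurate relative to a proper likelihood-based MPN.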
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTOs). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTOs. This review examines existing practices in GM plant field testing, such as the approaches to randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTOs are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed in deciding on the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches, for example analysis of variance (ANOVA), are appropriate, for discontinuous data (counts) only generalized linear models (GLMs) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role, GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and in assessing whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will
Hamilton, Martin A; Hamilton, Gordon Cord; Goeres, Darla M; Parker, Albert E
2013-01-01
This paper presents statistical techniques suitable for analyzing a collaborative study (multilaboratory study or ring trial) of a laboratory disinfectant product performance test (DPPT) method. Emphasis is on the assessment of the repeatability, reproducibility, resemblance, and responsiveness of the DPPT method. The suggested statistical techniques are easily modified for application to a single laboratory study. The presentation includes descriptions of the plots and tables that should be constructed during initial examination of the data, including a discussion of outliers and QA checks. The statistical recommendations deal with evaluations of prevailing types of DPPTs, including both quantitative and semiquantitative tests. The presentation emphasizes tests in which the disinfectant treatment is applied to surface-associated microbes and the outcome is a viable cell count; however, the statistical guidelines are appropriate for suspension tests and other test systems. The recommendations also are suitable for disinfectant tests using any microbe (vegetative bacteria, virus, spores, etc.) or any disinfectant treatment. The descriptions of the statistical techniques include either examples of calculations based on published data or citations to published calculations. Computer code is provided in an appendix.
ERIC Educational Resources Information Center
MacDonald, Paul; Paunonen, Sampo V.
2002-01-01
Examined the behavior of item and person statistics from item response theory and classical test theory frameworks through Monte Carlo methods with simulated test data. Findings suggest that item difficulty and person ability estimates are highly comparable for both approaches. (SLD)
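The classical test theory half of that comparison is simple to compute directly. A sketch with a toy 0/1 response matrix (invented data; the corrected item-total correlation stands in for the discrimination index):

```python
import math

def _pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0

def ctt_item_stats(R):
    """CTT item statistics from a 0/1 response matrix (rows = persons):
    difficulty = proportion correct; discrimination = corrected item-total
    (point-biserial) correlation, i.e. item vs. total score minus the item."""
    out = []
    for j in range(len(R[0])):
        item = [row[j] for row in R]
        rest = [sum(row) - row[j] for row in R]
        out.append((sum(item) / len(item), _pearson(item, rest)))
    return out

# Toy data: 4 examinees, 3 items of increasing difficulty.
item_stats = ctt_item_stats([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```

The comparability reported in the abstract is between these CTT quantities and their IRT counterparts (item difficulty vs. b-parameters, total score vs. ability estimates).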
Miecznikowski, Jeffrey C; Damodaran, Senthilkumar; Sellers, Kimberly F; Rabin, Richard A
2010-12-15
Numerous gel-based software packages exist to detect protein changes potentially associated with disease. The data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various software packages handle missing data. To date, no one has extensively studied the impact that interpolating missing data has on subsequent analysis of protein spots. This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance, while the generalized family-wise error rate (gFWER) should be considered for controlling the multiple testing error. In summary, we advocate a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data with a data imputation step, choice of statistical test, and lastly an error control method in light of multiple testing. When determining the choice of statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case, a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate.
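The percentile-based bootstrap t favored here for pre-MS spot selection can be sketched as a generic two-group version (this is an illustration of the technique, not the authors' exact pipeline):

```python
import math
import random

def welch_t(x, y):
    """Welch two-sample t statistic."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def bootstrap_t_pvalue(x, y, n_boot=1999, seed=1):
    """Percentile-based bootstrap t: center both groups on the grand mean so
    the null of equal means holds, resample with replacement, and compare the
    observed |t| to the resampled distribution."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(x, y))
    grand = (sum(x) + sum(y)) / (len(x) + len(y))
    x0 = [v - sum(x) / len(x) + grand for v in x]
    y0 = [v - sum(y) / len(y) + grand for v in y]
    hits = 0
    for _ in range(n_boot):
        xb = [rng.choice(x0) for _ in x0]
        yb = [rng.choice(y0) for _ in y0]
        try:
            if abs(welch_t(xb, yb)) >= t_obs:
                hits += 1
        except ZeroDivisionError:  # degenerate resample; count conservatively
            hits += 1
    return (hits + 1) / (n_boot + 1)

# Hypothetical spot intensities (log scale) for two clearly separated groups.
p = bootstrap_t_pvalue([10.0, 11.0, 10.5, 9.8, 10.2],
                       [13.0, 13.4, 12.8, 13.1, 12.9])
```

Because it makes no normality assumption and tends to be liberal at small n, a spot passing this test still gets vetted downstream by mass spectrometry, which is the rationale given above.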
ERIC Educational Resources Information Center
Jones, Andrew T.
2011-01-01
Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
The Effects of Pre-Lecture Quizzes on Test Anxiety and Performance in a Statistics Course
ERIC Educational Resources Information Center
Brown, Michael J.; Tallon, Jennifer
2015-01-01
The purpose of our study was to examine the effects of pre-lecture quizzes in a statistics course. Students (N = 70) from 2 sections of an introductory statistics course served as participants in this study. One section completed pre-lecture quizzes whereas the other section did not. Completing pre-lecture quizzes was associated with improved exam…
The T(ea) Test: Scripted Stories Increase Statistical Method Selection Skills
ERIC Educational Resources Information Center
Hackathorn, Jana; Ashdown, Brien
2015-01-01
To teach statistics, teachers must attempt to overcome pedagogical obstacles, such as dread, anxiety, and boredom. There are many options available to teachers that facilitate a pedagogically conducive environment in the classroom. The current study examined the effectiveness of incorporating scripted stories and humor into statistical method…
NASA Astrophysics Data System (ADS)
Shiraishi, Maresuke; Hikage, Chiaki; Namba, Ryo; Namikawa, Toshiya; Hazumi, Masashi
2016-08-01
The B-mode polarization in the cosmic microwave background (CMB) anisotropies at large angular scales provides compelling evidence for primordial gravitational waves (GWs). It is often stated that a discovery of the GWs establishes the quantum fluctuation of the vacuum during cosmic inflation. Since the GWs could also be generated by source fields, however, we need to check whether a sizable signal exists due to such source fields before reaching a firm conclusion when the B mode is discovered. Source fields of particular types can generate non-Gaussianity (NG) in the GWs. Testing statistics of the B mode is a powerful way of detecting such NG. As a concrete example, we show a model in which a gauge field sources chiral GWs via a pseudoscalar coupling and forecast the detection significance at the future CMB satellite LiteBIRD. Effects of residual foregrounds and lensing B mode are both taken into account. We find the B-mode bispectrum "BBB" is particularly sensitive to the source-field NG, which is detectable at LiteBIRD with a >3 sigma significance. Therefore the search for the BBB will be indispensable toward unambiguously establishing the quantum fluctuation of the vacuum when the B mode is discovered. We also introduce the Minkowski functional to detect the NGs. While we find that the Minkowski functional is less efficient than the harmonic-space bispectrum estimator, it still serves as a useful cross-check. Finally, we also discuss the possibility of extracting clean information on parity violation of GWs and new types of parity-violating observables induced by lensing.
Podoll, Amber S; Bell, Cynthia S; Molony, Donald A
2012-01-01
Nephrologists rely on valid clinical studies to inform their health care decisions. Knowledge of simple statistical principles equips the prudent nephrologist with the skills that allow him or her to critically evaluate clinical studies and to determine the validity of the results. Important in this process is knowing when certain statistical tests are used appropriately and if their application in interpreting research data will most likely lead to the most robust or valid conclusions. The research team bears the responsibility for determining the statistical analysis during the design phase of the study and subsequently for carrying out the appropriate analysis. This will ensure that bias is minimized and "valid" results are reported. We have summarized the important caveats and components in correctly choosing a statistical test with a series of tables. With this format, we wish to provide a tool for the nephrologist/researcher that he or she can use when required to decide if an appropriate statistical analysis plan was implemented for any particular study. We have included in these tables the types of statistical tests that might be used best for analysis of different types of comparisons on small and on larger patient samples.
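In the spirit of the tables described here, a toy look-up can encode the usual first-pass mapping from data type and design to test. The categories and test names below are illustrative conventions, not the authors' exact tables:

```python
def suggest_test(outcome="continuous", n_groups=2, paired=False, normalish=True):
    """Toy decision helper: map outcome type, number of groups, pairing,
    and approximate normality to a conventional first-choice test."""
    if outcome == "categorical":
        return "McNemar test" if paired else "chi-square / Fisher exact test"
    if n_groups == 2:
        if paired:
            return "paired t-test" if normalish else "Wilcoxon signed-rank test"
        return "two-sample t-test" if normalish else "Mann-Whitney U test"
    return "one-way ANOVA" if normalish else "Kruskal-Wallis test"
```

As the abstract stresses, such a look-up is only a starting point: it must be fixed in the design phase, and small samples or skewed data push the choice toward the nonparametric branch.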
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms
ERIC Educational Resources Information Center
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.
2017-01-01
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
An objective statistical test for eccentricity forcing of Oligo-Miocene climate
NASA Astrophysics Data System (ADS)
Proistosescu, C.; Huybers, P.; Maloof, A. C.
2008-12-01
We seek a maximally objective test for the presence of orbital features in Oligocene and Miocene delta-18O records from marine sediments. Changes in Earth's orbital eccentricity are thought to be an important control on the long-term variability of climate during the Oligocene and Miocene Epochs. However, such an important control from eccentricity is surprising because eccentricity has relatively little influence on Earth's annual average insolation budget. Nevertheless, if significant eccentricity variability is present, it would provide important insight into the operation of the climate system at long timescales. Here we use previously published data, but with a chronology that is initially independent of orbital assumptions, to test for the presence of eccentricity-period variability in the Oligocene/Miocene sediment records. In contrast to the sawtooth climate record of the Pleistocene, the Oligocene and Miocene climate record appears smooth and symmetric and does not reset itself every hundred thousand years. This smooth variation, as well as the time interval spanning many eccentricity periods, makes Oligocene and Miocene paleorecords very suitable for evaluating the importance of eccentricity forcing. First, we construct time scales depending only upon the ages of geomagnetic reversals, with intervening ages linearly interpolated with depth. Such a single age-depth relationship is, however, too uncertain to assess whether orbital features are present. Thus, we construct a second depth-derived age model by averaging ages across multiple sediment cores which have, at least partly, independent accumulation-rate histories. But ages are still too uncertain to permit unambiguous detection of orbital variability. Thus we employ limited tuning assumptions and measure the degree by which orbital-period variability increases, using spectral power estimates. By tuning, we know that we are biasing the record toward showing orbital variations, but we account for this bias in our
Brown, Geoffrey W.; Sandstrom, Mary M.; Preston, Daniel N.; ...
2014-11-17
In this study, the Integrated Data Collection Analysis (IDCA) program has conducted a proficiency test for small-scale safety and thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results from this test for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Class 5 Type II standard. The material was tested as a well-characterized standard several times during the proficiency test to assess differences among participants and the range of results that may arise for well-behaved explosive materials.
Tests for, origins of, and corrections to non-Gaussian statistics. The dipole-flip model
NASA Astrophysics Data System (ADS)
Schile, Addison J.; Thompson, Ward H.
2017-04-01
Linear response approximations are central to our understanding and simulations of nonequilibrium statistical mechanics. Despite the success of these approaches in predicting nonequilibrium dynamics, open questions remain. Laird and Thompson [J. Chem. Phys. 126, 211104 (2007)] previously formalized, in the context of solvation dynamics, the connection between the static linear-response approximation and the assumption of Gaussian statistics. The Gaussian statistics perspective is useful in understanding why linear response approximations are still accurate for perturbations much larger than thermal energies. In this paper, we use this approach to address three outstanding issues in the context of the "dipole-flip" model, which is known to exhibit nonlinear response. First, we demonstrate how non-Gaussian statistics can be predicted from purely equilibrium molecular dynamics (MD) simulations (i.e., without resort to a full nonequilibrium MD as is the current practice). Second, we show that the Gaussian statistics approximation may also be used to identify the physical origins of nonlinear response residing in a small number of coordinates. Third, we explore an approach for correcting the Gaussian statistics approximation for nonlinear response effects using the same equilibrium simulation. The results are discussed in the context of several other examples of nonlinear responses throughout the literature.
1981-08-01
[Mathematical expressions garbled in extraction; the matrix-series identities (2.13) and (2.14) involving tr V and etr(xV) could not be recovered. Recoverable references: Communications in Statistics 4, 363-374; Nair, U. S. (1940), Application of factorial series in the study of distribution laws in statistics.]
ERIC Educational Resources Information Center
Rogosa, David
1981-01-01
The form of the Johnson-Neyman region of significance is shown to be determined by the statistic for testing the null hypothesis that the population within-group regressions are parallel. Results are obtained for both simultaneous and nonsimultaneous regions of significance. (Author)
ERIC Educational Resources Information Center
Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O.
2006-01-01
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1-alpha ) percentile rank score…
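The cutoff logic described above can be sketched as follows; a minimal illustration, assuming NCDIF replicates have already been generated under a no-DIF condition. The helper name `ipr_cutoff` and the simulated null values are hypothetical, not taken from the study.

```python
import numpy as np

def ipr_cutoff(ncdif_replicates, alpha=0.05):
    """Cutoff for one item: the (1 - alpha) percentile of NCDIF
    values generated by item parameter replication (hypothetical name)."""
    return float(np.percentile(ncdif_replicates, 100 * (1 - alpha)))

# Simulated NCDIF values under a no-DIF condition (illustrative only)
rng = np.random.default_rng(0)
null_ncdif = rng.chisquare(df=1, size=1000) * 1e-3
cutoff = ipr_cutoff(null_ncdif, alpha=0.05)

observed_ncdif = 0.02
flagged = observed_ncdif > cutoff   # item exceeds its item-specific cutoff
```

An item is flagged for NCDIF when its observed index exceeds the cutoff derived from its own replication distribution, rather than a single cutoff shared across all items.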
USDA-ARS?s Scientific Manuscript database
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-01-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTOs). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTOs. This review examines existing practices in GM plant field testing, such as the approaches to randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTOs are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed when selecting the proper statistical model. While classical statistical approaches, for example analysis of variance (ANOVA), are appropriate for continuous data (e.g., pH and temperature), for discontinuous data (counts) only generalized linear models (GLMs) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role, GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in interpreting the statistical analyses of field trials and in assessing whether such analyses were correctly applied. We offer generic advice to risk assessors and
ERIC Educational Resources Information Center
Callamaras, Peter
1983-01-01
This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)
Rasch Fit Statistics as a Test of the Invariance of Item Parameter Estimates.
ERIC Educational Resources Information Center
Smith, Richard M.; Suh, Kyunghee K.
2003-01-01
Studied the extent to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The analysis, based on a large number of high school students, shows that relying solely on INFIT and OUTFIT to assess model fit would cause the researcher to miss an important threat to…
Comment on a Wilcox Test Statistic for Comparing Means When Variances Are Unequal.
ERIC Educational Resources Information Center
Hsiung, Tung-Hsing; And Others
1994-01-01
The alternative proposed by Wilcox (1989) to the James second-order statistic for comparing population means when variances are heterogeneous can sometimes be invalid. The degree to which the procedure is invalid depends on differences in sample size, the expected values of the observations, and population variances. (SLD)
ERIC Educational Resources Information Center
Anastasiadou, Sofia D.
2011-01-01
The aims of this paper are to determine the validity and reliability of the SASTSc scale as an instrument to measure students' attitudes, one that monitors affective components relevant to learning the discipline of statistics with the help of technology and its impact on students' careers, in a Greek sample. Initially, it consisted of 28 items concerning 5…
Basic Mathematics Test Predicts Statistics Achievement and Overall First Year Academic Success
ERIC Educational Resources Information Center
Fonteyne, Lot; De Fruyt, Filip; Dewulf, Nele; Duyck, Wouter; Erauw, Kris; Goeminne, Katy; Lammertyn, Jan; Marchant, Thierry; Moerkerke, Beatrijs; Oosterlinck, Tom; Rosseel, Yves
2015-01-01
In the psychology and educational science programs at Ghent University, only 36.1% of the new incoming students in 2011 and 2012 passed all exams. Despite availability of information, many students underestimate the scientific character of social science programs. Statistics courses are a major obstacle in this matter. Not all enrolling students…
Residuals and the Residual-Based Statistic for Testing Goodness of Fit of Structural Equation Models
ERIC Educational Resources Information Center
Foldnes, Njal; Foss, Tron; Olsson, Ulf Henning
2012-01-01
The residuals obtained from fitting a structural equation model are crucial ingredients in obtaining chi-square goodness-of-fit statistics for the model. The authors present a didactic discussion of the residuals, obtaining a geometrical interpretation by recognizing the residuals as the result of oblique projections. This sheds light on the…
A Statistical Analysis of Infrequent Events on Multiple-Choice Tests that Indicate Probable Cheating
ERIC Educational Resources Information Center
Sundermann, Michael J.
2008-01-01
A statistical analysis of multiple-choice answers is performed to identify anomalies that can be used as evidence of student cheating. The ratio of exact errors in common (EEIC: two students put the same wrong answer for a question) to differences (D: two students get different answers) was found to be a good indicator of cheating under a wide…
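The EEIC-to-D ratio described above can be sketched directly; a minimal illustration with hypothetical answer strings (the function name and data are invented, not from the article).

```python
def eeic_d_ratio(answers_a, answers_b, key):
    """EEIC: questions both students miss with the SAME wrong answer.
    D: questions where the two students give different answers."""
    eeic = sum(1 for a, b, k in zip(answers_a, answers_b, key)
               if a == b and a != k)
    d = sum(1 for a, b in zip(answers_a, answers_b) if a != b)
    return eeic / d if d else float("inf")

# Hypothetical 10-question test, answers A-E
key = "ABCDEABCDE"
s1  = "ABCDEABCDA"   # one error (last question)
s2  = "ABCDAABCDA"   # two errors, one shared exactly with s1
ratio = eeic_d_ratio(s1, s2, key)   # 1 exact error in common / 1 difference
```

A high ratio means the pair shares many identical wrong answers relative to how often they disagree, which is the anomaly the article uses as evidence of copying.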
ERIC Educational Resources Information Center
Novick, Melvin R.
This project is concerned with the development and implementation of some new statistical techniques that will facilitate a continuing input of information about the student to the instructional manager so that individualization of instruction can be managed effectively. The source of this informational input is typically a short…
The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups
ERIC Educational Resources Information Center
Pero-Cebollero, Maribel; Guardia-Olmos, Joan
2013-01-01
In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…
Festing, Michael F. W.
2014-01-01
The safety of chemicals, drugs, novel foods and genetically modified crops is often tested using repeat-dose sub-acute toxicity tests in rats or mice. It is important to avoid misinterpretations of the results as these tests are used to help determine safe exposure levels in humans. Treated and control groups are compared for a range of haematological, biochemical and other biomarkers which may indicate tissue damage or other adverse effects. However, the statistical analysis and presentation of such data pose problems due to the large number of statistical tests which are involved. Often, it is not clear whether a “statistically significant” effect is real or a false positive (type I error) due to sampling variation. The authors' conclusions often appear to be reached somewhat subjectively from the pattern of statistical significances, discounting those which they judge to be type I errors and ignoring any biomarker where the p-value is greater than p = 0.05. However, by using standardised effect sizes (SESs) a range of graphical methods and an overall assessment of the mean absolute response can be made. The approach is an extension, not a replacement, of existing methods. It is intended to assist toxicologists and regulators in the interpretation of the results. Here, the SES analysis has been applied to data from nine published sub-acute toxicity tests in order to compare the findings with those of the original authors. Line plots, box plots and bar plots show the pattern of response. Dose-response relationships are easily seen. A “bootstrap” test compares the mean absolute differences across dose groups. In four out of seven papers where the no observed adverse effect level (NOAEL) was estimated by the authors, it was set too high according to the bootstrap test, suggesting that possible toxicity is under-estimated. PMID:25426843
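The standardized-effect-size idea can be sketched as follows on simulated biomarker data. The bootstrap here resamples animals within each group to get an interval for the mean absolute SES; this is one plausible reading of a "bootstrap" test, not necessarily the author's exact procedure, and all data are invented.

```python
import numpy as np

def standardized_effect_size(control, treated):
    """Cohen's d-style SES: mean difference divided by pooled SD."""
    n1, n2 = len(control), len(treated)
    sp = np.sqrt(((n1 - 1) * np.var(control, ddof=1)
                  + (n2 - 1) * np.var(treated, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(treated) - np.mean(control)) / sp

def mean_abs_ses(control_mat, treated_mat):
    """Mean |SES| across biomarkers (columns)."""
    return np.mean([abs(standardized_effect_size(control_mat[:, j],
                                                 treated_mat[:, j]))
                    for j in range(control_mat.shape[1])])

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(10, 5))   # 10 animals, 5 biomarkers
treated = rng.normal(0.8, 1.0, size=(10, 5))   # all biomarkers shifted

# Bootstrap distribution of the mean |SES| (resampling animals)
boot = []
for _ in range(500):
    ci = rng.integers(0, 10, 10)
    ti = rng.integers(0, 10, 10)
    boot.append(mean_abs_ses(control[ci], treated[ti]))
lo, hi = np.percentile(boot, [2.5, 97.5])
```

An interval for the mean absolute response that excludes values near zero indicates an overall treatment effect across the biomarker panel, rather than judging each p-value in isolation.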
Statistical Profiling of Academic Oral English Proficiency Based on an ITA Screening Test
ERIC Educational Resources Information Center
Choi, Ick Kyu
2013-01-01
At the University of California, Los Angeles, the Test of Oral Proficiency (TOP), an internally developed oral proficiency test, is administered to international teaching assistant (ITA) candidates to ensure an appropriate level of academic oral English proficiency. Test taker performances are rated live by two raters according to four subscales.…
ERIC Educational Resources Information Center
Huynh, Huynh
1979-01-01
In mastery testing, the raw agreement index and the kappa index may be estimated via one test administration when the test scores follow beta-binomial distributions. This paper reports formulae, tables, and a computer program which facilitate the computation of the standard errors of the estimates. (Author/CTM)
ERIC Educational Resources Information Center
Reese, Lynda M.
This study extended prior Law School Admission Council (LSAC) research related to the item response theory (IRT) local item independence assumption into the realm of classical test theory. Initially, results from the Law School Admission Test (LSAT) and two other tests were investigated to determine the approximate state of local item independence…
ERIC Educational Resources Information Center
Alexandrowicz, Rainer W.
2011-01-01
The linear logistic test model (LLTM) is a valuable and approved tool in educational research, as it allows for modelling cognitive components involved in a cognitive task. It allows for a rigorous assessment of fit by means of a Likelihood Ratio Test (LRT). This approach is genuine to the Rasch family of models, yet it suffers from the unsolved…
NASA Astrophysics Data System (ADS)
Hilborn, Robert C.
1997-04-01
The connection between the spin of particles and the permutation symmetry ("statistics") of multiparticle states lies at the heart of much of atomic, molecular, condensed matter, and nuclear physics. The spin-statistics theorem of relativistic quantum field theory seems to provide a theoretical basis for this connection. There are, however, loopholes (O. W. Greenberg, Phys. Rev. D 43, 4111 (1991).) that allow for a field theory of identical particles whose statistics interpolate smoothly between that of bosons and fermions. Thus, it is up to experiment to reveal how closely nature follows the usual spin-statistics connection. After reviewing experiments that provide stringent limits on possible violations of the spin-statistics connection for electrons, I shall describe recent analogous experiments for spin-0 particles (R. C. Hilborn and C. L. Yuca, Phys. Rev. Lett. 76, 2844 (1996).) using diode laser spectroscopy of the A-band of molecular oxygen near 760 nm. These experiments show that the probability of finding two ^16O nuclei (spin-0 particles) in an antisymmetric state is less than 1 ppm. I shall also discuss proposals to test the spin-statistics connection for photons.
Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David
2015-01-01
New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc
Burgess, R.M.; Morrison, G.E. (Environmental Research Lab.)
1994-04-01
Over the last 10 years a great deal of research effort has concentrated on determining the effects of contaminated sediments on aquatic organisms. For marine systems, this effort has emphasized acute sediment toxicity tests using amphipods, although a variety of other end points and species have been used. Another candidate species for marine, solid-phase, sublethal sediment toxicity testing is the bivalve Mulinia lateralis. Useful attributes of this euryhaline bivalve include a wide geographic distribution, easy lab culture, and amenability to toxicity testing applications. Detailed in this paper are organism selection and culture, establishment of statistical design, and an estimate of organism mortality and sublethal response variability. Results of Mulinia lateralis toxicity tests with 65 contaminated sediments from eight sites are reported, as well as results of comparative toxicity tests using two amphipod species, Ampelisca abdita and Eohaustorius estuarius. Analysis of statistical power indicates that treatment weight and survival responses that are 25% different from the site control responses can be detected with a probability of 95%. Results of comparative toxicity tests illustrate that although Mulinia lateralis and amphipod acute end points are relatively similar in sensitivity, utilization of the Mulinia lateralis sublethal growth end point greatly increases test sensitivity. This paper describes a new marine sediment toxicity test that complements the existing suite of marine sediment toxicity assessment techniques.
Kruschke, John K; Liddell, Torrin M
2017-02-07
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
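A minimal example of estimation with a credible interval, in the spirit of the Bayesian approach discussed above: a conjugate Beta-binomial posterior, with the 95% interval approximated by posterior simulation. The data are invented for illustration.

```python
import random

random.seed(42)

# Observe 7 successes in 20 trials with a uniform Beta(1, 1) prior;
# the posterior for the success probability is then Beta(8, 14).
a, b = 1 + 7, 1 + 13
draws = sorted(random.betavariate(a, b) for _ in range(20000))
lo = draws[int(0.025 * len(draws))]   # 95% credible interval,
hi = draws[int(0.975 * len(draws))]   # approximated by simulation
post_mean = a / (a + b)               # exact posterior mean: 8/22
```

Unlike a frequentist confidence interval, (lo, hi) is directly interpretable as a range containing the parameter with 95% posterior probability, which is the estimation-with-uncertainty goal the article emphasizes.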
Consistency in statistical moments as a test for bubble cloud clustering.
Weber, Thomas C; Lyons, Anthony P; Bradley, David L
2011-11-01
Frequency dependent measurements of attenuation and/or sound speed through clouds of gas bubbles in liquids are often inverted to find the bubble size distribution and the void fraction of gas. The inversions are often done using an effective medium theory as a forward model under the assumption that the bubble positions are Poisson distributed (i.e., statistically independent). Under circumstances in which single scattering does not adequately describe the pressure field, the assumption of independence in position can yield large errors when clustering is present, leading to errors in the inverted bubble size distribution. It is difficult, however, to determine the existence of clustering in bubble clouds without the use of specialized acoustic or optical imaging equipment. A method is described here in which the existence of bubble clustering can be identified by examining the consistency between the first two statistical moments of multiple frequency acoustic measurements.
Mitchell, Meghan B.; Shaughnessy, Lynn W.; Shirk, Steven D.; Yang, Frances M.; Atri, Alireza
2013-01-01
Accurate measurement of cognitive function is critical for understanding the disease course of Alzheimer’s disease (AD). Detecting cognitive change over time can be confounded by the level of premorbid intellectual function or cognitive reserve and lead to under- or over-diagnosis of cognitive impairment and AD. Statistical models of cognitive performance that include cognitive reserve can improve sensitivity to change and clinical efficacy. We used confirmatory factor analysis to test a four-factor model comprised of memory/language, processing speed/executive function, attention, and cognitive reserve factors in a group of cognitively healthy older adults and a group of participants along the spectrum of amnestic mild cognitive impairment to AD (aMCI-AD). The model showed excellent fit for the control group (χ2 = 100, df = 78, CFI = .962, RMSEA = .049) and adequate fit for the aMCI-AD group (χ2 = 1750, df = 78, CFI = .932, RMSEA = .085). Though strict invariance criteria were not met, invariance testing to determine if factor structures are similar across groups yielded acceptable absolute model fits and provides evidence in support of configural, metric, and scalar invariance. These results provide further support for the construct validity of cognitive reserve in healthy and memory impaired older adults. PMID:23039909
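The RMSEA point estimates reported above follow the standard formula. A sketch using the control group's chi-square and degrees of freedom, with an assumed sample size of 119 (not stated in the abstract) chosen only to show that the formula reproduces a value near .049:

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# chi2 = 100, df = 78 are from the control-group CFA above;
# n = 119 is an assumed sample size used only to illustrate the formula.
est = rmsea(chi2=100, df=78, n=119)
```

The max(·, 0) truncation means models whose chi-square falls below the degrees of freedom report an RMSEA of exactly zero.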
NASA Astrophysics Data System (ADS)
Perlicki, Krzysztof
2010-03-01
A low-cost statistical polarization mode dispersion/polarization dependent loss emulator is presented in this article. The emulator was constructed by concatenating 15 highly birefringent optical-fiber segments and randomly varying the mode coupling between them by rotating the polarization state. The impact of polarization effects on polarization division multiplexing transmission quality was measured. The designed polarization mode dispersion/polarization dependent loss emulator was applied to mimic the polarization effects of real optical-fiber links.
2014-10-01
[Text garbled in extraction. Recoverable content: the statistic was first introduced in the seminal paper by Wallis (1951), who extended the previous work of Wald and Wolfowitz (1946) for normally distributed data. Recoverable references: Statistical tolerance regions: Theory, applications, and computation (Vol. 744). Hoboken, NJ: John Wiley & Sons; Montgomery, D. C. (2001). Design and analysis of experiments (5th ed.). Hoboken, NJ: John Wiley & Sons. Defense ARJ, October 2014.]
Okamura, H; Punt, A E; Semba, Y; Ichinokawa, M
2013-04-01
This paper proposes a new and flexible statistical method for marginal increment analysis that directly accounts for periodicity in circular data using a circular-linear regression model with random effects. The method is applied to vertebral marginal increment data for Alaska skate Bathyraja parmifera. The best fit model selected using the AIC indicates that growth bands are formed annually. Simulation, where the underlying characteristics of the data are known, shows that the method performs satisfactorily when uncertainty is not extremely high.
Posada, David
2006-07-01
ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at http://darwin.uvigo.es/software/modeltest_server.html.
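The information-criterion selection performed by ModelTest can be sketched as follows; the log-likelihood scores and parameter counts below are invented for illustration, and Akaike weights are one of the model-selection-uncertainty statistics the server reports.

```python
import math

def aic(lnL, k):
    """Akaike information criterion from a log-likelihood and
    the number of free parameters."""
    return -2.0 * lnL + 2.0 * k

def akaike_weights(aic_values):
    """Normalized relative support for each candidate model."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical (lnL, extra parameters) for three substitution models
models = {"JC69": (-3500.2, 0), "HKY85": (-3410.7, 4), "GTR": (-3408.9, 8)}
aics = {name: aic(lnL, k) for name, (lnL, k) in models.items()}
best_model = min(aics, key=aics.get)
weights = akaike_weights(list(aics.values()))
```

Here GTR has the highest likelihood but HKY85 wins on AIC because its improvement does not justify four extra parameters; the weights quantify how decisively.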
Statistical tests with accurate size and power for balanced linear mixed models.
Muller, Keith E; Edwards, Lloyd J; Simpson, Sean L; Taylor, Douglas J
2007-08-30
The convenience of linear mixed models for Gaussian data has led to their widespread use. Unfortunately, standard mixed model tests often have greatly inflated test size in small samples. Many applications with correlated outcomes in medical imaging and other fields have simple properties which do not require the generality of a mixed model. Alternately, stating the special cases as a general linear multivariate model allows analysing them with either the univariate or multivariate approach to repeated measures (UNIREP, MULTIREP). Even in small samples, an appropriate UNIREP or MULTIREP test always controls test size and has a good power approximation, in sharp contrast to mixed model tests. Hence, mixed model tests should never be used when one of the UNIREP tests (uncorrected, Huynh-Feldt, Geisser-Greenhouse, Box conservative) or MULTIREP tests (Wilks, Hotelling-Lawley, Roy's, Pillai-Bartlett) apply. Convenient methods give exact power for the uncorrected and Box conservative tests. Simulations demonstrate that new power approximations for all four UNIREP tests eliminate most inaccuracy in existing methods. In turn, free software implements the approximations to give a better choice of sample size. Two repeated measures power analyses illustrate the methods. The examples highlight the advantages of examining the entire response surface of power as a function of sample size, mean differences, and variability.
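One of the UNIREP corrections named above, the Geisser-Greenhouse epsilon, can be sketched from a repeated-measures covariance matrix using Box's formula on orthonormal contrasts; a minimal illustration, not the paper's power approximations.

```python
import numpy as np

def gg_epsilon(S):
    """Geisser-Greenhouse epsilon for a k x k repeated-measures
    covariance matrix S, via Box's formula on orthonormal contrasts."""
    k = S.shape[0]
    M = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    vals, vecs = np.linalg.eigh(M)           # eigenvalue-1 vectors give
    C = vecs[:, vals > 0.5].T                # (k-1) orthonormal contrasts
    A = C @ S @ C.T
    return np.trace(A) ** 2 / ((k - 1) * np.trace(A @ A))

# Compound symmetry satisfies sphericity, so epsilon = 1 ...
S_cs = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
eps_cs = gg_epsilon(S_cs)

# ... while heterogeneous variances deflate it toward 1/(k-1)
eps_het = gg_epsilon(np.diag([1.0, 1.0, 10.0]))
```

The resulting epsilon multiplies the numerator and denominator degrees of freedom of the repeated-measures F test, shrinking them when the sphericity assumption is violated.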
Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa
2014-01-01
Medical statistics has become important and relevant for future doctors, enabling them to practice evidence based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Present study provided the evidence for the appropriate metric properties of the Serbian version of SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS might be reliable and a valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.
A comparison of exact tests for trend with binary endpoints using Bartholomew's statistic.
Consiglio, J D; Shan, G; Wilding, G E
2014-01-01
Tests for trend are important in a number of scientific fields when trends associated with binary variables are of interest. Implementing the standard Cochran-Armitage trend test requires an arbitrary choice of scores assigned to represent the grouping variable. Bartholomew proposed a test for qualitatively ordered samples using asymptotic critical values, but type I error control can be problematic in finite samples. To our knowledge, use of the exact probability distribution has not been explored, and we study its use in the present paper. Specifically we consider an approach based on conditioning on both sets of marginal totals and three unconditional approaches where only the marginal totals corresponding to the group sample sizes are treated as fixed. While slightly conservative, all four tests are guaranteed to have actual type I error rates below the nominal level. The unconditional tests are found to exhibit far less conservatism than the conditional test and thereby gain a power advantage.
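The standard Cochran-Armitage statistic that these exact and unconditional approaches are compared against can be sketched in a few lines. The equally spaced scores below (0, 1, 2) are exactly the arbitrary choice the abstract mentions, and the normal-approximation p-value is what the exact methods improve on in small samples; the data are illustrative, not from the paper.

```python
import math

def cochran_armitage(successes, totals, scores):
    """Cochran-Armitage trend test for binary outcomes across ordered groups.

    Returns the Z statistic and a two-sided normal-approximation p-value.
    The choice of `scores` is the arbitrary step the abstract refers to.
    """
    N = sum(totals)
    p_bar = sum(successes) / N
    # Observed trend: score-weighted deviation of successes from expectation
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    # Variance of t under the null hypothesis of no trend
    var = p_bar * (1 - p_bar) * (
        sum(n * s * s for s, n in zip(scores, totals))
        - sum(n * s for s, n in zip(scores, totals)) ** 2 / N
    )
    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p

# Three dose groups of 20 with increasing response rates
z, p = cochran_armitage(successes=[2, 6, 12], totals=[20, 20, 20], scores=[0, 1, 2])
```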
Statistical considerations of the random selection process in a drug testing program
Burtis, C.A.; Owings, J.H.; Leete, R.S. Jr.
1987-01-01
In a prospective drug testing program, individuals whose job classifications have been defined as sensitive are placed in a selection pool. On a periodic basis, individuals are chosen from this pool for drug testing. Random selection is a fair and impartial approach. A random selection process generates a Poisson distribution of probabilities that can be used to predict how many times an individual will be selected during a specific time interval. This information can be used to model the selection part of a drug testing program to determine whether specific conditions of testing are met. For example, the probability of being selected a given number of times during the testing period can be minimized or maximized by varying the frequency of the sampling process. Consequently, the Poisson distribution and the mathematics governing it can be used to structure a drug testing program to meet the needs and dictates of any given situation.
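The selection probabilities described above follow directly from the Poisson probability mass function. The pool size and sampling frequency below are illustrative numbers, not figures from the program described.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k selections) when selections occur at rate `lam` per period."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical pool: 600 people, 25 drawn per month, over a 12-month period.
# Expected selections per person: lam = 12 * 25 / 600 = 0.5
lam = 12 * 25 / 600
p_never = poisson_pmf(0, lam)            # chance of never being tested
p_at_least_once = 1 - p_never            # chance of being tested at least once
```

Varying the draw size or frequency changes `lam`, which is how the probability of a given number of selections can be minimized or maximized as the abstract describes.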
Statistical properties of an early stopping rule for resampling-based multiple testing.
Jiang, Hui; Salzman, Julia
2012-12-01
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures.
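A simplified version of such an early-stopping rule (not the authors' exact procedure or their error bounds) can be sketched for a single permutation test: once enough permuted statistics exceed the observed one, the estimated p-value can no longer fall below any conventional threshold, so further resampling is wasted.

```python
import random

def permutation_p_early_stop(x, y, max_perms=10000, stop_exceed=50, seed=0):
    """Permutation test for a difference in means with a simple early stop:
    once `stop_exceed` permuted statistics are at least as extreme as the
    observed one, the p-value estimate cannot become small, so resampling
    terminates. A simplified variant, not the paper's exact rule.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    exceed = 0
    for b in range(1, max_perms + 1):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
        if stat >= observed:
            exceed += 1
            if exceed >= stop_exceed:          # early termination
                return (exceed + 1) / (b + 1), b
    return (exceed + 1) / (max_perms + 1), max_perms

# Two similar groups: the test stops long before max_perms
p_hat, n_perms = permutation_p_early_stop([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
```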
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Weber, Benjamin; Lee, Sau L; Delvadia, Renishkumar; Lionberger, Robert; Li, Bing V; Tsong, Yi; Hochhaus, Guenther
2015-03-01
Equivalence testing of aerodynamic particle size distribution (APSD) through multi-stage cascade impactors (CIs) is important for establishing bioequivalence of orally inhaled drug products. Recent work demonstrated that the median of the modified chi-square ratio statistic (MmCSRS) is a promising metric for APSD equivalence testing of test (T) and reference (R) products as it can be applied to a reduced number of CI sites that are more relevant for lung deposition. This metric is also less sensitive to the increased variability often observed for low-deposition sites. A method to establish critical values for the MmCSRS is described here. This method considers the variability of the R product by employing a reference variance scaling approach that allows definition of critical values as a function of the observed variability of the R product. A stepwise CI equivalence test is proposed that integrates the MmCSRS as a method for comparing the relative shapes of CI profiles and incorporates statistical tests for assessing equivalence of single actuation content and impactor sized mass. This stepwise CI equivalence test was applied to 55 published CI profile scenarios, which were classified as equivalent or inequivalent by members of the Product Quality Research Institute working group (PQRI WG). The results of the stepwise CI equivalence test using a 25% difference in MmCSRS as an acceptance criterion provided the best matching with those of the PQRI WG as decisions of both methods agreed in 75% of the 55 CI profile scenarios.
Goodness of Fit Tests for Composite Hypotheses Based on an Increasing Number of Order Statistics
1976-09-01
"near normal" alternatives. Sarkadi [22] proved the consistency of the W*-test against alternatives with finite second moments. [22] Sarkadi, K. (1975), "The Consistency of the Shapiro-Francia Test", Biometrika, Vol. 62, pp. 445-450. [23] Sen, P.K. (1959), "On the Moments of
Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics
ERIC Educational Resources Information Center
Schweig, Jonathan
2014-01-01
Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…
ERIC Educational Resources Information Center
Adams, David R.; Cousley, Samuel B.
1977-01-01
Application of the Kruskal-Wallis test to survey research problems is discussed as an alternative for the business education researcher in testing questionnaire response differences among three or more independent groups. Problem illustrations and a computer program are included. (MF)
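For researchers without statistical software, the Kruskal-Wallis H statistic itself is short to compute. This sketch uses average ranks for ties and omits the tie correction; it is a generic implementation, not the program from the article.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction), pure Python.
    Compare H against a chi-square distribution with (k - 1) degrees
    of freedom, where k is the number of groups.
    """
    data = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(data)
    rank_sum = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        avg = (i + 1 + j) / 2            # average of ranks i+1 .. j for ties
        for k in range(i, j):
            rank_sum[data[k][1]] += avg
        i = j
    return 12 / (n * (n + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sum, groups)
    ) - 3 * (n + 1)

# Three independent groups of questionnaire scores
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```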
ERIC Educational Resources Information Center
Wilson, Kenneth M.; Powers, Donald E.
This study was undertaken to clarify the internal structure of the Law School Admission Test (LSAT) and shed light on the ability or abilities measured by the three item types that make up the test (logical reasoning, analytical reasoning, and reading comprehension). The study used data for two forms of the LSAT for general samples of LSAT…
USDA-ARS?s Scientific Manuscript database
The availability of accurate diagnostic tests is essential for the detection and control of Toxoplasma gondii infections in both definitive and intermediate hosts. Sensitivity, specificity and the area under the receiver-operating characteristic (ROC) curve are commonly-used measures of test accura...
Hybrid Statistical Testing for Nuclear Material Accounting Data and/or Process Monitoring Data
Ticknor, Lawrence O.; Hamada, Michael Scott; Sprinkle, James K.; Burr, Thomas Lee
2015-04-14
The two tests employed in the hybrid testing scheme are Page’s cumulative sums for all streams within a Balance Period (maximum of the maximums and average of the maximums) and Crosier’s multivariate cumulative sum applied to incremental cumulative sums across Balance Periods. The role of residuals for both kinds of data is discussed.
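Page's one-sided cumulative sum, the first of the two tests, has a compact recursive form. The reference value k and decision threshold h below are generic placeholders, not the values used in the hybrid scheme.

```python
def page_cusum(xs, k, h):
    """One-sided Page cumulative sum: S_i = max(0, S_{i-1} + x_i - k).
    Returns the index of the first alarm (S_i > h), or None if no alarm.
    Here each x_i would be a standardized materials-balance residual;
    k is the reference (slack) value and h the decision threshold.
    """
    s = 0.0
    for i, x in enumerate(xs):
        s = max(0.0, s + x - k)
        if s > h:
            return i
    return None
```

An in-control stream of small residuals never alarms, while a sustained shift accumulates past h within a few observations.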
ERIC Educational Resources Information Center
Woodruff, David; Wu, Yi-Fang
2012-01-01
The purpose of this paper is to illustrate alpha's robustness and usefulness, using actual and simulated educational test data. The sampling properties of alpha are compared with the sampling properties of several other reliability coefficients: Guttman's lambda[subscript 2], lambda[subscript 4], and lambda[subscript 6]; test-retest reliability;…
ERIC Educational Resources Information Center
Deacon, S. Helene; Leung, Dilys
2013-01-01
This study tested the diverging predictions of recent theories of children's learning of spelling regularities. We asked younger (Grades 1 and 2) and older (Grades 3 and 4) elementary school-aged children to choose the correct endings for words that varied in their morphological structure. We tested the impacts of semantic frequency by…
Plotkin, Marya; Besana, Giulia V R; Yuma, Safina; Kim, Young Mi; Kulindwa, Yusuph; Kabole, Fatma; Lu, Enriquito; Giattas, Mary Rose
2014-09-30
While the lifetime risk of developing cervical cancer (CaCx) and acquiring HIV is high for women in Tanzania, most women have not tested for HIV in the past year and most have never been screened for CaCx. Good management of both diseases, which have a synergistic relationship, requires integrated screening, prevention, and treatment services. The aim of this analysis is to assess the acceptability, feasibility and effectiveness of integrating HIV testing into CaCx prevention services in Tanzania, so as to inform scale-up strategies. We analysed 2010-2013 service delivery data from 21 government health facilities in four regions of the country, to examine integration of HIV testing within newly introduced CaCx screening and treatment services, located in the reproductive and child health (RCH) section of the facility. Analysis included the proportion of clients offered and accepting the HIV test, reasons why testing was not offered or was declined, and HIV status of CaCx screening clients. A total of 24,966 women were screened for CaCx; of these, approximately one-quarter (26%) were referred in from HIV care and treatment clinics. Among the women of unknown HIV status (n = 18,539), 60% were offered an HIV test. The proportion of women offered an HIV test varied over time, but showed a trend of decline as the program expanded. Unavailability of HIV test kits at the facility was the most common reason for a CaCx screening client not to be offered an HIV test (71% of 6,321 cases). Almost all women offered (94%) accepted testing, and 5% of those tested (582 women) learned for the first time that they were HIV-positive. Integrating HIV testing into CaCx screening services was highly acceptable to clients and was an effective means of reaching HIV-positive women who did not know their status; effectiveness was limited, however, by shortages of HIV test kits at facilities. Integration of HIV testing into CaCx screening services should be prioritized in HIV
De Meeûs, Thierry
2014-03-01
In population genetics data analysis, researchers are often faced with the problem of making a decision from a series of tests of the same null hypothesis. This is the case when one wants to test differentiation between pathogens found on different host species sampled from different locations (as many tests as there are locations). Many procedures are available to date, but not all apply to all situations. Finding which individual tests are significant, or whether the whole series is significant, and handling independent versus non-independent tests, do not require the same procedures. In this note I describe several procedures, among the simplest and easiest to undertake, that should allow decision making in most (if not all) situations population geneticists (or biologists) are likely to meet, in particular in host-parasite systems. Copyright © 2014 Elsevier B.V. All rights reserved.
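One standard procedure for deciding which tests in such a series are significant is Benjamini-Hochberg step-up control of the false discovery rate, valid for independent or positively dependent tests; a minimal sketch (a generic implementation, not code from the note):

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: returns a reject flag per test.
    Find the largest rank i with p_(i) <= (i / m) * alpha, then reject
    all hypotheses with p-values at or below that ranked p-value.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject
```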
Statistically based reevaluation of PISC-II round robin test data
Heasler, P.G.; Taylor, T.T.; Doctor, S.R.
1993-05-01
This report presents a re-analysis of international PISC-II (Programme for Inspection of Steel Components, Phase 2) round-robin inspection results using formal statistical techniques to account for experimental error. The analysis examines US team performance vs. other participants' performance, flaw sizing performance and errors associated with flaw sizing, factors influencing flaw detection probability, and performance of all participants with respect to recently adopted ASME Section XI flaw detection performance demonstration requirements, and develops conclusions concerning ultrasonic inspection capability. Inspection data were gathered on four heavy-section steel components, which included two plates and two nozzle configurations.
Divine, George; Norton, H James; Hunt, Ronald; Dienemann, Jacqueline
2013-09-01
When a study uses an ordinal outcome measure with unknown differences in the anchors and a small range such as 4 or 7, use of the Wilcoxon rank sum test or the Wilcoxon signed rank test may be most appropriate. However, because nonparametric methods are at best indirect functions of standard measures of location such as means or medians, the choice of the most appropriate summary measure can be difficult. The issues underlying use of these tests are discussed. The Wilcoxon-Mann-Whitney odds directly reflects the quantity that the rank sum procedure actually tests, and thus it can be a superior summary measure. Unlike the means and medians, its value will have a one-to-one correspondence with the Wilcoxon rank sum test result. The companion article appearing in this issue of Anesthesia & Analgesia ("Aromatherapy as Treatment for Postoperative Nausea: A Randomized Trial") illustrates these issues and provides an example of a situation for which the medians imply no difference between 2 groups, even though the groups are, in fact, quite different. The trial cited also provides an example of a single sample that has a median of zero, yet there is a substantial shift for much of the nonzero data, and the Wilcoxon signed rank test is quite significant. These examples highlight the potential discordance between medians and Wilcoxon test results. Along with the issues surrounding the choice of a summary measure, there are considerations for the computation of sample size and power, confidence intervals, and multiple comparison adjustment. In addition, despite the increased robustness of the Wilcoxon procedures relative to parametric tests, some circumstances in which the Wilcoxon tests may perform poorly are noted, along with alternative versions of the procedures that correct for such limitations.
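The Wilcoxon-Mann-Whitney odds can be computed directly from its definition: the proportion of cross-pairs favoring one group, counting ties as half, converted to an odds.

```python
def wmw_odds(x, y):
    """Wilcoxon-Mann-Whitney odds: let p = P(X > Y) + 0.5 * P(X = Y)
    estimated over all cross-pairs; the WMW odds are p / (1 - p).
    This is the summary measure the article argues corresponds
    one-to-one with the rank sum test result.
    """
    wins = sum(1.0 for a in x for b in y if a > b)
    ties = sum(1.0 for a in x for b in y if a == b)
    p = (wins + 0.5 * ties) / (len(x) * len(y))
    return p / (1 - p)

# Each value of x beats 2 of the 3 values of y: p = 2/3, odds = 2
odds = wmw_odds([1, 2, 3], [0, 0, 5])
```

Unlike the medians, which can coincide even when the groups differ, this quantity moves with the rank sum statistic itself.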
Turnidge, John; Bordash, Gerry
2007-07-01
Quality control (QC) ranges for antimicrobial agents against QC strains for both dilution and disk diffusion testing are currently set by the Clinical and Laboratory Standards Institute (CLSI), using data gathered in predefined structured multilaboratory studies, so-called tier 2 studies. The ranges are finally selected by the relevant CLSI subcommittee, based largely on visual inspection and a few simple rules. We have developed statistical methods for analyzing the data from tier 2 studies and applied them to QC strain-antimicrobial agent combinations from 178 dilution testing data sets and 48 disk diffusion data sets, including a method for identifying possible outlier data from individual laboratories. The methods are based on the fact that dilution testing MIC data were log normally distributed and disk diffusion zone diameter data were normally distributed. For dilution testing, compared to QC ranges actually set by CLSI, calculated ranges were identical in 68% of cases, narrower in 7% of cases, and wider in 14% of cases. For disk diffusion testing, calculated ranges were identical to CLSI ranges in 33% of cases, narrower in 8% of cases, and 1 to 2 mm wider in 58% of cases. Possible outliers were detected in 8% of the dilution test data sets but in none of the disk diffusion data sets. Application of statistical techniques to the analysis of QC tier 2 data and the setting of QC ranges is relatively simple to perform on spreadsheets, and the output enhances the current CLSI methods for setting of QC ranges.
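A sketch of the log-scale calculation for dilution data, assuming a generic mean ± 3 SD coverage rule rounded out to whole doubling dilutions. The 3-SD multiplier and the rounding are illustrative assumptions, not the authors' published rule.

```python
import math

def mic_qc_range(mics, coverage_sds=3.0):
    """Illustrative QC-range calculation exploiting the log-normality of
    MIC data: work on the log2 (doubling-dilution) scale, take the mean
    plus/minus `coverage_sds` standard deviations, and round outward to
    whole dilutions.
    """
    logs = [math.log2(m) for m in mics]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    lo = math.floor(mean - coverage_sds * sd)
    hi = math.ceil(mean + coverage_sds * sd)
    return 2.0 ** lo, 2.0 ** hi

# Hypothetical tier-2 MIC readings (µg/mL) clustered around 0.5
low, high = mic_qc_range([0.25, 0.5, 0.5, 0.5, 1, 0.5, 0.25, 1])
```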
NASA Technical Reports Server (NTRS)
Dimitri, P. S.; Wall, C. 3rd; Oas, J. G.; Rauch, S. D.
2001-01-01
Meniere's disease (MD) and migraine-associated dizziness (MAD) are two disorders that can have similar symptomatologies but differ vastly in treatment. Vestibular testing is sometimes used to help differentiate between these disorders, but the inefficiency of a human interpreter analyzing a multitude of variables independently decreases its utility. Our hypothesis was that we could objectively discriminate between patients with MD and those with MAD using select variables from the vestibular test battery. Sinusoidal harmonic acceleration test variables were reduced to three vestibulo-ocular reflex physiologic parameters: gain, time constant, and asymmetry. A combination of these parameters plus a measurement of reduced vestibular response from caloric testing allowed us to achieve a joint classification rate of 91% using an independent quadratic classification algorithm. Data from posturography were not useful for this type of differentiation. Overall, our classification function can be used as an unbiased assistant to discriminate between MD and MAD and gave us insight into the pathophysiologic differences between the two disorders.
NASA Technical Reports Server (NTRS)
Hughes, William O.; McNelis, Anne M.
2010-01-01
The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.
A statistical analysis of effects of test methods on spun carbon nanotube yarn
NASA Astrophysics Data System (ADS)
Veliky, Kenneth Blake
Carbon nanotube (CNT) fibers are very promising materials for many applications. Strong interactions among individual CNTs can produce a dense yarn with exceptional properties, which makes these yarns attractive as high-performance reinforcement for composites. As reinforcement, their primary function is to provide outstanding load-bearing capability. The current literature uses a variety of measurement techniques and gauge lengths that are not uniform across CNT yarn tests; a standardized testing method for characterization is necessary to generate reproducible and comparable data for CNT yarn or fiber materials. In this work, the strength of CNT fibers was characterized using three different tensile test methods: the film and fiber test fixtures of a dynamic mechanical analysis (DMA) instrument, and the TS 600 tensile fixture. Samples tested with the film and TS 600 tensile fixtures were mounted using a thick-paper tabbing method based on ASTM standard D3379. For the fiber fixture, the test material was attached directly to the fixture following the fiber test instructions from TA Instruments. The results of the three methods showed distinct variance in stress, strain, and modulus. A design of experiments (DoE) was established and performed on the DMA film fixture, as determined from the preliminary experiment. The DoE successfully quantified the ranges of the critical parameters that contributed to the standard deviation of average stress. These parameters were then tested on 30 more samples with an improved additively manufactured tab. The results significantly decreased the standard deviations of all mechanical testing parameters. Most importantly, the probability of a valid gauge break increased by more than 400%.
Drop-Weight Impact Test on U-Shape Concrete Specimens with Statistical and Regression Analyses
Zhu, Xue-Chao; Zhu, Han; Li, Hao-Ran
2015-01-01
According to the principle and method of the drop-weight impact test, the impact resistance of concrete was measured using self-designed U-shape specimens and a newly designed drop-weight impact test apparatus. A series of drop-weight impact tests were carried out with four different masses of drop hammers (0.875, 0.8, 0.675 and 0.5 kg). The test results show that the impact resistance results fail to follow a normal distribution. As expected, U-shaped specimens can predetermine the location of the cracks very well. It is also easy to record crack propagation during the test. The maximum coefficient of variation in this study is 31.2%, lower than the values obtained from the American Concrete Institute (ACI) impact tests in the literature. Regression analysis shows a good linear relationship between the first-crack and ultimate-failure impact resistance. It can be suggested that a minimum number of specimens is required to reliably measure the properties of the material, based on the observed levels of variation. PMID:28793540
Statistical Tests for Detection of Misspecified Relationships by Use of Genome-Screen Data
McPeek, Mary Sara; Sun, Lei
2000-01-01
Misspecified relationships can have serious consequences for linkage studies, resulting in either reduced power or false-positive evidence for linkage. If some individuals in the pedigree are untyped, then Mendelian errors may not be observed. Previous approaches to detection of misspecified relationships by use of genotype data were developed for sib and half-sib pairs. We extend the likelihood calculations of Göring and Ott and Boehnke and Cox to more-general relative pairs, for which identity-by-descent (IBD) status is no longer a Markov chain, and we propose a likelihood-ratio test. We also extend the identity-by-state (IBS)–based test of Ehm and Wagner to nonsib relative pairs. The likelihood-ratio test has high power, but its drawbacks include the need to construct and apply a separate Markov chain for each possible alternative relationship and the need for simulation to assess significance. The IBS-based test is simpler but has lower power. We propose two new test statistics—conditional expected IBD (EIBD) and adjusted IBS (AIBS)—designed to retain the simplicity of IBS while increasing power by taking into account chance sharing. In simulations, the power of EIBD is generally close to that of the likelihood-ratio test. The power of AIBS is higher than that of IBS, in all cases considered. We suggest a strategy of initial screening by use of EIBD and AIBS, followed by application of the likelihood-ratio test to only a subset of relative pairs, identified by use of EIBD and AIBS. We apply the methods to a Genetic Analysis Workshop 11 data set from the Collaborative Study on the Genetics of Alcoholism. PMID:10712219
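The flavor of the likelihood comparison can be shown with a toy version that treats markers as independent and IBD status as directly observed, both simplifications that the article's methods are designed to avoid. The kappa coefficients are the standard relationship-specific prior IBD-sharing probabilities from genetics, not values from this paper.

```python
import math

# Prior IBD-sharing probabilities (kappa_0, kappa_1, kappa_2) for some
# common relationships -- standard textbook values.
KAPPA = {
    "full_sib": (0.25, 0.50, 0.25),
    "half_sib": (0.50, 0.50, 0.00),
    "unrelated": (1.00, 0.00, 0.00),
}

def log_lr(ibd_counts, claimed, alternative):
    """Log likelihood ratio (alternative vs. claimed relationship) for a
    relative pair, given counts of markers at which the pair shares
    0, 1, or 2 alleles IBD. Positive values favor the alternative.
    Assumes the claimed relationship assigns positive probability to
    every observed IBD state.
    """
    ll = 0.0
    for ibd, count in enumerate(ibd_counts):
        if count:
            ka = KAPPA[alternative][ibd]
            kc = KAPPA[claimed][ibd]
            if ka == 0.0:
                return float("-inf")   # alternative impossible given data
            ll += count * (math.log(ka) - math.log(kc))
    return ll

# A pair labelled full sibs whose sharing looks half-sib-like: no IBD=2 markers
lr = log_lr(ibd_counts=(52, 48, 0), claimed="full_sib", alternative="half_sib")
```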