Chen, Feinian; Curran, Patrick J.; Bollen, Kenneth A.; Kirby, James; Paxton, Pamela
2009-01-01
This article is an empirical evaluation of the choice of fixed cutoff points in assessing the root mean square error of approximation (RMSEA) test statistic as a measure of goodness-of-fit in structural equation models. Using simulation data, the authors first examine whether there is any empirical evidence for the use of a universal cutoff, and then compare the practice of using the point estimate of the RMSEA alone versus that of using it jointly with its related confidence interval. The results of the study demonstrate that there is little empirical support for the use of .05 or any other value as a universal cutoff for determining adequate model fit, regardless of whether the point estimate is used alone or jointly with the confidence interval. The authors' analyses suggest that to achieve a certain level of power or Type I error rate, the choice of cutoff values depends on model specifications, degrees of freedom, and sample size. PMID:19756246
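The RMSEA point estimate the authors evaluate can be computed directly from a model's chi-square, degrees of freedom, and sample size. A minimal stdlib-Python sketch under the standard Steiger-Lind formulation (function name mine; the confidence-interval machinery discussed in the article is not reproduced here):

```python
import math

def rmsea(chi2, df, n):
    # Point estimate: square root of the excess of the model chi-square over
    # its expected value (its df), scaled by df and sample size, floored at 0.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# A model with chi2 = 90 on df = 40 and n = 500 sits almost exactly at the
# conventional .05 cutoff whose universality the article questions.
print(rmsea(90, 40, 500))
```

Note that the same chi-square excess yields different RMSEA values at different df and n, which is one way to see why a single fixed cutoff is hard to justify.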
ERIC Educational Resources Information Center
Huberty, Carl J.
An approach to statistical testing, which combines Neyman-Pearson hypothesis testing and Fisher significance testing, is recommended. The use of P-values in this approach is discussed in some detail. The author also discusses some problems which are often found in introductory statistics textbooks. The problems involve the definitions of…
Statistical Significance Testing.
ERIC Educational Resources Information Center
McLean, James E., Ed.; Kaufman, Alan S., Ed.
1998-01-01
The controversy about the use or misuse of statistical significance testing has become the major methodological issue in educational research. This special issue contains three articles that explore the controversy, three commentaries on these articles, an overall response, and three rejoinders by the first three authors. They are: (1)…
(Errors in statistical tests)³.
Phillips, Carl V; MacLehose, Richard F; Kaufman, Jay S
2008-07-14
In 2004, Garcia-Berthou and Alcaraz published "Incongruence between test statistics and P values in medical papers," a critique of statistical errors that received a tremendous amount of attention. One of their observations was that the final reported digit of p-values in articles published in the journal Nature departed substantially from the uniform distribution that they suggested should be expected. In 2006, Jeng critiqued that critique, observing that the statistical analysis of those terminal digits had been based on comparing the actual distribution to a uniform continuous distribution, when digits obviously are discretely distributed. Jeng corrected the calculation and reported statistics that did not so clearly support the claim of a digit preference. However delightful it may be to read a critique of statistical errors in a critique of statistical errors, we nevertheless found several aspects of the whole exchange to be quite troubling, prompting our own meta-critique of the analysis. The previous discussion emphasized statistical significance testing. But there are various reasons to expect departure from the uniform distribution in terminal digits of p-values, so that simply rejecting the null hypothesis is not terribly informative. Much more importantly, Jeng found that the original p-value of 0.043 should have been 0.086, and suggested this represented an important difference because it was on the other side of 0.05. Among the most widely reiterated (though often ignored) tenets of modern quantitative research methods is that we should not treat statistical significance as a bright line test of whether we have observed a phenomenon. Moreover, it sends the wrong message about the role of statistics to suggest that a result should be dismissed because of limited statistical precision when it is so easy to gather more data. In response to these limitations, we gathered more data to improve the statistical precision, and analyzed the actual pattern of the
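The digit-preference question at the heart of this exchange reduces to a goodness-of-fit test of terminal digits against a discrete uniform distribution on 0-9, which is straightforward to sketch in stdlib Python (function name and input format are mine, not from the papers):

```python
from collections import Counter

def terminal_digit_chi2(p_strings):
    # Pearson chi-square of the final reported digit of each p-value string
    # against a discrete uniform distribution on the digits 0-9 (df = 9).
    counts = Counter(s[-1] for s in p_strings)
    n = len(p_strings)
    expected = n / 10.0
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))

# Perfectly uniform terminal digits give a statistic of exactly zero.
print(terminal_digit_chi2(["0.0" + d for d in "0123456789"]))
```

Comparing against the discrete uniform, as Jeng's correction does, is what distinguishes this from the continuous-uniform comparison criticized in the original paper.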
Statistics and Hypothesis Testing in Biology.
ERIC Educational Resources Information Center
Maret, Timothy J.; Ziemba, Robert E.
1997-01-01
Suggests that early in their education students be taught to use basic statistical tests as rigorous methods of comparing experimental results with scientific hypotheses. Stresses that students learn how to use statistical tests in hypothesis-testing by applying them in actual hypothesis-testing situations. To illustrate, uses questions such as…
2009 GED Testing Program Statistical Report
ERIC Educational Resources Information Center
GED Testing Service, 2010
2010-01-01
The "2009 GED[R] Testing Program Statistical Report" is the 52nd annual report in the program's 68-year history of providing a second opportunity for adults without a high school credential to earn their jurisdiction's GED credential. The report provides candidate demographic and GED Test performance statistics as well as historical…
Quantum Statistical Testing of a QRNG Algorithm
Humble, Travis S; Pooser, Raphael C; Britt, Keith A
2013-01-01
We present the algorithmic design of a quantum random number generator, the subsequent synthesis of a physical design and its verification using quantum statistical testing. We also describe how quantum statistical testing can be used to diagnose channel noise in QKD protocols.
The insignificance of statistical significance testing
Johnson, Douglas H.
1999-01-01
Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Applications of Statistical Tests in Hand Surgery
Song, Jae W.; Haas, Ann; Chung, Kevin C.
2015-01-01
During the nineteenth century, with the emergence of public health as a goal to improve hygiene and conditions of the poor, statistics established itself as a distinct scientific field important for critically interpreting studies of public health concerns. During the twentieth century, statistics began to evolve mathematically and methodologically with hypothesis testing and experimental design. Today, the design of medical experiments centers around clinical trials and observational studies, and with the use of statistics, the collected data are summarized, weighed, and presented to direct both physicians and the public towards Evidence-Based Medicine. Having a basic understanding of statistics is mandatory in evaluating the validity of published literature and applying it to patient care. In this review, we aim to apply a practical approach in discussing basic statistical tests by providing a guide to choosing the correct statistical test along with examples relevant to hand surgery research. PMID:19969193
Teaching Statistics in Language Testing Courses
ERIC Educational Resources Information Center
Brown, James Dean
2013-01-01
The purpose of this article is to examine the literature on teaching statistics for useful ideas that teachers of language testing courses can draw on and incorporate into their teaching toolkits as they see fit. To those ends, the article addresses eight questions: What is known generally about teaching statistics? Why are students so anxious…
Statistics Test Questions: Content and Trends
ERIC Educational Resources Information Center
Salcedo, Audy
2014-01-01
This study presents the results of the analysis of a group of teacher-made test questions for statistics courses at the university level. Teachers were asked to submit tests they had used in their previous two semesters. Ninety-seven tests containing 978 questions were gathered and classified according to the SOLO taxonomy (Biggs & Collis,…
Binomial test statistics using Psi functions
Bowman, Kimiko o
2007-01-01
For the negative binomial model (probability generating function (p + 1 - pt)^(-k)), a logarithmic derivative is the Psi function difference ψ(k + x) - ψ(k); this and its derivatives lead to a test statistic for deciding on the validity of a specified model. The test statistic is computed from a database, so a direct comparison between theory and application is available. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, and (iii) Weldon's dice data are included.
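For an integer count x, the Psi function difference above needs no special-function library, because the recurrence ψ(z + 1) = ψ(z) + 1/z telescopes into a finite sum. A stdlib-Python sketch of that building block only (the full test statistic constructed in the paper is not reproduced):

```python
def psi_difference(k, x):
    # psi(k + x) - psi(k) for a nonnegative integer count x, via the
    # telescoping recurrence psi(z + 1) = psi(z) + 1/z.
    return sum(1.0 / (k + i) for i in range(x))

# For k = 2 and x = 1 this is simply 1/2.
print(psi_difference(2.0, 1))
```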
Graphic presentation of the simplest statistical tests
NASA Astrophysics Data System (ADS)
Georgiev, Tsvetan B.
This paper presents graphically well known tests about change of population mean and standard deviation, about comparison of population means and standard deviations, as well as about significance of correlation and regression coefficients. The critical bounds and criteria for variability with statistical guaranty P=95 % and P=99 % are presented as dependences on the data number n. The graphs further give fast visual solutions of the direct problem (estimation of confidence interval for specified P and n), as well of the reverse problem (estimation of n, which is necessary for achieving a desired statistical guaranty of the result). The aim of the work is to present the simplest statistical tests in a comprehensible and convenient graphs, which will be always at hand. The graphs may be useful in the investigations of time series in astronomy, geophysics, ecology etc., as well as in the education.
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discusses several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion of the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
Statistical tests for recessive lethal-carriers.
Hamilton, M A; Haseman, J K
1979-08-01
This paper presents a statistical method for testing whether a male mouse is a recessive lethal-carrier. The analysis is based on a back-cross experiment in which the male mouse is mated with some of his daughters. The numbers of total implantations and intrauterine deaths in each litter are recorded. It is assumed that, conditional on the number of total implantations, the number of intrauterine deaths follows a binomial distribution. Using computer-simulated experimentation it is shown that the proposed statistical method, which is sensitive to the pattern of intrauterine death rates, is more powerful than a test based only on the total number of implant deaths. The proposed test requires relatively simple calculations and can be used for a wide range of values of total implantations and background implant mortality rates. For computer-simulated experiments, there was no practical difference between the empirical error rate and the nominal error rate.
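The binomial assumption in this abstract supports a simple baseline: the exact one-sided probability of the observed number of intrauterine deaths under the background rate. A stdlib-Python sketch of that baseline only (the paper's pattern-sensitive test is more powerful and is not reproduced here; the function name is mine):

```python
from math import comb

def binom_sf(x, m, p):
    # P(X >= x) for X ~ Binomial(m, p): the exact one-sided probability of
    # observing x or more intrauterine deaths among m implantations when
    # each implant dies independently with background probability p.
    return sum(comb(m, k) * p ** k * (1 - p) ** (m - k)
               for k in range(x, m + 1))

# Probability of 5 or more deaths among 10 implants at a 50% background rate.
print(binom_sf(5, 10, 0.5))
```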
Mechanical Impact Testing: A Statistical Measurement
NASA Technical Reports Server (NTRS)
Engel, Carl D.; Herald, Stephen D.; Davis, S. Eddie
2005-01-01
In the decades since the 1950s, when NASA first developed mechanical impact testing of materials, researchers have continued efforts to gain a better understanding of the chemical, mechanical, and thermodynamic nature of the phenomenon. The impact mechanism is a real combustion ignition mechanism that needs understanding in the design of an oxygen system. The use of test data from this test method has been questioned due to lack of a clear method of application of the data and variability found between tests, material batches, and facilities. This effort explores a large database that has accumulated over a number of years and characterizes its overall nature. Moreover, testing was performed to determine the statistical nature of the test procedure to help establish sample size guidelines for material characterization. The current method of determining a pass/fail criterion based on either light emission, sound report, or material charring is questioned.
Statistical tests for prediction of lignite quality
C.J. Kolovos
2007-06-15
Domestic lignite from large open pit mines, worked by bucket wheel excavators, is the main fuel for electricity generation in Greece. Lignite from one or more mines may arrive at any power plant stockyard. The mixture obtained constitutes the lignite fuel fed to the power plant. The fuel is sampled at regular time intervals. These samples are considered as observations of spatial random variables. The aim was to form and statistically test many small sample populations. Statistical tests on the values of the humidity content, the ash-water-free content, and the lower heating value of the lignite fuel indicated that the sample values form a normal population. The Kolmogorov-Smirnov test was applied to test the goodness-of-fit of the sample distribution over a three-year period and across different power plants of the Kozani-Ptolemais area, western Macedonia, Greece. The normal distribution hypothesis can be widely accepted for forecasting the distribution of values of the basic quality characteristics, even for a small number of samples.
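The one-sample Kolmogorov-Smirnov statistic behind such goodness-of-fit tests can be sketched in stdlib Python. A minimal version against a normal CDF with given parameters (estimating mu and sigma from the same data, as is common in practice, changes the critical values; function names are mine):

```python
import math

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function (no external libraries needed).
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    # One-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    # empirical CDF of the sample and the hypothesized normal CDF, checked
    # on both sides of each step of the empirical CDF.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

print(ks_statistic([20.1, 21.4, 19.8, 20.9, 20.3], 20.5, 0.6))
```

Small D values (compared against tabulated critical values for the sample size) are consistent with the normality hypothesis the authors accept.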
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
A Statistical Perspective on Highly Accelerated Testing
Thomas, Edward V.
2015-02-01
Highly accelerated life testing has been heavily promoted at Sandia (and elsewhere) as a means to rapidly identify product weaknesses caused by flaws in the product's design or manufacturing process. During product development, a small number of units are forced to fail at high stress. The failed units are then examined to determine the root causes of failure. The identification of the root causes of product failures exposed by highly accelerated life testing can instigate changes to the product's design and/or manufacturing process that result in a product with increased reliability. It is widely viewed that this qualitative use of highly accelerated life testing (often associated with the acronym HALT) can be useful. However, highly accelerated life testing has also been proposed as a quantitative means for "demonstrating" the reliability of a product where unreliability is associated with loss of margin via an identified and dominating failure mechanism. It is assumed that the dominant failure mechanism can be accelerated by changing the level of a stress factor that is assumed to be related to the dominant failure mode. In extreme cases, a minimal number of units (often from a pre-production lot) are subjected to a single highly accelerated stress relative to normal use. If no (or, sufficiently few) units fail at this high stress level, some might claim that a certain level of reliability has been demonstrated (relative to normal use conditions). Underlying this claim are assumptions regarding the level of knowledge associated with the relationship between the stress level and the probability of failure. The primary purpose of this document is to discuss (from a statistical perspective) the efficacy of using accelerated life testing protocols (and, in particular, "highly accelerated" protocols) to make quantitative inferences concerning the performance of a product (e.g., reliability) when in fact there is lack-of-knowledge and uncertainty concerning the
Statistical reasoning in clinical trials: hypothesis testing.
Kelen, G D; Brown, C G; Ashton, J
1988-01-01
Hypothesis testing is based on certain statistical and mathematical principles that allow investigators to evaluate data by making decisions based on the probability or implausibility of observing the results obtained. However, classic hypothesis testing has its limitations, and probabilities mathematically calculated are inextricably linked to sample size. Furthermore, the meaning of the p value frequently is misconstrued as indicating that the findings are also of clinical significance. Finally, hypothesis testing allows for four possible outcomes, two of which are errors that can lead to erroneous adoption of certain hypotheses: 1. The null hypothesis is rejected when, in fact, it is false. 2. The null hypothesis is rejected when, in fact, it is true (type I or alpha error). 3. The null hypothesis is conceded when, in fact, it is true. 4. The null hypothesis is conceded when, in fact, it is false (type II or beta error). The implications of these errors, their relation to sample size, the interpretation of negative trials, and strategies related to the planning of clinical trials will be explored in a future article in this journal.
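The Type I error described in outcome 2 can be made concrete with a small simulation: when the null hypothesis is true by construction, the long-run rejection rate of a test conducted at alpha = .05 should itself be near .05. A stdlib-Python sketch (function names and the z-test setup are mine):

```python
import math
import random

def z_rejects(sample, mu0, sigma, crit=1.96):
    # Two-sided z-test of H0: mu == mu0 with known sigma; True means reject.
    # crit = 1.96 is the two-sided critical value for alpha = .05.
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > crit

def type1_rate(trials=2000, n=20, seed=1):
    # H0 is true by construction (data drawn from N(0, 1), mu0 = 0), so the
    # long-run rejection rate estimates the Type I (alpha) error rate.
    rng = random.Random(seed)
    rejections = sum(
        z_rejects([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0, 1.0)
        for _ in range(trials))
    return rejections / trials

print(type1_rate())  # close to the nominal alpha of .05
```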
Explorations in Statistics: Hypothesis Tests and P Values
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of "Explorations in Statistics" delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what…
SANABRIA, FEDERICO; KILLEEN, PETER R.
2008-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose p_rep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766
Statistical Tests of Galactic Dynamo Theory
NASA Astrophysics Data System (ADS)
Chamandy, Luke; Shukurov, Anvar; Taylor, A. Russ
2016-12-01
Mean-field galactic dynamo theory is the leading theory to explain the prevalence of regular magnetic fields in spiral galaxies, but its systematic comparison with observations is still incomplete and fragmentary. Here we compare predictions of mean-field dynamo models to observational data on magnetic pitch angle and the strength of the mean magnetic field. We demonstrate that a standard α²Ω dynamo model produces pitch angles of the regular magnetic fields of nearby galaxies that are reasonably consistent with available data. The dynamo estimates of the magnetic field strength are generally within a factor of a few of the observational values. Reasonable agreement between theoretical and observed pitch angles generally requires the turbulent correlation time τ to be in the range of 10-20 Myr, in agreement with standard estimates. Moreover, good agreement also requires that the ratio of the ionized gas scale height to root-mean-square turbulent velocity increases with radius. Our results thus widen the possibilities to constrain interstellar medium parameters using observations of magnetic fields. This work is a step toward systematic statistical tests of galactic dynamo theory. Such studies are becoming more and more feasible as larger data sets are acquired using current and up-and-coming instruments.
Statistical tests of simple earthquake cycle models
NASA Astrophysics Data System (ADS)
DeVries, Phoebe M. R.; Evans, Eileen L.
2016-12-01
A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities η_M < 4.0 × 10¹⁹ Pa s and η_M > 4.6 × 10²⁰ Pa s) but cannot reject models on the basis of transient Kelvin viscosity η_K. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
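The two-sample Kolmogorov-Smirnov statistic used here to compare predicted and observed distributions is simple to sketch in stdlib Python (the paper's geodetic application details are not reproduced; the function name is mine):

```python
import bisect

def ks_two_sample(a, b):
    # Two-sample Kolmogorov-Smirnov D: the maximum vertical distance between
    # the empirical CDFs of the two samples, evaluated at every data point.
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

# Identical samples give D = 0; fully separated samples give D = 1.
print(ks_two_sample([1.2, 0.9, 1.5], [1.1, 1.4, 0.8]))
```

Large D values, compared against critical values for the two sample sizes, are the basis for rejecting a candidate model at a chosen significance level.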
Controversies around the Role of Statistical Tests in Experimental Research.
ERIC Educational Resources Information Center
Batanero, Carmen
2000-01-01
Describes the logic of statistical testing in the Fisher and Neyman-Pearson approaches. Reviews some common misinterpretations of basic concepts behind statistical tests. Analyzes the philosophical and psychological issues that can contribute to these misinterpretations. Suggests possible ways in which statistical education might contribute to the…
Statistical significance testing and clinical trials.
Krause, Merton S
2011-09-01
The efficacy of treatments is better expressed for clinical purposes in terms of these treatments' outcome distributions and their overlapping rather than in terms of the statistical significance of these distributions' mean differences, because clinical practice is primarily concerned with the outcome of each individual client rather than with the mean of the variety of outcomes in any group of clients. Reports of the obtained outcome distributions for the comparison groups of all competently designed and executed randomized clinical trials should be publicly available no matter what the statistical significance of the mean differences among these groups, because all of these studies' outcome distributions provide clinically useful information about the efficacy of the treatments compared.
An overall statistic for testing symmetry in social interactions.
Leiva, David; Solanas, Antonio; Salafranca, Lluís
2008-11-01
The present work focuses on the skew-symmetry index as a measure of social reciprocity. This index is based on the correspondence between the amount of behaviour that individuals address toward their partners and what they receive in return. Although the skew-symmetry index enables researchers to describe social groups, statistical inferential tests are required. This study proposes an overall statistical technique for testing symmetry in experimental conditions, calculating the skew-symmetry statistic (Phi) at group level. Sampling distributions for the skew-symmetry statistic were estimated by means of a Monte Carlo simulation to allow researchers to make statistical decisions. Furthermore, this study will allow researchers to choose the optimal experimental conditions for carrying out their research, as the power of the statistical test was estimated. This statistical test could be used in experimental social psychology studies in which researchers may control the group size and the number of interactions within dyads.
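The skew-symmetry index can be sketched from its matrix decomposition: a sociomatrix X splits into a symmetric part plus a skew-symmetric part K = (X - Xᵀ)/2, and Phi compares their magnitudes. A stdlib-Python sketch under that assumed definition (the authors' exact formulation may differ; the function name is mine):

```python
def skew_symmetry_phi(X):
    # Phi = ||K||^2 / ||X||^2 with K = (X - X^T) / 2, the skew-symmetric part
    # of the sociomatrix X: 0 for perfectly reciprocal interaction matrices,
    # up to 0.5 for completely one-directed ones.
    n = len(X)
    num = sum(((X[i][j] - X[j][i]) / 2.0) ** 2
              for i in range(n) for j in range(n))
    den = sum(X[i][j] ** 2 for i in range(n) for j in range(n))
    return num / den if den else 0.0

# A perfectly reciprocal dyad versus a completely one-directed one.
print(skew_symmetry_phi([[0, 5], [5, 0]]), skew_symmetry_phi([[0, 1], [0, 0]]))
```

The Monte Carlo step described in the abstract would then resample interaction matrices under a null of reciprocity and compare the observed Phi to the resulting sampling distribution.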
A Note on Measurement Scales and Statistical Testing
ERIC Educational Resources Information Center
Meijer, Rob R.; Oosterloo, Sebie J.
2008-01-01
In elementary books on applied statistics (e.g., Siegel, 1988; Agresti, 1990) and books on research methodology in psychology and personality assessment (e.g., Aiken, 1999), it is often suggested that the choice of a statistical test and the choice of statistical operations should be determined by the level of measurement of the data. Although…
Statistical Analysis of Multiple Choice Testing
2001-04-01
Discusses item analysis of multiple-choice test questions, including the use of discrimination indices to identify poor distractors (incorrect answers). Citing Attali and Fraenkel, notes that while it is sound to use the Rpbis (point-biserial correlation) as a criterion measure, it depends heavily on question difficulty, and that the biserial is usually preferred as a criterion measure for the correct alternative.
Multiple comparisons and nonparametric statistical tests on a programmable calculator.
Hurwitz, A
1987-03-01
Calculator programs are provided for statistical tests for comparing groups of data. These tests can be applied when t-tests are inappropriate, as for multiple comparisons, or for evaluating groups of data that are not distributed normally or have unequal variances. The programs, designed to run on the least expensive Hewlett-Packard programmable scientific calculator, Model HP-11C, should place these statistical tests within easy reach of most students and investigators.
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance
ERIC Educational Resources Information Center
Gwet, Kilem L.
2016-01-01
This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…
Statistical properties of the USP dissolution test with pooled samples.
Saccone, Carlos D; Meneces, Nora S; Tessore, Julio
2005-01-01
The Monte Carlo simulation method is used to study the statistical properties of the USP pooled dissolution test. In this paper, the statistical behavior of the dissolution test for pooled samples is studied, including: a) the operating characteristic curve showing the probability of passing the test versus the mean amount dissolved, b) the influence of measurement uncertainty on the result of the test, c) an analysis of the dependence of the statistical behavior on the underlying distribution of the individual amounts dissolved, d) a comparison of the statistical behavior of the unit dissolution test versus the pooled dissolution test, e) the average number of stages needed to reach a decision presented as a function of parameters of the lot, f) the relative influence of the three stages of the test on the probability of acceptance.
Nonparametric statistical testing of EEG- and MEG-data.
Maris, Eric; Oostenveld, Robert
2007-08-15
In this paper, we show how ElectroEncephaloGraphic (EEG) and MagnetoEncephaloGraphic (MEG) data can be analyzed statistically using nonparametric techniques. Nonparametric statistical tests offer complete freedom to the user with respect to the test statistic by means of which the experimental conditions are compared. This freedom provides a straightforward way to solve the multiple comparisons problem (MCP), and it allows the incorporation of biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the statistical test. The paper is written for two audiences: (1) empirical neuroscientists looking for the most appropriate data analysis method, and (2) methodologists interested in the theoretical concepts behind nonparametric statistical tests. For the empirical neuroscientist, a large part of the paper is written in a tutorial-like fashion, enabling neuroscientists to construct their own statistical test, maximizing the sensitivity to the expected effect. For the methodologist, it is explained why the nonparametric test is formally correct. This means that we formulate a null hypothesis (identical probability distribution in the different experimental conditions) and show that the nonparametric test controls the false alarm rate under this null hypothesis.
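A minimal instance of this nonparametric approach is a two-sample permutation test. The test statistic below (absolute difference of means) is an arbitrary choice, which is exactly the freedom the paper emphasizes; the cluster-based machinery for the multiple comparisons problem is not shown. A stdlib-Python sketch (function name mine):

```python
import random

def permutation_p(x, y, n_perm=5000, seed=0):
    # Permutation test: reshuffle condition labels many times and count how
    # often the reshuffled statistic is at least as extreme as the observed
    # one. Any other test statistic could be substituted for the mean
    # difference without invalidating the test.
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one correction avoids p = 0

print(permutation_p([2.1, 1.9, 2.4, 2.2], [1.0, 1.2, 0.9, 1.1]))
```

The false alarm rate is controlled because, under the null hypothesis of identical distributions, every relabeling of the pooled data is equally likely.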
Misuse of statistical tests in Archives of Clinical Neuropsychology publications.
Schatz, Philip; Jay, Kristin A; McComb, Jason; McLaughlin, Jason R
2005-12-01
This article reviews the (mis)use of statistical tests in neuropsychology research studies published in the Archives of Clinical Neuropsychology in the years 1990-1992, 1996-2000, and 2001-2004: prior to, commensurate with (in both internet-based and paper-based form), and following the release of the report of the American Psychological Association's Task Force on Statistical Inference. The authors focused on four statistical errors: inappropriate use of null hypothesis tests, inappropriate use of P-values, neglect of effect size, and inflation of Type I error rates. Despite the recommendations of the Task Force on Statistical Inference published in 1999, the present study recorded instances of these statistical errors both before and after the APA's report, with only the reporting of effect size increasing after the release of the report. Neuropsychologists involved in empirical research should be better aware of the limitations and boundaries of hypothesis testing as well as the theoretical aspects of research methodology.
A Statistical Test for Comparing Nonnested Covariance Structure Models.
ERIC Educational Resources Information Center
Levy, Roy; Hancock, Gregory R.
While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…
Chi-Square Statistics, Tests of Hypothesis and Technology.
ERIC Educational Resources Information Center
Rochowicz, John A.
The use of technology such as computers and programmable calculators enables students to find p-values and conduct tests of hypotheses in many different ways. Comprehension and interpretation of a research problem become the focus for statistical analysis. This paper describes how to calculate chi-square statistics and p-values for statistical…
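One chi-square p-value calculation that even a programmable calculator can handle exactly is the upper tail for even degrees of freedom, which has a closed form with no special functions. A stdlib-Python sketch of that special case (function name mine):

```python
import math

def chi2_sf_even_df(x, df):
    # Upper-tail p-value P(Chi2_df >= x) for even df, from the closed form
    # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!  -- a finite, hand-computable sum.
    if df % 2 != 0:
        raise ValueError("closed form shown only for even df")
    h = x / 2.0
    return math.exp(-h) * sum(h ** i / math.factorial(i)
                              for i in range(df // 2))

# The familiar critical value 5.991 at df = 2 recovers a p-value near .05.
print(chi2_sf_even_df(5.991, 2))
```

For odd df the tail involves the error function, which is why tables or software are usually used there.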
The Use of Meta-Analytic Statistical Significance Testing
ERIC Educational Resources Information Center
Polanin, Joshua R.; Pigott, Terri D.
2015-01-01
Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…
The Importance of Teaching Power in Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Olinsky, Alan; Schumacher, Phyllis; Quinn, John
2012-01-01
In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…
BIAZA statistics guidelines: toward a common application of statistical tests for zoo research.
Plowman, Amy B
2008-05-01
Zoo research presents many statistical challenges, mostly arising from the need to work with small sample sizes. Efforts to overcome these often lead to the misuse of statistics including pseudoreplication, inappropriate pooling, assumption violation or excessive Type II errors because of using tests with low power to avoid assumption violation. To tackle these issues and make some general statistical recommendations for zoo researchers, the Research Group of the British and Irish Association of Zoos and Aquariums (BIAZA) conducted a workshop. Participants included zoo-based researchers, university academics with zoo interests and three statistical experts. The result was a BIAZA publication Zoo Research Guidelines: Statistics for Typical Zoo Datasets (Plowman [2006] Zoo research guidelines: statistics for zoo datasets. London: BIAZA), which provides advice for zoo researchers on study design and analysis to ensure appropriate and rigorous use of statistics. The main recommendations are: (1) that many typical zoo investigations should be conducted as single case/small N randomized designs, analyzed with randomization tests, (2) that when comparing complete time budgets across conditions in behavioral studies, G tests and their derivatives are the most appropriate statistical tests and (3) that in studies involving multiple dependent and independent variables there are usually no satisfactory alternatives to traditional parametric tests and, despite some assumption violations, it is better to use these tests with careful interpretation, than to lose information through not testing at all. The BIAZA guidelines were recommended by American Association of Zoos and Aquariums (AZA) researchers at the AZA Annual Conference in Tampa, FL, September 2006, and are free to download from www.biaza.org.uk.
ERIC Educational Resources Information Center
Sanabria, Federico; Killeen, Peter R.
2007-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level "p," is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners…
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials.
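The minimum p-value idea above can be sketched with a small permutation test. The data, the candidate statistics (difference in means and in medians), and all function names below are illustrative assumptions, not the authors' implementation:

```python
import random
import statistics

def min_p_permutation_test(x, y, stat_fns, n_perm=300, seed=7):
    """Two-sample permutation test based on the minimum p-value over
    several candidate test statistics (a sketch of the approach, with
    hypothetical data and statistics)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)

    def stats_for(a, b):
        return [abs(f(a) - f(b)) for f in stat_fns]

    observed = stats_for(x, y)
    perm = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel under the null of indistinguishable groups
        perm.append(stats_for(pooled[:n], pooled[n:]))

    def p_value(k, value):
        # One-sided p-value of statistic k from its permutation distribution.
        return sum(row[k] >= value for row in perm) / n_perm

    obs_min_p = min(p_value(k, v) for k, v in enumerate(observed))
    # Reference the min-p statistic to its own permutation distribution,
    # which keeps the Type I error rate at its designated value.
    perm_min_ps = [min(p_value(k, v) for k, v in enumerate(row)) for row in perm]
    return sum(mp <= obs_min_p for mp in perm_min_ps) / n_perm

treated = [5.1, 6.2, 5.8, 7.0, 6.5, 5.9]
control = [4.0, 4.8, 4.2, 5.0, 4.5, 4.1]
p = min_p_permutation_test(treated, control, [statistics.mean, statistics.median])
```

Because the minimum p-value is itself calibrated against its permutation distribution, adding a low-powered candidate statistic costs little, which is the paper's central point.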
Evaluation of Multi-parameter Test Statistics for Multiple Imputation.
Liu, Yu; Enders, Craig K
2017-03-22
In Ordinary Least Square regression, researchers often are interested in knowing whether a set of parameters is different from zero. With complete data, this could be achieved using the gain in prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-art missing data strategies, there are several different analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type 1 error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples, etc.). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.
Multiple statistical tests: Lessons from a d20
Madan, Christopher R.
2016-01-01
Statistical analyses are often conducted with α = .05. When multiple statistical tests are conducted, this procedure needs to be adjusted to compensate for the otherwise inflated Type I error rate. In tabletop gaming, it is sometimes desirable to roll a 20-sided die (or 'd20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (a Type I error) at least once across repeated, independent statistical tests. PMID:27347382
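The d20 analogy reduces to the standard familywise error formula for independent tests, 1 - (1 - α)^n, which a few lines make concrete (the function name is mine, for illustration):

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# The d20 case: rolling one specific face (p = 1/20) at least once in two rolls.
p_two_rolls = familywise_error(1 / 20, 2)   # 1 - (19/20)**2 = 0.0975

# Ten independent tests at alpha = .05 inflate the familywise rate to ~40%.
p_ten_tests = familywise_error(0.05, 10)

# A Bonferroni correction (alpha / n per test) restores control at or below .05.
p_bonferroni = familywise_error(0.05 / 10, 10)
```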
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1998-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1997-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test
NASA Technical Reports Server (NTRS)
Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph
1999-01-01
The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.
Your Chi-Square Test Is Statistically Significant: Now What?
ERIC Educational Resources Information Center
Sharpe, Donald
2015-01-01
Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
Statistical significance test for transition matrices of atmospheric Markov chains
NASA Technical Reports Server (NTRS)
Vautard, Robert; Mo, Kingtse C.; Ghil, Michael
1990-01-01
Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
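A Monte Carlo test of this kind can be sketched in a few lines. The null model below is a simplification of the paper's setup: states keep their observed frequencies but lose all serial dependence (random shuffling); the sequence and function names are illustrative assumptions:

```python
import random
from collections import Counter

def transition_p_values(seq, n_sim=500, seed=1):
    """Monte Carlo significance test for the transition counts of a
    categorical sequence. Returns one-sided p-values: the fraction of
    shuffled sequences with at least as many occurrences of each
    observed transition."""
    rng = random.Random(seed)

    def transition_counts(s):
        return Counter(zip(s, s[1:]))

    observed = transition_counts(seq)
    exceed = Counter()
    pool = list(seq)
    for _ in range(n_sim):
        rng.shuffle(pool)  # destroys serial dependence, keeps frequencies
        simulated = transition_counts(pool)
        for pair, count in observed.items():
            if simulated[pair] >= count:
                exceed[pair] += 1
    return {pair: exceed[pair] / n_sim for pair in observed}

# Two persistent "flow regimes": the self-transitions A->A and B->B occur
# far more often than the shuffled null model predicts.
seq = list("AAAAAABBBBBBAAAAAABBBBBB")
p_values = transition_p_values(seq)
```

The same machinery handles small clusters gracefully, since the null distribution of each transition count is simulated directly rather than approximated by an empirical formula.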
Statistical Approach to the Operational Testing of Space Fence
2015-07-01
Space Fence will be a terrestrial-based radar designed to perform surveillance on earth-orbiting … ensuring a reasonable test duration. We propose a rigorous statistical test design with candidate on-orbit test targets that span orbital limits defined by
Revisit the 21-day cumulative irritation test - statistical considerations.
Zhang, Paul; Li, Qing
2017-03-01
The 21-day cumulative irritation test is widely used for evaluating the irritation potential of topical skin-care products. The test consists of a clinician's assessment of the skin reaction at the patch sites and a classification system that categorizes the test product's irritation potential. A new classification system is proposed that enables control of the estimation error and provides statistical confidence regarding the repeatability of the classification.
[Clinical research IV. Relevancy of the statistical test chosen].
Talavera, Juan O; Rivas-Ruiz, Rodolfo
2011-01-01
When we look at the difference between two therapies or the association of a risk factor or prognostic indicator with its outcome, we need to evaluate the accuracy of the result. This assessment is based on a judgment that uses information about the study design and the statistical management of the information. This paper specifically addresses the relevance of the statistical test selected. Statistical tests are chosen mainly on two characteristics: the objective of the study and the type of variables. The objective can be divided into three test groups: a) those in which you want to show differences between groups, or within a group before and after a maneuver; b) those that seek to show the relationship (correlation) between variables; and c) those that aim to predict an outcome. The types of variables are divided into two: quantitative (continuous and discontinuous) and qualitative (ordinal and dichotomous). For example, if we seek to demonstrate differences in age (a quantitative variable) among patients with systemic lupus erythematosus (SLE) with and without neurological disease (two groups), the appropriate test is the Student t test for independent samples. But if the comparison is about the frequency of females (a binomial variable), then the appropriate statistical test is the chi-square (χ²) test.
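The chi-square case can be computed by hand for a 2x2 table. The counts below are hypothetical, chosen only to illustrate the calculation, and are not taken from the article:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]] (rows: groups, columns: outcome present/absent)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Shortcut formula, algebraically equal to sum((obs - exp)**2 / exp).
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: females vs. males among patients with and without
# neurological disease (illustrative numbers only).
stat = chi_square_2x2([[30, 10], [20, 20]])
significant = stat > 3.841  # 5% critical value of chi-square with 1 df
```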
Distributions of Hardy-Weinberg equilibrium test statistics.
Rohlfs, R V; Weir, B S
2008-11-01
It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy-Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy-Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case-control association studies and Hardy-Weinberg equilibrium (HWE) testing for data quality control.
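The discreteness point is easy to demonstrate with an exact binomial test, a setting simpler than HWE but with the same structure: because the p-value can only take a finite set of values, the actual rejection rate under the null falls below the nominal 5% level. This is a generic illustration of the phenomenon, not the paper's genetic analysis:

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Binomial probability mass function."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_p_value(k, n, p=0.5):
    """Two-sided exact p-value: total probability of all outcomes no more
    likely than the observed one (the minimum-likelihood method)."""
    pk = binom_pmf(k, n, p)
    return sum(binom_pmf(j, n, p) for j in range(n + 1)
               if binom_pmf(j, n, p) <= pk + 1e-12)

# Under the null, the discrete test rejects at the nominal 5% level strictly
# less often than 5% of the time: the achievable p-values jump over .05.
n = 25
reject_prob = sum(binom_pmf(k, n) for k in range(n + 1)
                  if exact_p_value(k, n) <= 0.05)
```

The gap between `reject_prob` and .05 is exactly the kind of discrepancy that a continuous approximation to a discrete null distribution hides.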
Shukla, R.; Yu Daohai; Fulk, F.
1995-12-31
Short-term toxicity tests with aquatic organisms are a valuable measurement tool in the assessment of the toxicity of effluents, environmental samples and single chemicals. Currently toxicity tests are utilized in a wide range of US EPA regulatory activities including effluent discharge compliance. In the current approach for determining the No Observed Effect Concentration, an effluent concentration is presumed safe if there is no statistically significant difference in toxicant response versus control response. The conclusion of a safe concentration may be due to the fact that it truly is safe, or alternatively, that the ability of the statistical test to detect an effect, given its existence, is inadequate. Results of research of a new statistical approach, the basis of which is to move away from a demonstration of no difference to a demonstration of equivalence, will be discussed. The concept of observed confidence distributions, first suggested by Cox, is proposed as a measure of the strength of evidence for practically equivalent responses between a given effluent concentration and the control. The research included determination of intervals of practically equivalent responses as a function of the variability of control response. The approach is illustrated using reproductive data from tests with Ceriodaphnia dubia and survival and growth data from tests with fathead minnow. The data are from the US EPA's National Reference Toxicant Database.
Innovative role of statistics in acid rain performance testing
Warren-Hicks, W.; Etchison, T.; Lieberman, E.R.
1995-12-31
Title IV of the Clean Air Act Amendments (CAAAs) of 1990 mandated that affected electric utilities reduce sulfur dioxide (SO₂) and nitrogen oxide (NOₓ) emissions, the primary precursors of acidic deposition, and included an innovative market-based SO₂ regulatory program. A central element of the Acid Rain Program is the requirement that affected utility units install continuous emissions monitoring systems (CEMS). This paper describes how the Acid Rain Regulations incorporated statistical procedures in the performance tests for CEMS and how statistical analysis was used to assess the appropriateness, stringency, and potential impact of various performance tests and standards that were considered for inclusion in the Acid Rain Regulations. Described here is the statistical analysis that was used to set a relative accuracy standard, establish the calculation procedures for filling in missing data when a monitor malfunctions, and evaluate the performance tests applied to petitions for alternative monitoring systems. The paper concludes that the statistical evaluations of proposed provisions of the Acid Rain Regulations resulted in the adoption of performance tests and standards that were scientifically substantiated, workable, and effective.
Statistical Studies on Sequential Probability Ratio Test for Radiation Detection
Warnick Kernan, Ding Yuan, et al.
2007-07-01
A Sequential Probability Ratio Test (SPRT) algorithm helps to increase the reliability and speed of radiation detection. The algorithm is further improved to reduce spatial gaps and false alarms. SPRT, using the Last-in-First-Elected-Last-Out (LIFELO) technique, reduces the error between the radiation measured and the resultant alarm. Statistical analysis determines the reduction in spatial error and false alarms.
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-01-01
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
Wavelet analysis in ecology and epidemiology: impact of statistical tests.
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-02-06
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the 'beta-surrogate' method.
Mean-squared-displacement statistical test for fractional Brownian motion
NASA Astrophysics Data System (ADS)
Sikora, Grzegorz; Burnecki, Krzysztof; Wyłomańska, Agnieszka
2017-03-01
Anomalous diffusion in crowded fluids, e.g., in the cytoplasm of living cells, is a frequent phenomenon. A common tool by which the anomalous diffusion of a single particle can be classified is the time-averaged mean square displacement (TAMSD). A classical mechanism leading to anomalous diffusion is fractional Brownian motion (FBM). Validation of such a process for single-particle tracking data is of great interest to experimentalists. In this paper we propose a rigorous statistical test for FBM based on the TAMSD. To this end we analyze the distribution of the TAMSD statistic, which is given by the generalized chi-squared distribution. Next, we study the power of the test by means of Monte Carlo simulations. We show that the test is very sensitive to changes of the Hurst parameter. Moreover, it can easily distinguish between two models of subdiffusion: FBM and the continuous-time random walk.
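The TAMSD itself is a one-line estimator, sketched here for a 1-D trajectory (the function name and check are mine; the paper's test builds on this statistic's distribution, which this sketch does not reproduce):

```python
def tamsd(traj, lag):
    """Time-averaged mean square displacement of a 1-D trajectory at a lag."""
    n = len(traj)
    return sum((traj[t + lag] - traj[t]) ** 2 for t in range(n - lag)) / (n - lag)

# Sanity check on deterministic ballistic motion x_t = v*t, where
# TAMSD(lag) = (v*lag)**2 exactly; for FBM with Hurst exponent H the
# TAMSD instead scales as lag**(2H), which the proposed test exploits.
trajectory = [0.5 * t for t in range(100)]
msd_at_lag_4 = tamsd(trajectory, 4)
```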
Validated intraclass correlation statistics to test item performance models.
Courrieu, Pierre; Brand-D'abrescia, Muriele; Peereman, Ronald; Spieler, Daniel; Rey, Arnaud
2011-03-01
A new method, with an application program in Matlab code, is proposed for testing item performance models on empirical databases. This method uses data intraclass correlation statistics as expected correlations, to which one compares simple functions of correlations between model predictions and observed item performance. The method rests on a data population model whose validity for the considered data is suitably tested, and it has been verified for three behavioural measure databases. Contrary to usual model selection criteria, this method provides an effective way of testing under-fitting and over-fitting, answering the usually neglected question: "does this model suitably account for these data?"
Asymptotics of Bonferroni for Dependent Normal Test Statistics.
Proschan, Michael A; Shaw, Pamela A
2011-07-01
The Bonferroni adjustment is sometimes used to control the familywise error rate (FWE) when the number of comparisons is huge. In genome wide association studies, researchers compare cases to controls with respect to thousands of single nucleotide polymorphisms. It has been claimed that the Bonferroni adjustment is only slightly conservative if the comparisons are nearly independent. We show that the veracity of this claim depends on how one defines "nearly." Specifically, if the test statistics' pairwise correlations converge to 0 as the number of tests tend to ∞, the conservatism of the Bonferroni procedure depends on their rate of convergence. The type I error rate of Bonferroni can tend to 0 or 1 - exp(-α) ≈ α, depending on that rate. We show using elementary probability theory what happens to the distribution of the number of errors when using Bonferroni, as the number of dependent normal test statistics gets large. We also use the limiting behavior of Bonferroni to shed light on properties of other commonly used test statistics.
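The independent-tests benchmark the paper compares against is easy to compute numerically: with m independent tests each run at level α/m, the familywise error rate decreases toward 1 - exp(-α), slightly below α itself. The dependent-statistics behavior analyzed in the article can deviate from this limit, which is the paper's point; the snippet only illustrates the independent case:

```python
from math import exp

alpha = 0.05
# Familywise error rate 1 - (1 - alpha/m)**m for m independent tests at
# level alpha/m; it tends to 1 - exp(-alpha) ~ 0.0488 as m grows.
fwe = {m: 1 - (1 - alpha / m) ** m for m in (10, 1000, 100000)}
limit = 1 - exp(-alpha)
```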
A critique of statistical hypothesis testing in clinical research
Raha, Somik
2011-01-01
Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs for all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from a practical and philosophical perspective. In the philosophical critique, the two main worldviews of probability are that of the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview requiring the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning with different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability on an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing, utilizing data from a past clinical trial that studied the effect of Aspirin on heart attacks in a sample population of doctors. As a big reason for the prevalence of RCTs in academia is legislation requiring it, the ethics of legislating the use of statistical methods for clinical research is also examined. PMID:22022152
Statistical comparison of similarity tests applied to speech production data
NASA Astrophysics Data System (ADS)
Kollia, H.; Jorgenson, Jay; Saint Fleur, Rose; Foster, Kevin
2004-05-01
Statistical analysis of data variability in speech production research has traditionally been addressed with the assumption of normally distributed error terms. The correct and valid application of statistical procedure requires a thorough investigation of the assumptions that underlie the methodology. In previous work [Kollia and Jorgenson, J. Acoust. Soc. Am. 102 (1997); 109 (2002)], it was shown that the error terms of speech production data in a linear regression can be modeled accurately using a quadratic probability distribution, rather than a normal distribution as is frequently assumed. The measurement used in the earlier Kollia-Jorgenson work involved the classical Kolmogorov-Smirnov statistical test. In the present work, the authors further explore the problem of analyzing the error terms coming from linear regression using a variety of known statistical tests, including, but not limited to, chi-square, Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, skewness and kurtosis, and Durbin. Our study complements a similar study by Shapiro, Wilk, and Chen [J. Am. Stat. Assoc. (1968)]. [Partial support provided by PSC-CUNY and NSF to Jay Jorgenson.]
LATTE - Linking Acoustic Tests and Tagging Using Statistical Estimation
2013-09-30
WORK COMPLETED: The project started in April 2010. Task 3 (data processing) is now essentially over, and task 4… Marques et al., "An overview of LATTE: Linking Acoustic Tests and Tagging using statistical Estimation: Modelling the Behaviour of Beaked Whales in…" Figure 2: Conceptual description of the dive cycle of a beaked whale, considering 7 behavioural states.
Statistical tests of ARIES data. [very long base interferometry geodesy
NASA Technical Reports Server (NTRS)
Musman, S.
1982-01-01
Statistical tests are performed on Project ARIES preliminary baseline measurements in the Southern California triangle formed by the Jet Propulsion Laboratory, the Owens Valley Radio Observatory, and the Goldstone tracking complex during 1976-1980. In addition to conventional one-dimensional tests, a two-dimensional test which allows for an arbitrary correlation between errors in individual components is formulated using the Hotelling statistic. On two out of three baselines the mean rate of change in baseline vector is statistically significant. Apparent motions on all three baselines are consistent with a pure shear with north-south compression and east-west expansion of 1 × 10⁻⁷ per year. The ARIES measurements are consistent with the USGS geodolite networks in Southern California and the SAFE laser satellite ranging experiment. All three experiments are consistent with a 6 cm/year motion between the Pacific and North American Plates and a band of diffuse shear 300 km wide, except that corresponding rotation of the entire triangle is not found.
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By employing unique statistical analysis to the available ground test shock data, the NASA Lewis Research Center found that shock-induced vibrations would not be as great as the previously specified levels of Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. Then, this methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of lognormally transformed test data produced was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
A Statistical Test for Detecting Answer Copying on Multiple-Choice Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.; Sotaridona, Leonardo
2004-01-01
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at…
Shaikh, Masood Ali
2016-04-01
Statistical tests help infer meaningful conclusions from studies conducted and data collected. This descriptive study analyzed the types of statistical tests used, and the statistical software utilized for analysis, in the original articles published in 2014 by the three Medline-indexed journals of Pakistan. Cumulatively, 466 original articles were published in 2014. The most frequently reported statistical tests across all three journals were bivariate parametric and non-parametric tests, i.e., those involving comparisons between two groups, e.g., the Chi-square test, the t-test, and various types of correlations. Cumulatively, 201 (43.1%) articles used these tests. SPSS was the primary choice for statistical analysis, as it was exclusively used in 374 (80.3%) original articles. Compared with 2007, there has been a substantial increase in both the number of articles published and the sophistication of the statistical tests used in the articles published in the Pakistani Medline-indexed journals in 2014.
n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator
2012-09-12
nSIGHTS (n-dimensional Statistical Inverse Graphical Hydraulic Test Simulator) is a comprehensive well test analysis software package. It provides a user interface, a well test analysis model, and many tools to analyze both field and simulated data. The well test analysis model simulates a single-phase, one-dimensional, radial/non-radial flow regime, with a borehole at the center of the modeled flow system. nSIGHTS solves the radially symmetric n-dimensional forward flow problem using a solver based on a graph-theoretic approach. The results of the forward simulation are pressure and flow rate, given the input parameters. The parameter estimation portion of nSIGHTS uses a perturbation-based approach to interpret the best-fit well and reservoir parameters, given an observed dataset of pressure and flow rate.
DERIVATION OF A TEST STATISTIC FOR EMPHYSEMA QUANTIFICATION
Vegas-Sanchez-Ferrero, Gonzalo; Washko, George; Rahaghi, Farbod N.; Ledesma-Carbayo, Maria J.; Estépar, R. San José
2016-01-01
Density masking is the de facto quantitative imaging phenotype for emphysema that is widely used by the clinical community. Density masking defines the burden of emphysema by a fixed threshold, usually between −910 HU and −950 HU, that has been experimentally validated with histology. In this work, we formalized emphysema quantification by means of statistical inference. We show that a non-central Gamma is a good approximation for the local distribution of image intensities for normal and emphysema tissue. We then propose a test statistic in terms of the sample mean of a truncated non-central Gamma random variable. Our results show that this approach is well suited for the detection of emphysema and superior to standard density masking. The statistical method was tested in a dataset of 1337 samples obtained from 9 different scanner models in subjects with COPD. Results showed an increase of 17% when compared to the density masking approach, and an overall accuracy of 94.09%. PMID:27974952
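The density-masking baseline that the proposed test statistic is compared against reduces to a simple thresholding rule. A minimal sketch, with invented voxel values (the −950 HU default follows the range quoted in the abstract):

```python
# Minimal sketch of density masking (the baseline the proposed test
# statistic is compared against). The -950 HU cutoff follows the range
# quoted in the abstract; the voxel values are invented, not real CT.

def density_mask_score(hu_values, threshold=-950):
    """Fraction of voxels at or below the HU threshold (a %LAA-style index)."""
    below = sum(1 for v in hu_values if v <= threshold)
    return below / len(hu_values)

voxels = [-980, -960, -940, -900, -870, -955, -990, -820]
print(density_mask_score(voxels))  # 0.5: half the voxels fall below -950 HU
```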
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a well-fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergence of the approximations used in NCPA is well defined, whereas that of ABC is not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not in ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
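The manifest-monotonicity property being tested can be illustrated with a simple descriptive check against the restscore. The article's actual procedure is an order-constrained likelihood ratio test with a simulated p-value; the sketch below only flags observed decreases, and the data are invented:

```python
# Descriptive check of manifest monotonicity for one item against the
# restscore (sum of the other item scores). The article's actual test is
# an order-constrained likelihood ratio test with a simulated p-value;
# here we only flag observed decreases. Data are invented.

from collections import defaultdict

def manifest_monotonicity_violations(item, rest):
    """Restscore pairs (r1, r2) where the item's success rate decreases."""
    totals = defaultdict(lambda: [0, 0])     # restscore -> [successes, count]
    for x, r in zip(item, rest):
        totals[r][0] += x
        totals[r][1] += 1
    props = {r: s / n for r, (s, n) in totals.items()}
    scores = sorted(props)
    return [(a, b) for a, b in zip(scores, scores[1:]) if props[b] < props[a]]

item = [0, 0, 1, 1, 0, 1]     # 0/1 responses to the item under scrutiny
rest = [0, 0, 1, 1, 2, 2]     # restscores of the same respondents
print(manifest_monotonicity_violations(item, rest))  # [(1, 2)]: rate dips from 1.0 to 0.5
```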
ERIC Educational Resources Information Center
Ervin, Nancy S.
How accurately deltas (statistics measuring the difficulty of items) established by pre-test populations reflect deltas obtained from final form populations, and the consequent utility of pre-test deltas for constructing final (operational test) forms to meet developed statistical specifications were studied. Data were examined from five subject…
Quantum Statistical Testing of a Quantum Random Number Generator
Humble, Travis S
2014-01-01
The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.
Quantum statistical testing of a quantum random number generator
NASA Astrophysics Data System (ADS)
Humble, Travis S.
2014-10-01
The unobservable elements in a quantum technology, e.g., the quantum state, complicate system verification against promised behavior. Using model-based system engineering, we present methods for verifying the operation of a prototypical quantum random number generator. We begin with the algorithmic design of the QRNG followed by the synthesis of its physical design requirements. We next discuss how quantum statistical testing can be used to verify device behavior as well as detect device bias. We conclude by highlighting how system design and verification methods must influence effort to certify future quantum technologies.
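One classical ingredient of statistical testing for any RNG, quantum or not, is the NIST-style frequency (monobit) test. A minimal sketch (the paper's quantum statistical testing goes further, modeling the device itself):

```python
# One classical ingredient of RNG verification: the NIST-style
# frequency (monobit) test. The paper's quantum statistical testing goes
# further by modeling the device itself; this only checks 0/1 balance.

import math

def monobit_p_value(bits):
    """Two-sided p-value that the 0/1 balance is consistent with fair bits."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)        # +1 for each 1, -1 for each 0
    return math.erfc(abs(s) / math.sqrt(2 * n))

print(monobit_p_value([0, 1] * 500))                  # 1.0: perfectly balanced
print(monobit_p_value([1] * 900 + [0] * 100) < 0.01)  # True: bias detected
```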
[Statistical tests in medical research: traditional methods vs. multivariate NPC permutation tests].
Arboretti, Rosa; Bordignon, Paolo; Corain, Livio; Palermo, Giuseppe; Pesarin, Fortunato; Salmaso, Luigi
2015-01-01
Within medical research, a central statistical tool is hypothesis testing in terms of a null hypothesis (the treatment has no effect) and an alternative hypothesis (the treatment has some effect). By controlling the risks of wrong decisions, empirical data are used to reject, where warranted, the null hypothesis in favour of the alternative, thereby demonstrating the efficacy of a treatment of interest. The multivariate permutation tests based on the nonparametric combination (NPC) method provide an innovative, robust, and effective hypothesis-testing solution to many real problems commonly encountered in medical research when multiple endpoints are observed. This paper discusses the various approaches to hypothesis testing and the main advantages of NPC tests, chiefly that they require much less stringent assumptions than traditional statistical tests. Moreover, their results may be extended to the reference population even in the case of selection bias, that is, non-random sampling. In this work, we review and discuss some basic testing procedures along with the theoretical and practical relevance of NPC tests, showing their effectiveness in medical research. Within nonparametric methods, NPC tests represent the current "frontier" of statistical research, yet they are already widely available in the practice of clinical data analysis.
Biostatistics Series Module 7: The Statistics of Diagnostic Tests.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
- the one lying on the "elbow" of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally thought to be a more robust measure than simple percent agreement calculation since kappa takes into account the agreement occurring by chance.
Biostatistics Series Module 7: The Statistics of Diagnostic Tests
Hazra, Avijit; Gogtay, Nithya
2017-01-01
optimum cutoff – the one lying on the “elbow” of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally thought to be a more robust measure than simple percent agreement calculation since kappa takes into account the agreement occurring by chance. PMID:28216720
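The kappa calculation described above is short enough to sketch directly; the ratings below are invented:

```python
# Minimal sketch of Cohen's kappa for two raters (or two diagnostic
# tests) with categorical outcomes; the ratings below are invented.

from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_chance = sum(ca[k] * cb[k] for k in ca) / n ** 2  # agreement by chance
    return (p_obs - p_chance) / (1 - p_chance)

rater_a = ["pos", "pos", "neg", "neg"]
rater_b = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(rater_a, rater_b))  # 0.5: raw agreement 0.75, chance 0.5
```

Note how the chance-corrected value (0.5) is well below the raw percent agreement (0.75), which is exactly the robustness property the abstract describes.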
Brannath, Werner; Bretz, Frank; Maurer, Willi; Sarkar, Sanat
2009-12-01
The two-sided Simes test is known to control the type I error rate with bivariate normal test statistics. For one-sided hypotheses, control of the type I error rate requires that the correlation between the bivariate normal test statistics is non-negative. In this article, we introduce a trimmed version of the one-sided weighted Simes test for two hypotheses which rejects if (i) the one-sided weighted Simes test rejects and (ii) both p-values are below one minus the respective weighted Bonferroni adjusted level. We show that the trimmed version controls the type I error rate at nominal significance level alpha if (i) the common distribution of test statistics is point symmetric and (ii) the two-sided weighted Simes test at level 2alpha controls the level. These assumptions apply, for instance, to bivariate normal test statistics with arbitrary correlation. In a simulation study, we compare the power of the trimmed weighted Simes test with the power of the weighted Bonferroni test and the untrimmed weighted Simes test. An additional result of this article ensures type I error rate control of the usual weighted Simes test under a weak version of the positive regression dependence condition for the case of two hypotheses. This condition is shown to apply to the two-sided p-values of one- or two-sample t-tests for bivariate normal endpoints with arbitrary correlation and to the corresponding one-sided p-values if the correlation is non-negative. The Simes test for such types of bivariate t-tests has not been considered before. According to our main result, the trimmed version of the weighted Simes test then also applies to the one-sided bivariate t-test with arbitrary correlation.
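For orientation, the classical unweighted Simes test that these weighted and trimmed variants build on rejects the global null when any ordered p-value p_(i) falls at or below i·α/m. A minimal sketch (this is not the article's trimmed test itself):

```python
# The classical unweighted Simes test that the weighted and trimmed
# variants build on: reject the global null if any ordered p-value
# p_(i) is at most i * alpha / m. Not the article's trimmed test itself.

def simes_reject(pvalues, alpha=0.05):
    m = len(pvalues)
    return any(p <= alpha * (i + 1) / m
               for i, p in enumerate(sorted(pvalues)))

print(simes_reject([0.03, 0.04]))  # True: rejects where Bonferroni (0.025 cut) would not
print(simes_reject([0.04, 0.30]))  # False: neither ordered p-value is small enough
```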
Statistical Tests of Taylor's Hypothesis: An Application to Precipitation Fields
NASA Astrophysics Data System (ADS)
Murthi, A.; Li, B.; Bowman, K.; North, G.; Genton, M.; Sherman, M.
2009-05-01
The Taylor Hypothesis (TH) as applied to rainfall is a proposition about the space-time covariance structure of the rainfall field. Specifically, it supposes that if a spatio-temporal precipitation field with a stationary covariance Cov(r, τ) in both space r and time τ, moves with a constant velocity v, then the temporal covariance at time lag τ is equal to the spatial covariance at space lag v τ, that is, Cov(0, τ) = Cov(v τ, 0). Qualitatively this means that the field evolves slowly in time relative to the advective time scale, which is often referred to as the 'frozen field' hypothesis. Of specific interest is whether there is a cut-off or decorrelation time scale for which the TH holds for a given mean flow velocity v. In this study the validity of the TH is tested for precipitation fields using high-resolution gridded NEXRAD radar reflectivity data produced by the WSI Corporation by employing two different statistical approaches. The first method is based upon rigorous hypothesis testing while the second is based on a simple correlation analysis, which neglects possible dependencies in the correlation estimates. We use radar reflectivity values from the southeastern United States with an approximate horizontal resolution of 4 km x 4 km and a temporal resolution of 15 minutes. During the 4-day period from 2 to 5 May 2002, substantial precipitation occurs in the region of interest, and the motion of the precipitation systems is approximately uniform. The results of both statistical methods suggest that the TH might hold for the shortest space and time scales resolved by the data (4 km and 15 minutes), but that it does not hold for longer periods or larger spatial scales. Also, the simple correlation analysis tends to overestimate the statistical significance through failing to account for correlations between the covariance estimates.
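The frozen-field identity Cov(0, τ) = Cov(vτ, 0) can be demonstrated exactly on a synthetic, perfectly advected periodic field; real precipitation fields, as the abstract notes, satisfy it only approximately and only at short scales. All parameters below are invented:

```python
# Frozen-field check on synthetic data: for a field advected unchanged
# at velocity v, the temporal correlation at lag tau equals the spatial
# correlation at lag v*tau. Periodic toy field, invented parameters;
# real rain fields satisfy this only approximately at short scales.

import math

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db)

n, v, tau, x0 = 64, 3, 5, 0
g = [math.sin(2 * math.pi * i / n) + 0.5 * math.sin(6 * math.pi * i / n)
     for i in range(n)]                                  # frozen spatial pattern
series = [g[(x0 - v * t) % n] for t in range(n)]         # time series at one site
temporal = corr(series, [series[(t + tau) % n] for t in range(n)])
spatial = corr(g, [g[(i - v * tau) % n] for i in range(n)])
print(abs(temporal - spatial) < 1e-9)  # True: Cov(0, tau) = Cov(v*tau, 0)
```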
Statistical Tests of the PTHA Poisson Assumption for Submarine Landslides
NASA Astrophysics Data System (ADS)
Geist, E. L.; Chaytor, J. D.; Parsons, T.; Ten Brink, U. S.
2012-12-01
We demonstrate that a sequence of dated mass transport deposits (MTDs) can provide information to statistically test whether or not submarine landslides associated with these deposits conform to a Poisson model of occurrence. Probabilistic tsunami hazard analysis (PTHA) most often assumes Poissonian occurrence for all sources, with an exponential distribution of return times. Using dates that define the bounds of individual MTDs, we first describe likelihood and Monte Carlo methods of parameter estimation for a suite of candidate occurrence models (Poisson, lognormal, gamma, Brownian Passage Time). In addition to age-dating uncertainty, both methods incorporate uncertainty caused by the open time intervals: i.e., before the first and after the last event to the present. Accounting for these open intervals is critical when there are a small number of observed events. The optimal occurrence model is selected according to both the Akaike Information Criterion (AIC) and Akaike's Bayesian Information Criterion (ABIC). In addition, the likelihood ratio test can be performed on occurrence models from the same family: e.g., the gamma model relative to the exponential model of return time distribution. Parameter estimation, model selection, and hypothesis testing are performed on data from two IODP holes in the northern Gulf of Mexico that penetrated a total of 14 MTDs, some of which are correlated between the two holes. Each of these events has been assigned an age based on microfossil zonations and magnetostratigraphic datums. Results from these sites indicate that the Poisson assumption is likely valid. However, parameter estimation results using the likelihood method for one of the sites suggest that the events may have occurred quasi-periodically. Methods developed in this study provide tools with which one can determine both the rate of occurrence and the statistical validity of the Poisson assumption when submarine landslides are included in PTHA.
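The model-selection step can be illustrated with closed-form MLE fits of two candidate return-time models scored by AIC. The inter-event times below are invented, and the paper's handling of age uncertainty and open intervals is omitted:

```python
# Model selection for return times via AIC, using closed-form MLE fits
# of exponential and lognormal models. Inter-event times are invented;
# the paper additionally handles dating uncertainty and open intervals.

import math

def aic_exponential(times):
    lam = len(times) / sum(times)                       # MLE rate
    loglik = sum(math.log(lam) - lam * t for t in times)
    return 2 * 1 - 2 * loglik                           # one free parameter

def aic_lognormal(times):
    logs = [math.log(t) for t in times]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)  # MLE variance
    loglik = sum(-math.log(t * math.sqrt(2 * math.pi * var))
                 - (x - mu) ** 2 / (2 * var) for t, x in zip(times, logs))
    return 2 * 2 - 2 * loglik                           # two free parameters

times = [12.0, 7.5, 30.1, 18.4, 3.2, 22.8, 9.9]         # invented intervals
best = min([("exponential", aic_exponential(times)),
            ("lognormal", aic_lognormal(times))], key=lambda kv: kv[1])
print(best[0])  # the lower-AIC model for these invented data
```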
ERIC Educational Resources Information Center
Bolt, Daniel M.; Gierl, Mark J.
2006-01-01
Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Due to the many forms of DIF that can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present.…
Jones, P L; Swain, W T; Trammell, C J
1999-01-01
When a population is too large for exhaustive study, as is the case for all possible uses of a software system, a statistically correct sample must be drawn as a basis for inferences about the population. A Markov chain usage model is an engineering formalism that represents the population of possible uses for which a product is to be tested. In statistical testing of software based on a Markov chain usage model, the rich body of analytical results available for Markov chains provides numerous insights that can be used in both product development and test planning. A usage model is based on specifications rather than code, so insights that result from model building can inform product decisions in the early stages of a project, when the opportunity to prevent problems is greatest. Statistical testing based on a usage model provides a sound scientific basis for quantifying the reliability of software.
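A Markov chain usage model is essentially a labeled transition system with probabilities, and drawing one test case is a random walk from the start state to a sink. A toy sketch with an invented "editor" model (states and probabilities are illustrative, not from the paper):

```python
# Toy Markov chain usage model for an invented "editor" application;
# drawing a test case is a random walk from Start to the Close sink.
# States and probabilities are illustrative, not from the paper.

import random

USAGE_MODEL = {
    "Start": [("Open", 1.0)],
    "Open":  [("Edit", 0.7), ("Close", 0.3)],
    "Edit":  [("Edit", 0.5), ("Save", 0.4), ("Close", 0.1)],
    "Save":  [("Edit", 0.6), ("Close", 0.4)],
    "Close": [],                      # sink: test case ends here
}

def draw_test_case(model, rng, start="Start", max_steps=100):
    """One random walk through the usage model = one sampled test case."""
    state, path = start, [start]
    while model[state] and len(path) < max_steps:
        r, acc = rng.random(), 0.0
        for nxt, p in model[state]:
            acc += p
            if r < acc:
                break
        state = nxt                   # falls back to the last arc if r ~ 1.0
        path.append(state)
    return path

print(draw_test_case(USAGE_MODEL, random.Random(7)))
```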
Comparison of Statistical Methods for Detector Testing Programs
Rennie, John Alan; Abhold, Mark
2016-10-14
A typical goal for any detector testing program is to ascertain not only the performance of the detector systems under test, but also the confidence that systems accepted using that testing program’s acceptance criteria will exceed a minimum acceptable performance (which is usually expressed as the minimum acceptable success probability, p). A similar problem often arises in statistics, where we would like to ascertain the fraction, p, of a population of items that possess a property that may take one of two possible values. Typically, the problem is approached by drawing a fixed sample of size n, with the number of items out of n that possess the desired property, x, being termed successes. The sample mean gives an estimate of the population mean p ≈ x/n, although usually it is desirable to accompany such an estimate with a statement concerning the range within which p may fall and the confidence associated with that range. Procedures for establishing such ranges and confidence limits are described in detail by Clopper, Brown, and Agresti for two-sided symmetric confidence intervals.
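The exact (Clopper-Pearson) two-sided interval for p referred to above can be computed from the binomial tails by bisection, with no statistics library needed; a self-contained sketch:

```python
# Exact (Clopper-Pearson) two-sided confidence interval for a success
# probability p from x successes in n trials, computed by bisection on
# the binomial tails; pure standard library, no scipy.

from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    def solve(below_root):           # bisect where the predicate flips
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if below_root(mid) else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

print(clopper_pearson(0, 10))  # approx (0.0, 0.3085): 0 successes in 10 trials
```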
A test statistic for the affected-sib-set method.
Lange, K
1986-07-01
This paper discusses generalizations of the affected-sib-pair method. First, the requirement that sib identity-by-descent relations be known unambiguously is relaxed by substituting sib identity-by-state relations. This permits affected sibs to be used even when their parents are unavailable for typing. In the limit of an infinite number of marker alleles each of infinitesimal population frequency, the identity-by-state relations coincide with the usual identity-by-descent relations. Second, a weighted pairs test statistic is proposed that covers affected sib sets of size greater than two. These generalizations make the affected-sib-pair method a more powerful technique for detecting departures from independent segregation of disease and marker phenotypes. A sample calculation suggests such a departure for tuberculoid leprosy and the HLA D locus.
A statistical design for testing apomictic diversification through linkage analysis.
Zeng, Yanru; Hou, Wei; Song, Shuang; Feng, Sisi; Shen, Lin; Xia, Guohua; Wu, Rongling
2014-03-01
The capacity of apomixis to generate maternal clones through seed reproduction has made it a useful characteristic for the fixation of heterosis in plant breeding. It has been observed that apomixis displays pronounced intra- and interspecific diversification, but the genetic mechanisms underlying this diversification remain elusive, obstructing the exploitation of this phenomenon in practical breeding programs. By capitalizing on molecular information in mapping populations, we describe and assess a statistical design that deploys linkage analysis to estimate and test the pattern and extent of apomictic differences at various levels from genotypes to species. The design is based on two reciprocal crosses between two individuals each chosen from a hermaphrodite or monoecious species. A multinomial distribution likelihood is constructed by combining marker information from the two crosses. The EM algorithm is implemented to estimate the rate of apomixis and test its difference between two plant populations or species as the parents. The design is validated by computer simulation. A real data analysis of two reciprocal crosses between hickory (Carya cathayensis) and pecan (C. illinoensis) demonstrates the utilization and usefulness of the design in practice. The design provides a tool to address fundamental and applied questions related to the evolution and breeding of apomixis.
Statistical tests of additional plate boundaries from plate motion inversions
NASA Technical Reports Server (NTRS)
Stein, S.; Gordon, R. G.
1984-01-01
The application of the F-ratio test, a standard statistical technique, to the results of relative plate motion inversions has been investigated. The method tests whether the improvement in fit of the model to the data resulting from the addition of another plate to the model is greater than that expected purely by chance. This approach appears to be useful in determining whether additional plate boundaries are justified. Previous results have been confirmed favoring separate North American and South American plates with a boundary located between 30 N and the equator. Using Chase's global relative motion data, it is shown that in addition to separate West African and Somalian plates, separate West Indian and Australian plates, with a best-fitting boundary between 70 E and 90 E, can be resolved. These results are generally consistent with the observation that the Indian plate's internal deformation extends somewhat westward of the Ninetyeast Ridge. The relative motion pole is similar to Minster and Jordan's and predicts the NW-SE compression observed in earthquake mechanisms near the Ninetyeast Ridge.
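The F-ratio statistic here has the standard nested-models form: the per-parameter reduction in misfit from the extra plate, scaled by the residual misfit of the larger model. A sketch with illustrative numbers (the chi-square and degrees-of-freedom values are invented, not from the paper):

```python
# Nested-model F-ratio statistic: does adding a plate (three extra
# Euler-pole parameters) reduce the misfit more than chance would?
# The chi-square and degrees-of-freedom values are illustrative only.

def f_ratio(chi2_simple, df_simple, chi2_complex, df_complex):
    extra = df_simple - df_complex                    # parameters added
    return ((chi2_simple - chi2_complex) / extra) / (chi2_complex / df_complex)

F = f_ratio(chi2_simple=152.0, df_simple=100, chi2_complex=120.0, df_complex=97)
print(round(F, 2))  # 8.62, compared against the F(3, 97) critical value (~2.70 at 5%)
```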
A Unifying Framework for Teaching Nonparametric Statistical Tests
ERIC Educational Resources Information Center
Bargagliotti, Anna E.; Orrison, Michael E.
2014-01-01
Increased importance is being placed on statistics at both the K-12 and undergraduate level. Research divulging effective methods to teach specific statistical concepts is still widely sought after. In this paper, we focus on best practices for teaching topics in nonparametric statistics at the undergraduate level. To motivate the work, we…
Development and testing of improved statistical wind power forecasting methods.
Mendes, J.; Bessa, R.J.; Keko, H.; Sumaili, J.; Miranda, V.; Ferreira, C.; Gama, J.; Botterud, A.; Zhou, Z.; Wang, J.
2011-12-06
Wind power forecasting (WPF) provides important inputs to power system operators and electricity market participants. It is therefore not surprising that WPF has attracted increasing interest within the electric power industry. In this report, we document our research on improving statistical WPF algorithms for point, uncertainty, and ramp forecasting. Below, we provide a brief introduction to the research presented in the following chapters. For a detailed overview of the state-of-the-art in wind power forecasting, we refer to [1]. Our related work on the application of WPF in operational decisions is documented in [2]. Point forecasts of wind power are highly dependent on the training criteria used in the statistical algorithms that are used to convert weather forecasts and observational data to a power forecast. In Chapter 2, we explore the application of information theoretic learning (ITL) as opposed to the classical minimum square error (MSE) criterion for point forecasting. In contrast to the MSE criterion, ITL criteria do not assume a Gaussian distribution of the forecasting errors. We investigate to what extent ITL criteria yield better results. In addition, we analyze time-adaptive training algorithms and how they enable WPF algorithms to cope with non-stationary data and, thus, to adapt to new situations without requiring additional offline training of the model. We test the new point forecasting algorithms on two wind farms located in the U.S. Midwest. Although there have been advancements in deterministic WPF, a single-valued forecast cannot provide information on the dispersion of observations around the predicted value. We argue that it is essential to generate, together with (or as an alternative to) point forecasts, a representation of the wind power uncertainty. Wind power uncertainty representation can take the form of probabilistic forecasts (e.g., probability density function, quantiles), risk indices (e.g., prediction risk index) or scenarios
ERIC Educational Resources Information Center
Wainer, Howard; And Others
Four researchers at the Educational Testing Service describe what they consider some of the most vexing research problems they face. While these problems are not completely statistical, they all have major statistical components. Following the introduction (section 1), in section 2, "Problems with the Simultaneous Estimation of Many True…
Strong gravitational lensing statistics as a test of cosmogonic scenarios
NASA Technical Reports Server (NTRS)
Cen, Renyue; Gott, J. Richard, III; Ostriker, Jeremiah P.; Turner, Edwin L.
1994-01-01
Gravitational lensing statistics can provide a direct and powerful test of cosmic structure formation theories. Since lensing tests, directly, the magnitude of the nonlinear mass density fluctuations on lines of sight to distant objects, no issues of 'bias' (of mass fluctuations with respect to galaxy density fluctuations) exist here, although lensing observations provide their own ambiguities of interpretation. We develop numerical techniques for generating model density distributions with the very large spatial dynamic range required by lensing considerations and for identifying regions of the simulations capable of multiple image lensing in a conservative and computationally efficient way that should be accurate for splittings significantly larger than 3 seconds. Applying these techniques to existing standard Cold Dark Matter (CDM) (Omega = 1) and Primeval Baryon Isocurvature (PBI) (Omega = 0.2) simulations (normalized to the Cosmic Background Explorer Satellite (COBE) amplitude), we find that the CDM model predicts large splitting (greater than 8 seconds) lensing events roughly an order of magnitude more frequently than the PBI model. Under the reasonable but idealized assumption that lensing structures can be modeled as singular isothermal spheres (SIS), the predictions can be directly compared to observations of lensing events in quasar samples. Several large splitting (Delta Theta is greater than 8 seconds) cases are predicted in the standard CDM model (the exact number being dependent on the treatment of amplification bias), whereas none is observed. In a formal sense, the comparison excludes the CDM model at high confidence (essentially for the same reason that CDM predicts excessive small-scale cosmic velocity dispersions.) A very rough assessment of a low-density but flat CDM model (Omega = 0.3, Lambda/(3 H_0^2) = 0.7) indicates a far lower and probably acceptable level of lensing. The PBI model is consistent with, but not strongly tested by, the
A statistical framework for testing modularity in multidimensional data.
Márquez, Eladio J
2008-10-01
Modular variation of multivariate traits results from modular distribution of effects of genetic and epigenetic interactions among those traits. However, statistical methods rarely detect truly modular patterns, possibly because the processes that generate intramodular associations may overlap spatially. Methodologically, this overlap may cause multiple patterns of modularity to be equally consistent with observed covariances. To deal with this indeterminacy, the present study outlines a framework for testing a priori hypotheses of modularity in which putative modules are mathematically represented as multidimensional subspaces embedded in the data. Model expectations are computed by subdividing the data into arrays of variables, and intermodular interactions are represented by overlapping arrays. Covariance structures are thus modeled as the outcome of complex and nonorthogonal intermodular interactions. This approach is demonstrated by analyzing mandibular modularity in nine rodent species. A total of 620 models are fit to each species, and the most strongly supported are heuristically modified to improve their fit. Five modules common to all species are identified, which approximately map to the developmental modules of the mandible. Within species, these modules are embedded within larger "super-modules," suggesting that these conserved modules act as building blocks from which covariation patterns are built.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen from the new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates the empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics, as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas for the Wald statistics may be useful for easy computation of confidence intervals for the difference of predictive values. The introduced concepts have the potential to lead to development of the WGS test statistic in a general GEE setting.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
The Geometry of Probability, Statistics, and Test Theory.
ERIC Educational Resources Information Center
Zimmerman, Donald W.; Zumbo, Bruno D.
2001-01-01
Presents a model of tests and measurement that identifies test scores with Hilbert space vectors and true and error components of scores with linear operators. This geometric point of view brings to light relations among elementary concepts in test theory, including reliability, validity, and parallel tests. (Author/SLD)
Stork, LeAnna M.; Gennings, Chris; Carchman, Richard; Carter, Jr., Walter H.; Pounds, Joel G.; Mumtaz, Moiz
2006-12-01
Several assumptions, defined and undefined, are used in the toxicity assessment of chemical mixtures. In scientific practice, mixture components in the low-dose region, particularly subthreshold doses, are often assumed to behave additively (i.e., zero interaction) based on heuristic arguments. This assumption has important implications in the practice of risk assessment, but has not been experimentally tested. We have developed methodology to test for additivity in the sense of Berenbaum (Advances in Cancer Research, 1981), based on the statistical equivalence testing literature where the null hypothesis of interaction is rejected for the alternative hypothesis of additivity when data support the claim. The implication of this approach is that conclusions of additivity are made with a false positive rate controlled by the experimenter. The claim of additivity is based on prespecified additivity margins, which are chosen using expert biological judgment such that small deviations from additivity, which are not considered to be biologically important, are not statistically significant. This approach is in contrast to the usual hypothesis-testing framework that assumes additivity in the null hypothesis and rejects when there is significant evidence of interaction. In this scenario, failure to reject may be due to lack of statistical power, making the claim of additivity problematic. The proposed method is illustrated in a mixture of five organophosphorus pesticides that were experimentally evaluated alone and at relevant mixing ratios. Motor activity was assessed in adult male rats following acute exposure. Four low-dose mixture groups were evaluated. Evidence of additivity is found in three of the four low-dose mixture groups. The proposed method tests for additivity of the whole mixture and does not take into account subset interactions (e.g., synergistic, antagonistic) that may have occurred and cancelled each other out.
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.
ERIC Educational Resources Information Center
Breunig, Nancy A.
Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
Ensuring Positiveness of the Scaled Difference Chi-Square Test Statistic
ERIC Educational Resources Information Center
Satorra, Albert; Bentler, Peter M.
2010-01-01
A scaled difference test statistic T̃_d that can be computed from standard software of structural equation models (SEM) by hand calculations was proposed in Satorra and Bentler (Psychometrika 66:507-514, 2001). The statistic T̃_d is asymptotically equivalent to the scaled difference test statistic T̄_…
Interpretation of Statistical Significance Testing: A Matter of Perspective.
ERIC Educational Resources Information Center
McClure, John; Suen, Hoi K.
1994-01-01
This article compares three models that have been the foundation for approaches to the analysis of statistical significance in early childhood research--the Fisherian and the Neyman-Pearson models (both considered "classical" approaches), and the Bayesian model. The article concludes that all three models have a place in the analysis of research…
New heterogeneous test statistics for the unbalanced fixed-effect nested design.
Guo, Jiin-Huarng; Billard, L; Luh, Wei-Ming
2011-05-01
When the underlying variances are unknown and/or unequal, using the conventional F test is problematic in the two-factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander-Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed-effect two-stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation.
Examining the Statistical Rigor of Test and Evaluation Results in the Live, Virtual and Constructive… (AFIT/IOA/ENS/11-06)
2011-06-01
Exercise events do not necessarily require statistically defendable results. In an OT&E experiment, the results may often require statistical rigor…
Statistical algorithms for a comprehensive test ban treaty discrimination framework
Foote, N.D.; Anderson, D.N.; Higbee, K.T.; Miller, N.E.; Redgate, T.; Rohay, A.C.; Hagedorn, D.N.
1996-10-01
Seismic discrimination is the process of identifying a candidate seismic event as an earthquake or explosion using information from seismic waveform features (seismic discriminants). In the CTBT setting, low-energy seismic activity must be detected and identified. A defensible CTBT discrimination decision requires an understanding of false-negative (declaring an event to be an earthquake given it is an explosion) and false-positive (declaring an event to be an explosion given it is an earthquake) rates. These rates are derived from a statistical discrimination framework. A discrimination framework can be as simple as a single statistical algorithm, or it can be a mathematical construct that integrates many different types of statistical algorithms and CTBT technologies. In either case, the result is the identification of an event and a numerical assessment of the accuracy of that identification, that is, false-negative and false-positive rates. In Anderson et al., eight statistical discrimination algorithms are evaluated relative to their ability to give results that effectively contribute to a decision process and to be interpretable with physical (seismic) theory. These algorithms can be discrimination frameworks individually or components of a larger framework. The eight algorithms are linear discrimination (LDA), quadratic discrimination (QDA), variably regularized discrimination (VRDA), flexible discrimination (FDA), logistic discrimination, K-th nearest neighbor (KNN), kernel discrimination, and classification and regression trees (CART). In this report, the performance of these eight algorithms, as applied to regional seismic data, is documented. Based on the findings in Anderson et al. and this analysis, CART is an appropriate algorithm for an automated CTBT setting.
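The trade-off the report centers on, false-negative versus false-positive rates under a chosen discrimination rule, can be illustrated with the simplest of the eight algorithms, LDA. This is a minimal numpy sketch on simulated two-feature discriminants; the feature values, class means, and covariance are invented for illustration and are not regional seismic data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical two-feature discriminants for earthquakes and explosions;
# LDA assumes Gaussian classes with a shared covariance matrix.
eq = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], 500)
ex = rng.multivariate_normal([1.5, 1.0], [[1.0, 0.3], [0.3, 1.0]], 500)

mu_eq, mu_ex = eq.mean(axis=0), ex.mean(axis=0)
pooled = ((eq - mu_eq).T @ (eq - mu_eq)
          + (ex - mu_ex).T @ (ex - mu_ex)) / (len(eq) + len(ex) - 2)

# Fisher's linear discriminant direction and midpoint threshold (equal priors).
w = np.linalg.solve(pooled, mu_ex - mu_eq)
threshold = w @ (mu_eq + mu_ex) / 2

# Rule: declare "explosion" when w @ x > threshold.
false_neg = np.mean(ex @ w <= threshold)  # explosion called an earthquake
false_pos = np.mean(eq @ w > threshold)   # earthquake called an explosion
print(round(false_neg, 3), round(false_pos, 3))
```

Both error rates fall out of the same fitted rule, which is the sense in which a discrimination framework "derives" them.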
Evaluating clinical significance: incorporating robust statistics with normative comparison tests.
van Wieringen, Katrina; Cribbie, Robert A
2014-05-01
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann-Yuen test of equivalence, which utilizes trimmed means, with those of the previously recommended Schuirmann and Schuirmann-Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann-Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann-Welch tests, and the power of the Schuirmann-Yuen was substantially greater than that of the Schuirmann or Schuirmann-Welch tests when distributions were skewed or outliers were present. The Schuirmann-Yuen test is recommended for assessing clinical significance with normative comparisons.
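The Schuirmann family of equivalence tests shares the two one-sided tests (TOST) logic: equivalence is declared only when the group difference is significantly above the lower margin and significantly below the upper margin. Below is a minimal sketch of the Schuirmann-Welch variant; the Schuirmann-Yuen version additionally replaces means and variances with trimmed means and winsorized variances. The data and equivalence margins here are invented for illustration:

```python
import numpy as np
from scipy import stats

def tost_welch(x, y, low, high, alpha=0.05):
    """Schuirmann-Welch two one-sided tests: declare equivalence when the
    mean difference is significantly above `low` AND below `high`."""
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(vx / nx + vy / ny)
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    p_low = stats.t.sf((diff - low) / se, df)     # H0: diff <= low
    p_high = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
    return max(p_low, p_high) < alpha

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = rng.normal(0.1, 1.0, 200)  # tiny, clinically unimportant difference
print(tost_welch(x, y, low=-0.5, high=0.5))
```

Note the inversion relative to ordinary significance testing: here *rejecting* both one-sided nulls is what supports the claim of (practical) equivalence.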
Leiva, David; Solanas, Antonio; Salafranca, Lluís
2008-05-01
In the present article, we focus on two indices that quantify directionality and skew-symmetrical patterns in social interactions as measures of social reciprocity: the directional consistency (DC) and skew-symmetry indices. Although both indices enable researchers to describe social groups, most studies require statistical inferential tests. The main aims of the present study are first, to propose an overall statistical technique for testing null hypotheses regarding social reciprocity in behavioral studies, using the DC and skew-symmetry statistics (Phi) at group level; and second, to compare both statistics in order to allow researchers to choose the optimal measure depending on the conditions. In order to allow researchers to make statistical decisions, statistical significance for both statistics has been estimated by means of a Monte Carlo simulation. Furthermore, this study will enable researchers to choose the optimal observational conditions for carrying out their research, since the power of the statistical tests has been estimated.
Statistical Revisions in the Washington Pre-College Testing Program.
ERIC Educational Resources Information Center
Beanblossom, Gary F.; And Others
The Washington Pre-College (WPC) program decided, in fall 1967, to inaugurate in April 1968 the testing of high school students during the spring of their junior year. The advantages of this shift from senior year testing were to provide guidance data for earlier, more extensive use in high school and to make these data available to colleges at…
Estimating Statistical Power When Making Adjustments for Multiple Tests
ERIC Educational Resources Information Center
Porter, Kristin E.
2016-01-01
In recent years, there has been increasing focus on the issue of multiple hypotheses testing in education evaluation studies. In these studies, researchers are typically interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time or across multiple treatment groups. When…
Statistics of sampling for microbiological testing of foodborne pathogens
Technology Transfer Automated Retrieval System (TEKTRAN)
Despite the many recent advances in protocols for testing for pathogens in foods, a number of challenges still exist. For example, the microbiological safety of food cannot be completely ensured by testing because microorganisms are not evenly distributed throughout the food. Therefore, since it i...
A statistical approach to nondestructive testing of laser welds
Duncan, H.A.
1983-07-01
A statistical analysis of the data obtained from a relatively new nondestructive technique for laser welding is presented. The technique is one in which information relating to the quality of the welded joint is extracted from the high intensity plume which is generated from the materials that are welded. The system is such that the detected plume is processed to give a numerical value associated with the material vaporization and consequently, the weld quality. Optimum thresholds for the region in which a weld can be considered as acceptable are determined based on the Neyman-Pearson criterion and Bayes rule.
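The Neyman-Pearson criterion mentioned above fixes one error rate and optimizes against the other. A toy sketch with invented Gaussian models for the plume metric (the distributions in the actual study are empirical, not these):

```python
from statistics import NormalDist

# Invented Gaussian models for the plume metric (illustrative only):
good = NormalDist(mu=10.0, sigma=1.0)  # acceptable weld
bad = NormalDist(mu=7.0, sigma=1.5)    # defective weld: weaker signal

# Fix the false-reject rate of good welds at 1% and accept any weld
# whose plume metric exceeds the corresponding quantile of `good`.
alpha = 0.01
threshold = good.inv_cdf(alpha)

# Detection power: probability a defective weld falls below the threshold.
power = bad.cdf(threshold)
print(round(threshold, 3), round(power, 3))
```

The threshold is driven entirely by the tolerated false-reject rate; the resulting power is then a consequence, which is the Neyman-Pearson way of framing the acceptance region.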
ERIC Educational Resources Information Center
Monterde-i-Bort, Hector; Frias-Navarro, Dolores; Pascual-Llobell, Juan
2010-01-01
The empirical study we present here deals with a pedagogical issue that has not been thoroughly explored up until now in our field. Previous empirical studies in other sectors have identified the opinions of researchers about this topic, showing that completely unacceptable interpretations have been made of significance tests and other statistical…
Misapplication of a Statistical Test: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Weigel, Robert S.
2011-02-01
In his Forum, P. Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) argues that “the strong dependence of p values on sample size makes them uninterpretable” with an example where p values in a hypothesis test using Pearson's chi-square statistic differed by a factor of 10^16 when the sample size decreased tenfold. The data were a sequence of magnitude 4 or larger earthquake events (N = 118,415) spanning 3654 days [U.S. Geological Survey, 2010]. There are two problems with the analysis. First, Vermeesch applied the chi-square test to data with statistical properties that are inconsistent with those assumed in the derivation of the chi-square test. Second, he made an assumption that, using a straightforward calculation, can be shown to be inconsistent with the data. I address here only problems related to the application of statistics without reference to any additional physical processes that may also need to be addressed before statistical analysis is performed (e.g., the physics of how aftershocks are related to main earthquakes).
Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM
ERIC Educational Resources Information Center
Tong, Xiaoxiao; Bentler, Peter M.
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…
The Power of Statistical Tests for Moderators in Meta-Analysis
ERIC Educational Resources Information Center
Hedges, Larry V.; Pigott, Therese D.
2004-01-01
Calculation of the statistical power of statistical tests is important in planning and interpreting the results of research studies, including meta-analyses. It is particularly important in moderator analyses in meta-analysis, which are often used as sensitivity analyses to rule out moderator effects but also may have low statistical power. This…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
NASA Technical Reports Server (NTRS)
Purves, L.; Strang, R. F.; Dube, M. P.; Alea, P.; Ferragut, N.; Hershfeld, D.
1983-01-01
The software and procedures of a system of programs used to generate a report of the statistical correlation between NASTRAN modal analysis results and physical tests results from modal surveys are described. Topics discussed include: a mathematical description of statistical correlation, a user's guide for generating a statistical correlation report, a programmer's guide describing the organization and functions of individual programs leading to a statistical correlation report, and a set of examples including complete listings of programs, and input and output data.
Testing of hypotheses about altitude decompression sickness by statistical analyses
NASA Technical Reports Server (NTRS)
Van Liew, H. D.; Burkard, M. E.; Conkin, J.; Powell, M. R. (Principal Investigator)
1996-01-01
This communication extends a statistical analysis of forced-descent decompression sickness at altitude in exercising subjects (J Appl Physiol 1994; 76:2726-2734) with a data subset having an additional explanatory variable, rate of ascent. The original explanatory variables for risk-function analysis were environmental pressure of the altitude, duration of exposure, and duration of pure-O2 breathing before exposure; the best fit was consistent with the idea that instantaneous risk increases linearly as altitude exposure continues. Use of the new explanatory variable improved the fit of the smaller data subset, as indicated by log likelihood. Also, with ascent rate accounted for, replacement of the term for linear accrual of instantaneous risk by a term for rise and then decay made a highly significant improvement upon the original model (log likelihood increased by 37 log units). The authors conclude that a more representative data set and removal of the variability attributable to ascent rate allowed the rise-and-decay mechanism, which is expected from theory and observations, to become manifest.
Experimental tests of a statistical mechanics of static granular media
NASA Astrophysics Data System (ADS)
Schröter, Matthias
2005-11-01
In 1989 Edwards and Oakeshott proposed a statistical mechanics theory of static granular materials described by a temperature-like state variable named compactivity [1]. We have made the first measurement of the compactivity of a granular material [2]. We have examined a granular column driven by flow pulses and have found that the system explores its phase space of mechanically stable configurations in a history-independent way. The system quickly approaches a steady state; the volume fluctuations about this steady state are Gaussian. The mean volume fraction can be varied by changing the flow rate of the pulses. We calculate the compactivity from the standard deviation of the volume fluctuations [3]. This talk will address the following two questions: (a) Are compactivity values measured with our "thermometer" different from values one might measure with a "thermometer" based on the grain volume distribution [4]? (b) Can compactivity be a control parameter of granular systems, for example, in size segregation in binary granular mixtures? [1] Edwards and Oakeshott, Physica A 157, 1080 (1989). [2] Schröter, Goldman, and Swinney, Phys. Rev. E 71, 030301 (2005). [3] Nowak, Knight, Ben-Naim, Jaeger, and Nagel, Phys. Rev. E 57, 1971 (1998). [4] Edwards, Brujić, and Makse, in Unifying Concepts in Granular Media and Glasses, edited by Coniglio et al. (Elsevier, Amsterdam, 2004)
A Statistical Test of Uniformity in Solar Cycle Indices
NASA Technical Reports Server (NTRS)
Hathaway, David H.
2012-01-01
Several indices are used to characterize the solar activity cycle. Key among these are: the International Sunspot Number, the Group Sunspot Number, Sunspot Area, and 10.7 cm Radio Flux. A valuable aspect of these indices is the length of the record -- many decades and many (different) 11-year cycles. However, this valuable length-of-record attribute has an inherent problem in that it requires many different observers and observing systems. This can lead to non-uniformity in the datasets and subsequent erroneous conclusions about solar cycle behavior. The sunspot numbers are obtained by counting sunspot groups and individual sunspots on a daily basis. This suggests that the day-to-day and month-to-month variations in these numbers should follow Poisson Statistics and be proportional to the square-root of the sunspot numbers themselves. Examining the historical records of these indices indicates that this is indeed the case - even with Sunspot Area and 10.7 cm Radio Flux. The ratios of the RMS variations to the square-root of the indices themselves are relatively constant with little variation over the phase of each solar cycle or from small to large solar cycles. There are, however, important step-like changes in these ratios associated with changes in observer and/or observer system. Here we show how these variations can be used to construct more uniform datasets.
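The core diagnostic above, that counting statistics should make the RMS of short-term variations scale as the square root of the index itself, is easy to check on simulated Poisson data. A sketch with synthetic counts (not the historical indices):

```python
import numpy as np

rng = np.random.default_rng(42)

# If daily counts are Poisson with mean m, the RMS of their fluctuations
# is sqrt(m), so RMS / sqrt(mean) should stay near 1 for weak and strong
# cycles alike; step changes in this ratio would flag observer changes.
ratios = []
for mean_count in (5, 50, 200):  # weak, moderate, strong "cycles"
    counts = rng.poisson(mean_count, 10_000)
    ratios.append(counts.std() / np.sqrt(counts.mean()))
print([round(r, 2) for r in ratios])
```

A real dataset whose ratio jumps between otherwise similar epochs is, on this logic, showing a change in the observing system rather than in the Sun.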
Testing the DGP model with gravitational lensing statistics
NASA Astrophysics Data System (ADS)
Zhu, Zong-Hong; Sereno, M.
2008-09-01
Aims: The self-accelerating braneworld model (DGP) appears to provide a simple alternative to the standard ΛCDM cosmology for explaining the current cosmic acceleration, which is strongly indicated by measurements of type Ia supernovae, as well as other concordant observations. Methods: We investigate observational constraints on this scenario provided by gravitational-lensing statistics using the Cosmic Lens All-Sky Survey (CLASS) lensing sample. Results: We show that a substantial part of the parameter space of the DGP model agrees well with the radio-source gravitational lensing sample. Conclusions: In the flat case, Ω_K = 0, the likelihood is maximized, L = L_max, for Ω_M = 0.30^{+0.19}_{-0.11}. If we relax the prior on Ω_K, the likelihood peaks at (Ω_M, Ω_{r_c}) ≈ (0.29, 0.12), slightly in the region of open models. The confidence contours are, however, elongated such that we are unable to discard any of the closed, flat, or open models.
Development and performances of a high statistics PMT test facility
NASA Astrophysics Data System (ADS)
Maximiliano Mollo, Carlos
2016-04-01
For almost a century, photomultipliers have been the main sensors for photon detection in nuclear and astroparticle physics experiments. In recent years, the search for cosmic neutrinos gave rise to enormous experiments (Antares, Kamiokande, Super-Kamiokande, etc.) and even kilometric-scale experiments such as IceCube and the future KM3NeT. A very large volume neutrino telescope like KM3NeT requires several hundred thousand photomultipliers. The performance of the telescope strictly depends on the performance of each PMT. For this reason, it is mandatory to measure the characteristics of each single sensor. The characterization of a PMT normally requires more than 8 hours, mostly due to the darkening step. This means that it is not feasible to measure the parameters of each PMT of a neutrino telescope without a system able to test more than one PMT simultaneously. For this application, we have designed, developed, and realized a system able to measure the main characteristics of 62 3-inch photomultipliers simultaneously, making two measurement sessions per day possible. In this work, we describe the design constraints and how they have been satisfied. Finally, we show the performance of the system and the first results from the few thousand PMTs tested so far.
ERIC Educational Resources Information Center
Norris, John M.
2015-01-01
Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
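The kind of "what if" analysis described, holding an effect fixed and varying the sample size to see how the significance test responds, needs only a few lines in any language, not just Excel or R. A sketch using a one-sample two-sided z-test with a fixed standardized effect (illustrative numbers, not taken from the paper):

```python
import math
from statistics import NormalDist

# Hold a small standardized effect fixed (d = 0.2) and vary n: sample
# size alone drives the p value from "not significant" to "highly
# significant" while the effect itself never changes.
d = 0.2
pvals = {}
for n in (25, 100, 400, 1600):
    z = d * math.sqrt(n)
    pvals[n] = 2 * (1 - NormalDist().cdf(z))
    print(n, round(pvals[n], 4))
```

This is exactly the classroom point: statistical significance tests answer a question about sampling error at a given n, not about the size or importance of the effect.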
ERIC Educational Resources Information Center
Denbleyker, John Nickolas
2012-01-01
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced with cross-test comparisons due to their dependency on cut-score locations. A better…
A Review of Post-1994 Literature on Whether Statistical Significance Tests Should Be Banned.
ERIC Educational Resources Information Center
Sullivan, Jeremy R.
This paper summarizes the literature regarding statistical significance testing with an emphasis on: (1) the post-1994 literature in various disciplines; (2) alternatives to statistical significance testing; and (3) literature exploring why researchers have demonstrably failed to be influenced by the 1994 American Psychological Association…
The Historical Growth of Statistical Significance Testing in Psychology--and Its Future Prospects.
ERIC Educational Resources Information Center
Hubbard, Raymond; Ryan, Patricia A.
2000-01-01
Examined the historical growth in the popularity of statistical significance testing using a random sample of data from 12 American Psychological Association journals. Results replicate and extend findings from a study that used only one such journal. Discusses the role of statistical significance testing and the use of replication and…
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
On the Correct Use of Statistical Tests: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Sornette, D.; Pisarenko, V. F.
2011-02-01
Taking the distribution of global seismicity over weekdays as an illustration, Pieter Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) in his Forum presented an argument in which a standard chi-square test is found to be so sensitively dependent on the sample size that probabilities of earthquake occurrence from these tests are uninterpretable. He suggests that statistical tests used in the geosciences to “make deductions more ‘objective’” are at best useless, if not misleading. In complete contradiction, we affirm that statistical tests, if they are used properly, are always informative. Vermeesch's error is to assume that one can, in the chi-square test, simultaneously divide both the number of earthquakes on each weekday and the sample size by 10. Instead, Vermeesch should have taken a random 10% of the original data set and then again grouped it into 7 days. Without doing this, it was inevitable that Vermeesch would reach his erroneous conclusion.
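The distinction the authors draw, dividing the weekday counts by 10 versus drawing a random 10% subsample and re-tabulating, can be demonstrated directly. In this sketch (simulated event data with a small invented nonuniformity, not the USGS catalog), scaling the counts divides the chi-square statistic by exactly 10 and inflates the p value, while a genuine subsample yields a valid, if less powerful, test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated "weekday of event" labels with a small invented nonuniformity.
probs = np.array([0.146, 0.144, 0.142, 0.142, 0.142, 0.142, 0.142])
days = rng.choice(7, size=118_415, p=probs)
full = np.bincount(days, minlength=7)

# Wrong: divide the counts (and hence the sample size) by 10.  The
# chi-square statistic is divided by exactly 10, inflating the p value.
naive = full / 10

# Right: draw a random 10% of the events and re-tabulate by weekday.
sub = np.bincount(rng.choice(days, size=len(days) // 10, replace=False),
                  minlength=7)

for label, counts in (("full", full), ("scaled", naive), ("subsample", sub)):
    stat, p = stats.chisquare(counts)
    print(label, round(stat, 1), round(p, 4))
```

The scaled version is not a smaller experiment at all; it is the same experiment with its evidence artificially diluted, which is the error the comment identifies.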
Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka
2015-01-01
The problem of establishing noninferiority between a new treatment and a standard (control) treatment is discussed for ordinal categorical data. A measure of treatment effect is used, and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed, where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of deviation from the nominal level and power.
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
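The proposed procedure, choosing a distance statistic between histograms and using a bootstrap to obtain a significance level, can be sketched for the Euclidean distance as follows. This is a simplified illustration on synthetic data: it pools raw measurements and resamples them, rather than resampling individual cloud-object histograms as the paper does:

```python
import numpy as np

rng = np.random.default_rng(7)

def hist_distance(a, b, bins):
    """Euclidean distance between two bin-normalized histograms."""
    ha, _ = np.histogram(a, bins=bins)
    hb, _ = np.histogram(b, bins=bins)
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.sqrt(((ha - hb) ** 2).sum()))

# Two synthetic samples whose summary histograms we want to compare.
x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.5, 1.0, 500)
bins = np.linspace(-4.0, 4.5, 18)
observed = hist_distance(x, y, bins)

# Bootstrap under the null of no difference: pool the measurements,
# resample two pseudo-samples with replacement, and record the distance.
pooled = np.concatenate([x, y])
null = np.array([hist_distance(rng.choice(pooled, len(x)),
                               rng.choice(pooled, len(y)), bins)
                 for _ in range(2000)])

p = float(np.mean(null >= observed))
print(round(observed, 3), round(p, 4))
```

The appeal of the bootstrap here is exactly the one the abstract notes: nothing about the null distribution of the distance statistic had to be derived analytically.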
Evaluation of heart failure biomarker tests: a survey of statistical considerations.
De, Arkendra; Meier, Kristen; Tang, Rong; Li, Meijuan; Gwise, Thomas; Gomatam, Shanti; Pennello, Gene
2013-08-01
Biomarkers assessing cardiovascular function can encompass a wide range of biochemical or physiological measurements. Medical tests that measure biomarkers are typically evaluated for measurement validation and clinical performance in the context of their intended use. General statistical principles for the evaluation of medical tests are discussed in this paper in the context of heart failure. Statistical aspects of study design and analysis to be considered while assessing the quality of measurements and the clinical performance of tests are highlighted. A discussion of statistical considerations for specific clinical uses is also provided. The remarks in this paper mainly focus on methods and considerations for statistical evaluation of medical tests from the perspective of bias and precision. With such an evaluation of performance, healthcare professionals could have information that leads to a better understanding on the strengths and limitations of tests related to heart failure.
New Statistics for Testing Differential Expression of Pathways from Microarray Data
NASA Astrophysics Data System (ADS)
Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao
Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: linear combination test, quadratic test and de-correlation test to identify differentially expressed pathways from gene expression profile. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common in two datasets. The pathways we found are meaningful to uncover the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in functional analysis of gene expression data.
Testing for phylogenetic signal in biological traits: the ubiquity of cross-product statistics.
Pavoine, Sandrine; Ricotta, Carlo
2013-03-01
So-called tests for phylogenetic signal are used to evaluate rates of evolution, to establish tests of correlation between two traits, or to investigate to what degree the phylogeny of a species assemblage is predictive of a trait value. Being based on different approaches, these tests are generally thought to possess quite different statistical performances. In this article, we show that Blomberg et al.'s K and K*, the Abouheif index, Moran's I, and the Mantel correlation are all based on a cross-product statistic, and are thus all related to each other when they are associated with a permutation test of phylogenetic signal. What changes is only the way phylogenetic and trait similarities (or dissimilarities) among the tips of a phylogeny are computed. The definitions of the phylogenetic and trait-based (dis)similarities among tips thus determine the performance of the tests. We briefly discuss the biological and statistical consequences (in terms of power and Type I error) of the observed relatedness among the statistics used to test for phylogenetic signal. Blomberg et al.'s K* statistic appears to be one of the most efficient approaches for testing phylogenetic signal. When branch lengths are not available or not accurate, Abouheif's Cmean statistic is a powerful alternative to K*.
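As an illustration of the shared form the authors identify, a generic cross-product statistic and its permutation test can be sketched as follows. This is a hypothetical sketch: the proximity matrix `w` and trait values `z` are placeholders, and each named test differs only in how `w` and `z` are defined.

```python
import random

def cross_product_stat(w, z):
    """Generic cross-product statistic sum_ij w[i][j] * z[i] * z[j]
    over distinct tip pairs, given a proximity matrix w and trait
    values z."""
    n = len(z)
    return sum(w[i][j] * z[i] * z[j]
               for i in range(n) for j in range(n) if i != j)

def permutation_pvalue(w, z, n_perm=999, seed=1):
    """One-sided permutation test of phylogenetic signal: shuffle the
    trait values across tips and count permuted statistics at least
    as large as the observed one (observed arrangement included)."""
    rng = random.Random(seed)
    observed = cross_product_stat(w, z)
    count = 1
    perm = list(z)
    for _ in range(n_perm):
        rng.shuffle(perm)
        if cross_product_stat(w, perm) >= observed:
            count += 1
    return count / (n_perm + 1)
```

Swapping in phylogenetic distances versus shared branch lengths for `w`, or standardizing `z` differently, recovers the different named statistics.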
ERIC Educational Resources Information Center
LeMire, Steven D.
2010-01-01
This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…
Evaluation of Small-Sample Statistics that Test Whether Variables Measure the Same Trait.
ERIC Educational Resources Information Center
Rasmussen, Jeffrey Lee
1988-01-01
The performance of five small-sample statistics--proposed by F. M. Lord, W. Kristof, Q. McNemar, R. A. Forsyth and L. S. Feldt, and J. P. Braden--that test whether two variables measure the same trait except for measurement error was studied. Effects of non-normality were investigated. The McNemar statistic was the most powerful. (TJH)
Mnemonic Aids during Tests: Worthless Frivolity or Effective Tool in Statistics Education?
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.; Gorman, Jennifer
2012-01-01
Researchers have explored many pedagogical approaches in an effort to assist students in finding understanding and comfort in required statistics courses. This study investigates the impact of mnemonic aids used during tests on students' statistics course performance in particular. In addition, the present study explores several hypotheses that…
The Use of Person-Fit Statistics To Analyze Placement Tests.
ERIC Educational Resources Information Center
Dodeen, Hamzeh
Person fit is a statistical index that can be used as a direct measure to assess test accuracy by analyzing the response pattern of examinees and identifying those who misfit the testing model. This misfitting is a source of inaccuracy in estimating an individual's ability, and it decreases the expected criterion-related validity of the test being…
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra and Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper considers a new application: the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra and Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis
Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.
2014-01-01
Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489
Statistical Power of Randomization Tests Used with Multiple-Baseline Designs.
ERIC Educational Resources Information Center
Ferron, John; Sentovich, Chris
2002-01-01
Estimated statistical power for three randomization tests used with multiple-baseline designs using Monte Carlo methods. For an effect size of 0.5, none of the tests provided an adequate level of power, and for an effect size of 1.0, power was adequate for the Koehler-Levin test and the Marascuilo-Busk test only when the series length was long and…
ERIC Educational Resources Information Center
Thompson, Bruce
This paper evaluates the logic underlying various criticisms of statistical significance testing and makes specific recommendations for scientific and editorial practice that might better increase the knowledge base. Reliance on the traditional hypothesis testing model has led to a major bias against nonsignificant results and to misinterpretation…
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis
ERIC Educational Resources Information Center
Lin, Johnny; Bentler, Peter M.
2012-01-01
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra Bentler's…
ERIC Educational Resources Information Center
Luh, Wei-Ming; Guo, Jiin-Huarng
2002-01-01
Used Johnson's transformation (N. Johnson, 1978) with approximate test statistics to test the homogeneity of simple linear regression slopes in the presence of nonnormality and Type I, Type II or complete heteroscedasticity. Computer simulations show that the proposed techniques can control Type I error under various circumstances. (SLD)
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Statistical Techniques for Criterion-Referenced Tests. Final Report. October, 1976-October, 1977.
ERIC Educational Resources Information Center
Wilcox, Rand R.
Three statistical problems related to criterion-referenced testing are investigated: estimation of the likelihood of a false-positive or false-negative decision with a mastery test, estimation of true scores in the Compound Binomial Error Model, and comparison of the examinees to a control. Two methods for estimating the likelihood of…
Evaluating Two Models of Collaborative Tests in an Online Introductory Statistics Course
ERIC Educational Resources Information Center
Björnsdóttir, Auðbjörg; Garfield, Joan; Everson, Michelle
2015-01-01
This study explored the use of two different types of collaborative tests in an online introductory statistics course. A study was designed and carried out to investigate three research questions: (1) What is the difference in students' learning between using consensus and non-consensus collaborative tests in the online environment?, (2) What is…
Using Multiple DIF Statistics with the Same Items Appearing in Different Test Forms.
ERIC Educational Resources Information Center
Kubiak, Anna T.; Cowell, William R.
A procedure used to average several Mantel-Haenszel delta difference values for an item is described and evaluated. The differential item functioning (DIF) procedure used by the Educational Testing Service (ETS) is based on the Mantel-Haenszel statistical technique for studying matched groups. It is standard procedure at ETS to analyze test items…
The Comparability of the Statistical Characteristics of Test Items Generated by Computer Algorithms.
ERIC Educational Resources Information Center
Meisner, Richard; And Others
This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16…
ERIC Educational Resources Information Center
White, Desley
2015-01-01
Two practical activities are described, which aim to support critical thinking about statistics as they concern multiple outcomes testing. Formulae are presented in Microsoft Excel spreadsheets, which are used to calculate the inflation of error associated with the quantity of tests performed. This is followed by a decision-making exercise, where…
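The error inflation that the spreadsheet exercise computes follows from a standard identity: with m independent tests each run at level α, the chance of at least one false positive is 1 − (1 − α)^m. A minimal sketch of that calculation, with an illustrative Bonferroni correction (function names are placeholders, not the activity's actual spreadsheet formulas):

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests
```

For example, ten independent tests at α = 0.05 inflate the familywise error rate to about 0.40.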
What Are Null Hypotheses? The Reasoning Linking Scientific and Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Lawson, Anton E.
2008-01-01
We should dispense with use of the confusing term "null hypothesis" in educational research reports. To explain why the term should be dropped, the nature of, and relationship between, scientific and statistical hypothesis testing is clarified by explication of (a) the scientific reasoning used by Gregor Mendel in testing specific…
Nonclassicality tests by classical bounds on the statistics of multiple outcomes
Luis, Alfredo
2010-08-15
We derive simple practical tests revealing the quantum nature of states by the violation of classical upper bounds on the statistics of multiple outcomes of an observable. These criteria can be expressed in terms of the Kullback-Leibler divergence (or relative entropy). Nonclassicality tests for multiple outcomes can be satisfied by states that do not fulfill the corresponding single-outcome criteria.
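The relative-entropy quantity mentioned is the standard discrete Kullback-Leibler divergence; a minimal sketch, assuming the outcome statistics are given as finite probability lists:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q), in nats, between two
    discrete outcome distributions given as probability lists.
    Terms with p_i = 0 contribute nothing by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The divergence is zero exactly when the two distributions coincide, which is what makes it usable as a measure of violation of a classical bound.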
Steffen, Jason H.; Ford, Eric B.; Rowe, Jason F.; Fabrycky, Daniel C.; Holman, Matthew J.; Welsh, William F.; Borucki, William J.; Batalha, Natalie M.; Bryson, Steve; Caldwell, Douglas A.; Ciardi, David R.
2012-01-01
We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through Quarter six (Q6) of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.
Steffen, Jason H.; Ford, Eric B.; Rowe, Jason F.; Borucki, William J.; Bryson, Steve; Caldwell, Douglas A.; Jenkins, Jon M.; Koch, David G.; Sanderfer, Dwight T.; Seader, Shawn; Twicken, Joseph D.; Fabrycky, Daniel C.; Welsh, William F.; Batalha, Natalie M.; Ciardi, David R.; Prsa, Andrej
2012-09-10
We analyze the deviations of transit times from a linear ephemeris for the Kepler Objects of Interest (KOI) through quarter six of science data. We conduct two statistical tests for all KOIs and a related statistical test for all pairs of KOIs in multi-transiting systems. These tests identify several systems which show potentially interesting transit timing variations (TTVs). Strong TTV systems have been valuable for the confirmation of planets and their mass measurements. Many of the systems identified in this study should prove fruitful for detailed TTV studies.
Exact Statistical Tests for Heterogeneity of Frequencies Based on Extreme Values
Wu, Chih-Chieh; Grimson, Roger C.; Shete, Sanjay
2014-01-01
Sophisticated statistical analyses of incidence frequencies are often required for various epidemiologic and biomedical applications. Among the most commonly applied methods is Pearson's χ2 test, which is structured to detect non-specific anomalous patterns of frequencies and is useful for testing the significance of incidence heterogeneity. However, Pearson's χ2 test is not efficient for assessing whether the frequency in a particular cell (or class) can be attributed to chance alone. We recently developed statistical tests for detecting temporal anomalies of disease cases based on maximum and minimum frequencies; these tests are designed to test the significance of a particular high or low frequency. We show that our proposed methods are more sensitive and powerful for testing extreme cell counts than Pearson's χ2 test. We elucidated and illustrated the differences in sensitivity among our tests and Pearson's χ2 test by analyzing a data set of Langerhans cell histiocytosis cases and hypothetical variants of it. We also computed and compared the statistical power of these methods using various sets of cell numbers and alternative frequencies. Our study will provide investigators with useful guidelines for selecting the appropriate tests for their studies. PMID:25558124
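The Pearson χ2 statistic that serves as the baseline comparison has a simple form; a minimal sketch for a set of cells with known expected frequencies:

```python
def pearson_chi2(observed, expected):
    """Pearson's chi-square statistic for a set of cell frequencies:
    sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Because the statistic pools squared deviations over all cells, a single extreme cell can be diluted by many unremarkable ones, which is the inefficiency the abstract's extreme-value tests target.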
ERIC Educational Resources Information Center
McArthur, David; Chou, Chih-Ping
Diagnostic testing confronts several challenges at once, among which are issues of test interpretation and immediate modification of the test itself in response to the interpretation. Several methods are available for administering and evaluating a test in real-time, towards optimizing the examiner's chances of isolating a persistent pattern of…
Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan
2016-01-01
The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
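The ANOVA step in the workflow above can be sketched with a plain one-way F statistic. This is a simplified, single-factor illustration, not the authors' multi-factor analysis:

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic across several groups of test
    measurements: between-group mean square divided by
    within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F indicates that differences between test conditions dominate the replicate-to-replicate scatter within conditions.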
Wang, Q.; Denton, D.L.; Shukla, R.
2000-01-01
As a follow-up to the recommendations of the September 1995 SETAC Pellston Workshop on Whole Effluent Toxicity (WET) regarding test methods and appropriate endpoints, this paper discusses the applications and statistical properties of using a statistical criterion of minimum significant difference (MSD). The authors examined the upper limits of acceptable MSDs as an acceptance criterion in the case of normally distributed data. The implications of this approach are examined in terms of the false negative rate as well as the false positive rate. Results indicated that the proposed approach has reasonable statistical properties. Reproductive data from short-term chronic WET tests with Ceriodaphnia dubia were used to demonstrate the applications of the proposed approach. The data were collected by the North Carolina Department of Environment, Health, and Natural Resources (Raleigh, NC, USA) as part of their National Pollutant Discharge Elimination System program.
Green, John; Wheeler, James R
2013-11-15
Solvents are often used to aid test item preparation in aquatic ecotoxicity experiments. This paper discusses the practical, statistical, and regulatory considerations. The selection of the appropriate control (if a solvent is used) for statistical analysis is investigated using a database of 141 responses (endpoints) from 71 experiments. The advantages and disadvantages of basing the statistical analysis of treatment effects on the water control alone, the solvent control alone, combined controls, or a conditional strategy of combining controls when they are not statistically significantly different are assessed. The latter two approaches are shown to have distinct advantages. It is recommended that this approach continue to be the standard used for regulatory and research aquatic ecotoxicology studies. However, wherever technically feasible, a solvent should not be employed, or at least its concentration should be minimized.
Canton, S.P.
1994-12-31
Past studies have shown considerable variability in whole effluent toxicity tests in terms of LC50s and NOECs from reference toxicant tests. However, this approach cannot differentiate variability in the test organisms themselves from a variable response to a toxicant. A database of control treatments in chronic WET tests was constructed, allowing evaluation of the mean performance of the WET test organisms Ceriodaphnia dubia and Pimephales promelas when not subjected to chemical stress. Surrogate test series were then constructed by randomly selecting replicates from this control database. These surrogate test series were analyzed using standard EPA statistical procedures to determine NOECs for survival and both NOECs and IC25s for reproduction and growth. Since NOECs have a significance level (p) of 0.05, it follows that approximately 5% of the tests could "fail" simply due to chance, and this was, in fact, the case for these surrogate tests. The IC25 statistic is a linear interpolation technique, with 95% confidence intervals calculated through a bootstrap method; it does not have a statistical check for significance. With the IC25 statistic, 10.5% of the Ceriodaphnia tests indicated toxicity (i.e., an IC25 of less than 100% "effluent"), while this increased to 37% for fathead minnows. There appear to be fundamental flaws in the calculation of the IC25 statistic and its confidence intervals as currently provided in EPA documentation. Until these flaws are addressed, it is recommended that this method not be used in the analysis of chronic toxicity data.
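The linear-interpolation idea behind the IC25 can be sketched as follows. This is a simplified illustration: the EPA procedure additionally applies monotone smoothing to the responses and bootstrapped confidence intervals, both omitted here.

```python
def icp(concentrations, responses, p=25):
    """Simplified linear-interpolation estimate of the ICp: the
    concentration at which the mean response drops p percent below
    the control response. Assumes sorted concentrations (control
    first) and non-increasing responses."""
    target = responses[0] * (1 - p / 100.0)
    for (c0, r0), (c1, r1) in zip(
            zip(concentrations, responses),
            zip(concentrations[1:], responses[1:])):
        if r1 <= target <= r0:
            # linearly interpolate between the bracketing concentrations
            return c0 + (r0 - target) * (c1 - c0) / (r0 - r1)
    return None  # no p-percent reduction within the tested range
```

Note that nothing in this estimate tests whether the apparent reduction is distinguishable from control variability, which is the lack of a significance check the abstract criticizes.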
Statistical studies of animal response data from USF toxicity screening test method
NASA Technical Reports Server (NTRS)
Hilado, C. J.; Machado, A. M.
1978-01-01
Statistical examination of animal response data obtained using Procedure B of the USF toxicity screening test method indicates that the data deviate only slightly from a normal or Gaussian distribution. This slight departure from normality is not expected to invalidate conclusions based on theoretical statistics. Comparison of times to staggering, convulsions, collapse, and death as endpoints shows that time to death appears to be the most reliable endpoint because it offers the lowest probability of missed observations and premature judgements.
A New Test of the Statistical Nature of the Brightest Cluster Galaxies
NASA Astrophysics Data System (ADS)
Lin, Yen-Ting; Ostriker, Jeremiah P.; Miller, Christopher J.
2010-06-01
A novel statistic is proposed to examine the hypothesis that all cluster galaxies are drawn from the same luminosity distribution (LD). In such a "statistical model" of galaxy LD, the brightest cluster galaxies (BCGs) are simply the statistical extreme of the galaxy population. Using a large sample of nearby clusters, we show that BCGs in high luminosity clusters (e.g., L_tot >~ 4 × 10^11 h_70^-2 L_sun) are unlikely (probability <= 3 × 10^-4) to be drawn from the LD defined by all red cluster galaxies more luminous than M_r = -20. On the other hand, BCGs in less luminous clusters are consistent with being the statistical extreme. Applying our method to the second brightest galaxies, we show that they are consistent with being the statistical extreme, which implies that the BCGs are also distinct from non-BCG luminous, red, cluster galaxies. We point out some issues with the interpretation of the classical tests proposed by Tremaine & Richstone (TR) that are designed to examine the statistical nature of BCGs, investigate the robustness of both our statistical test and those of TR against difficulties in photometry of galaxies of large angular size, and discuss the implication of our findings on surveys that use the luminous red galaxies to measure the baryon acoustic oscillation features in the galaxy power spectrum.
New advances in methodology for statistical tests useful in geostatistical studies
Borgman, L.E.
1988-05-01
Methodology for statistical procedures to perform tests of hypothesis pertaining to various aspects of geostatistical investigations has been slow in developing. The correlated nature of the data precludes most classical tests and makes the design of new tests difficult. Recent studies have led to modifications of the classical t test which allow for the intercorrelation. In addition, results for certain nonparametric tests have been obtained. The conclusions of these studies provide a variety of new tools for the geostatistician in deciding questions on significant differences and magnitudes.
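One common style of modification for correlated data (an illustration of the idea, not necessarily the exact construction in the studies cited) deflates the sample size to an effective number of independent observations; for an AR(1) correlation ρ:

```python
import math

def effective_sample_size(n, rho):
    """Effective number of independent observations for n equally
    spaced values with lag-one (AR(1)) autocorrelation rho, using
    the common first-order approximation n * (1 - rho) / (1 + rho)."""
    return n * (1 - rho) / (1 + rho)

def adjusted_t(mean, mu0, sd, n, rho):
    """One-sample t statistic with the sample size deflated to
    account for positive spatial/serial correlation."""
    n_eff = effective_sample_size(n, rho)
    return (mean - mu0) / (sd / math.sqrt(n_eff))
```

With ρ = 0 the statistic reduces to the classical t; positive correlation shrinks the effective sample size and widens the implied uncertainty, which is what prevents the classical test from being overconfident on intercorrelated geostatistical data.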
A new model test in high energy physics in frequentist and Bayesian statistical formalisms
NASA Astrophysics Data System (ADS)
Kamenshchikov, A.
2017-01-01
Testing a new physical model against observed experimental data is a typical problem in modern high energy physics (HEP) experiments. The problem may be addressed with either of two alternative statistical formalisms, frequentist and Bayesian, both of which are widespread in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are applied to test a new model. The results are juxtaposed and shown to be consistent in this work. The effect of the systematic uncertainty treatment in the statistical analysis is also considered.
Optimization and statistical evaluation of dissolution tests for indinavir sulfate capsules.
Carvalho-Silva, B; Moreira-Campos, L M; Nunan, E A; Vianna-Soares, C D; Araujo-Alves, B L; Cesar, I C; Pianetti, G A
2004-11-01
An optimization, based on Student's t test, of dissolution test conditions for indinavir sulfate capsules is presented. Three dissolution media, including the one reported in the United States Pharmacopeial Forum, and two apparatus, paddle and basket, were applied. Two different indinavir sulfate capsule products, A and B, were evaluated. For a reliable statistical analysis, eighteen capsules were assayed in each condition based on the combination of dissolution medium and apparatus. All tested media were statistically equivalent (P > 0.05) for both drug products when the paddle apparatus was employed at a stirring speed of 50 rpm. The use of the basket apparatus at a stirring speed of 50 rpm caused a significant decrease in the percent of drug released for product B (P < 0.05). The best dissolution conditions tested for products A and B were applied to evaluate the capsules' dissolution profiles. Twelve dosage units were assayed, and the dissolution efficiency concept was used, for each condition, to obtain results with statistical significance (P > 0.05). Optimal conditions for the dissolution test were 900 ml of 0.1 M hydrochloric acid as the dissolution medium, a basket at 100 rpm stirring speed, and ultraviolet detection at 260 nm.
Lee, Chaeyoung
2012-11-01
Epistasis, which may explain a large portion of the phenotypic variation for complex economic traits of animals, has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions obtained by a Gibbs sampler. A simulation study was conducted to provide statistical powers under various unbalanced designs using this method. Data were simulated by combined designs of number of loci, within-genotype variance, and sample size in unbalanced designs with or without null combined-genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example of obtaining empirical statistical power estimates with a given sample size is provided under unbalanced designs. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.
Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.
Kim, Yuneung; Lim, Johan; Park, DoHwan
2015-11-01
In this paper, we study a nonparametric procedure to test independence of bivariate interval-censored data, for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do so, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and discordant pairs of data. The performance of the modified approach is illustrated by simulation studies and an application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b).
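For completely observed data, the Kendall's tau being modified counts concordant and discordant pairs directly. A minimal sketch of the classical version (the paper's score-based modification replaces these exact counts with expected counts under interval censoring):

```python
from itertools import combinations

def kendalls_tau(x, y):
    """Classical Kendall's tau for completely observed bivariate data:
    (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Under censoring, whether a given pair is concordant may be unknown, which is why the modified statistic works with expected rather than observed pair counts.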
Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.
Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas
2016-11-14
Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
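Two of the norm statistics discussed, Z-scores and percentile ranks, can be sketched as follows. This is a simple illustration of the point estimates; the paper's contribution is standard errors and confidence intervals for such statistics, which the sketch does not attempt.

```python
def z_score(score, mean, sd):
    """Standardized score relative to the norm group's mean and
    standard deviation."""
    return (score - mean) / sd

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given score,
    counting ties as half (a common convention)."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)
```

Because both quantities are computed from a norm sample, they inherit its sampling error, which is the uncertainty the derived standard errors quantify.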
NASA Technical Reports Server (NTRS)
Colvin, E. L.; Emptage, M. R.
1992-01-01
The breaking load test provides quantitative stress corrosion cracking data by determining the residual strength of tension specimens that have been exposed to corrosive environments. Eight laboratories have participated in a cooperative test program under the auspices of ASTM Committee G-1 to evaluate the new test method. All eight laboratories were able to distinguish between three tempers of aluminum alloy 7075. The statistical analysis procedures that were used in the test program do not work well in all situations. An alternative procedure using Box-Cox transformations shows a great deal of promise. An ASTM standard method has been drafted which incorporates the Box-Cox procedure.
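The Box-Cox family used in the alternative analysis procedure has the standard one-parameter form; a minimal sketch:

```python
import math

def box_cox(x, lam):
    """One-parameter Box-Cox transformation of a positive value x:
    (x**lam - 1) / lam, with the log limit at lam = 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam
```

In practice the exponent is chosen (e.g., by maximum likelihood) so the transformed residual-strength data are closer to normal before the group comparisons are made.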
Diagnosing Skills of Statistical Hypothesis Testing Using the Rule Space Method
ERIC Educational Resources Information Center
Im, Seongah; Yin, Yue
2009-01-01
This study illustrated the use of the Rule Space Method to diagnose students' proficiencies in skills and knowledge of statistical hypothesis testing. Participants included 96 undergraduate and graduate students, of whom 94 were classified into one or more of the knowledge states identified by the rule space analysis. Analysis at the level of…
Alphas and Asterisks: The Development of Statistical Significance Testing Standards in Sociology
ERIC Educational Resources Information Center
Leahey, Erin
2005-01-01
In this paper, I trace the development of statistical significance testing standards in sociology by analyzing data from articles published in two prestigious sociology journals between 1935 and 2000. I focus on the role of two key elements in the diffusion literature, contagion and rationality, as well as the role of institutional factors. I…
ERIC Educational Resources Information Center
Ho, Andrew D.; Yu, Carol C.
2015-01-01
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological…
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.
Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J
2015-07-01
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
A Short-Cut Statistic for Item Analysis of Mastery Tests: A Comparison of Three Procedures.
ERIC Educational Resources Information Center
Subkoviak, Michael J.; Harris, Deborah J.
This study examined three statistical methods for selecting items for mastery tests. One is the pretest-posttest method due to Cox and Vargas (1966); it is computationally simple, but has a number of serious limitations. The second is a latent trait method recommended by van der Linden (1981); it is computationally complex, but has a number of…
Identifying Local Dependence with a Score Test Statistic Based on the Bifactor Logistic Model
ERIC Educational Resources Information Center
Liu, Yang; Thissen, David
2012-01-01
Local dependence (LD) refers to the violation of the local independence assumption of most item response models. Statistics that indicate LD between a pair of items on a test or questionnaire that is being fitted with an item response model can play a useful diagnostic role in applications of item response theory. In this article, a new score test…
A statistical comparison of impact and ambient testing results from the Alamosa Canyon Bridge
Doebling, S.W.; Farrar, C.R.; Cornwell, P.
1996-12-31
In this paper, the modal properties of the Alamosa Canyon Bridge obtained using ambient data are compared to those obtained from impact hammer vibration tests. Using ambient sources of excitation to determine the modal characteristics of large civil engineering structures is desirable for several reasons. The forced vibration testing of such structures generally requires a large amount of specialized equipment and trained personnel making the tests quite expensive. Also, an automated health monitoring system for a large civil structure will most likely use ambient excitation. A modal identification procedure based on a statistical Monte Carlo analysis using the Eigensystem Realization Algorithm is used to compute the modal parameters and their statistics. The results show that for most of the measured modes, the differences between the modal frequencies of the ambient and hammer data sets are statistically significant. However, the differences between the corresponding damping ratio results are not statistically significant. Also, one of the modes identified from the hammer test data was not identifiable from the ambient data set.
Connecting Science and Mathematics: The Nature of Scientific and Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Lawson, Anton E.; Oehrtman, Michael; Jensen, Jamie
2008-01-01
Confusion persists concerning the roles played by scientific hypotheses and predictions in doing science. This confusion extends to the nature of scientific and statistical hypothesis testing. The present paper utilizes the "If/and/then/Therefore" pattern of hypothetico-deductive (HD) reasoning to explicate the nature of both scientific and…
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.
ERIC Educational Resources Information Center
Deegear, James
This paper summarizes the literature regarding statistical significance testing, with an emphasis on recent literature in various disciplines and on literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…
A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.
ERIC Educational Resources Information Center
Liu, Tung; Stone, Courtenay C.
1999-01-01
Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…
An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic
ERIC Educational Resources Information Center
Maeda, Hotaka; Zhang, Bo
2017-01-01
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
Detecting trends in raptor counts: power and type I error rates of various statistical tests
Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.
1996-01-01
We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
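The simulation design above can be sketched for one of the seven tests, the linear-regression t-test on log counts; the numbers mirror the stated design (CV = 40%, lognormal sampling error, 1,000 replications, upper-tailed test), while the starting count level is an illustrative assumption:

```python
import numpy as np
from scipy import stats

def trend_power(n_years=10, trend=0.05, cv=0.40, alpha=0.05,
                n0=100, reps=1000, seed=1):
    """Monte Carlo power of the upper-tailed linear-regression t-test
    on log counts, for an exponential trend with lognormal sampling error."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    mean = n0 * (1 + trend) ** years          # exponential trend in counts
    sigma = np.sqrt(np.log(1 + cv**2))        # lognormal sigma giving CV = 0.40
    rejections = 0
    for _ in range(reps):
        counts = mean * rng.lognormal(-sigma**2 / 2, sigma, n_years)
        res = stats.linregress(years, np.log(counts))
        t = res.slope / res.stderr
        if t > stats.t.ppf(1 - alpha, n_years - 2):  # upper-tailed rejection
            rejections += 1
    return rejections / reps
```

Running this with `trend=0.0` recovers the nominal alpha-level (no autocorrelation here), and increasing `n_years` from 10 to 50 reproduces the large power gain reported above.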
Halpin, Peter F; Stam, Henderikus J
2006-01-01
The application of statistical testing in psychological research over the period of 1940-1960 is examined in order to address psychologists' reconciliation of the extant controversy between the Fisher and Neyman-Pearson approaches. Textbooks of psychological statistics and the psychological journal literature are reviewed to examine the presence of what Gigerenzer (1993) called a hybrid model of statistical testing. Such a model is present in the textbooks, although the mathematically incomplete character of this model precludes the appearance of a similarly hybridized approach to statistical testing in the research literature. The implications of this hybrid model for psychological research and the statistical testing controversy are discussed.
Reproducibility-optimized test statistic for ranking genes in microarray studies.
Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero
2008-01-01
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on the Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.
Testing statistical self-similarity in the topology of river networks
NASA Astrophysics Data System (ADS)
Mantilla, Ricardo; Troutman, Brent M.; Gupta, Vijay K.
2010-09-01
Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
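The generator test described above can be sketched as a chi-square goodness-of-fit test against a geometric distribution; the sample below is simulated rather than basin data, and the 0-based support and binning are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# hypothetical interior-generator sample (support 0, 1, 2, ...)
sample = rng.geometric(p=0.6, size=200) - 1   # shift scipy's 1-based draws to 0

p_hat = 1.0 / (1.0 + sample.mean())           # MLE for geometric on 0, 1, 2, ...
k = np.arange(0, 6)
expected = 200 * p_hat * (1 - p_hat) ** k     # pmf: p (1 - p)^k
expected[-1] = 200 - expected[:-1].sum()      # pool the upper tail into one bin
observed = np.array([(sample == i).sum() for i in k[:-1]] + [(sample >= 5).sum()])

# one df lost for the estimated parameter, one for the totals constraint
chi2 = ((observed - expected) ** 2 / expected).sum()
p_value = stats.chi2.sf(chi2, df=len(k) - 2)
```

A small p-value would reject geometric generators for that basin; repeating this per basin mirrors the 26-of-30 tally reported above.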
NASA Astrophysics Data System (ADS)
Coelho, Carlos A.; Marques, Filipe J.
2013-09-01
In this paper the authors combine the equicorrelation and equivariance test introduced by Wilks [13] with the likelihood ratio test (l.r.t.) for independence of groups of variables to obtain the l.r.t. of block equicorrelation and equivariance. This test, or its single-block version, may find applications in many areas, such as psychology, education, medicine, and genetics, since such tests are important "in many tests of multivariate analysis, e.g. in MANOVA, Profile Analysis, Growth Curve analysis, etc" [12, 9]. By decomposing the overall hypothesis into the hypothesis of independence of groups of variables and the hypothesis of equicorrelation and equivariance, we are able to obtain the expressions for the overall l.r.t. statistic and its moments. From these we obtain a suitable factorization of the characteristic function (c.f.) of the logarithm of the l.r.t. statistic, which enables us to develop highly manageable and precise near-exact distributions for the test statistic.
Obuchowski, Nancy A; Buckler, Andrew; Kinahan, Paul; Chen-Mayer, Heather; Petrick, Nicholas; Barboriak, Daniel P; Bullen, Jennifer; Barnhart, Huiman; Sullivan, Daniel C
2016-04-01
A major initiative of the Quantitative Imaging Biomarker Alliance is to develop standards-based documents called "Profiles," which describe one or more technical performance claims for a given imaging modality. The term "actor" denotes any entity (device, software, or person) whose performance must meet certain specifications for the claim to be met. The objective of this paper is to present the statistical issues in testing actors' conformance with the specifications. In particular, we present the general rationale and interpretation of the claims, the minimum requirements for testing whether an actor achieves the performance requirements, the study designs used for testing conformity, and the statistical analysis plan. We use three examples to illustrate the process: apparent diffusion coefficient in solid tumors measured by MRI, change in Perc 15 as a biomarker for the progression of emphysema, and percent change in solid tumor volume by computed tomography as a biomarker for lung cancer progression.
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
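How the two strategies can diverge is easy to show on a toy normal-mean problem; the prior scale `tau` below is an assumption for illustration, not a recommendation:

```python
import numpy as np
from scipy import stats

# hypothetical data: n observations with known sd = 1, testing H0: mu = 0
n, xbar, sd = 50, 0.25, 1.0
se = sd / np.sqrt(n)

z = xbar / se
p_value = 2 * stats.norm.sf(abs(z))           # the binary-decision route

# Bayes factor for H0 vs H1 with mu ~ N(0, tau^2) under H1 (tau assumed)
tau = 1.0
m0 = stats.norm.pdf(xbar, 0, se)                        # marginal under H0
m1 = stats.norm.pdf(xbar, 0, np.sqrt(se**2 + tau**2))   # marginal under H1
bf01 = m0 / m1
```

Here the p-value hovers near the conventional threshold while the Bayes factor mildly favors the null, illustrating why the two tools should not be read as interchangeable.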
Schoenberg, Mike R; Dawson, Kyra A; Duff, Kevin; Patton, Doyle; Scott, James G; Adams, Russell L
2006-10-01
The Rey Auditory Verbal Learning Test [RAVLT; Rey, A. (1941). L'examen psychologique dans les cas d'encéphalopathie traumatique. Archives de Psychologie, 28, 21] is a commonly used neuropsychological measure that assesses verbal learning and memory. Normative data have been compiled [Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles, CA: Western Psychological Services]. When assessing an individual suspected of neurological dysfunction, useful comparisons include the extent that the patient deviates from healthy peers and also how closely the subject's performance matches those with known brain injury. This study provides the means and S.D.'s of 392 individuals with documented neurological dysfunction [closed head TBI (n=68), neoplasms (n=57), stroke (n=47), Dementia of the Alzheimer's type (n=158), and presurgical epilepsy left seizure focus (n=28), presurgical epilepsy right seizure focus (n=34)] and 122 patients with no known neurological dysfunction and psychiatric complaints. Patients were stratified into three age groups, 16-35, 36-59, and 60-88. Data were provided for trials I-V, List B, immediate recall, 30-min delayed recall, and recognition. Classification characteristics of the RAVLT using [Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles, CA: Western Psychological Services] meta-norms found the RAVLT to best distinguish patients suspected of Alzheimer's disease from the psychiatric comparison group.
An Application of M[subscript 2] Statistic to Evaluate the Fit of Cognitive Diagnostic Models
ERIC Educational Resources Information Center
Liu, Yanlou; Tian, Wei; Xin, Tao
2016-01-01
The fit of cognitive diagnostic models (CDMs) to response data needs to be evaluated, since CDMs might yield misleading results when they do not fit the data well. Limited-information statistic M[subscript 2] and the associated root mean square error of approximation (RMSEA[subscript 2]) in item factor analysis were extended to evaluate the fit of…
An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.
Kim, Junghi; Bai, Yun; Pan, Wei
2015-12-01
We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
Score statistic to test for genetic correlation for proband-family design.
el Galta, R; van Duijn, C M; van Houwelingen, J C; Houwing-Duistermaat, J J
2005-07-01
In genetic epidemiological studies informative families are often oversampled to increase the power of a study. For a proband-family design, where relatives of probands are sampled, we derive the score statistic to test for clustering of binary and quantitative traits within families due to genetic factors. The derived score statistic is robust to ascertainment scheme. We considered correlation due to unspecified genetic effects and/or due to sharing alleles identical by descent (IBD) at observed marker locations in a candidate region. A simulation study was carried out to study the distribution of the statistic under the null hypothesis in small data-sets. To illustrate the score statistic, data from 33 families with type 2 diabetes mellitus (DM2) were analyzed. In addition to the binary outcome DM2 we also analyzed the quantitative outcome, body mass index (BMI). For both traits familial aggregation was highly significant. For DM2, also including IBD sharing at marker D3S3681 as a cause of correlation gave an even more significant result, which suggests the presence of a trait gene linked to this marker. We conclude that for the proband-family design the score statistic is a powerful and robust tool for detecting clustering of outcomes.
Ahn, Soyeon; Park, Seong Ho; Lee, Kyoung Ho
2013-05-01
Demonstrating similarity between compared groups--that is, equivalence or noninferiority of the outcome of one group to the outcome of another group--requires a different analytic approach than determining the difference between groups--that is, superiority of one group over another. Neither a statistically significant difference between groups (P < .05) nor a lack of significant difference (P ≥ .05) from conventional statistical tests provides answers about equivalence/noninferiority. Statistical testing of equivalence/noninferiority generally uses a confidence interval, where equivalence/noninferiority is claimed when the confidence interval of the difference in outcome between compared groups is within a predetermined equivalence/noninferiority margin that represents a clinically or scientifically acceptable range of differences and is typically described by Δ. The equivalence/noninferiority margin should be justified both clinically and statistically, considering the loss in the main outcome and the compensatory gain, and be chosen conservatively to avoid making a false claim of equivalence/noninferiority for an inferior outcome. Sample size estimation needs to be specified for equivalence/noninferiority design, considering Δ in addition to other general factors. The need for equivalence/noninferiority research studies is expected to increase in radiology, and a good understanding of the fundamental principles of the methodology will be helpful for conducting as well as for interpreting such studies.
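The confidence-interval test described above can be sketched for comparing two proportions; the counts and the margin Δ below are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# hypothetical accuracy: new test 181/200 correct, standard test 175/200
x_new, n_new, x_std, n_std = 181, 200, 175, 200
p_new, p_std = x_new / n_new, x_std / n_std
delta_margin = -0.05     # noninferiority margin: at most 5 points worse

diff = p_new - p_std
se = np.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
ci_low = diff - stats.norm.ppf(0.975) * se    # lower bound of the 95% CI

# noninferior only if the entire interval lies above the margin
noninferior = ci_low > delta_margin
```

Note that the decision turns on the interval bound, not on a conventional two-sided p-value for the difference.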
Jenkinson, Garrett; Goutsias, John
2013-05-28
The master equation is used extensively to model chemical reaction systems with stochastic dynamics. However, and despite its phenomenological simplicity, it is not in general possible to compute the solution of this equation. Drawing exact samples from the master equation is possible, but can be computationally demanding, especially when estimating high-order statistical summaries or joint probability distributions. As a consequence, one often relies on analytical approximations to the solution of the master equation or on computational techniques that draw approximative samples from this equation. Unfortunately, it is not in general possible to check whether a particular approximation scheme is valid. The main objective of this paper is to develop an effective methodology to address this problem based on statistical hypothesis testing. By drawing a moderate number of samples from the master equation, the proposed techniques use the well-known Kolmogorov-Smirnov statistic to reject the validity of a given approximation method or accept it with a certain level of confidence. Our approach is general enough to deal with any master equation and can be used to test the validity of any analytical approximation method or any approximative sampling technique of interest. A number of examples, based on the Schlögl model of chemistry and the SIR model of epidemiology, clearly illustrate the effectiveness and potential of the proposed statistical framework.
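The proposed check can be sketched on a toy case: a birth-death process whose stationary master-equation solution is Poisson, with a Gaussian approximation standing in for an approximative sampler (the rate ratio and sample sizes are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# exact stationary solution of a linear birth-death master equation:
# Poisson with mean lambda = k_birth / k_death
lam = 25.0
exact = rng.poisson(lam, size=500)

# approximate sampler: Gaussian (linear-noise-style) approximation
approx = rng.normal(lam, np.sqrt(lam), size=500)

# two-sample Kolmogorov-Smirnov test: a small p-value rejects the
# approximation scheme at the chosen confidence level
stat, p = stats.ks_2samp(exact, approx)
```

For nonstationary settings one would compare samples at a fixed time point; the KS machinery is unchanged.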
ERIC Educational Resources Information Center
Huberty, Carl J.
1993-01-01
Twenty-eight books published from 1910 to 1949, 19 books published from 1990 to 1992, and 5 multiple edition books were reviewed to examine the presentations of statistical testing, particularly coverage of the p-value and fixed-alpha approaches. Statistical testing itself is not at fault, but some textbook presentations, testing practices, and…
Taroni, F; Biedermann, A; Bozza, S
2016-02-01
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons about their advantages and drawbacks are widely available and continue to fuel extensive debate in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on the so-called p-values, it is of interest to expose the discussion of this journal's decision within the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.
Statistical study of thermal fracture of ceramic materials in the water quench test
NASA Technical Reports Server (NTRS)
Rogers, Wayne P.; Emery, Ashley F.; Bradt, Richard C.; Kobayashi, Albert S.
1987-01-01
The Weibull statistical theory of fracture was applied to thermal shock of ceramics in the water quench test. Transient thermal stresses and probability of failure were calculated for a cylindrical specimen cooled by convection. The convective heat transfer coefficient was calibrated using the time to failure which was measured with an acoustic emission technique. Theoretical failure probability distributions as a function of time and quench temperature compare favorably with experimental results for three high-alumina ceramics and a glass.
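The Weibull failure-probability calculation at the core of this approach can be sketched as follows; the scale stress and Weibull modulus are illustrative assumptions, not the paper's calibrated parameters:

```python
import numpy as np

def weibull_failure_probability(sigma, sigma_0=200.0, m=10.0):
    """Two-parameter Weibull probability of fracture at peak thermal
    stress sigma (MPa): P_f = 1 - exp(-(sigma / sigma_0)^m).
    sigma_0 (scale) and m (modulus) are assumed values."""
    return 1.0 - np.exp(-np.power(np.asarray(sigma) / sigma_0, m))

# failure probability rises steeply with the transient thermal stress
stresses = np.array([100.0, 150.0, 200.0, 250.0])
p_fail = weibull_failure_probability(stresses)
```

In the quench setting, the transient stress history enters through sigma(t), so the probability of failure becomes a function of time after immersion.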
Statistical analysis of the hen's egg test for micronucleus induction (HET-MN assay).
Hothorn, Ludwig A; Reisinger, Kerstin; Wolf, Thorsten; Poth, Albrecht; Fieblinger, Dagmar; Liebsch, Manfred; Pirow, Ralph
2013-09-18
The HET-MN assay (hen's egg test for micronucleus induction) is different from other in vitro genotoxicity assays in that it includes toxicologically important features such as absorption, distribution, metabolic activation, and excretion of the test compound. As a promising follow-up to complement existing in vitro test batteries for genotoxicity, the HET-MN is currently undergoing a formal validation. To optimize the validation, the present study describes a critical analysis of previously obtained HET-MN data to check the experimental design and to identify the most appropriate statistical procedure to evaluate treatment effects. Six statistical challenges (I-VI) of general relevance were identified, and remedies were provided which can be transferred to similarly designed test methods: a Williams-type trend test is proposed for overdispersed counts (II) by means of a square-root transformation which is robust for small sample sizes (I), variance heterogeneity (III), and possible downturn effects at high doses (IV). Due to near-to-zero or even zero-count data occurring in the negative control (V), a conditional comparison of the treatment groups against the mean of the historical controls (VI) instead of the concurrent control was proposed, which is in accordance with US-FDA recommendations. For the modified Williams-type tests, the power can be estimated depending on the magnitude and shape of the trend, the number of dose groups, and the magnitude of the MN counts in the negative control. The experimental design used previously (i.e. six eggs per dose group, scoring of 1000 cells per egg) was confirmed. The proposed approaches are easily available in the statistical computing environment R, and the corresponding R-codes are provided.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
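One of the correlation procedures can be sketched outside G*Power using the Fisher z transformation (a standard normal approximation, not G*Power's exact routine):

```python
import numpy as np
from scipy import stats

def correlation_power(r_alt, n, alpha=0.05):
    """Approximate power of the two-sided test of H0: rho = 0 via the
    Fisher z transformation: atanh(r) is roughly N(atanh(rho), 1/(n-3))."""
    z_alt = np.arctanh(r_alt)            # Fisher z of the alternative
    se = 1.0 / np.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    shift = z_alt / se                   # noncentrality of the test statistic
    return stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)

power = correlation_power(r_alt=0.3, n=100)
```

For r = 0.3 and n = 100 this gives power of roughly 0.86, in line with the usual tabled values for a two-tailed test at alpha = 0.05.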
Statistical correlation analysis for comparing vibration data from test and analysis
NASA Technical Reports Server (NTRS)
Butler, T. G.; Strang, R. F.; Purves, L. R.; Hershfeld, D. J.
1986-01-01
A theory was developed to compare vibration modes obtained by NASTRAN analysis with those obtained experimentally. Because many more analytical modes can be obtained than experimental modes, the analytical set was treated as expansion functions for putting both sources in comparative form. The dimensional symmetry was developed for three general cases: nonsymmetric whole model compared with a nonsymmetric whole structural test, symmetric analytical portion compared with a symmetric experimental portion, and analytical symmetric portion with a whole experimental test. The theory was coded and a statistical correlation program was installed as a utility. The theory is established with small classical structures.
Case Studies for the Statistical Design of Experiments Applied to Powered Rotor Wind Tunnel Tests
NASA Technical Reports Server (NTRS)
Overmeyer, Austin D.; Tanner, Philip E.; Martin, Preston B.; Commo, Sean A.
2015-01-01
The application of statistical Design of Experiments (DOE) to helicopter wind tunnel testing was explored during two powered rotor wind tunnel entries during the summers of 2012 and 2013. These tests were performed jointly by the U.S. Army Aviation Development Directorate Joint Research Program Office and NASA Rotary Wing Project Office, currently the Revolutionary Vertical Lift Project, at NASA Langley Research Center located in Hampton, Virginia. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small portion of the overall tests devoted to developing case studies of the DOE approach as it applies to powered rotor testing. A 16-47 times reduction in the number of data points required was estimated by comparing the DOE approach to conventional testing methods. The average error for the DOE surface response model for the OH-58F test was 0.95 percent and 4.06 percent for drag and download, respectively. The DOE surface response model of the Active Flow Control test captured the drag within 4.1 percent of measured data. The operational differences between the two testing approaches are identified, but did not prevent the safe operation of the powered rotor model throughout the DOE test matrices.
Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test
NASA Astrophysics Data System (ADS)
Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta
2002-05-01
The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive
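The boundary failure the authors describe is easy to reproduce numerically. The sketch below is an illustrative toy, not the paper's spectral-line setup: it tests H0: μ = 0 against the one-sided alternative μ ≥ 0 for Gaussian data with known variance, so the null lies on the boundary of the parameter space. The LRT statistic then follows a 50:50 mixture of χ²₀ and χ²₁ rather than the nominal χ²₁, and the realized false-positive rate at the nominal 5% cutoff is only about half of 5%.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 200_000

# Simulate sample means under H0: mu = 0, sigma = 1 (known variance)
xbar = rng.standard_normal((reps, n)).mean(axis=1)

# Under the one-sided alternative mu >= 0, the MLE is max(0, xbar),
# so the LRT statistic is n * max(0, xbar)^2
T = n * np.maximum(0.0, xbar) ** 2

crit = 3.841  # nominal chi^2_1 critical value at alpha = 0.05
rate = (T > crit).mean()
print(rate)  # ≈ 0.025: half the nominal rate, a 50:50 chi^2_0 / chi^2_1 mixture
```

Blindly reading the statistic against a χ²₁ table here would report a 5% false-positive rate when the true rate is 2.5%; in more complex boundary problems the error can go in either direction, which is why the authors recommend calibrating by simulation or posterior predictive checks.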
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST. PMID:21607077
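The pattern probed in this study can be reproduced with two invented studies of the kind described (all numbers below are hypothetical, chosen only for illustration): one result is statistically significant, the other is not, yet a direct test of the difference between the two estimates shows they are entirely consistent.

```python
import math

def ci95(est, se):
    """Normal-approximation 95% confidence interval."""
    return est - 1.96 * se, est + 1.96 * se

est1, se1 = 0.32, 0.15   # fictitious study 1: z = 2.13, "significant"
est2, se2 = 0.26, 0.16   # fictitious study 2: z = 1.63, "non-significant"

sig1 = abs(est1 / se1) > 1.96
sig2 = abs(est2 / se2) > 1.96

# The correct comparison: test the *difference* between the two estimates
z_diff = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
print(sig1, sig2, round(z_diff, 2))  # True False 0.27
```

The two confidence intervals, (0.03, 0.61) and (-0.05, 0.57), overlap almost completely; concluding the studies "conflict" because one crossed the significance threshold and the other did not is exactly the fallacy the survey documents.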
Gershgorin, B.; Majda, A.J.
2011-02-20
A statistically exactly solvable model for passive tracers is introduced as a test model for the authors' Nonlinear Extended Kalman Filter (NEKF) as well as other filtering algorithms. The model involves a Gaussian velocity field and a passive tracer governed by the advection-diffusion equation with an imposed mean gradient. The model has direct relevance to engineering problems such as the spread of pollutants in the air or contaminants in the water as well as climate change problems concerning the transport of greenhouse gases such as carbon dioxide with strongly intermittent probability distributions consistent with the actual observations of the atmosphere. One of the attractive properties of the model is the existence of the exact statistical solution. In particular, this unique feature of the model provides an opportunity to design and test fast and efficient algorithms for real-time data assimilation based on rigorous mathematical theory for a turbulence model problem with many active spatiotemporal scales. Here, we extensively study the performance of the NEKF which uses the exact first and second order nonlinear statistics without any approximations due to linearization. The role of partial and sparse observations, the frequency of observations and the observation noise strength in recovering the true signal, its spectrum, and fat tail probability distribution are the central issues discussed here. The results of our study provide useful guidelines for filtering realistic turbulent systems with passive tracers through partial observations.
Statistical Tests of System Linearity Based on the Method of Surrogate Data
Hunter, N.; Paez, T.; Red-Horse, J.
1998-11-04
When dealing with measured data from dynamic systems we often make the tacit assumption that the data are generated by linear dynamics. While some systematic tests for linearity and determinism are available - for example the coherence function, the probability density function, and the bispectrum - further tests that quantify the existence and the degree of nonlinearity are clearly needed. In this paper we demonstrate a statistical test for the nonlinearity exhibited by a dynamic system excited by Gaussian random noise. We perform the usual division of the input and response time series data into blocks as required by the Welch method of spectrum estimation and search for significant relationships between a given input frequency and response at harmonics of the selected input frequency. We argue that systematic tests based on the recently developed statistical method of surrogate data readily detect significant nonlinear relationships. The paper elucidates the method of surrogate data. Typical results are illustrated for a linear single degree-of-freedom system and for a system with polynomial stiffness nonlinearity.
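The core construction behind surrogate-data testing can be sketched generically (this is the standard phase-randomization recipe, not the paper's specific harmonic-coherence statistic): a surrogate preserves the power spectrum, and hence all linear correlations, of the original series while destroying any nonlinear structure carried in the Fourier phases.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Return a series with the same power spectrum as x but random phases."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.size)
    phases[0] = 0.0                 # keep the zero-frequency term real
    if x.size % 2 == 0:
        phases[-1] = 0.0            # Nyquist bin must also stay real for even n
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)       # stand-in for a measured response series
s = phase_randomized_surrogate(x, rng)
# Any statistic sensitive only to linear structure has the same distribution
# on x and on an ensemble of such surrogates; a data value falling far outside
# the surrogate distribution signals nonlinearity.
```

In practice one generates many surrogates, computes the nonlinearity statistic on each, and rejects linearity when the statistic for the measured data lies outside the surrogate ensemble's range.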
Feyissa, Daniel D.; Aher, Yogesh D.; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within and between test correlations often constitute a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal component analysis) to analyse the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rota rod). The high dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety-, and depression-like behavior and motor ability. The loading scores of individual rats on these different components allow an assessment and the distribution of individual features in a population of animals. The reduced number of components can be used also for statistical calculations like appropriate sample sizes for valid discriminations between experimental groups, which otherwise have to be done on each variable. Because the animals were intact, untreated and experimentally naïve the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information of the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to gain a valid assessment. PMID:28261069
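The dimension-reduction step the authors describe can be sketched with a plain SVD-based principal component analysis. The data matrix below is synthetic; only its shape mirrors the study's 162 animals, and the induced correlation between two columns stands in for correlated battery variables.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical battery: 162 rats x 8 standardized behavioral variables
X = rng.standard_normal((162, 8))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * rng.standard_normal(162)  # correlated pair

Xc = X - X.mean(axis=0)                 # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()     # variance share per component
scores = Xc @ Vt.T                      # each rat's score on each component
print(explained.round(2))
```

The rows of `scores` give each animal's position on the reduced components (e.g., "general activity" or "anxiety-like behavior" in the study), which is what allows individual trait profiles to be compared across a population.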
A statistical test for the equality of differently adjusted incidence rate ratios.
Hoffmann, Kurt; Pischon, Tobias; Schulz, Mandy; Schulze, Matthias B; Ray, Jennifer; Boeing, Heiner
2008-03-01
An incidence rate ratio (IRR) is a meaningful effect measure in epidemiology if it is adjusted for all important confounders. For evaluation of the impact of adjustment, adjusted IRRs should be compared with crude IRRs. The aim of this methodological study was to present a statistical approach for testing the equality of adjusted and crude IRRs and to derive a confidence interval for the ratio of the two IRRs. The method can be extended to compare two differently adjusted IRRs and, thus, to evaluate the effect of additional adjustment. The method runs immediately on existing software. To illustrate the application of this approach, the authors studied adjusted IRRs for two risk factors of type 2 diabetes using data from the European Prospective Investigation into Cancer and Nutrition-Potsdam Study from 2005. The statistical method described may be helpful as an additional tool for analyzing epidemiologic cohort data and for interpreting results obtained from Cox regression models with adjustment for different covariates.
A Test By Any Other Name: P-values, Bayes Factors and Statistical Inference
Stern, Hal S.
2016-01-01
The exchange between Hoijtink, van Kooten and Hulsker (in press) (HKH) and Morey, Wagenmakers, and Rouder (in press) (MWR) in this issue is focused on the use of Bayes factors for statistical inference but raises a number of more general questions about Bayesian and frequentist approaches to inference. This note addresses recent negative attention directed at p-values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye towards better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required. PMID:26881954
Statistical auditing and randomness test of lotto k/N-type games
NASA Astrophysics Data System (ADS)
Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.
2008-11-01
One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
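The moments used for the audit are standard for sampling without replacement: each drawn number is marginally uniform on 1..N, so E[X] = (N+1)/2, Var(X) = (N²−1)/12, and Cov(Xi, Xj) = −(N+1)/12 for distinct draws. The sketch below checks the first two against a simulated 6/49-style game (the parameters are illustrative, not tied to any particular national lottery).

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, draws = 49, 6, 100_000

# Each row: k distinct numbers drawn uniformly from 1..N without replacement
sample = np.argsort(rng.random((draws, N)), axis=1)[:, :k] + 1

emp_mean = sample.mean()
emp_var = sample.var()
theo_mean = (N + 1) / 2            # 25.0
theo_var = (N ** 2 - 1) / 12       # 200.0
theo_cov = -(N + 1) / 12           # about -4.17, for two distinct draws
print(round(emp_mean, 2), round(emp_var, 1))
```

Auditing a real lottery would replace `sample` with the historical draw records and compare the empirical moments against these hypergeometric-model values, flagging deviations larger than sampling error allows.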
Statistical analysis of aquifer-test results for nine regional aquifers in Louisiana
Martin, Angel; Early, D.A.
1987-01-01
This report, prepared as part of the Gulf Coast Regional Aquifer-System Analysis project, presents a compilation, summarization, and statistical analysis of aquifer-test results for nine regional aquifers in Louisiana. These are, from youngest to oldest: the alluvial, Pleistocene, Evangeline, Jasper, Catahoula, Cockfield, Sparta, Carrizo, and Wilcox aquifers. Approximately 1,500 aquifer tests in U.S. Geological Survey files in Louisiana were examined and 1,001 were input to a computer file. Analysis of the aquifer-test results and plots that describe aquifer hydraulic characteristics were made for each regional aquifer. Results indicate that, on average, permeability (hydraulic conductivity) generally tends to decrease from the youngest aquifers to the oldest. The most permeable aquifers in Louisiana are the alluvial and Pleistocene aquifers, whereas the least permeable are the Carrizo and Wilcox aquifers. (Author's abstract)
2010-01-01
Background The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality in the use of NHST and CI, both in English and Spanish language biomedical publications between 1995 and 2006, taking into account the International Committee of Medical Journal Editors recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. Methods Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical "significance" and "relevance" in study conclusions. Results Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English language publications as well as in Public Health journals; overall such use decreased from 41% in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the "significance fallacy" (to equate statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. Conclusions Overall, results of our review show some improvements in
A new method to test shear wave splitting: Improving statistical assessment of splitting parameters
NASA Astrophysics Data System (ADS)
Corbalan Castejon, Ana
Shear wave splitting has proved to be a very useful technique to probe for seismic anisotropy in the Earth's interior, and measurements of seismic anisotropy are perhaps the best way to constrain the strain history of the lithosphere and asthenosphere. However, existing methods of shear wave splitting analysis do not estimate uncertainty correctly, and do not allow for careful statistical modeling of anisotropy and uncertainty in complex scenarios. Consequently, the interpretation of shear wave splitting measurements has an undesirable subjective component. This study illustrates a new method to characterize shear wave splitting and the associated uncertainty based on the cross-convolution method [Menke and Levin, 2003]. This new method has been tested on synthetic data and benchmarked with data from the Pasadena, California seismic station (PAS). Synthetic tests show that the method can successfully obtain the splitting parameters from observed split shear waves. PAS results are very reasonable and consistent with previous studies [Liu et al., 1995; Ozalaybey and Savage, 1995; Polet and Kanamori, 2002]. As presented, the Menke and Levin [2003] method does not explicitly model the errors. Our method works on noisy data without any particular need for processing, it fully accounts for correlation structures in the noise, and it models the errors with a proper bootstrapping approach. Hence, the method presented here casts the analysis of shear wave splitting into a more formal statistical context, allowing for formal hypothesis testing and more nuanced interpretation of seismic anisotropy results.
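The bootstrapping step can be sketched generically (the data below are synthetic stand-ins, not actual splitting-parameter estimates): resample the observations with replacement many times, recompute the estimate on each resample, and read a confidence interval off the percentiles of the resampled estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-in for repeated measurements of one splitting parameter
data = rng.normal(1.2, 0.5, size=100)

# Percentile bootstrap: resample with replacement, recompute the statistic
boot = np.array([rng.choice(data, size=data.size, replace=True).mean()
                 for _ in range(5000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))
```

A proper application to splitting analysis would resample whole waveform segments (or residual blocks) rather than independent scalars, to preserve the noise correlation structure the authors emphasize, but the percentile logic is the same.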
Drug-excipient compatibility testing using a high-throughput approach and statistical design.
Wyttenbach, Nicole; Birringer, Christian; Alsenz, Jochem; Kuentz, Martin
2005-01-01
The aim of our research was to develop a miniaturized high throughput drug-excipient compatibility test. Experiments were planned and evaluated using statistical experimental design. Binary mixtures of a drug, acetylsalicylic acid, or fluoxetine hydrochloride, and of excipients commonly used in solid dosage forms were prepared at a ratio of approximately 1:100 in 96-well microtiter plates. Samples were exposed to different temperature (40 degrees C/ 50 degrees C) and humidity (10%/75%) for different time (1 week/4 weeks), and chemical drug degradation was analyzed using a fast gradient high pressure liquid chromatography (HPLC). Categorical statistical design was applied to identify the effects and interactions of time, temperature, humidity, and excipient on drug degradation. Acetylsalicylic acid was least stable in the presence of magnesium stearate, dibasic calcium phosphate, or sodium starch glycolate. Fluoxetine hydrochloride exhibited a marked degradation only with lactose. Factor-interaction plots revealed that the relative humidity had the strongest effect on the drug excipient blends tested. In conclusion, the developed technique enables fast drug-excipient compatibility testing and identification of interactions. Since only 0.1 mg of drug is needed per data point, fast rational preselection of the pharmaceutical additives can be performed early in solid dosage form development.
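The categorical design can be illustrated with a minimal two-level full factorial in the three factors the study varied (temperature, humidity, time). All response numbers below are invented; the dominant humidity coefficient simply mimics the study's qualitative finding that relative humidity had the strongest effect.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
# Full 2^3 factorial: coded -1/+1 settings for temperature, humidity, time
levels = np.array(list(product([-1.0, 1.0], repeat=3)))

# Hypothetical % drug degradation per run (humidity term dominates)
y = (2.0 + 0.2 * levels[:, 0] + 1.5 * levels[:, 1] + 0.3 * levels[:, 2]
     + rng.normal(0.0, 0.1, size=8))

# Main effect of each factor = (mean response at +1) - (mean at -1)
effects = levels.T @ y / 4.0
print(effects.round(2))
```

With 8 runs the design estimates all three main effects simultaneously, which is the source of the run-count savings over one-factor-at-a-time stability protocols; interaction terms can be read off the same data with additional contrast columns.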
Statistical tests of a periodicity hypothesis for crater formation rate - II
NASA Astrophysics Data System (ADS)
Yabushita, S.
1996-04-01
A statistical test is made of the periodicity hypothesis for crater formation rate, using a new data set compiled by Grieve. The criterion adopted is that of Broadbent, modified so as to take into account the loss of craters with time. Small craters (diameters <=2 km) are highly concentrated near the recent epoch, and are not adequate as a data set for testing. Various subsets of the original data are subjected to the test and a period close to 30 Myr is detected. On the assumption of random distribution of crater ages, the probability of detecting such a period is calculated at 50, 73 and 64 per cent respectively for craters with
Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok
2016-11-01
The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling of digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
A statistical test of the unified model of active galactic nuclei
NASA Astrophysics Data System (ADS)
Hong, Xiao-yu; Wan, Tong-shan
1995-02-01
A statistical test is carried out on the AGN unified model using a sample of superluminal sources. For different classes of source, the distribution of R, the luminosity ratio between the core and the extended region, and the mean angle φ̄ between the jet and the line of sight were evaluated. Correlations among R, the Lorentz factor γ, the projected size of the jet d, and the linear size of the source l were examined. It was found that there is an anticorrelation between R and d, and a correlation between γ and l. These results favor the orientation interpretation of the unified model of AGNs.
Hickey, Graeme L; Kefford, Ben J; Dunlop, Jason E; Craig, Peter S
2008-11-01
Species sensitivity distributions (SSDs) may accurately predict the proportion of species in a community that are at hazard from environmental contaminants only if they contain sensitivity data from a large sample of species representative of the mix of species present in the locality or habitat of interest. With current widely accepted ecotoxicological methods, however, this rarely occurs. Two recent suggestions address this problem. First, use rapid toxicity tests, which are less rigorous than conventional tests, to approximate experimentally the sensitivity of many species quickly and in approximate proportion to naturally occurring communities. Second, use expert judgements regarding the sensitivity of higher taxonomic groups (e.g., orders) and Bayesian statistical methods to construct SSDs that reflect the richness (or perceived importance) of these groups. Here, we describe and analyze several models from a Bayesian perspective to construct SSDs from data derived using rapid toxicity testing, combining both rapid test data and expert opinion. We compare these new models with two frequentist approaches, Kaplan-Meier and a log-normal distribution, using a large data set on the salinity sensitivity of freshwater macroinvertebrates from Victoria (Australia). The frequentist log-normal analysis produced a SSD that overestimated the hazard to species relative to the Kaplan-Meier and Bayesian analyses. Of the Bayesian analyses investigated, the introduction of a weighting factor to account for the richness (or importance) of taxonomic groups influenced the calculated hazard to species. Furthermore, Bayesian methods allowed us to determine credible intervals representing SSD uncertainty. We recommend that rapid tests, expert judgements, and novel Bayesian statistical methods be used so that SSDs reflect communities of organisms found in nature.
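The frequentist log-normal baseline the authors compare against can be sketched as follows: fit a log-normal distribution to the species tolerance values and read off the HC5, the concentration hazardous to 5% of species. All tolerance values below are synthetic, and `z05` is simply the 5th percentile of the standard normal.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical acute tolerance values (e.g., salinity LC50s) for 20 taxa
lc50 = rng.lognormal(mean=2.0, sigma=0.8, size=20)

logs = np.log(lc50)
mu, sigma = logs.mean(), logs.std(ddof=1)

z05 = -1.6449                       # 5th percentile of the standard normal
hc5 = np.exp(mu + z05 * sigma)      # concentration hazardous to 5% of species
print(round(hc5, 2))
```

The Bayesian approaches in the paper replace the point estimates of μ and σ with posterior distributions, optionally weighted by taxonomic-group richness, which yields credible intervals on the HC5 instead of a single number.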
Zhang, Kai; Traskin, Mikhail; Small, Dylan S
2012-03-01
For group-randomized trials, randomization inference based on rank statistics provides robust, exact inference against nonnormal distributions. However, in a matched-pair design, the currently available rank-based statistics lose significant power compared to normal linear mixed model (LMM) test statistics when the LMM is true. In this article, we investigate and develop an optimal test statistic over all statistics in the form of the weighted sum of signed Mann-Whitney-Wilcoxon statistics under certain assumptions. This test is almost as powerful as the LMM even when the LMM is true, but it is much more powerful for heavy tailed distributions. A simulation study is conducted to examine the power.
Statistical Testing of Dynamically Downscaled Rainfall Data for the East Coast of Australia
NASA Astrophysics Data System (ADS)
Parana Manage, Nadeeka; Lockart, Natalie; Willgoose, Garry; Kuczera, George
2015-04-01
This study performs a validation of statistical properties of downscaled climate data, concentrating on the rainfall which is required for hydrology predictions used in reservoir simulations. The data sets used in this study have been produced by the NARCliM (NSW/ACT Regional Climate Modelling) project which provides a dynamically downscaled climate dataset for South-East Australia at 10km resolution. NARCliM has used three configurations of the Weather Research Forecasting Regional Climate Model and four different GCMs (MIROC-medres 3.2, ECHAM5, CCCMA 3.1 and CSIRO mk3.0) from CMIP3 to perform twelve ensembles of simulations for current and future climates. Additionally to the GCM-driven simulations, three control run simulations driven by the NCEP/NCAR reanalysis for the entire period of 1950-2009 has also been performed by the project. The validation has been performed in the Upper Hunter region of Australia which is a semi-arid to arid region 200 kilometres North-West of Sydney. The analysis used the time series of downscaled rainfall data and ground based measurements for selected Bureau of Meteorology rainfall stations within the study area. The initial testing of the gridded rainfall was focused on the autoregressive characteristics of time series because the reservoir performance depends on long-term average runoffs. A correlation analysis was performed for fortnightly, monthly and annual averaged time resolutions showing a good statistical match between reanalysis and ground truth. The spatial variation of the statistics of gridded rainfall series were calculated and plotted at the catchment scale. The spatial correlation analysis shows a poor agreement between NARCliM data and ground truth at each time resolution. However, the spatial variability plots show a strong link between the statistics and orography at the catchment scale.
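The temporal-correlation check can be sketched generically (both daily series below are synthetic gamma-distributed stand-ins, not NARCliM or gauge data): aggregate the downscaled and observed series to a coarser resolution, then compute the Pearson correlation at that resolution.

```python
import numpy as np

rng = np.random.default_rng(5)
days = 360 * 10
obs = rng.gamma(0.3, 8.0, size=days)                      # synthetic daily rainfall
mod = 0.7 * obs + 0.3 * rng.gamma(0.3, 8.0, size=days)    # imperfect "model" series

# Aggregate both series to (30-day) monthly means before correlating
monthly_obs = obs.reshape(-1, 30).mean(axis=1)
monthly_mod = mod.reshape(-1, 30).mean(axis=1)
r = np.corrcoef(monthly_obs, monthly_mod)[0, 1]
print(round(r, 2))
```

Repeating this at fortnightly, monthly, and annual resolutions, as the study does, shows how much of the agreement comes from shared low-frequency variability, which is the component that matters most for long-term reservoir simulation.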
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients to controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance not to refer to inference about a particular population parameter, but about (1) the sample in question, or (2) the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
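The alternative the authors recommend, modeling the nuisance variable instead of "testing" for balance, can be sketched with ordinary least squares. All data below are synthetic; `freq` stands in for a confounded stimulus property such as word frequency, deliberately correlated with the condition of interest.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
group = np.repeat([0.0, 1.0], n // 2)            # condition of interest
freq = rng.standard_normal(n) + 0.2 * group      # confounded nuisance variable
y = 1.0 * group + 0.8 * freq + rng.standard_normal(n)

# Include the nuisance variable in the analysis model rather than running
# a balance test on it: the condition effect is then estimated *adjusting*
# for the confound, whatever the (non)significance of the group difference.
X = np.column_stack([np.ones(n), group, freq])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # intercept, condition effect (~1.0), nuisance slope (~0.8)
```

A nonsignificant balance test on `freq` would not have removed the confounding; the regression adjustment does, up to the usual linearity assumptions.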
2012-01-01
Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time the fold change by itself provide valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called distributional fold change (DFC) test is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating curves and tested on 11 data sets from Gene Omnibus Database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The
More powerful genetic association testing via a new statistical framework for integrative genomics
Zhao, Sihai D.; Cai, T. Tony; Li, Hongzhe
2015-01-01
Integrative genomics offers a promising approach to more powerful genetic association studies. The hope is that combining outcome and genotype data with other types of genomic information can lead to more powerful SNP detection. We present a new association test based on a statistical model that explicitly assumes that genetic variations affect the outcome through perturbing gene expression levels. It is shown analytically that the proposed approach can have more power to detect SNPs that are associated with the outcome through transcriptional regulation, compared to tests using the outcome and genotype data alone, and simulations show that our method is relatively robust to misspecification. We also provide a strategy for applying our approach to high-dimensional genomic data. We use this strategy to identify a potentially new association between a SNP and a yeast cell’s response to the natural product tomatidine, which standard association analysis did not detect. PMID:24975802
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-01
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise: it is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and to remove the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, good SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Statistical power of likelihood ratio and Wald tests in latent class models with covariates.
Gudicha, Dereje W; Schmittmann, Verena D; Vermunt, Jeroen K
2016-12-30
This paper discusses power and sample-size computation for likelihood ratio and Wald testing of the significance of covariate effects in latent class models. For both tests, asymptotic distributions can be used; that is, the test statistic can be assumed to follow a central Chi-square under the null hypothesis and a non-central Chi-square under the alternative hypothesis. Power or sample-size computation using these asymptotic distributions requires specification of the non-centrality parameter, which in practice is rarely known. We show how to calculate this non-centrality parameter using a large simulated data set from the model under the alternative hypothesis. A simulation study is conducted evaluating the adequacy of the proposed power analysis methods, determining the key study design factor affecting the power level, and comparing the performance of the likelihood ratio and Wald test. The proposed power analysis methods turn out to perform very well for a broad range of conditions. Moreover, apart from effect size and sample size, an important factor affecting the power is the class separation, implying that when class separation is low, rather large sample sizes are needed to achieve a reasonable power level.
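The paper's core idea, estimating the non-centrality parameter from one large simulated data set and plugging it into a non-central chi-square power calculation, can be sketched like this (all numbers below are invented for illustration):

```python
from scipy.stats import chi2, ncx2

df = 1         # degrees of freedom of the covariate-effect test
alpha = 0.05

# Suppose fitting the null model to one large data set simulated under
# the alternative (size N_large) gave this likelihood-ratio statistic:
lr_large = 800.0
N_large = 100_000
nc_per_obs = lr_large / N_large   # non-centrality per observation

def power_at(n):
    """Asymptotic power of the LR test at sample size n."""
    crit = chi2.ppf(1 - alpha, df)          # central chi-square under H0
    return 1 - ncx2.cdf(crit, df, nc_per_obs * n)  # non-central under H1

p500 = power_at(500)
```

Because the non-centrality parameter scales linearly with sample size, the same `nc_per_obs` serves for power or sample-size computation at any n.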
Debate on GMOs Health Risks after Statistical Findings in Regulatory Tests
de Vendômois, Joël Spiroux; Cellier, Dominique; Vélot, Christian; Clair, Emilie; Mesnage, Robin; Séralini, Gilles-Eric
2010-01-01
We summarize the major points of international debate on health risk studies for the main commercialized edible GMOs. These GMOs are soy, maize and oilseed rape designed to contain new pesticide residues since they have been modified to be herbicide-tolerant (mostly to Roundup) or to produce mutated Bt toxins. The debated alimentary chronic risks may come from unpredictable insertional mutagenesis effects, metabolic effects, or from the new pesticide residues. The most detailed regulatory tests on the GMOs are three-month-long feeding trials of laboratory rats, which are biochemically assessed. The tests are not compulsory, and are not independently conducted. The test data and the corresponding results are kept secret by the companies. Our previous analyses of regulatory raw data at these levels, taking the representative examples of three GM maize NK 603, MON 810, and MON 863, led us to conclude that hepatorenal toxicities were possible, and that longer testing was necessary. Our study was criticized by the company developing the GMOs in question and the regulatory bodies, mainly on the divergent biological interpretations of statistically significant biochemical and physiological effects. We present the scientific reasons for the crucially different biological interpretations and also highlight the shortcomings in the experimental protocols designed by the company. The debate implies an enormous responsibility towards public health and is essential due to nonexistent traceability or epidemiological studies in the GMO-producing countries. PMID:20941377
Parker, Albert E; Hamilton, Martin A; Tomasino, Stephen F
2014-01-01
A performance standard for a disinfectant test method can be evaluated by quantifying the (Type I) pass-error rate for ineffective products and the (Type II) fail-error rate for highly effective products. This paper shows how to calculate these error rates for test methods where the log reduction in a microbial population is used as a measure of antimicrobial efficacy. The calculations can be used to assess performance standards that may require multiple tests of multiple microbes at multiple laboratories. Notably, the error rates account for the among-laboratory variance of the log reductions estimated from a multilaboratory data set and the correlation among tests of different microbes conducted in the same laboratory. Performance standards that require a disinfectant product to pass all tests, or multiple tests on average, are considered. The proposed statistical methodology is flexible and allows for a different acceptable outcome for each microbe tested, since, for example, variability may be different for different microbes. The approach can also be applied to semiquantitative methods for which product efficacy is reported as the number of positive carriers out of a treated set and the density of the microbes on control carriers is quantified, thereby allowing a log reduction to be calculated. Therefore, using the approach described in this paper, the error rates can also be calculated for semiquantitative method performance standards specified solely in terms of the maximum allowable number of positive carriers per test. The calculations are demonstrated in a case study of the current performance standard for the semiquantitative AOAC Use-Dilution Methods for Pseudomonas aeruginosa (964.02) and Staphylococcus aureus (955.15), which allow up to one positive carrier out of a set of 60 inoculated and treated carriers in each test. A simulation study was also conducted to verify the validity of the model's assumptions and accuracy. Our approach, easily implemented
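Under a simple normal model for log reductions, the two error rates discussed above reduce to tail probabilities. The numbers below are illustrative only, not AOAC values:

```python
import math

# Hypothetical single-test standard: pass if the observed log reduction
# (LR) meets a threshold. LR is modeled as normal, with sd combining
# within- and among-laboratory variability.
mu_effective = 6.0     # true mean LR of a highly effective product
mu_ineffective = 3.0   # true mean LR of an ineffective product
sd = 1.0
threshold = 4.5

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Type II fail-error: a highly effective product falls below threshold.
fail_error = phi((threshold - mu_effective) / sd)
# Type I pass-error: an ineffective product lands above threshold.
pass_error = 1 - phi((threshold - mu_ineffective) / sd)
```

Multi-test standards ("pass all tests" or "pass on average") are obtained by combining such per-test probabilities, with correlation terms when tests share a laboratory.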
Colegrave, Nick; Ruxton, Graeme D
2017-03-29
A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure.
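The mechanics of test-qualified pooling, and why the hoped-for power gain is typically small, can be seen in a toy calculation (all sums of squares and degrees of freedom below are invented):

```python
from scipy.stats import f

# Hypothetical ANOVA components.
ss_error, df_error = 120.0, 30
ss_inter, df_inter = 9.0, 3     # interaction term to be dropped

# Step 1: the qualifying test on the interaction.
f_inter = (ss_inter / df_inter) / (ss_error / df_error)  # < 1, "drop it"

# Step 2: pool the dropped term's SS and df into the error term.
ss_pooled = ss_error + ss_inter
df_pooled = df_error + df_inter
ms_pooled = ss_pooled / df_pooled

# The payoff researchers hope for: a smaller critical F for a 2-df
# hypothesis of interest. The change is tiny.
crit_before = f.ppf(0.95, 2, df_error)
crit_after = f.ppf(0.95, 2, df_pooled)
```

With error df already moderate, the critical value barely moves, which is the paper's point that the power gain is small while Type I error control may be degraded.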
Improved tests reveal that the accelerating moment release hypothesis is statistically insignificant
Hardebeck, J.L.; Felzer, K.R.; Michael, A.J.
2008-01-01
We test the hypothesis that accelerating moment release (AMR) is a precursor to large earthquakes, using data from California, Nevada, and Sumatra. Spurious cases of AMR can arise from data fitting because the time period, area, and sometimes magnitude range analyzed before each main shock are often optimized to produce the strongest AMR signal. Optimizing the search criteria can identify apparent AMR even if no robust signal exists. For both 1950-2006 California-Nevada M ≥ 6.5 earthquakes and the 2004 M9.3 Sumatra earthquake, we can find two contradictory patterns in the pre-main shock earthquakes by data fitting: AMR and decelerating moment release. We compare the apparent AMR found in the real data to the apparent AMR found in four types of synthetic catalogs with no inherent AMR. When spatiotemporal clustering is included in the simulations, similar AMR signals are found by data fitting in both the real and synthetic data sets even though the synthetic data sets contain no real AMR. These tests demonstrate that apparent AMR may arise from a combination of data fitting and normal foreshock and aftershock activity. In principle, data-fitting artifacts could be avoided if the free parameters were determined from scaling relationships between the duration and spatial extent of the AMR pattern and the magnitude of the earthquake that follows it. However, we demonstrate that previously proposed scaling relationships are unstable, statistical artifacts caused by the use of a minimum magnitude for the earthquake catalog that scales with the main shock magnitude. Some recent AMR studies have used spatial regions based on hypothetical stress loading patterns, rather than circles, to select the data. We show that previous tests were biased and that unbiased tests do not find this change to the method to be an improvement. The use of declustered catalogs has also been proposed to eliminate the effect of clustering but we demonstrate that this does not increase the
A statistical design for testing transgenerational genomic imprinting in natural human populations.
Li, Yao; Guo, Yunqian; Wang, Jianxin; Hou, Wei; Chang, Myron N; Liao, Duanping; Wu, Rongling
2011-02-25
Genomic imprinting is a phenomenon in which the same allele is expressed differently, depending on its parental origin. Such a phenomenon, also called the parent-of-origin effect, has been recognized to play a pivotal role in embryological development and pathogenesis in many species. Here we propose a statistical design for detecting imprinted loci that control quantitative traits based on a random set of three-generation families from a natural population in humans. This design provides a pathway for characterizing the effects of imprinted genes on a complex trait or disease at different generations and testing transgenerational changes of imprinted effects. The design is integrated with population and cytogenetic principles of gene segregation and transmission from one generation to the next. The implementation of the EM algorithm within the design framework leads to the estimation of genetic parameters that define imprinted effects. A simulation study is used to investigate the statistical properties of the model and validate its utilization. This new design, coupled with increasingly used genome-wide association studies, should have an immediate implication for studying the genetic architecture of complex traits in humans.
Tropospheric delay statistics measured by two site test interferometers at Goldstone, California
NASA Astrophysics Data System (ADS)
Morabito, David D.; D'Addario, Larry R.; Acosta, Roberto J.; Nessel, James A.
2013-12-01
Site test interferometers (STIs) have been deployed at two locations within the NASA Deep Space Network tracking complex in Goldstone, California. An STI measures the difference of atmospheric delay fluctuations over a distance comparable to the separations of microwave antennas that could be combined as phased arrays for communication and navigation. The purpose of the Goldstone STIs is to assess the suitability of Goldstone as an uplink array site and to statistically characterize atmosphere-induced phase delay fluctuations for application to future arrays. Each instrument consists of two ~1 m diameter antennas and associated electronics separated by ~200 m. The antennas continuously observe signals emitted by geostationary satellites and produce measurements of the phase difference between the received signals. The two locations at Goldstone are separated by 12.5 km and differ in elevation by 119 m. We find that their delay fluctuations are statistically similar but do not appear as shifted versions of each other, suggesting that the length scale for evolution of the turbulence pattern is shorter than the separation between instruments. We also find that the fluctuations are slightly weaker at the higher altitude site.
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Hierarchical searching in model-based LADAR ATR using statistical separability tests
NASA Astrophysics Data System (ADS)
DelMarco, Stephen; Sobel, Erik; Douglas, Joel
2006-05-01
In this work we investigate simultaneous object identification improvement and efficient library search for model-based object recognition applications. We develop an algorithm to provide efficient, prioritized, hierarchical searching of the object model database. A common approach to model-based object recognition chooses the object label corresponding to the best match score. However, due to corrupting effects the best match score does not always correspond to the correct object model. To address this problem, we propose a search strategy which exploits information contained in a number of representative elements of the library to drill down to a small class with high probability of containing the object. We first optimally partition the library into a hierarchic taxonomy of disjoint classes. A small number of representative elements are used to characterize each object model class. At each hierarchy level, the observed object is matched against the representative elements of each class to generate score sets. A hypothesis testing problem, using a distribution-free statistical test, is defined on the score sets and used to choose the appropriate class for a prioritized search. We conduct a probabilistic analysis of the computational cost savings, and provide a formula measuring the computational advantage of the proposed approach. We generate numerical results using match scores derived from matching highly-detailed CAD models of civilian ground vehicles used in 3-D LADAR ATR. We present numerical results showing effects on classification performance of significance level and representative element number in the score set hypothesis testing problem.
Jha, Sumit Kumar; Pullum, Laura L; Ramanathan, Arvind
2016-01-01
Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.
Experimental Test of Heisenberg's Measurement Uncertainty Relation Based on Statistical Distances
NASA Astrophysics Data System (ADS)
Ma, Wenchao; Ma, Zhihao; Wang, Hengyan; Chen, Zhihua; Liu, Ying; Kong, Fei; Li, Zhaokai; Peng, Xinhua; Shi, Mingjun; Shi, Fazhan; Fei, Shao-Ming; Du, Jiangfeng
2016-04-01
Incompatible observables can be approximated by compatible observables in joint measurement or measured sequentially, with constrained accuracy as implied by Heisenberg's original formulation of the uncertainty principle. Recently, Busch, Lahti, and Werner proposed inaccuracy trade-off relations based on statistical distances between probability distributions of measurement outcomes [P. Busch et al., Phys. Rev. Lett. 111, 160405 (2013); P. Busch et al., Phys. Rev. A 89, 012129 (2014)]. Here we reformulate their theoretical framework, derive an improved relation for qubit measurement, and perform an experimental test on a spin system. The relation reveals that the worst-case inaccuracy is tightly bounded from below by the incompatibility of target observables, and is verified by the experiment employing joint measurement in which two compatible observables designed to approximate two incompatible observables on one qubit are measured simultaneously.
Palazón, L; Navas, A
2017-06-01
Information on sediment contribution and transport dynamics from the contributing catchments is needed to develop management plans to tackle environmental problems related to the effects of fine sediment, such as reservoir siltation. In this respect, the fingerprinting technique is an indirect technique known to be valuable and effective for sediment source identification in river catchments. Large variability in sediment delivery was found in previous studies in the Barasona catchment (1509 km(2), Central Spanish Pyrenees). Simulation results with SWAT and fingerprinting approaches identified badlands and agricultural uses as the main contributors to sediment supply in the reservoir. In this study the <63 μm fraction of the surface reservoir sediments (2 cm) is investigated following the fingerprinting procedure to assess how the use of different statistical procedures affects the estimated source contributions. Three optimum composite fingerprints were selected to discriminate between source contributions based on land uses/land covers from the same dataset by the application of (1) discriminant function analysis, and its combination (as a second step) with (2) the Kruskal-Wallis H-test and (3) principal components analysis. Source contribution results differed between the assessed options, with the greatest differences observed for option (3), the two-step process of principal components analysis and discriminant function analysis. The characteristics of the solutions from the applied mixing model and the conceptual understanding of the catchment showed that the most reliable solution was achieved using option (2), the two-step process of the Kruskal-Wallis H-test and discriminant function analysis. The assessment showed the importance of the statistical procedure used to define the optimum composite fingerprint for sediment fingerprinting applications.
NASA Astrophysics Data System (ADS)
Guo, Bingjie; Bitner-Gregersen, Elzbieta Maria; Sun, Hui; Block Helmers, Jens
2013-04-01
Earlier investigations have indicated that proper prediction of nonlinear loads and responses due to nonlinear waves is important for ship safety in extreme seas. However, the nonlinear loads and responses in extreme seas have not been sufficiently investigated yet, particularly when rogue waves are considered. A question remains whether the existing linear codes can predict nonlinear loads and responses with satisfactory accuracy, and how large the deviations from linear predictions are. To address this question, response statistics have been studied based on model tests carried out with an LNG tanker in the towing tank of the Technical University of Berlin (TUB), and compared with the statistics derived from numerical simulations using the DNV code WASIM, a potential-flow code for wave-ship interaction based on the 3D panel method, which can perform both linear and nonlinear simulations. The numerical simulations with WASIM and the model tests in extreme and rogue waves have been performed. The analysis of ship motions (heave and pitch) and bending moments, in both regular and irregular waves, is performed. The results from the linear and nonlinear simulations are compared with experimental data to indicate the impact of wave nonlinearity on load and response calculations when the code based on the Rankine panel method is used. The study shows that nonlinearities may have a significant effect on extreme motions and bending moments generated by strongly nonlinear waves. The effect of water depth on ship responses is also demonstrated using numerical simulations. Uncertainties related to the results are discussed, with particular attention given to sampling variability.
ERIC Educational Resources Information Center
Luh, Wei-Ming; Guo, Jiin-Huarng
2005-01-01
To deal with nonnormal and heterogeneous data for the one-way fixed effect analysis of variance model, the authors adopted a trimmed means method in conjunction with Hall's invertible transformation into a heteroscedastic test statistic (Alexander-Govern test or Welch test). The results of simulation experiments showed that the proposed technique…
Combining test statistics and models in bootstrapped model rejection: it is a balancing act
2014-01-01
Background Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually done using e.g. the chi-square test (χ²) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that typically are not fulfilled. This problem is partly alleviated by the usage of bootstrapping, a computationally heavy approach to calculate an empirical distribution. Bootstrapping also allows for a natural extension to estimation of joint distributions, but this feature has so far been little exploited. Results We herein show that simplistic combinations of bootstrapped tests, like the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found both consistent and with a higher power than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the best-performing test is a 2D χ² vs. χ², where the second χ²-value comes from an additional help model, and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple, or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because LHR has retained most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time. Conclusions We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. These results also provide a deeper
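A minimal parametric bootstrap of a χ² test statistic, the building block of the approaches compared above, might look like this (the model and data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def chi2_stat(y, yhat, sigma):
    """Weighted sum of squared residuals."""
    return float(np.sum((y - yhat) ** 2 / sigma ** 2))

# Hypothetical setup: a model predicting a constant, Gaussian noise.
sigma = 1.0
yhat = np.zeros(10)                   # fitted model prediction
y_obs = rng.normal(0.0, sigma, 10)    # observed data
t_obs = chi2_stat(y_obs, yhat, sigma)

# Parametric bootstrap: simulate data sets from the fitted model and
# build the empirical null distribution of the statistic, rather than
# relying on the analytical chi-square distribution.
boot = np.array([chi2_stat(rng.normal(yhat, sigma), yhat, sigma)
                 for _ in range(2000)])
p_boot = float(np.mean(boot >= t_obs))
```

Collecting a second statistic (e.g. a DW value, or a χ² against a help model) per bootstrap sample turns `boot` into the joint 2D empirical distribution the paper builds its combined tests on.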
ERIC Educational Resources Information Center
Tabor, Josh
2010-01-01
On the 2009 AP[c] Statistics Exam, students were asked to create a statistic to measure skewness in a distribution. This paper explores several of the most popular student responses and evaluates which statistic performs best when sampling from various skewed populations. (Contains 8 figures, 3 tables, and 4 footnotes.)
What's the best statistic for a simple test of genetic association in a case-control study?
Kuo, Chia-Ling; Feingold, Eleanor
2010-04-01
Genome-wide genetic association studies typically start with univariate statistical tests of each marker. In principle, this single-SNP scanning is statistically straightforward--the testing is done with standard methods (e.g. χ² tests, regression) that have been well studied for decades. However, a number of different tests and testing procedures can be used. In a case-control study, one can use a 1 df allele-based test, a 1 or 2 df genotype-based test, or a compound procedure that combines two or more of these statistics. Additionally, most of the tests can be performed with or without covariates included in the model. While there are a number of statistical papers that make power comparisons among subsets of these methods, none has comprehensively tackled the question of which of the methods in common use is best suited to univariate scanning in a genome-wide association study. In this paper, we consider a wide variety of realistic test procedures, and first compare the power of the different procedures to detect a single locus under different genetic models. We then address the question of whether or when it is a good idea to include covariates in the analysis. We conclude that the most commonly used approach to handle covariates--modeling covariate main effects but not interactions--is almost never a good idea. Finally, we consider the performance of the statistics in a genome scan context.
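The 1 df allele-based and 2 df genotype-based tests mentioned above can both be computed from a single genotype table; the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical genotype counts (AA, Aa, aa) for cases and controls.
cases = np.array([30, 50, 20])
controls = np.array([50, 40, 10])

# 2 df genotype-based test: chi-square on the 2x3 genotype table.
chi2_g, p_g, df_g, _ = chi2_contingency(np.vstack([cases, controls]))

# 1 df allele-based test: count alleles (2 per person) into a 2x2 table.
def allele_counts(g):
    aa, ab, bb = g
    return np.array([2 * aa + ab, ab + 2 * bb])

alleles = np.vstack([allele_counts(cases), allele_counts(controls)])
chi2_a, p_a, df_a, _ = chi2_contingency(alleles, correction=False)
```

The allele-based test trades a degree of freedom for an additivity assumption; which choice has more power depends on the underlying genetic model, which is the comparison the paper carries out systematically.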
DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data
Harris, S.P.
1997-09-18
This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric-acid-based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two- to three-fold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials, which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (inserts) are analyzed by either the Cold Chem method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion method still requires drying and calcine conversion, the process is rapid due to the decreased sample size and because no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock
Sofer, Tamar; Heller, Ruth; Bogomolov, Marina; Avery, Christy L; Graff, Mariaelisa; North, Kari E; Reiner, Alex P; Thornton, Timothy A; Rice, Kenneth; Benjamini, Yoav; Laurie, Cathy C; Kerr, Kathleen F
2017-01-15
In genome-wide association studies (GWAS), "generalization" is the replication of a genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism (SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5×10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6×10⁻⁵ (89 regions), we generalized SNPs from 27 regions.
Exact statistical tests for the intersection of independent lists of genes
NATARAJAN, LOKI; PU, MINYA; MESSER, KAREN
2012-01-01
Public data repositories have enabled researchers to compare results across multiple genomic studies in order to replicate findings. A common approach is to first rank genes according to a hypothesis of interest within each study. Then, lists of the top-ranked genes within each study are compared across studies. Genes recaptured as highly ranked (usually above some threshold) in multiple studies are considered to be significant. However, this comparison strategy often remains informal, in that Type I error and false discovery rate are usually uncontrolled. In this paper, we formalize an inferential strategy for this kind of list-intersection discovery test. We show how to compute a p-value associated with a 'recaptured' set of genes, using a closed-form Poisson approximation to the distribution of the size of the recaptured set. The distribution of the test statistic depends on the rank threshold and the number of studies within which a gene must be recaptured. We use the Poisson approximation to investigate the operating characteristics of the test. We give practical guidance on how to design a bioinformatic list-intersection study with prespecified control of Type I error (at the set level) and false discovery rate (at the gene level). We show how the choice of test parameters will affect the expected proportion of significant genes identified. We present a strategy for identifying the optimal choice of parameters, depending on the particular alternative hypothesis which might hold. We illustrate our methods using prostate cancer gene-expression datasets from the curated Oncomine database. PMID:23335952
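The set-level p-value described above has a compact core: under a null in which each of G genes independently lands in the top fraction t of every one of m studies, the recaptured-set size is approximately Poisson with mean G·tᵐ, and the p-value is the upper tail at the observed count. A minimal sketch (the values of G, t, m, and k below are hypothetical, not taken from the paper):

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

# Hypothetical design: G genes ranked in each of m studies, a gene is
# "recaptured" if it appears in the top fraction t of every study, and
# k genes were observed in the intersection.
G, t, m, k = 10000, 0.01, 3, 5
lam = G * t**m              # expected recaptured-set size under the null
p_value = poisson_sf(k, lam)
```

With these numbers the null expectation is only 0.01 recaptured genes, so observing 5 yields a very small p-value.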
McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad
2011-07-01
The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates.
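The 'limits of agreement' procedure recommended above has a simple computational core: the bias is the mean of the paired differences, and the 95% limits lie 1.96 standard deviations of the differences on either side of it. A minimal sketch (the paired instrument readings are invented for illustration):

```python
import statistics

def limits_of_agreement(a, b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired readings of the same eyes on two instruments
inst1 = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
inst2 = [10.0, 9.9, 10.2, 10.1, 9.7, 10.1]
bias, lower, upper = limits_of_agreement(inst1, inst2)
```

In practice one would also plot the differences against the pairwise means to check for proportional bias before quoting the limits.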
Statistical methods for the analysis of a screening test for chronic beryllium disease
Frome, E.L.; Neubert, R.L.; Smith, M.H.; Littlefield, L.G.; Colyer, S.P.
1994-10-01
The lymphocyte proliferation test (LPT) is a noninvasive screening procedure used to identify persons who may have chronic beryllium disease. A practical problem in the analysis of LPT well counts is the occurrence of outlying data values (approximately 7% of the time). A log-linear regression model is used to describe the expected well counts for each set of test conditions. The variance of the well counts is proportional to the square of the expected counts, and two resistant regression methods are used to estimate the parameters of interest. The first approach uses least absolute values (LAV) on the log of the well counts to estimate beryllium stimulation indices (SIs) and the coefficient of variation. The second approach uses a resistant regression version of maximum quasi-likelihood estimation. A major advantage of the resistant regression methods is that it is not necessary to identify and delete outliers. These two new methods for the statistical analysis of the LPT data and the outlier rejection method that is currently being used are applied to 173 LPT assays. The authors strongly recommend the LAV method for routine analysis of the LPT.
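The outlier resistance that motivates the LAV approach is easiest to see in the simplified intercept-only case, where the least-absolute-values estimate of the log well count reduces to the median, while the least-squares estimate (the mean) is pulled toward an outlying well. A sketch with hypothetical counts, not actual LPT data:

```python
import math
import statistics

# Hypothetical replicate well counts for one test condition; 9800 is an outlier
counts = [1200, 1350, 1280, 1310, 9800]
log_counts = [math.log(c) for c in counts]

# Intercept-only LAV estimate = median of the log counts (outlier-resistant);
# least-squares estimate = mean of the log counts (pulled up by the outlier)
lav_est = statistics.median(log_counts)
ls_est = statistics.mean(log_counts)
```

The full method fits a log-linear regression across test conditions, but the same contrast between absolute-error and squared-error loss drives its resistance to outliers.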
A Tool Preference Choice Method for RNA Secondary Structure Prediction by SVM with Statistical Tests
Hor, Chiou-Yi; Yang, Chang-Biau; Chang, Chia-Hung; Tseng, Chiou-Ting; Chen, Hung-Hsin
2013-01-01
The prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists. Many useful tools have been developed for this purpose, each with its individual strengths and weaknesses. Accordingly, we propose a tool choice method, based on support vector machines (SVM), which integrates three prediction tools: pknotsRG, RNAStructure, and NUPACK. Our method first extracts features from the target RNA sequence and adopts two information-theoretic feature selection methods for feature ranking. We propose a method to combine feature selection and classifier fusion in an incremental manner. Our test data set contains 720 RNA sequences, where 225 pseudoknotted RNA sequences are obtained from PseudoBase and 495 nested RNA sequences are obtained from RNA SSTRAND. The method serves as a preprocessing step for analyzing RNA sequences before the RNA secondary structure prediction tools are employed. In addition, the performance of various configurations is subject to statistical tests to examine their significance. The best base-pair accuracy achieved is 75.5%, which is obtained by the proposed incremental method and is significantly higher than the 68.8% associated with the best individual predictor, pknotsRG. PMID:23641141
Test Statistics for the Identification of Assembly Neurons in Parallel Spike Trains
Picado Muiño, David; Borgelt, Christian
2015-01-01
In recent years numerous improvements have been made in multiple-electrode recordings (i.e., parallel spike-train recordings) and spike sorting, to the extent that nowadays it is possible to monitor the activity of up to hundreds of neurons simultaneously. Due to these improvements it is now potentially possible to identify assembly activity (roughly understood as significant synchronous spiking of a group of neurons) from these recordings, which—if it can be demonstrated reliably—would significantly improve our understanding of neural activity and neural coding. However, several methodological problems remain when trying to do so and, among them, a principal one is the combinatorial explosion that one faces when considering all potential neuronal assemblies, since in principle every subset of the recorded neurons constitutes a candidate set for an assembly. We present several statistical tests to identify assembly neurons (i.e., neurons that participate in a neuronal assembly) from parallel spike trains, with the aim of reducing the set of neurons to a relevant subset and thereby easing the task of identifying neuronal assemblies in further analyses. These tests improve on those introduced in the work by Berger et al. (2010) through additional features like spike weight or pairwise overlap and through alternative ways to identify spike coincidences (e.g., by avoiding time binning, which tends to lose information). PMID:25866503
FLAGS: A Flexible and Adaptive Association Test for Gene Sets Using Summary Statistics
Huang, Jianfei; Wang, Kai; Wei, Peng; Liu, Xiangtao; Liu, Xiaoming; Tan, Kai; Boerwinkle, Eric; Potash, James B.; Han, Shizhong
2016-01-01
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available. PMID:26773050
Létourneau, Daniel McNiven, Andrea; Keller, Harald; Wang, An; Amin, Md Nurul; Pearce, Jim; Norrlinger, Bernhard; Jaffray, David A.
2014-12-15
Purpose: High-quality radiation therapy using highly conformal dose distributions and image-guided techniques requires optimum machine delivery performance. In this work, a monitoring system for multileaf collimator (MLC) performance, integrating semiautomated MLC quality control (QC) tests and statistical process control tools, was developed. The MLC performance monitoring system was used for almost a year on two commercially available MLC models. Control charts were used to establish MLC performance and assess test frequency required to achieve a given level of performance. MLC-related interlocks and servicing events were recorded during the monitoring period and were investigated as indicators of MLC performance variations. Methods: The QC test developed as part of the MLC performance monitoring system uses 2D megavoltage images (acquired using an electronic portal imaging device) of 23 fields to determine the location of the leaves with respect to the radiation isocenter. The precision of the MLC performance monitoring QC test and the MLC itself was assessed by detecting the MLC leaf positions on 127 megavoltage images of a static field. After initial calibration, the MLC performance monitoring QC test was performed 3–4 times/week over a period of 10–11 months to monitor positional accuracy of individual leaves for two different MLC models. Analysis of test results was performed using individuals control charts per leaf with control limits computed based on the measurements as well as two sets of specifications of ±0.5 and ±1 mm. Out-of-specification and out-of-control leaves were automatically flagged by the monitoring system and reviewed monthly by physicists. MLC-related interlocks reported by the linear accelerator and servicing events were recorded to help identify potential causes of nonrandom MLC leaf positioning variations. Results: The precision of the MLC performance monitoring QC test and the MLC itself was within ±0.22 mm for most MLC leaves
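The individuals control charts used in this monitoring system follow a standard recipe: center line at the mean of the measurements, control limits at ±2.66 times the average moving range (the 2.66 constant is the conventional I-chart factor). A minimal sketch with invented leaf-position errors; the ±0.5 and ±1 mm specification limits from the paper would be checked separately against fixed bounds:

```python
import statistics

def individuals_control_limits(x):
    """I-chart limits: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    mr_bar = statistics.mean(moving_ranges)
    center = statistics.mean(x)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical daily position errors (mm) for one MLC leaf
errors = [0.05, -0.02, 0.01, 0.03, -0.01, 0.02, 0.00, 0.04]
lcl, center, ucl = individuals_control_limits(errors)
out_of_control = [e for e in errors if not lcl <= e <= ucl]
```

Leaves whose measurements fall outside [lcl, ucl] would be flagged for review, mirroring the automatic flagging described above.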
NASA Astrophysics Data System (ADS)
Roggo, Y.; Duponchel, L.; Ruckebusch, C.; Huvenne, J.-P.
2003-06-01
Near-infrared spectroscopy (NIRS) has been applied for both qualitative and quantitative evaluation of sugar beet. However, chemometrics methods are numerous and a choice criterion is sometimes difficult to determine. In order to select the most accurate chemometrics method, statistical tests are developed. In the first part, quantitative models, which predict the sucrose content of sugar beet, are compared. To realize a systematic study, 54 models are developed with different spectral pre-treatments (Standard Normal Variate (SNV), Detrending (D), first and second Derivative), different spectral ranges and different regression methods (Principal Component Regression (PCR), Partial Least Squares (PLS), Modified PLS (MPLS)). Analysis of variance and Fisher's tests are computed to compare, respectively, bias and Standard Error of Prediction Corrected for bias (SEP(C)). The model developed with full spectra pre-treated by SNV, second derivative and MPLS methods gives accurate results: bias is 0.008 and SEP(C) is 0.097 g of sucrose per 100 g of sample on a concentration range between 14 and 21 g/100 g. In the second part, McNemar's test is applied to compare the classification methods. The classifications are performed on two data sets: the first data set concerns the disease resistance of sugar beet and the second deals with spectral differences between four spectrometers. The performances of four well-known classification methods are compared on the NIRS data: Linear Discriminant Analysis (LDA), the K Nearest Neighbors method (KNN), Simple Modeling of Class Analogy (SIMCA) and the Learning Vector Quantization neural network (LVQ). In this study, the most accurate method (SIMCA) has a prediction rate of 81.9% good classification on the disease resistance determination and 99.4% good classification on the instrument data set.
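McNemar's test, used above to compare classifiers evaluated on the same samples, depends only on the discordant pairs: the counts b and c of samples on which exactly one of the two classifiers is correct. A minimal sketch with hypothetical counts, using the continuity-corrected one-degree-of-freedom chi-square form:

```python
import math

def mcnemar(b, c):
    """Continuity-corrected McNemar chi-square and p-value for paired classifiers.

    b, c: counts of discordant pairs (only one of the two classifiers correct).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a 1-df chi-square via the complementary error function
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical discordant counts for two classifiers on the same spectra
chi2, p = mcnemar(15, 4)
```

For small b + c (roughly under 25), an exact binomial version of the test is usually preferred over this chi-square approximation.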
NASA Astrophysics Data System (ADS)
Taylor, Steven R.; Anderson, Dale N.
2011-02-01
In his Forum, P. Vermeesch (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009) applied Pearson's chi-square test to a large catalog of earthquakes to test the hypothesis that earthquakes are uniformly distributed across days of the week (the formal null hypothesis that an earthquake has equal probability of occurring on any day). In his analysis, this hypothesis is rejected; he proposes that the statistical test implies that earthquakes are correlated with day of the week (with specifically high seismicity on Sunday), and he argues that the strong dependence of p values on sample size therefore makes them uninterpretable. It is a well-known property of classical hypothesis tests that the power of a statistical test is a function of the degrees of freedom, so that a test with large degrees of freedom will always have the resolution to reject the null. Consideration of practical as well as statistical significance is essential. Selecting bins so that the chi-square test fails to reject the null hypothesis is essentially formulating a test to agree with a foregone conclusion. To the point, this data set does not exhibit uniform seismicity across time, and the statistical test is summarizing the data correctly. With proper attention to the application setting, and to the formulation of the null and alternative hypotheses, summarizing with p values is technically sound.
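The sample-size effect at issue in this exchange can be reproduced directly: the same mild departure from a uniform day-of-week distribution fails to reject at n = 100 but rejects decisively when every count is multiplied by 1000, because the Pearson statistic scales linearly with n when the proportions are fixed. The counts below are invented for illustration:

```python
def chi2_uniform_stat(counts):
    """Pearson chi-square statistic against a uniform null distribution."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

# The same mild departure from uniformity at two sample sizes
small = [15, 14, 13, 15, 14, 13, 16]          # n = 100
large = [c * 1000 for c in small]             # n = 100,000
crit_6df = 12.59   # 5% critical value for chi-square with 6 degrees of freedom
```

This is exactly why practical significance (the size of the departure) must be considered alongside statistical significance.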
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTOs). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTOs. This review examines existing practices in GM plant field testing, such as randomization, replication, and pseudoreplication. Emphasis is placed on the importance of the design features used for the field trials in which effects on NTOs are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed for deciding on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches, for example analysis of variance (ANOVA), are appropriate, for discontinuous data (counts) only generalized linear models (GLMs) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role, GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and in assessing whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will
ERIC Educational Resources Information Center
MacDonald, Paul; Paunonen, Sampo V.
2002-01-01
Examined the behavior of item and person statistics from item response theory and classical test theory frameworks through Monte Carlo methods with simulated test data. Findings suggest that item difficulty and person ability estimates are highly comparable for both approaches. (SLD)
Chlorine-36 data at Yucca Mountain: Statistical tests of conceptual models for unsaturated-zone flow
Campbell, K.; Wolfsberg, A.; Fabryka-Martin, J.; Sweetkind, D.
2003-01-01
An extensive set of chlorine-36 (36Cl) data has been collected in the Exploratory Studies Facility (ESF), an 8-km-long tunnel at Yucca Mountain, Nevada, for the purpose of developing and testing conceptual models of flow and transport in the unsaturated zone (UZ) at this site. At several locations, the measured values of 36Cl/Cl ratios for salts leached from rock samples are high enough to provide strong evidence that at least a small component of bomb-pulse 36Cl, fallout from atmospheric testing of nuclear devices in the 1950s and 1960s, was measured, implying that some fraction of the water traveled from the ground surface through 200-300 m of unsaturated rock to the level of the ESF during the last 50 years. These data are analyzed here using a formal statistical approach based on log-linear models to evaluate alternative conceptual models for the distribution of such fast flow paths. The most significant determinant of the presence of bomb-pulse 36Cl in a sample from the welded Topopah Spring unit (TSw) is the structural setting from which the sample was collected. Our analysis generally supports the conceptual model that a fault that cuts through the nonwelded Paintbrush tuff unit (PTn) that overlies the TSw is required in order for bomb-pulse 36Cl to be transmitted to the sample depth in less than 50 years. Away from PTn-cutting faults, the ages of water samples at the ESF appear to be a strong function of the thickness of the nonwelded tuff between the ground surface and the ESF, due to slow matrix flow in that unit.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Dorssaeler, Alain Van; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-03-01
This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, but also for testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier PXD001819. Starting from some raw files of this dataset, we also provide here some processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and processed data are merged in a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.
Chen, David; Shah, Anup; Nguyen, Hien; Loo, Dorothy; Inder, Kerry L; Hill, Michelle M
2014-09-05
The utility of high-throughput quantitative proteomics to identify differentially abundant proteins en masse relies on suitable and accessible statistical methodology, which remains mostly an unmet need. We present a free web-based tool, called Quantitative Proteomics p-value Calculator (QPPC), designed for accessibility and usability by proteomics scientists and biologists. Being an online tool, there is no requirement for software installation. Furthermore, QPPC accepts generic peptide ratio data generated by any mass spectrometer and database search engine. Importantly, QPPC utilizes the permutation test that we recently found to be superior to other methods for analysis of peptide ratios because it does not assume normal distributions [1]. QPPC assists the user in selecting significantly altered proteins based on numerical fold change, or standard deviation from the mean or median, together with the permutation p-value. Output is in the form of comma-separated values files, along with graphical visualization using volcano plots and histograms. We evaluate the optimal parameters for use of QPPC, including the permutation level and the effect of outlier and contaminant peptides on p-value variability. The optimal parameters defined are deployed as defaults for the web-tool at http://qppc.di.uq.edu.au/ .
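A two-sample permutation test of the kind described above can be sketched in a few lines (a generic sketch, not QPPC's actual implementation; the peptide log-ratios are invented). The labels are repeatedly shuffled between the two groups, and the p-value is the fraction of shuffles whose mean difference is at least as extreme as the observed one:

```python
import random
import statistics

def permutation_p(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean values."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # random relabeling of the pooled observations
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0

# Hypothetical peptide log2-ratios for one protein in two conditions
a = [0.9, 1.1, 1.0, 1.2, 0.8]
b = [0.1, -0.2, 0.0, 0.2, -0.1]
p = permutation_p(a, b)
```

Because no distributional form is assumed, the test remains valid for the skewed ratio data that motivated its use here.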
Podoll, Amber S; Bell, Cynthia S; Molony, Donald A
2012-01-01
Nephrologists rely on valid clinical studies to inform their health care decisions. Knowledge of simple statistical principles equips the prudent nephrologist with the skills that allow him or her to critically evaluate clinical studies and to determine the validity of the results. Important in this process is knowing when certain statistical tests are used appropriately and if their application in interpreting research data will most likely lead to the most robust or valid conclusions. The research team bears the responsibility for determining the statistical analysis during the design phase of the study and subsequently for carrying out the appropriate analysis. This will ensure that bias is minimized and "valid" results are reported. We have summarized the important caveats and components in correctly choosing a statistical test with a series of tables. With this format, we wish to provide a tool for the nephrologist/researcher that he or she can use when required to decide if an appropriate statistical analysis plan was implemented for any particular study. We have included in these tables the types of statistical tests that might be used best for analysis of different types of comparisons on small and on larger patient samples.
ERIC Educational Resources Information Center
Jones, Andrew T.
2011-01-01
Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
The T(ea) Test: Scripted Stories Increase Statistical Method Selection Skills
ERIC Educational Resources Information Center
Hackathorn, Jana; Ashdown, Brien
2015-01-01
To teach statistics, teachers must attempt to overcome pedagogical obstacles, such as dread, anxiety, and boredom. There are many options available to teachers that facilitate a pedagogically conducive environment in the classroom. The current study examined the effectiveness of incorporating scripted stories and humor into statistical method…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
NASA Astrophysics Data System (ADS)
Shiraishi, Maresuke; Hikage, Chiaki; Namba, Ryo; Namikawa, Toshiya; Hazumi, Masashi
2016-08-01
The B-mode polarization in the cosmic microwave background (CMB) anisotropies at large angular scales provides compelling evidence for primordial gravitational waves (GWs). It is often stated that a discovery of the GWs establishes the quantum fluctuation of vacuum during cosmic inflation. Since the GWs could also be generated by source fields, however, we need to check whether a sizable signal exists due to such source fields before reaching a firm conclusion when the B mode is discovered. Source fields of particular types can generate non-Gaussianity (NG) in the GWs. Testing statistics of the B mode is a powerful way of detecting such NG. As a concrete example, we show a model in which a gauge field sources chiral GWs via a pseudoscalar coupling and forecast the detection significance at the future CMB satellite LiteBIRD. Effects of residual foregrounds and the lensing B mode are both taken into account. We find the B-mode bispectrum "BBB" is particularly sensitive to the source-field NG, which is detectable at LiteBIRD with a >3σ significance. Therefore the search for the BBB will be indispensable toward unambiguously establishing quantum fluctuation of vacuum when the B mode is discovered. We also introduce the Minkowski functional to detect the NGs. While we find that the Minkowski functional is less efficient than the harmonic-space bispectrum estimator, it still serves as a useful cross-check. Finally, we also discuss the possibility of extracting clean information on parity violation of GWs and new types of parity-violating observables induced by lensing.
Brown, Geoffrey W.; Sandstrom, Mary M.; Preston, Daniel N.; ...
2014-11-17
In this study, the Integrated Data Collection Analysis (IDCA) program has conducted a proficiency test for small-scale safety and thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results from this test for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Class 5 Type II standard. The material was tested as a well-characterized standard several times during the proficiency test to assess differences among participants and the range of results that may arise for well-behaved explosive materials.
An objective statistical test for eccentricity forcing of Oligo-Miocene climate
NASA Astrophysics Data System (ADS)
Proistosescu, C.; Huybers, P.; Maloof, A. C.
2008-12-01
We seek a maximally objective test for the presence of orbital features in Oligocene and Miocene δ18O records from marine sediments. Changes in Earth's orbital eccentricity are thought to be an important control on the long-term variability of climate during the Oligocene and Miocene Epochs. However, such an important control from eccentricity is surprising because eccentricity has relatively little influence on Earth's annual average insolation budget. Nevertheless, if significant eccentricity variability is present, it would provide important insight into the operation of the climate system at long timescales. Here we use previously published data, but with a chronology which is initially independent of orbital assumptions, to test for the presence of eccentricity-period variability in the Oligocene/Miocene sediment records. In contrast to the sawtooth climate record of the Pleistocene, the Oligocene and Miocene climate record appears smooth and symmetric and does not reset itself every hundred thousand years. This smooth variation, as well as the time interval spanning many eccentricity periods, makes Oligocene and Miocene paleorecords very suitable for evaluating the importance of eccentricity forcing. First, we construct time scales depending only upon the ages of geomagnetic reversals with intervening ages linearly interpolated with depth. Such a single age-depth relationship is, however, too uncertain to assess whether orbital features are present. Thus, we construct a second depth-derived age model by averaging ages across multiple sediment cores which have, at least partly, independent accumulation rate histories. But ages are still too uncertain to permit unambiguous detection of orbital variability. Thus we employ limited tuning assumptions and measure the degree by which orbital-period variability increases using spectral power estimates. By tuning we know that we are biasing the record toward showing orbital variations, but we account for this bias in our
ERIC Educational Resources Information Center
Callamaras, Peter
1983-01-01
This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)
Technology Transfer Automated Retrieval System (TEKTRAN)
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...
ERIC Educational Resources Information Center
Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O.
2006-01-01
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1-alpha ) percentile rank score…
ERIC Educational Resources Information Center
Rogosa, David
1981-01-01
The form of the Johnson-Neyman region of significance is shown to be determined by the statistic for testing the null hypothesis that the population within-group regressions are parallel. Results are obtained for both simultaneous and nonsimultaneous regions of significance. (Author)
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-01-01
Abstract To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTOs). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTOs. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTOs are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role, GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and in assessing whether such analyses were correctly applied. We offer generic advice to risk assessors and
Festing, Michael F. W.
2014-01-01
The safety of chemicals, drugs, novel foods and genetically modified crops is often tested using repeat-dose sub-acute toxicity tests in rats or mice. It is important to avoid misinterpretations of the results as these tests are used to help determine safe exposure levels in humans. Treated and control groups are compared for a range of haematological, biochemical and other biomarkers which may indicate tissue damage or other adverse effects. However, the statistical analysis and presentation of such data pose problems due to the large number of statistical tests which are involved. Often, it is not clear whether a “statistically significant” effect is real or a false positive (type I error) due to sampling variation. The authors' conclusions appear to be reached somewhat subjectively from the pattern of statistical significances, discounting those which they judge to be type I errors and ignoring any biomarker where the p-value is greater than p = 0.05. However, by using standardised effect sizes (SESs) a range of graphical methods and an over-all assessment of the mean absolute response can be made. The approach is an extension, not a replacement, of existing methods. It is intended to assist toxicologists and regulators in the interpretation of the results. Here, the SES analysis has been applied to data from nine published sub-acute toxicity tests in order to compare the findings with those of the original authors. Line plots, box plots and bar plots show the pattern of response. Dose-response relationships are easily seen. A “bootstrap” test compares the mean absolute differences across dose groups. In four out of seven papers where the no observed adverse effect level (NOAEL) was estimated by the authors, it was set too high according to the bootstrap test, suggesting that possible toxicity is under-estimated. PMID:25426843
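The standardized-effect-size idea lends itself to a short sketch. Below is a minimal, hypothetical illustration (not the paper's code) of a Cohen's-d-style SES for a single biomarker, together with a bootstrap comparison of its absolute value against a resampled null; the function names and the choice of resample count are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_effect_size(treated, control):
    """Cohen's d-style SES: mean difference scaled by the pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * np.var(treated, ddof=1) +
                  (n2 - 1) * np.var(control, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(treated) - np.mean(control)) / np.sqrt(pooled_var)

def bootstrap_mean_abs_ses(treated, control, n_boot=2000):
    """Bootstrap null for |SES| by resampling the pooled data.

    Returns the observed |SES| and the fraction of resampled
    |SES| values at least as large (a bootstrap p-value).
    """
    pooled = np.concatenate([treated, control])
    observed = abs(standardized_effect_size(treated, control))
    null = []
    for _ in range(n_boot):
        sample = rng.choice(pooled, size=len(pooled), replace=True)
        t, c = sample[:len(treated)], sample[len(treated):]
        null.append(abs(standardized_effect_size(t, c)))
    p = np.mean(np.array(null) >= observed)
    return observed, p
```

In the paper the mean absolute SES is taken over many biomarkers at once; this sketch shows the mechanics for one.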
NASA Astrophysics Data System (ADS)
Hilborn, Robert C.
1997-04-01
The connection between the spin of particles and the permutation symmetry ("statistics") of multiparticle states lies at the heart of much of atomic, molecular, condensed matter, and nuclear physics. The spin-statistics theorem of relativistic quantum field theory seems to provide a theoretical basis for this connection. There are, however, loopholes (O. W. Greenberg, Phys. Rev. D 43, 4111 (1991).) that allow for a field theory of identical particles whose statistics interpolate smoothly between that of bosons and fermions. Thus, it is up to experiment to reveal how closely nature follows the usual spin-statistics connection. After reviewing experiments that provide stringent limits on possible violations of the spin-statistics connection for electrons, I shall describe recent analogous experiments for spin-0 particles (R. C. Hilborn and C. L. Yuca, Phys. Rev. Lett. 76, 2844 (1996).) using diode laser spectroscopy of the A-band of molecular oxygen near 760 nm. These experiments show that the probability of finding two ^16O nuclei (spin-0 particles) in an antisymmetric state is less than 1 ppm. I shall also discuss proposals to test the spin-statistics connection for photons.
Residuals and the Residual-Based Statistic for Testing Goodness of Fit of Structural Equation Models
ERIC Educational Resources Information Center
Foldnes, Njal; Foss, Tron; Olsson, Ulf Henning
2012-01-01
The residuals obtained from fitting a structural equation model are crucial ingredients in obtaining chi-square goodness-of-fit statistics for the model. The authors present a didactic discussion of the residuals, obtaining a geometrical interpretation by recognizing the residuals as the result of oblique projections. This sheds light on the…
A Statistical Analysis of Infrequent Events on Multiple-Choice Tests that Indicate Probable Cheating
ERIC Educational Resources Information Center
Sundermann, Michael J.
2008-01-01
A statistical analysis of multiple-choice answers is performed to identify anomalies that can be used as evidence of student cheating. The ratio of exact errors in common (EEIC: two students put the same wrong answer for a question) to differences (D: two students get different answers) was found to be a good indicator of cheating under a wide…
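The EEIC-to-D ratio can be computed directly from two answer strings and the key. A minimal sketch (function name and example data are hypothetical):

```python
def eeic_d_ratio(answers_a, answers_b, key):
    """Compare two students' multiple-choice answer strings.

    EEIC: questions where both chose the same *wrong* option.
    D:    questions where the two students answered differently.
    A high EEIC/D ratio flags answer patterns that are unlikely
    under independent work.
    """
    eeic = sum(1 for a, b, k in zip(answers_a, answers_b, key)
               if a == b and a != k)
    d = sum(1 for a, b in zip(answers_a, answers_b) if a != b)
    return eeic, d, (eeic / d if d else float("inf"))
```

For example, with key "ABCD", answers "ABCC" and "ABDC" share one exact error in common (question 4) and differ on one question (question 3), giving a ratio of 1.0.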
Basic Mathematics Test Predicts Statistics Achievement and Overall First Year Academic Success
ERIC Educational Resources Information Center
Fonteyne, Lot; De Fruyt, Filip; Dewulf, Nele; Duyck, Wouter; Erauw, Kris; Goeminne, Katy; Lammertyn, Jan; Marchant, Thierry; Moerkerke, Beatrijs; Oosterlinck, Tom; Rosseel, Yves
2015-01-01
In the psychology and educational science programs at Ghent University, only 36.1% of the new incoming students in 2011 and 2012 passed all exams. Despite availability of information, many students underestimate the scientific character of social science programs. Statistics courses are a major obstacle in this matter. Not all enrolling students…
Comment on a Wilcox Test Statistic for Comparing Means When Variances Are Unequal.
ERIC Educational Resources Information Center
Hsiung, Tung-Hsing; And Others
1994-01-01
The alternative proposed by Wilcox (1989) to the James second-order statistic for comparing population means when variances are heterogeneous can sometimes be invalid. The degree to which the procedure is invalid depends on differences in sample size, the expected values of the observations, and population variances. (SLD)
The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups
ERIC Educational Resources Information Center
Pero-Cebollero, Maribel; Guardia-Olmos, Joan
2013-01-01
In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Statistical Profiling of Academic Oral English Proficiency Based on an ITA Screening Test
ERIC Educational Resources Information Center
Choi, Ick Kyu
2013-01-01
At the University of California, Los Angeles, the Test of Oral Proficiency (TOP), an internally developed oral proficiency test, is administered to international teaching assistant (ITA) candidates to ensure an appropriate level of academic oral English proficiency. Test taker performances are rated live by two raters according to four subscales.…
ERIC Educational Resources Information Center
Huynh, Huynh
1979-01-01
In mastery testing, the raw agreement index and the kappa index may be estimated via one test administration when the test scores follow beta-binomial distributions. This paper reports formulae, tables, and a computer program which facilitate the computation of the standard errors of the estimates. (Author/CTM)
ERIC Educational Resources Information Center
Reese, Lynda M.
This study extended prior Law School Admission Council (LSAC) research related to the item response theory (IRT) local item independence assumption into the realm of classical test theory. Initially, results from the Law School Admission Test (LSAT) and two other tests were investigated to determine the approximate state of local item independence…
Posada, David
2006-07-01
ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at http://darwin.uvigo.es/software/modeltest_server.html.
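Selection by the Akaike information criterion, as performed by ModelTest, can be sketched from a table of likelihood scores. The models, log-likelihoods, and parameter counts below are illustrative placeholders, not real ModelTest output:

```python
import math

# Hypothetical log-likelihoods and free-parameter counts for
# candidate substitution models (illustrative values only).
candidates = {
    "JC69":  (-3120.4, 0),
    "HKY85": (-3050.1, 4),
    "GTR":   (-3041.7, 8),
}

def aic(lnL, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * lnL + 2.0 * k

scores = {name: aic(lnL, k) for name, (lnL, k) in candidates.items()}
best = min(scores, key=scores.get)

# Akaike weights quantify model-selection uncertainty and can be
# used for model averaging, as the server's output describes.
min_aic = scores[best]
rel = {m: math.exp(-0.5 * (s - min_aic)) for m, s in scores.items()}
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}
```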
2014-10-01
first introduced in the seminal paper by Wallis (1951). Wallis extended the previous work of Wald and Wolfowitz (1946) for normally distributed…
NASA Astrophysics Data System (ADS)
Perlicki, Krzysztof
2010-03-01
A low-cost statistical polarization mode dispersion/polarization dependent loss emulator is presented in this article. The emulator was constructed by concatenating 15 highly birefringent optical-fiber segments and randomly varying the mode coupling between them by rotating the polarization state. The impact of polarization effects on polarization division multiplexing transmission quality was measured. The designed polarization mode dispersion/polarization dependent loss emulator was applied to mimic the polarization effects of real optical-fiber links.
Okamura, H; Punt, A E; Semba, Y; Ichinokawa, M
2013-04-01
This paper proposes a new and flexible statistical method for marginal increment analysis that directly accounts for periodicity in circular data using a circular-linear regression model with random effects. The method is applied to vertebral marginal increment data for Alaska skate Bathyraja parmifera. The best fit model selected using the AIC indicates that growth bands are formed annually. Simulation, where the underlying characteristics of the data are known, shows that the method performs satisfactorily when uncertainty is not extremely high.
Statistical tests with accurate size and power for balanced linear mixed models.
Muller, Keith E; Edwards, Lloyd J; Simpson, Sean L; Taylor, Douglas J
2007-08-30
The convenience of linear mixed models for Gaussian data has led to their widespread use. Unfortunately, standard mixed model tests often have greatly inflated test size in small samples. Many applications with correlated outcomes in medical imaging and other fields have simple properties which do not require the generality of a mixed model. Alternately, stating the special cases as a general linear multivariate model allows analysing them with either the univariate or multivariate approach to repeated measures (UNIREP, MULTIREP). Even in small samples, an appropriate UNIREP or MULTIREP test always controls test size and has a good power approximation, in sharp contrast to mixed model tests. Hence, mixed model tests should never be used when one of the UNIREP tests (uncorrected, Huynh-Feldt, Geisser-Greenhouse, Box conservative) or MULTIREP tests (Wilks, Hotelling-Lawley, Roy's, Pillai-Bartlett) apply. Convenient methods give exact power for the uncorrected and Box conservative tests. Simulations demonstrate that new power approximations for all four UNIREP tests eliminate most inaccuracy in existing methods. In turn, free software implements the approximations to give a better choice of sample size. Two repeated measures power analyses illustrate the methods. The examples highlight the advantages of examining the entire response surface of power as a function of sample size, mean differences, and variability.
A comparison of exact tests for trend with binary endpoints using Bartholomew's statistic.
Consiglio, J D; Shan, G; Wilding, G E
2014-01-01
Tests for trend are important in a number of scientific fields when trends associated with binary variables are of interest. Implementing the standard Cochran-Armitage trend test requires an arbitrary choice of scores assigned to represent the grouping variable. Bartholomew proposed a test for qualitatively ordered samples using asymptotic critical values, but type I error control can be problematic in finite samples. To our knowledge, use of the exact probability distribution has not been explored, and we study its use in the present paper. Specifically we consider an approach based on conditioning on both sets of marginal totals and three unconditional approaches where only the marginal totals corresponding to the group sample sizes are treated as fixed. While slightly conservative, all four tests are guaranteed to have actual type I error rates below the nominal level. The unconditional tests are found to exhibit far less conservatism than the conditional test and thereby gain a power advantage.
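For reference, the standard asymptotic Cochran-Armitage statistic that these exact procedures refine can be sketched as follows; the scores passed in are the arbitrary assignment the abstract mentions:

```python
import math

def cochran_armitage(successes, totals, scores):
    """Asymptotic Cochran-Armitage test for trend in proportions.

    successes[i] events out of totals[i] trials in ordered group i,
    with group score scores[i]. Returns the Z statistic and a
    two-sided p-value from the normal approximation.
    """
    N = sum(totals)
    p_hat = sum(successes) / N
    # Numerator: score-weighted deviations from the pooled rate.
    num = sum(s * (x - n * p_hat)
              for s, x, n in zip(scores, successes, totals))
    # Variance under H0: p(1-p) * sum n_i (s_i - s_bar)^2.
    s_bar = sum(s * n for s, n in zip(scores, totals)) / N
    var = p_hat * (1 - p_hat) * sum(n * (s - s_bar) ** 2
                                    for s, n in zip(scores, totals))
    z = num / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p
```

With counts 2/20, 5/20, 9/20 and scores [0, 1, 2] this gives Z ≈ 2.50, a significant increasing trend at the 5% level.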
Statistical properties of an early stopping rule for resampling-based multiple testing.
Jiang, Hui; Salzman, Julia
2012-12-01
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures.
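The flavor of such an early-stopping rule (a simplified version, not the authors' exact procedure) can be sketched for a two-sample permutation test: once enough resampled statistics exceed the observed one, the p-value is guaranteed to be large and resampling can stop early:

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_pvalue_early_stop(x, y, max_resamples=10000, threshold=50):
    """Permutation test of a mean difference with early stopping.

    Stops once `threshold` resampled statistics exceed the observed
    one: the p-value is then certain to be large, so further
    resampling cannot change the rejection decision at usual levels.
    """
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    exceed = 0
    for b in range(1, max_resamples + 1):
        perm = rng.permutation(pooled)
        stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        if stat >= observed:
            exceed += 1
            if exceed >= threshold:
                break
    return (exceed + 1) / (b + 1)  # add-one permutation p-value
```

For hypotheses that are clearly null, the loop terminates after roughly `threshold` iterations instead of `max_resamples`, which is where the computational savings come from.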
Turnidge, John; Bordash, Gerry
2007-07-01
Quality control (QC) ranges for antimicrobial agents against QC strains for both dilution and disk diffusion testing are currently set by the Clinical and Laboratory Standards Institute (CLSI), using data gathered in predefined structured multilaboratory studies, so-called tier 2 studies. The ranges are finally selected by the relevant CLSI subcommittee, based largely on visual inspection and a few simple rules. We have developed statistical methods for analyzing the data from tier 2 studies and applied them to QC strain-antimicrobial agent combinations from 178 dilution testing data sets and 48 disk diffusion data sets, including a method for identifying possible outlier data from individual laboratories. The methods are based on the fact that dilution testing MIC data were log-normally distributed and disk diffusion zone diameter data were normally distributed. For dilution testing, compared to QC ranges actually set by CLSI, calculated ranges were identical in 68% of cases, narrower in 7% of cases, and wider in 14% of cases. For disk diffusion testing, calculated ranges were identical to CLSI ranges in 33% of cases, narrower in 8% of cases, and 1 to 2 mm wider in 58% of cases. Possible outliers were detected in 8% of the dilution test data sets but in none of the disk diffusion data sets. Application of statistical techniques to the analysis of QC tier 2 data and the setting of QC ranges is relatively simple to perform on spreadsheets, and the output enhances the current CLSI methods for setting QC ranges.
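For the dilution-testing case, the log-normal assumption suggests a simple sketch: compute a QC range on the log2 (doubling-dilution) scale and round outward to whole dilutions. This is an illustrative simplification of the paper's method; the ±2 SD coverage and function name are assumptions:

```python
import math
import statistics

def qc_range_mic(mics, coverage_sd=2.0):
    """QC range for dilution testing assuming log2(MIC) is normal.

    Returns (low, high) MIC bounds, rounded outward to whole
    doubling dilutions, expected to contain most in-control results.
    """
    logs = [math.log2(m) for m in mics]
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    lo = math.floor(mu - coverage_sd * sd)   # round down a dilution
    hi = math.ceil(mu + coverage_sd * sd)    # round up a dilution
    return 2.0 ** lo, 2.0 ** hi
```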
ERIC Educational Resources Information Center
Wilson, Kenneth M.; Powers, Donald E.
This study was undertaken to clarify the internal structure of the Law School Admission Test (LSAT) and shed light on the ability or abilities measured by the three item types that make up the test (logical reasoning, analytical reasoning, and reading comprehension). The study used data for two forms of the LSAT for general samples of LSAT…
Hybrid Statistical Testing for Nuclear Material Accounting Data and/or Process Monitoring Data
Ticknor, Lawrence O.; Hamada, Michael Scott; Sprinkle, James K.; Burr, Thomas Lee
2015-04-14
The two tests employed in the hybrid testing scheme are Page’s cumulative sums for all streams within a Balance Period (maximum of the maximums and average of the maximums) and Crosier’s multivariate cumulative sum applied to incremental cumulative sums across Balance Periods. The role of residuals for both kinds of data is discussed.
ERIC Educational Resources Information Center
Deacon, S. Helene; Leung, Dilys
2013-01-01
This study tested the diverging predictions of recent theories of children's learning of spelling regularities. We asked younger (Grades 1 and 2) and older (Grades 3 and 4) elementary school-aged children to choose the correct endings for words that varied in their morphological structure. We tested the impacts of semantic frequency by…
Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics
ERIC Educational Resources Information Center
Schweig, Jonathan
2014-01-01
Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…
ERIC Educational Resources Information Center
Woodruff, David; Wu, Yi-Fang
2012-01-01
The purpose of this paper is to illustrate alpha's robustness and usefulness, using actual and simulated educational test data. The sampling properties of alpha are compared with the sampling properties of several other reliability coefficients: Guttman's lambda[subscript 2], lambda[subscript 4], and lambda[subscript 6]; test-retest reliability;…
NASA Technical Reports Server (NTRS)
Hughes, William O.; McNelis, Anne M.
2010-01-01
The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.
Divine, George; Norton, H James; Hunt, Ronald; Dienemann, Jacqueline
2013-09-01
When a study uses an ordinal outcome measure with unknown differences in the anchors and a small range such as 4 or 7, use of the Wilcoxon rank sum test or the Wilcoxon signed rank test may be most appropriate. However, because nonparametric methods are at best indirect functions of standard measures of location such as means or medians, the choice of the most appropriate summary measure can be difficult. The issues underlying use of these tests are discussed. The Wilcoxon-Mann-Whitney odds directly reflects the quantity that the rank sum procedure actually tests, and thus it can be a superior summary measure. Unlike the means and medians, its value will have a one-to-one correspondence with the Wilcoxon rank sum test result. The companion article appearing in this issue of Anesthesia & Analgesia ("Aromatherapy as Treatment for Postoperative Nausea: A Randomized Trial") illustrates these issues and provides an example of a situation for which the medians imply no difference between 2 groups, even though the groups are, in fact, quite different. The trial cited also provides an example of a single sample that has a median of zero, yet there is a substantial shift for much of the nonzero data, and the Wilcoxon signed rank test is quite significant. These examples highlight the potential discordance between medians and Wilcoxon test results. Along with the issues surrounding the choice of a summary measure, there are considerations for the computation of sample size and power, confidence intervals, and multiple comparison adjustment. In addition, despite the increased robustness of the Wilcoxon procedures relative to parametric tests, some circumstances in which the Wilcoxon tests may perform poorly are noted, along with alternative versions of the procedures that correct for such limitations.
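The Wilcoxon-Mann-Whitney odds mentioned above has a direct empirical estimate: count wins and ties over all cross-group pairs. A minimal sketch (function name assumed):

```python
def wmw_odds(x, y):
    """Wilcoxon-Mann-Whitney odds.

    p estimates P(X > Y) + 0.5 * P(X = Y) over all cross-group
    pairs; the odds p / (1 - p) summarizes exactly the quantity
    that the rank sum procedure tests.
    """
    wins = ties = 0
    for xi in x:
        for yj in y:
            if xi > yj:
                wins += 1
            elif xi == yj:
                ties += 1
    p = (wins + 0.5 * ties) / (len(x) * len(y))
    return p / (1 - p)
```

For x = [3, 4, 5] against y = [1, 2, 3] there are 8 wins and 1 tie out of 9 pairs, giving odds of 8.5/0.5 = 17: a randomly chosen x is overwhelmingly likely to exceed a randomly chosen y.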
Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R
2014-12-10
Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or absence of a psychiatric disorder can be explained by differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at the cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions not previously described as being involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest.
NASA Technical Reports Server (NTRS)
Dimitri, P. S.; Wall, C. 3rd; Oas, J. G.; Rauch, S. D.
2001-01-01
Meniere's disease (MD) and migraine-associated dizziness (MAD) are two disorders that can have similar symptomatologies but differ vastly in treatment. Vestibular testing is sometimes used to help differentiate between these disorders, but the inefficiency of a human interpreter analyzing a multitude of variables independently decreases its utility. Our hypothesis was that we could objectively discriminate between patients with MD and those with MAD using select variables from the vestibular test battery. Sinusoidal harmonic acceleration test variables were reduced to three vestibulo-ocular reflex physiologic parameters: gain, time constant, and asymmetry. A combination of these parameters plus a measurement of reduced vestibular response from caloric testing allowed us to achieve a joint classification rate of 91% using an independent quadratic classification algorithm. Data from posturography were not useful for this type of differentiation. Overall, our classification function can be used as an unbiased assistant to discriminate between MD and MAD, and it gave us insight into the pathophysiologic differences between the two disorders.
Antweiler, Kai; Schreiter, Susanne; Keilwagen, Jens; Baldrian, Petr; Kropf, Siegfried; Smalla, Kornelia; Grosch, Rita; Heuer, Holger
2017-03-01
A statistical method was developed to test for equivalence of microbial communities analysed by next-generation sequencing of amplicons. The test uses Bray-Curtis distances between the microbial community structures and is based on a two-sample jackknife procedure. This approach was applied to investigate putative effects of the antifungal biocontrol strain RU47 on fungal communities in three arable soils which were analysed by high-throughput ITS amplicon sequencing. Two contrasting workflows to produce abundance tables of operational taxonomic units from sequence data were applied. For both, the developed test indicated highly significant equivalence of the fungal communities with or without previous exposure to RU47 for all soil types, with reference to fungal community differences in conjunction with field site or cropping history. However, minor effects of RU47 on fungal communities were statistically significant using highly sensitive multivariate tests. Nearly all fungal taxa responding to RU47 increased in relative abundance indicating the absence of ecotoxicological effects. Use of the developed equivalence test is not restricted to evaluate effects on soil microbial communities by inoculants for biocontrol, bioremediation or other purposes, but could also be applied for biosafety assessment of compounds like pesticides, or genetically engineered plants.
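The Bray-Curtis distance on which the equivalence test is built is simple to state; a minimal sketch for two OTU abundance vectors (the jackknife machinery of the paper is omitted):

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two OTU abundance vectors.

    0 means identical community profiles; 1 means no shared
    abundance at all.
    """
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den
```

For example, abundances [10, 0, 5] versus [5, 5, 5] give a dissimilarity of 10/30 = 1/3.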
A Statistical Test of Walrasian Equilibrium by Means of Complex Networks Theory
NASA Astrophysics Data System (ADS)
Bargigli, Leonardo; Viaggiu, Stefano; Lionetto, Andrea
2016-10-01
We represent an exchange economy in terms of statistical ensembles for complex networks by introducing the concept of market configuration. This is defined as a sequence of nonnegative discrete random variables {w_{ij}} describing the flow of a given commodity from agent i to agent j. This sequence can be arranged in a nonnegative matrix W which we can regard as the representation of a weighted and directed network or digraph G. Our main result consists in showing that general equilibrium theory imposes highly restrictive conditions upon market configurations, which are in most cases not fulfilled by real markets. An explicit example with reference to the e-MID interbank credit market is provided.
ERIC Educational Resources Information Center
Glas, C. A. W.
In a previous study (Glas, 1998), approaches were developed for evaluating whether adaptive testing data used for online calibration sufficiently fit the item response model. Three approaches were suggested, based on a Lagrange multiplier (LM) statistic, a Wald statistic, and a cumulative sum (CUMSUM) statistic, respectively. For all these methods,…
Nataraja, M.C.; Dhang, N.; Gupta, A.P.
1999-07-01
The variation in impact resistance of steel fiber-reinforced concrete and plain concrete as determined from a drop weight test is reported. The observed coefficients of variation are about 57 and 46% for first-crack resistance and ultimate resistance in the case of fiber concrete, and the corresponding values for plain concrete are 54 and 51%, respectively. The goodness-of-fit test indicated a poor fit of the impact-resistance test results produced in this study to the normal distribution at the 95% confidence level for both fiber-reinforced and plain concrete. However, the percentage increase in the number of blows from first crack to failure for both fiber-reinforced and plain concrete fits the normal distribution, as indicated by the goodness-of-fit test. The coefficient of variation in the percentage increase in the number of blows beyond first crack for fiber-reinforced concrete and plain concrete is 51.9 and 43.1%, respectively. The minimum number of tests required to reliably measure the properties of the material can be suggested based on the observed levels of variation.
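One common way to turn an observed coefficient of variation into a minimum replicate count (a textbook normal-approximation rule, not necessarily the authors' derivation) fixes a tolerable fractional error around the mean at a given confidence level:

```python
import math

def min_tests(cov, error, z=1.96):
    """Minimum number of replicates so the sample mean lies within
    `error` (as a fraction of the true mean) with confidence given
    by the normal quantile z (1.96 for 95%)."""
    return math.ceil((z * cov / error) ** 2)

# With COV = 51.9% and a 10% tolerable error on the mean:
print(min_tests(0.519, 0.10))
```

The quadratic dependence on COV explains why impact tests with COV near 50% need on the order of a hundred specimens for a tight mean estimate.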
Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore
2014-01-01
Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype-by-environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern, possibly with some form of autocorrelation. The model also allows adding a set of reference varieties to the GM plant and its comparator to assess the natural variation, which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325
On statistical tests for homogeneity of two bivariate zero-inflated Poisson populations.
Yuen, Hak-Keung; Chow, Shein-Chung; Tse, Siu-Keung
2015-01-01
The problem of testing for a treatment difference in the occurrence of a study endpoint in a randomized parallel-group comparative clinical trial with repeated responses, under the assumption that the responses follow a bivariate zero-inflated Poisson (ZIP) distribution, is considered. A likelihood ratio test for homogeneity of two bivariate ZIP populations is derived. An approximate formula for sample size calculation is also obtained, which achieves a desired power for detecting a clinically meaningful difference under an alternative hypothesis. An example concerning the comparison of treatment effects in an addiction clinical trial, in terms of the number of days of illicit drug use during a month, is given for illustrative purposes.
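The ZIP probability mass function at the heart of such models is simple to write down. The sketch below is the univariate marginal form only, a simplification, not the bivariate likelihood derived in the paper:

```python
import math

def zip_pmf(k, lam, pi):
    """P(X = k) for a zero-inflated Poisson: with probability pi the
    observation is a structural zero, otherwise it is Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * pois
    return (1 - pi) * pois

# Zero inflation raises P(X = 0) above the plain Poisson value:
print(zip_pmf(0, 2.0, 0.3), zip_pmf(0, 2.0, 0.0))
```

Setting pi = 0 recovers the ordinary Poisson distribution, which is the natural null against which excess zeros are judged.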
Garbarino, J.R.; Jones, B.E.; Stein, G.P.
1985-01-01
In an interlaboratory test, inductively coupled plasma atomic emission spectrometry (ICP-AES) was compared with flame atomic absorption spectrometry and molecular absorption spectrophotometry for the determination of 17 major and trace elements in 100 filtered natural water samples. No unacceptable biases were detected. The analysis precision of ICP-AES was found to be equal to or better than alternative methods. Known-addition recovery experiments demonstrated that the ICP-AES determinations are accurate to between plus or minus 2 and plus or minus 10 percent; four-fifths of the tests yielded average recoveries of 95-105 percent, with an average relative standard deviation of about 5 percent.
Statistical analysis and ground-based testing of the on-orbit Space Shuttle damage detection sensors
NASA Astrophysics Data System (ADS)
Miles, Brian H.; Tanner, Elizabeth A.; Carter, John P.; Kamerman, Gary W.; Schwartz, Robert
2005-05-01
The loss of Space Shuttle Columbia and her crew led to the creation of the Columbia Accident Investigation Board (CAIB), which concluded that a piece of external fuel tank insulating foam impacted the Shuttle's wing leading edge. The foam created a hole in the reinforced carbon/carbon (RCC) insulating material which gravely compromised the Shuttle's thermal protection system (TPS). In response to the CAIB recommendation, on the upcoming Return to Flight Shuttle mission (STS-114) NASA will include a Shuttle-deployed sensor suite which, among other sensors, will include two laser sensing systems, Sandia National Lab's Laser Dynamic Range Imager (LDRI) and Neptec's Laser Camera System (LCS), to collect 3-D imagery of the Shuttle's exterior. Herein is described a ground-based statistical testing procedure that will be used by NASA as part of a damage detection performance assessment studying the performance of each of the two laser radar systems in detecting and identifying impact damage to the Shuttle. A statistical framework based on binomial and Bayesian statistics is used to describe the probability of detection and associated statistical confidence. A mock-up of a section of Shuttle wing RCC with interchangeable panels includes a random pattern of 1/4" and 1" diameter holes on the simulated RCC panels and is cataloged prior to double-blind testing. A team of ladar sensor operators will acquire laser radar imagery of the wing mock-up using a robotic platform in a laboratory at Johnson Space Center to execute linear image scans of the wing mock-up. The test matrix will vary robotic platform motion to simulate boom wobble and alter lighting and background conditions at the 6.5-foot and 10-foot sensor-wing stand-off distances to be used on orbit. A separate team of image analysts will process and review the data and characterize and record the damage that is found. A suite of software programs has been developed to support hole location definition, damage disposition
Kipiński, Lech; König, Reinhard; Sielużycki, Cezary; Kordecki, Wojciech
2011-10-01
Stationarity is a crucial yet rarely questioned assumption in the analysis of time series of magneto- (MEG) or electroencephalography (EEG). One key drawback of the commonly used tests for stationarity of encephalographic time series is the fact that conclusions on stationarity are only indirectly inferred, either from the Gaussianity (e.g. the Shapiro-Wilk test or Kolmogorov-Smirnov test) or from the randomness of the time series and the absence of trend using very simple time-series models (e.g. the sign and trend tests by Bendat and Piersol). We present a novel approach to the analysis of the stationarity of MEG and EEG time series by applying modern statistical methods which were specifically developed in econometrics to verify the hypothesis that a time series is stationary. We report our findings of the application of three different tests of stationarity--the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for trend or mean stationarity, the Phillips-Perron (PP) test for the presence of a unit root and the White test for homoscedasticity--on an illustrative set of MEG data. For five stimulation sessions, we found already for short epochs of duration of 250 and 500 ms that, although the majority of the studied epochs of single MEG trials were usually mean-stationary (KPSS and PP tests), they were classified as nonstationary due to their heteroscedasticity (White test). We also observed that the presence of external auditory stimulation did not significantly affect the findings regarding the stationarity of the data. We conclude that the combination of these tests allows a refined analysis of the stationarity of MEG and EEG time series.
Tests of Mediation: Paradoxical Decline in Statistical Power as a Function of Mediator Collinearity
ERIC Educational Resources Information Center
Beasley, T. Mark
2014-01-01
Increasing the correlation between the independent variable and the mediator (the "a" coefficient) increases the effect size ("ab") for mediation analysis; however, increasing "a" by definition increases collinearity in mediation models. As a result, the standard errors of product tests increase. The variance inflation caused by…
Brain morphometry measurements are required in test guidelines proposed by the USEPA to screen chemicals for developmental neurotoxicity. Because the DNT is a screening battery, the analysis of this data should be sensitive to dose-related changes in the pattern of brain growt...
ERIC Educational Resources Information Center
Godleski, Stephanie A.; Ostrov, Jamie M.
2010-01-01
The present study used both categorical and dimensional approaches to test the association between relational and physical aggression and hostile intent attributions for both relational and instrumental provocation situations using the National Institute of Child Health and Human Development longitudinal Study of Early Child Care and Youth…
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
ERIC Educational Resources Information Center
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…
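Under a simple independent-items model with known success probabilities (an illustration for intuition, not the NIRT conditioning procedure of the article), the probability of exceedance can be computed by brute-force enumeration of all response vectors with the same total score:

```python
from itertools import combinations

def prob(vec, p):
    """Probability of a 0/1 response vector given item success probabilities p."""
    out = 1.0
    for x, pi in zip(vec, p):
        out *= pi if x else (1 - pi)
    return out

def probability_of_exceedance(x, p):
    """PE of response vector x: total probability, conditional on the
    total score, of all vectors at most as likely as x."""
    n, s = len(x), sum(x)
    px = prob(x, p)
    same_total = []
    for ones in combinations(range(n), s):
        v = [1 if i in ones else 0 for i in range(n)]
        same_total.append(prob(v, p))
    eligible = sum(q for q in same_total if q <= px + 1e-15)
    return eligible / sum(same_total)

# p sorted from easiest to hardest item; succeeding only on the
# hardest item is an aberrant pattern and gets a small PE.
p = [0.9, 0.7, 0.3]
print(probability_of_exceedance([0, 0, 1], p))
```

A small PE flags a pattern as unlikely given the total score, which is exactly the person-fit logic the abstract describes; the enumeration is exponential in test length, hence the article's focus on tests of moderate length.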
Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor
2013-01-01
Objective(s): Reliability measures precision, or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure the reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice. Materials and Methods: In 2010, five electronic databases were searched between 2007 and 2009 to look for reliability studies. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. Results: The Intra-class Correlation Coefficient (ICC) is the most popular method, used in 25 (60%) studies, followed by comparison of means (8, or 19%). Of the 25 studies using the ICC, only 7 (28%) reported the confidence intervals and the type of ICC used. Most studies (71%) also tested the agreement of instruments. Conclusion: This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue and to be able to correctly perform analyses in reliability studies. PMID:23997908
Adams, Michael C; Barbano, David M
2015-06-01
Our objective was to develop a statistical approach that could be used to determine whether a handler's fat, protein, or other solids mid-infrared (MIR) spectrophotometer test values were different, on average, from a milk regulatory laboratory's MIR test values when split-sampling test values are not available. To accomplish this objective, the Proc GLM procedure of SAS (SAS Institute Inc., Cary, NC) was used to develop a multiple linear regression model to evaluate 4 mo of MIR producer payment testing data (112 to 167 producers per month) from 2 different MIR instruments. For each of the 4 mo and each of the 2 components (fat or protein), the GLM model was Response=Instrument+Producer+Date+2-Way Interactions+3-Way Interaction. Instrument was significant in determining fat and protein tests for 3 of the 4 mo, and Producer was significant in determining fat and protein tests for all 4 mo. This model was also used to establish fat and protein least significant differences (LSD) between instruments. Fat LSD between instruments ranged from 0.0108 to 0.0144% (α=0.05) for the 4 mo studied, whereas protein LSD between instruments ranged from 0.0046 to 0.0085% (α=0.05). In addition, regression analysis was used to determine the effects of component concentration and date of sampling on fat and protein differences between 2 MIR instruments. This statistical approach could be performed monthly to document a regulatory laboratory's verification that a given handler's instrument has obtained a different test result, on average, from that of the regulatory laboratory's and that an adjustment to producer payment may be required.
NASA Astrophysics Data System (ADS)
Pater, Liana; Miclea, Şerban; Izvercian, Monica
2016-06-01
This paper considers the impact of SMEs' annual turnover upon their marketing activities (in terms of marketing responsibility, strategic planning and budgeting). Empirical results and literature reviews reveal that SME managers are inclined to engage in planned and profitable marketing activities, depending on their turnover level. Thus, using data collected from 131 Romanian SME managers, we applied the Chi-Square Test in order to validate or invalidate three research assumptions (hypotheses), formulated from the empirical and literature findings.
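The Pearson chi-square statistic for a contingency table, the quantity behind such tests of association, can be computed directly. The result is then compared against the chi-square critical value for (rows−1)(cols−1) degrees of freedom, e.g. 3.841 for 1 df at α = 0.05:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table
    (list of rows of observed counts)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# A 2x2 table with an apparent association:
print(chi_square_stat([[20, 10], [10, 20]]))  # compare with 3.841 (df = 1)
```

The table sizes and critical values here are illustrative; the paper does not report its contingency tables.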
Rey deCastro, B; Neuberg, Donna
2007-05-30
Biological assays often utilize experimental designs where observations are replicated at multiple levels, and where each level represents a separate component of the assay's overall variance. Statistical analysis of such data usually ignores these design effects, whereas more sophisticated methods would improve the statistical power of assays. This report evaluates the statistical performance of an in vitro MCF-7 cell proliferation assay (E-SCREEN) by identifying the optimal generalized linear mixed model (GLMM) that accurately represents the assay's experimental design and variance components. Our statistical assessment found that 17beta-oestradiol cell culture assay data were best modelled with a GLMM configured with a reciprocal link function, a gamma error distribution, and three sources of design variation: plate-to-plate; well-to-well, and the interaction between plate-to-plate variation and dose. The gamma-distributed random error of the assay was estimated to have a coefficient of variation (COV) = 3.2 per cent, and a variance component score test described by X. Lin found that each of the three variance components were statistically significant. The optimal GLMM also confirmed the estrogenicity of five weakly oestrogenic polychlorinated biphenyls (PCBs 17, 49, 66, 74, and 128). Based on information criteria, the optimal gamma GLMM consistently out-performed equivalent naive normal and log-normal linear models, both with and without random effects terms. Because the gamma GLMM was by far the best model on conceptual and empirical grounds, and requires only trivially more effort to use, we encourage its use and suggest that naive models be avoided when possible.
Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.
2013-01-01
The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
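The classical one-sided Grubbs statistic for a single suspected low outlier can be sketched as follows. This is a simplified illustration: the critical value depends on sample size and significance level and is supplied by the caller here, flood-frequency practice applies the test to log-transformed flows, and the paper's multiple-low-outlier generalization is more involved:

```python
import statistics

def grubbs_low(sample, critical):
    """One-sided Grubbs statistic for the smallest value,
    G = (mean - min) / s, and whether it exceeds `critical`."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    g = (m - min(sample)) / s
    return g, g > critical

# An annual flow series with one suspiciously small value:
g, flagged = grubbs_low([10, 11, 12, 13, 1], critical=1.5)
print(g, flagged)
```

Once a low value is flagged, the abstract's censored-data route treats it as a "less-than" value rather than discarding it outright.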
Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina
2012-01-01
Background Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Methodology/Findings Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. Conclusions This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future. PMID:22662248
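The Bland-Altman method itself reduces to a bias estimate on the paired differences plus 95% limits of agreement; a minimal sketch:

```python
import statistics

def bland_altman(a, b):
    """Bland-Altman agreement between paired measurements from two
    instruments: mean bias and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Two instruments whose readings differ by a constant offset:
print(bland_altman([2, 3, 4, 5], [1, 2, 3, 4]))
```

The key point of the method, and the reason the review favors it over correlation, is that it quantifies how far apart two instruments' readings can be, not merely whether they move together.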
NASA Astrophysics Data System (ADS)
Nielsen, Allan A.; Conradsen, Knut; Skriver, Henning
2016-10-01
Test statistics for comparison of real (as opposed to complex) variance-covariance matrices exist in the statistics literature [1]. In earlier publications we have described a test statistic for the equality of two variance-covariance matrices following the complex Wishart distribution with an associated p-value [2]. We showed their application to bitemporal change detection and to edge detection [3] in multilook, polarimetric synthetic aperture radar (SAR) data in the covariance matrix representation [4]. The test statistic and the associated p-value is described in [5] also. In [6] we focussed on the block-diagonal case, we elaborated on some computer implementation issues, and we gave examples on the application to change detection in both full and dual polarization bitemporal, bifrequency, multilook SAR data. In [7] we described an omnibus test statistic Q for the equality of k variance-covariance matrices following the complex Wishart distribution. We also described a factorization of Q = R2 R3 … Rk where Q and Rj determine if and when a difference occurs. Additionally, we gave p-values for Q and Rj. Finally, we demonstrated the use of Q and Rj and the p-values to change detection in truly multitemporal, full polarization SAR data. Here we illustrate the methods by means of airborne L-band SAR data (EMISAR) [8,9]. The methods may be applied to other polarimetric SAR data also such as data from Sentinel-1, COSMO-SkyMed, TerraSAR-X, ALOS, and RadarSat-2 and also to single-pol data. The account given here closely follows that given our recent IEEE TGRS paper [7]. Selected References [1] Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley, New York, third ed. (2003). [2] Conradsen, K., Nielsen, A. A., Schou, J., and Skriver, H., "A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 41(1): 4-19, 2003. [3] Schou, J
Lew, Bartosz
2008-08-15
We introduce and analyze a method for testing statistical isotropy and Gaussianity and apply it to the Wilkinson Microwave Anisotropy Probe (WMAP) cosmic microwave background (CMB) foreground reduced temperature maps. We also test cross-channel difference maps to constrain levels of residual foreground contamination and systematic uncertainties. We divide the sky into regions of varying size and shape and measure the first four moments of the one-point distribution within these regions, and using their simulated spatial distributions we test the statistical isotropy and Gaussianity hypotheses. By randomly varying orientations of these regions, we sample the underlying CMB field in a new manner, that offers a richer exploration of the data content, and avoids possible biasing due to a single choice of sky division. In our analysis we account for all two-point correlations between different regions and also show the impact on the results when these correlations are neglected. The statistical significance is assessed via comparison with realistic Monte Carlo simulations. We find the three-year WMAP maps to agree well with the isotropic, Gaussian random field simulations as probed by regions corresponding to the angular scales ranging from 6° to 30° at 68% confidence level (CL). We report a strong, anomalous (99.8% CL) dipole 'excess' in the V band of the three-year WMAP data and also in the V band of the WMAP five-year data (99.3% CL). Using our statistics, we notice large scale hemispherical power asymmetry, and find that it is not highly statistically significant in the WMAP three-year data (≲97% CL) at scales l ≤ 40. The significance is even smaller if multipoles up to l = 1024 are considered (≈90% CL). We give constraints on the amplitude of the previously proposed CMB dipole modulation field parameter. We find some hints of foreground contamination in the form of a locally strong, anomalous kurtosis excess in
EEG-based Drowsiness Detection for Safe Driving Using Chaotic Features and Statistical Tests.
Mardi, Zahra; Ashtiani, Seyedeh Naghmeh Miri; Mikaili, Mohammad
2011-05-01
Electroencephalography (EEG) is one of the most reliable sources for detecting sleep onset while driving. In this study, we have tried to demonstrate that sleepiness and alertness signals are separable with an appropriate margin by extracting suitable features. So, first of all, we recorded EEG signals from 10 volunteers. They were obliged to avoid sleeping for about 20 hours before the test. We recorded the signals while the subjects played a virtual driving game, trying to pass barriers shown on the monitor. The recording session ended after 45 minutes. Then, after preprocessing the recorded signals, we labeled them as drowsy or alert using the times at which subjects passed the barriers or crashed into them. Next, we extracted some chaotic features (including Higuchi's fractal dimension and Petrosian's fractal dimension) and the logarithm of the signal energy. By applying the two-tailed t-test, we have shown that these features can create a 95% significance level of difference between drowsiness and alertness in each EEG channel. The ability of each feature has been evaluated by an artificial neural network; the classification accuracy with all features was about 83.3%, obtained without performing any optimization process on the classifier.
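A two-tailed comparison of a feature between drowsy and alert epochs can be illustrated with Welch's t statistic (the unequal-variance form; the abstract does not specify which t-test variant was used). |t| is then compared with the critical value for the relevant degrees of freedom:

```python
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic, which does not assume
    equal variances in the two groups."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

# Hypothetical per-epoch feature values (e.g. a fractal dimension)
# for drowsy vs alert epochs -- not the study's data:
print(welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

Identical samples give t = 0; a larger |t| gives stronger evidence that the feature separates the two states.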
Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert
2016-01-01
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
NASA Astrophysics Data System (ADS)
Verschuur, Gerrit L.
2014-06-01
The archive of IRIS, PLANCK and WMAP data available at the IRSA website of IPAC allows the apparent associations between galactic neutral hydrogen (HI) features and small-scale structure in WMAP and PLANCK data to be closely examined. In addition, new HI observations made with the Green Bank Telescope are used to perform a statistical test of putative associations. It is concluded that attention should be paid to the possibility that some of the small-scale structure found in WMAP and PLANCK data harbors the signature of a previously unrecognized source of high-frequency continuum emission in the Galaxy.
A Test Model for Fluctuation-Dissipation Theorems with Time Periodic Statistics (PREPRINT)
2010-03-09
∂Cov(u₂, u₂*)/∂γ₂ = −∫_{−∞}^{t} ∫_{−∞}^{t} (2t−s−r) e^{−γ₂(2t−s−r)} (⟨ψ(s,t)ψ(r,t)⟩ − ⟨ψ(s,t)⟩⟨ψ(r,t)⟩) f₂(s)f₂(r) ds dr,  (88)
References: [1] R. Abramov. Short-time linear response with reduced-rank tangent map. Chinese Annals of Mathematics, Series B, 30:447–462, 2009. [2] R. Abramov and A.J. Majda. New approximations and tests of linear fluctuation-response for chaotic nonlinear forced-dissipative dynamical sys- tems. J
A statistical test for periodicity hypothesis in the crater formation rate
NASA Astrophysics Data System (ADS)
Yabushita, S.
1991-06-01
The hypothesis that the crater formation rate exhibits periodicity is examined by adopting a criterion proposed by Broadbent, which is more stringent than those adopted previously. Data sets of Alvarez and Muller, Rampino and Stothers and of Grieve are tested. The data set of Rampino and Stothers is found to satisfy the adopted criterion for periodicity with period P = 30 Myr. Again, small craters (D less than 10 km) in the data set of Grieve satisfy the criterion even better with P = 30 Myr and 50 Myr, but large craters do not satisfy the criterion. Removal of some of the very young craters (ages less than 8 Myr) yields three significant periods, 16.5, 30, and 50 Myr. Taken at face value, the result would indicate that small impactors hit the earth at intervals of 16.5 Myr and that this period is modulated by the galactic tide.
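A crude version of such a periodicity criterion (an illustration of the idea only, not Broadbent's statistic) scores a candidate period by the mean squared distance of crater ages from the nearest event of a strictly periodic sequence, minimized over the phase:

```python
def periodicity_residual(ages, period, steps=200):
    """Mean squared circular distance from each age to the nearest
    event of a periodic sequence t0 + k*period, minimized over the
    phase t0 by a simple grid search."""
    best = float("inf")
    for i in range(steps):
        t0 = period * i / steps
        r = 0.0
        for t in ages:
            d = (t - t0) % period
            r += min(d, period - d) ** 2
        best = min(best, r / len(ages))
    return best

# Hypothetical ages (Myr) falling exactly on a 30 Myr cycle score
# a near-zero residual for P = 30 and a larger one for other periods:
ages = [0.0, 30.0, 60.0, 90.0]
print(periodicity_residual(ages, 30.0), periodicity_residual(ages, 23.0))
```

Significance testing (the hard part, and the point of Broadbent's criterion) would compare the residual against its distribution under randomly scattered ages.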
Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel
2016-07-20
A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.
Shang, Han Lin
2015-01-01
Although there are continuing developments in the methods for forecasting mortality, there are few comparisons of the accuracy of the forecasts. The subject of the statistical validity of these comparisons, which is essential to demographic forecasting, has all but been ignored. We introduce Friedman's test statistics to examine whether the differences in point and interval forecast accuracies are statistically significant between methods. We introduce the Nemenyi test statistic to identify which methods give results that are statistically significantly different from others. Using sex-specific and age-specific data from 20 countries, we apply these two test statistics to examine the forecast accuracy obtained from several principal component methods, which can be categorized into coherent and non-coherent forecasting methods.
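Both tests are straightforward to apply. A minimal sketch with synthetic accuracy numbers (all values invented), using SciPy's `friedmanchisquare` and a hand-computed Nemenyi critical difference with the k=3, alpha=0.05 constant q=2.343 from Demšar's published tables:

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
n_countries = 20
# Hypothetical forecast errors for three methods on the same 20 countries;
# method C is constructed to be systematically worse.
method_a = rng.normal(1.00, 0.05, n_countries)
method_b = rng.normal(1.02, 0.05, n_countries)
method_c = rng.normal(1.30, 0.05, n_countries)

stat, p = friedmanchisquare(method_a, method_b, method_c)

# Nemenyi post hoc: mean ranks differing by more than the critical
# difference are significantly different (q_0.05 = 2.343 for k = 3 methods).
errors = np.column_stack([method_a, method_b, method_c])
ranks = np.argsort(np.argsort(errors, axis=1), axis=1) + 1  # rank 1 = best per country
mean_ranks = ranks.mean(axis=0)
cd = 2.343 * np.sqrt(3 * 4 / (6 * n_countries))
```

The Friedman p-value tells you whether any method differs; the Nemenyi critical difference tells you which pairs do.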
Möbius, Wolfram; Gerland, Ulrich
2010-08-19
The positions of nucleosomes in eukaryotic genomes determine which parts of the DNA sequence are readily accessible for regulatory proteins and which are not. Genome-wide maps of nucleosome positions have revealed a salient pattern around transcription start sites, involving a nucleosome-free region (NFR) flanked by a pronounced periodic pattern in the average nucleosome density. While the periodic pattern clearly reflects well-positioned nucleosomes, the positioning mechanism is less clear. A recent experimental study by Mavrich et al. argued that the pattern observed in Saccharomyces cerevisiae is qualitatively consistent with a "barrier nucleosome model," in which the oscillatory pattern is created by the statistical positioning mechanism of Kornberg and Stryer. On the other hand, there is clear evidence for intrinsic sequence preferences of nucleosomes, and it is unclear to what extent these sequence preferences affect the observed pattern. To test the barrier nucleosome model, we quantitatively analyze yeast nucleosome positioning data both up- and downstream from NFRs. Our analysis is based on the Tonks model of statistical physics which quantifies the interplay between the excluded-volume interaction of nucleosomes and their positional entropy. We find that although the typical patterns on the two sides of the NFR are different, they are both quantitatively described by the same physical model with the same parameters, but different boundary conditions. The inferred boundary conditions suggest that the first nucleosome downstream from the NFR (the +1 nucleosome) is typically directly positioned while the first nucleosome upstream is statistically positioned via a nucleosome-repelling DNA region. These boundary conditions, which can be locally encoded into the genome sequence, significantly shape the statistical distribution of nucleosomes over a range of up to approximately 1,000 bp to each side.
Burr, Tom; Hamada, Michael S.; Ticknor, Larry; Sprinkle, James
2015-01-01
The aim of nuclear safeguards is to ensure that special nuclear material is used for peaceful purposes. Historically, nuclear material accounting (NMA) has provided the quantitative basis for monitoring for nuclear material loss or diversion, and process monitoring (PM) data is collected by the operator to monitor the process. PM data typically support NMA in various ways, often by providing a basis to estimate some of the in-process nuclear material inventory. We develop options for combining PM residuals and NMA residuals (residual = measurement - prediction), using a hybrid of period-driven and data-driven hypothesis testing. The modified statistical tests can be used on time series of NMA residuals (the NMA residual is the familiar material balance), or on a combination of PM and NMA residuals. The PM residuals can be generated on a fixed time schedule or as events occur.
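As one concrete, simplified example of a sequential test on a residual series, a one-sided Page CUSUM applied to standardized material-balance residuals would look like the following sketch; the allowance k and threshold h are conventional textbook choices, and the loss scenario is invented.

```python
import numpy as np

def cusum_alarm(residuals, sigma, k=0.5, h=5.0):
    """One-sided Page CUSUM on standardized residuals.
    k = allowance and h = decision threshold, both in units of sigma.
    Returns the index of the first alarm, or None if none fires."""
    s = 0.0
    for i, r in enumerate(residuals):
        s = max(0.0, s + r / sigma - k)
        if s > h:
            return i
    return None

rng = np.random.default_rng(2)
sigma = 1.0
# 25 in-control balances followed by a sustained 1.5-sigma loss
residuals = np.concatenate([rng.normal(0.0, sigma, 25),
                            rng.normal(1.5 * sigma, sigma, 25)])
alarm = cusum_alarm(residuals, sigma)
```

A combined NMA/PM scheme could run one such statistic per residual stream, with thresholds tuned jointly to control the overall false-alarm rate.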
NASA Technical Reports Server (NTRS)
Matney, Mark
2011-01-01
A number of statistical tools have been developed over the years for assessing the risk of reentering objects to human populations. These tools make use of the characteristics (e.g., mass, material, shape, size) of debris that are predicted by aerothermal models to survive reentry. The statistical tools use this information to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of the analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. Because this information is used in making policy and engineering decisions, it is important that these assumptions be tested using empirical data. This study uses the latest database of known uncontrolled reentry locations measured by the United States Department of Defense. The predicted ground footprint distributions of these objects are based on the theory that their orbits behave basically like simple Kepler orbits. However, there are a number of factors in the final stages of reentry - including the effects of gravitational harmonics, the effects of the Earth's equatorial bulge on the atmosphere, and the rotation of the Earth and atmosphere - that could cause them to diverge from simple Kepler orbit behavior and possibly change the probability of reentering over a given location. In this paper, the measured latitude and longitude distributions of these objects are directly compared with the predicted distributions, providing a fundamental empirical test of the model assumptions.
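The comparison described can be illustrated with a two-sample Kolmogorov-Smirnov test against latitudes drawn from an idealized circular Kepler orbit. All numbers below are synthetic stand-ins (the real study uses measured DoD reentry locations); the inclination values are just examples.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

def kepler_latitudes(inclination_deg, n, rng):
    """Latitudes sampled from a circular Kepler orbit of given inclination:
    lat = asin(sin i * sin u), with argument of latitude u uniform in [0, 2pi)."""
    u = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.degrees(np.arcsin(np.sin(np.radians(inclination_deg)) * np.sin(u)))

model = kepler_latitudes(51.6, 5000, rng)     # model population for one inclination
observed = kepler_latitudes(51.6, 200, rng)   # stand-in for measured reentry latitudes
biased = kepler_latitudes(28.5, 200, rng)     # what a mismatched model would look like

stat_match, p_match = ks_2samp(observed, model)
stat_mismatch, p_mismatch = ks_2samp(biased, model)
```

A small p-value flags a systematic departure from the simple Kepler-orbit prediction of the kind the paper investigates.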
Xiao, Qingtai; Xu, Jianxin; Wang, Hua
2016-01-01
A new index, the estimate of the error variance, was proposed to quantify the evolution of flow patterns when multiphase components or tracers are difficult to distinguish. The homogeneity degree of the luminance space distribution behind the viewing windows in the direct contact boiling heat transfer process was explored. With image analysis and a linear statistical model, an F-test was used to test whether the light was uniform, and a non-linear method was used to determine the direction and position of a fixed source light. The experimental results showed that the inflection point of the new index was approximately equal to the mixing time. The new index was then applied to a multiphase macro-mixing process driven by top blowing in a stirred tank. Moreover, a general quantifying model was introduced to demonstrate the relationship between the flow patterns of the bubble swarms and heat transfer. The results can be applied to other mixing processes in which the target is very difficult to recognize. PMID:27527065
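A toy version of such an F-test for light uniformity can be written with SciPy's one-way ANOVA: sample luminance from several patches of the viewing window and test whether the patch means differ. All values below are synthetic.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
# Hypothetical luminance samples from four patches of the viewing window.
uniform_patches = [rng.normal(120.0, 5.0, 400) for _ in range(4)]
# A fixed side light would brighten the patches unevenly:
graded_patches = [rng.normal(120.0 + 4.0 * k, 5.0, 400) for k in range(4)]

F_uniform, p_uniform = f_oneway(*uniform_patches)
F_graded, p_graded = f_oneway(*graded_patches)
```

A significant F statistic rejects uniform lighting; the spatial pattern of the patch means then hints at the direction of the fixed source.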
Clark, Robert D
2003-01-01
It is becoming increasingly common in quantitative structure/activity relationship (QSAR) analyses to use external test sets to evaluate the likely stability and predictivity of the models obtained. In some cases, such as those involving variable selection, an internal test set--i.e., a cross-validation set--is also used. Care is sometimes taken to ensure that the subsets used exhibit response and/or property distributions similar to those of the data set as a whole, but more often the individual observations are simply assigned 'at random.' In the special case of multiple linear regression (MLR) without variable selection, it can be analytically demonstrated that this strategy is inferior to others. Most particularly, D-optimal design performs better if the form of the regression equation is known and the variables involved are well behaved. This report introduces an alternative, non-parametric approach termed 'boosted leave-many-out' (boosted LMO) cross-validation. In this method, relatively small training sets are chosen by applying optimizable k-dissimilarity selection (OptiSim) using a small subsample size (k = 4, in this case), with the unselected observations being reserved as a test set for the corresponding reduced model. Predictive errors for the full model are then estimated by aggregating results over several such analyses. The countervailing effects of training and test set size, diversity, and representativeness on PLS model statistics are described for CoMFA analysis of a large data set of COX2 inhibitors.
Zhang Shuangnan; Xie Yi
2012-10-01
We test models for the evolution of neutron star (NS) magnetic fields (B). Our model for the evolution of the NS spin is taken from an analysis of pulsar timing noise presented by Hobbs et al. We first test the standard model of a pulsar's magnetosphere in which B does not change with time and magnetic dipole radiation is assumed to dominate the pulsar's spin-down. We find that this model fails to predict both the magnitudes and signs of the second derivatives of the spin frequencies (ν̈). We then construct a phenomenological model of the evolution of B, which contains a long-term decay (LTD) modulated by short-term oscillations; a pulsar's spin is thus modified by its B-evolution. We find that an exponential LTD is not favored by the observed statistical properties of ν̈ for young pulsars and fails to explain the fact that ν̈ is negative for roughly half of the old pulsars. A simple power-law LTD can explain all the observed statistical properties of ν̈. Finally, we discuss some physical implications of our results to models of the B-decay of NSs and suggest reliable determination of the true ages of many young NSs is needed, in order to constrain further the physical mechanisms of their B-decay. Our model can be further tested with the measured evolutions of ν̇ and ν̈ for an individual pulsar; the decay index, oscillation amplitude, and period can also be determined this way for the pulsar.
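The qualitative point - that pure dipole spin-down with constant B forces ν̈ > 0, while an oscillating field can flip its sign - is easy to check numerically. A minimal sketch in dimensionless units (all constants and the oscillation parameters are invented, not fitted values from the paper):

```python
import numpy as np

def spin_evolution(b_func, nu0=10.0, K=1e-4, t_max=200.0, dt=0.01):
    """Euler-integrate magnetic-dipole spin-down  nu_dot = -K B(t)^2 nu^3
    in dimensionless units; returns (t, nu) arrays."""
    t = np.arange(0.0, t_max, dt)
    nu = np.empty_like(t)
    nu[0] = nu0
    for i in range(1, len(t)):
        nu[i] = nu[i - 1] - K * b_func(t[i - 1]) ** 2 * nu[i - 1] ** 3 * dt
    return t, nu

t, nu_const = spin_evolution(lambda t: 1.0)                        # constant B
t, nu_osc = spin_evolution(lambda t: 1.0 + 0.3 * np.sin(0.5 * t))  # oscillating B

# Numerical second derivatives of the spin frequency
nudd_const = np.gradient(np.gradient(nu_const, t), t)
nudd_osc = np.gradient(np.gradient(nu_osc, t), t)
```

With constant B the computed ν̈ stays positive, while the oscillating field makes ν̈ change sign, mirroring the mix of positive and negative ν̈ seen in pulsar timing data.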
NASA Astrophysics Data System (ADS)
Tsaneva, M. G.; Krezhova, D. D.; Yanev, T. K.
2010-10-01
A statistical model is proposed for analysis of the texture of land cover types for global and regional land cover classification by using texture features extracted by multiresolution image analysis techniques. It consists of four novel indices representing second-order texture, which are calculated after wavelet decomposition of an image and after texture extraction by a new approach that makes use of a four-pixel texture unit. The model was applied to four satellite images of the Black Sea region, obtained by Terra/MODIS and Aqua/MODIS at different spatial resolution. In single texture classification experiments, we used 15 subimages (50 × 50 pixels) of the selected classes of land covers that are present in the satellite images studied. These subimages were subjected to one-level and two-level decompositions by using orthonormal spline and Gabor-like spline wavelets. The texture indices were calculated and used as feature vectors in the supervised classification system with neural networks. The testing of the model was based on the use of two kinds of widely accepted statistical texture quantities: five texture features determined by the co-occurrence matrix (angular second moment, contrast, correlation, inverse difference moment, entropy), and four statistical texture features determined after the wavelet transformation (mean, standard deviation, energy, entropy). The supervised neural network classification was performed and the discrimination ability of the proposed texture indices was found comparable with that for the sets of five GLCM texture features and four wavelet-based texture features. The results obtained from the neural network classifier showed that the proposed texture model yielded an accuracy of 92.86% on average after orthonormal wavelet decomposition and 100% after Gabor-like wavelet decomposition for texture classification of the examined land cover types on satellite images.
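The co-occurrence-matrix features named above (angular second moment, contrast, entropy) are easy to compute directly. A minimal NumPy sketch for a single pixel offset, with two synthetic patches standing in for land-cover textures:

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence matrix for one offset, plus three of the
    Haralick features named in the abstract (ASM, contrast, entropy)."""
    q = np.clip((img.astype(float) * levels / (img.max() + 1e-9)).astype(int),
                0, levels - 1)                      # quantize to `levels` grey levels
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1   # count co-occurring level pairs
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                            # angular second moment
    contrast = np.sum(p * (i - j) ** 2)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return asm, contrast, entropy

rng = np.random.default_rng(5)
smooth = np.tile(np.linspace(0, 255, 32), (32, 1))  # smooth gradient patch
noisy = rng.integers(0, 256, (32, 32))              # rough random patch
```

Feature vectors like these (per offset and per wavelet subband) are what the classifier in the paper consumes.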
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
NASA Astrophysics Data System (ADS)
Evans, Mark
2016-12-01
A new parametric approach, termed the Wilshire equations, offers the realistic potential of accurately predicting the in-service life of materials operating at service conditions from accelerated test results lasting no more than 5000 hours. The success of this approach can be attributed to a well-defined linear relationship that appears to exist between various creep properties and a log transformation of the normalized stress. However, these linear trends are subject to discontinuities, the number of which appears to differ from material to material. These discontinuities have until now been (1) treated as abrupt in nature and (2) identified by eye from an inspection of simple graphical plots of the data. This article puts forward a statistical test for determining the correct number of discontinuities present within a creep data set and a method for allowing these discontinuities to occur more gradually, so that the methodology is more in line with the accepted view as to how creep mechanisms evolve with changing test conditions. These two developments are fully illustrated using creep data sets on two steel alloys. When these new procedures are applied to these steel alloys, not only do they produce more accurate and realistic-looking long-term predictions of the minimum creep rate, but they also lead to different conclusions about the mechanisms determining the rates of creep from those originally put forward by Wilshire.
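A simple version of such a test compares the residual sum of squares of a single straight line against a two-segment fit and refers the improvement to an F distribution. The sketch below uses synthetic data with a known kink location; a full implementation would also search over the break position and over the number of breaks.

```python
import numpy as np
from scipy.stats import f as f_dist

def rss_linear(x, y):
    """Residual sum of squares of an ordinary least-squares line."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def break_f_test(x, y, split):
    """F-test: does splitting the data at `split` (two separate lines,
    two extra parameters) reduce the RSS more than chance would?"""
    n = len(x)
    rss0 = rss_linear(x, y)
    lo, hi = x < split, x >= split
    rss1 = rss_linear(x[lo], y[lo]) + rss_linear(x[hi], y[hi])
    F = ((rss0 - rss1) / 2) / (rss1 / (n - 4))
    return F, f_dist.sf(F, 2, n - 4)

rng = np.random.default_rng(6)
x = np.linspace(-3.0, -0.5, 40)   # stand-in for log-transformed normalized stress
# Continuous piecewise-linear trend with a slope change at x = -1.8, plus noise
y = np.where(x < -1.8, 2.0 * x, 0.5 * x - 2.7) + rng.normal(0, 0.05, 40)
y_null = 2.0 * x + rng.normal(0, 0.05, 40)          # no discontinuity

F1, p1 = break_f_test(x, y, -1.8)
F0, p0 = break_f_test(x, y_null, -1.8)
```

A small p-value supports adding the discontinuity; repeating the comparison for successive extra segments gives a count of the breaks the data actually support.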
NASA Astrophysics Data System (ADS)
Zhao, J. Q.; Yang, J.; Li, P. X.; Liu, M. Y.; Shi, Y. M.
2016-06-01
Accurate and timely change detection of Earth's surface features is extremely important for understanding relationships and interactions between people and natural phenomena. Many traditional methods of change detection use only part of the polarization information and rely on supervised threshold selection, which makes them insufficient and time-consuming. In this paper, we present a novel unsupervised change-detection method based on quad-polarimetric SAR data and automatic threshold selection. First, speckle noise is removed from the two registered SAR images. Second, the similarity measure is calculated by the test statistic, and automatic Kittler-Illingworth (KI) threshold selection is introduced to obtain the change map. The efficiency of the proposed method is demonstrated on quad-pol SAR images acquired by Radarsat-2 over Wuhan, China.
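KI thresholding - assuming the abbreviation refers to the Kittler-Illingworth minimum-error method, as is usual in SAR change detection - implicitly fits two Gaussians to the histogram of the change measure and picks the threshold minimizing a classification-error criterion J(T). A minimal sketch on a synthetic bimodal histogram:

```python
import numpy as np

def kittler_illingworth(hist):
    """Kittler-Illingworth minimum-error threshold for a histogram:
    model the two classes as Gaussians and minimize
    J(T) = 1 + 2[w0 ln s0 + w1 ln s1] - 2[w0 ln w0 + w1 ln w1]."""
    p = hist.astype(float) / hist.sum()
    g = np.arange(len(p))
    best_t, best_j = None, np.inf
    for t in range(1, len(p) - 1):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 < 1e-9 or w1 < 1e-9:
            continue
        mu0 = (g[:t] * p[:t]).sum() / w0
        mu1 = (g[t:] * p[t:]).sum() / w1
        v0 = ((g[:t] - mu0) ** 2 * p[:t]).sum() / w0
        v1 = ((g[t:] - mu1) ** 2 * p[t:]).sum() / w1
        if v0 < 1e-9 or v1 < 1e-9:
            continue
        j = (1 + 2 * (w0 * np.log(np.sqrt(v0)) + w1 * np.log(np.sqrt(v1)))
               - 2 * (w0 * np.log(w0) + w1 * np.log(w1)))
        if j < best_j:
            best_t, best_j = t, j
    return best_t

rng = np.random.default_rng(7)
# Bimodal "change measure": unchanged pixels near 40, changed pixels near 180.
vals = np.concatenate([rng.normal(40, 10, 9000), rng.normal(180, 15, 1000)])
hist, _ = np.histogram(np.clip(vals, 0, 255), bins=256, range=(0, 255))
t = kittler_illingworth(hist)
```

The returned threshold separates the "unchanged" and "changed" modes without any supervision, which is the property the paper exploits.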
Debiasse, Melissa B; Nelson, Bradley J; Hellberg, Michael E
2014-01-01
Conflicting patterns of population differentiation between the mitochondrial and nuclear genomes (mito-nuclear discordance) have become increasingly evident as multilocus data sets have become easier to generate. Incomplete lineage sorting (ILS) of nucDNA is often implicated as the cause of such discordance, stemming from the large effective population size of nucDNA relative to mtDNA. However, selection, sex-biased dispersal and historical demography can also lead to mito-nuclear discordance. Here, we compare patterns of genetic diversity and subdivision for six nuclear protein-coding gene regions to those for mtDNA in a common Caribbean coral reef sponge, Callyspongia vaginalis, along the Florida reef tract. We also evaluated a suite of summary statistics to determine which are effective metrics for comparing empirical and simulated data when testing drivers of mito-nuclear discordance in a statistical framework. While earlier work revealed three divergent and geographically subdivided mtDNA COI haplotypes separated by 2.4% sequence divergence, nuclear alleles were admixed with respect to mitochondrial clade and geography. Bayesian analysis showed that substitution rates for the nuclear loci were up to 7 times faster than for mitochondrial COI. Coalescent simulations and neutrality tests suggested that mito-nuclear discordance in C. vaginalis is not the result of ILS in the nucDNA or selection on the mtDNA but is more likely caused by changes in population size. Sperm-mediated gene flow may also influence patterns of population subdivision in the nucDNA.
ERIC Educational Resources Information Center
Fu, Jianbin; Wise, Maxwell
2012-01-01
In the Cognitively Based Assessment of, for, and as Learning ("CBAL"™) research initiative, innovative K-12 prototype tests based on cognitive competency models are developed. This report presents the statistical results of the 2 CBAL Grade 8 writing tests and 2 Grade 7 reading tests administered to students in 20 states in spring 2011.…
ERIC Educational Resources Information Center
Qasem, Mamun Ali Naji; Altrairy, Abdulrhman; Govil, Punita
2012-01-01
This research has aimed at constructing Criterion Referenced Test to measure the statistical competencies of the Post-graduate Students in Education Colleges in Yemeni Universities, at examining the validity of the test's grades (the descriptive validity and the Domain Selection Validity), at examining the test's grades Reliability according to…
ERIC Educational Resources Information Center
Burton, Richard F.; Miller, David J.
1999-01-01
Discusses statistical procedures for estimating the test unreliability introduced by guessing in multiple-choice and true/false tests. Proposes two new measures of test unreliability: one concerned with resolution of defined levels of knowledge and the other with the probability of examinees being incorrectly ranked. Both models are based on the binomial…
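The binomial model referred to can be made concrete in a few lines: under blind guessing, the number-right score is binomial, which immediately gives the size of the guessing noise and the chance of clearing a cut score. The 50-item true/false example below is invented for illustration.

```python
from math import sqrt, comb

def guessing_sd(n_items, p_guess):
    """Standard deviation of the number-right score contributed by blind
    guessing on n_items items with success probability p_guess per item."""
    return sqrt(n_items * p_guess * (1 - p_guess))

def prob_at_least(k, n_items, p_guess):
    """Probability of scoring at least k by guessing alone (binomial tail)."""
    return sum(comb(n_items, i) * p_guess ** i * (1 - p_guess) ** (n_items - i)
               for i in range(k, n_items + 1))

sd_tf = guessing_sd(50, 0.5)         # guessing noise on a 50-item true/false test
p_pass = prob_at_least(30, 50, 0.5)  # chance of reaching 30/50 by pure guessing
```

The non-trivial size of both quantities is exactly why guessing degrades the resolution of knowledge levels and the ranking of examinees.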
Cap, Jerome S.; Tracey, Brian
1999-11-15
Aerospace payloads, such as satellites, are subjected to vibroacoustic excitation during launch. Sandia's MTI satellite has recently been certified to this environment using a combination of base input random vibration and reverberant acoustic noise. The initial choices for the acoustic and random vibration test specifications were obtained from the launch vehicle Interface Control Document (ICD). In order to tailor the random vibration levels for the laboratory certification testing, it was necessary to determine whether vibration energy was flowing across the launch vehicle interface from the satellite to the launch vehicle or the other direction. For frequencies below 120 Hz this issue was addressed using response limiting techniques based on results from the Coupled Loads Analysis (CLA). However, since the CLA Finite Element Analysis FEA model was only correlated for frequencies below 120 Hz, Statistical Energy Analysis (SEA) was considered to be a better choice for predicting the direction of the energy flow for frequencies above 120 Hz. The existing SEA model of the launch vehicle had been developed using the VibroAcoustic Payload Environment Prediction System (VAPEPS) computer code [1]. Therefore, the satellite would have to be modeled using VAPEPS as well. As is the case for any computational model, the confidence in its predictive capability increases if one can correlate a sample prediction against experimental data. Fortunately, Sandia had the ideal data set for correlating an SEA model of the MTI satellite--the measured response of a realistic assembly to a reverberant acoustic test that was performed during MTI's qualification test series. The first part of this paper will briefly describe the VAPEPS modeling effort and present the results of the correlation study for the VAPEPS model. The second part of this paper will present the results from a study that used a commercial SEA software package [2] to study the effects of in-plane modes and to
Aoiz, F J; González-Lezana, T; Sáez Rábanos, V
2007-11-07
A complete formulation of a statistical quasiclassical trajectory (SQCT) model is presented in this work along with a detailed comparison with results obtained with the statistical quantum mechanical (SQM) model for the H+ + D2 and H+ + H2 reactions. The basic difference between the SQCT and the SQM models lies in the fact that trajectories instead of wave functions are propagated in the entrance and exit channels. Other than this the two formulations are entirely similar and both comply with the principle of detailed balance and conservation of parity. Reaction probabilities, and integral and differential cross sections (DCSs) for these reactions at different levels of product-state resolution and from various initial states are shown and discussed. The agreement is in most cases excellent and indicates that the effect of tunneling through the centrifugal barrier is negligible. Some differences are found, however, between state-resolved observables calculated by the SQCT method and those from the SQM method that makes use of the centrifugal sudden (coupled states) approximation (SQM-CS). When this approximation is removed and the full close coupling treatment is used in the SQM model (SQM-CC), an almost perfect agreement is achieved. This shows that the SQCT is sensitive enough to show the relatively small inaccuracies resulting from the decoupling inherent to the CS approximation. In addition, the effect of ignoring the parity conservation is thoroughly examined. This effect is in general minor except in particular cases such as the DCS from initial rotational state j=0. It is shown, however, that in order to reproduce the sharp forward and backward peaks the conservation of parity has to be taken into account.
NASA Astrophysics Data System (ADS)
Auchmann, Renate; Brönnimann, Stefan; Croci-Maspoli, Mischa
2016-04-01
For the correction of inhomogeneities in sub-daily temperature series, Auchmann and Brönnimann (2012) developed a physics-based model for one specific type of break, i.e. the transition from a Wild screen to a Stevenson screen at one specific station in Basel, Switzerland. The model is based solely on physical considerations; no relationships of the covariates to the differences between the parallel measurements have been investigated. The physics-based model requires detailed information on the screen geometry and the location, and includes a variety of covariates in the model. The model is mainly based on correcting the radiation error, including a modification by ambient wind. In this study we test the application of the model to another station, Zurich, experiencing the same type of transition. Furthermore, we compare the performance of the physics-based correction to purely statistical correction approaches (a constant correction, and a spline-based correction for the annual cycle). In Zurich the Wild screen was replaced in 1954 by the Stevenson screen; from 1954 to 1960 parallel temperature measurements in both screens were taken, which will be used to assess the performance of the applied corrections. For Zurich the required model input is available (i.e. three times daily observations of wind, cloud cover, pressure and humidity measurements, local times of sunset and sunrise). However, a large number of stations do not measure these additional input data required for the model, which hampers the transferability and applicability of the model to other stations. Hence, we test possible simplifications and generalizations of the model to make it more easily applicable to stations with the same type of inhomogeneity. In a final step we test whether other types of transitions (e.g., from a Stevenson screen to an automated weather system) can be corrected using the principle of a physics-based approach.
NASA Astrophysics Data System (ADS)
Zhan, Yimin; Mechefske, Chris K.
2007-07-01
Optimal maintenance decision analysis is heavily dependent on the accuracy of condition indicators. A condition indicator that is subject to such varying operating conditions as load is unable to provide precise condition information of the monitored object for making optimal operational maintenance decisions even if the maintenance program is established within a rigorous theoretical framework. For this reason, the performance of condition monitoring techniques applied to rotating machinery under varying load conditions has been a long-term concern and has attracted intensive research interest. Part I of this study proposed a novel technique based on adaptive autoregressive modeling and hypothesis tests. The method is able to automatically search for the optimal time-series model order and establish a compromised autoregressive model fitting based on the healthy gear motion residual signals under varying load conditions. The condition of the monitored gearbox is numerically represented by a modified Kolmogorov-Smirnov test statistic. Part II of this study is devoted to applications of the proposed technique to entire lifetime condition detection of three gearboxes with distinct physical specifications, distinct load conditions, and distinct failure modes. A comprehensive and thorough comparative study is conducted between the proposed technique and several counterparts. The detection technique is further enhanced by a proposed method to automatically identify and generate fault alerts with the aid of the Wilcoxon rank-sum test and thus requires no supervision from maintenance personnel. Experimental analysis demonstrated that the proposed technique applied to automatic identification and generation of fault alerts also features two highly desirable properties, i.e. few false alerts and early alert for incipient faults. Furthermore, it is found that the proposed technique is able to identify two types of abnormalities, i.e. strong ghost components abruptly
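The overall pipeline - fit an autoregressive model to healthy-gear residual signals, then compare the distribution of model residuals from new data against the healthy baseline - can be sketched as follows. The AR fit, the toy fault model, and the use of a two-sample KS test are simplifications of the paper's adaptive procedure, with all signal parameters invented.

```python
import numpy as np
from scipy.stats import ks_2samp

def ar_fit(x, order):
    """Least-squares AR(order) coefficients a for x[t] ~ sum_k a[k-1] * x[t-k]."""
    n = len(x)
    X = np.column_stack([x[order - k: n - k] for k in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def ar_residuals(x, a):
    """One-step-ahead prediction residuals of x under AR coefficients a."""
    p, n = len(a), len(x)
    X = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])
    return x[p:] - X @ a

rng = np.random.default_rng(8)

def gear_signal(n, fault=0.0):
    """Toy vibration residual: a stable AR(2) process; a fault adds sparse impulses."""
    x = np.zeros(n)
    e = rng.normal(0.0, 1.0, n)
    for t in range(2, n):
        x[t] = 1.2 * x[t - 1] - 0.6 * x[t - 2] + e[t]
    if fault > 0.0:
        hits = rng.random(n) < 0.1
        x = x + hits * rng.normal(0.0, 10.0 * fault, n)
    return x

healthy = gear_signal(2000)
faulty = gear_signal(2000, fault=1.0)

a = ar_fit(healthy, order=2)        # model identified on healthy data only
stat, p = ks_2samp(ar_residuals(healthy, a), ar_residuals(faulty, a))
```

The paper's modified Kolmogorov-Smirnov statistic plays the role of `stat` here, and its alerting logic (via the Wilcoxon rank-sum test) automates the decision this sketch leaves to a p-value threshold.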
Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.
1999-01-01
Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc, at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc identified correctly the locations of four of them. The space-time volumes of the alarms are 36% and 18%, respectively, when estimated with a normalized product measure of the empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res. 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier
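The significance levels quoted above follow from a simple binomial null model: if alarms occupy a fraction p of the normalized space-time volume, a random prediction catches each event independently with probability p (the independence assumption is ours, for illustration). A minimal sketch:

```python
import math

def alarm_significance(n_events, n_predicted, alarm_fraction):
    """Confidence that a prediction outperforms random guessing:
    1 - P(X >= n_predicted) for X ~ Binomial(n_events, alarm_fraction),
    where alarm_fraction is the share of space-time covered by alarms."""
    p_tail = sum(
        math.comb(n_events, k)
        * alarm_fraction ** k * (1 - alarm_fraction) ** (n_events - k)
        for k in range(n_predicted, n_events + 1)
    )
    return 1.0 - p_tail

# M8, magnitude 8+: all 5 events predicted with 36% alarm volume
print(round(alarm_significance(5, 5, 0.36), 3))    # → 0.994 (beyond 99%)

# M8, magnitude 7.5+: 10 of 19 events predicted with 40% alarm volume
print(round(alarm_significance(19, 10, 0.40), 2))  # → 0.81 (the 81% level)
```

Both figures reproduce the significance levels quoted in the abstract.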
Ratte, H.T.
1996-10-01
In order to protect an ecosystem against anthropogenic stressors such as xenobiotics, potential impacts on its sensitive populations must be investigated. A general simulation approach was developed for validating biotest end points in the Daphnia reproduction test. Various toxic-effect scenarios, sample sizes, and inspection regimes were used to study the behavior and robustness of different end points. The intrinsic rate of natural increase (IR) and the capacity for increase (CI) were estimated because of their ecological significance. Both parameters were compared to conventionally chosen end points, the offspring number per female (ON) and the percent mortality (MO). The IR appeared to be the most sensitive end point among the different toxic-effect scenarios. In particular, effects on the age at first reproduction, which are highly relevant in population dynamics, were integrated. In general, the CI was as sensitive as the IR. However, the CI tends to overestimate the first brood. In contrast to ON and MO, both the IR and CI responded sensitively to the inspection regime. The IR was found to require daily recording of reproduction and mortality events, at least until the first broods appeared. Whereas the value of the CI remained questionable, from a statistical and ecological viewpoint the IR appeared to be superior.
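The intrinsic rate of natural increase r is defined implicitly by the discrete Euler-Lotka equation, sum over ages x of exp(-r·x)·l_x·m_x = 1, built from exactly the daily survivorship (l_x) and reproduction (m_x) records the IR is said to require. A minimal sketch with a hypothetical Daphnia-like life table (the numbers are illustrative, not from the study):

```python
import math

def intrinsic_rate(ages, survivorship, fecundity, lo=-1.0, hi=2.0, tol=1e-10):
    """Solve the discrete Euler-Lotka equation
       sum_x exp(-r * x) * l_x * m_x = 1
    for the intrinsic rate of increase r by bisection (f is decreasing in r)."""
    def f(r):
        return sum(math.exp(-r * x) * l * m
                   for x, l, m in zip(ages, survivorship, fecundity)) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Hypothetical life table: ages (days) at which broods appear,
# survivorship to that age, and offspring per female per brood
ages = [7, 9, 11, 13, 15]
lx   = [1.0, 0.95, 0.90, 0.85, 0.80]
mx   = [6, 8, 10, 10, 9]
r = intrinsic_rate(ages, lx, mx)            # per-day rate, here ≈ 0.36
r_delayed = intrinsic_rate([a + 2 for a in ages], lx, mx)
```

Delaying the age at first reproduction by two days (r_delayed) lowers r, which is why the IR end point integrates effects on the timing of first reproduction.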
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.
Lu, Kaifeng
2016-05-01
We consider blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
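A minimal sketch of the blinded re-estimation step (our illustration, not the authors' exact algorithm): pool the interim data over both arms without treatment labels, estimate the variance with the simple one-sample estimator, and plug it into the usual normal-approximation sample size formula.

```python
import math, random, statistics

def reestimate_n(blinded_interim, delta, z_alpha=1.96, z_beta=0.84):
    """Blinded sample size re-estimation: treatment labels stay hidden, and
    the one-sample variance of the pooled interim data replaces the planning
    variance in  n_per_arm = 2 * (z_alpha + z_beta)^2 * s^2 / delta^2
    (two-sided 5% level and 80% power by default)."""
    s2 = statistics.variance(blinded_interim)  # simple one-sample estimator
    n = 2 * (z_alpha + z_beta) ** 2 * s2 / delta ** 2
    return max(len(blinded_interim) // 2, math.ceil(n))  # never shrink below the pilot

random.seed(1)
# Internal pilot of 40 observations (20 per arm, labels unknown to the analyst):
# true within-arm SD = 2, true treatment effect = 1
pilot = [random.gauss(0, 2) for _ in range(20)] + [random.gauss(1, 2) for _ in range(20)]
n_per_arm = reestimate_n(pilot, delta=1.0)
```

Because the pooled variance absorbs the between-arm mean difference, the blinded estimator is slightly inflated (by roughly delta²/4 for equal arms), one reason the type I error behavior studied in the paper is non-trivial.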
Suner, Aslı; Karakülah, Gökhan; Koşaner, Özgün; Dicle, Oğuz
2015-01-01
The improper use of statistical methods is common in analyzing and interpreting research data in biological and medical sciences. The objective of this study was to develop a decision support tool encompassing the commonly used statistical tests in biomedical research by combining and updating the present decision trees for appropriate statistical test selection. First, the decision trees in textbooks, published articles, and online resources were scrutinized, and a more comprehensive unified one was devised via the integration of 10 distinct decision trees. The questions in the decision steps were also revised, simplified, and enriched with examples. Then, our decision tree was implemented in the web environment and the tool titled StatXFinder was developed. Finally, usability and satisfaction questionnaires were applied to the users of the tool, and StatXFinder was reorganized in line with the feedback obtained from these questionnaires. StatXFinder provides users with decision support in the selection of 85 distinct parametric and non-parametric statistical tests by asking 44 different yes-no questions. The accuracy rate of the statistical test recommendations obtained by 36 participants, with the cases applied, was 83.3 % for "difficult" tests and 88.9 % for "easy" tests. The mean system usability score of the tool was found to be 87.43 ± 10.01 (minimum: 70, maximum: 100). No statistically significant difference was found between the total system usability score and participants' attributes (p value > 0.05). The User Satisfaction Questionnaire showed that 97.2 % of the participants appreciated the tool, and almost all of the participants (35 of 36) would recommend the tool to others. In conclusion, StatXFinder can be utilized as an instructional and guiding tool for biomedical researchers with limited statistics knowledge. StatXFinder is freely available at http://webb.deu.edu.tr/tb/statxfinder.
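A toy fragment of such a decision tree, to make the yes-no flow concrete (the branch labels are our illustration; StatXFinder's actual tree asks 44 questions and covers 85 tests):

```python
def suggest_test(outcome_scale, n_groups, paired, normal):
    """Tiny illustrative fragment of a statistical-test-selection decision tree.
    outcome_scale: "continuous" or "categorical"; paired/normal: booleans."""
    if outcome_scale == "categorical":
        return "chi-squared test (or Fisher's exact test for small samples)"
    if n_groups == 2:
        if paired:
            return "paired t test" if normal else "Wilcoxon signed-rank test"
        return "independent t test" if normal else "Mann-Whitney U test"
    # three or more groups
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"

print(suggest_test("continuous", 2, False, True))   # → independent t test
print(suggest_test("continuous", 3, False, False))  # → Kruskal-Wallis test
```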
Parreño, Viviana; López, María Virginia; Rodriguez, Daniela; Vena, María Marta; Izuel, Mercedes; Filippi, Jorge; Romera, Alejandra; Faverin, Claudia; Bellinzoni, Rodolfo; Fernandez, Fernando; Marangunich, Laura
2010-03-16
Infectious Bovine Rhinotracheitis (IBR), caused by bovine herpesvirus 1 (BoHV-1) infection, is distributed worldwide. BoHV-1, either alone or in association with other respiratory cattle pathogens, causes significant economic losses to the livestock industry. The aim of this work was to validate a guinea pig model as an alternative method to the current BoHV-1 vaccine potency testing in calves. Guinea pigs were immunized with two doses of vaccine, 21 days apart, and sampled at 30 days post vaccination (dpv). The BoHV-1 antibody (Ab) response to vaccination in guinea pigs, measured by ELISA and virus neutralization (VN), was statistically compared to the Ab response in cattle. The guinea pig model showed a dose-response relationship to the BoHV-1 antigen concentration in the vaccine, and it was able to discriminate among vaccines differing by 1 log10 in their BoHV-1 concentration with very good repeatability and reproducibility (CV ≤ 20%). A regression analysis of the Ab titers obtained in guinea pigs and bovines at 30 and 60 dpv, respectively, allowed us to classify vaccines into three potency categories: "very satisfactory", "satisfactory" and "unsatisfactory". Bovines immunized with vaccines corresponding to each of these three categories were experimentally challenged with BoHV-1 virus; the level of protection, as measured by reduction of virus shedding and disease severity, correlated well with the vaccine category used. Data generated by 85 experiments, which included vaccination of calves and guinea pigs with 18 reference vaccines of known potency, 8 placebos and 18 commercial vaccines, were subjected to statistical analysis. Concordance analysis indicated almost perfect agreement between the model and the target species for Ab titers measured by ELISA, and almost perfect to substantial agreement when Ab titers were measured by VN. Taken together these results indicate that the developed guinea pig model represents a novel and reliable tool to estimate batch
NASA Astrophysics Data System (ADS)
Wang, H. J.; Shi, W. L.; Chen, X. H.
2006-05-01
The West Development Policy being implemented in China is causing significant land use and land cover (LULC) changes in West China. With the up-to-date satellite database of the Global Land Cover Characteristics Database (GLCCD) that characterizes the lower boundary conditions, the regional climate model RIEMS-TEA is used to simulate possible impacts of the significant LULC variation. The model was run for five continuous three-month periods from 1 June to 1 September of 1993, 1994, 1995, 1996, and 1997, and the results of the five groups are examined by means of a Student's t-test to identify the statistical significance of regional climate variation. The main results are: (1) The regional climate is affected by the LULC variation because the equilibrium of water and heat transfer at the air-vegetation interface is changed. (2) The integrated impact of the LULC variation on regional climate is not limited to West China where the LULC varies, but extends to some areas in the model domain where the LULC does not vary at all. (3) The East Asian monsoon system and its vertical structure are adjusted by the large-scale LULC variation in western China, where the consequences are the enhancement of the westward water vapor transfer from the east and the related increase of wet-hydrostatic energy in the middle-upper atmospheric layers. (4) The ecological engineering in West China affects significantly the regional climate in Northwest China, North China and the middle-lower reaches of the Yangtze River; there are obvious effects in South, Northeast, and Southwest China, but minor effects in Tibet.
Pasaniuc, Bogdan; Zaitlen, Noah; Lettre, Guillaume; Chen, Gary K.; Tandon, Arti; Kao, W. H. Linda; Ruczinski, Ingo; Fornage, Myriam; Siscovick, David S.; Zhu, Xiaofeng; Larkin, Emma; Lange, Leslie A.; Cupples, L. Adrienne; Yang, Qiong; Akylbekova, Ermeg L.; Musani, Solomon K.; Divers, Jasmin; Mychaleckyj, Joe; Li, Mingyao; Papanicolaou, George J.; Millikan, Robert C.; Ambrosone, Christine B.; John, Esther M.; Bernstein, Leslie; Zheng, Wei; Hu, Jennifer J.; Ziegler, Regina G.; Nyante, Sarah J.; Bandera, Elisa V.; Ingles, Sue A.; Press, Michael F.; Chanock, Stephen J.; Deming, Sandra L.; Rodriguez-Gil, Jorge L.; Palmer, Cameron D.; Buxbaum, Sarah; Ekunwe, Lynette; Hirschhorn, Joel N.; Henderson, Brian E.; Myers, Simon; Haiman, Christopher A.; Reich, David; Patterson, Nick; Wilson, James G.; Price, Alkes L.
2011-01-01
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations. PMID:21541012
Trickey, Amber W.; Crosby, Moira E.; Singh, Monika; Dort, Jonathan M.
2014-01-01
Background The application of evidence-based medicine to patient care requires unique skills of the physician. Advancing residents' abilities to accurately evaluate the quality of evidence is built on understanding of fundamental research concepts. The American Board of Surgery In-Training Examination (ABSITE) provides a relevant measure of surgical residents' knowledge of research design and statistics. Objective We implemented a research education curriculum in an independent academic medical center general residency program, and assessed the effect on ABSITE scores. Methods The curriculum consisted of five 1-hour monthly research and statistics lectures. The lectures were presented before the 2012 and 2013 examinations. Forty residents completing ABSITE examinations from 2007 to 2013 were included in the study. Two investigators independently identified research-related item topics from examination summary reports. Correct and incorrect responses were compared precurriculum and postcurriculum. Regression models were calculated to estimate improvement in postcurriculum scores, adjusted for individuals' scores over time and postgraduate year level. Results Residents demonstrated significant improvement in postcurriculum examination scores for research and statistics items. Correct responses increased 27% (P < .001). Residents were 5 times more likely to achieve a perfect score on research and statistics items postcurriculum (P < .001). Conclusions Residents at all levels demonstrated improved research and statistics scores after receiving the curriculum. Because the ABSITE includes a wide spectrum of research topics, sustained improvements suggest a genuine level of understanding that will promote lifelong evaluation and clinical application of the surgical literature. PMID:26140115
ERIC Educational Resources Information Center
Levin, Joel R.; Ferron, John M.; Kratochwill, Thomas R.
2012-01-01
In this four-investigation Monte Carlo simulation study, we examined the properties of nonparametric randomization and permutation statistical tests applied to single-case ABAB...AB and alternating treatment designs based on either systematically alternating or randomly determined phase assignments. Contrary to previous admonitions, when…
ERIC Educational Resources Information Center
Osler, James Edward, II
2015-01-01
This monograph provides an epistemological rationale for the Accumulative Manifold Validation Analysis [also referred to by the acronym "AMOVA"] statistical methodology designed to test psychometric instruments. This form of inquiry is a form of mathematical optimization in the discipline of linear stochastic modelling. AMOVA is an in-depth…
ERIC Educational Resources Information Center
Gómez-Benito, Juana; Hidalgo, Maria Dolores; Zumbo, Bruno D.
2013-01-01
The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R[superscript 2] and…
Josse, Florent; Lefebvre, Yannick; Todeschini, Patrick; Turato, Silvia; Meister, Eric
2006-07-01
Assessing the structural integrity of a nuclear Reactor Pressure Vessel (RPV) subjected to pressurized-thermal-shock (PTS) transients is extremely important to safety. In addition to conventional deterministic calculations to confirm RPV integrity, Electricite de France (EDF) carries out probabilistic analyses. Probabilistic analyses are interesting because some key variables, albeit conventionally taken at conservative values, can be modeled more accurately through statistical variability. One variable which significantly affects RPV structural integrity assessment is cleavage fracture initiation toughness. The reference fracture toughness method currently in use at EDF is the RCC-M and ASME Code lower-bound K{sub IC} based on the indexing parameter RT{sub NDT}. However, in order to quantify the toughness scatter for probabilistic analyses, the master curve method is being analyzed at present. Furthermore, the master curve method is a direct means of evaluating fracture toughness based on K{sub JC} data. In the framework of the master curve investigation undertaken by EDF, this article deals with the following two statistical items: building a master curve from an extract of a fracture toughness dataset (from the European project 'Unified Reference Fracture Toughness Design curves for RPV Steels') and controlling statistical uncertainty for both mono-temperature and multi-temperature tests. Concerning the first point, master curve temperature dependence is empirical in nature. To determine the 'original' master curve, Wallin postulated that a unified description of fracture toughness temperature dependence for ferritic steels is possible, and used a large number of data corresponding to nuclear-grade pressure vessel steels and welds. Our working hypothesis is that some ferritic steels may behave in slightly different ways. Therefore we focused exclusively on the French reactor vessel base metals of types A508 Class 3 and A533 Grade B Class 1, taking the sampling
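For reference, the master curve in its usual (ASTM E1921-style) form gives the median cleavage fracture toughness as a function of temperature relative to the reference temperature T0; a minimal sketch:

```python
import math

def kjc_median(T, T0):
    """Median fracture toughness K_Jc,med in MPa*sqrt(m) from the master curve,
    K_Jc,med(T) = 30 + 70 * exp(0.019 * (T - T0)), with T and T0 in deg C."""
    return 30.0 + 70.0 * math.exp(0.019 * (T - T0))

# By construction, the median toughness at T = T0 is 100 MPa*sqrt(m)
print(kjc_median(-50.0, -50.0))   # → 100.0
```

Fitting T0 to K_Jc data (and quantifying its statistical uncertainty) is precisely the mono- versus multi-temperature estimation problem the article addresses.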
Simonson, K.M.
1998-08-01
The rate at which a mine detection system falsely identifies man-made or natural clutter objects as mines is referred to as the system's false alarm rate (FAR). Generally expressed as a rate per unit area or time, the FAR is one of the primary metrics used to gauge system performance. In this report, an overview is given of statistical methods appropriate for the analysis of data relating to FAR. Techniques are presented for determining a suitable size for the clutter collection area, for summarizing the performance of a single sensor, and for comparing different sensors. For readers requiring more thorough coverage of the topics discussed, references to the statistical literature are provided. A companion report addresses statistical issues related to the estimation of mine detection probabilities.
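As one concrete example of the kind of method such a report covers (our sketch, not necessarily the report's): false alarm counts from two sensors surveying known clutter areas can be compared with an exact conditional binomial test, since under equal FARs the first sensor's count, given the total, is binomial.

```python
import math

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def compare_far(count1, area1, count2, area2):
    """Exact conditional test of equal false-alarm rates for two sensors.
    Under H0, count1 given the total is Binomial(total, area1/(area1+area2))."""
    total = count1 + count2
    p0 = area1 / (area1 + area2)
    # one-sided p-value that sensor 1's rate is higher; doubled for two-sided
    p_one = binom_tail(total, count1, p0)
    return min(1.0, 2 * p_one)

# Hypothetical counts: sensor A, 30 false alarms over 500 m^2; sensor B, 14 over 500 m^2
p = compare_far(30, 500.0, 14, 500.0)   # small p: the FARs plausibly differ
```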
ERIC Educational Resources Information Center
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.
2016-01-01
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects are witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…
ERIC Educational Resources Information Center
Gerlich, Bella Karr; Berard, G. Lynn
2010-01-01
The READ Scale (Reference Effort Assessment Data) is a six-point scale tool for recording qualitative statistics by placing an emphasis on recording effort, knowledge, skills, and teaching used by staff during a reference transaction. Institutional research grants enabled the authors to conduct a national study of the READ Scale at 14 diverse…
Henn, Julian; Meindl, Kathrin
2015-03-01
Statistical tests are applied for the detection of systematic errors in data sets from least-squares refinements or other residual-based reconstruction processes. Samples of the residuals of the data are tested against the hypothesis that they belong to the same distribution. For this it is necessary that they show the same mean values and variances within the limits given by statistical fluctuations. When the samples differ significantly from each other, they are not from the same distribution within the limits set by the significance level. Therefore they cannot originate from a single Gaussian function in this case. It is shown that a significance cutoff results in exactly this case. Significance cutoffs are still frequently used in charge-density studies. The tests are applied to artificial data with and without systematic errors and to experimental data from the literature.
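Testing whether two residual samples belong to the same distribution can be illustrated with the two-sample Kolmogorov-Smirnov test (our choice for the sketch; the paper's tests on sample means and variances differ in detail):

```python
import math, random

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value,
    testing whether two residual samples come from the same distribution."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    # tail of the Kolmogorov distribution (alternating series)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

random.seed(0)
# Residual samples: one centered, one with a systematic offset
sample1 = [random.gauss(0.0, 1.0) for _ in range(200)]
sample2 = [random.gauss(0.8, 1.0) for _ in range(200)]
d, p = ks_two_sample(sample1, sample2)   # large D, tiny p: systematic error detected
```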
NASA Astrophysics Data System (ADS)
Khan, Shahjahan
Often scientific information on various data generating processes is presented in the form of numerical and categorical data. Except for some very rare occasions, generally such data represent a small part of the population, or selected outcomes of any data generating process. Although valuable and useful information is lurking in the array of scientific data, generally it is unavailable to the users. Appropriate statistical methods are essential to reveal the hidden "jewels" in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (or Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. Non-parametric and robust methods are also used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only on an ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model
ERIC Educational Resources Information Center
Wilkins, M. Elaine
2012-01-01
In 2001, No Child Left Behind introduced the highly qualified status for k-12 teachers, which mandated the successful scores on a series of high-stakes test; within this series is the Pre-Professional Skills Test (PPST) or PRAXIS I. The PPST measures basic k-12 skills for reading, writing, and mathematics. The mathematics sub-test is a national…
ERIC Educational Resources Information Center
Jacko, Edward J.; Huck, Schuyler W.
The Alpert-Haber Achievement Anxiety Test was developed to measure the extent to which individuals experience test anxiety. In at least two published studies, the authors claim to have used the test when in fact the response format was changed from that used in the original instrument and the "buffer" items were omitted. To investigate…
ERIC Educational Resources Information Center
Zenisky, April L.; Hambleton, Ronald K.; Sireci, Stephen G.
Measurement specialists routinely assume examinee responses to test items are independent of one another. However, previous research has shown that many contemporary tests contain item dependencies and not accounting for these dependencies leads to misleading estimates of item, test, and ability parameters. In this study, methods for detecting…
Test of the statistical model in Mo96 with the BaF2 γ calorimeter DANCE array
NASA Astrophysics Data System (ADS)
Sheets, S. A.; Agvaanluvsan, U.; Becker, J. A.; Bečvář, F.; Bredeweg, T. A.; Haight, R. C.; Jandel, M.; Krtička, M.; Mitchell, G. E.; O'Donnell, J. M.; Parker, W.; Reifarth, R.; Rundberg, R. S.; Sharapov, E. I.; Ullmann, J. L.; Vieira, D. J.; Wilhelmy, J. B.; Wouters, J. M.; Wu, C. Y.
2009-02-01
The γ-ray cascades following the Mo95(n,γ)Mo96 reaction were studied with the γ calorimeter DANCE (Detector for Advanced Neutron Capture Experiments) consisting of 160 BaF2 scintillation detectors at the Los Alamos Neutron Science Center. The γ-ray energy spectra for different multiplicities were measured for s- and p-wave resonances below 2 keV. The shapes of these spectra were found to be in very good agreement with simulations using the DICEBOX statistical model code. The relevant model parameters used for the level density and photon strength functions were identical with those that provided the best fit of the data from a recent measurement of the thermal Mo95(n,γ)Mo96 reaction with the two-step-cascade method. The reported results strongly suggest that the extreme statistical model works very well in the mass region near A=100.
Zhang Youcai; Yang Xiaohu; Springel, Volker
2010-10-10
We study the topology of cosmic large-scale structure through the genus statistics, using galaxy catalogs generated from the Millennium Simulation and observational data from the latest Sloan Digital Sky Survey Data Release (SDSS DR7). We introduce a new method for constructing galaxy density fields and for measuring the genus statistics of its isodensity surfaces. It is based on a Delaunay tessellation field estimation (DTFE) technique that allows the definition of a piece-wise continuous density field and the exact computation of the topology of its polygonal isodensity contours, without introducing any free numerical parameter. Besides this new approach, we also employ the traditional approaches of smoothing the galaxy distribution with a Gaussian of fixed width, or by adaptively smoothing with a kernel that encloses a constant number of neighboring galaxies. Our results show that the Delaunay-based method extracts the largest amount of topological information. Unlike the traditional approach for genus statistics, it is able to discriminate between the different theoretical galaxy catalogs analyzed here, both in real space and in redshift space, even though they are based on the same underlying simulation model. In particular, the DTFE approach detects with high confidence a discrepancy of one of the semi-analytic models studied here compared with the SDSS data, while the other models are found to be consistent.
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of the Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of the KL-distance in distinguishing
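The FR statistic itself is simple to sketch: build the minimal spanning tree (MST) of the pooled samples and count the edges that join points from different samples; the fewer such cross-sample edges, the more the two multivariate distributions differ. A pure-Python illustration on 2-D toy data (not the FlowMap-FR implementation):

```python
import math, random

def mst_edges(points):
    """Prim's algorithm: edge list of the Euclidean minimum spanning tree."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def fr_statistic(sample_a, sample_b):
    """Friedman-Rafsky statistic: number of MST edges joining the two samples."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for u, v in mst_edges(pooled) if labels[u] != labels[v])

random.seed(2)
gauss2d = lambda mu, n: [(random.gauss(mu, 1), random.gauss(mu, 1)) for _ in range(n)]
a, b_same, b_shifted = gauss2d(0, 40), gauss2d(0, 40), gauss2d(3, 40)
cross_same = fr_statistic(a, b_same)        # many cross edges: same distribution
cross_shifted = fr_statistic(a, b_shifted)  # few cross edges: distributions differ
```

In practice the raw count is calibrated against its permutation null distribution; here we only compare raw counts between an equivalent and a shifted pair of samples.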
Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Talavera, Juan O
2013-01-01
Among the tests used to show differences between means, the Student t test is the most characteristic. Its basic algebraic structure is the difference between two means weighted by their dispersion; from this one can estimate the p value and the 95% confidence interval of the mean difference. An essential requirement is that the variable from which the mean is calculated must have a normal distribution. The Student t test is used to compare two unrelated means (a comparison between two maneuvers); this is known as the t test for independent samples. It is also used to compare two related means (a before-and-after comparison within a single group), which is called the paired t test. When the comparison involves more than two means (three or more dependent means, or three or more independent means), an ANOVA (analysis of variance) is used to perform the analysis.
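The three situations described above can be sketched with scipy.stats on simulated normal data (the means, spreads, and sample sizes are illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two unrelated means (e.g., two maneuvers): independent-samples t test
a = rng.normal(10.0, 2.0, size=40)
b = rng.normal(12.0, 2.0, size=40)
t_ind, p_ind = stats.ttest_ind(a, b)

# Two related means (before/after in one group): paired t test
before = rng.normal(10.0, 2.0, size=30)
after = before + rng.normal(1.0, 1.0, size=30)
t_rel, p_rel = stats.ttest_rel(before, after)

# Three or more independent means: one-way ANOVA
c = rng.normal(14.0, 2.0, size=40)
f_stat, p_anova = stats.f_oneway(a, b, c)
```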
NASA Astrophysics Data System (ADS)
Ignatius, K.; Henning, S.; Stratmann, F.
2013-12-01
We encountered the question of how to do statistical inference and uncertainty estimation for the aerosol particle hygroscopicity (κ) measured up- and downstream of a hilltop under two conditions: during full-cloud events (FCE), where a cap cloud was present on the hilltop, and under cloud-free conditions (non-cloud events, NCE). The aim was to show with statistical testing that particle hygroscopicity is altered by cloud processing. This type of statistical experimental design, known as a 'pre-post case control study', 'between-within design', or 'mixed design', is common in medicine and biostatistics, but it may not be familiar to all researchers in the atmospheric sciences. We therefore review the statistical testing methods that can be applied to solve these kinds of problems. The key point is that these methods use the pre-measurement as a covariate to the post-measurement, which accounts for the daily variation and reduces variance in the analysis. All three tests (change score analysis, analysis of covariance (ANCOVA), and multi-way analysis of variance (ANOVA)) gave similar results and suggested a statistically significant change in κ between FCE and NCE. Quantification of the uncertainty in hygroscopicities derived from cloud condensation nuclei (CCN) measurements implies an uncertainty interval estimation in a nonlinear expression where the uncertainty of one parameter is Gaussian with known mean and variance. We concluded that the commonly used way of estimating and showing the uncertainty intervals in hygroscopicity studies may make the error bars appear too large. Using simple Monte Carlo sampling and plotting the resulting nonlinear distribution and its quantiles may better represent the probability mass in the uncertainty distribution.
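A minimal sketch of the change-score and ANCOVA ideas on simulated pre/post κ values (the group effect, noise levels, and sample sizes below are assumptions for illustration, not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
# Simulated hygroscopicity kappa: pre (upstream) and post (downstream)
pre = rng.normal(0.30, 0.05, size=2 * n)
group = np.repeat([0, 1], n)              # 0 = NCE, 1 = FCE (labels assumed)
effect = 0.04                             # assumed cloud-processing shift
post = pre + effect * group + rng.normal(0.0, 0.02, size=2 * n)

# Change-score analysis: compare post-pre differences between groups
diff = post - pre
t, p = stats.ttest_ind(diff[group == 1], diff[group == 0])

# ANCOVA as regression: post ~ intercept + pre (covariate) + group
X = np.column_stack([np.ones_like(pre), pre, group])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
group_effect = beta[2]                    # estimated FCE-vs-NCE shift
```

Using the pre-measurement as a covariate (rather than ignoring it) is what removes the day-to-day variation from the comparison.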
NASA Astrophysics Data System (ADS)
Cheng, Hanqi; Small, Mitchell J.; Pekney, Natalie J.
2015-10-01
The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality. Nonparametric regression of pollutant concentrations on wind direction was combined with bootstrap hypothesis testing to provide statistical inference regarding the existence of a local/regional air quality impact. The block bootstrap method was employed to address the effect of autocorrelation on test significance. The method was applied to short-term air monitoring data collected at three sites within Pennsylvania's Allegheny National Forest. All of the measured pollutant concentrations were well below the National Ambient Air Quality Standards, so the usual criteria and methods for data analysis were not sufficient. Using advanced directional analysis methods, test results were first applied to verify the existence of a regional impact at a background site. Next the impact of an oil field on local NOx and SO2 concentrations at a second monitoring site was identified after removal of the regional effect. Analysis of a third site also revealed air quality impacts from nearby areas with a high density of oil and gas wells. All results and conclusions were quantified in terms of statistical significance level for the associated inferences. The proposed method can be used to formulate hypotheses and verify conclusions regarding oil and gas well impacts on air quality and support better-informed decisions for their management and regulation.
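The block bootstrap idea, resampling contiguous blocks so that short-range autocorrelation is preserved within each block, can be sketched as follows (a moving-block bootstrap of the mean on a simulated AR(1) series; the block length and series are assumptions for illustration):

```python
import numpy as np

def block_bootstrap_mean(x, block_len, n_boot, rng):
    """Moving-block bootstrap replicates of the mean: resample whole
    contiguous blocks so within-block autocorrelation is preserved."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts_max = n - block_len + 1
    reps = np.empty(n_boot)
    for i in range(n_boot):
        starts = rng.integers(0, starts_max, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        reps[i] = sample.mean()
    return reps

rng = np.random.default_rng(2)
# Simulated autocorrelated "concentration" series (AR(1), phi = 0.6)
noise = rng.normal(size=500)
x = np.empty(500)
x[0] = noise[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + noise[t]

reps = block_bootstrap_mean(x, block_len=20, n_boot=1000, rng=rng)
ci = np.percentile(reps, [2.5, 97.5])   # bootstrap interval for the mean
```

An ordinary (single-observation) bootstrap on the same series would understate the variance, which is the problem the block variant addresses.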
NASA Astrophysics Data System (ADS)
Shih, A. L.; Liu, J. Y. G.
2015-12-01
A median-based method and a z test are employed to find characteristics of the seismo-ionospheric precursor (SIP) of the total electron content (TEC) in the global ionosphere map (GIM) associated with 129 M≥5.5 earthquakes in Taiwan during 1999-2014. Results show that both negative and positive anomalies in the GIM TEC, statistically significant under the z test, appear a few days before the earthquakes. The receiver operating characteristic (ROC) curve is further applied to examine whether the SIPs exist in Taiwan.
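A median-based anomaly flag of the general kind used in SIP studies can be sketched as follows (the 15-day window and 1.5x quartile-spread bounds are assumptions for illustration, not the authors' exact construction):

```python
import numpy as np

def tec_anomalies(tec, window=15, k=1.5):
    """Flag TEC values outside median-based bounds computed from the
    preceding `window` days (a robust alternative to mean +/- k*sd)."""
    flags = np.zeros(tec.size, dtype=bool)
    for t in range(window, tec.size):
        ref = tec[t - window:t]
        m = np.median(ref)
        q1, q3 = np.percentile(ref, [25, 75])
        upper = m + k * (q3 - m)       # positive-anomaly bound
        lower = m - k * (m - q1)       # negative-anomaly bound
        flags[t] = (tec[t] > upper) | (tec[t] < lower)
    return flags

rng = np.random.default_rng(6)
tec = rng.normal(30.0, 2.0, size=60)   # synthetic daily TEC (TECU)
tec[45] += 15.0                        # injected positive anomaly
flags = tec_anomalies(tec)
```

Median and quartiles are preferred here because a single extreme day in the reference window barely moves the bounds.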
Durak, Sibel; Ercan, Eyup Sabri; Ardic, Ulku Akyol; Yuce, Deniz; Ercan, Elif; Ipci, Melis
2014-08-01
The aims of this study were to evaluate the neuropsychological characteristics of the restrictive (R) subtype according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition and the attention-deficit/hyperactivity disorder (ADHD) combined (CB) type and predominantly inattentive (PI) type subtypes and to evaluate whether methylphenidate (MPH) affects neurocognitive test battery scores according to these subtypes. This study included 360 children and adolescents (277 boys, 83 girls) between 7 and 15 years of age who had been diagnosed with ADHD and compared the neuropsychological characteristics and MPH treatment responses of patients with the R subtype-which has been suggested for inclusion among the ADHD subtypes in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition-with those of patients with the PI and CB subtypes. They did not differ from the control subjects in the complex attention domain, which includes Continuous Performance Test, Stroop test, and Shifting Attention Test, which suggests that the R subtype displayed a lower level of deterioration in these domains compared with the PI and CB subtypes. The patients with the CB and PI subtypes did not differ from the control subjects in the Continuous Performance Test correct response domain, whereas those with the R subtype presented a poorer performance than the control subjects. The R subtype requires a more detailed evaluation because it presented similar results in the remaining neuropsychological evaluations and MPH responses.
NASA Astrophysics Data System (ADS)
Woolley, Thomas W.; Dawson, George O.
It has been two decades since the first power analysis of a psychological journal and 10 years since the Journal of Research in Science Teaching made its contribution to this debate. One purpose of this article is to investigate what power-related changes, if any, have occurred in science education research over the past decade as a result of the earlier survey. In addition, previous recommendations are expanded and expounded upon within the context of more recent work in this area. The absence of any consistent mode of presenting statistical results, as well as the lack of change with regard to power-related issues, is reported. Guidelines for reporting the minimal amount of information demanded for clear and independent evaluation of research results by readers are also proposed.
Shechtman, Orit; Gutierrez, Zeida; Kokendofer, Emily
2005-01-01
Controversy exists in the literature concerning the ability of the five-rung grip test to identify submaximal effort. The purpose of this study was to analyze four methods commonly used to evaluate the shape of the curve generated by maximal versus submaximal efforts. Thirty hand therapy patients performed the five-rung grip test maximally and submaximally with both their injured and uninjured hands. Grip strength scores were recorded at each of the five rung positions. Next, four methods were used to analyze the data: 1) visual analysis; 2) analysis of variance; 3) normalization; and 4) calculation of the standard deviation across the five strength scores. In all four methods, the five-rung grip strength test was unable to distinguish between the injured hand exerting maximal effort and the uninjured hand exerting submaximal effort. The results suggest that the five-rung grip strength test should not be used to determine sincerity of effort in people with hand injuries, and that the shape of the curve generated by the five-rung grip strength test may not be related to level of effort but rather to the amount of force generated by the gripping hand.
Statistical Methods for Astronomy
NASA Astrophysics Data System (ADS)
Feigelson, Eric D.; Babu, G. Jogesh
Statistical methodology, with deep roots in probability theory, provides quantitative procedures for extracting scientific knowledge from astronomical data and for testing astrophysical theory. In recent decades, statistics has enormously increased in scope and sophistication. After a historical perspective, this review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests, and point estimation. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are outlined. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered.
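The bootstrap point above, that resampling is valuable when the sampling distribution of a statistic is not known in closed form, can be illustrated with the median of skewed data (synthetic example):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=200)   # skewed "observations"

# Ordinary bootstrap: resample with replacement, recompute the statistic
boot = np.array([np.median(rng.choice(data, size=data.size, replace=True))
                 for _ in range(2000)])
se_median = boot.std(ddof=1)                  # bootstrap standard error
ci = np.percentile(boot, [2.5, 97.5])         # percentile interval
```

No closed-form standard error is needed; the spread of the resampled medians estimates it directly.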
Cap, J.S.
1997-11-01
Defining the maximum expected shock and vibration responses for an on-road truck transportation environment is strongly dependent on the amount of response data that can be obtained. One common test scheme consists of measuring response data over a relatively short prescribed road course and then reviewing that data to obtain the maximum response levels. The more mathematically rigorous alternative is to collect an unbiased ensemble of response data during a long road trip. This paper compares data gathered both ways during a recent on-road certification test for a tractor trailer van being designed by Sandia.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Ploutz-Snyder, Robert; Fiedler, James
2011-01-01
In their 2009 Annals of Statistics paper, Gavrilov, Benjamini, and Sarkar report the results of a simulation assessing the robustness of their adaptive step-down procedure (GBS) for controlling the false discovery rate (FDR) when normally distributed test statistics are serially correlated. In this study we extend the investigation to the case of multiple comparisons involving correlated non-central t-statistics, in particular when several treatments or time periods are being compared to a control in a repeated-measures design with many dependent outcome measures. In addition, we consider several dependence structures other than serial correlation and illustrate how the FDR depends on the interaction between effect size and the type of correlation structure as indexed by Foerstner's distance metric from an identity. The relationship between the correlation matrix R of the original dependent variables and R̃, the correlation matrix of the associated t-statistics, is also studied. In general R̃ depends not only on R, but also on sample size and the signed effect sizes for the multiple comparisons.
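For context, the classic (non-adaptive) Benjamini-Hochberg step-up FDR procedure can be sketched as follows; the adaptive GBS step-down rule studied above refines this idea and is not reproduced here:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Classic BH step-up rule: reject the k smallest p values, where k is
    the largest index with p_(i) <= q * i / m."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = p.size
    thresh = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.9]
reject = benjamini_hochberg(pvals, q=0.05)
```

Note that p = 0.039 is not rejected even though it is below 0.05: the BH threshold for the third-smallest p value here is 0.05 * 3/8 = 0.01875.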
Statistics 101 for Radiologists.
Anvari, Arash; Halpern, Elkan F; Samir, Anthony E
2015-10-01
Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced.
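The diagnostic-accuracy quantities mentioned above follow directly from a 2x2 confusion table (the counts below are hypothetical):

```python
# Hypothetical 2x2 table: disease status vs. diagnostic test result
tp, fp = 80, 10      # test positive: true positives, false positives
fn, tn = 20, 90      # test negative: false negatives, true negatives

sensitivity = tp / (tp + fn)               # P(test+ | disease present)
specificity = tn / (tn + fp)               # P(test- | disease absent)
accuracy = (tp + tn) / (tp + fp + fn + tn)
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
```

Likelihood ratios are useful clinically because they update pre-test odds directly: post-test odds = pre-test odds x LR.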
ERIC Educational Resources Information Center
Lange, Matthew; Dawson, Andrew
2009-01-01
To test claims that postcolonial civil violence is a common legacy of colonialism, we create a dataset on the colonial heritage of 160 countries and explore whether a history of colonialism is related to indicators of inter-communal conflict, political rebellion and civil war in the years 1960-1999. The analysis provides evidence against sweeping…
ERIC Educational Resources Information Center
Cheng, Ying-Yao; Wang, Wen-Chung; Ho, Yi-Hui
2009-01-01
Educational and psychological tests are often composed of multiple short subtests, each measuring a distinct latent trait. Unfortunately, short subtests suffer from low measurement precision, which makes the bandwidth-fidelity dilemma inevitable. In this study, the authors demonstrate how a multidimensional Rasch analysis can be employed to take…
ERIC Educational Resources Information Center
Klein, Ariel; Badia, Toni
2015-01-01
In this study we show how complex creative relations can arise from fairly frequent semantic relations observed in everyday language. By doing this, we reflect on some key cognitive aspects of linguistic and general creativity. In our experimentation, we automated the process of solving a battery of Remote Associates Test tasks. By applying…
Ramos Verri, Fellippo; Santiago Junior, Joel Ferreira; de Faria Almeida, Daniel Augusto; de Oliveira, Guilherme Bérgamo Brandão; de Souza Batista, Victor Eduardo; Marques Honório, Heitor; Noritomi, Pedro Yoshito; Pellizzer, Eduardo Piza
2015-01-02
The study of short implants is relevant to the biomechanics of dental implants, and research on crown increase has implications for the daily clinic. The aim of this study was to analyze the biomechanical interactions of a singular implant-supported prosthesis of different crown heights under vertical and oblique force, using the 3-D finite element method. Six 3-D models were designed with Invesalius 3.0, Rhinoceros 3D 4.0, and Solidworks 2010 software. Each model was constructed with a mandibular segment of bone block, including an implant supporting a screwed metal-ceramic crown. The crown height was set at 10, 12.5, and 15 mm. The applied force was 200 N (axial) and 100 N (oblique). We performed an ANOVA statistical test and Tukey tests; p<0.05 was considered statistically significant. The increase of crown height did not influence the stress distribution on screw prosthetic (p>0.05) under axial load. However, crown heights of 12.5 and 15 mm caused statistically significant damage to the stress distribution of screws and to the cortical bone (p<0.001) under oblique load. High crown to implant (C/I) ratio harmed microstrain distribution on bone tissue under axial and oblique loads (p<0.001). Crown increase was a possible deleterious factor to the screws and to the different regions of bone tissue.
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view can be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
NASA Astrophysics Data System (ADS)
Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.
2014-05-01
Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
Bridge, P D; Sawilowsky, S S
1999-03-01
To effectively evaluate medical literature, practicing physicians and medical researchers must understand the impact of statistical tests on research outcomes. Applying inefficient statistics not only increases the need for resources, but more importantly increases the probability of committing a Type I or Type II error. The t-test is one of the most prevalent tests used in the medical field and is the uniformly most powerful unbiased (UMPU) test under normal curve theory. But does it maintain its UMPU properties when assumptions of normality are violated? A Monte Carlo investigation evaluates the comparative power of the independent-samples t-test and its nonparametric counterpart, the Wilcoxon Rank-Sum (WRS) test, under violations of population normality, using three commonly occurring distributions and small sample sizes. The t-test was more powerful under relatively symmetric distributions, although the magnitude of the differences was moderate. Under distributions with extreme skews, the WRS held large power advantages. When distributions consist of heavier tails or extreme skews, the WRS should be the test of choice. In turn, when population characteristics are unknown, the WRS is recommended, based on the magnitude of these power differences in extreme skews and the modest variation in symmetric distributions.
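A scaled-down version of such a Monte Carlo power comparison (one skewed-shift scenario, far fewer replications than a real study) might look like:

```python
import numpy as np
from scipy import stats

def power(sampler, n=20, n_sim=500, alpha=0.05, rng=None):
    """Monte Carlo power of the t test vs. Wilcoxon rank-sum for one
    sampling scenario (a sketch, not the full study design)."""
    hits_t = hits_w = 0
    for _ in range(n_sim):
        x, y = sampler(rng, n)
        if stats.ttest_ind(x, y).pvalue < alpha:
            hits_t += 1
        if stats.mannwhitneyu(x, y, alternative='two-sided').pvalue < alpha:
            hits_w += 1
    return hits_t / n_sim, hits_w / n_sim

def skewed_shift(rng, n):
    # Extreme-skew population (exponential) with a location shift
    return rng.exponential(1.0, n), rng.exponential(1.0, n) + 0.8

rng = np.random.default_rng(4)
p_t, p_w = power(skewed_shift, rng=rng)
```

Under this extreme-skew alternative the rank-sum test detects the shift more often, consistent with the abstract's conclusion for skewed populations.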
Jamalabadi, Hamidreza; Alizadeh, Sarah; Schönauer, Monika; Leibold, Christian; Gais, Steffen
2016-05-01
Multivariate pattern analysis (MVPA) has recently become a popular tool for data analysis. Often, classification accuracy as quantified by correct classification rate (CCR) is used to illustrate the size of the effect under investigation. However, we show that in low sample size (LSS), low effect size (LES) data, which is typical in neuroscience, the distribution of CCRs from cross-validation of linear MVPA is asymmetric and can show classification rates considerably below what would be expected from chance classification. Conversely, the mode of the distribution in these cases is above expected chance levels, leading to a spuriously high number of above chance CCRs. This unexpected distribution has strong implications when using MVPA for hypothesis testing. Our analyses warrant the conclusion that CCRs do not well reflect the size of the effect under investigation. Moreover, the skewness of the null-distribution precludes the use of many standard parametric tests to assess significance of CCRs. We propose that MVPA results should be reported in terms of P values, which are estimated using randomization tests. Also, our results show that cross-validation procedures using a low number of folds, e.g. twofold, are generally more sensitive, even though the average CCRs are often considerably lower than those obtained using a higher number of folds. Hum Brain Mapp 37:1842-1855, 2016. © 2016 Wiley Periodicals, Inc.
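The recommended randomization approach can be sketched with a toy nearest-centroid classifier under leave-one-out cross-validation (the classifier, data dimensions, and effect size below are assumptions for illustration, not the paper's MVPA pipeline):

```python
import numpy as np

def loo_ccr(X, y):
    """Leave-one-out correct-classification rate of a nearest-centroid rule."""
    hits = 0
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False
        m0 = X[mask & (y == 0)].mean(axis=0)
        m1 = X[mask & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - m1) < np.linalg.norm(X[i] - m0))
        hits += int(pred == y[i])
    return hits / len(y)

def permutation_pvalue(X, y, n_perm=200, rng=None):
    """P value for the observed CCR from label permutations: re-run the
    whole cross-validation under shuffled labels to build the null."""
    observed = loo_ccr(X, y)
    null = [loo_ccr(X, rng.permutation(y)) for _ in range(n_perm)]
    p = (1 + sum(c >= observed for c in null)) / (n_perm + 1)
    return observed, p

rng = np.random.default_rng(5)
# Low-sample-size data: 16 samples, 10 features, modest class shift
X = rng.normal(0.0, 1.0, size=(16, 10))
y = np.repeat([0, 1], 8)
X[y == 1] += 1.0
ccr, p = permutation_pvalue(X, y, rng=rng)
```

Comparing the observed CCR to this permutation null, rather than to nominal chance (50%), avoids the skewed-null problem the abstract describes.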
Zainal-Abideen, M; Aris, A; Yusof, F; Abdul-Majid, Z; Selamat, A; Omar, S I
2012-01-01
In this study of coagulation operation, a comparison was made between the optimum jar test values for pH, coagulant and coagulant aid obtained from traditional methods (an adjusted one-factor-at-a-time (OFAT) method) and with central composite design (the standard design of response surface methodology (RSM)). Alum (coagulant) and polymer (coagulant aid) were used to treat a water source with very low pH and high aluminium concentration at Sri-Gading water treatment plant (WTP) Malaysia. The optimum conditions for these factors were chosen when the final turbidity, pH after coagulation and residual aluminium were within 0-5 NTU, 6.5-7.5 and 0-0.20 mg/l respectively. Traditional and RSM jar tests were conducted to find their respective optimum coagulation conditions. It was observed that the optimum dose for alum obtained through the traditional method was 12 mg/l, while the value for polymer was set constant at 0.020 mg/l. Through RSM optimization, the optimum dose for alum was 7 mg/l and for polymer was 0.004 mg/l. Optimum pH for the coagulation operation obtained through traditional methods and RSM was 7.6. The final turbidity, pH after coagulation and residual aluminium recorded were all within acceptable limits. The RSM method was demonstrated to be an appropriate approach for the optimization and was validated by a further test.
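For reference, the design points of a central composite design (the standard RSM design mentioned above) are easy to generate in coded units; the factor names here are illustrative:

```python
import numpy as np
from itertools import product

def central_composite(k, alpha=None):
    """Central composite design in coded units for k factors:
    2^k factorial corners + 2k axial (star) points + a centre point."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25          # rotatable-design axial distance
    corners = np.array(list(product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for j in range(k):
        axial[2 * j, j] = -alpha
        axial[2 * j + 1, j] = alpha
    centre = np.zeros((1, k))
    return np.vstack([corners, axial, centre])

# Three jar-test factors in coded -1..+1 units: pH, alum dose, polymer dose
design = central_composite(3)
```

Each coded row is then mapped back to physical settings (e.g., -1 to +1 spanning the alum dose range) before running the jars; replicated centre points are usually added to estimate pure error.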
Paybins, Katherine S.; Nishikawa, Tracy; Izbicki, John A.; Reichard, Eric G.
1998-01-01
To better understand flow processes, solute-transport processes, and ground-water/surface-water interactions on the Santa Clara River in Ventura County, California, a 24-hour fluorescent-dye tracer study was performed under steady-state flow conditions on a 28-mile reach of the river. The study reach includes perennial (uppermost and lowermost) subreaches and ephemeral subreaches of the lower Piru Creek and the middle Santa Clara River. Dye was injected at a site on Piru Creek, and fluorescence of river water was measured continuously at four sites and intermittently at two sites. Discharge measurements were also made at the six sites. The time of travel of the dye, peak dye concentration, and time-variance of time-concentration curves were obtained at each site. The long tails of the time-concentration curves are indicative of sources/sinks within the river, such as riffles and pools, or transient bank storage. A statistical analysis of the data indicates that, in general, the transport characteristics follow Fickian theory. These data and previously collected discharge data were used to calibrate a one-dimensional flow model (DAFLOW) and a solute-transport model (BLTM). DAFLOW solves a simplified form of the diffusion-wave equation and uses empirical relations between flow rate and cross-sectional area, and flow rate and channel width. BLTM uses the velocity data from DAFLOW and solves the advection-dispersion transport equation, including first-order decay. The simulations of dye transport indicated that (1) ground-water recharge explains the loss of dye mass in the middle, ephemeral, subreaches, and (2) ground-water recharge does not explain the loss of dye mass in the uppermost and lowermost, perennial, subreaches. This loss of mass was simulated using a linear decay term. The loss of mass in the perennial subreaches may be caused by a combination of photodecay or adsorption/desorption.
NASA Astrophysics Data System (ADS)
Markowitz, A.
2015-09-01
We summarize two papers providing the first X-ray-derived statistical constraints for both clumpy-torus model parameters and cloud ensemble properties. In Markowitz, Krumpe, & Nikutta (2014), we explored multi-timescale variability in line-of-sight X-ray absorbing gas as a function of optical classification. We examined 55 Seyferts monitored with the Rossi X-ray Timing Explorer, and found in 8 objects a total of 12 eclipses, with durations between hours and years. Most clouds are commensurate with the outer portions of the BLR, or the inner regions of infrared-emitting dusty tori. The detection of eclipses in type Is disfavors sharp-edged tori. We provide probabilities to observe a source undergoing an absorption event for both type Is and IIs, yielding constraints in [N_0, sigma, i] parameter space. In Nikutta et al., in prep., we infer that the small cloud angular sizes, as seen from the SMBH, imply the presence of >10^7 clouds in BLR+torus to explain observed covering factors. Cloud size is roughly proportional to distance from the SMBH, hinting at the formation processes (e.g. disk fragmentation). All observed clouds are sub-critical with respect to tidal disruption; self-gravity alone cannot contain them. External forces (e.g. magnetic fields, ambient pressure) are needed to contain them, or otherwise the clouds must be short-lived. Finally, we infer that the radial cloud density distribution behaves as 1/r^{0.7}, compatible with VLTI observations. Our results span both dusty and non-dusty clumpy media, and probe model parameter space complementary to that for short-term eclipses observed with XMM-Newton, Suzaku, and Chandra.
Ronco, A; Gagnon, P; Diaz-Baez, M C; Arkhipchuk, V; Castillo, G; Castillo, L E; Dutka, B J; Pica-Granados, Y; Ridal, J; Srivastava, R C; Sánchez, A
2002-01-01
There is an urgent need to evaluate the presence of toxicants in waters used for human consumption and to develop strategies to reduce and prevent their contamination. The International Development Research Centre undertook an intercalibration project to develop and validate a battery of bioassays for toxicity testing of water samples. The project was carried out in two phases by research institutions from eight countries that formed the WaterTox network. Results for the first phase were reported in the special September 2000 issue of Environmental Toxicology. Phase II involved toxicity screening tests of environmental and blind samples (chemical solutions of unknown composition to participating laboratories) using the following battery: Daphnia magna, Hydra attenuata, seed root inhibition with Lactuca sativa, and Selenastrum capricornutum. This battery was also used to assess potential toxicity in concentrated (10x) water samples. Results are presented for a set of six blind samples sent to the participating laboratories over a 1-year period. Analyses were performed for each bioassay to evaluate variations among laboratories of responses to negative controls, violations of test quality control criteria, false positive responses induced by sample concentration, and variability within and between labs of responses to toxic samples. Analyses of the data from all bioassays and labs provided comparisons of false positive rates (based on blind negative samples), test sensitivities to a metal or organic toxicant, and interlaboratory test variability. Results indicate that the battery was reliable in detecting toxicity when present. However, some false positives were identified with a concentrated soft-water sample and with the Lactuca and Hydra (sublethal end-point) tests. Probabilities of detecting false positives for individual and combined toxic responses of the four bioassays are presented. Overall, interlaboratory comparisons indicate a good reliability of the
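One reason batteries of tests produce false positives even when each assay is individually well behaved: under an independence assumption, the chance that at least one of several tests flags a clean sample compounds quickly (the rates below are illustrative, not the study's estimates):

```python
# Battery-level false-positive probability, assuming each of the four
# bioassays has individual false-positive rate p and errors are independent
p = 0.05
n_tests = 4
p_any_false_positive = 1 - (1 - p) ** n_tests
```

With p = 0.05 per assay, roughly one clean sample in five would trigger at least one positive from the four-test battery, which is why decision rules often require concordant positives.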
Test of the validity of the spin statistics with X-ray spectroscopy - VIP2 at LNGS Gran Sasso
NASA Astrophysics Data System (ADS)
Marton, Johann; VIP2 Collaboration
2016-09-01
We are experimentally investigating possible violations of standard quantum mechanics predictions in the Gran Sasso underground laboratory in Italy. We test with high precision the Pauli Exclusion Principle and the collapse of the wave function (collapse models). We present our method of searching for possible small violations of the Pauli Exclusion Principle (PEP) for electrons, through the search for "anomalous" X-ray transitions in copper atoms produced by "fresh" electrons (brought inside the copper bar by a circulating current), which could undergo a Pauli-forbidden transition to the 1s level already occupied by two electrons, and we describe the VIP2 (VIolation of PEP) experiment currently taking data at the Gran Sasso underground laboratories. In this talk the new VIP2 setup installed in the Gran Sasso underground laboratory will be presented. The goal of VIP2 is to test the PEP for electrons with unprecedented accuracy, down to a limit on the probability that the PEP is violated at the level of 10^-31. We show preliminary experimental results and discuss implications of a possible violation. Supported by the Austrian Science Fund (project P25529-N20).
Mielenz, Norbert; Spilke, Joachim; Krejcova, Hana; Schüler, Lutz
2006-10-01
Random regression models are widely used in the field of animal breeding for the genetic evaluation of daily milk yields from different test days. These models are capable of handling different environmental effects on the respective test day, and they describe the characteristics of the course of the lactation period by using suitable covariates with fixed and random regression coefficients. As the numerically expensive estimation of parameters is already part of advanced computer software, modifications of random regression models will grow considerably in importance for statistical evaluations of nutrition and behaviour experiments with animals. Random regression models belong to the large class of linear mixed models. Thus, when choosing a model, or more precisely, when selecting a suitable covariance structure for the random effects, the information criteria of Akaike and Schwarz can be used. In this study, the fitting of random regression models for a statistical analysis of a feeding experiment with dairy cows is illustrated using the program package SAS. For each of the feeding groups, lactation curves modelled by covariates with fixed regression coefficients are estimated simultaneously. With the help of the fixed regression coefficients, differences between the groups are estimated and then tested for significance. The covariance structure of the random and subject-specific effects and the serial correlation matrix are selected by using information criteria and by estimating correlations between repeated measurements. For the verification of the selected model and the alternative models, mean values and standard deviations estimated with ordinary least squares residuals are used.
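The Akaike and Schwarz criteria used above for covariance-structure selection both reduce to simple formulas in the maximized log-likelihood and the number of parameters. A minimal sketch (in Python rather than SAS; the log-likelihoods and parameter counts below are hypothetical, purely for illustration):

```python
import math

def aic(log_lik, k):
    # Akaike information criterion: 2k - 2*lnL (smaller is better)
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Schwarz (Bayesian) information criterion: k*ln(n) - 2*lnL
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits of two covariance structures to the same data set:
# a simpler structure (k = 4 parameters) and a richer one (k = 9).
n = 240  # number of observations (hypothetical)
fits = {"compound symmetry": (-512.3, 4), "unstructured": (-505.8, 9)}
for name, (ll, k) in fits.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n), 1))
```

With these numbers AIC favors the richer structure while the BIC penalty, which grows with ln(n), favors the simpler one, illustrating why the two criteria can disagree.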
NASA Astrophysics Data System (ADS)
Markowitz, Alex; Krumpe, Mirko; Nikutta, R.
2016-06-01
In two papers (Markowitz, Krumpe, & Nikutta 2014, and Nikutta et al., in prep.), we derive the first X-ray statistical constraints for clumpy-torus models in Seyfert AGN by quantifying multi-timescale variability in line-of-sight X-ray absorbing gas as a function of optical classification. We systematically search for discrete absorption events in the vast archive of RXTE monitoring of 55 nearby type Is and Compton-thin type IIs. We are sensitive to discrete absorption events due to clouds of full-covering, neutral/mildly ionized gas transiting the line of sight. Our results apply to both dusty and non-dusty clumpy media, and probe model parameter space complementary to that for eclipses observed with XMM-Newton, Suzaku, and Chandra. We detect twelve eclipse events in eight Seyferts, roughly tripling the number previously published from this archive. Event durations span hours to years. Most of our detected clouds are Compton-thin, and most clouds' distances from the black hole are inferred to be commensurate with the outer portions of the BLR or the inner regions of infrared-emitting dusty tori. We present the density profiles of the highest-quality eclipse events; the column density profile for an eclipsing cloud in NGC 3783 is doubly spiked, possibly indicating a cloud that is being tidally sheared. We discuss implications for cloud distributions in the context of clumpy-torus models. We calculate eclipse probabilities for orientation-dependent Type I/II unification schemes. We present constraints on cloud sizes, stability, and radial distribution. We infer that the clouds' small angular sizes as seen from the SMBH imply ~10^7 clouds are required across the BLR + torus. Cloud size is roughly proportional to distance from the black hole, hinting at the formation processes (e.g., disk fragmentation). All observed clouds are sub-critical with respect to tidal disruption; self-gravity alone cannot contain them. External forces, such as magnetic fields or ambient pressure, are likely required to confine them.
Dziak, John J.; Lanza, Stephanie T.; Tan, Xianming
2014-01-01
Selecting the number of different classes which will be assumed to exist in the population is an important step in latent class analysis (LCA). The bootstrap likelihood ratio test (BLRT) provides a data-driven way to evaluate the relative adequacy of a (K −1)-class model compared to a K-class model. However, very little is known about how to predict the power or the required sample size for the BLRT in LCA. Based on extensive Monte Carlo simulations, we provide practical effect size measures and power curves which can be used to predict power for the BLRT in LCA given a proposed sample size and a set of hypothesized population parameters. Estimated power curves and tables provide guidance for researchers wishing to size a study to have sufficient power to detect hypothesized underlying latent classes. PMID:25328371
NASA Astrophysics Data System (ADS)
Jiang, Yulei
2002-04-01
Both Student's t-test for paired data and the Dorfman-Berbaum-Metz (DBM) method report a P value when comparing ROC curves of competing diagnostic modalities. We empirically compared the P values from the t-test and the DBM method using data from two observer studies involving lung-nodule detection (15 readers, 240 cases) and breast-lesion classification (10 readers, 104 cases). We made 596,637 comparisons based on data drawn from different combinations and subsets of the readers and cases. The average difference in the P values was 0.11 and 0.058 in the two separate analyses of the lung-nodule study and 0.0061 in the breast-lesion study. In the lung-nodule analysis that demonstrated statistical significance with the original full dataset, both methods reported P<0.05 or both reported P>0.05 in 83% of the comparisons; the t-test alone reported P<0.05 in 17%, and the DBM method alone in 1% of the comparisons. A second analysis, of the part of the lung-nodule study that did not show statistical significance with the original full dataset, found both P<0.05 or both P>0.05 in 99% of the comparisons; the t-test alone reported P<0.05 in 1%, and the DBM method alone in less than 1% of the comparisons. The breast-lesion study showed both P<0.05 or both P>0.05 in 91% of the comparisons; the t-test alone reported P<0.05 in 5%, and the DBM method alone in 4% of the comparisons. These results indicate that the t-test and the DBM method generally report similar P values, but their conclusions regarding statistical significance often differ, and the DBM method should be used because it accounts for both reader and case variance.
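The simpler of the two procedures compared above, a paired t test on per-reader ROC areas, can be sketched as follows. The reader ROC areas are hypothetical, and, unlike the DBM method, this sketch accounts only for reader variability, not case variability:

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(x, y):
    """t statistic and degrees of freedom for Student's paired t test."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1

# Hypothetical per-reader ROC areas (Az) under two competing modalities
az_a = [0.82, 0.79, 0.88, 0.75, 0.81, 0.84]
az_b = [0.78, 0.77, 0.84, 0.74, 0.80, 0.79]
t, df = paired_t(az_a, az_b)
# Compare |t| with the two-sided critical value t_{0.975, df} (2.571 for df = 5)
print(round(t, 2), df)
```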
Shi, Runhua; McLarty, Jerry W
2009-10-01
In this article, we introduced basic concepts of statistics, types of distributions, and descriptive statistics, and provided a few examples. The basic concepts presented herein are only a fraction of the concepts related to descriptive statistics. Likewise, many commonly used distributions are not presented herein, such as the Poisson distribution for rare events, and the exponential, F, and logistic distributions. More information can be found in many statistics books and publications.
ERIC Educational Resources Information Center
Petocz, Peter; Sowey, Eric
2008-01-01
As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of the data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis," where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and computation of the related confidence limits. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation or interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
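The standard trick that makes K-M applicable to left-censored concentrations is to flip the data about a constant larger than any observation, turning "below detection limit" into right-censoring. A bare-bones sketch of the product-limit estimator (this is an illustration, not the authors' S-language software; the concentrations and flip constant are hypothetical):

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t); events[i] = 1 if observed, 0 if right-censored."""
    data = sorted(zip(times, events))
    n = len(data)
    at_risk, s, curve = n, 1.0, []
    i = 0
    while i < n:
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        total = sum(1 for tt, e in data if tt == t)
        if deaths:
            s *= 1 - deaths / at_risk
            curve.append((t, s))
        at_risk -= total
        i += total
    return curve

# Left-censored concentrations: (value, 1 if detected, 0 if "< detection limit")
obs = [(0.5, 0), (0.7, 1), (1.0, 0), (1.2, 1), (2.0, 1), (3.5, 1)]
M = 10.0                                # flip constant larger than any value
flipped = [(M - x, e) for x, e in obs]  # left-censoring becomes right-censoring
km = kaplan_meier([t for t, _ in flipped], [e for _, e in flipped])
# S_flipped(M - x) estimates P(X < x), the cdf of the original concentrations
print(km)
```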
Zlotnik, V A; Zurbuchen, B R; Ptak, T
2001-01-01
Over the last decade the dipole-flow test (DFT) evolved from the general idea of using recirculatory flow to evaluate aquifer properties, to the development of prototype instrumentation and feasibility studies, to a reliable tool for characterization of aquifer heterogeneity. The DFT involves the interpretation of head in recirculatory flow between injection and extraction sections (chambers) in a single well isolated from each other by a multipacker system. In this study, the steady-state dipole flow test (DFT) has been used to characterize the statistics of horizontal hydraulic conductivity (Kr) of the highly permeable, heterogeneous, and thin aquifer at the Horkheimer Insel site, Germany. In previous studies, Kr estimates were based on the steady-state head difference between chambers. A new by-chamber interpretation is proposed that is based on drawdown within each individual chamber. This interpretation yields more detailed information on structure of heterogeneity of the aquifer without introducing complexity into the analysis. The DFT results indicate that Kr ranges from 49 to 6000 m/day (mean ln Kr [(m/s)] approximately -4, and variance of ln Kr [(m/s)] approximately 1-2). Descriptive statistics from the DFT compare well with those from previous field and laboratory tests (pumping, borehole flowmeter, and permeameter tests and grain-size analysis) at this site. It is shown that the role of confining boundaries in the DFT interpretation is negligible even in this case of a thin (< 4 m thick) aquifer. This study demonstrates the flexibility of the DFT and expands the potential application of this method to a wide range of hydrogeologic settings.
Sinha, Ritwik; Luo, Yuqun
2007-01-01
Construction of precise confidence sets of disease gene locations after initial identification of linked regions can improve the efficiency of the ensuing fine-mapping effort. We took the confidence set inference (CSI) framework, previously proposed and implemented with the Mean test statistic (CSI-Mean), and improved its efficiency substantially by using a likelihood ratio test statistic (CSI-MLS). The CSI framework requires knowledge of some disease-model-related parameters. In the absence of prior knowledge of these parameters, a two-step procedure may be employed: 1) the parameters are estimated using a coarse map of markers; 2) CSI-Mean or CSI-MLS is applied to construct the confidence sets of the disease gene locations using a finer map of markers, assuming the estimates from Step 1 for the required parameters. In this article we show that the advantages of CSI-MLS over CSI-Mean, previously demonstrated when the required parameters are known, are preserved in this two-step procedure, using both the simulated and real data contributed to Problems 2 and 3 of Genetic Analysis Workshop 15. In addition, our results suggest that microsatellite data, when available, should be used in Step 1. Also explored in detail is the effect of the absence of parental genotypes on the performance of CSI-MLS.
Explorations in statistics: statistical facets of reproducibility.
Curran-Everett, Douglas
2016-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eleventh installment of Explorations in Statistics explores statistical facets of reproducibility. If we obtain an experimental result that is scientifically meaningful and statistically unusual, we would like to know that our result reflects a general biological phenomenon that another researcher could reproduce if (s)he repeated our experiment. But more often than not, we may learn this researcher cannot replicate our result. The National Institutes of Health and the Federation of American Societies for Experimental Biology have created training modules and outlined strategies to help improve the reproducibility of research. These particular approaches are necessary, but they are not sufficient. The principles of hypothesis testing and estimation are inherent to the notion of reproducibility in science. If we want to improve the reproducibility of our research, then we need to rethink how we apply fundamental concepts of statistics to our science.
Flynn, Kevin; Swintek, Joe; Johnson, Rodney
2017-02-01
Because of various Congressional mandates to protect the environment from endocrine disrupting chemicals (EDCs), the United States Environmental Protection Agency (USEPA) initiated the Endocrine Disruptor Screening Program. In the context of this framework, the Office of Research and Development within the USEPA developed the Medaka Extended One Generation Reproduction Test (MEOGRT) to characterize the endocrine action of a suspected EDC. One important endpoint of the MEOGRT is fecundity of medaka breeding pairs. Power analyses were conducted to determine the number of replicates needed in proposed test designs and to determine the effects that varying reproductive parameters (e.g., mean fecundity, variance, and days with no egg production) would have on the statistical power of the test. The MEOGRT Reproduction Power Analysis Tool (MRPAT) is a software tool developed to expedite these power analyses by both calculating estimates of the needed reproductive parameters (e.g., population mean and variance) and performing the power analysis under user-specified scenarios. Example scenarios are detailed that highlight the importance of the reproductive parameters for statistical power. When control fecundity is increased from 21 to 38 eggs per pair per day and the variance decreased from 49 to 20, the gain in power is equivalent to increasing replication by 2.5 times. On the other hand, if 10% of the breeding pairs, including controls, do not spawn, the power to detect a 40% decrease in fecundity drops to 0.54 from nearly 0.98 when all pairs have some level of egg production. Perhaps most importantly, MRPAT was used to inform the decision-making process that led to the final recommendation of the MEOGRT to have 24 control breeding pairs and 12 breeding pairs in each exposure group.
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Explorations in Statistics: Power
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2010-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fifth installment of "Explorations in Statistics" revisits power, a concept fundamental to the test of a null hypothesis. Power is the probability that we reject the null hypothesis when it is false. Four…
NSTec Environmental Restoration
2009-04-20
A statistical analysis and geologic evaluation of recently acquired laboratory-derived physical property data are being performed to better understand and more precisely correlate physical properties with specific geologic parameters associated with non-zeolitized tuffs at the Nevada Test Site. Physical property data include wet and dry bulk density, grain density (i.e., specific gravity), total porosity, and effective porosity. Geologic parameters utilized include degree of welding, lithology, stratigraphy, geographic area, and matrix mineralogy (i.e., vitric versus devitrified). Initial results indicate a very good correlation between physical properties and geologic parameters such as degree of welding, lithology, and matrix mineralogy. However, physical properties appear to be independent of stratigraphy and geographic area, suggesting that the data are transferable with regard to these two geologic parameters. Statistical analyses also indicate that the assumed grain density of 2.65 grams per cubic centimeter used to calculate porosity in some samples is too high. This results in corresponding calculated porosity values approximately 5 percentage points too high (e.g., 45 percent versus 40 percent), which can be significant in the lower-porosity rocks. Similar analyses and evaluations of zeolitic tuff and carbonate rock physical property data are ongoing, as well as comparisons to geophysical log values.
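The porosity sensitivity described above follows directly from the standard relation phi = 1 - (dry bulk density)/(grain density). A small sketch with hypothetical densities (the dry bulk density of 1.46 g/cm^3 and the "measured" grain density of 2.43 g/cm^3 are illustrative assumptions, not values from the study) shows how assuming 2.65 g/cm^3 inflates porosity by roughly 5 percentage points:

```python
def total_porosity(dry_bulk_density, grain_density):
    # phi = 1 - rho_dry / rho_grain (fractional total porosity)
    return 1 - dry_bulk_density / grain_density

rho_dry = 1.46  # g/cm^3, hypothetical dry bulk density of a tuff sample
phi_assumed = total_porosity(rho_dry, 2.65)   # assumed grain density
phi_measured = total_porosity(rho_dry, 2.43)  # hypothetical measured grain density
print(round(100 * phi_assumed, 1), round(100 * phi_measured, 1))
```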
Price-Whelan, Adrian M.; Agüeros, Marcel A.; Fournier, Amanda P.; Street, Rachel; Ofek, Eran O.; Covey, Kevin R.; Levitan, David; Sesar, Branimir; Laher, Russ R.; Surace, Jason
2014-01-20
Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg^2 footprint. While the median 7.26 deg^2 PTF field has been imaged ∼40 times in the R band, ∼2300 deg^2 have been observed >100 times. We use PTF data to study the trade-off between searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches, but with far-from-optimal time sampling. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10^9 light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
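The von Neumann ratio used above for candidate selection is the mean squared successive difference divided by the sample variance: it is near 2 for uncorrelated noise and well below 2 for smooth excursions such as microlensing bumps. A sketch on toy light curves (the data are synthetic, not PTF photometry):

```python
import random

def von_neumann_ratio(x):
    """Mean squared successive difference over the sample variance; ~2 for
    white noise, well below 2 for smoothly varying light curves."""
    n = len(x)
    mu = sum(x) / n
    msd = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    var = sum((v - mu) ** 2 for v in x) / (n - 1)
    return msd / var

smooth = [i * 0.1 for i in range(50)]           # smooth, trend-like signal
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(500)]  # uncorrelated noise
print(von_neumann_ratio(smooth), von_neumann_ratio(noise))
```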
... population, or about 25 million Americans, has experienced tinnitus lasting at least five minutes in the past ... by NIDCD Epidemiology and Statistics Program staff: (1) tinnitus prevalence was obtained from the 2008 National Health ...
Schulpen, Sjors H W; Pennings, Jeroen L A; Tonk, Elisa C M; Piersma, Aldert H
2014-03-21
The embryonic stem cell test (EST) is applied as a model system for the detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalate monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes whose potency ranking in vitro matched the in vivo embryotoxicity ranking were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in the EST was similar to the concentration predicted using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle-related genes was observed, reminiscent of the differentiation inhibition in the EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.
Osnes, J.D.; Winberg, A.; Andersson, J.E.; Larsson, N.A.
1991-09-27
Statistical and probabilistic methods for estimating the probability that a fracture is nonconductive (or equivalently, the conductive-fracture frequency) and the distribution of the transmissivities of conductive fractures from transmissivity measurements made in single-hole injection (well) tests were developed. These methods were applied to a database consisting of over 1,000 measurements made in nearly 25 km of borehole at five sites in Sweden. The depths of the measurements ranged from near the surface to over 600-m deep, and packer spacings of 20- and 25-m were used. A probabilistic model that describes the distribution of a series of transmissivity measurements was derived. When the parameters of this model were estimated using maximum likelihood estimators, the resulting estimated distributions generally fit the cumulative histograms of the transmissivity measurements very well. Further, estimates of the mean transmissivity of conductive fractures based on the maximum likelihood estimates of the model's parameters were reasonable, both in magnitude and in trend, with respect to depth. The estimates of the conductive fracture probability were generated in the range of 0.5--5.0 percent, with the higher values at shallow depths and with increasingly smaller values as depth increased. An estimation procedure based on the probabilistic model and the maximum likelihood estimators of its parameters was recommended. Some guidelines regarding the design of injection test programs were drawn from the recommended estimation procedure and the parameter estimates based on the Swedish data. 24 refs., 12 figs., 14 tabs.
Cason, J A; Cox, N A; Buhr, R J; Richardson, L J
2010-09-01
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the 2-class attribute sampling plans used for testing chilled chicken carcasses in the United States and Europe were compared by calculation and simulation. Testing in the United States uses whole-carcass rinses (WCR), with a maximum number of 12 positives out of 51 carcasses in a test set. Those numbers were chosen so that a plant operating with a Salmonella prevalence of 20%, the national baseline result for broiler chicken carcasses, has an approximately 80% probability of passing a test set. The European Union requires taking neck skin samples of approximately 8.3 g each from 150 carcasses, with the neck skins cultured in pools of 3 and with 7 positives as the maximum passing score for a test set of 50 composite samples. For each of these sampling plans, binomial probabilities were calculated and 100,000 complete sampling sets were simulated using a random number generator in a spreadsheet. Calculations indicated that a 20% positive rate in WCR samples was approximately equivalent to an 11.42% positive rate in composite neck skin samples or a 3.96% positive rate in individual neck skin samples within a pool of 3. With 20% as the prevalence rate, 79.3% of the simulated WCR sets passed with 12 or fewer positive carcasses per set, very near the expected 80% rate. Under simulated European conditions, a Salmonella prevalence of 3.96% in individual neck skin samples yielded a passing rate of 79.1%. The 2 sampling plans thus have roughly equivalent outcomes if WCR samples have a Salmonella-positive rate of 20% and individual neck skin samples have a positive rate of 3.96%. Sampling and culturing methods must also be considered in comparing the different standards for Salmonella.
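The pass probabilities described above are ordinary binomial tail sums, so the simulation results can be cross-checked directly. A sketch reproducing the ~80% passing rates and the 3.96%-to-11.42% pooling conversion (values are recomputed here, not taken from the simulations):

```python
from math import comb

def pass_probability(n, p, max_positives):
    """P(X <= max_positives) for X ~ Binomial(n, p): chance a test set passes."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(max_positives + 1))

# US plan: 51 whole-carcass rinses, pass with at most 12 positives, 20% prevalence
us = pass_probability(51, 0.20, 12)
# EU plan: 50 pooled neck-skin samples, pass with at most 7 positives;
# a 3.96% rate in individual neck skins implies a pooled (n = 3) positive rate of:
eu_pool_rate = 1 - (1 - 0.0396) ** 3   # ~11.42%
eu = pass_probability(50, eu_pool_rate, 7)
print(round(us, 3), round(eu, 3))      # both close to 0.80
```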
Dexter, Alex; Race, Alan M; Styles, Iain B; Bunch, Josephine
2016-11-15
Spatial clustering is a powerful tool in mass spectrometry imaging (MSI) and has been demonstrated to be capable of differentiating tumor types, visualizing intratumor heterogeneity, and segmenting anatomical structures. Several clustering methods have been applied to mass spectrometry imaging data, but a principled comparison and evaluation of different clustering techniques presents a significant challenge. We propose that testing whether the data has a multivariate normal distribution within clusters can be used to evaluate the performance when using algorithms that assume normality in the data, such as k-means clustering. In cases where clustering has been performed using the cosine distance, conversion of the data to polar coordinates prior to normality testing should be performed to ensure normality is tested in the correct coordinate system. In addition to these evaluations of internal consistency, we demonstrate that the multivariate normal distribution can then be used as a basis for statistical modeling of MSI data. This allows the generation of synthetic MSI data sets with known ground truth, providing a means of external clustering evaluation. To demonstrate this, reference data from seven anatomical regions of an MSI image of a coronal section of mouse brain were modeled. From this, a set of synthetic data based on this model was generated. Results of r^2 fitting of the chi-squared quantile-quantile plots on the seven anatomical regions confirmed that the data acquired from each spatial region was found to be closer to normally distributed in polar space than in Euclidean. Finally, principal component analysis was applied to a single data set that included synthetic and real data. No significant differences were found between the two data types, indicating the suitability of these methods for generating realistic synthetic data.
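The chi-squared Q-Q comparison described above rests on the fact that, for multivariate normal data, squared Mahalanobis distances from the mean are approximately chi^2 distributed with p degrees of freedom. A minimal numpy sketch on synthetic (not MSI) data, using the simpler check E[chi^2_p] = p in place of a full Q-Q fit:

```python
import numpy as np

def sq_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X from the sample mean.
    For multivariate-normal data these are approximately chi^2_p distributed."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=2000)
d2 = sq_mahalanobis(X)
# Sanity check in place of a full chi^2 Q-Q plot: the mean of d2 should be
# close to p = 3 (with the sample covariance it equals p*(n-1)/n exactly)
print(round(d2.mean(), 2))
```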
ERIC Educational Resources Information Center
Chicot, Katie; Holmes, Hilary
2012-01-01
The use, and misuse, of statistics is commonplace, yet in the printed format data representations can be either over simplified, supposedly for impact, or so complex as to lead to boredom, supposedly for completeness and accuracy. In this article the link to the video clip shows how dynamic visual representations can enliven and enhance the…
NASA Technical Reports Server (NTRS)
Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.
2017-01-01
Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.
NASA Astrophysics Data System (ADS)
Frimann, S.; Jørgensen, J. K.; Haugbølle, T.
2016-02-01
Context. Both observations and simulations of embedded protostars have progressed rapidly in recent years. Bringing them together is an important step in advancing our knowledge about the earliest phases of star formation. Aims: To compare synthetic continuum images and spectral energy distributions (SEDs), calculated from large-scale numerical simulations, to observational studies, thereby aiding in both the interpretation of the observations and in testing the fidelity of the simulations. Methods: The adaptive mesh refinement code, RAMSES, is used to simulate the evolution of a 5 pc × 5 pc × 5 pc molecular cloud. The simulation has a maximum resolution of 8 AU, resolving simultaneously the molecular cloud on parsec scales and individual protostellar systems on AU scales. The simulation is post-processed with the radiative transfer code RADMC-3D, which is used to create synthetic continuum images and SEDs of the protostellar systems. In this way, more than 13 000 unique radiative transfer models, of a variety of different protostellar systems, are produced. Results: Over the course of 0.76 Myr the simulation forms more than 500 protostars, primarily within two sub-clusters. The synthetic SEDs are used to calculate the evolutionary tracers Tbol and Lsmm/Lbol. It is shown that, while the observed distributions of the tracers are well matched by the simulation, they generally do a poor job of tracking the protostellar ages. Disks form early in the simulation, with 40% of the Class 0 protostars being encircled by one. The flux emission from the simulated disks is found to be, on average, a factor ~6 too low relative to real observations; an issue that can be traced back to numerical effects on the smallest scales in the simulation. The simulated distribution of protostellar luminosities spans more than three orders of magnitude, similar to the observed distribution. Cores and protostars are found to be closely associated with one another, with the distance distribution
Croarkin, M. Carroll
2001-01-01
For more than 50 years, the Statistical Engineering Division (SED) has been instrumental in the success of a broad spectrum of metrology projects at NBS/NIST. This paper highlights fundamental contributions of NBS/NIST statisticians to statistics and to measurement science and technology. Published methods developed by SED staff, especially during the early years, endure as cornerstones of statistics not only in metrology and standards applications, but as data-analytic resources used across all disciplines. The history of statistics at NBS/NIST began with the formation of what is now the SED. Examples from the first five decades of the SED illustrate the critical role of the division in the successful resolution of a few of the highly visible, and sometimes controversial, statistical studies of national importance. A review of the history of major early publications of the division on statistical methods, design of experiments, and error analysis and uncertainty is followed by a survey of several thematic areas. The accompanying examples illustrate the importance of SED in the history of statistics, measurements and standards: calibration and measurement assurance, interlaboratory tests, development of measurement methods, Standard Reference Materials, statistical computing, and dissemination of measurement technology. A brief look forward sketches the expanding opportunity and demand for SED statisticians created by current trends in research and development at NIST. PMID:27500023
ERIC Educational Resources Information Center
Delaval, Marine; Michinov, Nicolas; Le Bohec, Olivier; Le Hénaff, Benjamin
2017-01-01
The aim of this study was to examine how social or temporal-self comparison feedback, delivered in real-time in a web-based training environment, could influence the academic performance of students in a statistics examination. First-year psychology students were given the opportunity to train for a statistics examination during a semester by…
Rendón-Macías, Mario Enrique; Villasís-Keever, Miguel Ángel; Miranda-Novales, María Guadalupe
2016-01-01
Descriptive statistics is the branch of statistics that gives recommendations on how to summarize research data clearly and simply in tables, figures, charts, or graphs. Before performing a descriptive analysis it is paramount to specify its goal or goals, and to identify the measurement scales of the different variables recorded in the study. Tables or charts aim to provide timely information on the results of an investigation. The graphs show trends and can be histograms, pie charts, "box and whiskers" plots, line graphs, or scatter plots. Images serve as examples to reinforce concepts or facts. The choice of a chart, graph, or image must be based on the study objectives. Usually it is not recommended to use more than seven in an article, also depending on its length.
Order Statistics and Nonparametric Statistics.
2014-09-26
Topics investigated include the following: Probability that a fuze will fire; moving order statistics; distribution theory and properties of the...problem posed by an Army Scientist: A fuze will fire when at least n-1 (or n-2) of n detonators function within time span t. What is the probability of
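The fuze question above is a standard binomial tail computation. As a hedged illustration (the detonator reliability p and the numbers below are invented for the example, not taken from the report), assuming each detonator functions independently with probability p within the time span:

```python
from math import comb

def p_fire(n: int, k_required: int, p: float) -> float:
    """Probability that at least k_required of n independent detonators
    each function with probability p: the binomial upper tail."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_required, n + 1))

# e.g. 4 detonators, each 0.9 reliable, fuze needs at least n-1 = 3 of them
print(round(p_fire(4, 3, 0.9), 4))  # → 0.9477
```

The same upper-tail sum answers the n-2 variant by lowering k_required.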
NASA Astrophysics Data System (ADS)
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research
NASA Technical Reports Server (NTRS)
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Pekney, Natalie J.; Cheng, Hanqi; Small, Mitchell J.
2015-11-05
Abstract: The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality.
NASA Astrophysics Data System (ADS)
Grégoire, G.
2016-05-01
This chapter is devoted to two objectives. The first one is to answer the request expressed by attendees of the first Astrostatistics School (Annecy, October 2013) to be provided with an elementary vademecum of statistics that would facilitate understanding of the given courses. In this spirit we recall very basic notions, that is, definitions and properties that we think sufficient to benefit from courses given in the Astrostatistical School. Thus we give briefly definitions and elementary properties on random variables and vectors, distributions, estimation and tests, maximum likelihood methodology. We intend to present basic ideas in a hopefully comprehensible way. We do not try to give a rigorous presentation, and due to the space devoted to this chapter, we can cover only a rather limited field of statistics. The second aim is to focus on some statistical tools that are useful in classification: basic introduction to Bayesian statistics, maximum likelihood methodology, Gaussian vectors and Gaussian mixture models.
NASA Astrophysics Data System (ADS)
Paine, Gregory Harold
1982-03-01
The primary objective of the thesis is to explore the dynamical properties of small nerve networks by means of the methods of statistical mechanics. To this end, a general formalism is developed and applied to elementary groupings of model neurons which are driven by either constant (steady state) or nonconstant (nonsteady state) forces. Neuronal models described by a system of coupled, nonlinear, first-order, ordinary differential equations are considered. A linearized form of the neuronal equations is studied in detail. A Lagrange function corresponding to the linear neural network is constructed which, through a Legendre transformation, provides a constant of motion. By invoking the Maximum-Entropy Principle with the single integral of motion as a constraint, a probability distribution function for the network in a steady state can be obtained. The formalism is implemented for some simple networks driven by a constant force; accordingly, the analysis focuses on a study of fluctuations about the steady state. In particular, a network composed of N noninteracting neurons, termed Free Thinkers, is considered in detail, with a view to interpretation and numerical estimation of the Lagrange multiplier corresponding to the constant of motion. As an archetypical example of a net of interacting neurons, the classical neural oscillator, consisting of two mutually inhibitory neurons, is investigated. It is further shown that in the case of a network driven by a nonconstant force, the Maximum-Entropy Principle can be applied to determine a probability distribution functional describing the network in a nonsteady state. The above examples are reconsidered with nonconstant driving forces which produce small deviations from the steady state. Numerical studies are performed on simplified models of two physical systems: the starfish central nervous system and the mammalian olfactory bulb. Discussions are given as to how statistical neurodynamics can be used to gain a better
ERIC Educational Resources Information Center
Bedian, Vahe
The material looks at: 1) Choosing a Clinical Test; 2) Evaluation of Diagnostic Tests; 3) Multi-Disease and Multi-Test Analysis; and 4) Medical Decision Analysis. It is felt users will be able to: 1) calculate the predictive value of a positive or negative test result in model clinical situations; 2) estimate the sensitivity and specificity…
Tellinghuisen, Joel
2008-01-01
The method of least squares is probably the most powerful data analysis tool available to scientists. Toward a fuller appreciation of that power, this work begins with an elementary review of statistics fundamentals, and then progressively increases in sophistication as the coverage is extended to the theory and practice of linear and nonlinear least squares. The results are illustrated in application to data analysis problems important in the life sciences. The review of fundamentals includes the role of sampling and its connection to probability distributions, the Central Limit Theorem, and the importance of finite variance. Linear least squares are presented using matrix notation, and the significance of the key probability distributions-Gaussian, chi-square, and t-is illustrated with Monte Carlo calculations. The meaning of correlation is discussed, including its role in the propagation of error. When the data themselves are correlated, special methods are needed for the fitting, as they are also when fitting with constraints. Nonlinear fitting gives rise to nonnormal parameter distributions, but the 10% Rule of Thumb suggests that such problems will be insignificant when the parameter is sufficiently well determined. Illustrations include calibration with linear and nonlinear response functions, the dangers inherent in fitting inverted data (e.g., Lineweaver-Burk equation), an analysis of the reliability of the van't Hoff analysis, the problem of correlated data in the Guggenheim method, and the optimization of isothermal titration calorimetry procedures using the variance-covariance matrix for experiment design. The work concludes with illustrations on assessing and presenting results.
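The matrix formulation of linear least squares mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the author's own code: the data points are invented, and NumPy's lstsq is assumed as the solver.

```python
import numpy as np

# Fit a straight line y = a + b*x by minimizing ||X beta - y||^2
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([np.ones_like(x), x])  # design matrix: columns [1, x]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = beta

# Variance-covariance matrix of the estimates, the object used for
# propagation of error and for experiment design in the abstract
resid = y - X @ beta
s2 = resid @ resid / (len(y) - X.shape[1])
cov = s2 * np.linalg.inv(X.T @ X)
print(round(a, 3), round(b, 3))  # → 1.09 1.94
```

The diagonal of cov gives the parameter variances; off-diagonal terms carry the correlation discussed in the abstract.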
Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong
2013-01-01
For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods.
Seneta, Eugene; Seif, Fritz J; Liebermeister, Hermann; Dietz, Klaus
2004-11-01
In the second half of the nineteenth century, when the typical course of various febrile clinical phenomena was found to be specific to particular infectious diseases, Carl Liebermeister successfully pioneered the investigation of the pathophysiology of fever and the regulation of body temperature. He applied biophysical and pharmacological antipyresis, especially for the treatment of typhoid fever, and developed new statistical tools for the evaluation of therapeutic results.
ERIC Educational Resources Information Center
Longford, Nicholas T.
A case is presented for adjusting the scores for free response items in the Advanced Placement (AP) tests. Using information about the rating process from the reliability studies, administrations of the AP test for three subject areas, psychology, computer science, and English language and composition, are analyzed. In the reliability studies, 299…
Cooley, Laura A.; Oster, Alexandra M.; Rose, Charles E.; Wejnert, Cyprian; Le, Binh C.; Paz-Bailey, Gabriela
2014-01-01
In 2011, 62% of estimated new HIV diagnoses in the United States were attributed to male-to-male sexual contact (men who have sex with men, MSM); 39% of these MSM were black or African American. HIV testing, recommended at least annually by CDC for sexually active MSM, is an essential first step in HIV care and treatment for HIV-positive individuals. A variety of HIV testing initiatives, designed to reach populations disproportionately affected by HIV, have been developed at both national and local levels. We assessed changes in HIV testing behavior among MSM participating in the National HIV Behavioral Surveillance System in 2008 and 2011. We compared the percentages tested in the previous 12 months in 2008 and 2011, overall and by race/ethnicity and age group. In unadjusted analyses, recent HIV testing increased from 63% in 2008 to 67% in 2011 overall (P<0.001), from 63% to 71% among black MSM (P<0.001), and from 63% to 75% among MSM of other/multiple races (P<0.001); testing did not increase significantly for white or Hispanic/Latino MSM. Multivariable model results indicated an overall increase in recent HIV testing (adjusted prevalence ratio [aPR] = 1.07, P<0.001). Increases were largest for black MSM (aPR = 1.12, P<0.001) and MSM of other/multiple races (aPR = 1.20, P<0.001). Among MSM aged 18–19 years, recent HIV testing was shown to increase significantly among black MSM (aPR = 1.20, P = 0.007), but not among MSM of other racial/ethnic groups. Increases in recent HIV testing among populations most affected by HIV are encouraging, but despite these increases, improved testing coverage is needed to meet CDC recommendations. PMID:25180514
Guo, Junfeng; Wang, Chao; Chan, Kung-Sik; Jin, Dakai; Saha, Punam K.; Sieren, Jered P.; Barr, R. G.; Han, MeiLan K.; Kazerooni, Ella; Cooper, Christopher B.; Couper, David; Hoffman, Eric A.
2016-01-01
Purpose: A test object (phantom) is an important tool to evaluate comparability and stability of CT scanners used in multicenter and longitudinal studies. However, there are many sources of error that can interfere with the test object-derived quantitative measurements. Here the authors investigated three major possible sources of operator error in the use of a test object employed to assess pulmonary density-related as well as airway-related metrics. Methods: Two kinds of experiments were carried out to assess measurement variability caused by imperfect scanning status. The first one consisted of three experiments. A COPDGene test object was scanned using a dual source multidetector computed tomographic scanner (Siemens Somatom Flash) with the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) inspiration protocol (120 kV, 110 mAs, pitch = 1, slice thickness = 0.75 mm, slice spacing = 0.5 mm) to evaluate the effects of tilt angle, water bottle offset, and air bubble size. After analysis of these results, a guideline was reached in order to achieve more reliable results for this test object. Next the authors applied the above findings to 2272 test object scans collected over 4 years as part of the SPIROMICS study. The authors compared changes of the data consistency before and after excluding the scans that failed to pass the guideline. Results: This study established the following limits for the test object: tilt index ≤0.3, water bottle offset limits of [−6.6 mm, 7.4 mm], and no air bubble within the water bottle, where tilt index is a measure incorporating two tilt angles around x- and y-axis. With 95% confidence, the density measurement variation for all five interested materials in the test object (acrylic, water, lung, inside air, and outside air) resulting from all three error sources can be limited to ±0.9 HU (summed in quadrature), when all the requirements are satisfied. The authors applied these criteria to 2272 SPIROMICS
ERIC Educational Resources Information Center
Neman, Ronald S.; And Others
The study represents an extension of previous research involving the development of scales for the five-card, orally administered, and tape-recorded version of the Thematic Apperception Test(TAT). Scale development is documented and national norms are presented based on a national probability sample of 1,398 youths administered the Cycle III test…
Nakatsuka, Hiroki; Matsubara, Ichirou; Ohtani, Haruhiko
2003-04-01
The aim of this single photon emission computed tomography (SPECT) study was to determine the abnormality of the regional cerebral blood flow (rCBF) using a three-dimensional stereotactic surface projection (3D-SSP) in 18 patients who were referred to the hospital because of forgetfulness. Two intergroup comparisons by 3D-SSP analysis were conducted based on MRI, kana pick-out test and Mini-Mental State Examination (MMSE) results. Of the MRI findings, in the brain atrophy group, rCBF was decreased in the posterior cingulate gyrus, medial temporal structure and parieto-temporal association cortex; these rCBF-decreased areas are similar to the Alzheimer disease pattern. In the group where the MMSE was normal but the kana pick-out test was abnormal, rCBF was decreased in the posterior cingulate gyrus and cinguloparietal transitional area. In the group where both the MMSE and kana pick-out test were abnormal, rCBF was decreased in the parieto-temporal association cortex, temporal cortex and medial temporal structure. These results suggest that 3D-SSP analysis of the SPECT with MMSE and the kana pick-out test provides the possibility of early diagnosis of the initial stage of Alzheimer's disease.
ERIC Educational Resources Information Center
SAW, J.G.
This volume deals with the bivariate normal distribution. The author makes a distinction between distribution and density, from which he develops the consequences of this distinction for hypothesis testing. Other entries in this series are ED 003 044 and ED 003 045. (JK)
ERIC Educational Resources Information Center
Lee, Y-W.
2004-01-01
The purpose of the study reported in this article is to empirically examine passage-related local item dependence (LID) by using an IRT (item response theory) based LID index called Q3 in an EFL reading comprehension test, with a special focus on item types as a potentially competing source of LID with passages. In this article, definitions and…
Statistical assessment of Monte Carlo distributional tallies
Kiedrowski, Brian C; Solomon, Clell J
2010-12-09
Four tests are developed to assess the statistical reliability of distributional or mesh tallies. To this end, the relative variance density function is developed and its moments are studied using simplified, non-transport models. The statistical tests are performed upon the results of MCNP calculations of three different transport test problems and appear to show that the tests are appropriate indicators of global statistical quality.
Statistical analysis principles for Omics data.
Dunkler, Daniela; Sánchez-Cabo, Fátima; Heinze, Georg
2011-01-01
In Omics experiments, typically thousands of hypotheses are tested simultaneously, each based on very few independent replicates. Traditional tests like the t-test were shown to perform poorly with this new type of data. Furthermore, simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for this multiple testing situation. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address this issue. The presented test statistics are subjected to a comparative analysis of a microarray experiment comparing tissue samples of two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying, commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
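To make the multiple-testing discussion concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used false-discovery-rate adjustment (the abstract does not single it out by name, and the p-values below are invented for illustration):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Reject/accept flag per hypothesis under the Benjamini-Hochberg
    step-up FDR procedure: sort p-values, find the largest rank k with
    p_(k) <= k*alpha/m, and reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
# → [True, True, False, False, False]
```

Test statistics such as the moderated t or SAM feed their p-values into procedures of exactly this kind.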
Statistics Poker: Reinforcing Basic Statistical Concepts
ERIC Educational Resources Information Center
Leech, Nancy L.
2008-01-01
Learning basic statistical concepts does not need to be tedious or dry; it can be fun and interesting through cooperative learning in the small-group activity of Statistics Poker. This article describes a teaching approach for reinforcing basic statistical concepts that can help students who have high anxiety and makes learning and reinforcing…
Predict! Teaching Statistics Using Informational Statistical Inference
ERIC Educational Resources Information Center
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Hasegawa, Takahiro
2016-09-01
In recent years, immunological science has evolved, and cancer vaccines are now approved and available for treating existing cancers. Because cancer vaccines require time to elicit an immune response, a delayed treatment effect is expected and is actually observed in drug approval studies. Accordingly, we propose the evaluation of survival endpoints by weighted log-rank tests with the Fleming-Harrington class of weights. We consider group sequential monitoring, which allows early efficacy stopping, and determine a semiparametric information fraction for the Fleming-Harrington family of weights, which is necessary for the error spending function. Moreover, we give a flexible survival model in cancer vaccine studies that considers not only the delayed treatment effect but also the long-term survivors. In a Monte Carlo simulation study, we illustrate that when the primary analysis is a weighted log-rank test emphasizing the late differences, the proposed information fraction can be a useful alternative to the surrogate information fraction, which is proportional to the number of events. Copyright © 2016 John Wiley & Sons, Ltd.
SOCR: Statistics Online Computational Resource
Dinov, Ivo D.
2011-01-01
The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741
Ranald Macdonald and statistical inference.
Smith, Philip T
2009-05-01
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing.
Statistical Methods in Cosmology
NASA Astrophysics Data System (ADS)
Verde, L.
2010-03-01
The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data where a host of useful information is enclosed, but is encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets both for testing for possible disagreements (which could indicate new physics) and for improving parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics as there are several good books which are reported in the references. The reader should refer to those.
Schultheiss, Oliver C; Yankova, Diana; Dirlikov, Benjamin; Schad, Daniel J
2009-01-01
Previous studies that have examined the relationship between implicit and explicit motive measures have consistently found little variance overlap between both types of measures regardless of thematic content domain (i.e., power, achievement, affiliation). However, this independence may be artifactual because the primary means of measuring implicit motives--content-coding stories people write about picture cues--are incommensurable with the primary means of measuring explicit motives: having individuals fill out self-report scales. To provide a better test of the presumed independence between both types of measures, we measured implicit motives with a Picture Story Exercise (PSE; McClelland, Koestner, & Weinberger, 1989) and explicit motives with a cue- and response-matched questionnaire version of the PSE (PSE-Q) and a traditional measure of explicit motives, the Personality Research Form (PRF; Jackson, 1984) in 190 research participants. Correlations between the PSE and the PSE-Q were small and mostly nonsignificant, whereas the PSE-Q showed significant variance overlap with the PRF within and across thematic domains. We conclude that the independence postulate holds even when more commensurable measures of implicit and explicit motives are used.
Neuroendocrine Tumor: Statistics
Adrenal Gland Tumors: Statistics
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Keywords: statistical analysis, probability, reports, information theory, differential equations, statistical processes, stochastic processes, multivariate analysis, distribution theory, decision theory, measure theory, optimization
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation are discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.
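Two-dimensional orientation data of the kind Batschelet describes are summarized by the mean direction and the resultant length r, the quantities underlying tests such as the Rayleigh test. A minimal sketch, with invented compass bearings:

```python
from math import atan2, cos, sin, radians, degrees, hypot

def mean_direction(angles_deg):
    """Circular mean direction (degrees, in [0, 360)) and resultant
    length r in [0, 1]; r near 1 indicates tightly clustered headings."""
    xs = [cos(radians(a)) for a in angles_deg]
    ys = [sin(radians(a)) for a in angles_deg]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return degrees(atan2(my, mx)) % 360, hypot(mx, my)

# Headings clustered around north (0°); a naive arithmetic mean of
# 350, 10, 5, 355 would wrongly give 180.
mean, r = mean_direction([350, 10, 5, 355])
print(round(r, 3))  # → 0.991
```

The resultant length r is exactly the statistic whose significance the Rayleigh test assesses.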
[Bayesian statistics: an approach suited to the clinic].
Meyer, N; Vinzio, S; Goichot, B
2009-03-01
Bayesian statistics has enjoyed growing, though still limited, success. This is surprising, since Bayes' theorem, on which the paradigm relies, is frequently used by clinicians: there is a direct link between routine diagnostic testing and Bayesian statistics. That link is Bayes' theorem itself, which allows one to compute the positive and negative predictive values of a test. The principle of the theorem is extended to simple statistical situations as an introduction to Bayesian statistics. The conceptual simplicity of Bayesian statistics should make for greater acceptance in the biomedical world.
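The link between the routine diagnostic test and Bayes' theorem can be made concrete. A minimal sketch (the function name and example numbers are illustrative, not from the article) computing predictive values from sensitivity, specificity and prevalence:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem applied to a diagnostic test: posterior probability
    of disease given a positive result (PPV) or of health given a
    negative result (NPV)."""
    tp = sensitivity * prevalence            # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)      # true negatives
    fn = (1 - sensitivity) * prevalence      # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# A 90%-sensitive, 90%-specific test at 10% prevalence:
ppv, npv = predictive_values(0.90, 0.90, 0.10)  # ppv = 0.5 exactly
```

The striking drop of the PPV to 50% at low prevalence is exactly the kind of prior-dependent reasoning the abstract argues clinicians already perform.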
Statistical Reference Datasets
National Institute of Standards and Technology Data Gateway
Statistical Reference Datasets (Web, free access) The Statistical Reference Datasets is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Instructional Theory for Teaching Statistics.
ERIC Educational Resources Information Center
Atwood, Jan R.; Dinham, Sarah M.
Metatheoretical analysis of Ausubel's Theory of Meaningful Verbal Learning and Gagne's Theory of Instruction using the Dickoff and James paradigm produced two instructional systems for basic statistics. The systems were tested with a pretest-posttest control group design utilizing students enrolled in an introductory-level graduate statistics…
Mathematical and statistical analysis
NASA Technical Reports Server (NTRS)
Houston, A. Glen
1988-01-01
The goal of the mathematical and statistical analysis component of RICIS is to research, develop, and evaluate mathematical and statistical techniques for aerospace technology applications. Specific research areas of interest include modeling, simulation, experiment design, reliability assessment, and numerical analysis.
Experiment in Elementary Statistics
ERIC Educational Resources Information Center
Fernando, P. C. B.
1976-01-01
Presents an undergraduate laboratory exercise in elementary statistics in which students verify empirically the various aspects of the Gaussian distribution. Sampling techniques and other commonly used statistical procedures are introduced. (CP)
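A laboratory exercise of the kind described can be reproduced in a few lines of simulation. This sketch (assuming the 68-95-99.7 rule as the property being verified; it is not the original exercise) draws Gaussian samples and checks the empirical coverage:

```python
import random
import statistics

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

# Empirical check of the 68-95-99.7 rule for a Gaussian distribution
within_1sd = sum(abs(x - mean) <= sd for x in sample) / len(sample)
within_2sd = sum(abs(x - mean) <= 2 * sd for x in sample) / len(sample)
# within_1sd is close to 0.6827, within_2sd close to 0.9545
```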
On More Sensitive Periodogram Statistics
NASA Astrophysics Data System (ADS)
Bélanger, G.
2016-05-01
Period searches in event data have traditionally used the Rayleigh statistic, R^2. For X-ray pulsars the standard has been the Z^2 statistic, which sums over more than one harmonic. For γ-rays the H-test, which optimizes the number of harmonics to sum, is often used. These periodograms all suffer from the same problem: artifacts caused by correlations in the Fourier components that arise from testing frequencies with a non-integer number of cycles. This article addresses that problem. The modified Rayleigh statistic is discussed, its generalization to any harmonic, R_k^2, is formulated, and from the latter the modified Z^2 statistic is constructed. Versions of these statistics for binned data and point measurements are derived, and it is shown that the variance in the uncertainties can have an important influence on the periodogram. It is shown how to combine the information about the signal frequency from the different harmonics to estimate its value with maximum accuracy. The methods are applied to an XMM-Newton observation of the Crab pulsar, for which a decomposition of the pulse profile is presented, showing that most of the power is in the second, third, and fifth harmonics. The statistical detection power of the R_k^2 statistic is superior to the FFT and equivalent to the Lomb-Scargle (LS) periodogram. Response to gaps in the data is assessed, and it is shown that the LS does not protect against the distortions they cause. The main conclusion of this work is that the classical R^2 and Z^2 should be replaced by R_k^2 and the modified Z^2 in all applications with event data, and the LS should be replaced by R_k^2 when the uncertainty varies from one point measurement to another.
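For readers unfamiliar with the classical Rayleigh statistic that the article takes as its starting point, here is a minimal sketch (variable names are ours; the modified statistics developed in the paper are not implemented here). Large values at a trial frequency indicate non-uniform event phases, i.e. a candidate period:

```python
import math
import random

def rayleigh_power(times, nu):
    """Classical Rayleigh statistic R^2 at trial frequency nu for a list of
    event times. Under the no-signal hypothesis of uniform phases it is
    asymptotically chi-squared distributed with 2 degrees of freedom."""
    phases = [2 * math.pi * nu * t for t in times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return 2.0 * (c * c + s * s) / len(phases)

# Simulated pulsar-like events clustered at one phase of a 1 Hz signal
random.seed(0)
events = [k + random.gauss(0.0, 0.05) for k in range(200)]
strong = rayleigh_power(events, 1.0)   # large at the true frequency
weak = rayleigh_power(events, 0.37)    # near the chi^2_2 noise level off-frequency
```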
ERIC Educational Resources Information Center
Lenard, Christopher; McCarthy, Sally; Mills, Terence
2014-01-01
There are many different aspects of statistics. Statistics involves mathematics, computing, and applications to almost every field of endeavour. Each aspect provides an opportunity to spark someone's interest in the subject. In this paper we discuss some ethical aspects of statistics, and describe how an introduction to ethics has been…
Teaching Statistics Using SAS.
ERIC Educational Resources Information Center
Mandeville, Garrett K.
The Statistical Analysis System (SAS) is presented as the single most appropriate statistical package to use as an aid in teaching statistics. A brief review of literature in which SAS is compared to SPSS, BMDP, and other packages is followed by six examples which demonstrate features unique to SAS which have pedagogical utility. Of particular…
Minnesota Health Statistics 1988.
ERIC Educational Resources Information Center
Minnesota State Dept. of Health, St. Paul.
This document comprises the 1988 annual statistical report of the Minnesota Center for Health Statistics. After introductory technical notes on changes in format, sources of data, and geographic allocation of vital events, an overview is provided of vital health statistics in all areas. Thereafter, separate sections of the report provide tables…
Statistical inference for inverse problems
NASA Astrophysics Data System (ADS)
Bissantz, Nicolai; Holzmann, Hajo
2008-06-01
In this paper we study statistical inference for certain inverse problems. We go beyond mere estimation purposes and review and develop the construction of confidence intervals and confidence bands in some inverse problems, including deconvolution and the backward heat equation. Further, we discuss the construction of certain hypothesis tests, in particular concerning the number of local maxima of the unknown function. The methods are illustrated in a case study, where we analyze the distribution of heliocentric escape velocities of galaxies in the Centaurus galaxy cluster, and provide statistical evidence for its bimodality.
Which statistics should tropical biologists learn?
Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián
2011-09-01
Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions the need for efficient, good-quality research is more pressing than ever. However, the statistical component of research published by tropical authors sometimes suffers from poor data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used over one year in two leading tropical journals, the Revista de Biología Tropical and Biotropica. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation, and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well designed one-semester course should be enough for their basic requirements.
Equivalent statistics and data interpretation.
Francis, Gregory
2016-10-14
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
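The equivalence claimed for the two-sample t test can be illustrated with the standard mapping between the t statistic and Cohen's d when the group sizes are known. This sketch is a textbook identity for the pooled two-sample case, not code from the paper:

```python
import math

def t_to_d(t, n1, n2):
    """Two-sample pooled t statistic -> Cohen's d. With known sample
    sizes the two numbers carry exactly the same information."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_to_t(d, n1, n2):
    """Inverse mapping: Cohen's d -> t statistic."""
    return d / math.sqrt(1 / n1 + 1 / n2)

t = 2.5
d = t_to_d(t, 40, 50)                       # ~0.530
assert math.isclose(d_to_t(d, 40, 50), t)   # lossless round trip
```

Because the map is invertible, reporting t (or its p value) versus d is a choice of interpretation, not of information, which is the paper's first point.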
Statistical Mechanics of Zooplankton.
Hinow, Peter; Nihongi, Ai; Strickler, J Rudi
2015-01-01
Statistical mechanics provides the link between microscopic properties of many-particle systems and macroscopic properties such as pressure and temperature. Observations of similar "microscopic" quantities exist for the motion of zooplankton, as well as many species of other social animals. Herein, we propose to take average squared velocities as the definition of the "ecological temperature" of a population under different conditions on nutrients, light, oxygen and others. We test the usefulness of this definition on observations of the crustacean zooplankton Daphnia pulicaria. In one set of experiments, D. pulicaria is infested with the pathogen Vibrio cholerae, the causative agent of cholera. We find that infested D. pulicaria under light exposure have a significantly greater ecological temperature, which puts them at a greater risk of detection by visual predators. In a second set of experiments, we observe D. pulicaria in cold and warm water, and in darkness and under light exposure. Overall, our ecological temperature is a good discriminator of the crustacean's swimming behavior. PMID:26270537
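The proposed "ecological temperature" is an average squared velocity, which is straightforward to compute from tracking data. A hypothetical sketch for 2D position tracks sampled at a fixed interval (the authors' actual processing pipeline is not described in the abstract):

```python
def ecological_temperature(track, dt):
    """Mean squared swimming speed of a tracked animal: the paper's
    proposed 'ecological temperature'. track is a list of (x, y)
    positions sampled every dt time units."""
    speeds_sq = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        speeds_sq.append(vx * vx + vy * vy)
    return sum(speeds_sq) / len(speeds_sq)

# A straight swim at 2 mm/s has ecological temperature 4 (mm/s)^2
track = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0)]
temp = ecological_temperature(track, dt=0.1)  # -> 4.0
```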
Thorn, Joanna C; Turner, Emma L; Hounsome, Luke; Walsh, Eleanor; Down, Liz; Verne, Julia; Donovan, Jenny L; Neal, David E; Hamdy, Freddie C; Martin, Richard M; Noble, Sian M
2016-01-01
Objectives: To evaluate the accuracy of routine data for costing inpatient resource use in a large clinical trial and to investigate costing methodologies. Design: Final-year inpatient cost profiles were derived using (1) data extracted from medical records mapped to National Health Service (NHS) reference costs via service codes and (2) Hospital Episode Statistics (HES) data using NHS reference costs. Trust finance departments were consulted to obtain costs for comparison purposes. Setting: 7 UK secondary care centres. Population: A subsample of 292 men identified as having died at least a year after being diagnosed with prostate cancer in the Cluster randomised triAl of PSA testing for Prostate cancer (CAP), a long-running trial evaluating the effectiveness and cost-effectiveness of prostate-specific antigen (PSA) testing. Results: Both inpatient cost profiles showed a rise in costs in the months leading up to death, and were broadly similar. The difference in mean inpatient costs was £899, with HES data yielding ∼8% lower costs than medical record data (a difference compatible with chance, p=0.3). Events were missing from both data sets: 11 men (3.8%) had events identified in HES that were all missing from medical record review, while 7 men (2.4%) had events identified in medical record review that were all missing from HES. The response from finance departments to requests for cost data was poor: only 3 of 7 departments returned adequate data sets within 6 months. Conclusions: Using HES routine data coupled with NHS reference costs resulted in mean annual inpatient costs very similar to those derived via medical record review; therefore, routinely available data can be used as the primary method of costing resource use in large clinical trials. Neither HES nor medical record review represents a gold standard of data collection. Requesting cost data from finance departments is impractical for large clinical trials. Trial registration number: ISRCTN92187251.
Statistical distribution sampling
NASA Technical Reports Server (NTRS)
Johnson, E. S.
1975-01-01
Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.
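Determining the distribution of a statistic by sampling, the subject of this report, can be sketched as a small Monte Carlo routine (names and defaults here are illustrative, not from the report). The example checks the classical result that the sample mean of n draws has standard deviation σ/√n:

```python
import random
import statistics

def sampling_distribution(stat, draw, n, reps=20_000, seed=1):
    """Approximate the distribution of a statistic by repeated sampling.
    draw(rng) yields one observation; stat maps an n-sample to a number."""
    rng = random.Random(seed)
    return [stat([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Sample mean of n=25 uniform(0,1) draws: its spread should be
# close to sigma/sqrt(n) = sqrt(1/12)/5, about 0.0577.
means = sampling_distribution(statistics.fmean, lambda r: r.random(), n=25)
spread = statistics.stdev(means)
```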
ERIC Educational Resources Information Center
Ciftci, S. Koza; Karadag, Engin; Akdal, Pinar
2014-01-01
The purpose of this study was to determine the effect of statistics instruction using computer-based tools, on statistics anxiety, attitude, and achievement. This study was designed as quasi-experimental research and the pattern used was a matched pre-test/post-test with control group design. Data was collected using three scales: a Statistics…
Statistical Power in Meta-Analysis
ERIC Educational Resources Information Center
Liu, Jin
2015-01-01
Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…
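The analytical power the abstract refers to, for a two-sample mean difference test, has a standard closed form under a normal approximation. This sketch is that textbook approximation, not the simulation design of the study:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample mean test at
    standardized effect size d with equal group sizes, using the
    normal approximation to the noncentral t distribution."""
    phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, e.g. 1.96
    ncp = d * sqrt(n_per_group / 2)           # noncentrality parameter
    return (1 - phi(z - ncp)) + phi(-z - ncp)

power = power_two_sample(d=0.5, n_per_group=64)   # ~0.81
```

Comparing this analytical value with power estimated from simulated data is precisely the discrepancy the study examines.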
Statistical Power Analysis of Rehabilitation Counseling Research.
ERIC Educational Resources Information Center
Kosciulek, John F.; Szymanski, Edna Mora
1993-01-01
Provided initial assessment of the statistical power of rehabilitation counseling research published in selected rehabilitation journals. From 5 relevant journals, found 32 articles that contained statistical tests that could be power analyzed. Findings indicated that rehabilitation counselor researchers had little chance of finding small…
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 33 2011-07-01 2011-07-01 false Statistics. 1065.602 Section 1065.602 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS ENGINE-TESTING PROCEDURES Calculations and Data Requirements § 1065.602 Statistics. (a) Overview. This section...
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 32 2010-07-01 2010-07-01 false Statistics. 1065.602 Section 1065.602 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS ENGINE-TESTING PROCEDURES Calculations and Data Requirements § 1065.602 Statistics. (a) Overview. This section...
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 34 2013-07-01 2013-07-01 false Statistics. 1065.602 Section 1065.602 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS ENGINE-TESTING PROCEDURES Calculations and Data Requirements § 1065.602 Statistics. (a) Overview. This section...
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 34 2012-07-01 2012-07-01 false Statistics. 1065.602 Section 1065.602 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS ENGINE-TESTING PROCEDURES Calculations and Data Requirements § 1065.602 Statistics. (a) Overview. This section...
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 33 2014-07-01 2014-07-01 false Statistics. 1065.602 Section 1065.602 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR POLLUTION CONTROLS ENGINE-TESTING PROCEDURES Calculations and Data Requirements § 1065.602 Statistics. (a) Overview. This section...
Microcomputer Support in Basic Statistics Instruction.
ERIC Educational Resources Information Center
Schafer, William D; Johnson, Charles E.
This paper presents examples of effective uses of microcomputers to support basic statistics instruction. All programs are written in Applesoft BASIC for Apple II Plus microcomputers and compatible equipment. They have been field tested in statistics courses at the University of Maryland. Microcomputers can be used with color monitors for…
Atmospheric statistics for aerospace vehicle operations
NASA Technical Reports Server (NTRS)
Smith, O. E.; Batts, G. W.
1993-01-01
Statistical analysis of atmospheric variables was performed for the Shuttle Transportation System (STS) design trade studies and the establishment of launch commit criteria. Atmospheric constraint statistics have been developed for the NASP test flight, the Advanced Launch System, and the National Launch System. The concepts and analysis techniques discussed in the paper are applicable to the design and operations of any future aerospace vehicle.
Application of Statistics in Engineering Technology Programs
ERIC Educational Resources Information Center
Zhan, Wei; Fink, Rainer; Fang, Alex
2010-01-01
Statistics is a critical tool for robustness analysis, measurement system error analysis, test data analysis, probabilistic risk assessment, and many other fields in the engineering world. Traditionally, however, statistics is not extensively used in undergraduate engineering technology (ET) programs, resulting in a major disconnect from industry…
Teaching Statistics without Sadistics.
ERIC Educational Resources Information Center
Forte, James A.
1995-01-01
Five steps designed to take anxiety out of statistics for social work students are outlined. First, statistics anxiety is identified as an educational problem. Second, instructional objectives and procedures to achieve them are presented and methods and tools for evaluating the course are explored. Strategies for, and obstacles to, making…
STATSIM: Exercises in Statistics.
ERIC Educational Resources Information Center
Thomas, David B.; And Others
A computer-based learning simulation was developed at Florida State University which allows for high interactive responding via a time-sharing terminal for the purpose of demonstrating descriptive and inferential statistics. The statistical simulation (STATSIM) is comprised of four modules--chi square, t, z, and F distribution--and elucidates the…
Understanding Undergraduate Statistical Anxiety
ERIC Educational Resources Information Center
McKim, Courtney
2014-01-01
The purpose of this study was to understand undergraduate students' views of statistics. Results reveal that students with less anxiety have a higher interest in statistics and also believe in their ability to perform well in the course. Also students who have a more positive attitude about the class tend to have a higher belief in their…
ERIC Educational Resources Information Center
Hodgson, Ted; Andersen, Lyle; Robison-Cox, Jim; Jones, Clain
2004-01-01
Water quality experiments, especially the use of macroinvertebrates as indicators of water quality, offer an ideal context for connecting statistics and science. In the STAR program for secondary students and teachers, water quality experiments were also used as a context for teaching statistics. In this article, we trace one activity that uses…
Towards Statistically Undetectable Steganography
2011-06-30
Contract number: FA9550-08-1-0084. Author(s): Prof. Jessica... Distribution: approved for public release; distribution is unlimited. Abstract: Fundamental asymptotic laws for imperfect steganography ...formats. Subject terms: steganography, covert communication, statistical detectability, asymptotic performance, secure payload, minimum...