[Dilemma of the null hypothesis in experimental tests of ecological hypotheses].
Li, Ji
2016-06-01
Experimental testing is one of the major methods for testing ecological hypotheses, though it remains contentious because of the role of the null hypothesis. Quinn and Dunham (1983) analyzed the hypothesis-deduction model of Platt (1964) and concluded that there is no null hypothesis in ecology that can be strictly tested by experiments. Fisher's falsificationism and the non-decisivity of the Neyman-Pearson (N-P) framework prevent the statistical null hypothesis from being strictly tested. Moreover, because the null hypothesis H0 (α=1, β=0) and the alternative hypothesis H1′ (α′=1, β′=0) in ecological processes differ from those in classical physics, the ecological null hypothesis likewise cannot be strictly tested experimentally. These dilemmas of the null hypothesis can be relieved by reducing the P value, selecting the null hypothesis carefully, non-centralizing the non-null hypothesis, and using two-tailed tests. However, statistical null hypothesis significance testing (NHST) should not be equated with the logical test of causality in ecological hypotheses. Hence, the findings and conclusions of methodological studies and experimental tests based on NHST are not always logically reliable.
A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.
ERIC Educational Resources Information Center
Liu, Tung; Stone, Courtenay C.
1999-01-01
Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…
Explorations in statistics: hypothesis tests and P values.
Curran-Everett, Douglas
2009-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
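To make the two concepts concrete, here is a minimal Python sketch of a test statistic and its P value for a one-sample t-test. The data, the hypothesized mean of 70, and the variable names are invented for illustration and are not taken from the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical sample: resting heart rates (beats/min) from 12 subjects.
sample = rng.normal(loc=74, scale=8, size=12)

# Null hypothesis: the population mean is 70 beats/min.
mu0 = 70

# Test statistic: how far the observed mean is from mu0, in standard-error units.
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))

# Two-sided P value: the proportion of t-distributed values at least as extreme
# as t_stat if the null hypothesis were true.
p_value = 2 * stats.t.sf(abs(t_stat), df=len(sample) - 1)

print(f"t = {t_stat:.3f}, two-sided P = {p_value:.4f}")
# scipy's built-in test gives the same answer:
print(stats.ttest_1samp(sample, popmean=mu0))
```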
A statistical test to show negligible trend
Philip M. Dixon; Joseph H.K. Pechmann
2005-01-01
The usual statistical tests of trend are inappropriate for demonstrating the absence of trend. This is because failure to reject the null hypothesis of no trend does not prove that null hypothesis. The appropriate statistical method is based on an equivalence test. The null hypothesis is that the trend is not zero, i.e., outside an a priori specified equivalence region...
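A minimal sketch of how an equivalence (two one-sided tests) approach to demonstrating a negligible trend might look in Python. The simulated data, the equivalence region of ±0.5 units per year, and the variable names are assumptions of this example, not the authors' procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical monitoring data: 20 annual counts with essentially no trend.
years = np.arange(20)
counts = 50 + rng.normal(0, 3, size=20)

# A priori equivalence region for the slope (units per year), chosen for illustration.
lower, upper = -0.5, 0.5

res = stats.linregress(years, counts)
df = len(years) - 2

# Two one-sided tests (TOST): the trend is "negligible" only if the slope is
# significantly greater than `lower` AND significantly less than `upper`.
t_lower = (res.slope - lower) / res.stderr
t_upper = (res.slope - upper) / res.stderr
p_lower = stats.t.sf(t_lower, df)    # H0: slope <= lower
p_upper = stats.t.cdf(t_upper, df)   # H0: slope >= upper
p_tost = max(p_lower, p_upper)

print(f"slope = {res.slope:.3f} +/- {res.stderr:.3f}, TOST P = {p_tost:.4f}")
# Reject the null of a non-negligible trend when p_tost < alpha (e.g., 0.05).
```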
Null but not void: considerations for hypothesis testing.
Shaw, Pamela A; Proschan, Michael A
2013-01-30
Standard statistical theory teaches us that once the null and alternative hypotheses have been defined for a parameter, the choice of the statistical test is clear. Standard theory does not teach us how to choose the null or alternative hypothesis appropriate to the scientific question of interest. Neither does it tell us that in some cases, depending on which alternatives are realistic, we may want to define our null hypothesis differently. Problems in statistical practice are frequently not as pristinely summarized as the classic theory in our textbooks. In this article, we present examples in statistical hypothesis testing in which seemingly simple choices are in fact rich with nuance that, when given full consideration, make the choice of the right hypothesis test much less straightforward. Published 2012. This article is a US Government work and is in the public domain in the USA.
Explorations in Statistics: Power
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2010-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fifth installment of "Explorations in Statistics" revisits power, a concept fundamental to the test of a null hypothesis. Power is the probability that we reject the null hypothesis when it is false. Four…
P value and the theory of hypothesis testing: an explanation for new researchers.
Biau, David Jean; Jolles, Brigitte M; Porcher, Raphaël
2010-03-01
In the 1920s, Ronald Fisher developed the theory behind the p value and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed, presuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators will select a threshold p value below which they will reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine some critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different theories that often are misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the difference observed, or one that is more extreme, considering the null is true. Another concern is the risk that an important proportion of statistically significant results are falsely significant. Researchers should have a minimum understanding of these two theories so that they are better able to plan, conduct, interpret, and report scientific experiments.
ERIC Educational Resources Information Center
Marmolejo-Ramos, Fernando; Cousineau, Denis
2017-01-01
The number of articles showing dissatisfaction with the null hypothesis statistical testing (NHST) framework has been progressively increasing over the years. Alternatives to NHST have been proposed and the Bayesian approach seems to have achieved the highest amount of visibility. In this last part of the special issue, a few alternative…
Bayes factor and posterior probability: Complementary statistical evidence to p-value.
Lin, Ruitao; Yin, Guosheng
2015-09-01
As a convention, a p-value is often computed in hypothesis testing and compared with the nominal level of 0.05 to determine whether to reject the null hypothesis. Although the smaller the p-value, the more significant the statistical test, it is difficult to perceive the p-value in a probability scale and quantify it as the strength of the data against the null hypothesis. In contrast, the Bayesian posterior probability of the null hypothesis has an explicit interpretation of how strong the data support the null. We make a comparison of the p-value and the posterior probability by considering a recent clinical trial. The results show that even when we reject the null hypothesis, there is still a substantial probability (around 20%) that the null is true. Not only should we examine whether the data would have rarely occurred under the null hypothesis, but we also need to know whether the data would be rare under the alternative. As a result, the p-value only provides one side of the information, for which the Bayes factor and posterior probability may offer complementary evidence. Copyright © 2015 Elsevier Inc. All rights reserved.
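The contrast drawn above can be illustrated with a toy calculation. The sketch below is not the clinical-trial analysis from the paper: the effect estimate, its standard error, the prior scale tau under the alternative, and the equal prior odds are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical trial summary: observed treatment effect and its standard error.
effect_hat = 0.28      # estimated effect
se = 0.12              # its standard error
p_value = 2 * stats.norm.sf(abs(effect_hat) / se)

# Bayes factor for H0: effect = 0 versus H1: effect ~ Normal(0, tau^2).
tau = 0.3              # prior SD under H1 (an assumption of this sketch)
m0 = stats.norm.pdf(effect_hat, loc=0, scale=se)                       # marginal under H0
m1 = stats.norm.pdf(effect_hat, loc=0, scale=np.sqrt(se**2 + tau**2))  # marginal under H1
bf01 = m0 / m1

# Posterior probability of H0 with equal prior odds.
prior_h0 = 0.5
post_h0 = bf01 * prior_h0 / (bf01 * prior_h0 + (1 - prior_h0))

print(f"p = {p_value:.4f}, BF01 = {bf01:.3f}, P(H0 | data) = {post_h0:.3f}")
# A small p-value can coexist with a non-negligible posterior probability of the null.
```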
ERIC Educational Resources Information Center
Tryon, Warren W.; Lewis, Charles
2008-01-01
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis and failing to reject H0 is not evidence…
ERIC Educational Resources Information Center
Trafimow, David
2017-01-01
There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a…
Hypothesis Testing Using Spatially Dependent Heavy Tailed Multisensor Data
2014-12-01
Report fragments: ...consistent with the null hypothesis of linearity and can be used to estimate the distribution of a test statistic that can discriminate between the null... [Figure caption: test for nonlinearity; the histogram is generated using the surrogate data, and the statistic of the original time series is represented by the solid line.]
ERIC Educational Resources Information Center
LeMire, Steven D.
2010-01-01
This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…
What is too much variation? The null hypothesis in small-area analysis.
Diehr, P; Cain, K; Connell, F; Volinn, E
1990-01-01
A small-area analysis (SAA) in health services research often calculates surgery rates for several small areas, compares the largest rate to the smallest, notes that the difference is large, and attempts to explain this discrepancy as a function of service availability, physician practice styles, or other factors. SAAs are often difficult to interpret because there is little theoretical basis for determining how much variation would be expected under the null hypothesis that all of the small areas have similar underlying surgery rates and that the observed variation is due to chance. We developed a computer program to simulate the distribution of several commonly used descriptive statistics under the null hypothesis, and used it to examine the variability in rates among the counties of the state of Washington. The expected variability when the null hypothesis is true is surprisingly large, and becomes worse for procedures with low incidence, for smaller populations, when there is variability among the populations of the counties, and when readmissions are possible. The characteristics of four descriptive statistics were studied and compared. None was uniformly good, but the chi-square statistic had better performance than the others. When we reanalyzed five journal articles that presented sufficient data, the results were usually statistically significant. Since SAA research today is tending to deal with low-incidence events, smaller populations, and measures where readmissions are possible, more research is needed on the distribution of small-area statistics under the null hypothesis. New standards are proposed for the presentation of SAA results. PMID:2312306
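A minimal Monte Carlo sketch of the idea described above: simulate counts for several areas under a common underlying rate and see how large the extremal rate ratio and the chi-square statistic are by chance alone. The county populations, the rate, and the two descriptive statistics shown are illustrative assumptions, not the authors' program.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small-area populations and a common underlying surgery rate.
pops = np.array([5_000, 12_000, 30_000, 60_000, 150_000, 400_000])
rate = 0.002          # 2 surgeries per 1,000 person-years under the null
n_sim = 10_000

extremal_ratio = np.empty(n_sim)
chi_square = np.empty(n_sim)
for i in range(n_sim):
    counts = rng.poisson(pops * rate)
    rates = counts / pops
    # Ratio of the largest to the smallest observed rate (guarded against zero rates).
    extremal_ratio[i] = rates.max() / max(rates.min(), 1e-12)
    expected = pops * (counts.sum() / pops.sum())
    chi_square[i] = ((counts - expected) ** 2 / expected).sum()

print("Median extremal rate ratio under the null:", np.round(np.median(extremal_ratio), 2))
print("95th percentile of chi-square under the null:", np.round(np.percentile(chi_square, 95), 2))
# Even with identical true rates, chance alone produces sizeable high/low rate ratios,
# especially for low-incidence procedures and small populations.
```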
Testing of Hypothesis in Equivalence and Non Inferiority Trials-A Concept.
Juneja, Atul; Aggarwal, Abha R; Adhikari, Tulsi; Pandey, Arvind
2016-04-01
Establishing the appropriate hypothesis is one of the important steps in carrying out statistical tests and analyses, and understanding it is essential for interpreting the results. The current communication presents the concept of hypothesis testing in non-inferiority and equivalence trials, where the null hypothesis is the reverse of the one set up in conventional superiority trials. As in superiority trials, the null hypothesis is the one the researcher seeks to reject in order to establish what is intended to be proved. It is important to note that equivalence or non-inferiority cannot be proved by accepting a null hypothesis of no difference. Hence, establishing the appropriate statistical hypothesis is extremely important for arriving at meaningful conclusions about the stated research objectives.
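A minimal sketch of how the reversed null hypothesis works in a non-inferiority setting, using a difference in success proportions. The counts, the 10% margin, and the normal-approximation test are assumptions of this example, not taken from the article.

```python
import numpy as np
from scipy import stats

# Hypothetical trial: success counts and sample sizes for new vs. standard treatment.
x_new, n_new = 172, 200
x_std, n_std = 175, 200
margin = 0.10   # non-inferiority margin (an assumption of this sketch)

p_new, p_std = x_new / n_new, x_std / n_std
diff = p_new - p_std
se = np.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)

# Null hypothesis: the new treatment is worse by at least the margin (diff <= -margin).
# Alternative:     the new treatment is non-inferior (diff > -margin).
z = (diff + margin) / se
p_value = stats.norm.sf(z)          # one-sided

print(f"difference = {diff:.3f}, z = {z:.2f}, one-sided P = {p_value:.4f}")
# Rejecting this null establishes non-inferiority; simply failing to reject a
# conventional null of "no difference" would not.
```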
To P or Not to P: Backing Bayesian Statistics.
Buchinsky, Farrel J; Chadha, Neil K
2017-12-01
In biomedical research, it is imperative to differentiate chance variation from truth before we generalize what we see in a sample of subjects to the wider population. For decades, we have relied on null hypothesis significance testing, where we calculate P values for our data to decide whether to reject a null hypothesis. This methodology is subject to substantial misinterpretation and errant conclusions. Instead of working backward by calculating the probability of our data if the null hypothesis were true, Bayesian statistics allow us instead to work forward, calculating the probability of our hypothesis given the available data. This methodology gives us a mathematical means of incorporating our "prior probabilities" from previous study data (if any) to produce new "posterior probabilities." Bayesian statistics tell us how confidently we should believe what we believe. It is time to embrace and encourage their use in our otolaryngology research.
Suggestions for presenting the results of data analyses
Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.
2001-01-01
We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on nontesting aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.
Testing for nonlinearity in time series: The method of surrogate data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, J.; Galdrikian, B.; Longtin, A.
1991-01-01
We describe a statistical approach for identifying nonlinearity in time series; in particular, we want to avoid claims of chaos when simpler models (such as linearly correlated noise) can explain the data. The method requires a careful statement of the null hypothesis which characterizes a candidate linear process, the generation of an ensemble of "surrogate" data sets which are similar to the original time series but consistent with the null hypothesis, and the computation of a discriminating statistic for the original and for each of the surrogate data sets. The idea is to test the original time series against the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets. We present algorithms for generating surrogate data under various null hypotheses, and we show the results of numerical experiments on artificial data using correlation dimension, Lyapunov exponent, and forecasting error as discriminating statistics. Finally, we consider a number of experimental time series -- including sunspots, electroencephalogram (EEG) signals, and fluid convection -- and evaluate the statistical significance of the evidence for nonlinear structure in each case. 56 refs., 8 figs.
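A minimal sketch of the surrogate-data recipe, assuming the common phase-randomization construction for the null hypothesis of a linearly correlated Gaussian process and a simple time-reversal-asymmetry statistic; the example series, the number of surrogates, and the choice of discriminating statistic are illustrative, not the authors' exact algorithms.

```python
import numpy as np

rng = np.random.default_rng(3)

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomized Fourier phases,
    i.e., a realization consistent with a linearly correlated Gaussian null."""
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.size)
    phases[0] = 0.0                       # keep the mean (zero-frequency) term real
    if x.size % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist term real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=x.size)

def time_reversal_asymmetry(x):
    """A simple discriminating statistic: zero in expectation for linear Gaussian series."""
    return np.mean((x[1:] - x[:-1]) ** 3)

# Example series: a noisy logistic map (nonlinear), purely illustrative.
x = np.empty(512)
x[0] = 0.3
for t in range(1, x.size):
    x[t] = 3.8 * x[t - 1] * (1 - x[t - 1]) + 0.001 * rng.normal()

observed = time_reversal_asymmetry(x)
surrogates = np.array([time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
                       for _ in range(999)])

# Monte Carlo p-value: how often surrogates are at least as extreme as the original.
p = (1 + np.sum(np.abs(surrogates) >= np.abs(observed))) / (len(surrogates) + 1)
print(f"observed statistic = {observed:.4f}, surrogate p-value = {p:.3f}")
```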
ERIC Educational Resources Information Center
Wilcox, Rand R.; Serang, Sarfaraz
2017-01-01
The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…
An Extension of RSS-based Model Comparison Tests for Weighted Least Squares
2012-08-22
use the model comparison test statistic to analyze the null hypothesis. Under the null hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS^H) = 10.3040 × 10^6. Under the alternative hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS) = 8.8394 × 10^6. Thus the model...
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
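A minimal sketch of the minimum p-value idea with permutation calibration. The two candidate statistics (difference in means and a rank statistic), the simulated two-arm data, and the number of permutations are illustrative assumptions rather than the statistics or trials analyzed in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical outcomes for two randomized arms.
treat = rng.normal(0.4, 1.0, size=40)
ctrl = rng.normal(0.0, 1.0, size=40)
combined = np.concatenate([treat, ctrl])
n_t = len(treat)

# Two pre-specified candidate statistics (illustrative choices).
def stat_mean_diff(x, y):
    return abs(x.mean() - y.mean())

def stat_rank(x, y):
    return abs(stats.mannwhitneyu(x, y).statistic - len(x) * len(y) / 2)

stat_fns = [stat_mean_diff, stat_rank]
observed = np.array([f(treat, ctrl) for f in stat_fns])

# Joint permutation distribution of both statistics under the null of no treatment effect.
n_perm = 2000
perm = np.empty((n_perm, len(stat_fns)))
for b in range(n_perm):
    idx = rng.permutation(len(combined))
    xp, yp = combined[idx[:n_t]], combined[idx[n_t:]]
    perm[b] = [f(xp, yp) for f in stat_fns]

# Convert each statistic to a permutation p-value, for the observed data and for
# every permutation (the latter gives the null distribution of the minimum p-value).
p_obs = np.array([(1 + np.sum(perm[:, j] >= observed[j])) / (n_perm + 1)
                  for j in range(len(stat_fns))])
p_perm = (n_perm - stats.rankdata(perm, axis=0) + 1) / n_perm   # right-tail p per column

min_p_obs = p_obs.min()
p_final = (1 + np.sum(p_perm.min(axis=1) <= min_p_obs)) / (n_perm + 1)
print(f"per-statistic p-values = {np.round(p_obs, 4)}, min-p adjusted P = {p_final:.4f}")
```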
Toward "Constructing" the Concept of Statistical Power: An Optical Analogy.
ERIC Educational Resources Information Center
Rogers, Bruce G.
This paper presents a visual analogy that may be used by instructors to teach the concept of statistical power in statistical courses. Statistical power is mathematically defined as the probability of rejecting a null hypothesis when that null is false, or, equivalently, the probability of detecting a relationship when it exists. The analogy…
The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing.
Lash, Timothy L
2017-09-15
In the last few years, stakeholders in the scientific community have raised alarms about a perceived lack of reproducibility of scientific results. In reaction, guidelines for journals have been promulgated and grant applicants have been asked to address the rigor and reproducibility of their proposed projects. Neither solution addresses a primary culprit, which is the culture of null hypothesis significance testing that dominates statistical analysis and inference. In an innovative research enterprise, selection of results for further evaluation based on null hypothesis significance testing is doomed to yield a low proportion of reproducible results and a high proportion of effects that are initially overestimated. In addition, the culture of null hypothesis significance testing discourages quantitative adjustments to account for systematic errors and quantitative incorporation of prior information. These strategies would otherwise improve reproducibility and have not been previously proposed in the widely cited literature on this topic. Without discarding the culture of null hypothesis significance testing and implementing these alternative methods for statistical analysis and inference, all other strategies for improving reproducibility will yield marginal gains at best. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The Importance of Teaching Power in Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Olinsky, Alan; Schumacher, Phyllis; Quinn, John
2012-01-01
In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…
The Need for Nuance in the Null Hypothesis Significance Testing Debate
ERIC Educational Resources Information Center
Häggström, Olle
2017-01-01
Null hypothesis significance testing (NHST) provides an important statistical toolbox, but there are a number of ways in which it is often abused and misinterpreted, with bad consequences for the reliability and progress of science. Parts of the contemporary NHST debate, especially in the psychological sciences, are reviewed, and a suggestion is made…
Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing
ERIC Educational Resources Information Center
García-Pérez, Miguel A.
2017-01-01
Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…
An omnibus test for the global null hypothesis.
Futschik, Andreas; Taus, Thomas; Zehetmayer, Sonja
2018-01-01
Global hypothesis tests are a useful tool in the context of clinical trials, genetic studies, or meta-analyses, when researchers are not interested in testing individual hypotheses, but in testing whether none of the hypotheses is false. There are several possibilities for testing the global null hypothesis when the individual null hypotheses are independent. If it is assumed that many of the individual null hypotheses are false, combination tests have been recommended to maximize power. If, however, it is assumed that only one or a few null hypotheses are false, global tests based on individual test statistics are more powerful (e.g. Bonferroni or Simes test). However, usually there is no a priori knowledge of the number of false individual null hypotheses. We therefore propose an omnibus test based on cumulative sums of the transformed p-values. We show that this test yields an impressive overall performance. The proposed method is implemented in an R-package called omnibus.
Unicorns do exist: a tutorial on "proving" the null hypothesis.
Streiner, David L
2003-12-01
Introductory statistics classes teach us that we can never prove the null hypothesis; all we can do is reject or fail to reject it. However, there are times when it is necessary to try to prove the nonexistence of a difference between groups. This most often happens within the context of comparing a new treatment against an established one and showing that the new intervention is not inferior to the standard. This article first outlines the logic of "noninferiority" testing by differentiating between the null hypothesis (that which we are trying to nullify) and the "nil" hypothesis (there is no difference), reversing the role of the null and alternate hypotheses, and defining an interval within which groups are said to be equivalent. We then work through an example and show how to calculate sample sizes for noninferiority studies.
Précis of statistical significance: rationale, validity, and utility.
Chow, S L
1998-04-01
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
Bundschuh, Mirco; Newman, Michael C; Zubrod, Jochen P; Seitz, Frank; Rosenfeldt, Ricki R; Schulz, Ralf
2015-03-01
We argued recently that the positive predictive value (PPV) and the negative predictive value (NPV) are valuable metrics to include during null hypothesis significance testing: They inform the researcher about the probability of statistically significant and non-significant test outcomes actually being true. Although commonly misunderstood, a reported p value estimates only the probability of obtaining the results or more extreme results if the null hypothesis of no effect were true. Calculations of the more informative PPV and NPV require an a priori estimate of the probability (R). The present document discusses challenges of estimating R.
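The arithmetic behind PPV and NPV can be sketched in a few lines. This version parameterizes the prior as a probability that a real effect exists; the paper works with a quantity R, and its exact parameterization may differ from the one assumed here.

```python
# Minimal sketch of the positive and negative predictive values of a significance test,
# assuming a prior probability `prior_alt` that the alternative hypothesis is true
# (Bundschuh et al. parameterize this via R; the exact parameterization may differ).

alpha = 0.05      # Type I error rate
power = 0.80      # 1 - beta
prior_alt = 0.25  # assumed a priori probability that a real effect exists

p_sig_true = power * prior_alt                  # significant and truly an effect
p_sig_false = alpha * (1 - prior_alt)           # significant but the null is true
p_ns_true_null = (1 - alpha) * (1 - prior_alt)  # non-significant and the null is true
p_ns_missed = (1 - power) * prior_alt           # non-significant but an effect exists

ppv = p_sig_true / (p_sig_true + p_sig_false)
npv = p_ns_true_null / (p_ns_true_null + p_ns_missed)

print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
# With these inputs, a "significant" result is true roughly 84% of the time,
# which is quite different from what the p-value alone suggests.
```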
Biostatistics Series Module 2: Overview of Hypothesis Testing
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Hypothesis testing (or statistical inference) is one of the major applications of biostatistics. Much of medical research begins with a research question that can be framed as a hypothesis. Inferential statistics begins with a null hypothesis that reflects the conservative position of no change or no difference in comparison to baseline or between groups. Usually, the researcher has reason to believe that there is some effect or some difference which is the alternative hypothesis. The researcher therefore proceeds to study samples and measure outcomes in the hope of generating evidence strong enough for the statistician to be able to reject the null hypothesis. The concept of the P value is almost universally used in hypothesis testing. It denotes the probability of obtaining by chance a result at least as extreme as that observed, even when the null hypothesis is true and no real difference exists. Usually, if P is < 0.05 the null hypothesis is rejected and sample results are deemed statistically significant. With the increasing availability of computers and access to specialized statistical software, the drudgery involved in statistical calculations is now a thing of the past, once the learning curve of the software has been traversed. The life sciences researcher is therefore free to devote oneself to optimally designing the study, carefully selecting the hypothesis tests to be applied, and taking care in conducting the study well. Unfortunately, selecting the right test seems difficult initially. Thinking of the research hypothesis as addressing one of five generic research questions helps in selection of the right hypothesis test. In addition, it is important to be clear about the nature of the variables (e.g., numerical vs. categorical; parametric vs. nonparametric) and the number of groups or data sets being compared (e.g., two or more than two) at a time. The same research question may be explored by more than one type of hypothesis test. While this may be of utility in highlighting different aspects of the problem, merely reapplying different tests to the same issue in the hope of finding a P < 0.05 is a wrong use of statistics. Finally, it is becoming the norm that an estimate of the size of any effect, expressed with its 95% confidence interval, is required for meaningful interpretation of results. A large study is likely to have a small (and therefore “statistically significant”) P value, but a “real” estimate of the effect would be provided by the 95% confidence interval. If the intervals overlap between two interventions, then the difference between them is not so clear-cut even if P < 0.05. The two approaches are now considered complementary to one another. PMID:27057011
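A minimal sketch of the complementary reporting the abstract recommends: a P value together with the effect estimate and its 95% confidence interval for a two-group comparison. The simulated blood-pressure data and group sizes are assumptions of the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Hypothetical outcome in two groups (e.g., change in blood pressure, mmHg).
a = rng.normal(-8.0, 10.0, size=60)
b = rng.normal(-5.0, 10.0, size=60)

t_res = stats.ttest_ind(a, b)
diff = a.mean() - b.mean()

# 95% confidence interval for the difference in means (pooled-variance form).
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci = (diff - tcrit * se, diff + tcrit * se)

print(f"P = {t_res.pvalue:.4f}, difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# The interval conveys the size and precision of the effect, not just whether P < 0.05.
```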
ERIC Educational Resources Information Center
Paek, Insu
2010-01-01
Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level α for the decision on an item with respect to DIF. The standard MH…
Sirota, Miroslav; Kostovičová, Lenka; Juanchich, Marie
2014-08-01
Knowing which properties of visual displays facilitate statistical reasoning bears practical and theoretical implications. Therefore, we studied the effect of one property of visual displays, iconicity (i.e., the resemblance of a visual sign to its referent), on Bayesian reasoning. Two main accounts of statistical reasoning predict different effects of iconicity on Bayesian reasoning. The ecological-rationality account predicts a positive iconicity effect, because more highly iconic signs resemble more individuated objects, which tap better into an evolutionary-designed frequency-coding mechanism that, in turn, facilitates Bayesian reasoning. The nested-sets account predicts a null iconicity effect, because iconicity does not affect the salience of a nested-sets structure: the factor facilitating Bayesian reasoning that is processed by a general reasoning mechanism. In two well-powered experiments (N = 577), we found no support for a positive iconicity effect across different iconicity levels that were manipulated in different visual displays (meta-analytical overall effect: log OR = -0.13, 95% CI [-0.53, 0.28]). A Bayes factor analysis provided strong evidence in favor of the null hypothesis, the null iconicity effect. Thus, these findings corroborate the nested-sets rather than the ecological-rationality account of statistical reasoning.
Power Enhancement in High Dimensional Cross-Sectional Tests
Fan, Jianqing; Liao, Yuan; Yao, Jiawei
2016-01-01
We propose a novel technique to boost the power of testing a high-dimensional vector H0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846
When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment
Szucs, Denes; Ioannidis, John P. A.
2017-01-01
Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology, and biomedical science in general. We review these shortcomings and suggest that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak we should not rely on all or nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published. The current statistics lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out. PMID:28824397
Statistical significance versus clinical relevance.
van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G
2017-04-01
In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
Optimizing Aircraft Availability: Where to Spend Your Next O&M Dollar
2010-03-01
patterns of variance are present. In addition, we use the Breusch-Pagan test to statistically determine whether homoscedasticity exists. For this... Breusch-Pagan test, large p-values are preferred so that we may accept the null hypothesis of normality. Failure to meet the fourth assumption is... Next, we show the residual-by-predicted plot and the Breusch-Pagan test for constant variance of the residuals. The null hypothesis is that the...
Saraf, Sanatan; Mathew, Thomas; Roy, Anindya
2015-01-01
For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small sample asymptotics, and their performances are numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.
Conservative Tests under Satisficing Models of Publication Bias
McCrary, Justin; Christensen, Garret; Fanelli, Daniele
2016-01-01
Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834
Mathematical Capture of Human Data for Computer Model Building and Validation
2014-04-03
weapon. The Projectile, the VDE, and the IDE weapons had effects of financial loss for the targeted participant, while the MRAD yielded its own... for LE, Centroid, and TE for the baseline and the VDE weapon conditions, since p-values exceeded α. All other conditions rejected the null hypothesis except the LE for the VDE weapon. The K-S statistics were correspondingly lower for the measures that failed to reject the null hypothesis. The CDF...
Towers, Sherry; Mubayi, Anuj; Castillo-Chavez, Carlos
2018-01-01
When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle.
Statistical modeling, detection, and segmentation of stains in digitized fabric images
NASA Astrophysics Data System (ADS)
Gururajan, Arunkumar; Sari-Sarraf, Hamed; Hequet, Eric F.
2007-02-01
This paper will describe a novel and automated system based on a computer vision approach, for objective evaluation of stain release on cotton fabrics. Digitized color images of the stained fabrics are obtained, and the pixel values in the color and intensity planes of these images are probabilistically modeled as a Gaussian Mixture Model (GMM). Stain detection is posed as a decision theoretic problem, where the null hypothesis corresponds to absence of a stain. The null hypothesis and the alternate hypothesis mathematically translate into a first order GMM and a second order GMM, respectively. The parameters of the GMM are estimated using a modified Expectation-Maximization (EM) algorithm. Minimum Description Length (MDL) is then used as the test statistic to decide the verity of the null hypothesis. The stain is then segmented by a decision rule based on the probability map generated by the EM algorithm. The proposed approach was tested on a dataset of 48 fabric images soiled with stains of ketchup, corn oil, mustard, Ragu sauce, Revlon makeup and grape juice. The decision theoretic part of the algorithm produced a correct detection rate (true positive) of 93% and a false alarm rate of 5% on this set of images.
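A minimal sketch of the modeling idea, assuming synthetic two-dimensional pixel features and using scikit-learn's GaussianMixture with BIC as a stand-in for the MDL criterion described in the paper; the feature values, cluster locations, and component counts are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Synthetic "fabric" pixels in an intensity/color feature space: background plus
# (optionally) a stain cluster. Real inputs would be features from a digitized image.
background = rng.normal(loc=[0.80, 0.10], scale=0.03, size=(4000, 2))
stain = rng.normal(loc=[0.55, 0.25], scale=0.04, size=(600, 2))
pixels = np.vstack([background, stain])

# Null hypothesis (no stain): a single Gaussian component.
# Alternative: a two-component Gaussian mixture.
gmm1 = GaussianMixture(n_components=1, random_state=0).fit(pixels)
gmm2 = GaussianMixture(n_components=2, random_state=0).fit(pixels)

# Model selection: the paper uses MDL; BIC is a closely related criterion used here
# as a stand-in (lower is better).
if gmm2.bic(pixels) < gmm1.bic(pixels):
    print("Stain detected: two-component model preferred.")
    # Segmentation: assign each pixel to the minority (stain) component.
    labels = gmm2.predict(pixels)
    stain_label = np.argmin(np.bincount(labels))
    print("Estimated stain fraction:", np.round(np.mean(labels == stain_label), 3))
else:
    print("No stain detected: single-component model preferred.")
```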
In the Beginning-There Is the Introduction-and Your Study Hypothesis.
Vetter, Thomas R; Mascha, Edward J
2017-05-01
Writing a manuscript for a medical journal is very akin to writing a newspaper article, albeit a scholarly one. Like any journalist, you have a story to tell. You need to tell your story in a way that is easy to follow and makes a compelling case to the reader. Although recommended since the beginning of the 20th century, the conventional Introduction-Methods-Results-And-Discussion (IMRAD) scientific reporting structure has only been the standard since the 1980s. The Introduction should be focused and succinct in communicating the significance, background, rationale, study aims or objectives, and the primary (and secondary, if appropriate) study hypotheses. Hypothesis testing involves posing both a null and an alternative hypothesis. The null hypothesis proposes that no difference or association exists on the outcome variable of interest between the interventions or groups being compared. The alternative hypothesis is the opposite of the null hypothesis and thus typically proposes that a difference in the population does exist between the groups being compared on the parameter of interest. Most investigators seek to reject the null hypothesis because of their expectation that the studied intervention does result in a difference between the study groups or that the association of interest does exist. Therefore, in most clinical and basic science studies and manuscripts, the alternative hypothesis is stated, not the null hypothesis. Also, in the Introduction, the alternative hypothesis is typically stated in the direction of interest, or the expected direction. However, when assessing the association of interest, researchers typically look in both directions (ie, favoring 1 group or the other) by conducting a 2-tailed statistical test because the true direction of the effect is typically not known, and either direction would be important to report.
Statistical power analysis in wildlife research
Steidl, R.J.; Hayes, J.P.
1997-01-01
Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be ≤0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true. We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to be reported, then the α-level, effect sizes, and sample sizes used in calculations must also be reported.
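A minimal sketch of the prospective power analysis the authors recommend, using statsmodels for a two-sample t-test design. The effect size of d = 0.5, the target power of 0.8, and the sample size of 30 in the second call are assumed values chosen for illustration.

```python
from statsmodels.stats.power import TTestIndPower

# Prospective (a priori) power analysis for a two-sample comparison.
# The minimum biologically significant effect size (Cohen's d) is an assumption here.
analysis = TTestIndPower()

n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative='two-sided')
print(f"Samples needed per group: {n_per_group:.1f}")

# The same machinery also answers the retrospective design question the authors
# endorse: what effect size could a completed study of a given size have detected?
detectable_d = analysis.solve_power(nobs1=30, alpha=0.05, power=0.8,
                                    alternative='two-sided')
print(f"Effect size detectable with n = 30 per group: d = {detectable_d:.2f}")
```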
Goodman and Kruskal's TAU-B Statistics: A Fortran-77 Subroutine.
ERIC Educational Resources Information Center
Berry, Kenneth J.; Mielke, Paul W., Jr.
1986-01-01
An algorithm and associated FORTRAN-77 computer subroutine are described for computing Goodman and Kruskal's tau-b statistic along with the associated nonasymptotic probability value under the null hypothesis tau=0. (Author)
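For readers without access to the FORTRAN-77 subroutine, here is a short Python sketch of Goodman and Kruskal's tau as a proportional-reduction-in-error measure. The orientation (columns predicting rows) and the toy 2x2 table are assumptions, and the nonasymptotic probability value described in the record is not reproduced.

```python
# Hedged sketch of Goodman and Kruskal's tau for a contingency table
# (here: column category predicting row category).
import numpy as np

def goodman_kruskal_tau(table):
    """tau for predicting the row category from the column category."""
    n = table.sum()
    row_totals = table.sum(axis=1)
    col_totals = table.sum(axis=0)
    # Variation in rows ignoring columns (Gini concentration).
    v_rows = 1.0 - np.sum((row_totals / n) ** 2)
    # Expected variation in rows within columns.
    ev_rows = 1.0 - np.sum(table ** 2 / col_totals) / n
    return (v_rows - ev_rows) / v_rows

table = np.array([[30.0, 10.0],
                  [10.0, 30.0]])  # hypothetical 2x2 counts
print(f"tau = {goodman_kruskal_tau(table):.3f}")  # 0.25 for this table
```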
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
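The toy sketch below is not the authors' model; it only illustrates the general idea of modeling test statistics directly under null and alternative distributions and applying a Bayesian-FDR-style cutoff. The mixture components, the prior null proportion pi0, and the simulated z statistics are all assumptions.

```python
# Toy illustration: treat observed z statistics as a two-component mixture,
# N(0,1) under the null and N(3,1) under the alternative, with an assumed
# prior null proportion pi0, and compute posterior null probabilities.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pi0 = 0.9                                    # assumed prior fraction of true nulls
z = np.concatenate([rng.normal(0, 1, 900),   # simulated null genes
                    rng.normal(3, 1, 100)])  # simulated differentially expressed genes

f0 = stats.norm.pdf(z, 0, 1)
f1 = stats.norm.pdf(z, 3, 1)
post_null = pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)   # P(null | z)

# A simple Bayesian-FDR-style rule: reject hypotheses whose posterior null
# probability is small enough that the average over rejections stays below 5%.
order = np.argsort(post_null)
running_mean = np.cumsum(post_null[order]) / np.arange(1, len(z) + 1)
n_reject = int(np.sum(running_mean <= 0.05))
print(f"rejections at Bayesian FDR 5%: {n_reject}")
```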
Bayesian Methods for Determining the Importance of Effects
USDA-ARS?s Scientific Manuscript database
Criticisms have plagued the frequentist null-hypothesis significance testing (NHST) procedure since the day it was created from the Fisher Significance Test and Hypothesis Test of Jerzy Neyman and Egon Pearson. Alternatives to NHST exist in frequentist statistics, but competing methods are also avai...
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
The researcher and the consultant: from testing to probability statements.
Hamra, Ghassan B; Stang, Andreas; Poole, Charles
2015-09-01
In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.
ERIC Educational Resources Information Center
Case, Catherine; Whitaker, Douglas
2016-01-01
In the criminal justice system, defendants accused of a crime are presumed innocent until proven guilty. Statistical inference in any context is built on an analogous principle: The null hypothesis--often a hypothesis of "no difference" or "no effect"--is presumed true unless there is sufficient evidence against it. In this…
Explorations in Statistics: Permutation Methods
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2012-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eighth installment of "Explorations in Statistics" explores permutation methods, empiric procedures we can use to assess an experimental result--to test a null hypothesis--when we are reluctant to trust statistical…
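A bare-bones example of the permutation idea, with simulated data and the difference in group means as the test statistic:

```python
# A basic two-sample permutation test of the null hypothesis of no group
# difference in means, using simulated data as a stand-in for an experiment.
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.normal(10.0, 3.0, 25)
group_b = rng.normal(12.0, 3.0, 25)

observed = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                   # relabel observations at random
    diff = pooled[25:].mean() - pooled[:25].mean()
    if abs(diff) >= abs(observed):        # two-sided comparison
        count += 1
print(f"permutation p-value ≈ {(count + 1) / (n_perm + 1):.4f}")
```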
Concerns regarding a call for pluralism of information theory and hypothesis testing
Lukacs, P.M.; Thompson, W.L.; Kendall, W.L.; Gould, W.R.; Doherty, P.F.; Burnham, K.P.; Anderson, D.R.
2007-01-01
1. Stephens et al. (2005) argue for 'pluralism' in statistical analysis, combining null hypothesis testing and information-theoretic (I-T) methods. We show that I-T methods are more informative even in single variable problems and we provide an ecological example. 2. I-T methods allow inferences to be made from multiple models simultaneously. We believe multimodel inference is the future of data analysis, which cannot be achieved with null hypothesis-testing approaches. 3. We argue for a stronger emphasis on critical thinking in science in general and less reliance on exploratory data analysis and data dredging. Deriving alternative hypotheses is central to science; deriving a single interesting science hypothesis and then comparing it to a default null hypothesis (e.g. 'no difference') is not an efficient strategy for gaining knowledge. We think this single-hypothesis strategy has been relied upon too often in the past. 4. We clarify misconceptions presented by Stephens et al. (2005). 5. We think inference should be made about models, directly linked to scientific hypotheses, and their parameters conditioned on data, Prob(Hj | data). I-T methods provide a basis for this inference. Null hypothesis testing merely provides a probability statement about the data conditioned on a null model, Prob(data | H0). 6. Synthesis and applications. I-T methods provide a more informative approach to inference. I-T methods provide a direct measure of evidence for or against hypotheses and a means to consider simultaneously multiple hypotheses as a basis for rigorous inference. Progress in our science can be accelerated if modern methods can be used intelligently; this includes various I-T and Bayesian methods.
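A minimal sketch of I-T multimodel inference with AIC and Akaike weights, run on simulated data rather than the ecological example discussed in the paper; the candidate models and variable names are placeholders.

```python
# Minimal I-T sketch: compare several candidate regression models with AIC and
# convert the AIC differences into Akaike weights (relative model support).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(scale=1.0, size=n)   # x2 is irrelevant by construction

designs = {
    "intercept only": np.ones((n, 1)),
    "x1":             sm.add_constant(x1),
    "x1 + x2":        sm.add_constant(np.column_stack([x1, x2])),
}
aic = {name: sm.OLS(y, X).fit().aic for name, X in designs.items()}
delta = {name: a - min(aic.values()) for name, a in aic.items()}
weights = {name: np.exp(-0.5 * d) for name, d in delta.items()}
total = sum(weights.values())
for name in designs:
    print(f"{name:15s} AIC={aic[name]:7.1f}  weight={weights[name] / total:.2f}")
```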
Unadjusted Bivariate Two-Group Comparisons: When Simpler is Better.
Vetter, Thomas R; Mascha, Edward J
2018-01-01
Hypothesis testing involves posing both a null hypothesis and an alternative hypothesis. This basic statistical tutorial discusses the appropriate use, including their so-called assumptions, of the common unadjusted bivariate tests for hypothesis testing and thus comparing study sample data for a difference or association. The appropriate choice of a statistical test is predicated on the type of data being analyzed and compared. The unpaired or independent samples t test is used to test the null hypothesis that the 2 population means are equal, thereby accepting the alternative hypothesis that the 2 population means are not equal. The unpaired t test is intended for comparing independent continuous (interval or ratio) data from 2 study groups. A common mistake is to apply several unpaired t tests when comparing data from 3 or more study groups. In this situation, an analysis of variance with post hoc (posttest) intragroup comparisons should instead be applied. Another common mistake is to apply a series of unpaired t tests when comparing sequentially collected data from 2 study groups. In this situation, a repeated-measures analysis of variance, with tests for group-by-time interaction, and post hoc comparisons, as appropriate, should instead be applied in analyzing data from sequential collection points. The paired t test is used to assess the difference in the means of 2 study groups when the sample observations have been obtained in pairs, often before and after an intervention in each study subject. The Pearson chi-square test is widely used to test the null hypothesis that 2 unpaired categorical variables, each with 2 or more nominal levels (values), are independent of each other. When the null hypothesis is rejected, one concludes that there is a probable association between the 2 unpaired categorical variables. When comparing 2 groups on an ordinal or nonnormally distributed continuous outcome variable, the 2-sample t test is usually not appropriate. The Wilcoxon-Mann-Whitney test is instead preferred. When making paired comparisons on data that are ordinal, or continuous but nonnormally distributed, the Wilcoxon signed-rank test can be used. In analyzing their data, researchers should consider the continued merits of these simple yet equally valid unadjusted bivariate statistical tests. However, the appropriate use of an unadjusted bivariate test still requires a solid understanding of its utility, assumptions (requirements), and limitations. This understanding will mitigate the risk of misleading findings, interpretations, and conclusions.
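The SciPy calls below map onto the tests discussed in this tutorial; the data are simulated placeholders, and the pairing of test to data type follows the summary above rather than any code from the article.

```python
# Hedged sketch of the common unadjusted bivariate tests discussed above,
# applied to simulated data (SciPy function names, not the tutorial's own code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(50, 10, 30)              # continuous, roughly normal, group 1
g2 = rng.normal(55, 10, 30)              # continuous, roughly normal, group 2
before = rng.normal(120, 15, 20)         # paired measurements, pre-intervention
after = before - rng.normal(5, 8, 20)    # paired measurements, post-intervention
table = np.array([[20, 10], [12, 18]])   # 2x2 counts of a categorical outcome

print(stats.ttest_ind(g1, g2))           # unpaired t test (independent groups)
print(stats.ttest_rel(before, after))    # paired t test (before/after)
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)                           # Pearson chi-square statistic and p
print(stats.mannwhitneyu(g1, g2))        # Wilcoxon-Mann-Whitney (ordinal/non-normal)
print(stats.wilcoxon(before, after))     # Wilcoxon signed-rank (paired non-normal)
```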
SANABRIA, FEDERICO; KILLEEN, PETER R.
2008-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose p_rep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766
Replication Unreliability in Psychology: Elusive Phenomena or “Elusive” Statistical Power?
Tressoldi, Patrizio E.
2012-01-01
The focus of this paper is to analyze whether the unreliability of results related to certain controversial psychological phenomena may be a consequence of their low statistical power. Applying the Null Hypothesis Statistical Testing (NHST), still the most widely used statistical approach, unreliability derives from the failure to refute the null hypothesis, in particular when exact or quasi-exact replications of experiments are carried out. Taking as examples the results of meta-analyses related to four different controversial phenomena, subliminal semantic priming, incubation effect for problem solving, unconscious thought theory, and non-local perception, it was found that, except for semantic priming on categorization, the statistical power of the typical study to detect the expected effect size (ES) is low or very low. The low power in most studies undermines the use of NHST to study phenomena with moderate or low ESs. We conclude by providing some suggestions on how to increase the statistical power or use different statistical approaches to help discriminate whether the results obtained may or may not be used to support or to refute the reality of a phenomenon with small ES. PMID:22783215
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome is developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As with Kulldorff's methods, we adopt a Monte Carlo test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. In a simulation using independent benchmark data, the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
Fienup, Daniel M; Critchfield, Thomas S
2010-01-01
Computerized lessons that reflect stimulus equivalence principles were used to teach college students concepts related to inferential statistics and hypothesis decision making. Lesson 1 taught participants concepts related to inferential statistics, and Lesson 2 taught them to base hypothesis decisions on a scientific hypothesis and the direction of an effect. Lesson 3 taught the conditional influence of inferential statistics over decisions regarding the scientific and null hypotheses. Participants entered the study with low scores on the targeted skills and left the study demonstrating a high level of accuracy on these skills, which involved mastering more relations than were taught formally. This study illustrates the efficiency of equivalence-based instruction in establishing academic skills in sophisticated learners. PMID:21358904
Hypothesis testing of a change point during cognitive decline among Alzheimer's disease patients.
Ji, Ming; Xiong, Chengjie; Grundman, Michael
2003-10-01
In this paper, we present a statistical hypothesis test for detecting a change point over the course of cognitive decline among Alzheimer's disease patients. The model under the null hypothesis assumes a constant rate of cognitive decline over time and the model under the alternative hypothesis is a general bilinear model with an unknown change point. When the change point is unknown, however, the null distribution of the test statistics is not analytically tractable and has to be simulated by parametric bootstrap. When the alternative hypothesis that a change point exists is accepted, we propose an estimate of its location based on Akaike's Information Criterion. We applied our method to a data set from the Neuropsychological Database Initiative by implementing our hypothesis testing method to analyze Mini Mental Status Exam scores based on a random-slope and random-intercept model with a bilinear fixed effect. Our result shows that despite a large amount of missing data, accelerated decline did occur for MMSE among AD patients. Our finding supports the clinical belief of the existence of a change point during cognitive decline among AD patients and suggests the use of change point models for the longitudinal modeling of cognitive decline in AD research.
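A simplified, fixed-effects sketch of this testing strategy (the paper itself uses a random-slope and random-intercept model): profile a bilinear fit over candidate change points and simulate the null distribution of the fit-improvement statistic by parametric bootstrap. The data, candidate grid, and number of bootstrap replicates are all assumptions.

```python
# Hedged sketch (not the paper's mixed-effects model): constant decline (null)
# versus bilinear decline with an unknown change point (alternative), with the
# null distribution of the test statistic simulated by parametric bootstrap.
import numpy as np

rng = np.random.default_rng(0)
t = np.tile(np.arange(0.0, 8.0), 25)          # visit times for 25 hypothetical patients
y = 28 - 1.0 * t - 1.5 * np.clip(t - 4.0, 0, None) + rng.normal(0, 2, t.size)

def rss_linear(t, y):
    X = np.column_stack([np.ones_like(t), t])
    _, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    return rss[0]

def rss_bilinear(t, y, grid):
    best = np.inf
    for c in grid:                            # profile over candidate change points
        X = np.column_stack([np.ones_like(t), t, np.clip(t - c, 0, None)])
        _, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = min(best, rss[0])
    return best

grid = np.arange(1.0, 7.1, 0.5)
stat_obs = rss_linear(t, y) - rss_bilinear(t, y, grid)

# Parametric bootstrap of the null (constant decline) distribution of the statistic.
X0 = np.column_stack([np.ones_like(t), t])
beta0, rss0, *_ = np.linalg.lstsq(X0, y, rcond=None)
sigma0 = np.sqrt(rss0[0] / (t.size - 2))
boot = []
for _ in range(500):
    yb = X0 @ beta0 + rng.normal(0, sigma0, t.size)
    boot.append(rss_linear(t, yb) - rss_bilinear(t, yb, grid))
p_value = (np.sum(np.array(boot) >= stat_obs) + 1) / (len(boot) + 1)
print(f"bootstrap p-value = {p_value:.3f}")
```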
Huang, Peng; Ou, Ai-hua; Piantadosi, Steven; Tan, Ming
2014-11-01
We discuss the problem of properly defining treatment superiority through the specification of hypotheses in clinical trials. The need to precisely define the notion of superiority in a one-sided hypothesis test problem has been well recognized by many authors. Ideally designed null and alternative hypotheses should correspond to a partition of all possible scenarios of underlying true probability models P={P(ω):ω∈Ω} such that the alternative hypothesis Ha={P(ω):ω∈Ωa} can be inferred upon the rejection of the null hypothesis Ho={P(ω):ω∈Ωo}. However, in many cases, tests are carried out and recommendations are made without a precise definition of superiority or a specification of the alternative hypothesis. Moreover, in some applications, the union of probability models specified by the chosen null and alternative hypothesis does not constitute a complete model collection P (i.e., Ho∪Ha is smaller than P). This not only imposes a strong non-validated assumption about the underlying true models, but also leads to different superiority claims depending on which test is used instead of scientific plausibility. Different ways to partition P for testing treatment superiority often have different implications on sample size, power, and significance in both efficacy and comparative effectiveness trial design. Such differences are often overlooked. We provide a theoretical framework for evaluating the statistical properties of different specifications of superiority in typical hypothesis testing. This can help investigators to select proper hypotheses for treatment comparison in clinical trial design. Copyright © 2014 Elsevier Inc. All rights reserved.
Invited Commentary: Can Issues With Reproducibility in Science Be Blamed on Hypothesis Testing?
Weinberg, Clarice R.
2017-01-01
In the accompanying article (Am J Epidemiol. 2017;186(6):646–647), Dr. Timothy Lash makes a forceful case that the problems with reproducibility in science stem from our “culture” of null hypothesis significance testing. He notes that when attention is selectively given to statistically significant findings, the estimated effects will be systematically biased away from the null. Here I revisit the recent history of genetic epidemiology and argue for retaining statistical testing as an important part of the tool kit. Particularly when many factors are considered in an agnostic way, in what Lash calls “innovative” research, investigators need a selection strategy to identify which findings are most likely to be genuine, and hence worthy of further study. PMID:28938713
Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling
ERIC Educational Resources Information Center
Banjanovic, Erin S.; Osborne, Jason W.
2016-01-01
Confidence intervals for effect sizes (CIES) provide readers with an estimate of the strength of a reported statistic as well as the relative precision of the point estimate. These statistics offer more information and context than null hypothesis statistic testing. Although confidence intervals have been recommended by scholars for many years,…
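A small sketch of a bootstrap percentile confidence interval for Cohen's d on simulated data; the resampling scheme (independent resampling within each group) is an assumption, not necessarily the exact procedure the article recommends.

```python
# Sketch of a bootstrap percentile confidence interval for a standardized effect
# size (Cohen's d), the kind of CIES discussed above; the data are simulated.
import numpy as np

rng = np.random.default_rng(11)
a = rng.normal(100, 15, 40)
b = rng.normal(92, 15, 40)

def cohens_d(x, y):
    pooled_sd = np.sqrt(((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1))
                        / (len(x) + len(y) - 2))
    return (x.mean() - y.mean()) / pooled_sd

boot = [cohens_d(rng.choice(a, len(a), replace=True),
                 rng.choice(b, len(b), replace=True)) for _ in range(5000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(a, b):.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```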
Elaborating Selected Statistical Concepts with Common Experience.
ERIC Educational Resources Information Center
Weaver, Kenneth A.
1992-01-01
Presents ways of elaborating statistical concepts so as to make course material more meaningful for students. Describes examples using exclamations, circus and cartoon characters, and falling leaves to illustrate variability, null hypothesis testing, and confidence interval. Concludes that the exercises increase student comprehension of the text…
Bayesian evaluation of effect size after replicating an original study
van Aert, Robbie C. M.; van Assen, Marcel A. L. M.
2017-01-01
The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the methods performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy to use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method. PMID:28388646
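The toy calculation below conveys only the "snapshot" part of the idea, posterior probabilities over a zero, small, medium, and large effect given a replication estimate; it omits the publication-bias adjustment that distinguishes snapshot hybrid, and the observed estimate, standard error, and equal model priors are made-up assumptions.

```python
# Toy version of posterior model probabilities for a zero, small, medium, or
# large effect, given a replication's observed effect and its standard error.
import numpy as np
from scipy import stats

obs_d, se = 0.25, 0.15                    # hypothetical replication estimate and SE
effects = {"zero": 0.0, "small": 0.2, "medium": 0.5, "large": 0.8}

likelihood = {k: stats.norm.pdf(obs_d, loc=v, scale=se) for k, v in effects.items()}
total = sum(likelihood.values())          # equal prior probability on the four models
posterior = {k: lik / total for k, lik in likelihood.items()}
for k, p in posterior.items():
    print(f"P({k} effect | data) = {p:.2f}")
```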
The frequentist implications of optional stopping on Bayesian hypothesis tests.
Sanborn, Adam N; Hills, Thomas T
2014-04-01
Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite-taking multiple parameter values-such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
Performing Inferential Statistics Prior to Data Collection
ERIC Educational Resources Information Center
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
Significance levels for studies with correlated test statistics.
Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S
2008-07-01
When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
Three Strategies for the Critical Use of Statistical Methods in Psychological Research
ERIC Educational Resources Information Center
Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando
2017-01-01
We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…
Changing world extreme temperature statistics
NASA Astrophysics Data System (ADS)
Finkel, J. M.; Katz, J. I.
2018-04-01
We use the Global Historical Climatology Network-Daily database to calculate a nonparametric statistic that describes the rate at which all-time daily high and low temperature records have been set in nine geographic regions (continents or major portions of continents) during periods mostly from the mid-20th Century to the present. This statistic was defined in our earlier work on temperature records in the 48 contiguous United States. In contrast to this earlier work, we find that in every region except North America all-time high records were set at a rate significantly (at least 3σ) higher than in the null hypothesis of a stationary climate. Except in Antarctica, all-time low records were set at a rate significantly lower than in the null hypothesis. In Europe, North Africa and North Asia the rate of setting new all-time highs increased suddenly in the 1990s, suggesting a change in regional climate regime; in most other regions there was a steadier increase.
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based in this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
ERIC Educational Resources Information Center
Spinella, Sarah
2011-01-01
As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
I Am 95% Confident That the Earth is Round: An Interview about Statistics with Chris Spatz.
ERIC Educational Resources Information Center
Dillon, Kathleen M.
1999-01-01
Presents an interview with Chris Spatz who is a professor of psychology at Hendrix College in Conway (Arkansas). Discusses null hypothesis statistical tests (NHST) and the arguments for and against the use of NHST, the changes in research articles, textbook changes, and the Internet. (CMK)
Significance tests for functional data with complex dependence structure.
Staicu, Ana-Maria; Lahiri, Soumen N; Carroll, Raymond J
2015-01-01
We propose an L 2 -norm based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing, when data have such complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.
2016-12-01
KS and AD Statistical Power via Monte Carlo Simulation. Statistical power is the probability of correctly rejecting the null hypothesis when the... Determining the Statistical Power... real-world data to test the accuracy of the simulation. Statistical comparison of these metrics can be necessary when making such a determination.
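A hedged reconstruction of what such a Monte Carlo power study might look like for the two-sample Kolmogorov-Smirnov test (the Anderson-Darling analog would substitute scipy.stats.anderson_ksamp); the alternative, sample size, and number of simulations are assumptions.

```python
# Monte Carlo sketch: estimate the power of the two-sample Kolmogorov-Smirnov
# test against an assumed location shift in the alternative distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha, n, shift, n_sim = 0.05, 50, 0.5, 2000
rejections = 0
for _ in range(n_sim):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(shift, 1.0, n)          # data generated under the alternative
    if stats.ks_2samp(x, y).pvalue < alpha:
        rejections += 1
print(f"estimated power ≈ {rejections / n_sim:.2f}")
```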
Invited Commentary: The Need for Cognitive Science in Methodology.
Greenland, Sander
2017-09-15
There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecessary and misleading, as in inferences based on whether a P value is "statistically significant"; 2) nullism, the tendency to privilege the hypothesis of no difference or no effect when there is no scientific basis for doing so, as when testing only the null hypothesis; and 3) statistical reification, treating hypothetical data distributions and statistical models as if they reflect known physical laws rather than speculative assumptions for thought experiments. As commonly misused, null-hypothesis significance testing combines these cognitive problems to produce highly distorted interpretation and reporting of study results. Interval estimation has so far proven to be an inadequate solution because it involves dichotomization, an avenue for nullism. Sensitivity and bias analyses have been proposed to address reproducibility problems (Am J Epidemiol. 2017;186(6):646-647); these methods can indeed address reification, but they can also introduce new distortions via misleading specifications for bias parameters. P values can be reframed to lessen distortions by presenting them without reference to a cutoff, providing them for relevant alternatives to the null, and recognizing their dependence on all assumptions used in their computation; they nonetheless require rescaling for measuring evidence. I conclude that methodological development and training should go beyond coverage of mechanistic biases (e.g., confounding, selection bias, measurement error) to cover distortions of conclusions produced by statistical methods and psychosocial forces. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Nikolakopoulou, Adriani; Mavridis, Dimitris; Furukawa, Toshi A; Cipriani, Andrea; Tricco, Andrea C; Straus, Sharon E; Siontis, George C M; Egger, Matthias; Salanti, Georgia
2018-02-28
To examine whether the continuous updating of networks of prospectively planned randomised controlled trials (RCTs) ("living" network meta-analysis) provides strong evidence against the null hypothesis in comparative effectiveness of medical interventions earlier than the updating of conventional, pairwise meta-analysis. Empirical study of the accumulating evidence about the comparative effectiveness of clinical interventions. Database of network meta-analyses of RCTs identified through searches of Medline, Embase, and the Cochrane Database of Systematic Reviews until 14 April 2015. Network meta-analyses published after January 2012 that compared at least five treatments and included at least 20 RCTs. Clinical experts were asked to identify in each network the treatment comparison of greatest clinical interest. Comparisons were excluded for which direct and indirect evidence disagreed, based on side, or node, splitting test (P<0.10). Cumulative pairwise and network meta-analyses were performed for each selected comparison. Monitoring boundaries of statistical significance were constructed and the evidence against the null hypothesis was considered to be strong when the monitoring boundaries were crossed. A significance level was defined as α=5%, power of 90% (β=10%), and an anticipated treatment effect to detect equal to the final estimate from the network meta-analysis. The frequency and time to strong evidence was compared against the null hypothesis between pairwise and network meta-analyses. 49 comparisons of interest from 44 networks were included; most (n=39, 80%) were between active drugs, mainly from the specialties of cardiology, endocrinology, psychiatry, and rheumatology. 29 comparisons were informed by both direct and indirect evidence (59%), 13 by indirect evidence (27%), and 7 by direct evidence (14%). Both network and pairwise meta-analysis provided strong evidence against the null hypothesis for seven comparisons, but for an additional 10 comparisons only network meta-analysis provided strong evidence against the null hypothesis (P=0.002). The median time to strong evidence against the null hypothesis was 19 years with living network meta-analysis and 23 years with living pairwise meta-analysis (hazard ratio 2.78, 95% confidence interval 1.00 to 7.72, P=0.05). Studies directly comparing the treatments of interest continued to be published for eight comparisons after strong evidence had become evident in network meta-analysis. In comparative effectiveness research, prospectively planned living network meta-analyses produced strong evidence against the null hypothesis more often and earlier than conventional, pairwise meta-analyses. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Test of association: which one is the most appropriate for my study?
Gonzalez-Chica, David Alejandro; Bastos, João Luiz; Duquia, Rodrigo Pereira; Bonamigo, Renan Rangel; Martínez-Mesa, Jeovany
2015-01-01
Hypothesis tests are statistical tools widely used for assessing whether or not there is an association between two or more variables. These tests provide a probability of the type 1 error (p-value), which is used to accept or reject the null study hypothesis. This paper provides a practical guide to help researchers carefully select the most appropriate procedure to answer their research question. We discuss the logic of hypothesis testing and present the prerequisites of each procedure based on practical examples.
High Impact = High Statistical Standards? Not Necessarily So
Tressoldi, Patrizio E.; Giofré, David; Sella, Francesco; Cumming, Geoff
2013-01-01
What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors. PMID:23418533
Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research
Bakker, Marjan; Wicherts, Jelte M.
2014-01-01
Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
Sources of Error and the Statistical Formulation of M_S:m_b Seismic Event Screening Analysis
NASA Astrophysics Data System (ADS)
Anderson, D. N.; Patton, H. J.; Taylor, S. R.; Bonner, J. L.; Selby, N. D.
2014-03-01
The Comprehensive Nuclear-Test-Ban Treaty (CTBT), a global ban on nuclear explosions, is currently in a ratification phase. Under the CTBT, an International Monitoring System (IMS) of seismic, hydroacoustic, infrasonic and radionuclide sensors is operational, and the data from the IMS is analysed by the International Data Centre (IDC). The IDC provides CTBT signatories basic seismic event parameters and a screening analysis indicating whether an event exhibits explosion characteristics (for example, shallow depth). An important component of the screening analysis is a statistical test of the null hypothesis H_0: explosion characteristics using empirical measurements of seismic energy (magnitudes). The established magnitude used for event size is the body-wave magnitude (denoted m_b) computed from the initial segment of a seismic waveform. IDC screening analysis is applied to events with m_b greater than 3.5. The Rayleigh wave magnitude (denoted M_S) is a measure of later arriving surface wave energy. Magnitudes are measurements of seismic energy that include adjustments (physical correction model) for path and distance effects between event and station. Relative to m_b, earthquakes generally have a larger M_S magnitude than explosions. This article proposes a hypothesis test (screening analysis) using M_S and m_b that expressly accounts for physical correction model inadequacy in the standard error of the test statistic. With this hypothesis test formulation, the 2009 Democratic People's Republic of Korea announced nuclear weapon test fails to reject the null hypothesis H_0: explosion characteristics.
Map LineUps: Effects of spatial structure on graphical inference.
Beecham, Roger; Dykes, Jason; Meulemans, Wouter; Slingsby, Aidan; Turkay, Cagatay; Wood, Jo
2017-01-01
Fundamental to the effective use of visualization as an analytic and descriptive tool is the assurance that presenting data visually provides the capability of making inferences from what we see. This paper explores two related approaches to quantifying the confidence we may have in making visual inferences from mapped geospatial data. We adapt Wickham et al.'s 'Visual Line-up' method as a direct analogy with Null Hypothesis Significance Testing (NHST) and propose a new approach for generating more credible spatial null hypotheses. Rather than using as a spatial null hypothesis the unrealistic assumption of complete spatial randomness, we propose spatially autocorrelated simulations as alternative nulls. We conduct a set of crowdsourced experiments (n=361) to determine the just noticeable difference (JND) between pairs of choropleth maps of geographic units controlling for spatial autocorrelation (Moran's I statistic) and geometric configuration (variance in spatial unit area). Results indicate that people's abilities to perceive differences in spatial autocorrelation vary with baseline autocorrelation structure and the geometric configuration of geographic units. These results allow us, for the first time, to construct a visual equivalent of statistical power for geospatial data. Our JND results add to those provided in recent years by Klippel et al. (2011), Harrison et al. (2014) and Kay & Heer (2015) for correlation visualization. Importantly, they provide an empirical basis for an improved construction of visual line-ups for maps and the development of theory to inform geospatial tests of graphical inference.
Ling, Zhi-Qiang; Wang, Yi; Mukaisho, Kenichi; Hattori, Takanori; Tatsuta, Takeshi; Ge, Ming-Hua; Jin, Li; Mao, Wei-Min; Sugihara, Hiroyuki
2010-06-01
Tests of differentially expressed genes (DEGs) from microarray experiments are based on the null hypothesis that genes that are irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis is not always true, as there can be several transcriptomic background differences between target and control samples, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to increased false positives, which have little biological/medical significance. In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data allowing transcriptomic background differences between these samples by introducing a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method using our own triplicate microarray experiment, followed by validations with reverse transcription-polymerase chain reaction (RT-PCR) and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) results in less false positive and false negative results, as measured by the degree of agreement with RT-PCR of the same samples, (ii) can be applied to different microarray platforms and results in better reproducibility as measured by the degree of DEG identification concordance both intra- and inter-platforms and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEG, but also reduces the power-cost tradeoff problem in the microarray field. Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.
A SIGNIFICANCE TEST FOR THE LASSO
Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert
2014-01-01
In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²_1 distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²_1 under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the l1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1). PMID:25574062
Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363
Krefeld-Schwalb, Antonia; Witte, Erich H; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H 0 -hypothesis to a statistical H 1 -verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a "pure" Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
2006-03-01
Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not...variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173...entire test sample.
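A sketch of the Breusch-Pagan check described in this record, using statsmodels on simulated data; the regression itself is a placeholder, not the thesis's contingency-fund model.

```python
# Breusch-Pagan test of the null hypothesis that regression residuals have
# constant variance, illustrated on a simulated OLS fit.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(80, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(scale=1.0, size=80)

fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
# A large p-value gives no evidence against constant residual variance.
print(f"Breusch-Pagan LM p-value = {lm_pvalue:.3f}")
```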
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
Predictive Fusion of Geophysical Waveforms using Fisher's Method, under the Alternative Hypothesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carmichael, Joshua Daniel; Nemzek, Robert James; Webster, Jeremy David
2017-05-05
This presentation tries to understand how to combine different signatures from an event or source in a defensible way. The objective was to build a digital detector that continuously combines detection statistics recording explosions to screen sources of interest from null sources.
Random variability explains apparent global clustering of large earthquakes
Michael, A.J.
2011-01-01
The occurrence of 5 Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well-described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.
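A minimal sketch of this style of test, using a Poisson dispersion statistic on hypothetical annual counts (a real catalog would first be declustered to remove aftershocks, as in the study):

```python
# Poisson dispersion test of the null hypothesis that events occur independently
# at a constant rate; excess dispersion would indicate clustering.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
annual_counts = rng.poisson(lam=1.5, size=111)   # stand-in for yearly large-earthquake counts

n = annual_counts.size
dispersion = (n - 1) * annual_counts.var(ddof=1) / annual_counts.mean()
p_value = stats.chi2.sf(dispersion, df=n - 1)    # large dispersion -> evidence of clustering
print(f"dispersion statistic = {dispersion:.1f}, p = {p_value:.3f}")
```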
Hypothesis testing for band size detection of high-dimensional banded precision matrices.
An, Baiguo; Guo, Jianhua; Liu, Yufeng
2014-06-01
Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.
ERIC Educational Resources Information Center
Stallings, William M.
In the educational research literature alpha, the a priori level of significance, and p, the a posteriori probability of obtaining a test statistic of at least a certain value when the null hypothesis is true, are often confused. Explanations for this confusion are offered. Paradoxically, alpha retains a prominent place in textbook discussions of…
Bayes Factor Approaches for Testing Interval Null Hypotheses
ERIC Educational Resources Information Center
Morey, Richard D.; Rouder, Jeffrey N.
2011-01-01
Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against each other. One issue…
Neuroimaging Research: from Null-Hypothesis Falsification to Out-Of-Sample Generalization
ERIC Educational Resources Information Center
Bzdok, Danilo; Varoquaux, Gaël; Thirion, Bertrand
2017-01-01
Brain-imaging technology has boosted the quantification of neurobiological phenomena underlying human mental operations and their disturbances. Since its inception, drawing inference on neurophysiological effects hinged on classical statistical methods, especially, the general linear model. The tens of thousands of variables per brain scan were…
Students' Understanding of Conditional Probability on Entering University
ERIC Educational Resources Information Center
Reaburn, Robyn
2013-01-01
An understanding of conditional probability is essential for students of inferential statistics as it is used in Null Hypothesis Tests. Conditional probability is also used in Bayes' theorem, in the interpretation of medical screening tests and in quality control procedures. This study examines the understanding of conditional probability of…
NASA Astrophysics Data System (ADS)
Cannas, Barbara; Fanni, Alessandra; Murari, Andrea; Pisano, Fabio; Contributors, JET
2018-02-01
In this paper, the dynamic characteristics of type-I ELM time-series from the JET tokamak, the world’s largest magnetic confinement plasma physics experiment, have been investigated. The dynamic analysis has been focused on the detection of nonlinear structure in Dα radiation time series. Firstly, the method of surrogate data has been applied to evaluate the statistical significance of the null hypothesis of static nonlinear distortion of an underlying Gaussian linear process. Several nonlinear statistics have been evaluated, such as the time delayed mutual information, the correlation dimension and the maximal Lyapunov exponent. The obtained results allow us to reject the null hypothesis, giving evidence of underlying nonlinear dynamics. Moreover, no evidence of low-dimensional chaos has been found; indeed, the analysed time series are better characterized by the power law sensitivity to initial conditions which can suggest a motion at the ‘edge of chaos’, at the border between chaotic and regular non-chaotic dynamics. This uncertainty makes it necessary to further investigate the nature of the nonlinear dynamics. For this purpose, a second surrogate test to distinguish chaotic orbits from pseudo-periodic orbits has been applied. In this case, we cannot reject the null hypothesis, which means that the ELM time series is possibly pseudo-periodic. In order to reproduce pseudo-periodic dynamical properties, a periodic state-of-the-art model, proposed to reproduce the ELM cycle, has been corrupted by a dynamical noise, obtaining time series qualitatively in agreement with experimental time series.
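A minimal sketch of one common surrogate construction (plain FFT phase randomization, which preserves the power spectrum; the paper's null of a statically distorted Gaussian linear process would typically use the amplitude-adjusted variant), with hypothetical data:

```python
# Phase-randomized surrogates: same power spectrum as the original series, but any
# nonlinear structure is destroyed. A nonlinear statistic computed on the data is then
# compared with its distribution over the surrogates.
import numpy as np

def phase_randomized_surrogate(x, rng):
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.size)
    surrogate = np.abs(spectrum) * np.exp(1j * phases)
    surrogate[0] = spectrum[0]            # keep the mean (zero-frequency) component
    if x.size % 2 == 0:
        surrogate[-1] = spectrum[-1]      # keep the Nyquist component unchanged
    return np.fft.irfft(surrogate, n=x.size)

rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 40 * np.pi, 2048)) + 0.3 * rng.standard_normal(2048)
surrogates = [phase_randomized_surrogate(signal, rng) for _ in range(99)]
```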
Diaz, Francisco J.; McDonald, Peter R.; Pinter, Abraham; Chaguturu, Rathnam
2018-01-01
Biomolecular screening research frequently searches for the chemical compounds that are most likely to make a biochemical or cell-based assay system produce a strong continuous response. Several doses are tested with each compound and it is assumed that, if there is a dose-response relationship, the relationship follows a monotonic curve, usually a version of the median-effect equation. However, the null hypothesis of no relationship cannot be statistically tested using this equation. We used a linearized version of this equation to define a measure of pharmacological effect size, and use this measure to rank the investigated compounds in order of their overall capability to produce strong responses. The null hypothesis that none of the examined doses of a particular compound produced a strong response can be tested with this approach. The proposed approach is based on a new statistical model of the important concept of response detection limit, a concept that is usually neglected in the analysis of dose-response data with continuous responses. The methodology is illustrated with data from a study searching for compounds that neutralize the infection by a human immunodeficiency virus of brain glioblastoma cells. PMID:24905187
Testing the null hypothesis: the forgotten legacy of Karl Popper?
Wilkinson, Mick
2013-01-01
Testing of the null hypothesis is a fundamental aspect of the scientific method and has its basis in the falsification theory of Karl Popper. Null hypothesis testing makes use of deductive reasoning to ensure that the truth of conclusions is irrefutable. In contrast, attempting to demonstrate the new facts on the basis of testing the experimental or research hypothesis makes use of inductive reasoning and is prone to the problem of the Uniformity of Nature assumption described by David Hume in the eighteenth century. Despite this issue and the well documented solution provided by Popper's falsification theory, the majority of publications are still written such that they suggest the research hypothesis is being tested. This is contrary to accepted scientific convention and possibly highlights a poor understanding of the application of conventional significance-based data analysis approaches. Our work should remain driven by conjecture and attempted falsification such that it is always the null hypothesis that is tested. The write up of our studies should make it clear that we are indeed testing the null hypothesis and conforming to the established and accepted philosophical conventions of the scientific method.
Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha
2017-01-01
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ² distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Use of Pearson's Chi-Square for Testing Equality of Percentile Profiles across Multiple Populations.
Johnson, William D; Beyl, Robbie A; Burton, Jeffrey H; Johnson, Callie M; Romer, Jacob E; Zhang, Lei
2015-08-01
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
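For context, the special case being generalized here, the median test for equality of the 50th percentile, can be run directly in scipy; the data below are simulated skewed samples, not those of the study:

```python
# Mood's median test: the null hypothesis is that the two groups share a common median.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # skewed distributions
group_b = rng.lognormal(mean=0.2, sigma=1.0, size=500)

statistic, p_value, grand_median, table = stats.median_test(group_a, group_b)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}, grand median = {grand_median:.2f}")
```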
Three common misuses of P values
Kim, Jeehyoung; Bang, Heejung
2016-01-01
“Significance” has a specific meaning in science, especially in statistics. The p-value as a measure of statistical significance (evidence against a null hypothesis) has long been used in statistical inference and has served as a key player in science and research. Despite its clear mathematical definition and original purpose, and being just one of the many statistical measures/criteria, its role has been over-emphasized along with hypothesis testing. Observing and reflecting on this practice, some journals have attempted to ban reporting of p-values, and the American Statistical Association (for the first time in its 177 year old history) released a statement on p-values in 2016. In this article, we intend to review the correct definition of the p-value as well as its common misuses, in the hope that our article is useful to clinicians and researchers. PMID:27695640
The Epistemology of Mathematical and Statistical Modeling: A Quiet Methodological Revolution
ERIC Educational Resources Information Center
Rodgers, Joseph Lee
2010-01-01
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the…
Spatial autocorrelation in growth of undisturbed natural pine stands across Georgia
Raymond L. Czaplewski; Robin M. Reich; William A. Bechtold
1994-01-01
Moran's I statistic measures the spatial autocorrelation in a random variable measured at discrete locations in space. Permutation procedures test the null hypothesis that the observed Moran's I value is no greater than that expected by chance. The spatial autocorrelation of gross basal area increment is analyzed for undisturbed, naturally regenerated stands...
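A minimal sketch of a Moran's I permutation test on a hypothetical lattice with rook-contiguity weights (not the forest-inventory data of the study):

```python
# Permutation test for Moran's I: the null is that the observed spatial
# autocorrelation is no greater than expected by chance.
import numpy as np

def morans_i(values, weights):
    z = values - values.mean()
    return values.size * np.sum(weights * np.outer(z, z)) / (weights.sum() * np.sum(z ** 2))

rng = np.random.default_rng(5)
side = 10
grid = rng.normal(size=(side, side))

# Rook-contiguity weight matrix for the flattened grid
n = side * side
W = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                W[i, rr * side + cc] = 1.0

observed = morans_i(grid.ravel(), W)
perm = np.array([morans_i(rng.permutation(grid.ravel()), W) for _ in range(999)])
p_value = (1 + np.sum(perm >= observed)) / (1 + perm.size)   # one-sided permutation p-value
print(f"Moran's I = {observed:.3f}, permutation p = {p_value:.3f}")
```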
2014-01-01
Background Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results For a more valid assessment of results from a randomised clinical trial we propose the following five-steps: (1) report the confidence intervals and the exact P-values; (2) report Bayes factor for the primary outcome, being the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to number of outcome comparisons; and (5) assess clinical significance of the trial results. Conclusions If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900
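A minimal sketch of step (2), assuming the primary outcome is summarized by a z-statistic; the numbers are hypothetical:

```python
# Approximate Bayes factor: likelihood of the observed z under a 'null' effect
# divided by its likelihood under the effect assumed in the sample-size calculation.
from scipy import stats

z_observed = 2.1   # hypothetical observed z-statistic for the primary outcome
z_design = 2.8     # z expected if the effect assumed in the power calculation were true

likelihood_null = stats.norm.pdf(z_observed, loc=0.0)
likelihood_design = stats.norm.pdf(z_observed, loc=z_design)
bayes_factor = likelihood_null / likelihood_design
print(f"Bayes factor (null vs design effect) = {bayes_factor:.2f}")
# Values well below 1 favour the hypothesised intervention effect over the null.
```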
Perneger, Thomas V; Combescure, Christophe
2017-07-01
Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) high proportion of P-values <0.001, especially in observational studies, (2) excess of P-values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.
Assessment of resampling methods for causality testing: A note on the US inflation behavior
Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees
2017-01-01
Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms. PMID:28708870
NASA Astrophysics Data System (ADS)
Al-Sarrani, Nauaf
The purpose of this study was to obtain Science faculty concerns and professional development needs to adopt blended learning in their teaching at Taibah University. To answer these two research questions the survey instrument was designed to collect quantitative and qualitative data from close-ended and open-ended questions. The participants' general characteristics were first presented, then the quantitative measures were presented as the results of the null hypotheses. The data analysis for research question one revealed a statistically significant difference in the participants' concerns in adopting BL by their gender sig = .0015. The significances were found in stages one (sig = .000) and stage five (sig = .006) for female faculty. Therefore, null hypothesis 1.1 was rejected (There are no statistically significant differences between science faculty's gender and their concerns in adopting BL). The data analysis indicated also that there were no relationships between science faculty's age, academic rank, nationality, country of graduation and years of teaching experience and their concerns in adopting BL in their teaching, so the null hypotheses 1.2-7 were accepted (There are no statistically significant differences between Science faculty's age and their concerns in adopting BL, there are no statistically significant differences between Science faculty's academic rank and their concerns in adopting BL, there are no statistically significant differences between Science faculty's nationality and their concerns in adopting BL, there are no statistically significant differences between Science faculty's content area and their concerns in adopting BL, there are no statistically significant differences between Science faculty's country of graduation and their concerns in adopting BL and there are no statistically significant differences between Science faculty's years of teaching experience and their concerns in adopting BL). The data analyses for research question two revealed that there was a statistically significant difference between science faculty's use of technology in teaching by department and their attitudes towards technology integration in the Science curriculum. Lambda MANOVA test result was sig =.019 at the alpha = .05 level. Follow up ANOVA result indicated that Chemistry department was significant in the use of computer-based technology (sig =.049) and instructional technology use (sig =.041). Therefore, null hypothesis 2.1 was rejected (There are no statistically significant differences between science faculty's attitudes towards technology integration in the Science curriculum and faculty's use of technology in teaching by department). The data also revealed that there was no statistically significant difference (p<.05) between science faculty's use of technology in teaching by department and their instructional technology use on pedagogy. Therefore, null hypothesis 2.2 was accepted (There are no statistically significant differences between science faculty's perceptions of the effects of faculty IT use on pedagogy and faculty's use of technology in teaching by department). The data also revealed that there was a statistically significant difference between science faculty's use of technology in teaching by department and their professional development needs in adopting BL. Lambda MANOVA test result was .007 at the alpha = .05 level. 
The follow up ANOVA results showed that the value of significance of Science faculty's professional development needs for adopting BL was smaller than .05 in the Chemistry department with sig =.001 in instructional technology use. Therefore, null hypothesis 2.3 was rejected (There are no statistically significant differences between Science faculty's perceptions of technology professional development needs and faculty's use of technology in teaching by department). Qualitative measures included analyzing data based on answers to three open-ended questions, numbers thirty-six, seventy-four, and seventy-five. These three questions were on blended learning concerns comments (question 36, which had 10 units), professional development activities, support, or incentive requested (question 74, which had 28 units), and the most important professional development activities, support, or incentive (question 75, which had 37 units). These questions yielded 75 units, 23 categories and 8 themes that triangulated with the quantitative data. These 8 themes were then combined to obtain overall themes for all qualitative questions in the study. The two most important themes were "Professional development" with three categories; Professional development through workshops (10 units), Workshops (10 units), Professional development (5 units) and the second overall theme was "Technical support" with two categories: Internet connectivity (4 units), and Technical support (4 units). Finally, based on quantitative and qualitative data, the summary, conclusions, and recommendations for Taibah University regarding faculty adoption of BL in teaching were presented. The recommendations for future studies focused on Science faculty Level of Use and technology use in Saudi universities.
A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis
Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.
2014-01-01
Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489
A more powerful test based on ratio distribution for retention noninferiority hypothesis.
Deng, Ling; Chen, Gang
2013-03-11
Rothmann et al. (2003) proposed a method for the statistical inference of a fraction retention noninferiority (NI) hypothesis. A fraction retention hypothesis is defined as a ratio of the new treatment effect versus the control effect in the context of a time to event endpoint. One of the major concerns using this method in the design of an NI trial is that with a limited sample size, the power of the study is usually very low. This makes an NI trial not applicable particularly when using a time to event endpoint. To improve power, Wang et al. (2006) proposed a ratio test based on asymptotic normality theory. Under a strong assumption (equal variance of the NI test statistic under null and alternative hypotheses), the sample size using Wang's test was much smaller than that using Rothmann's test. However, in practice, the assumption of equal variance is generally questionable for an NI trial design. This assumption is removed in the ratio test proposed in this article, which is derived directly from a Cauchy-like ratio distribution. In addition, using this method, the fundamental assumption used in Rothmann's test, that the observed control effect is always positive, that is, the observed hazard ratio for placebo over the control is greater than 1, is no longer necessary. Without assuming equal variance under null and alternative hypotheses, the sample size required for an NI trial can be significantly reduced if using the proposed ratio test for a fraction retention NI hypothesis.
Yang, Yang; DeGruttola, Victor
2016-01-01
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584
A shift from significance test to hypothesis test through power analysis in medical research.
Singh, G
2006-01-01
Medical research literature, until recently, exhibited substantial dominance of Fisher's significance test approach to statistical inference, concentrating on the probability of type I error, over the Neyman-Pearson hypothesis test approach, which considers both the probability of type I and of type II error. Fisher's approach dichotomises results into significant or not significant with a P value. The Neyman-Pearson approach talks of acceptance or rejection of the null hypothesis. Based on the same theory, these two approaches deal with the same objective and conclude in their own way. The advancement in computing techniques and the availability of statistical software have resulted in increasing application of power calculations in medical research, and thereby in reporting the results of significance tests in the light of the power of the test as well. The significance test approach, when it incorporates power analysis, contains the essence of the hypothesis test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis test procedure.
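For context, the kind of power calculation referred to can be done with statsmodels; the effect size, alpha, and power below are illustrative assumptions:

```python
# Sample size per group needed to detect a standardized effect of 0.5 with 80% power
# at alpha = 0.05 in a two-sided two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05,
                                    alternative='two-sided')
print(f"required sample size per group: {n_per_group:.1f}")   # about 64
```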
ERIC Educational Resources Information Center
Hoekstra, Rink; Johnson, Addie; Kiers, Henk A. L.
2012-01-01
The use of confidence intervals (CIs) as an addition or as an alternative to null hypothesis significance testing (NHST) has been promoted as a means to make researchers more aware of the uncertainty that is inherent in statistical inference. Little is known, however, about whether presenting results via CIs affects how readers judge the…
ERIC Educational Resources Information Center
Dunkel, Curtis S.; Harbke, Colin R.; Papini, Dennis R.
2009-01-01
The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce…
A Ratio Test of Interrater Agreement with High Specificity
ERIC Educational Resources Information Center
Cousineau, Denis; Laurencelle, Louis
2015-01-01
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
One-way ANOVA based on interval information
NASA Astrophysics Data System (ADS)
Hesamian, Gholamreza
2016-08-01
This paper deals with extending the one-way analysis of variance (ANOVA) to the case where the observed data are represented by closed intervals rather than real numbers. In this approach, first a notion of interval random variable is introduced. Especially, a normal distribution with interval parameters is introduced to investigate hypotheses about the equality of interval means or test the homogeneity of interval variances assumption. Moreover, the least significant difference (LSD method) for investigating multiple comparison of interval means is developed when the null hypothesis about the equality of means is rejected. Then, at a given interval significance level, an index is applied to compare the interval test statistic and the related interval critical value as a criterion to accept or reject the null interval hypothesis of interest. Finally, the method of decision-making leads to some degrees to accept or reject the interval hypotheses. An applied example will be used to show the performance of this method.
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
Developing the research hypothesis.
Toledo, Alexander H; Flikkema, Robert; Toledo-Pereyra, Luis H
2011-01-01
The research hypothesis is needed for a sound and well-developed research study. The research hypothesis contributes to the solution of the research problem. Types of research hypotheses include inductive and deductive, directional and non-directional, and null and alternative hypotheses. Rejecting the null hypothesis and accepting the alternative hypothesis is the basis for building a good research study. This work reviews the most important aspects of organizing and establishing an efficient and complete hypothesis.
Detecting anomalies in CMB maps: a new method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com
2015-10-01
Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we 'hand-pick' the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_{ℓm}'s with random coefficients, and they test the null hypothesis that the a_{ℓm}'s are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the 'coherence' of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are thus much more generic than those that have been hitherto considered in literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.
An asymptotic analysis of the logrank test.
Strawderman, R L
1997-01-01
Asymptotic expansions for the null distribution of the logrank statistic and its distribution under local proportional hazards alternatives are developed in the case of iid observations. The results, which are derived from the work of Gu (1992) and Taniguchi (1992), are easy to interpret, and provide some theoretical justification for many behavioral characteristics of the logrank test that have been previously observed in simulation studies. We focus primarily upon (i) the inadequacy of the usual normal approximation under treatment group imbalance; and, (ii) the effects of treatment group imbalance on power and sample size calculations. A simple transformation of the logrank statistic is also derived based on results in Konishi (1991) and is found to substantially improve the standard normal approximation to its distribution under the null hypothesis of no survival difference when there is treatment group imbalance.
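For reference, a standard logrank test in the imbalanced-group setting discussed can be run with the lifelines package (this only illustrates the test, not the paper's asymptotic expansions; the simulated data are hypothetical):

```python
# Logrank test with deliberately imbalanced group sizes, the setting in which the
# abstract warns that the usual normal approximation can be inadequate.
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(6)
durations_a = rng.exponential(scale=10.0, size=200)   # large group
durations_b = rng.exponential(scale=10.0, size=20)    # small group (imbalance)
events_a = rng.uniform(size=200) < 0.8                # roughly 20% censoring
events_b = rng.uniform(size=20) < 0.8

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.test_statistic, result.p_value)
```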
After p Values: The New Statistics for Undergraduate Neuroscience Education.
Calin-Jageman, Robert J
2017-01-01
Statistical inference is a methodological cornerstone for neuroscience education. For many years this has meant inculcating neuroscience majors into null hypothesis significance testing with p values. There is increasing concern, however, about the pervasive misuse of p values. It is time to start planning statistics curricula for neuroscience majors that replaces or de-emphasizes p values. One promising alternative approach is what Cumming has dubbed the "New Statistics", an approach that emphasizes effect sizes, confidence intervals, meta-analysis, and open science. I give an example of the New Statistics in action and describe some of the key benefits of adopting this approach in neuroscience education.
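A minimal sketch of the estimation-oriented reporting advocated here, computing an effect size and a confidence interval for a mean difference on simulated data:

```python
# Report an effect size (Cohen's d) and a 95% CI for the mean difference,
# rather than only a p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(loc=50.0, scale=10.0, size=40)
treated = rng.normal(loc=55.0, scale=10.0, size=40)

n1, n2 = control.size, treated.size
diff = treated.mean() - control.mean()
pooled_var = ((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2)
cohens_d = diff / np.sqrt(pooled_var)                       # standardized effect size
se = np.sqrt(pooled_var) * np.sqrt(1 / n1 + 1 / n2)
half_width = stats.t.ppf(0.975, df=n1 + n2 - 2) * se
print(f"difference = {diff:.1f}, d = {cohens_d:.2f}, "
      f"95% CI = [{diff - half_width:.1f}, {diff + half_width:.1f}]")
```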
The potential for increased power from combining P-values testing the same hypothesis.
Ganju, Jitendra; Julie Ma, Guoguang
2017-02-01
The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect but we do not know which one is the most powerful. Rather than relying on a single p-value, combining p-values from prespecified multiple test statistics can be used for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations when the model includes treatment by covariate interaction.
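For reference, two of the combining functions mentioned are available in scipy (note that this library routine assumes independent p-values; the randomization-based calibration described in the abstract is not shown):

```python
# Fisher's combination and the minimum p-value (Tippett's method) for a set of p-values.
from scipy.stats import combine_pvalues

p_values = [0.08, 0.12, 0.03]   # hypothetical p-values from prespecified test statistics
fisher_stat, fisher_p = combine_pvalues(p_values, method='fisher')
tippett_stat, tippett_p = combine_pvalues(p_values, method='tippett')
print(f"Fisher: p = {fisher_p:.4f};  minimum-p (Tippett): p = {tippett_p:.4f}")
```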
Taroni, F; Biedermann, A; Bozza, S
2016-02-01
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons about their advantages and drawbacks are widely available and continue to span over large debates in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on the so called p-values, it is of interest to expose the discussion of this journal's decision within the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research
Amrhein, Valentin; Korner-Nievergelt, Fränzi; Roth, Tobias
2017-01-01
The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment. PMID:28698825
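A quick simulation of the 'conflicting studies' claim above, under the assumption that two independent replications each have 80% power for a true effect:

```python
# How often is one of two independent replications significant and the other not,
# when both have 80% power? The expected answer is 2 * 0.8 * 0.2 = 0.32.
import numpy as np

rng = np.random.default_rng(8)
power, reps = 0.80, 100_000
sig_a = rng.uniform(size=reps) < power
sig_b = rng.uniform(size=reps) < power
print(f"proportion of conflicting pairs: {np.mean(sig_a != sig_b):.3f}")
```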
ERIC Educational Resources Information Center
Harris, Cydnie Ellen Smith
2012-01-01
The effect of the leadership style of the secondary school principal on student achievement in select public schools in Louisiana was examined in this study. The null hypothesis was that there was no statistically significant difference between principal leadership style and student academic achievement. The researcher submitted the LEAD-Self…
A Hands-On Exercise Improves Understanding of the Standard Error of the Mean
ERIC Educational Resources Information Center
Ryan, Robert S.
2006-01-01
One of the most difficult concepts for statistics students is the standard error of the mean. To improve understanding of this concept, 1 group of students used a hands-on procedure to sample from small populations representing either a true or false null hypothesis. The distribution of 120 sample means (n = 3) from each population had standard…
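A minimal translation of the hands-on exercise into a simulation, using a hypothetical population and 120 samples of size n = 3 as in the abstract:

```python
# The spread of many sample means (n = 3) approaches sigma / sqrt(n),
# the standard error of the mean.
import numpy as np

rng = np.random.default_rng(9)
population = rng.normal(loc=100.0, scale=15.0, size=1_000)   # hypothetical population
sample_means = [rng.choice(population, size=3, replace=False).mean() for _ in range(120)]
print("SD of 120 sample means:", np.std(sample_means, ddof=1))
print("sigma / sqrt(n)       :", population.std(ddof=1) / np.sqrt(3))
```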
NASA Astrophysics Data System (ADS)
Cianciara, Aleksander
2016-09-01
The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to an appropriate filtering. The study has been conducted using the method of statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data, because in these methods measurement data are not used directly, but rather their statistical distributions, e.g., in the method based on hazard analysis, or in the one that uses maximum value statistics.
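A minimal sketch of the fitting-and-testing step described, on simulated inter-event times (with parameters estimated from the same data, the standard KS p-value is only approximate):

```python
# Fit a Weibull model to inter-event times and check the fit with a
# Kolmogorov-Smirnov statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
inter_event_times = rng.weibull(a=1.3, size=500) * 2.0   # stand-in for filtered emission data

shape, loc, scale = stats.weibull_min.fit(inter_event_times, floc=0)
ks_stat, p_value = stats.kstest(inter_event_times, 'weibull_min', args=(shape, loc, scale))
print(f"shape = {shape:.2f}, scale = {scale:.2f}, KS = {ks_stat:.3f}, p = {p_value:.3f}")
```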
Deblauwe, Vincent; Kennel, Pol; Couteron, Pierre
2012-01-01
Background Independence between observations is a standard prerequisite of traditional statistical tests of association. This condition is, however, violated when autocorrelation is present within the data. In the case of variables that are regularly sampled in space (i.e. lattice data or images), such as those provided by remote-sensing or geographical databases, this problem is particularly acute. Because analytic derivation of the null probability distribution of the test statistic (e.g. Pearson's r) is not always possible when autocorrelation is present, we propose instead the use of a Monte Carlo simulation with surrogate data. Methodology/Principal Findings The null hypothesis that two observed mapped variables are the result of independent pattern generating processes is tested here by generating sets of random image data while preserving the autocorrelation function of the original images. Surrogates are generated by matching the dual-tree complex wavelet spectra (and hence the autocorrelation functions) of white noise images with the spectra of the original images. The generated images can then be used to build the probability distribution function of any statistic of association under the null hypothesis. We demonstrate the validity of a statistical test of association based on these surrogates with both actual and synthetic data and compare it with a corrected parametric test and three existing methods that generate surrogates (randomization, random rotations and shifts, and iterative amplitude adjusted Fourier transform). Type I error control was excellent, even with strong and long-range autocorrelation, which is not the case for alternative methods. Conclusions/Significance The wavelet-based surrogates are particularly appropriate in cases where autocorrelation appears at all scales or is direction-dependent (anisotropy). We explore the potential of the method for association tests involving a lattice of binary data and discuss its potential for validation of species distribution models. An implementation of the method in Java for the generation of wavelet-based surrogates is available online as supporting material. PMID:23144961
An alternative approach to confidence interval estimation for the win ratio statistic.
Luo, Xiaodong; Tian, Hong; Mohanty, Surya; Tsai, Wei Yann
2015-03-01
Pocock et al. (2012, European Heart Journal 33, 176-182) proposed a win ratio approach to analyzing composite endpoints comprised of outcomes with different clinical priorities. In this article, we establish a statistical framework for this approach. We derive the null hypothesis and propose a closed-form variance estimator for the win ratio statistic in all pairwise matching situation. Our simulation study shows that the proposed variance estimator performs well regardless of the magnitude of treatment effect size and the type of the joint distribution of the outcomes. © 2014, The International Biometric Society.
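A minimal sketch of the all-pairwise win ratio, assuming a single continuous outcome where larger is better and ignoring ties (the actual method handles hierarchical composite endpoints):

```python
# Win ratio: wins divided by losses over every treatment-control pair.
import numpy as np

rng = np.random.default_rng(11)
treatment = rng.normal(loc=1.2, scale=1.0, size=60)
control = rng.normal(loc=1.0, scale=1.0, size=60)

diffs = treatment[:, None] - control[None, :]   # every treatment-control pair
wins, losses = np.sum(diffs > 0), np.sum(diffs < 0)
print(f"wins = {wins}, losses = {losses}, win ratio = {wins / losses:.2f}")
```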
A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Kravtsov, Sergey; Saunders, Rolando Olivas
2011-02-01
In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
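A minimal sketch of the commonsense check described, a chi-square goodness-of-fit test of hypothetical weekday counts against a uniform expectation:

```python
# Chi-square goodness-of-fit test: do event counts differ by weekday?
from scipy.stats import chisquare

counts_by_weekday = [142, 138, 151, 147, 139, 150, 145]    # hypothetical Mon..Sun counts
statistic, p_value = chisquare(counts_by_weekday)          # uniform expected frequencies
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")  # a large p suggests no weekday effect
```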
Goovaerts, Pierre; Jacquez, Geoffrey M
2004-01-01
Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
ERIC Educational Resources Information Center
Martuza, Victor R.; Engel, John D.
Results from classical power analysis (Brewer, 1972) suggest that a researcher should not set a=p (when p is less than a) in a posteriori fashion when a study yields statistically significant results because of a resulting decrease in power. The purpose of the present report is to use Bayesian theory in examining the validity of this…
Capturing the Full Potential of the Synthetic Theater Operations Research Model (STORM)
2014-09-01
Full-text snippet only: the thesis cites Wackerly, Mendenhall III, and Scheaffer (2008), Mathematical Statistics with Applications (Belmont, CA: Brooks/Cole), for the level of significance at which the observed data indicate that the null hypothesis should be rejected.
Chiba, Yasutaka
2017-09-01
Fisher's exact test is commonly used to compare two groups when the outcome is binary in randomized trials. In the context of causal inference, this test explores the sharp causal null hypothesis (i.e. the causal effect of treatment is the same for all subjects), but not the weak causal null hypothesis (i.e. the causal risks are the same in the two groups). Therefore, in general, rejection of the null hypothesis by Fisher's exact test does not mean that the causal risk difference is not zero. Recently, Chiba (Journal of Biometrics and Biostatistics 2015; 6: 244) developed a new exact test for the weak causal null hypothesis when the outcome is binary in randomized trials; the new test is not based on any large sample theory and does not require any assumption. In this paper, we extend the new test; we create a version of the test applicable to a stratified analysis. The stratified exact test that we propose is general in nature and can be used in several approaches toward the estimation of treatment effects after adjusting for stratification factors. The stratified Fisher's exact test of Jung (Biometrical Journal 2014; 56: 129-140) tests the sharp causal null hypothesis. This test applies a crude estimator of the treatment effect and can be regarded as a special case of our proposed exact test. Our proposed stratified exact test can be straightforwardly extended to analysis of noninferiority trials and to construct the associated confidence interval. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A single test for rejecting the null hypothesis in subgroups and in the overall sample.
Lin, Yunzhi; Zhou, Kefei; Ganju, Jitendra
2017-01-01
In clinical trials, some patient subgroups are likely to demonstrate larger effect sizes than other subgroups. For example, the effect size, or informally the benefit with treatment, is often greater in patients with a moderate condition of a disease than in those with a mild condition. A limitation of the usual method of analysis is that it does not incorporate this ordering of effect size by patient subgroup. We propose a test statistic which supplements the conventional test by including this information and simultaneously tests the null hypothesis in pre-specified subgroups and in the overall sample. It results in more power than the conventional test when the differences in effect sizes across subgroups are at least moderately large; otherwise it loses power. The method involves combining p-values from models fit to pre-specified subgroups and the overall sample in a manner that assigns greater weight to subgroups in which a larger effect size is expected. Results are presented for randomized trials with two and three subgroups.
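The abstract does not give the exact combination rule, so the sketch below should be read as one plausible instance of the idea rather than the authors' statistic: a weighted inverse-normal (Stouffer-type) combination of one-sided p-values from the overall sample and the pre-specified subgroups, with larger weights where a larger effect size is expected. The weights and p-values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def weighted_stouffer(p_values, weights):
    """Weighted inverse-normal (Stouffer) combination of one-sided
    p-values; larger weights give a component more influence."""
    p = np.asarray(p_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = norm.isf(p)                          # one-sided p -> z-score
    z_comb = np.sum(w * z) / np.sqrt(np.sum(w ** 2))
    return norm.sf(z_comb)                   # combined one-sided p-value

# hypothetical p-values: overall sample, moderate subgroup, mild subgroup
p_vals = [0.04, 0.01, 0.30]
weights = [1.0, 0.8, 0.4]    # more weight where a larger effect is expected
print(weighted_stouffer(p_vals, weights))
```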
Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.
Sayyari, Erfan; Mirarab, Siavash
2018-02-28
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
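The intuition behind the test can be sketched with a toy calculation: under the multi-species coalescent, a zero-length (polytomous) branch implies that the three possible quartet topologies around that branch occur with equal frequency, so a goodness-of-fit test against the 1/3-1/3-1/3 expectation rejects the polytomy null when one topology dominates. The snippet below is only that intuition in code, with hypothetical counts; it is not the ASTRAL implementation.

```python
from scipy.stats import chisquare

# hypothetical counts of the three quartet topologies around one branch,
# tallied across gene trees
counts = [412, 285, 303]

# under a true polytomy (null), each topology is expected in 1/3 of gene trees
n = sum(counts)
stat, p_value = chisquare(counts, f_exp=[n / 3] * 3)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
# a small p-value rejects the polytomy null for this branch
```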
Accuracy of maxillary positioning after standard and inverted orthognathic sequencing.
Ritto, Fabio G; Ritto, Thiago G; Ribeiro, Danilo Passeado; Medeiros, Paulo José; de Moraes, Márcio
2014-05-01
This study aimed to compare the accuracy of maxillary positioning after bimaxillary orthognathic surgery, using 2 sequences. A total of 80 cephalograms (40 preoperative and 40 postoperative) from 40 patients were analyzed. Group 1 included radiographs of patients submitted to the conventional sequence, whereas group 2 patients were submitted to the inverted sequence. The final position of the maxillary central incisor was obtained after vertical and horizontal measurements of the tracings, and it was compared with what had been planned. The null hypothesis, which stated that there would be no difference between the groups, was tested. After applying the Welch t test for comparison of mean differences between the desired and achieved maxillary position, considering a statistical significance level of 5% and a 2-tailed test, the null hypothesis was not rejected (P > .05). Thus, there was no difference in the accuracy of maxillary positioning between groups. Conventional and inverted sequencing proved to be reliable in positioning the maxilla after Le Fort I osteotomy in bimaxillary orthognathic surgeries. Copyright © 2014 Elsevier Inc. All rights reserved.
Calculating p-values and their significances with the Energy Test for large datasets
NASA Astrophysics Data System (ADS)
Barter, W.; Burr, C.; Parkes, C.
2018-04-01
The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
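As background to this scaling approach, the null distribution of an energy-type statistic can always be built empirically for modest sample sizes. The sketch below does exactly that with a permutation test using SciPy's one-dimensional energy_distance as the test statistic; it illustrates the baseline the paper improves on rather than the scaling method itself, and the samples are synthetic.

```python
import numpy as np
from scipy.stats import energy_distance

def permutation_energy_test(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the null hypothesis that samples a and b
    are drawn from the same underlying (1D) population."""
    rng = np.random.default_rng(seed)
    t_obs = energy_distance(a, b)
    pooled = np.concatenate([a, b])
    t_null = np.empty(n_perm)
    for i in range(n_perm):
        rng.shuffle(pooled)
        t_null[i] = energy_distance(pooled[:len(a)], pooled[len(a):])
    p = (1 + np.sum(t_null >= t_obs)) / (n_perm + 1)
    return t_obs, p

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 300)
b = rng.normal(0.2, 1.0, 300)   # slightly shifted second sample
print(permutation_energy_test(a, b))
```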
Silva, Ivair R
2018-01-15
Type I error probability spending functions are commonly used for designing sequential analyses of binomial data in clinical trials, and their use is quickly emerging in near-continuous sequential analysis for post-market drug and vaccine safety surveillance. It is well known that, in clinical trials, it is important to minimize the expected sample size when the null hypothesis is not rejected; in post-market drug and vaccine safety surveillance, this is not the case. In post-market safety surveillance, especially when the surveillance involves identification of potential signals, the meaningful statistical performance measure to be minimized is the expected sample size when the null hypothesis is rejected. The present paper shows that, instead of the convex Type I error spending shape conventionally used in clinical trials, a concave shape is more appropriate for post-market drug and vaccine safety surveillance. This is shown for both continuous and group sequential analysis. Copyright © 2017 John Wiley & Sons, Ltd.
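The convex-versus-concave distinction can be made concrete with the simple power-family spending function alpha(t) = alpha * t^rho, where t is the information fraction: rho > 1 gives a convex shape and rho < 1 a concave one. This family is only illustrative and not necessarily the set of functions studied in the paper.

```python
import numpy as np

def power_family_spending(alpha, rho, information_fractions):
    """Cumulative Type I error spent at each interim look, alpha * t**rho:
    rho > 1 gives a convex shape (spends little error early, as is common
    in clinical trials), rho < 1 a concave shape (spends more error early,
    favouring early signal detection)."""
    t = np.asarray(information_fractions, dtype=float)
    return alpha * t ** rho

looks = np.linspace(0.2, 1.0, 5)                 # five equally spaced looks
print(power_family_spending(0.05, 3.0, looks))   # convex
print(power_family_spending(0.05, 0.5, looks))   # concave
```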
Tressoldi, Patrizio E.
2011-01-01
Starting from the famous phrase “extraordinary claims require extraordinary evidence,” we will present the evidence supporting the concept that human visual perception may have non-local properties, in other words, that it may operate beyond the space and time constraints of sensory organs, in order to discuss which criteria can be used to define evidence as extraordinary. This evidence has been obtained from seven databases related to six different protocols used to test the reality and the functioning of non-local perception, analyzed using both a frequentist and a new Bayesian meta-analysis statistical procedure. According to the frequentist meta-analysis, the null hypothesis can be rejected for all six protocols, even though the effect sizes range from 0.007 to 0.28. According to the Bayesian meta-analysis, the Bayes factors provide strong evidence to support the alternative hypothesis (H1) over the null hypothesis (H0), but only for three out of the six protocols. We will discuss whether quantitative psychology can contribute to defining the criteria for the acceptance of new scientific ideas in order to avoid the inconclusive controversies between supporters and opponents. PMID:21713069
Gaus, Wilhelm
2014-09-02
The US National Toxicology Program (NTP) is assessed from a statistician's perspective. In the NTP program, groups of rodents are fed for a certain period of time with different doses of the substance under investigation. The animals are then sacrificed and all organs are examined pathologically. Such an investigation permits many statistical tests. Technical Report TR 578 on Ginkgo biloba is used as an example. More than 4800 statistical tests are possible with the investigations performed. A thought experiment shows that we should therefore expect >240 falsely significant tests; in actuality, 209 significant pathological findings were reported. The readers of Toxicology Letters should carefully distinguish between confirmative and explorative statistics. A confirmative interpretation of a significant test rejects the null hypothesis and delivers "statistical proof". It is only allowed if (i) a precise hypothesis was established independently of the data used for the test and (ii) the computed p-values are adjusted for multiple testing if more than one test was performed. Otherwise, the interpretation is explorative and merely generates a hypothesis. We conclude that NTP reports - including TR 578 on Ginkgo biloba - deliver explorative statistics, i.e. they generate hypotheses, but do not prove them. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
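The multiplicity adjustment the author calls for is straightforward to apply in practice; a minimal sketch using statsmodels, with hypothetical p-values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# hypothetical p-values from many pathology comparisons
rng = np.random.default_rng(0)
p_values = rng.uniform(0, 1, 4800)                    # all null by construction
p_values[:5] = [0.0001, 0.0004, 0.002, 0.01, 0.04]    # a few "findings"

for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, "rejections:", reject.sum())

# unadjusted testing at 0.05 flags roughly 5% of the 4800 tests
print("unadjusted rejections:", (p_values < 0.05).sum())
```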
Statistical analysis of particle trajectories in living cells
NASA Astrophysics Data System (ADS)
Briane, Vincent; Kervrann, Charles; Vimond, Myriam
2018-06-01
Recent advances in molecular biology and fluorescence microscopy imaging have made possible the inference of the dynamics of molecules in living cells. Such inference allows us to understand and determine the organization and function of the cell. The trajectories of particles (e.g., biomolecules) in living cells, computed with the help of object tracking methods, can be modeled with diffusion processes. Three types of diffusion are considered: (i) free diffusion, (ii) subdiffusion, and (iii) superdiffusion. The mean-square displacement (MSD) is generally used to discriminate the three types of particle dynamics. We propose here a nonparametric three-decision test as an alternative to the MSD method. The rejection of the null hypothesis, i.e., free diffusion, is accompanied by claims of the direction of the alternative (subdiffusion or superdiffusion). We study the asymptotic behavior of the test statistic under the null hypothesis and under parametric alternatives which are currently considered in the biophysics literature. In addition, we adapt the multiple-testing procedure of Benjamini and Hochberg to fit with the three-decision-test setting, in order to apply the test procedure to a collection of independent trajectories. The performance of our procedure is much better than the MSD method as confirmed by Monte Carlo experiments. The method is demonstrated on real data sets corresponding to protein dynamics observed in fluorescence microscopy.
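For readers unfamiliar with the MSD baseline that the proposed test is compared against, here is a minimal sketch: the time-averaged MSD of a trajectory and a log-log fit of its anomalous exponent (roughly 1 for free diffusion, below 1 for subdiffusion, above 1 for superdiffusion). The trajectory is synthetic and the code is not the authors' three-decision procedure.

```python
import numpy as np

def time_averaged_msd(track, max_lag):
    """Time-averaged mean-square displacement of an (n, 2) trajectory."""
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd

def anomalous_exponent(msd, dt):
    """Slope of log(MSD) vs log(lag time): ~1 free diffusion,
    <1 subdiffusion, >1 superdiffusion."""
    lags = dt * np.arange(1, len(msd) + 1)
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

# toy Brownian (free-diffusion) trajectory
rng = np.random.default_rng(0)
track = np.cumsum(rng.normal(scale=0.1, size=(500, 2)), axis=0)
msd = time_averaged_msd(track, max_lag=50)
print(anomalous_exponent(msd, dt=0.05))   # expected to be close to 1
```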
Krypotos, Angelos-Miltiadis; Klugkist, Irene; Engelhard, Iris M.
2017-01-01
Threat conditioning procedures have allowed the experimental investigation of the pathogenesis of Post-Traumatic Stress Disorder. The findings of these procedures have also provided stable foundations for the development of relevant intervention programs (e.g. exposure therapy). Statistical inference of threat conditioning procedures is commonly based on p-values and Null Hypothesis Significance Testing (NHST). Nowadays, however, there is a growing concern about this statistical approach, as many scientists point to the various limitations of p-values and NHST. As an alternative, the use of Bayes factors and Bayesian hypothesis testing has been suggested. In this article, we apply this statistical approach to threat conditioning data. In order to enable the easy computation of Bayes factors for threat conditioning data we present a new R package named condir, which can be used either via the R console or via a Shiny application. This article provides both a non-technical introduction to Bayesian analysis for researchers using the threat conditioning paradigm, and the necessary tools for computing Bayes factors easily. PMID:29038683
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
Killeen's (2005) "p_rep" Coefficient: Logical and Mathematical Problems
ERIC Educational Resources Information Center
Maraun, Michael; Gabriel, Stephanie
2010-01-01
In his article, "An Alternative to Null-Hypothesis Significance Tests," Killeen (2005) urged the discipline to abandon the practice of "p_obs"-based null hypothesis testing and to quantify the signal-to-noise characteristics of experimental outcomes with replication probabilities. He described the coefficient that he…
What Constitutes Science and Scientific Evidence: Roles of Null Hypothesis Testing
ERIC Educational Resources Information Center
Chang, Mark
2017-01-01
We briefly discuss the philosophical basis of science, causality, and scientific evidence, by introducing the hidden but most fundamental principle of science: the similarity principle. The principle's use in scientific discovery is illustrated with Simpson's paradox and other examples. In discussing the value of null hypothesis statistical…
A Bayesian bird's eye view of ‘Replications of important results in social psychology’
Schönbrodt, Felix D.; Yao, Yuling; Gelman, Andrew; Wagenmakers, Eric-Jan
2017-01-01
We applied three Bayesian methods to reanalyse the preregistered contributions to the Social Psychology special issue ‘Replications of Important Results in Social Psychology’ (Nosek & Lakens. 2014 Registered reports: a method to increase the credibility of published results. Soc. Psychol. 45, 137–141. (doi:10.1027/1864-9335/a000192)). First, individual-experiment Bayesian parameter estimation revealed that for directed effect size measures, only three out of 44 central 95% credible intervals did not overlap with zero and fell in the expected direction. For undirected effect size measures, only four out of 59 credible intervals contained values greater than 0.10 (10% of variance explained) and only 19 intervals contained values larger than 0.05. Second, a Bayesian random-effects meta-analysis for all 38 t-tests showed that only one out of the 38 hierarchically estimated credible intervals did not overlap with zero and fell in the expected direction. Third, a Bayes factor hypothesis test was used to quantify the evidence for the null hypothesis against a default one-sided alternative. Only seven out of 60 Bayes factors indicated non-anecdotal support in favour of the alternative hypothesis (BF10>3), whereas 51 Bayes factors indicated at least some support for the null hypothesis. We hope that future analyses of replication success will embrace a more inclusive statistical approach by adopting a wider range of complementary techniques. PMID:28280547
A null model for microbial diversification
Straub, Timothy J.
2017-01-01
Whether prokaryotes (Bacteria and Archaea) are naturally organized into phenotypically and genetically cohesive units comparable to animal or plant species remains contested, frustrating attempts to estimate how many such units there might be, or to identify the ecological roles they play. Analyses of gene sequences in various closely related prokaryotic groups reveal that sequence diversity is typically organized into distinct clusters, and processes such as periodic selection and extensive recombination are understood to be drivers of cluster formation (“speciation”). However, observed patterns are rarely compared with those obtainable with simple null models of diversification under stochastic lineage birth and death and random genetic drift. Via a combination of simulations and analyses of core and phylogenetic marker genes, we show that patterns of diversity for the genera Escherichia, Neisseria, and Borrelia are generally indistinguishable from patterns arising under a null model. We suggest that caution should thus be taken in interpreting observed clustering as a result of selective evolutionary forces. Unknown forces do, however, appear to play a role in Helicobacter pylori, and some individual genes in all groups fail to conform to the null model. Taken together, we recommend the presented birth−death model as a null hypothesis in prokaryotic speciation studies. It is only when the real data are statistically different from the expectations under the null model that some speciation process should be invoked. PMID:28630293
WASP (Write a Scientific Paper) using Excel - 8: t-Tests.
Grech, Victor
2018-06-01
t-Testing is a common component of inferential statistics when comparing two means. This paper explains the central limit theorem and the concept of the null hypothesis, as well as types of errors. On the practical side, this paper outlines how different t-tests may be performed in Microsoft Excel, for different purposes, both statically and dynamically, with Excel's functions. Copyright © 2018 Elsevier B.V. All rights reserved.
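For readers working outside Excel, the same common t-test variants can be run with SciPy; a minimal sketch on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, 30)
b = rng.normal(11.0, 2.5, 30)

# two-sample t-test assuming equal variances (Student)
print(stats.ttest_ind(a, b, equal_var=True))

# two-sample t-test without the equal-variance assumption (Welch)
print(stats.ttest_ind(a, b, equal_var=False))

# paired t-test (e.g., before/after measurements on the same subjects)
print(stats.ttest_rel(a, b))
```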
Peltier, Helene; Baagøe, Hans J.; Camphuysen, Kees C. J.; Czeck, Richard; Dabin, Willy; Daniel, Pierre; Deaville, Rob; Haelters, Jan; Jauniaux, Thierry; Jensen, Lasse F.; Jepson, Paul D.; Keijl, Guido O.; Siebert, Ursula; Van Canneyt, Olivier; Ridoux, Vincent
2013-01-01
Ecological indicators for monitoring strategies are expected to combine three major characteristics: ecological significance, statistical credibility, and cost-effectiveness. Strategies based on stranding networks rank highly in cost-effectiveness, but their ecological significance and statistical credibility are disputed. Our present goal is to improve the value of stranding data as a population indicator within monitoring strategies by constructing the spatial and temporal null hypothesis for strandings. The null hypothesis is defined as: small cetacean distribution and mortality are uniform in space and constant in time. We used a drift model to map stranding probabilities and predict stranding patterns of cetacean carcasses under H0 across the North Sea, the Channel and the Bay of Biscay, for the period 1990–2009. As the most common cetacean occurring in this area, we chose the harbour porpoise Phocoena phocoena for our modelling. The difference between the strandings expected under H0 and the observed strandings is defined as the stranding anomaly. It constituted the stranding data series corrected for drift conditions. Seasonal decomposition of the stranding anomaly suggested that drift conditions did not explain the observed seasonal variations of porpoise strandings. Long-term stranding anomalies increased first in the southern North Sea, the Channel and Bay of Biscay coasts, and finally the eastern North Sea. The hypothesis of changes in porpoise distribution was consistent with local visual surveys, mostly SCANS surveys (1994 and 2005). This new indicator could be applied to cetacean populations across the world and more widely to marine megafauna. PMID:23614031
Bayesian inference for psychology, part IV: parameter estimation and Bayes factors.
Rouder, Jeffrey N; Haaf, Julia M; Vandekerckhove, Joachim
2018-02-01
In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
Shaping Up the Practice of Null Hypothesis Significance Testing.
ERIC Educational Resources Information Center
Wainer, Howard; Robinson, Daniel H.
2003-01-01
Discusses criticisms of null hypothesis significance testing (NHST), suggesting that historical use of NHST was reasonable, and current users should read Sir Ronald Fisher's applied work. Notes that modifications to NHST and interpretations of its outcomes might better suit the needs of modern science. Concludes that NHST is most often useful as…
Hemenway, David
2009-09-01
Hypothesis testing can be misused and misinterpreted in various ways. Limitations in the research design, for example, can make it almost impossible to reject the null hypothesis that a policy has no effect. This article discusses two examples of such experimental designs and analyses, in which, unfortunately, the researchers touted their null results as strong evidence of no effect.
Shi, Haolun; Yin, Guosheng
2018-02-21
Simon's two-stage design is one of the most commonly used methods in phase II clinical trials with binary endpoints. The design tests the null hypothesis that the response rate is less than an uninteresting level, versus the alternative hypothesis that the response rate is greater than a desirable target level. From a Bayesian perspective, we compute the posterior probabilities of the null and alternative hypotheses given that a promising result is declared in Simon's design. Our study reveals that because the frequentist hypothesis testing framework places its focus on the null hypothesis, a potentially efficacious treatment identified by rejecting the null under Simon's design could have only less than 10% posterior probability of attaining the desirable target level. Due to the indifference region between the null and alternative, rejecting the null does not necessarily mean that the drug achieves the desirable response level. To clarify such ambiguity, we propose a Bayesian enhancement two-stage (BET) design, which guarantees a high posterior probability of the response rate reaching the target level, while allowing for early termination and sample size saving in case that the drug's response rate is smaller than the clinically uninteresting level. Moreover, the BET design can be naturally adapted to accommodate survival endpoints. We conduct extensive simulation studies to examine the empirical performance of our design and present two trial examples as applications. © 2018, The International Biometric Society.
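The ambiguity described here can be reproduced with a simple conjugate calculation: given the responses observed at the end of a Simon design, a Beta-binomial posterior often puts high probability on the rate exceeding the uninteresting level p0 but low probability on it exceeding the target p1. The numbers and the flat Beta(1, 1) prior below are assumptions of this sketch, not values from the paper.

```python
from scipy.stats import beta

# hypothetical Simon two-stage outcome: 43 patients, 13 responses,
# uninteresting rate p0 = 0.20, desirable target p1 = 0.40
n, x = 43, 13
p0, p1 = 0.20, 0.40

# flat Beta(1, 1) prior on the response rate (an assumption of this sketch)
a_post, b_post = 1 + x, 1 + n - x

prob_above_p0 = beta.sf(p0, a_post, b_post)   # P(rate > p0 | data)
prob_above_p1 = beta.sf(p1, a_post, b_post)   # P(rate > p1 | data)
print(f"P(rate > {p0}) = {prob_above_p0:.3f}")
print(f"P(rate > {p1}) = {prob_above_p1:.3f}")
# the first can be large while the second stays small, which is the
# ambiguity the BET design is meant to address
```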
Nature's style: Naturally trendy
Cohn, T.A.; Lins, H.F.
2005-01-01
Hydroclimatological time series often exhibit trends. While trend magnitude can be determined with little ambiguity, the corresponding statistical significance, sometimes cited to bolster scientific and political argument, is less certain because significance depends critically on the null hypothesis which in turn reflects subjective notions about what one expects to see. We consider statistical trend tests of hydroclimatological data in the presence of long-term persistence (LTP). Monte Carlo experiments employing FARIMA models indicate that trend tests which fail to consider LTP greatly overstate the statistical significance of observed trends when LTP is present. A new test is presented that avoids this problem. From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.
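The core point, that ignoring persistence inflates apparent trend significance, can be demonstrated with a much simpler model than FARIMA. The sketch below uses AR(1) noise as a stand-in for long-term persistence and reports the empirical Type I error rate of a naive ordinary-least-squares trend test; all settings are illustrative.

```python
import numpy as np
from scipy import stats

def ar1_series(n, phi, rng):
    """Trend-free AR(1) series (a simple stand-in for persistence)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def rejection_rate(phi, n=100, n_sim=2000, alpha=0.05, seed=0):
    """Fraction of trend-free simulations in which the OLS slope test rejects."""
    rng = np.random.default_rng(seed)
    time = np.arange(n)
    rejections = 0
    for _ in range(n_sim):
        y = ar1_series(n, phi, rng)
        result = stats.linregress(time, y)   # naive OLS trend test
        rejections += result.pvalue < alpha
    return rejections / n_sim

print(rejection_rate(phi=0.0))   # close to the nominal 0.05
print(rejection_rate(phi=0.8))   # well above 0.05: overstated significance
```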
On the insignificance of Herschel's sunspot correlation
NASA Astrophysics Data System (ADS)
Love, Jeffrey J.
2013-08-01
We examine William Herschel's hypothesis that solar-cycle variation of the Sun's irradiance has a modulating effect on the Earth's climate and that this is, specifically, manifested as an anticorrelation between sunspot number and the market price of wheat. Since Herschel first proposed his hypothesis in 1801, it has been regarded with both interest and skepticism. Recently, reports have been published that either support Herschel's hypothesis or rely on its validity. As a test of Herschel's hypothesis, we seek to reject a null hypothesis of a statistically random correlation between historical sunspot numbers, wheat prices in London and the United States, and wheat farm yields in the United States. We employ binary-correlation, Pearson-correlation, and frequency-domain methods. We test our methods using a historical geomagnetic activity index, well known to be causally correlated with sunspot number. As expected, the measured correlation between sunspot number and geomagnetic activity would be an unlikely realization of random data; the correlation is "statistically significant." On the other hand, measured correlations between sunspot number and wheat price and wheat yield data would be very likely realizations of random data; these correlations are "insignificant." Therefore, Herschel's hypothesis must be regarded with skepticism. We compare and contrast our results with those of other researchers. We discuss procedures for evaluating hypotheses that are formulated from historical data.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
Mudge, Joseph F; Penny, Faith M; Houlahan, Jeff E
2012-12-01
Setting optimal significance levels that minimize Type I and Type II errors allows for more transparent and well-considered statistical decision making compared to the traditional α = 0.05 significance level. We use the optimal α approach to re-assess conclusions reached by three recently published tests of the pace-of-life syndrome hypothesis, which attempts to unify occurrences of different physiological, behavioral, and life history characteristics under one theory, over different scales of biological organization. While some of the conclusions reached using optimal α were consistent to those previously reported using the traditional α = 0.05 threshold, opposing conclusions were also frequently reached. The optimal α approach reduced probabilities of Type I and Type II errors, and ensured statistical significance was associated with biological relevance. Biologists should seriously consider their choice of α when conducting null hypothesis significance tests, as there are serious disadvantages with consistent reliance on the traditional but arbitrary α = 0.05 significance level. Copyright © 2012 WILEY Periodicals, Inc.
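A minimal version of the optimal-alpha idea, assuming a two-sided two-sample z-test with a known standardized effect size and equal weighting of the two error types (the published method allows unequal weights and other test types), is sketched below.

```python
import numpy as np
from scipy.stats import norm

def optimal_alpha(effect_size, n_per_group, weights=(1.0, 1.0)):
    """Grid search for the alpha that minimizes the weighted average of
    Type I (alpha) and Type II (beta) error rates for a two-sided,
    two-sample z-test with a known standardized effect size."""
    w1, w2 = weights
    alphas = np.linspace(1e-4, 0.5, 2000)
    z_crit = norm.isf(alphas / 2)
    noncentrality = effect_size * np.sqrt(n_per_group / 2)
    power = norm.sf(z_crit - noncentrality) + norm.cdf(-z_crit - noncentrality)
    beta = 1 - power
    cost = (w1 * alphas + w2 * beta) / (w1 + w2)
    best = np.argmin(cost)
    return alphas[best], beta[best]

print(optimal_alpha(effect_size=0.5, n_per_group=30))   # larger effect
print(optimal_alpha(effect_size=0.2, n_per_group=30))   # smaller effect
```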
NASA Astrophysics Data System (ADS)
Lehmann, Rüdiger; Lösler, Michael
2017-12-01
Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred. It is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is to use a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution for the test problem, we propose the p-value approach. Another approach arises from information theory. Here, the Akaike information criterion (AIC) or some alternative is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that they work but behave differently, sometimes even producing different results. Hypothesis tests are well-established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria such as the AIC.
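The information-criterion alternative can be illustrated on a toy one-dimensional example: compare a "no deformation" model (one common height for two measurement epochs) with a "shift" model (separate heights) by AIC computed from Gaussian residual sums of squares. This is only the model-selection idea in miniature, not the geodetic network adjustment treated in the paper; the observations are hypothetical.

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors:
    n*ln(RSS/n) + 2*k (constant terms dropped)."""
    return n * np.log(rss / n) + 2 * k

# hypothetical heights of the same point measured in two epochs
epoch1 = np.array([10.003, 10.001, 10.004, 10.002, 10.003])
epoch2 = np.array([10.007, 10.006, 10.009, 10.008, 10.006])
all_obs = np.concatenate([epoch1, epoch2])
n = all_obs.size

# null model: no deformation, a single common height
rss_null = np.sum((all_obs - all_obs.mean()) ** 2)
aic_null = aic_gaussian(rss_null, n, k=1)

# alternative model: a height shift between the epochs
rss_alt = (np.sum((epoch1 - epoch1.mean()) ** 2)
           + np.sum((epoch2 - epoch2.mean()) ** 2))
aic_alt = aic_gaussian(rss_alt, n, k=2)

print(f"AIC(no deformation) = {aic_null:.1f}, AIC(shift) = {aic_alt:.1f}")
# the model with the smaller AIC is preferred
```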
Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka
2015-01-01
The problem of establishing the noninferiority of a new treatment to a standard (control) treatment with ordinal categorical data is discussed. A measure of treatment effect is used and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed in which the variance estimate is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of existing ones, and the results show that the proposed test statistics are better in terms of deviation from the nominal level and in terms of power.
A Tutorial in Bayesian Potential Outcomes Mediation Analysis.
Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P
2018-01-01
Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
A two-hypothesis approach to establishing a life detection/biohazard protocol for planetary samples
NASA Astrophysics Data System (ADS)
Conley, Catharine; Steele, Andrew
2016-07-01
The COSPAR policy on performing a biohazard assessment on samples brought from Mars to Earth is framed in the context of a concern for false-positive results. However, as noted during the 2012 Workshop for Life Detection in Samples from Mars (ref. Kminek et al., 2014), a more significant concern for planetary samples brought to Earth is false-negative results, because an undetected biohazard could increase risk to the Earth. This is the reason that stringent contamination control must be a high priority for all Category V Restricted Earth Return missions. A useful conceptual framework for addressing these concerns involves two complementary 'null' hypotheses: testing both of them, together, would allow statistical and community confidence to be developed regarding one or the other conclusion. As noted above, false negatives are of primary concern for safety of the Earth, so the 'Earth Safety null hypothesis' -- that must be disproved to assure low risk to the Earth from samples introduced by Category V Restricted Earth Return missions -- is 'There is native life in these samples.' False positives are of primary concern for Astrobiology, so the 'Astrobiology null hypothesis' -- that must be disproved in order to demonstrate the existence of extraterrestrial life -- is 'There is no life in these samples.' The presence of Earth contamination would render both of these hypotheses more difficult to disprove. Both these hypotheses can be tested following a strict science protocol: analyse, interpret, test the hypotheses, and repeat. The science measurements are then undertaken in an iterative fashion that responds to discovery, with both hypotheses testable from interpretation of the scientific data. This is a robust, community-involved activity that ensures maximum science return with minimal sample use.
Statistical evaluation of synchronous spike patterns extracted by frequent item set mining
Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja
2013-01-01
We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows to reliably detect active assemblies in massively parallel spike trains. PMID:24167487
Mebane, Christopher A.
2015-01-01
Criticisms of the uses of the no-observed-effect concentration (NOEC) and the lowest-observed-effect concentration (LOEC) and more generally the entire null hypothesis statistical testing scheme are hardly new or unique to the field of ecotoxicology [1-4]. Among the criticisms of NOECs and LOECs is that statistically similar LOECs (in terms of p value) can represent drastically different levels of effect. For instance, my colleagues and I found that a battery of chronic toxicity tests with different species and endpoints yielded LOECs with minimum detectable differences ranging from 3% to 48% reductions from controls [5].
Confidence intervals for single-case effect size measures based on randomization test inversion.
Michiels, Bart; Heyvaert, Mieke; Meulders, Ann; Onghena, Patrick
2017-02-01
In the current paper, we present a method to construct nonparametric confidence intervals (CIs) for single-case effect size measures in the context of various single-case designs. We use the relationship between a two-sided statistical hypothesis test at significance level α and a 100 (1 - α) % two-sided CI to construct CIs for any effect size measure θ that contain all point null hypothesis θ values that cannot be rejected by the hypothesis test at significance level α. This method of hypothesis test inversion (HTI) can be employed using a randomization test as the statistical hypothesis test in order to construct a nonparametric CI for θ. We will refer to this procedure as randomization test inversion (RTI). We illustrate RTI in a situation in which θ is the unstandardized and the standardized difference in means between two treatments in a completely randomized single-case design. Additionally, we demonstrate how RTI can be extended to other types of single-case designs. Finally, we discuss a few challenges for RTI as well as possibilities when using the method with other effect size measures, such as rank-based nonoverlap indices. Supplementary to this paper, we provide easy-to-use R code, which allows the user to construct nonparametric CIs according to the proposed method.
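A minimal version of the test-inversion idea for the unstandardized mean difference in a completely randomized two-group comparison is sketched below: for each candidate value theta0 on a grid, the treatment scores are shifted by theta0 and a two-sided permutation test is run; the confidence interval collects the theta0 values that are not rejected. A generic permutation scheme and hypothetical scores are used here rather than a specific single-case randomization scheme.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=2000, rng=None):
    """Two-sided permutation p-value for a difference in means."""
    rng = rng or np.random.default_rng(0)
    obs = np.mean(a) - np.mean(b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = np.mean(perm[:len(a)]) - np.mean(perm[len(a):])
        count += abs(diff) >= abs(obs)
    return (count + 1) / (n_perm + 1)

def randomization_test_inversion_ci(a, b, grid, alpha=0.05):
    """100(1 - alpha)% CI for the mean difference: all theta0 on the grid
    that a test of H0: mean(a) - mean(b) = theta0 does not reject."""
    rng = np.random.default_rng(1)
    kept = [theta0 for theta0 in grid
            if permutation_pvalue(a - theta0, b, rng=rng) > alpha]
    return min(kept), max(kept)

a = np.array([12.0, 14.5, 13.2, 15.1, 14.0, 13.7])   # treatment phase scores
b = np.array([10.1, 11.0, 10.7, 11.8, 10.4, 11.2])   # control phase scores
grid = np.linspace(0, 6, 61)
print(randomization_test_inversion_ci(a, b, grid))
```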
Uniformly most powerful Bayesian tests
Johnson, Valen E.
2014-01-01
Uniformly most powerful tests are statistical hypothesis tests that provide the greatest power against a fixed null hypothesis among all tests of a given size. In this article, the notion of uniformly most powerful tests is extended to the Bayesian setting by defining uniformly most powerful Bayesian tests to be tests that maximize the probability that the Bayes factor, in favor of the alternative hypothesis, exceeds a specified threshold. Like their classical counterpart, uniformly most powerful Bayesian tests are most easily defined in one-parameter exponential family models, although extensions outside of this class are possible. The connection between uniformly most powerful tests and uniformly most powerful Bayesian tests can be used to provide an approximate calibration between p-values and Bayes factors. Finally, issues regarding the strong dependence of resulting Bayes factors and p-values on sample size are discussed. PMID:24659829
Long working hours and use of psychotropic medicine: a follow-up study with register linkage.
Hannerz, Harald; Albertsen, Karen
2016-03-01
This study aimed to investigate the possibility of a prospective association between long working hours and use of psychotropic medicine. Survey data drawn from random samples of the general working population of Denmark in the time period 1995-2010 were linked to national registers covering all inhabitants. The participants were followed for first occurrence of redeemed prescriptions for psychotropic medicine. The primary analysis included 25,959 observations (19,259 persons) and yielded a total of 2914 new cases of psychotropic drug use in 99,018 person-years at risk. Poisson regression was used to model incidence rates of redeemed prescriptions for psychotropic medicine as a function of working hours (32-40, 41-48, >48 hours/week). The analysis was controlled for gender, age, sample, shift work, and socioeconomic status. A likelihood ratio test was used to test the null hypothesis, which stated that the incidence rates were independent of weekly working hours. The likelihood ratio test did not reject the null hypothesis (P=0.085). The rate ratio (RR) was 1.04 [95% confidence interval (95% CI) 0.94-1.15] for the contrast 41-48 versus 32-40 work hours/week and 1.15 (95% CI 1.02-1.30) for >48 versus 32-40 hours/week. None of the rate ratios that were estimated in the present study were statistically significant after adjustment for multiple testing. However, stratified analyses, in which 30 RR were estimated, generated the hypothesis that overtime work (>48 hours/week) might be associated with an increased risk among night or shift workers (RR=1.51, 95% CI 1.15-1.98). The present study did not find a statistically significant association between long working hours and incidence of psychotropic drug usage among Danish employees.
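The structure of such an analysis (not the study's actual data) can be sketched with a Poisson regression of case counts on working-hour category, with log person-years as an offset, followed by a likelihood ratio test of the null hypothesis that the incidence rate does not depend on working hours. The counts and person-years below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# hypothetical aggregated data: cases and person-years by weekly working hours
df = pd.DataFrame({
    "hours":        ["32-40", "41-48", ">48"],
    "cases":        [2000, 600, 314],
    "person_years": [70000, 20000, 9018],
})

full = smf.glm("cases ~ C(hours)", data=df,
               family=sm.families.Poisson(),
               offset=np.log(df["person_years"])).fit()
null = smf.glm("cases ~ 1", data=df,
               family=sm.families.Poisson(),
               offset=np.log(df["person_years"])).fit()

# likelihood ratio test of H0: incidence rate independent of working hours
lr_stat = 2 * (full.llf - null.llf)
p_value = stats.chi2.sf(lr_stat, df=2)

# exp(intercept) is the baseline rate; the other terms are rate ratios
# relative to the 32-40 hours/week reference category
print(np.exp(full.params))
print(f"LR test p = {p_value:.3f}")
```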
Rodgers, Joseph Lee
2016-01-01
The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.
Earthquake likelihood model testing
Schorlemmer, D.; Gerstenberger, M.C.; Wiemer, S.; Jackson, D.D.; Rhoades, D.A.
2007-01-01
INTRODUCTION: The Regional Earthquake Likelihood Models (RELM) project aims to produce and evaluate alternate models of earthquake potential (probability per unit volume, magnitude, and time) for California. Based on differing assumptions, these models are produced to test the validity of their assumptions and to explore which models should be incorporated in seismic hazard and risk evaluation. Tests based on physical and geological criteria are useful but we focus on statistical methods using future earthquake catalog data only. We envision two evaluations: a test of consistency with observed data and a comparison of all pairs of models for relative consistency. Both tests are based on the likelihood method, and both are fully prospective (i.e., the models are not adjusted to fit the test data). To be tested, each model must assign a probability to any possible event within a specified region of space, time, and magnitude. For our tests the models must use a common format: earthquake rates in specified “bins” with location, magnitude, time, and focal mechanism limits. Seismology cannot yet deterministically predict individual earthquakes; however, it should seek the best possible models for forecasting earthquake occurrence. This paper describes the statistical rules of an experiment to examine and test earthquake forecasts. The primary purposes of the tests described below are to evaluate physical models for earthquakes, assure that source models used in seismic hazard and risk studies are consistent with earthquake data, and provide quantitative measures by which models can be assigned weights in a consensus model or be judged as suitable for particular regions. In this paper we develop a statistical method for testing earthquake likelihood models. A companion paper (Schorlemmer and Gerstenberger 2007, this issue) discusses the actual implementation of these tests in the framework of the RELM initiative. Statistical testing of hypotheses is a common task and a wide range of possible testing procedures exist. Jolliffe and Stephenson (2003) present different forecast verifications from atmospheric science, among them likelihood testing of probability forecasts and testing the occurrence of binary events. Testing binary events requires that for each forecasted event, the spatial, temporal and magnitude limits be given. Although major earthquakes can be considered binary events, the models within the RELM project express their forecasts on a spatial grid and in 0.1 magnitude units; thus the results are a distribution of rates over space and magnitude. These forecasts can be tested with likelihood tests. In general, likelihood tests assume a valid null hypothesis against which a given hypothesis is tested. The outcome is either a rejection of the null hypothesis in favor of the test hypothesis or a nonrejection, meaning the test hypothesis cannot outperform the null hypothesis at a given significance level. Within RELM, there is no accepted null hypothesis and thus the likelihood test needs to be expanded to allow comparable testing of equipollent hypotheses. To test models against one another, we require that forecasts are expressed in a standard format: the average rate of earthquake occurrence within pre-specified limits of hypocentral latitude, longitude, depth, magnitude, time period, and focal mechanisms. Focal mechanisms should either be described as the inclination of P-axis, declination of P-axis, and inclination of the T-axis, or as strike, dip, and rake angles.
Schorlemmer and Gerstenberger (2007, this issue) designed classes of these parameters such that similar models will be tested against each other. These classes make the forecasts comparable between models. Additionally, we are limited to testing only what is precisely defined and consistently reported in earthquake catalogs. Therefore it is currently not possible to test such information as fault rupture length or area, asperity location, etc. Also, to account for data quality issues, we allow for location and magnitude uncertainties as well as the probability that an event is dependent on another event. As we mentioned above, only models with comparable forecasts can be tested against each other. Our current tests are designed to examine grid-based models. This requires that any fault-based model be adapted to a grid before testing is possible. While this is a limitation of the testing, it is an inherent difficulty in any such comparative testing. Please refer to appendix B for a statistical evaluation of the application of the Poisson hypothesis to fault-based models. The testing suite we present consists of three different tests: L-Test, N-Test, and R-Test. These tests are defined similarly to Kagan and Jackson (1995). The first two tests examine the consistency of the hypotheses with the observations while the last test compares the spatial performances of the models.
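The likelihood-based consistency idea underlying the L-test can be sketched in a few lines: compute the joint Poisson log-likelihood of the observed bin counts under the forecast rates, and locate it within the distribution of log-likelihoods of catalogs simulated from the forecast itself. The rates and counts below are hypothetical, and the sketch omits the normalization and magnitude-binning details of the RELM tests.

```python
import numpy as np
from scipy.stats import poisson

def log_likelihood(rates, counts):
    """Joint log-likelihood of independent Poisson counts per bin."""
    return np.sum(poisson.logpmf(counts, rates))

def l_test(rates, observed_counts, n_sim=5000, seed=0):
    """Quantile of the observed log-likelihood within the distribution of
    log-likelihoods of catalogs simulated from the forecast itself;
    a very small quantile flags inconsistency of forecast and observation."""
    rng = np.random.default_rng(seed)
    ll_obs = log_likelihood(rates, observed_counts)
    sims = rng.poisson(rates, size=(n_sim, rates.size))
    ll_sim = np.array([log_likelihood(rates, s) for s in sims])
    return np.mean(ll_sim <= ll_obs)

# hypothetical forecast rates and observed counts over ten bins
rates = np.array([0.2, 0.5, 1.0, 0.1, 0.05, 0.8, 0.3, 0.02, 0.6, 0.4])
observed = np.array([0, 1, 2, 0, 0, 1, 0, 0, 3, 0])
print(l_test(rates, observed))
```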
Overgaard, Morten; Lindeløv, Jonas; Svejstrup, Stinna; Døssing, Marianne; Hvid, Tanja; Kauffmann, Oliver; Mouridsen, Kim
2013-01-01
This paper reports an experiment intended to test a particular hypothesis derived from blindsight research, which we name the “source misidentification hypothesis.” According to this hypothesis, a subject may be correct about a stimulus without being correct about how she had access to this knowledge (whether the stimulus was visual, auditory, or something else). We test this hypothesis in healthy subjects, asking them to report whether a masked stimulus was presented auditorily or visually, what the stimulus was, and how clearly they experienced the stimulus using the Perceptual Awareness Scale (PAS). We suggest that knowledge about perceptual modality may be a necessary precondition in order to issue correct reports of which stimulus was presented. Furthermore, we find that PAS ratings correlate with correctness, and that subjects are at chance level when reporting no conscious experience of the stimulus. To demonstrate that particular levels of reporting accuracy are obtained, we employ a statistical strategy, which operationally tests the hypothesis of non-equality, such that the usual rejection of the null-hypothesis admits the conclusion of equivalence. PMID:23508677
Comparing transformation methods for DNA microarray data
Thygesen, Helene H; Zwinderman, Aeilko H
2004-01-01
Background When DNA microarray data are used for gene clustering, genotype/phenotype correlation studies, or tissue classification, the signal intensities are usually transformed and normalized in several steps in order to improve comparability and signal/noise ratio. These steps may include subtraction of an estimated background signal, subtracting the reference signal, smoothing (to account for nonlinear measurement effects), and more. Different authors use different approaches, and it is generally not clear to users which method they should prefer. Results We used the ratio between biological variance and measurement variance (which is an F-like statistic) as a quality measure for transformation methods, and we demonstrate a method for maximizing that variance ratio on real data. We explore a number of transformation issues, including Box-Cox transformation, baseline shift, partial subtraction of the log-reference signal and smoothing. It appears that the optimal choice of parameters for the transformation methods depends on the data. Further, the behavior of the variance ratio, under the null hypothesis of zero biological variance, appears to depend on the choice of parameters. Conclusions The use of replicates in microarray experiments is important. Adjustment for the null-hypothesis behavior of the variance ratio is critical to the selection of transformation method. PMID:15202953
Globigerinoides ruber morphotypes in the Gulf of Mexico: a test of null hypothesis
Thirumalai, Kaustubh; Richey, Julie N.; Quinn, Terrence M.; Poore, Richard Z.
2014-01-01
Planktic foraminifer Globigerinoides ruber (G. ruber), due to its abundance and ubiquity in the tropical/subtropical mixed layer, has been the workhorse of paleoceanographic studies investigating past sea-surface conditions on a range of timescales. Recent geochemical work on the two principal white G. ruber (W) morphotypes, sensu stricto (ss) and sensu lato (sl), has hypothesized differences in seasonal preferences or calcification depths, implying that reconstructions using a non-selective mixture of morphotypes could potentially be biased. Here, we test these hypotheses by performing stable isotope and abundance measurements on the two morphotypes in sediment trap, core-top, and downcore samples from the northern Gulf of Mexico. As a test of the null hypothesis, we perform the same analyses on couplets of G. ruber (W) specimens with attributes intermediate to the holotypic ss and sl morphologies. We find no systematic or significant offsets in coeval ss-sl δ18O and δ13C. These offsets are no larger than those in the intermediate pairs. Coupling our results with the foraminiferal statistical model INFAUNAL, we find that, contrary to previous work elsewhere, there is no evidence for discrepancies in ss-sl calcifying depth habitat or seasonality in the Gulf of Mexico.
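For illustration only (this is not the authors' INFAUNAL-based analysis), the basic paired comparison of coeval ss and sl measurements could be run as follows, with invented δ18O values:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)

# Hypothetical coeval delta-18O measurements for the two morphotypes.
d18o_ss = rng.normal(-1.2, 0.3, size=30)
d18o_sl = d18o_ss + rng.normal(0.0, 0.15, size=30)  # no systematic offset

stat, p = ttest_rel(d18o_ss, d18o_sl)
print(f"paired t = {stat:.2f}, p = {p:.3f}")
# A non-significant p-value is consistent with "no systematic offset",
# although it does not by itself establish equivalence.
```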
Interspecific interactions through 2 million years: are competitive outcomes predictable?
Di Martino, Emanuela; Rust, Seabourne
2016-01-01
Ecological interactions affect the survival and reproduction of individuals. However, ecological interactions are notoriously difficult to measure in extinct populations, hindering our understanding of how the outcomes of interactions such as competition vary in time and influence long-term evolutionary changes. Here, the outcomes of spatial competition in a temporally continuous community over evolutionary timescales are presented for the first time. Our research domain is encrusting cheilostome bryozoans from the Wanganui Basin of New Zealand over a ca 2 Myr time period (Pleistocene to Recent). We find that a subset of species can be identified as consistent winners, and others as consistent losers, in the sense that they win or lose interspecific competitive encounters significantly more often than the 50% expected under the null hypothesis. Most species do not improve or worsen in their competitive abilities through the 2 Myr period, but a minority of species are winners in some intervals and losers in others. We found that conspecifics tend to cluster spatially and interact more often than expected under a null hypothesis: most of these are stand-off interactions where the two colonies involved stopped growing at the edges of encounter. Counterintuitively, competitive ability has no bearing on ecological dominance. PMID:27581885
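The "consistent winner" criterion amounts to a binomial test of a species' win proportion against 50%. A hedged sketch with made-up counts (requires SciPy >= 1.7 for binomtest):

```python
from scipy.stats import binomtest

# Hypothetical species: wins 74 of 110 scored competitive encounters.
wins, encounters = 74, 110
result = binomtest(wins, encounters, p=0.5, alternative="two-sided")
print(f"win rate = {wins / encounters:.2f}, p = {result.pvalue:.4f}")
# A species is flagged as a consistent winner (or loser) only if the
# null hypothesis of 50% is rejected in the corresponding direction.
```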
Sequential parallel comparison design with binary and time-to-event outcomes.
Silverman, Rachel Kloss; Ivanova, Anastasia; Fine, Jason
2018-04-30
Sequential parallel comparison design (SPCD) has been proposed to increase the likelihood of success of clinical trials, especially trials with a possibly high placebo effect. Sequential parallel comparison design is conducted in 2 stages. Participants are randomized between active therapy and placebo in stage 1. Then, stage 1 placebo nonresponders are rerandomized between active therapy and placebo. Data from the 2 stages are pooled to yield a single P value. We consider SPCD with binary and with time-to-event outcomes. For time-to-event outcomes, response is defined as a favorable event prior to the end of follow-up for a given stage of SPCD. We show that for these cases, the usual test statistics from stages 1 and 2 are asymptotically normal and uncorrelated under the null hypothesis, leading to a straightforward combined testing procedure. In addition, we show that the estimators of the treatment effects from the 2 stages are asymptotically normal and uncorrelated under the null and alternative hypotheses, yielding confidence interval procedures with correct coverage. Simulations and real data analysis demonstrate the utility of the binary and time-to-event SPCD. Copyright © 2018 John Wiley & Sons, Ltd.
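The property that the stage 1 and stage 2 statistics are asymptotically normal and uncorrelated under the null is what allows pooling into a single P value. The sketch below shows a generic weighted-z combination; the weights and statistics are placeholders rather than the authors' exact SPCD procedure.

```python
from math import sqrt
from scipy.stats import norm

def combine_stage_tests(z1, z2, w=0.5):
    """Combine two asymptotically independent standard-normal stage
    statistics into one z-statistic using a prespecified weight w."""
    z = sqrt(w) * z1 + sqrt(1.0 - w) * z2
    return z, norm.sf(z)  # combined statistic and one-sided p-value

# Hypothetical stage-wise z-statistics from an SPCD trial.
z, p = combine_stage_tests(1.7, 1.1, w=0.6)
print(f"combined z = {z:.2f}, one-sided p = {p:.4f}")
```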
NASA Astrophysics Data System (ADS)
Chen, C.; Rundle, J. B.; Holliday, J. R.; Nanjo, K.; Turcotte, D. L.; Li, S.; Tiampo, K. F.
2005-12-01
Forecast verification procedures for statistical events with binary outcomes typically rely on the use of contingency tables and Relative Operating Characteristic (ROC) diagrams. Originally developed for the statistical evaluation of tornado forecasts on a county-by-county basis, these methods can be adapted to the evaluation of competing earthquake forecasts. Here we apply these methods retrospectively to two forecasts for the m = 7.3 1999 Chi-Chi, Taiwan, earthquake. These forecasts are based on a method, Pattern Informatics (PI), that locates likely sites for future large earthquakes based on large changes in the activity of the smallest earthquakes. A competing null hypothesis, Relative Intensity (RI), is based on the idea that future large earthquake locations are correlated with sites having the greatest frequency of small earthquakes. We show that for Taiwan, the PI forecast method is superior to the RI forecast null hypothesis. Inspection of the two maps indicates that their forecast locations are indeed quite different. Our results confirm an earlier result suggesting that the earthquake preparation process for events such as the Chi-Chi earthquake involves anomalous changes in activation or quiescence, and that signatures of these processes can be detected in precursory seismicity data. Furthermore, we find that our methods can accurately forecast the locations of aftershocks from precursory seismicity changes alone, implying that the main shock together with its aftershocks represents a single manifestation of the formation of a high-stress region nucleating prior to the main shock.
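For context, the contingency-table bookkeeping behind this kind of forecast verification is simple; the sketch below computes the hit rate and false alarm rate for one alarm threshold on made-up binary grids (it does not implement the PI or RI algorithms).

```python
import numpy as np

# Hypothetical binary forecast (alarm on/off) and observation (event/no event)
# for each grid cell.
forecast = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0])
observed = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0])

a = np.sum((forecast == 1) & (observed == 1))  # hits
b = np.sum((forecast == 1) & (observed == 0))  # false alarms
c = np.sum((forecast == 0) & (observed == 1))  # misses
d = np.sum((forecast == 0) & (observed == 0))  # correct negatives

hit_rate = a / (a + c)                 # one point on the ROC diagram
false_alarm_rate = b / (b + d)
print(hit_rate, false_alarm_rate)
# Sweeping the alarm threshold of a forecast traces out its ROC curve;
# a forecast beats the RI null hypothesis if its curve lies above RI's.
```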
Two-sample binary phase 2 trials with low type I error and low sample size
Litwin, Samuel; Basickes, Stanley; Ross, Eric A.
2017-01-01
We address the design of two-stage clinical trials comparing experimental and control patients. Our end-point is success or failure, however measured, with the null hypothesis that the chance of success in both arms is p0 and the alternative that it is p0 among controls and p1 > p0 among experimental patients. Standard rules will have the null hypothesis rejected when the number of successes in the (E)xperimental arm, E, sufficiently exceeds C, that among (C)ontrols. Here, we combine one-sample rejection decision rules, E ≥ m, with two-sample rules of the form E – C > r to achieve two-sample tests with low sample number and low type I error. We find designs with sample numbers not far from the minimum possible using standard two-sample rules, but with type I error of 5% rather than 15% or 20% associated with them, and of equal power. This level of type I error is achieved locally, near the stated null, and increases to 15% or 20% when the null is significantly higher than specified. We increase the attractiveness of these designs to patients by using 2:1 randomization. Examples of the application of this new design covering both high and low success rates under the null hypothesis are provided. PMID:28118686
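The abstract does not state how the one-sample and two-sample rules are combined into a single decision; purely for illustration, the sketch below enumerates the exact type I error of a rule that rejects only when both E ≥ m and E − C > r hold, for hypothetical arm sizes, thresholds, and a null success rate p0.

```python
from scipy.stats import binom

def type1_error(n_exp, n_ctrl, p0, m, r):
    """Exact null rejection probability of the illustrative rule
    'reject if E >= m and E - C > r' with independent binomial arms."""
    alpha = 0.0
    for e in range(n_exp + 1):
        for c in range(n_ctrl + 1):
            if e >= m and e - c > r:
                alpha += binom.pmf(e, n_exp, p0) * binom.pmf(c, n_ctrl, p0)
    return alpha

# Hypothetical design: 2:1 randomization, 30 experimental vs 15 controls,
# null success rate 20%, thresholds m = 10 and r = 4.
print(type1_error(30, 15, 0.20, 10, 4))
```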
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-01-01
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
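A generic illustration of surrogate-based testing against a red-noise (AR(1)) null model, using the maximum periodogram power as the discriminating statistic; this is not one of the seven methods compared in the paper, and the series is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_surrogates(x, n_surr=999):
    """Generate AR(1) ('red noise') surrogates matched to the series'
    lag-1 autocorrelation and variance."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    phi = np.corrcoef(x[:-1], x[1:])[0, 1]
    sigma = np.std(x) * np.sqrt(1.0 - phi ** 2)
    surr = np.empty((n_surr, x.size))
    surr[:, 0] = rng.normal(0.0, np.std(x), n_surr)
    for t in range(1, x.size):
        surr[:, t] = phi * surr[:, t - 1] + rng.normal(0.0, sigma, n_surr)
    return surr

def max_power(x):
    """Discriminating statistic: largest periodogram ordinate (DC excluded)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.max(np.abs(np.fft.rfft(x))[1:] ** 2)

# Hypothetical epidemiological series: a 3-year cycle plus noise (monthly data).
t = np.arange(240)
series = np.sin(2 * np.pi * t / 36) + rng.normal(0, 1, t.size)

obs = max_power(series)
null = np.array([max_power(s) for s in ar1_surrogates(series)])
p = (1 + np.sum(null >= obs)) / (1 + null.size)
print(f"surrogate p-value = {p:.3f}")
```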
Multi-arm group sequential designs with a simultaneous stopping rule.
Urach, S; Posch, M
2016-12-30
Multi-arm group sequential clinical trials are efficient designs to compare multiple treatments to a control. They allow one to test for treatment effects already in interim analyses and can have a lower average sample number than fixed sample designs. Their operating characteristics depend on the stopping rule: we consider simultaneous stopping, where the whole trial is stopped as soon as for any of the arms the null hypothesis of no treatment effect can be rejected, and separate stopping, where only recruitment to arms for which a significant treatment effect could be demonstrated is stopped, but the other arms are continued. For both stopping rules, the family-wise error rate can be controlled by the closed testing procedure applied to group sequential tests of intersection and elementary hypotheses. The group sequential boundaries for the separate stopping rule also control the family-wise error rate if the simultaneous stopping rule is applied. However, we show that for the simultaneous stopping rule, one can apply improved, less conservative stopping boundaries for local tests of elementary hypotheses. We derive corresponding improved Pocock and O'Brien type boundaries as well as optimized boundaries to maximize the power or average sample number and investigate the operating characteristics and small sample properties of the resulting designs. To control the power to reject at least one null hypothesis, the simultaneous stopping rule requires a lower average sample number than the separate stopping rule. This comes at the cost of a lower power to reject all null hypotheses. Some of this loss in power can be regained by applying the improved stopping boundaries for the simultaneous stopping rule. The procedures are illustrated with clinical trials in systemic sclerosis and narcolepsy. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Interpreting observational studies: why empirical calibration is needed to correct p-values
Schuemie, Martijn J; Ryan, Patrick B; DuMouchel, William; Suchard, Marc A; Madigan, David
2014-01-01
Often the literature makes assertions of medical product effects on the basis of ‘p < 0.05’. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from the literature, representing a case–control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p < 0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we computed calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to the literature suggests that at least 54% of findings with p < 0.05 are not actually statistically significant and should be reevaluated. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:23900808
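A simplified sketch of empirical calibration: fit a normal distribution to the log effect estimates obtained from negative controls and compute a calibrated two-sided p-value for a new estimate. This ignores the per-estimate standard errors that the published method does account for; all numbers are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical log hazard-ratio estimates for negative-control drug-outcome
# pairs (the true effect is null for all of them).
negative_control_log_hr = np.array([0.21, -0.05, 0.33, 0.10, 0.28,
                                    0.02, 0.19, 0.41, -0.11, 0.25])

# Empirical null distribution of estimates under "no effect".
mu = negative_control_log_hr.mean()
sigma = negative_control_log_hr.std(ddof=1)

def calibrated_p(log_hr):
    """Two-sided p-value of an observed log HR under the empirical null."""
    z = (log_hr - mu) / sigma
    return 2.0 * norm.sf(abs(z))

print(calibrated_p(np.log(1.8)))  # a new study's estimate, HR = 1.8
```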
Estimating the Probability of Traditional Copying, Conditional on Answer-Copying Statistics.
Allen, Jeff; Ghattas, Andrew
2016-06-01
Statistics for detecting copying on multiple-choice tests produce p values measuring the probability of a value at least as large as that observed, under the null hypothesis of no copying. The posterior probability of copying is arguably more relevant than the p value, but cannot be derived from Bayes' theorem unless the population probability of copying and probability distribution of the answer-copying statistic under copying are known. In this article, the authors develop an estimator for the posterior probability of copying that is based on estimable quantities and can be used with any answer-copying statistic. The performance of the estimator is evaluated via simulation, and the authors demonstrate how to apply the formula using actual data. Potential uses, generalizability to other types of cheating, and limitations of the approach are discussed.
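The authors' estimator is not reproduced here; the sketch below only illustrates the Bayes' theorem relationship underlying the idea, with every input treated as a hypothetical, estimable quantity.

```python
def posterior_copying(prior_copy, power_at_threshold, p_value):
    """Posterior probability of copying given that the answer-copying
    statistic is at least as extreme as observed (Bayes' theorem).
    prior_copy: assumed population probability of copying;
    power_at_threshold: P(statistic at least this extreme | copying);
    p_value: P(statistic at least this extreme | no copying)."""
    num = prior_copy * power_at_threshold
    den = num + (1.0 - prior_copy) * p_value
    return num / den

# Hypothetical values: 2% of examinee pairs copy, the statistic is at least
# this extreme for 60% of copiers, and the p-value under no copying is 0.001.
print(posterior_copying(0.02, 0.60, 0.001))
```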
Wildfire cluster detection using space-time scan statistics
NASA Astrophysics Data System (ADS)
Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.
2009-04-01
The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Basically, scan statistics work by comparing a set of events occurring inside a scanning window (or a space-time cylinder for spatio-temporal data) with those that lie outside. Windows of increasing size scan the zone across space and time; for each window the likelihood ratio is calculated (comparing the ratio “observed cases over expected” inside and outside), and the window with the maximum value is taken as the most probable cluster, and so on. Under the null hypothesis of spatial and temporal randomness, these events are distributed according to a known discrete-state random process (Poisson or Bernoulli), whose parameters can be estimated. Given this assumption, it is possible to test whether or not the null hypothesis holds in a specific area. In order to deal with the fire data, the space-time permutation scan statistic has been applied, since it does not require the explicit specification of the population at risk in each cylinder. The case study is represented by Florida daily fire detection using the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product during the period 2003-2006. As a result, statistically significant clusters have been identified. Performing the analyses over the entire time frame, three out of the five most likely clusters were identified in forest areas in the north of the state; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies in the Everglades. Furthermore, the analyses were performed separately for the four years to assess whether the wildfires recur each year during the same period. It emerges that clusters of forest fires are more frequent in the hot seasons (spring and summer), while in the southern areas they are present throughout the year. Analysing the distribution of fires to evaluate whether they are statistically more frequent in some areas and/or periods of the year can be useful to support fire management and to focus prevention measures.
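The likelihood-ratio core of a Poisson spatial or space-time scan statistic can be written compactly; the sketch below evaluates it for one hypothetical cylinder (observed count c, expected count under the null, and total count), leaving out the Monte Carlo step used to assess significance.

```python
import numpy as np

def poisson_scan_llr(c, expected, total):
    """Log-likelihood ratio of one scanning window for the Poisson model
    (Kulldorff-style); zero unless the window shows an excess of cases
    (edge cases guarded for simplicity)."""
    if c <= expected or c >= total:
        return 0.0
    inside = c * np.log(c / expected)
    outside = (total - c) * np.log((total - c) / (total - expected))
    return inside + outside

# Hypothetical cylinder: 40 fires observed where 22.5 were expected,
# out of 600 fires in the whole study region and period.
print(poisson_scan_llr(40, 22.5, 600))
# The most likely cluster is the window maximizing this LLR; its p-value
# comes from repeating the maximization on simulated or permuted datasets.
```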
Wilcoxon's signed-rank statistic: what null hypothesis and why it matters.
Li, Heng; Johnson, Terri
2014-01-01
In the statistical literature, the term 'signed-rank test' (or 'Wilcoxon signed-rank test') has been used to refer to two distinct tests: a test for symmetry of distribution and a test for the median of a symmetric distribution, sharing a common test statistic. To avoid potential ambiguity, we propose to refer to those two tests by different names, as 'test for symmetry based on signed-rank statistic' and 'test for median based on signed-rank statistic', respectively. The utility of such terminological differentiation should become evident through our discussion of how those tests connect and contrast with the sign test and the one-sample t-test. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
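A short illustration of the distinction: the same signed-rank statistic, as computed by scipy.stats.wilcoxon, can back either a test for symmetry of the difference distribution about zero or, when symmetry is assumed, a test for its median; the data below are invented.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(3)
before = rng.normal(10.0, 2.0, size=25)
after = before + rng.normal(0.5, 1.0, size=25)

stat, p = wilcoxon(after - before)
print(f"signed-rank statistic = {stat}, p = {p:.4f}")
# Interpretation depends on the assumed null: rejecting "the difference
# distribution is symmetric about 0" and rejecting "the median of a
# symmetric difference distribution is 0" are different claims, even
# though the statistic and p-value are identical.
```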
A Statistical Test of Correlations and Periodicities in the Geological Records
NASA Astrophysics Data System (ADS)
Yabushita, S.
1997-09-01
Matsumoto & Kubotani argued that there is a positive and statistically significant correlation between cratering and mass extinction. This argument is critically examined by adopting a method of Ertel used by Matsumoto & Kubotani but by applying it more directly to the extinction and cratering records. It is shown that under the null hypothesis of a random distribution of crater ages, the observed correlation has a probability of occurrence of 13%. However, when large craters whose ages agree with the times of peaks in the extinction rate of marine fauna are excluded, one obtains a negative correlation. This result strongly indicates that mass extinctions are not due to an accumulation of impacts but to isolated gigantic impacts.
Null Effects and Publication Bias in Special Education Research
ERIC Educational Resources Information Center
Cook, Bryan G.; Therrien, William J.
2017-01-01
Researchers sometimes conduct a study and find that the predicted relation between variables did not exist or that the intervention did not have a positive impact on student outcomes; these are referred to as null findings because they fail to disconfirm the null hypothesis. Rather than consider such studies as failures and disregard the null…
A spatial scan statistic for multiple clusters.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2011-10-01
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters' shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic which can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data from Pingdu city, our proposed method successfully detects a true cluster town that cannot be shown to be statistically significant by the standard method because of another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
The Impact of Economic Factors and Acquisition Reforms on the Cost of Defense Weapon Systems
2006-03-01
To test for homoskedasticity, the Breusch-Pagan test is employed. The null hypothesis of the Breusch-Pagan test is that the error variance is constant... Using the Breusch-Pagan test shown in Table 19, the prob > chi2 is greater than α = .05; therefore we fail to reject the null hypothesis... [Table residue: Breusch-Pagan test (H0 = constant variance), estimated variance and standard deviation for overrunpercentfp100 and overrunpercent100.]
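For context, the Breusch-Pagan test is usually run on the residuals of a fitted regression. A hedged sketch with synthetic data (not the model from the source document), using statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=200)
# Heteroskedastic errors: the error spread grows with x.
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, size=200)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"LM = {lm_stat:.2f}, prob > chi2 = {lm_pvalue:.4f}")
# If prob > chi2 exceeds alpha = .05, we fail to reject the null
# hypothesis of constant error variance (homoskedasticity).
```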
OSO 8 observational limits to the acoustic coronal heating mechanism
NASA Technical Reports Server (NTRS)
Bruner, E. C., Jr.
1981-01-01
An improved analysis of time-resolved line profiles of the C IV resonance line at 1548 A has been used to test the acoustic wave hypothesis of solar coronal heating. It is shown that the observed motions and brightness fluctuations are consistent with the existence of acoustic waves. Specific account is taken of the effect of photon statistics on the observed velocities, and a test is devised to determine whether the motions represent propagating or evanescent waves. It is found that, on average, about as much energy is carried upward as downward, such that the net acoustic flux density is statistically consistent with zero. The statistical uncertainty in this null result is three orders of magnitude lower than the flux level needed to heat the corona.
NASA Astrophysics Data System (ADS)
Menne, Matthew J.; Williams, Claude N., Jr.
2005-10-01
An evaluation of three hypothesis test statistics that are commonly used in the detection of undocumented changepoints is described. The goal of the evaluation was to determine whether the use of multiple tests could improve undocumented, artificial changepoint detection skill in climate series. The use of successive hypothesis testing is compared to optimal approaches, both of which are designed for situations in which multiple undocumented changepoints may be present. In addition, the importance of the form of the composite climate reference series is evaluated, particularly with regard to the impact of undocumented changepoints in the various component series that are used to calculate the composite. In a comparison of single test changepoint detection skill, the composite reference series formulation is shown to be less important than the choice of the hypothesis test statistic, provided that the composite is calculated from the serially complete and homogeneous component series. However, each of the evaluated composite series is not equally susceptible to the presence of changepoints in its components, which may be erroneously attributed to the target series. Moreover, a reference formulation that is based on the averaging of the first-difference component series is susceptible to random walks when the composition of the component series changes through time (e.g., values are missing), and its use is, therefore, not recommended. When more than one test is required to reject the null hypothesis of no changepoint, the number of detected changepoints is reduced proportionately less than the number of false alarms in a wide variety of Monte Carlo simulations. Consequently, a consensus of hypothesis tests appears to improve undocumented changepoint detection skill, especially when reference series homogeneity is violated. A consensus of successive hypothesis tests using a semihierarchic splitting algorithm also compares favorably to optimal solutions, even when changepoints are not hierarchic.
A risk-based approach to flood management decisions in a nonstationary world
NASA Astrophysics Data System (ADS)
Rosner, Ana; Vogel, Richard M.; Kirshen, Paul H.
2014-03-01
Traditional approaches to flood management in a nonstationary world begin with a null hypothesis test of "no trend" and its likelihood, with little or no attention given to the likelihood that we might ignore a trend if it really existed. Concluding a trend exists when it does not, or rejecting a trend when it exists, are known as type I and type II errors, respectively. Decision-makers are poorly served by statistical and/or decision methods that do not carefully consider both over- and under-preparation errors. Similarly, little attention is given to how to integrate uncertainty in our ability to detect trends into a flood management decision context. We show how trend hypothesis test results can be combined with an adaptation's infrastructure costs and damages avoided to provide a rational decision approach in a nonstationary world. The criterion of expected regret is shown to be a useful metric that integrates the statistical, economic, and hydrological aspects of the flood management problem in a nonstationary world.
The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.
Christensen, G B; Knight, S; Camp, N J
2009-11-01
We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
Jorge, Inmaculada; Navarro, Pedro; Martínez-Acedo, Pablo; Núñez, Estefanía; Serrano, Horacio; Alfranca, Arántzazu; Redondo, Juan Miguel; Vázquez, Jesús
2009-01-01
Statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, particularly for data obtained by 16O/18O labeling. Besides, large-scale test experiments to validate the null hypothesis are lacking. Although the study of mechanisms underlying biological actions promoted by vascular endothelial growth factor (VEGF) on endothelial cells is of considerable interest, quantitative proteomics studies on this subject are scarce and have been performed after exposing cells to the factor for long periods of time. In this work we present the largest quantitative proteomics study to date on the short term effects of VEGF on human umbilical vein endothelial cells by 18O/16O labeling. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in a large scale test experiment performed on these cells, producing false expression changes. A random effects model was developed including four different sources of variance at the spectrum-fitting, scan, peptide, and protein levels. With the new model the number of outliers at scan and peptide levels was negligible in three large scale experiments, and only one false protein expression change was observed in the test experiment among more than 1000 proteins. The new model allowed the detection of significant protein expression changes upon VEGF stimulation for 4 and 8 h. The consistency of the changes observed at 4 h was confirmed by a replica at a smaller scale and further validated by Western blot analysis of some proteins. Most of the observed changes have not been described previously and are consistent with a pattern of protein expression that dynamically changes over time following the evolution of the angiogenic response. With this statistical model the 18O labeling approach emerges as a very promising and robust alternative to perform quantitative proteomics studies at a depth of several thousand proteins. PMID:19181660
Impact of a Respiratory Therapy Assess-and-Treat Protocol on Adult Cardiothoracic ICU Readmissions.
Dailey, Robert T; Malinowski, Thomas; Baugher, Mitchel; Rowley, Daniel D
2017-05-01
The purpose of this retrospective medical record review was to report on recidivism to the ICU among adult postoperative cardiac and thoracic patients managed with a respiratory therapy assess-and-treat (RTAT) protocol. Our primary null hypothesis was that there would be no difference in all-cause unexpected readmissions and escalations between the RTAT group and the physician-ordered respiratory care group. Our secondary null hypothesis was that there would be no difference in primary respiratory-related readmissions, ICU length of stay, or hospital length of stay. We reviewed 1,400 medical records of cardiac and thoracic postoperative subjects between January 2015 and October 2016. The RTAT is driven by a standardized patient assessment tool, which is completed by a registered respiratory therapist. The tool develops a respiratory severity score for each patient and directs interventions for bronchial hygiene, aerosol therapy, and lung inflation therapy based on an algorithm. The protocol period commenced on December 1, 2015, and continued through October 2016. Data relative to unplanned admissions to the ICU for all causes as well as respiratory-related causes were evaluated. There was a statistically significant difference in the all-cause unplanned ICU admission rate between the RTAT (5.8% [95% CI 4.3-7.9]) and the physician-ordered respiratory care (8.8% [95% CI 6.9-11.1]) groups ( P = .034). There was no statistically significant difference in respiratory-related unplanned ICU admissions with RTAT (36% [95% CI 22.7-51.6]) compared with the physician-ordered respiratory care (53% [95% CI 41.1-64.8]) group ( P = .09). The RTAT protocol group spent 1 d less in the ICU ( P < .001) and in the hospital ( P < .001). RTAT protocol implementation demonstrated a statistically significant reduction in all-cause ICU readmissions. The reduction in respiratory-related ICU readmissions did not reach statistical significance. Copyright © 2017 by Daedalus Enterprises.
NASA Astrophysics Data System (ADS)
Jacob, Rinku; Harikrishnan, K. P.; Misra, R.; Ambika, G.
2018-01-01
Recurrence networks and the associated statistical measures have become important tools in the analysis of time series data. In this work, we test how effective the recurrence network measures are in analyzing real world data involving two main types of noise, white noise and colored noise. We use two prominent network measures as discriminating statistics for hypothesis testing using surrogate data, under the specific null hypothesis that the data are derived from a linear stochastic process. We show that the characteristic path length is especially efficient as a discriminating measure, with conclusions that remain reasonably accurate even with a limited number of data points in the time series. We also highlight an additional advantage of the network approach in identifying the dimensionality of the system underlying the time series through a convergence measure derived from the probability distribution of the local clustering coefficients. As examples of real world data, we use the light curves from a prominent black hole system and show that a combined analysis using three primary network measures can provide vital information regarding the nature of temporal variability of light curves from different spectroscopic classes.
Does McNemar's test compare the sensitivities and specificities of two diagnostic tests?
Kim, Soeun; Lee, Woojoo
2017-02-01
McNemar's test is often used in practice to compare the sensitivities and specificities for the evaluation of two diagnostic tests. For correct evaluation of accuracy, an intuitive recommendation is to test the diseased and the non-diseased groups separately, so that the sensitivities can be compared among the diseased and the specificities among the healthy group of people. This paper provides a rigorous theoretical framework for this argument and studies the validity of McNemar's test regardless of the conditional independence assumption. We derive McNemar's test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence. We then perform power analyses to show how the result is affected by the amount of conditional dependence under the alternative hypothesis.
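The recommendation, to compare paired test results separately within the diseased and the non-diseased groups, can be illustrated with statsmodels' mcnemar; the 2×2 tables below are hypothetical.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired results of diagnostic tests A and B among diseased subjects:
# rows = test A (+/-), columns = test B (+/-).
diseased = np.array([[60, 12],
                     [5, 23]])
# ...and among non-diseased subjects.
non_diseased = np.array([[8, 7],
                         [15, 170]])

for label, table in [("sensitivities", diseased), ("specificities", non_diseased)]:
    res = mcnemar(table, exact=True)
    print(f"{label}: statistic = {res.statistic}, p = {res.pvalue:.4f}")
# Comparing sensitivities uses only the diseased table; comparing
# specificities uses only the non-diseased table.
```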
On computation of p-values in parametric linkage analysis.
Kurbasic, Azra; Hössjer, Ola
2004-01-01
Parametric linkage analysis is usually used to find chromosomal regions linked to a disease (phenotype) that is described with a specific genetic model. This is done by investigating the relations between the disease and genetic markers, that is, well-characterized loci of known position with a clear Mendelian mode of inheritance. Assume we have found an interesting region on a chromosome that we suspect is linked to the disease. Then we want to test the hypothesis of no linkage versus the alternative one of linkage. As a measure we use the maximal lod score Zmax. It is well known that the maximal lod score has asymptotically a (2 ln 10)^(-1) × (1/2 χ²(0) + 1/2 χ²(1)) distribution under the null hypothesis of no linkage when only one point (one marker) on the chromosome is studied. In this paper, we show, both by simulations and theoretical arguments, that the null hypothesis distribution of Zmax has no simple form when more than one marker is used (multipoint analysis). In fact, the distribution of Zmax depends on the number of families, their structure, the assumed genetic model, marker denseness, and marker informativity. This means that a constant critical limit of Zmax leads to tests associated with different significance levels. Because of the above-mentioned problems, from the statistical point of view the maximal lod score should be supplemented by a p-value when results are reported. Copyright (c) 2004 S. Karger AG, Basel.
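For the single-marker case, the quoted asymptotic null distribution gives a simple p-value formula: convert the maximum LOD score to a likelihood-ratio statistic via 2 ln(10) and use the 50:50 mixture of a point mass at zero and a chi-squared with 1 df. A sketch (not applicable to the multipoint case the paper analyses):

```python
import numpy as np
from scipy.stats import chi2

def lod_pvalue_single_marker(z_max):
    """Asymptotic pointwise p-value of a maximum LOD score at one marker:
    P(Zmax >= z) = 0.5 * P(chi2_1 >= 2 ln(10) * z) for z > 0."""
    if z_max <= 0:
        return 1.0
    lrt = 2.0 * np.log(10.0) * z_max
    return 0.5 * chi2.sf(lrt, df=1)

print(lod_pvalue_single_marker(3.0))  # the traditional LOD = 3 threshold
```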
Seasonal variation of sudden infant death syndrome in Hawaii.
Mage, David T
2004-11-01
To test whether the sudden infant death syndrome (SIDS) rate displays the universal winter maximum and summer minimum in Hawaii where there is no appreciable seasonal variation of temperature. The null hypothesis is tested that there is no seasonal variation of necropsied SIDS in Hawaii. The numbers of live births and SIDS cases by month for the years 1979 to 2002 were collected and the monthly SIDS distribution is predicted based on the age at death distribution. The state of Hawaii, located in the midst of the Pacific Ocean, has a semi-tropical climate with temperatures fluctuating diurnally as 25 +/- 5 degrees C throughout the year. Therefore homes are unheated and infants are not excessively swaddled. The Hawaii State Department of Health maintains vital statistics of all infant births and deaths. The results reject the null hypothesis of no seasonal variation of SIDS (p = 0.026). An explanation for the seasonal effect of the winter maximum and summer minimum for Hawaiian SIDS is that it arises from the cycle of the school session and summer vacation periods that represent variable intensity of a possible viral infection vector. SIDS rates in both Hawaii and the United States increase with parity, also indicating a possible role of school age siblings as carriers. The winter peak of the SIDS in Hawaii is support for the hypothesis that a low grade viral infection, insufficient by itself to be a visible cause of death at necropsy, may be implicated as contributing to SIDS in vulnerable infants.
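A simplified sketch of the kind of seasonality test described, comparing observed monthly SIDS counts with birth-proportional expectations under no seasonal variation; the counts are invented and the paper's age-at-death adjustment is omitted.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical monthly SIDS counts (Jan..Dec) and monthly live births.
sids = np.array([14, 12, 13, 10, 9, 7, 6, 8, 9, 11, 12, 15])
births = np.array([100, 95, 102, 98, 101, 99, 103, 104, 100, 97, 99, 102],
                  dtype=float)

# Expected counts under the null of no seasonal variation: proportional
# to births, scaled so expected and observed totals match.
expected = sids.sum() * births / births.sum()
stat, p = chisquare(sids, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
# A small p-value rejects the null hypothesis of no seasonal variation.
```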
A Primer on Bayesian Analysis for Experimental Psychopathologists
Krypotos, Angelos-Miltiadis; Blanken, Tessa F.; Arnaudova, Inna; Matzke, Dora; Beckers, Tom
2016-01-01
The principal goals of experimental psychopathology (EPP) research are to offer insights into the pathogenic mechanisms of mental disorders and to provide a stable ground for the development of clinical interventions. The main message of the present article is that those goals are better served by the adoption of Bayesian statistics than by the continued use of null-hypothesis significance testing (NHST). In the first part of the article we list the main disadvantages of NHST and explain why those disadvantages limit the conclusions that can be drawn from EPP research. Next, we highlight the advantages of Bayesian statistics. To illustrate, we then pit NHST and Bayesian analysis against each other using an experimental data set from our lab. Finally, we discuss some challenges when adopting Bayesian statistics. We hope that the present article will encourage experimental psychopathologists to embrace Bayesian statistics, which could strengthen the conclusions drawn from EPP research. PMID:28748068
2011-03-01
[Table residue: tests of the null hypothesis that the error variance of the dependent variable is equal across groups (F, df1, df2, Sig.): 1.179, 1, 22, .289; POP-UP: .000, 1, 22, .991; POP-UP: 2.104, 1, 22, .161; a. Design: Intercept ...] ... design also limited the number of intended treatments. The experimental design originally was supposed to test all three adverse events that threaten...
NASA Astrophysics Data System (ADS)
Erfanifard, Y.; Rezayan, F.
2014-10-01
Vegetation heterogeneity biases second-order summary statistics, e.g., Ripley's K-function, applied for spatial pattern analysis in ecology. Second-order investigation based on Ripley's K-function and related statistics (i.e., the L- and pair correlation function g) is widely used in ecology to develop hypotheses on underlying processes by characterizing spatial patterns of vegetation. The aim of this study was to demonstrate the effects of the underlying heterogeneity of wild pistachio (Pistacia atlantica Desf.) trees on the second-order summary statistics of point pattern analysis in a part of the Zagros woodlands, Iran. The spatial distribution of 431 wild pistachio trees was accurately mapped in a 40 ha stand in the Wild Pistachio & Almond Research Site, Fars province, Iran. Three commonly used second-order summary statistics (i.e., the K-, L-, and g-functions) were applied to analyse their spatial pattern. The two-sample Kolmogorov-Smirnov goodness-of-fit test showed that the observed pattern significantly followed an inhomogeneous Poisson process null model in the study region. The results also showed that the heterogeneous pattern of wild pistachio trees biased the homogeneous forms of the K-, L-, and g-functions, suggesting a stronger aggregation of the trees at scales of 0-50 m than actually existed and an aggregation at scales of 150-200 m where the trees were in fact regularly distributed. Consequently, we showed that heterogeneity of point patterns may bias the results of homogeneous second-order summary statistics, and we suggest applying inhomogeneous summary statistics with related null models for spatial pattern analysis of heterogeneous vegetation.
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergence of the approximations used in NCPA is well defined whereas that in ABC is not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
A comparison of dental ultrasonic technologies on subgingival calculus removal: a pilot study.
Silva, Lidia Brión; Hodges, Kathleen O; Calley, Kristin Hamman; Seikel, John A
2012-01-01
This pilot study compared the clinical endpoints of the magnetostrictive and piezoelectric ultrasonic instruments on calculus removal. The null hypothesis stated that there is no statistically significant difference in calculus removal between the 2 instruments. A quasi-experimental pre- and post-test design was used. Eighteen participants were included. The magnetostrictive and piezoelectric ultrasonic instruments were used in 2 assigned contra-lateral quadrants on each participant. A data collector, blind to treatment assignment, assessed the calculus on 6 predetermined tooth sites before and after ultrasonic instrumentation. Calculus size was evaluated using ordinal measurements on a 4 point scale (0, 1, 2, 3). Subjects were required to have a size 2 or 3 calculus deposit on the 6 predetermined sites. One clinician instrumented the pre-assigned quadrants. A maximum time of 20 minutes of instrumentation was allowed with each technology. Immediately after instrumentation, the data collector conducted the post-test calculus evaluation. A repeated-measures analysis of variance (ANOVA) was used to analyze the pre- and post-test calculus data (significance criterion p ≤ 0.05). The null hypothesis was not rejected, indicating no statistically significant difference in calculus removal between the technologies. Therefore, under similar conditions, both technologies removed comparable amounts of calculus. This research design could be used as a foundation for continued research in this field. Future studies could include implementing this study design with a larger sample size and/or modifying the study design to include multiple clinicians who are data collectors. Also, deposit removal with periodontal maintenance patients could be explored.
A simple test of association for contingency tables with multiple column responses.
Decady, Y J; Thomas, D R
2000-09-01
Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Hitting Is Contagious in Baseball: Evidence from Long Hitting Streaks
Bock, Joel R.; Maewal, Akhilesh; Gough, David A.
2012-01-01
Data analysis is used to test the hypothesis that “hitting is contagious”. A statistical model is described to study the effect of a hot hitter upon his teammates’ batting during a consecutive-game hitting streak. Box score data for entire seasons comprising streaks of length […] games, including a total of […] observations, were compiled. Treatment and control sample groups ([…]) were constructed from core lineups of players on the streaking batter’s team. The percentile-method bootstrap was used to calculate confidence intervals for statistics representing differences in the mean distributions of two batting statistics between groups. Batters in the treatment group (hot streak active) showed statistically significant improvements in hitting performance, as compared against the control. Mean […] for the treatment group was found to be […] to […] percentage points higher during hot streaks (mean difference increased […] points), while the batting heat index introduced here was observed to increase by […] points. For each performance statistic, the null hypothesis was rejected at the […] significance level. We conclude that the evidence suggests the potential existence of a “statistical contagion effect”. Psychological mechanisms essential to the empirical results are suggested, as several studies from the scientific literature lend credence to contagious phenomena in sports. Causal inference from these results is difficult, but we suggest and discuss several latent variables that may contribute to the observed results, and offer possible directions for future research. PMID:23251507
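The percentile-method bootstrap used for the group comparison can be sketched as follows; the batting averages are invented and the statistic shown is simply the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical batting averages: treatment (teammate on a hot streak)
# versus control games.
treatment = rng.normal(0.268, 0.030, size=150)
control = rng.normal(0.260, 0.030, size=150)

def boot_ci_diff_mean(a, b, n_boot=10_000, alpha=0.05):
    """Percentile-method bootstrap CI for mean(a) - mean(b)."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a, a.size, replace=True).mean()
                    - rng.choice(b, b.size, replace=True).mean())
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

lo, hi = boot_ci_diff_mean(treatment, control)
print(f"95% CI for the difference: [{lo:.4f}, {hi:.4f}]")
# The null hypothesis of no difference is rejected if the interval
# excludes zero.
```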
Pioglitazone in early Parkinson's disease: a phase 2, multicentre, double-blind, randomised trial
2015-01-01
Summary Background A systematic assessment of potential disease-modifying compounds for Parkinson's disease concluded that pioglitazone could hold promise for the treatment of patients with this disease. We assessed the effect of pioglitazone on the progression of Parkinson's disease in a multicentre, double-blind, placebo-controlled, futility clinical trial. Methods Participants with the diagnosis of early Parkinson's disease on a stable regimen of 1 mg/day rasagiline or 10 mg/day selegiline were randomly assigned (1:1:1) to 15 mg/day pioglitazone, 45 mg/day pioglitazone, or placebo. Investigators were masked to the treatment assignment. Only the statistical centre and the central pharmacy knew the treatment name associated with the randomisation number. The primary outcome was the change in the total Unified Parkinson's Disease Rating Scale (UPDRS) score between the baseline and 44 weeks, analysed by intention to treat. The primary null hypothesis for each dose group was that the mean change in UPDRS was 3 points less than the mean change in the placebo group. The alternative hypothesis (of futility) was that pioglitazone is not meaningfully different from placebo. We rejected the null if there was significant evidence of futility at the one-sided alpha level of 0.10. The study is registered at ClinicalTrials.gov, number NCT01280123. Findings 210 patients from 35 sites in the USA were enrolled between May 10, 2011, and July 31, 2013. The primary analysis included 72 patients in the 15 mg group, 67 in the 45 mg group, and 71 in the placebo group. The mean total UPDRS change at 44 weeks was 4.42 (95% CI 2.55–6.28) for 15 mg pioglitazone, 5.13 (95% CI 3.17–7.08) for 45 mg pioglitazone, and 6.25 (95% CI 4.35–8.15) for placebo (higher change scores are worse). The mean difference between the 15 mg and placebo groups was −1.83 (80% CI −3.56 to −0.10) and the null hypothesis could not be rejected (p=0.19). The mean difference between the 45 mg and placebo groups was −1.12 (80% CI −2.93 to 0.69) and the null hypothesis was rejected in favour of futility (p=0.09). Planned sensitivity analyses of the primary outcome, using last value carried forward (LVCF) to handle missing data and using the completers' only sample, suggested that the 15 mg dose is also futile (p=0.09 for LVCF, p=0.09 for completers) but failed to reject the null hypothesis for the 45 mg dose (p=0.12 for LVCF, p=0.19 for completers). Six serious adverse events occurred in the 15 mg group, nine in the 45 mg group, and three in the placebo group; none were thought to be definitely or probably related to the study interventions. Interpretation These findings suggest that pioglitazone at the doses studied here is unlikely to modify progression in early Parkinson's disease. Further study of pioglitazone in a larger trial in patients with Parkinson's disease is not recommended. Funding National Institute of Neurological Disorders and Stroke. PMID:26116315
Bowden, Vanessa K; Loft, Shayne
2016-06-01
In 2 experiments we examined the impact of memory for prior events on conflict detection in simulated air traffic control under conditions where individuals proactively controlled aircraft and completed concurrent tasks. Individuals were faster to detect conflicts that had repeatedly been presented during training (positive transfer). Bayesian statistics indicated strong evidence for the null hypothesis that conflict detection was not impaired for events that resembled an aircraft pair that had repeatedly come close to conflicting during training. This is likely because aircraft altitude (the feature manipulated between training and test) was attended to by participants when proactively controlling aircraft. In contrast, a minor change to the relative position of a repeated nonconflicting aircraft pair moderately impaired conflict detection (negative transfer). There was strong evidence for the null hypothesis that positive transfer was not impacted by dividing participant attention, which suggests that part of the information retrieved regarding prior aircraft events was perceptual (the new aircraft pair "looked" like a conflict based on familiarity). These findings extend the effects previously reported by Loft, Humphreys, and Neal (2004), answering the recent strong and unanimous calls across the psychological science discipline to formally establish the robustness and generality of previously published effects. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Two-sample discrimination of Poisson means
NASA Technical Reports Server (NTRS)
Lampton, M.
1994-01-01
This paper presents a statistical test for detecting significant differences between two random count accumulations. The null hypothesis is that the two samples share a common random arrival process with a mean count proportional to each sample's exposure. The model represents the partition of N total events into two counts, A and B, as a sequence of N independent Bernoulli trials whose partition fraction, f, is determined by the ratio of the exposures of A and B. The detection of a significant difference is claimed when the background (null) hypothesis is rejected, which occurs when the observed sample falls in a critical region of (A, B) space. The critical region depends on f and the desired significance level, alpha. The model correctly takes into account the fluctuations in both the signals and the background data, including the important case of small numbers of counts in the signal, the background, or both. The significance can be exactly determined from the cumulative binomial distribution, which in turn can be inverted to determine the critical A(B) or B(A) contour. This paper gives efficient implementations of these tests, based on lookup tables. Applications include the detection of clustering of astronomical objects, the detection of faint emission or absorption lines in photon-limited spectroscopy, the detection of faint emitters or absorbers in photon-limited imaging, and dosimetry.
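The partition logic described here is straightforward to reproduce with an exact binomial test; the sketch below is a minimal illustration (the counts and exposures are made up, and SciPy's binomtest stands in for the paper's lookup-table implementation).

```python
# Sketch of a two-sample Poisson comparison via the binomial partition.
# Under H0, count A is Binomial(N, f) with N = A + B and f set by the exposures.
from scipy.stats import binomtest

def poisson_two_sample(count_a, count_b, exposure_a, exposure_b, alpha=0.05):
    """Test whether two count accumulations share a common arrival rate."""
    n_total = count_a + count_b
    f = exposure_a / (exposure_a + exposure_b)   # expected partition fraction
    result = binomtest(count_a, n_total, f, alternative="two-sided")
    return result.pvalue, result.pvalue < alpha

# Illustrative numbers: a 30-count source region vs. a 55-count background
# region observed three times longer.
p, significant = poisson_two_sample(30, 55, exposure_a=1.0, exposure_b=3.0)
print(f"p = {p:.4f}, reject H0: {significant}")
```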
Using Bayes to get the most out of non-significant results.
Dienes, Zoltan
2014-01-01
No scientific conclusion follows automatically from a statistically non-significant result, yet people routinely use non-significant results to guide conclusions about the status of theories (or the effectiveness of practices). To know whether a non-significant result counts against a theory, or if it just indicates data insensitivity, researchers must use one of: power, intervals (such as confidence or credibility intervals), or else an indicator of the relative evidence for one theory over another, such as a Bayes factor. I argue Bayes factors allow theory to be linked to data in a way that overcomes the weaknesses of the other approaches. Specifically, Bayes factors use the data themselves to determine their sensitivity in distinguishing theories (unlike power), and they make use of those aspects of a theory's predictions that are often easiest to specify (unlike power and intervals, which require specifying the minimal interesting value in order to address theory). Bayes factors provide a coherent approach to determining whether non-significant results support a null hypothesis over a theory, or whether the data are just insensitive. They allow accepting and rejecting the null hypothesis to be put on an equal footing. Concrete examples are provided to indicate the range of application of a simple online Bayes calculator, which reveal both the strengths and weaknesses of Bayes factors.
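As a rough illustration of the kind of calculation Dienes' online calculator performs, the sketch below computes a Bayes factor for a mean difference, assuming a normal likelihood for the sample estimate and a half-normal prior on the effect under the theory; the numbers and the choice of prior scale are illustrative, not taken from the article.

```python
# Minimal sketch of a Dienes-style Bayes factor for a mean difference,
# assuming a normal likelihood for the sample estimate and a half-normal
# prior on the effect under H1 (scale chosen from theory). Values are
# illustrative, not from the paper.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bayes_factor_half_normal(obs_mean, obs_se, h1_scale):
    """BF10 comparing H1 (half-normal prior on effect) against H0 (effect = 0)."""
    def integrand(delta):
        prior = 2 * norm.pdf(delta, 0, h1_scale)      # half-normal, delta >= 0
        likelihood = norm.pdf(obs_mean, delta, obs_se)
        return prior * likelihood
    marginal_h1, _ = quad(integrand, 0, np.inf)
    marginal_h0 = norm.pdf(obs_mean, 0, obs_se)
    return marginal_h1 / marginal_h0

# A non-significant result (mean 2.0, SE 1.5) against a theory predicting
# effects on the order of 5 units:
print(round(bayes_factor_half_normal(2.0, 1.5, h1_scale=5.0), 2))
```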
Chandrasekaran, Srinivas Niranj; Yardimci, Galip Gürkan; Erdogan, Ozgün; Roach, Jeffrey; Carter, Charles W.
2013-01-01
We tested the idea that ancestral class I and II aminoacyl-tRNA synthetases arose on opposite strands of the same gene. We assembled excerpted 94-residue Urgenes for class I tryptophanyl-tRNA synthetase (TrpRS) and class II Histidyl-tRNA synthetase (HisRS) from a diverse group of species, by identifying and catenating three blocks coding for secondary structures that position the most highly conserved, active-site residues. The codon middle-base pairing frequency was 0.35 ± 0.0002 in all-by-all sense/antisense alignments for 211 TrpRS and 207 HisRS sequences, compared with frequencies between 0.22 ± 0.0009 and 0.27 ± 0.0005 for eight different representations of the null hypothesis. Clustering algorithms demonstrate further that profiles of middle-base pairing in the synthetase antisense alignments are correlated along the sequences from one species-pair to another, whereas this is not the case for similar operations on sets representing the null hypothesis. Most probable reconstructed sequences for ancestral nodes of maximum likelihood trees show that middle-base pairing frequency increases to approximately 0.42 ± 0.002 as bacterial trees approach their roots; ancestral nodes from trees including archaeal sequences show a less pronounced increase. Thus, contemporary and reconstructed sequences all validate important bioinformatic predictions based on descent from opposite strands of the same ancestral gene. They further provide novel evidence for the hypothesis that bacteria lie closer than archaea to the origin of translation. Moreover, the inverse polarity of genetic coding, together with a priori α-helix propensities suggest that in-frame coding on opposite strands leads to similar secondary structures with opposite polarity, as observed in TrpRS and HisRS crystal structures. PMID:23576570
Bayesian analysis of multimethod ego-depletion studies favours the null hypothesis.
Etherton, Joseph L; Osborne, Randall; Stephenson, Katelyn; Grace, Morgan; Jones, Chas; De Nadai, Alessandro S
2018-04-01
Ego-depletion refers to the purported decrease in performance on a task requiring self-control after engaging in a previous task involving self-control, with self-control proposed to be a limited resource. Despite many published studies consistent with this hypothesis, recurrent null findings within our laboratory and indications of publication bias have called into question the validity of the depletion effect. This project used three depletion protocols, involving three different depleting initial tasks followed by three different self-control tasks as dependent measures (total n = 840). For each method, effect sizes were not significantly different from zero. When data were aggregated across the three different methods and examined meta-analytically, the pooled effect size was not significantly different from zero (for all priors evaluated, Hedges' g = 0.10 with 95% credibility interval of [-0.05, 0.24]) and Bayes factors reflected strong support for the null hypothesis (Bayes factor > 25 for all priors evaluated). © 2018 The British Psychological Society.
A robust null hypothesis for the potential causes of megadrought in western North America
NASA Astrophysics Data System (ADS)
Ault, T.; St George, S.; Smerdon, J. E.; Coats, S.; Mankin, J. S.; Cruz, C. C.; Cook, B.; Stevenson, S.
2017-12-01
The western United States was affected by several megadroughts during the last 1200 years, most prominently during the Medieval Climate Anomaly (MCA: 800 to 1300 CE). A null hypothesis is developed to test the possibility that, given a sufficiently long period of time, these events are inevitable and occur purely as a consequence of internal climate variability. The null distribution of this hypothesis is populated by a linear inverse model (LIM) constructed from global sea-surface temperature anomalies and self-calibrated Palmer Drought Severity Index data for North America. Despite being trained only on seasonal data from the late 20th century, the LIM produces megadroughts that are comparable in duration, spatial scale, and magnitude to the most severe events of the last 12 centuries. The null hypothesis therefore cannot be rejected with much confidence when considering these features of megadrought, meaning that similar events are possible today, even without any changes to boundary conditions. In contrast, the observed clustering of megadroughts in the MCA, as well as the change in mean hydroclimate between the MCA and the 1500-2000 period, are more likely to have been caused by either external forcing or by internal climate variability not well sampled during the latter half of the twentieth century. Finally, the results demonstrate that the LIM is a viable tool for determining whether events in paleoclimate reconstructions should be ascribed to external forcings, "out of sample" climate mechanisms, or variability consistent with that observed during the recent period.
A Bayesian Perspective on the Reproducibility Project: Psychology.
Etz, Alexander; Vandekerckhove, Joachim
2016-01-01
We revisit the results of the recent Reproducibility Project: Psychology by the Open Science Collaboration. We compute Bayes factors-a quantity that can be used to express comparative evidence for an hypothesis but also for the null hypothesis-for a large subset (N = 72) of the original papers and their corresponding replication attempts. In our computation, we take into account the likely scenario that publication bias had distorted the originally published results. Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak (i.e., Bayes factor < 10). The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication, and no replication attempts provided strong evidence in favor of the null. In all cases where the original paper provided strong evidence but the replication did not (15%), the sample size in the replication was smaller than the original. Where the replication provided strong evidence but the original did not (10%), the replication sample size was larger. We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature. We further conclude that traditional sample sizes are insufficient and that a more widespread adoption of Bayesian methods is desirable.
Wang, Hong-Qiang; Tsai, Chung-Jui
2013-01-01
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
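The core of the biology-informed null can be sketched with Fisher's Z transformation alone; the function below tests whether a true correlation exceeds a user-specified cutoff tau, as a simplified stand-in for CorSig (the example values are hypothetical, and the multiple-testing correction is omitted).

```python
# Sketch of the biology-informed null test described above: is the true
# correlation larger than a user-specified cutoff tau? Uses Fisher's Z
# transformation; variable names and values are illustrative.
import numpy as np
from scipy.stats import norm

def corr_exceeds_cutoff_pvalue(r, n, tau):
    """One-sided p-value for H0: rho <= tau versus H1: rho > tau."""
    z_r = np.arctanh(r)          # Fisher's Z of the observed correlation
    z_tau = np.arctanh(tau)      # Fisher's Z of the cutoff
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of Z
    return norm.sf((z_r - z_tau) / se)

# Observed r = 0.62 from 40 samples, tested against a cutoff of 0.5:
print(round(corr_exceeds_cutoff_pvalue(0.62, 40, 0.5), 3))
```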
van Tilburg, C W J; Stronks, D L; Groeneweg, J G; Huygen, F J P M
2017-03-01
Investigate the effect of percutaneous radiofrequency, compared to a sham procedure, applied to the ramus communicans for treatment of lumbar disc pain. Randomized sham-controlled, double-blind, crossover, multicenter clinical trial. Multidisciplinary pain centres of two general hospitals. Sixty patients aged 18 or more with a medical history and physical examination suggestive of lumbar disc pain and a reduction of two or more on a numerical rating scale (0-10) after a diagnostic ramus communicans test block. Treatment group: percutaneous radiofrequency treatment applied to the ramus communicans; sham group: same procedure without radiofrequency treatment. Primary outcome measure: pain reduction. Secondary outcome measure: Global Perceived Effect. No statistically significant difference in pain level over time was found between the groups or within the groups; however, the factor period yielded a statistically significant result. In the crossover group, 11 out of 16 patients experienced a reduction in NRS of 2 or more at 1 month (no significant deviation from chance). No statistically significant difference in satisfaction over time between the groups was found. The independent factors group and period also showed no statistically significant effects. The same applies to recovery: no statistically significant effects were found. The null hypothesis of no difference in pain reduction and in Global Perceived Effect between the treatment and sham group cannot be rejected. Post hoc analysis revealed that none of the investigated parameters contributed to the prediction of a significant pain reduction. Interrupting signalling through the ramus communicans may interfere with the transmission of painful information from the discs to the central nervous system. Methodological differences exist in studies evaluating the efficacy of radiofrequency treatment for lumbar disc pain. A randomized, sham-controlled, double-blind, multicenter clinical trial on the effect of radiofrequency at the ramus communicans for lumbar disc pain was conducted. The null hypothesis of no difference in pain reduction and in Global Perceived Effect between the treatment and sham group cannot be rejected. © 2016 The Authors. European Journal of Pain published by John Wiley & Sons Ltd on behalf of European Pain Federation - EFIC®.
Statistical Selection of Biological Models for Genome-Wide Association Analyses.
Bi, Wenjian; Kang, Guolian; Pounds, Stanley B
2018-05-24
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
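A stripped-down version of the idea, not the paper's FDR-controlled procedure, is to code the genotype under each inheritance model and compare logistic fits by BIC; the sketch below covers the null, additive, dominant, and recessive codings (the co-dominant model, which needs two indicator variables, is omitted for brevity), and all names and simulated data are illustrative.

```python
# Sketch of comparing simple biological models for one SNP by coding the
# genotype (0/1/2 minor-allele counts) under each inheritance model and
# scoring a logistic fit with BIC. A generic illustration of the idea only.
import numpy as np
import statsmodels.api as sm

CODINGS = {
    "additive":  lambda g: g.astype(float),
    "dominant":  lambda g: (g >= 1).astype(float),
    "recessive": lambda g: (g == 2).astype(float),
}

def best_model_bic(genotype, phenotype):
    """Return the lowest-BIC model among null (intercept-only) and codings."""
    n = len(phenotype)
    scores = {}
    null_fit = sm.Logit(phenotype, np.ones((n, 1))).fit(disp=0)
    scores["null"] = -2 * null_fit.llf + 1 * np.log(n)
    for name, code in CODINGS.items():
        X = sm.add_constant(code(genotype))
        fit = sm.Logit(phenotype, X).fit(disp=0)
        scores[name] = -2 * fit.llf + 2 * np.log(n)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=500)                           # simulated genotypes
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.6 * g))))   # additive risk model
print(best_model_bic(g, y)[0])
```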
The continuum of hydroclimate variability in western North America during the last millennium
Ault, Toby R.; Cole, Julia E.; Overpeck, Jonathan T.; Pederson, Gregory T.; St. George, Scott; Otto-Bliesner, Bette; Woodhouse, Connie A.; Deser, Clara
2013-01-01
The distribution of climatic variance across the frequency spectrum has substantial importance for anticipating how climate will evolve in the future. Here we estimate power spectra and power laws (β) from instrumental, proxy, and climate model data to characterize the hydroclimate continuum in western North America (WNA). We test the significance of our estimates of spectral densities and β against the null hypothesis that they reflect solely the effects of local (non-climate) sources of autocorrelation at the monthly timescale. Although tree-ring based hydroclimate reconstructions are generally consistent with this null hypothesis, values of β calculated from long moisture-sensitive chronologies (as opposed to reconstructions), and from other types of hydroclimate proxies, exceed null expectations. We therefore argue that there is more low-frequency variability in hydroclimate than monthly autocorrelation alone can generate. Coupled model results archived as part of the Climate Model Intercomparison Project 5 (CMIP5) are consistent with the null hypothesis and appear unable to generate variance in hydroclimate commensurate with paleoclimate records. Consequently, at decadal to multidecadal timescales there is more variability in instrumental and proxy data than in the models, suggesting that the risk of prolonged droughts under climate change may be underestimated by CMIP5 simulations of the future.
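A minimal sketch of the spectral-slope estimation underlying this kind of analysis is given below, assuming a simple periodogram and a least-squares fit on log-log axes; a synthetic AR(1) series stands in for the instrumental and proxy records.

```python
# Simplified sketch of estimating the spectral power law beta from a time
# series: fit a line to log(power) vs. log(frequency) from a periodogram.
import numpy as np
from scipy.signal import periodogram

def spectral_beta(series, fs=1.0):
    """Estimate beta in S(f) ~ f**(-beta) by least squares on log-log axes."""
    freqs, power = periodogram(series, fs=fs)
    keep = (freqs > 0) & (power > 0)
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), 1)
    return -slope

rng = np.random.default_rng(1)
x = np.zeros(1000)
for t in range(1, 1000):                 # AR(1) with lag-1 autocorrelation 0.7
    x[t] = 0.7 * x[t - 1] + rng.normal()
print(round(spectral_beta(x), 2))
```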
Dunkel, Curtis S; Harbke, Colin R; Papini, Dennis R
2009-06-01
The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce error, the authors did not find support for the models. They discuss results in the context of the mixed-research findings regarding birth order and suggest further research on the proposed developmental dynamics that may produce birth-order effects.
A large scale test of the gaming-enhancement hypothesis.
Przybylski, Andrew K; Wang, John C
2016-01-01
A growing research literature suggests that regular electronic game play and game-based training programs may confer practically significant benefits to cognitive functioning. Most evidence supporting this idea, the gaming-enhancement hypothesis, has been collected in small-scale studies of university students and older adults. This research investigated the hypothesis in a general way with a large sample of 1,847 school-aged children. Our aim was to examine the relations between young people's gaming experiences and an objective test of reasoning performance. Using a Bayesian hypothesis testing approach, evidence for the gaming-enhancement and null hypotheses were compared. Results provided no substantive evidence supporting the idea that having preference for or regularly playing commercially available games was positively associated with reasoning ability. Evidence ranged from equivocal to very strong in support for the null hypothesis over what was predicted. The discussion focuses on the value of Bayesian hypothesis testing for investigating electronic gaming effects, the importance of open science practices, and pre-registered designs to improve the quality of future work.
Statistical reporting inconsistencies in experimental philosophy
Colombo, Matteo; Duev, Georgi; Nuijten, Michèle B.; Sprenger, Jan
2018-01-01
Experimental philosophy (x-phi) is a young field of research in the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science. PMID:29649220
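The kind of consistency check that statcheck automates can be sketched in a few lines: recompute the p-value from a reported test statistic and its degrees of freedom and compare it with the reported p-value. The tolerance and example values below are illustrative, and the real statcheck package handles rounding and many test types more carefully.

```python
# Sketch of a statcheck-style consistency check: recompute p from a reported
# t statistic and degrees of freedom and flag disagreement with the reported p.
from scipy import stats

def t_test_consistent(t_value, df, reported_p, tol=0.005, two_tailed=True):
    """Flag a reported t-test result whose recomputed p-value disagrees."""
    recomputed = stats.t.sf(abs(t_value), df)
    if two_tailed:
        recomputed *= 2
    return abs(recomputed - reported_p) <= tol, recomputed

# "t(28) = 2.20, p = .04" -> recomputed p is about .036, consistent within tol
ok, p = t_test_consistent(2.20, 28, 0.04)
print(ok, round(p, 3))
```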
Wicherts, Jelte M.; Bakker, Marjan; Molenaar, Dylan
2011-01-01
Background The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically. Methods and Findings We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance. Conclusions Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies. PMID:22073203
Has the magnitude of floods across the USA changed with global CO2 levels?
Hirsch, Robert M.; Ryberg, Karen R.
2012-01-01
Statistical relationships between annual floods at 200 long-term (85–127 years of record) streamgauges in the coterminous United States and the global mean carbon dioxide concentration (GMCO2) record are explored. The streamgauge locations are limited to those with little or no regulation or urban development. The coterminous US is divided into four large regions and stationary bootstrapping is used to evaluate if the patterns of these statistical associations are significantly different from what would be expected under the null hypothesis that flood magnitudes are independent of GMCO2. In none of the four regions defined in this study is there strong statistical evidence for flood magnitudes increasing with increasing GMCO2. One region, the southwest, showed a statistically significant negative relationship between GMCO2 and flood magnitudes. The statistical methods applied compensate both for the inter-site correlation of flood magnitudes and the shorter-term (up to a few decades) serial correlation of floods.
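A much-simplified version of this bootstrap logic is sketched below: the flood series is resampled in blocks (a fixed block length stands in for the stationary bootstrap's random block lengths) to build a null distribution for the correlation with GMCO2; all data are simulated and purely illustrative.

```python
# Simplified block-bootstrap test of "no association between annual floods
# and GMCO2": resample the flood series in blocks (retaining short-term
# serial correlation) and compare the observed correlation with the null.
import numpy as np

def block_bootstrap_corr_test(x, y, block=10, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    obs = np.corrcoef(x, y)[0, 1]
    n = len(y)
    null = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block + 1, size=int(np.ceil(n / block)))
        resampled = np.concatenate([y[s:s + block] for s in starts])[:n]
        null[b] = np.corrcoef(x, resampled)[0, 1]
    p = np.mean(np.abs(null) >= abs(obs))
    return obs, p

rng = np.random.default_rng(8)
gmco2 = np.linspace(310, 390, 100)          # monotone CO2-like series
floods = rng.gamma(2.0, 50.0, size=100)     # annual peak flows, no trend
print(block_bootstrap_corr_test(gmco2, floods))
```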
The importance of topographically corrected null models for analyzing ecological point processes.
McDowall, Philip; Lynch, Heather J
2017-07-01
Analyses of point process patterns and related techniques (e.g., MaxEnt) make use of the expected number of occurrences per unit area and second-order statistics based on the distance between occurrences. Ecologists working with point process data often assume that points exist on a two-dimensional x-y plane or within a three-dimensional volume, when in fact many observed point patterns are generated on a two-dimensional surface existing within three-dimensional space. For many surfaces, however, such as the topography of landscapes, the projection from the surface to the x-y plane preserves neither area nor distance. As such, when these point patterns are implicitly projected to and analyzed in the x-y plane, our expectations of the point pattern's statistical properties may not be met. When used in hypothesis testing, we find that the failure to account for the topography of the generating surface may bias statistical tests that incorrectly identify clustering and, furthermore, may bias coefficients in inhomogeneous point process models that incorporate slope as a covariate. We demonstrate the circumstances under which this bias is significant, and present simple methods that allow point processes to be simulated with corrections for topography. These point patterns can then be used to generate "topographically corrected" null models against which observed point processes can be compared. © 2017 by the Ecological Society of America.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hearin, Andrew P.; Zentner, Andrew R., E-mail: aph15@pitt.edu, E-mail: zentner@pitt.edu
Forthcoming projects such as the Dark Energy Survey, Joint Dark Energy Mission, and the Large Synoptic Survey Telescope aim to measure weak lensing shear correlations with unprecedented accuracy. Weak lensing observables are sensitive to both the distance-redshift relation and the growth of structure in the Universe. If the cause of accelerated cosmic expansion is dark energy within general relativity, both cosmic distances and structure growth are governed by the properties of dark energy. Consequently, one may use lensing to check for this consistency and test general relativity. After reviewing the phenomenology of such tests, we address a major challenge to such a program. The evolution of the baryonic component of the Universe is highly uncertain and can influence lensing observables, manifesting as modified structure growth for a fixed cosmic distance scale. Using two proposed methods, we show that one could be led to reject the null hypothesis of general relativity when it is the true theory if this uncertainty in baryonic processes is neglected. Recent simulations suggest that we can correct for baryonic effects using a parameterized model in which the halo mass-concentration relation is modified. The correction suffices to render biases small compared to statistical uncertainties. We study the ability of future weak lensing surveys to constrain the internal structures of halos and test the null hypothesis of general relativity simultaneously. Compared to alternative methods which null information from small scales to mitigate sensitivity to baryonic physics, this internal calibration program should provide limits on deviations from general relativity that are several times more constraining. Specifically, we find that limits on general relativity in the case of internal calibration are degraded by only ~30% or less compared to the case of perfect knowledge of nonlinear structure.
Reinterpreting maximum entropy in ecology: a null hypothesis constrained by ecological mechanism.
O'Dwyer, James P; Rominger, Andrew; Xiao, Xiao
2017-07-01
Simplified mechanistic models in ecology have been criticised for the fact that a good fit to data does not imply the mechanism is true: pattern does not equal process. In parallel, the maximum entropy principle (MaxEnt) has been applied in ecology to make predictions constrained by just a handful of state variables, like total abundance or species richness. But an outstanding question remains: what principle tells us which state variables to constrain? Here we attempt to solve both problems simultaneously, by translating a given set of mechanisms into the state variables to be used in MaxEnt, and then using this MaxEnt theory as a null model against which to compare mechanistic predictions. In particular, we identify the sufficient statistics needed to parametrise a given mechanistic model from data and use them as MaxEnt constraints. Our approach isolates exactly what mechanism is telling us over and above the state variables alone. © 2017 John Wiley & Sons Ltd/CNRS.
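As a toy example of turning a state variable into a MaxEnt constraint, the sketch below finds the maximum-entropy abundance distribution whose mean matches an observed mean abundance by solving for the Lagrange multiplier; the abundance range and constrained mean are made up.

```python
# Toy MaxEnt sketch: find the Lagrange multiplier for a maximum-entropy
# distribution over abundances 1..n_max constrained to match an observed
# mean abundance, then use it as a null prediction.
import numpy as np
from scipy.optimize import brentq

def maxent_abundance_dist(mean_abundance, n_max=1000):
    n = np.arange(1, n_max + 1)
    def mean_gap(lam):
        w = np.exp(-lam * n)
        return np.sum(n * w) / np.sum(w) - mean_abundance
    lam = brentq(mean_gap, 1e-6, 10.0)   # solve for the multiplier
    w = np.exp(-lam * n)
    return n, w / w.sum()

n, p = maxent_abundance_dist(mean_abundance=8.0)
print(round(np.sum(n * p), 2))   # recovers the constrained mean, ~8.0
```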
Robustness of survival estimates for radio-marked animals
Bunck, C.M.; Chen, C.-L.
1992-01-01
Telemetry techniques are often used to study the survival of birds and mammals, particularly when mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have been developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and of an estimator based on the additional assumption of constant hazard, and to evaluate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-location probability of less than one are described and evaluated. Generally, the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests was similar.
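For readers unfamiliar with the estimator being evaluated, a bare-bones Kaplan-Meier implementation is sketched below; the follow-up times and censoring indicators are invented, and the re-location modifications discussed in the abstract are not included.

```python
# Minimal Kaplan-Meier sketch for telemetry-style data: each animal has a
# follow-up time and an indicator of whether death was observed (1) or the
# record was censored (0).
import numpy as np

def kaplan_meier(times, observed):
    """Return event times and the KM survival estimate at each of them."""
    times, observed = np.asarray(times, float), np.asarray(observed, int)
    order = np.argsort(times)
    times, observed = times[order], observed[order]
    survival, s = [], 1.0
    event_times = np.unique(times[observed == 1])
    for t in event_times:
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & (observed == 1))
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

t, s = kaplan_meier([3, 5, 5, 8, 12, 12, 15], [1, 1, 0, 1, 0, 1, 0])
print(dict(zip(t, np.round(s, 3))))
```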
Beyond statistical inference: A decision theory for science
KILLEEN, PETER R.
2008-01-01
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests—which place all value on the replicability of an effect and none on its magnitude—as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them for a single index of merit. The analysis incorporates opportunity cost and is consistent with alternate measures of effect size, such as r2 and information transmission, and with Bayesian model selection criteria. An alternate formulation is functionally equivalent to the formal theory, transparent, and easy to compute. PMID:17201351
Things we still haven't learned (so far).
Ivarsson, Andreas; Andersen, Mark B; Stenling, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Null hypothesis significance testing (NHST) is like an immortal horse that some researchers have been trying to beat to death for over 50 years but without any success. In this article we discuss the flaws in NHST, the historical background in relation to both Fisher's and Neyman and Pearson's statistical ideas, the common misunderstandings of what p < .05 actually means, and the 2010 APA publication manual's clear, but most often ignored, instructions to report effect sizes and to interpret what they all mean in the real world. In addition, we discuss how Bayesian statistics can be used to overcome some of the problems with NHST. We then analyze quantitative articles published over the past three years (2012-2014) in two top-rated sport and exercise psychology journals to determine whether we have learned what we should have learned decades ago about our use and meaningful interpretations of statistics.
Insignificant solar-terrestrial triggering of earthquakes
Love, Jeffrey J.; Thomas, Jeremy N.
2013-01-01
We examine the claim that solar-terrestrial interaction, as measured by sunspots, solar wind velocity, and geomagnetic activity, might play a role in triggering earthquakes. We count the number of earthquakes having magnitudes that exceed chosen thresholds in calendar years, months, and days, and we order these counts by the corresponding rank of annual, monthly, and daily averages of the solar-terrestrial variables. We measure the statistical significance of the difference between the earthquake-number distributions below and above the median of the solar-terrestrial averages by χ2 and Student's t tests. Across a range of earthquake magnitude thresholds, we find no consistent and statistically significant distributional differences. We also introduce time lags between the solar-terrestrial variables and the number of earthquakes, but again no statistically significant distributional difference is found. We cannot reject the null hypothesis of no solar-terrestrial triggering of earthquakes.
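The median-split comparison can be sketched as follows, with simulated data standing in for the earthquake catalogues and solar-terrestrial records; the chi-square test here uses a simple 2x2 split rather than the full binned distributions compared in the study.

```python
# Sketch of the median-split comparison described above: order yearly
# earthquake counts by a solar-terrestrial variable, split at its median,
# and compare the two count distributions. All data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
solar_index = rng.normal(size=60)            # e.g. annual sunspot averages
quake_counts = rng.poisson(15, size=60)      # annual M >= threshold counts

below = quake_counts[solar_index <= np.median(solar_index)]
above = quake_counts[solar_index > np.median(solar_index)]

t_stat, p_t = stats.ttest_ind(below, above, equal_var=False)
# Chi-square on a 2 x 2 table of (low/high solar) x (low/high quake years)
median_count = np.median(quake_counts)
table = [[np.sum(below <= median_count), np.sum(below > median_count)],
         [np.sum(above <= median_count), np.sum(above > median_count)]]
chi2, p_chi2, _, _ = stats.chi2_contingency(table)
print(round(p_t, 3), round(p_chi2, 3))
```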
Estimating the proportion of true null hypotheses when the statistics are discrete.
Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S
2015-07-15
In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support, and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. The methods are implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
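As a baseline for comparison, the sketch below implements the standard continuous-test estimator of π0 (Storey's lambda method), which is exactly the kind of estimator the abstract argues can misbehave with discrete statistics; the simulated p-values are illustrative.

```python
# Sketch of a standard continuous-test pi0 estimator (Storey's lambda
# method), shown as the baseline that discrete-test estimators improve on.
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lambda."""
    pvalues = np.asarray(pvalues)
    return min(1.0, np.mean(pvalues > lam) / (1.0 - lam))

rng = np.random.default_rng(3)
null_p = rng.uniform(size=800)              # 80% true nulls
alt_p = rng.beta(0.3, 8.0, size=200)        # 20% non-nulls, small p-values
print(round(storey_pi0(np.concatenate([null_p, alt_p])), 2))
```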
Impact of parental relationships in maximum lod score affected sib-pair method.
Leutenegger, Anne-Louise; Génin, Emmanuelle; Thompson, Elizabeth A; Clerget-Darpoux, Françoise
2002-11-01
Many studies are done in small isolated populations and populations where marriages between relatives are encouraged. In this paper, we point out some problems with applying the maximum lod score (MLS) method (Risch, [1990] Am. J. Hum. Genet. 46:242-253) in these populations where relationships exist between the two parents of the affected sib-pairs. Characterizing the parental relationships by the kinship coefficient between the parents (f), the maternal inbreeding coefficient (alpha(m)), and the paternal inbreeding coefficient (alpha(p)), we explored the relationship between the identity by descent (IBD) vector expected under the null hypothesis of no linkage and these quantities. We find that the expected IBD vector is no longer (0.25, 0.5, 0.25) when f, alpha(m), and alpha(p) differ from zero. In addition, the expected IBD vector does not always follow the triangle constraints recommended by Holmans ([1993] Am. J. Hum. Genet. 52:362-374). So the classically used MLS statistic needs to be adapted to the presence of parental relationships. We modified the software GENEHUNTER (Kruglyak et al. [1996] Am. J. Hum. Genet. 58:1347-1363) to do so. Indeed, the current version of the software does not compute the likelihood properly under the null hypothesis. We studied the adapted statistic by simulating data on three different family structures: (1) parents are double first cousins (f=0.125, alpha(m)=alpha(p)=0), (2) each parent is the offspring of first cousins (f=0, alpha(m)=alpha(p)=0.0625), and (3) parents are related as in the pedigree from Goddard et al. ([1996] Am. J. Hum. Genet. 58:1286-1302) (f=0.109, alpha(m)=alpha(p)=0.0625). The appropriate threshold needs to be derived for each case in order to get the correct type I error. And using the classical statistic in the presence of both parental kinship and parental inbreeding almost always leads to false conclusions. Copyright 2002 Wiley-Liss, Inc.
Unscaled Bayes factors for multiple hypothesis testing in microarray experiments.
Bertolino, Francesco; Cabras, Stefano; Castellanos, Maria Eugenia; Racugno, Walter
2015-12-01
Multiple hypothesis testing collects a series of techniques usually based on p-values as a summary of the available evidence from many statistical tests. In hypothesis testing, under a Bayesian perspective, the evidence for a specified hypothesis against an alternative, conditionally on data, is given by the Bayes factor. In this study, we approach multiple hypothesis testing based on both Bayes factors and p-values, regarding multiple hypothesis testing as a multiple model selection problem. To obtain the Bayes factors we assume default priors that are typically improper. In this case, the Bayes factor is usually undetermined due to the ratio of prior pseudo-constants. We show that ignoring prior pseudo-constants leads to unscaled Bayes factors, which do not invalidate the inferential procedure in multiple hypothesis testing, because they are used within a comparative scheme. In fact, using partial information from the p-values, we are able to approximate the sampling null distribution of the unscaled Bayes factor and use it within Efron's multiple testing procedure. The simulation study suggests that under a normal sampling model and even with small sample sizes, our approach provides false positive and false negative proportions that are lower than those of other common multiple hypothesis testing approaches based only on p-values. The proposed procedure is illustrated in two simulation studies, and the advantages of its use are shown in the analysis of two microarray experiments. © The Author(s) 2011.
Hypothesis Testing in the Real World
ERIC Educational Resources Information Center
Miller, Jeff
2017-01-01
Critics of null hypothesis significance testing suggest that (a) its basic logic is invalid and (b) it addresses a question that is of no interest. In contrast to (a), I argue that the underlying logic of hypothesis testing is actually extremely straightforward and compelling. To substantiate that, I present examples showing that hypothesis…
el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J
2007-09-24
In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.
NASA Astrophysics Data System (ADS)
Straka, Mika J.; Caldarelli, Guido; Squartini, Tiziano; Saracco, Fabio
2018-04-01
Bipartite networks provide an insightful representation of many systems, ranging from mutualistic networks of species interactions to investment networks in finance. The analyses of their topological structures have revealed the ubiquitous presence of properties which seem to characterize many—apparently different—systems. Nestedness, for example, has been observed in biological plant-pollinator as well as in country-product exportation networks. Due to the interdisciplinary character of complex networks, tools developed in one field, for example ecology, can greatly enrich other areas of research, such as economy and finance, and vice versa. With this in mind, we briefly review several entropy-based bipartite null models that have been recently proposed and discuss their application to real-world systems. The focus on these models is motivated by the fact that they show three very desirable features: analytical character, general applicability, and versatility. In this respect, entropy-based methods have been proven to perform satisfactorily both in providing benchmarks for testing evidence-based null hypotheses and in reconstructing unknown network configurations from partial information. Furthermore, entropy-based models have been successfully employed to analyze ecological as well as economic systems. As an example, the application of entropy-based null models has detected early-warning signals, both in economic and financial systems, of the 2007-2008 world crisis. Moreover, they have revealed a statistically-significant export specialization phenomenon of country export baskets in international trade, a result that seems to reconcile Ricardo's hypothesis in classical economics with recent findings on the (empirical) diversification of industrial production at the national level. Finally, these null models have shown that the information contained in the nestedness is already accounted for by the degree sequence of the corresponding graphs.
The new statistics: why and how.
Cumming, Geoff
2014-01-01
We need to make substantial changes to how we conduct research. First, in response to heightened concern that our published research literature is incomplete and untrustworthy, we need new requirements to ensure research integrity. These include prespecification of studies whenever possible, avoidance of selection and other inappropriate data-analytic practices, complete reporting, and encouragement of replication. Second, in response to renewed recognition of the severe flaws of null-hypothesis significance testing (NHST), we need to shift from reliance on NHST to estimation and other preferred techniques. The new statistics refers to recommended practices, including estimation based on effect sizes, confidence intervals, and meta-analysis. The techniques are not new, but adopting them widely would be new for many researchers, as well as highly beneficial. This article explains why the new statistics are important and offers guidance for their use. It describes an eight-step new-statistics strategy for research with integrity, which starts with formulation of research questions in estimation terms, has no place for NHST, and is aimed at building a cumulative quantitative discipline.
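A minimal example of the estimation-first reporting advocated here is to accompany a group comparison with a standardized effect size and a confidence interval; the sketch below uses a normal approximation for the interval and simulated data.

```python
# Sketch of the estimation-first workflow: report a standardized mean
# difference with an approximate 95% confidence interval, not only a p-value.
import numpy as np
from scipy import stats

def cohens_d_ci(x, y, conf=0.95):
    """Cohen's d for two independent groups with a normal-approximation CI."""
    n1, n2 = len(x), len(y)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                        / (n1 + n2 - 2))
    d = (np.mean(x) - np.mean(y)) / pooled_sd
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(0.5 + conf / 2)
    return d, (d - z * se, d + z * se)

rng = np.random.default_rng(4)
d, ci = cohens_d_ci(rng.normal(0.4, 1, 50), rng.normal(0.0, 1, 50))
print(round(d, 2), tuple(round(c, 2) for c in ci))
```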
Narayana, Sai Sathya; Deepa, Vinoth Kumar; Ahamed, Shafie; Sathish, Emmanuel Solomon; Meyappan, R; Satheesh Kumar, K S
2014-01-01
The objective of this study was to investigate the efficacy of a bioactive glass-containing product on remineralization of artificially induced carious enamel lesions and to compare its efficiency with other remineralization products using an in-vitro pH cycling method. The null hypothesis tested was that bioactive glass has no effect on enamel remineralization. A total of 20 enamel samples of human molar teeth were subjected to artificial caries lesion formation using a pH cycling method, which was verified using high-resolution scanning electron microscopy (HRSEM). The demineralized samples were then divided into five test groups, each containing twenty: Group A - bioactive glass (SHY-NM), Group B - fluoride toothpaste (Amflor), Group C - CPP-ACP (Tooth mousse), Group D - CPP-ACPF (Tooth mousse plus), Group E - control. All the test groups were exposed to the pH cycling regime, and the remineralizing agents were applied for 10 min, except in the control group. After a 10-day period, all the test groups were evaluated with HRSEM and assessed quantitatively by energy dispersive X-ray spectroscopy. The obtained data were analyzed statistically using one-way ANOVA, Student's t-test and Tukey's multiple comparison tests; P ≤ 0.05 was considered significant. The results led to rejection of the null hypothesis and highlight the concept of biomimetic bioactive glass as an effective remineralizing agent, underscoring the importance of minimally invasive treatment of incipient carious lesions by remineralization.
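A sketch of the stated analysis pipeline (one-way ANOVA followed by Tukey's multiple comparisons) is shown below on simulated mineral-content scores; the group means, sample values, and variable names are invented and do not reproduce the study's data.

```python
# Sketch of a one-way ANOVA followed by Tukey's HSD on simulated scores
# for five treatment groups; values are illustrative only.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(5)
group_means = {"bioactive_glass": 58, "fluoride": 52, "cpp_acp": 50,
               "cpp_acpf": 53, "control": 40}
scores = {name: rng.normal(mean, 4, size=20) for name, mean in group_means.items()}

f_stat, p = f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

values = np.concatenate(list(scores.values()))
labels = np.repeat(list(scores.keys()), [len(v) for v in scores.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```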
Kilinç, Delal Dara; Sayar, Gülşilay
2018-04-07
The aim of this study was to evaluate the effect of total surface sandblasting on the shear bond strength of two different retainer wires. The null hypothesis was that there is no difference in the bond strength of the two types of lingual retainer wires when they are sandblasted. One hundred and sixty human premolar teeth were equally divided into four groups (n=40). A pair of teeth was embedded in self-curing acrylic resin and polished. Retainer wires were applied on the etched and rinsed surfaces of the teeth. Four retainers were used: group 1: braided retainer (0.010×0.028″, Ortho Technology); group 2: sandblasted braided retainer (0.010×0.028″, Ortho Technology); group 3: coaxial retainer (0.0215″ Coaxial, 3M) and group 4: sandblasted coaxial retainer (0.0215″ Coaxial, 3M). The specimens were tested using a universal test machine in shear mode with a crosshead speed of 1 mm/min. One-way analysis of variance (ANOVA) was used to determine the significant differences among the groups. There was no significant difference (P=0.117) among the groups according to this test. The null hypothesis was accepted. There was no statistically significant difference among the shear bond strength values of the four groups. Copyright © 2018 CEO. Published by Elsevier Masson SAS. All rights reserved.
Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.
Counsell, Alyssa; Harlow, Lisa L
2017-05-01
With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t -tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles presented on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.
Rusk, Andria; Highfield, Linda; Wilkerson, J Michael; Harrell, Melissa; Obala, Andrew; Amick, Benjamin
2016-02-19
Efforts to improve malaria case management in sub-Saharan Africa have shifted focus to private antimalarial retailers to increase access to appropriate treatment. Demands to decrease intervention cost while increasing efficacy requires interventions tailored to geographic regions with demonstrated need. Cluster analysis presents an opportunity to meet this demand, but has not been applied to the retail sector or antimalarial retailer behaviors. This research conducted cluster analysis on medicine retailer behaviors in Kenya, to improve malaria case management and inform future interventions. Ninety-seven surveys were collected from medicine retailers working in the Webuye Health and Demographic Surveillance Site. Survey items included retailer training, education, antimalarial drug knowledge, recommending behavior, sales, and shop characteristics, and were analyzed using Kulldorff's spatial scan statistic. The Bernoulli purely spatial model for binomial data was used, comparing cases to controls. Statistical significance of found clusters was tested with a likelihood ratio test, using the null hypothesis of no clustering, and a p value based on 999 Monte Carlo simulations. The null hypothesis was rejected with p values of 0.05 or less. A statistically significant cluster of fewer than expected pharmacy-trained retailers was found (RR = .09, p = .001) when compared to the expected random distribution. Drug recommending behavior also yielded a statistically significant cluster, with fewer than expected retailers recommending the correct antimalarial medication to adults (RR = .018, p = .01), and fewer than expected shops selling that medication more often than outdated antimalarials when compared to random distribution (RR = 0.23, p = .007). All three of these clusters were co-located, overlapping in the northwest of the study area. Spatial clustering was found in the data. A concerning amount of correlation was found in one specific region in the study area where multiple behaviors converged in space, highlighting a prime target for interventions. These results also demonstrate the utility of applying geospatial methods in the study of medicine retailer behaviors, making the case for expanding this approach to other regions.
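The Monte Carlo p-value used by scan statistics of this kind is simple to compute once the simulated likelihood ratios are in hand; the sketch below uses an arbitrary stand-in null distribution rather than output from Kulldorff's software.

```python
# Sketch of the Monte Carlo p-value used by spatial scan statistics: the
# observed likelihood ratio is ranked among likelihood ratios from data
# simulated under the null of no clustering. Values are illustrative.
import numpy as np

def monte_carlo_p(observed_llr, simulated_llrs):
    """Rank-based p-value with the standard +1 correction."""
    simulated_llrs = np.asarray(simulated_llrs)
    return (1 + np.sum(simulated_llrs >= observed_llr)) / (1 + len(simulated_llrs))

rng = np.random.default_rng(6)
sims = rng.gumbel(2.0, 1.0, size=999)     # stand-in null distribution of LLRs
print(round(monte_carlo_p(6.5, sims), 3))
```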
Coordinate based random effect size meta-analysis of neuroimaging studies.
Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J
2017-06-01
Low power in neuroimaging studies can make them difficult to interpret, and Coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
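The false cluster discovery rate builds on the familiar Benjamini-Hochberg procedure; as background, the sketch below shows plain BH applied to a vector of simulated cluster p-values, not ClusterZ's cluster-wise extension.

```python
# Sketch of the Benjamini-Hochberg FDR procedure on a vector of p-values.
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean mask of p-values declared significant at FDR q."""
    p = np.asarray(pvalues)
    order = np.argsort(p)
    m = len(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    passed = p[order] <= thresholds
    cutoff = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    significant = np.zeros(m, dtype=bool)
    significant[order[:cutoff]] = True
    return significant

rng = np.random.default_rng(7)
pvals = np.concatenate([rng.uniform(size=45), rng.uniform(0, 0.01, size=5)])
print(benjamini_hochberg(pvals).sum(), "clusters pass at q = 0.05")
```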
Rosales, Corina; Patel, Niket; Gillard, Baiba K.; Yelamanchili, Dedipya; Yang, Yaliu; Courtney, Harry S.; Santos, Raul D.; Gotto, Antonio M.; Pownall, Henry J.
2016-01-01
The reaction of Streptococcal serum opacity factor (SOF) against plasma high-density lipoproteins (HDL) produces a large cholesteryl ester-rich microemulsion (CERM), a smaller neo HDL that is apolipoprotein (apo) AI-poor, and lipid-free apo AI. SOF is active vs. both human and mouse plasma HDL. In vivo injection of SOF into mice reduces plasma cholesterol ~40% in 3 hours while forming the same products observed in vitro, but at different ratios. Previous studies supported the hypothesis that labile apo AI is required for the SOF reaction vs. HDL. Here we further tested that hypothesis by studies of SOF against HDL from apo AI-null mice. When injected into apo AI-null mice, SOF reduced plasma cholesterol ~35% in three hours. The reaction of SOF vs. apo AI-null HDL in vitro produced a CERM and neo HDL, but no lipid-free apo. Moreover, according to the rate of CERM formation, the extent and rate of the SOF reaction vs. apo AI-null mouse HDL was less than that against wild-type (WT) mouse HDL. Chaotropic perturbation studies using guanidine hydrochloride showed that apo AI-null HDL was more stable than WT HDL. Human apo AI added to apo AI-null HDL was quantitatively incorporated, giving reconstituted HDL. Both SOF and guanidine hydrochloride displaced apo AI from the reconstituted HDL. These results support the conclusion that apo AI-null HDL is more stable than WT HDL because it lacks apo AI, a labile protein that is readily displaced by physico-chemical and biochemical perturbations. Thus, apo AI-null HDL is less SOF-reactive than WT HDL. The properties of apo AI-null HDL can be partially restored to those of WT HDL by the spontaneous incorporation of human apo AI. It remains to be determined what other HDL functions are affected by apo AI deletion. PMID:25790332
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
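The a priori sample size calculation mentioned above can be sketched as follows for a two-sided, two-sample comparison, using the standard normal approximation; the effect size, alpha, and power values are illustrative choices, not recommendations.

```python
# Sketch of an a priori sample size calculation for a two-sided, two-sample
# comparison under the Neyman-Pearson framework (normal approximation).
from scipy.stats import norm

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample t-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # type I error rate fixed in advance
    z_beta = norm.ppf(power)            # power = 1 - type II error rate
    return 2 * ((z_alpha + z_beta) / effect_size_d) ** 2

# About 63 per group for a medium effect (d = 0.5) by the normal approximation;
# the exact t-based value is slightly larger.
print(round(n_per_group(0.5)))
```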
Transfer Entropy as a Log-Likelihood Ratio
NASA Astrophysics Data System (ADS)
Barnett, Lionel; Bossomaier, Terry
2012-09-01
Transfer entropy, an information-theoretic measure of time-directed information transfer between joint processes, has steadily gained popularity in the analysis of complex stochastic dynamics in diverse fields, including the neurosciences, ecology, climatology, and econometrics. We show that for a broad class of predictive models, the log-likelihood ratio test statistic for the null hypothesis of zero transfer entropy is a consistent estimator for the transfer entropy itself. For finite Markov chains, furthermore, no explicit model is required. In the general case, an asymptotic χ2 distribution is established for the transfer entropy estimator. The result generalizes the equivalence in the Gaussian case of transfer entropy and Granger causality, a statistical notion of causal influence based on prediction via vector autoregression, and establishes a fundamental connection between directed information transfer and causality in the Wiener-Granger sense.
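A minimal sketch of the Gaussian special case described here: transfer entropy from X to Y estimated as a log-likelihood ratio of restricted versus full autoregressions (i.e., Granger causality), with 2N times the estimate referred to a χ2 distribution. The simulated series and lag order are illustrative; this is not the authors' general estimator.

```python
# Gaussian special case: TE(X -> Y) from residual variances of two autoregressions,
# with 2*N*TE compared to a chi-squared distribution. Illustrative data only.
import numpy as np
from scipy.stats import chi2

def lagged_design(series_list, p):
    """Stack p lags of each series into a design matrix (plus intercept)."""
    T = len(series_list[0])
    cols = [np.ones(T - p)]
    for s in series_list:
        cols += [s[p - k - 1:T - k - 1] for k in range(p)]
    return np.column_stack(cols)

def gaussian_te(x, y, p=2):
    """Transfer entropy X -> Y via restricted vs. full AR models of order p."""
    target = y[p:]
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.sum((target - X @ beta) ** 2)
    N = len(target)
    te = 0.5 * np.log(rss(lagged_design([y], p)) / rss(lagged_design([y, x], p)))
    stat = 2 * N * te                      # asymptotically chi2 with p degrees of freedom
    return te, stat, chi2.sf(stat, df=p)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(2, 2000):                   # y is driven by lagged x, so TE(X -> Y) > 0
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()
print(gaussian_te(x, y, p=2))
```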
Basic biostatistics for post-graduate students
Dakhale, Ganesh N.; Hiware, Sachin K.; Shinde, Abhijit T.; Mahatme, Mohini S.
2012-01-01
Statistical methods are important to draw valid conclusions from the obtained data. This article provides background information related to fundamental methods and techniques in biostatistics for the use of postgraduate students. The main focus is on types of data, measures of central tendency and variation, and basic tests, which are useful for analysis of different types of observations. A few parameters, such as the normal distribution, calculation of sample size, level of significance, the null hypothesis, indices of variability, and different tests, are explained in detail with suitable examples. Using these guidelines, we are confident that postgraduate students will be able to classify the distribution of their data and apply the proper test. Information is also given regarding various free software programs and websites useful for calculations of statistics. Thus, postgraduate students will benefit whether they opt for academics or for industry. PMID:23087501
The epistemology of mathematical and statistical modeling: a quiet methodological revolution.
Rodgers, Joseph Lee
2010-01-01
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyright 2009 APA, all rights reserved.
Confidence interval or p-value?: part 4 of a series on evaluation of scientific publications.
du Prel, Jean-Baptist; Hommel, Gerhard; Röhrig, Bernd; Blettner, Maria
2009-05-01
An understanding of p-values and confidence intervals is necessary for the evaluation of scientific articles. This article will inform the reader of the meaning and interpretation of these two statistical concepts. The uses of these two statistical concepts and the differences between them are discussed on the basis of a selective literature search concerning the methods employed in scientific articles. P-values in scientific studies are used to determine whether a null hypothesis formulated before the performance of the study is to be accepted or rejected. In exploratory studies, p-values enable the recognition of any statistically noteworthy findings. Confidence intervals provide information about a range in which the true value lies with a certain degree of probability, as well as about the direction and strength of the demonstrated effect. This enables conclusions to be drawn about the statistical plausibility and clinical relevance of the study findings. It is often useful for both statistical measures to be reported in scientific articles, because they provide complementary types of information.
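A toy example of reporting both quantities for a two-group mean difference is sketched below; the data are simulated and the 95% level is an illustrative choice.

```python
# Toy illustration: the p value for the null hypothesis of no group difference,
# plus a 95% confidence interval for the size of the difference. Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(5.4, 1.2, 40)
control = rng.normal(5.0, 1.2, 40)

t_stat, p_value = stats.ttest_ind(treatment, control)   # pooled-variance t-test

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()
sp2 = ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
half_width = stats.t.ppf(0.975, n1 + n2 - 2) * se
print(f"p = {p_value:.3f}; difference = {diff:.2f}, "
      f"95% CI [{diff - half_width:.2f}, {diff + half_width:.2f}]")
```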
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
Pridemore, William Alex; Freilich, Joshua D
2007-12-01
Since Roe v. Wade, most states have passed laws either restricting or further protecting reproductive rights. During a wave of anti-abortion violence in the early 1990s, several states also enacted legislation protecting abortion clinics, staff, and patients. One hypothesis drawn from the theoretical literature predicts that these laws provide a deterrent effect and thus fewer anti-abortion crimes in states that protect clinics and reproductive rights. An alternative hypothesis drawn from the literature expects a backlash effect from radical members of the movement and thus more crimes in states with protective legislation. We tested these competing hypotheses by taking advantage of unique data sets that gauge the strength of laws protecting clinics and reproductive rights and that provide self-report victimization data from clinics. Employing logistic regression and controlling for several potential covariates, we found null effects and thus no support for either hypothesis. The null findings were consistent across a number of different types of victimization. Our discussion contextualizes these results in terms of previous research on crimes against abortion providers, discusses alternative explanations for the null findings, and considers the implications for future policy development and research.
A Bayesian Perspective on the Reproducibility Project: Psychology
Etz, Alexander; Vandekerckhove, Joachim
2016-01-01
We revisit the results of the recent Reproducibility Project: Psychology by the Open Science Collaboration. We compute Bayes factors—a quantity that can be used to express comparative evidence for an hypothesis but also for the null hypothesis—for a large subset (N = 72) of the original papers and their corresponding replication attempts. In our computation, we take into account the likely scenario that publication bias had distorted the originally published results. Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak (i.e., Bayes factor < 10). The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication, and no replication attempts provided strong evidence in favor of the null. In all cases where the original paper provided strong evidence but the replication did not (15%), the sample size in the replication was smaller than the original. Where the replication provided strong evidence but the original did not (10%), the replication sample size was larger. We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature. We further conclude that traditional sample sizes are insufficient and that a more widespread adoption of Bayesian methods is desirable. PMID:26919473
Interpreting null findings from trials of alcohol brief interventions.
Heather, Nick
2014-01-01
The effectiveness of alcohol brief intervention (ABI) has been established by a succession of meta-analyses but, because the effects of ABI are small, null findings from randomized controlled trials are often reported and can sometimes lead to skepticism regarding the benefits of ABI in routine practice. This article first explains why null findings are likely to occur under null hypothesis significance testing (NHST) due to the phenomenon known as "the dance of the p-values." A number of misconceptions about null findings are then described, using as an example the way in which the results of the primary care arm of a recent cluster-randomized trial of ABI in England (the SIPS project) have been misunderstood. These misinterpretations include the fallacy of "proving the null hypothesis" that lack of a significant difference between the means of sample groups can be taken as evidence of no difference between their population means, and the possible effects of this and related misunderstandings of the SIPS findings are examined. The mistaken inference that reductions in alcohol consumption seen in control groups from baseline to follow-up are evidence of real effects of control group procedures is then discussed and other possible reasons for such reductions, including regression to the mean, research participation effects, historical trends, and assessment reactivity, are described. From the standpoint of scientific progress, the chief problem about null findings under the conventional NHST approach is that it is not possible to distinguish "evidence of absence" from "absence of evidence." By contrast, under a Bayesian approach, such a distinction is possible and it is explained how this approach could classify ABIs in particular settings or among particular populations as either truly ineffective or as of unknown effectiveness, thus accelerating progress in the field of ABI research.
Weinheimer-Haus, Eileen M.; Mirza, Rita E.; Koh, Timothy J.
2015-01-01
The Nod-like receptor protein (NLRP)-3 inflammasome/IL-1β pathway is involved in the pathogenesis of various inflammatory skin diseases, but its biological role in wound healing remains to be elucidated. Since inflammation is typically thought to impede healing, we hypothesized that loss of NLRP-3 activity would result in a downregulated inflammatory response and accelerated wound healing. NLRP-3 null mice, caspase-1 null mice and C57Bl/6 wild type control mice (WT) received four 8 mm excisional cutaneous wounds; inflammation and healing were assessed during the early stage of wound healing. Consistent with our hypothesis, wounds from NLRP-3 null and caspase-1 null mice contained lower levels of the pro-inflammatory cytokines IL-1β and TNF-α compared to WT mice and had reduced neutrophil and macrophage accumulation. Contrary to our hypothesis, re-epithelialization, granulation tissue formation, and angiogenesis were delayed in NLRP-3 null mice and caspase-1 null mice compared to WT mice, indicating that NLRP-3 signaling is important for early events in wound healing. Topical treatment of excisional wounds with recombinant IL-1β partially restored granulation tissue formation in wounds of NLRP-3 null mice, confirming the importance of NLRP-3-dependent IL-1β production during early wound healing. Despite the improvement in healing, angiogenesis and levels of the pro-angiogenic growth factor VEGF were further reduced in IL-1β treated wounds, suggesting that IL-1β has a negative effect on angiogenesis and that NLRP-3 promotes angiogenesis in an IL-1β-independent manner. These findings indicate that the NLRP-3 inflammasome contributes to the early inflammatory phase following skin wounding and is important for efficient healing. PMID:25793779
Testing for purchasing power parity in the long-run for ASEAN-5
NASA Astrophysics Data System (ADS)
Choji, Niri Martha; Sek, Siok Kun
2017-04-01
For more than a decade, there has been substantial interest in testing the validity of the purchasing power parity (PPP) hypothesis empirically. This paper tests for long-run relative purchasing power parity in a group of ASEAN-5 countries for the period 1996-2016 using monthly data. For this purpose, we used the Pedroni co-integration method to test the long-run PPP hypothesis. We first tested for the stationarity of the variables and found that the variables are non-stationary at levels but stationary at first difference. Results of the Pedroni test rejected the null hypothesis of no co-integration, meaning that we have enough evidence to support PPP in the long run for the ASEAN-5 countries over the period 1996-2016. In other words, the rejection of the null hypothesis implies a long-run relation between nominal exchange rates and relative prices.
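The sketch below mimics the two steps on simulated single-country data: unit-root checks at levels and first differences, followed by a cointegration test between the nominal exchange rate and relative prices. It uses the Engle-Granger test from statsmodels as a stand-in; the Pedroni panel test itself is not implemented here, and the series are synthetic.

```python
# Single-series analogue of the two steps described above, on simulated data.
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(0)
T = 250
relative_prices = np.cumsum(rng.normal(size=T))                        # random walk, I(1)
exchange_rate = 1.0 * relative_prices + rng.normal(scale=0.5, size=T)  # cointegrated with it

for name, series in [("exchange rate", exchange_rate), ("relative prices", relative_prices)]:
    p_level = adfuller(series)[1]
    p_diff = adfuller(np.diff(series))[1]
    print(f"{name}: ADF p at level = {p_level:.3f}, at first difference = {p_diff:.3f}")

# Null hypothesis of the Engle-Granger test: no cointegration.
t_stat, p_value, _ = coint(exchange_rate, relative_prices)
print(f"cointegration test p value = {p_value:.3f}")
```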
How to talk about protein‐level false discovery rates in shotgun proteomics
The, Matthew; Tasnim, Ayesha
2016-01-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein‐level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein‐level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein‐level FDRs for both competing null hypotheses. PMID:27503675
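A minimal target-decoy sketch of the estimation principle is shown below for a simple inference rule (score each protein by its best peptide score); the simulated scores are placeholders, and this is not the authors' simulation framework.

```python
# Minimal target-decoy sketch of protein-level FDR estimation for a simple
# inference rule. Scores are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
target_scores = np.concatenate([rng.normal(3.0, 1.0, 300),   # "present" proteins
                                rng.normal(0.0, 1.0, 700)])  # absent proteins matched by chance
decoy_scores = rng.normal(0.0, 1.0, 1000)                    # decoy proteins model the null

def protein_fdr(threshold, targets, decoys):
    """Estimated FDR among target proteins scoring at or above the threshold."""
    n_targets = np.sum(targets >= threshold)
    n_decoys = np.sum(decoys >= threshold)
    return n_decoys / max(n_targets, 1)

for t in (1.0, 2.0, 3.0):
    print(f"threshold {t:.1f}: estimated FDR = {protein_fdr(t, target_scores, decoy_scores):.3f}")
```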
NASA Astrophysics Data System (ADS)
Harken, B.; Geiges, A.; Rubin, Y.
2013-12-01
There are several stages in any hydrological modeling campaign, including: formulation and analysis of a priori information, data acquisition through field campaigns, inverse modeling, and forward modeling and prediction of some environmental performance metric (EPM). The EPM being predicted could be, for example, contaminant concentration, plume travel time, or aquifer recharge rate. These predictions often have significant bearing on some decision that must be made. Examples include: how to allocate limited remediation resources between multiple contaminated groundwater sites, where to place a waste repository site, and what extraction rates can be considered sustainable in an aquifer. Providing an answer to these questions depends on predictions of EPMs using forward models as well as levels of uncertainty related to these predictions. Uncertainty in model parameters, such as hydraulic conductivity, leads to uncertainty in EPM predictions. Often, field campaigns and inverse modeling efforts are planned and undertaken with reduction of parametric uncertainty as the objective. The tool of hypothesis testing allows this to be taken one step further by considering uncertainty reduction in the ultimate prediction of the EPM as the objective and gives a rational basis for weighing costs and benefits at each stage. When using the tool of statistical hypothesis testing, the EPM is cast into a binary outcome. This is formulated as null and alternative hypotheses, which can be accepted or rejected with statistical formality. When accounting for all sources of uncertainty at each stage, the level of significance of this test provides a rational basis for planning, optimization, and evaluation of the entire campaign. Case-specific information, such as the consequences of prediction error and site-specific costs, can be used in establishing selection criteria based on what level of risk is deemed acceptable. This framework is demonstrated and discussed using various synthetic case studies. The case studies involve contaminated aquifers where a decision must be made based on prediction of when a contaminant will arrive at a given location. The EPM, in this case contaminant travel time, is cast into the hypothesis testing framework. The null hypothesis states that the contaminant plume will arrive at the specified location before a critical value of time passes, and the alternative hypothesis states that the plume will arrive after the critical time passes. Different field campaigns are analyzed based on effectiveness in reducing the probability of selecting the wrong hypothesis, which in this case corresponds to reducing uncertainty in the prediction of plume arrival time. To examine the role of inverse modeling in this framework, case studies involving both Maximum Likelihood parameter estimation and Bayesian inversion are used.
NASA Astrophysics Data System (ADS)
Hilburn, Monty D.
Successful lean manufacturing and cellular manufacturing execution relies upon a foundation of leadership commitment and strategic planning built upon solid data and robust analysis. The problem for this study was to create and employ a simple lean transformation planning model and review process that could be used to identify functional support staff resources required to plan and execute lean manufacturing cells within aerospace assembly and manufacturing sites. The lean planning model was developed using available literature for lean manufacturing kaizen best practices and validated through a Delphi panel of lean experts. The resulting model and a standardized review process were used to assess the state of lean transformation planning at five sites of an international aerospace manufacturing and assembly company. The results of the three-day, on-site review were compared with baseline plans collected from each of the five sites and analyzed, with focus on three critical areas of lean planning: the number and type of manufacturing cells identified; the number, type, and duration of planned lean and continuous kaizen events; and the quantity and type of functional staffing resources planned to support the kaizen schedule. Summarized data from the baseline and on-site reviews were analyzed with descriptive statistics. ANOVAs and paired t-tests at the 5% significance level were conducted on the means of data sets to determine if null hypotheses related to cell, kaizen event, and support resources could be rejected. The results of the research found significant differences between lean transformation plans developed by site leadership and plans developed utilizing the structured, on-site review process and lean transformation planning model. The null hypothesis that there was no difference between the means of pre-review and on-site cell counts was rejected, as was the null hypothesis that there was no significant difference in kaizen event plans. These factors are critical inputs into the support staffing resources calculation used by the lean planning model. The null hypothesis related to functional support staff resources was rejected for most functional groups, indicating that the baseline site plan inadequately provided for cross-functional staff involvement to support the lean transformation plan. Null hypotheses related to total lean transformation staffing could not be rejected, indicating that while total staffing plans were not significantly different than plans developed during the on-site review and through the use of the lean planning model, the allocation of staffing among various functional groups such as engineering, production, and materials planning was an issue. The on-site review process and the simple lean transformation plan developed were determined to be useful in identifying shortcomings in lean transformation planning within aerospace manufacturing and assembly sites. It was concluded that the differences uncovered were likely contributing factors affecting the effectiveness of aerospace manufacturing sites' implementation of lean cellular manufacturing.
The impact of p53 protein core domain structural alteration on ovarian cancer survival.
Rose, Stephen L; Robertson, Andrew D; Goodheart, Michael J; Smith, Brian J; DeYoung, Barry R; Buller, Richard E
2003-09-15
Although survival with a p53 missense mutation is highly variable, p53-null mutation is an independent adverse prognostic factor for advanced stage ovarian cancer. By evaluating ovarian cancer survival based upon a structure function analysis of the p53 protein, we tested the hypothesis that not all missense mutations are equivalent. The p53 gene was sequenced from 267 consecutive ovarian cancers. The effect of individual missense mutations on p53 structure was analyzed using the International Agency for Research on Cancer p53 Mutational Database, which specifies the effects of p53 mutations on p53 core domain structure. Mutations in the p53 core domain were classified as either explained or not explained in structural or functional terms by their predicted effects on protein folding, protein-DNA contacts, or mutation in highly conserved residues. Null mutations were classified by their mechanism of origin. Mutations were sequenced from 125 tumors. Effects of 62 of the 82 missense mutations (76%) could be explained by alterations in the p53 protein. Twenty-three (28%) of the explained mutations occurred in highly conserved regions of the p53 core protein. Twenty-two nonsense point mutations and 21 frameshift null mutations were sequenced. Survival was independent of missense mutation type and mechanism of null mutation. The hypothesis that not all missense mutations are equivalent is, therefore, rejected. Furthermore, p53 core domain structural alteration secondary to missense point mutation is not functionally equivalent to a p53-null mutation. The poor prognosis associated with p53-null mutation is independent of the mutation mechanism.
Estimation versus falsification approaches in sport and exercise science.
Wilkinson, Michael; Winter, Edward M
2018-05-22
There has been a recent resurgence in debate about methods for statistical inference in science. The debate addresses statistical concepts and their impact on the value and meaning of analyses' outcomes. In contrast, philosophical underpinnings of approaches and the extent to which analytical tools match philosophical goals of the scientific method have received less attention. This short piece considers application of the scientific method to "what-is-the-influence-of x-on-y" type questions characteristic of sport and exercise science. We consider applications and interpretations of estimation versus falsification based statistical approaches and their value in addressing how much x influences y, and in measurement error and method agreement settings. We compare estimation using magnitude based inference (MBI) with falsification using null hypothesis significance testing (NHST), and highlight the limited value both of falsification and NHST to address problems in sport and exercise science. We recommend adopting an estimation approach, expressing the uncertainty of effects of x on y, and their practical/clinical value against pre-determined effect magnitudes using MBI.
Detecting Multifractal Properties in Asset Returns:
NASA Astrophysics Data System (ADS)
Lux, Thomas
It has become popular recently to apply the multifractal formalism of statistical physics (scaling analysis of structure functions and f(α) singularity spectrum analysis) to financial data. The outcome of such studies is a nonlinear shape of the structure function and a nontrivial behavior of the spectrum. Eventually, this literature has moved from basic data analysis to estimation of particular variants of multifractal models for asset returns via fitting of the empirical τ(q) and f(α) functions. Here, we reinvestigate earlier claims of multifractality using four long time series of important financial markets. Taking the recently proposed multifractal models of asset returns as our starting point, we show that the typical "scaling estimators" used in the physics literature are unable to distinguish between spurious and "true" multiscaling of financial data. Designing explicit tests for multiscaling, we can in no case reject the null hypothesis that the apparent curvature of both the scaling function and the Hölder spectrum are spuriously generated by the particular fat-tailed distribution of financial data. Given the well-known overwhelming evidence in favor of different degrees of long-term dependence in the powers of returns, we interpret this inability to reject the null hypothesis of multiscaling as a lack of discriminatory power of the standard approach rather than as a true rejection of multiscaling. However, the complete "failure" of the multifractal apparatus in this setting also raises the question whether results in other areas (like geophysics) suffer from similar shortcomings of the traditional methodology.
Jain, Shikha; Shetty, K Sadashiva; Jain, Shweta; Jain, Sachin; Prakash, A T; Agrawal, Mamta
2015-07-01
To assess the null hypothesis that there is no difference in the rate of dental development and the occurrence of selected developmental anomalies related to shape, number, structure, and position of teeth between subjects with impacted mandibular canines and those with normally erupted canines. Pretreatment records of 42 subjects diagnosed with mandibular canine impaction (impaction group: IG) were compared with those of 84 subjects serving as a control reference sample (control group: CG). Independent t-tests were used to compare mean dental ages between the groups. Intergroup differences in distribution of subjects based on the rate of dental development and occurrence of selected dental anomalies were assessed using χ2 tests. Odds of late, normal, and early developers and various categories of developmental anomalies between the IG and the CG were evaluated in terms of odds ratios. Mean dental age for the IG was lower than that for the CG in general. Specifically, this was true for girls (P < .05). Differences in the distribution of the subjects based on the rate of dental development and occurrence of positional anomalies also reached statistical significance (P < .05). The IG showed a higher frequency of late developers and positional anomalies compared with controls (odds ratios 3.00 and 2.82, respectively; P < .05). The null hypothesis was rejected. We identified a close association between female subjects in the IG and delayed dental development compared with female orthodontic patients in the control group. An increased frequency of positional developmental anomalies was also notable in the IG.
Long memory and multifractality: A joint test
NASA Astrophysics Data System (ADS)
Goddard, John; Onali, Enrico
2016-06-01
The properties of statistical tests for hypotheses concerning the parameters of the multifractal model of asset returns (MMAR) are investigated, using Monte Carlo techniques. We show that, in the presence of multifractality, conventional tests of long memory tend to over-reject the null hypothesis of no long memory. Our test addresses this issue by jointly estimating long memory and multifractality. The estimation and test procedures are applied to exchange rate data for 12 currencies. Among the nested model specifications that are investigated, in 11 out of 12 cases, daily returns are most appropriately characterized by a variant of the MMAR that applies a multifractal time-deformation process to NIID returns. There is no evidence of long memory.
Permutation entropy of finite-length white-noise time series.
Little, Douglas J; Kane, Deb M
2016-08-01
Permutation entropy (PE) is commonly used to discriminate complex structure from white noise in a time series. While the PE of white noise is well understood in the long time-series limit, analysis in the general case is currently lacking. Here the expectation value and variance of white-noise PE are derived as functions of the number of ordinal pattern trials, N, and the embedding dimension, D. It is demonstrated that the probability distribution of the white-noise PE converges to a χ2 distribution with D!-1 degrees of freedom as N becomes large. It is further demonstrated that the PE variance for an arbitrary time series can be estimated as the variance of a related metric, the Kullback-Leibler entropy (KLE), allowing the qualitative N≫D! condition to be recast as a quantitative estimate of the N required to achieve a desired PE calculation precision. Application of this theory to statistical inference is demonstrated in the case of an experimentally obtained noise series, where the probability of obtaining the observed PE value was calculated assuming a white-noise time series. Standard statistical inference can be used to draw conclusions about whether the white-noise null hypothesis can be accepted or rejected. This methodology can be applied to other null hypotheses, such as discriminating whether two time series are generated from different complex system states.
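The following sketch computes PE from non-overlapping ordinal patterns and compares white noise against the χ2 reference with D!-1 degrees of freedom via the scaled entropy deficit, a standard multinomial large-N result consistent with the description above; the series and parameters are illustrative.

```python
# Minimal permutation entropy (PE) sketch for embedding dimension D. For N independent
# ordinal-pattern trials from white noise, the deficit 2*N*(ln(D!) - PE) is approximately
# chi-squared with D!-1 degrees of freedom. Non-overlapping windows keep trials independent.
import math
from collections import Counter
import numpy as np
from scipy.stats import chi2

def permutation_entropy(x, D=3):
    """Natural-log PE from non-overlapping ordinal patterns of length D."""
    patterns = [tuple(np.argsort(x[i:i + D])) for i in range(0, len(x) - D + 1, D)]
    counts = np.array(list(Counter(patterns).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p)), len(patterns)

rng = np.random.default_rng(0)
D = 3
pe, N = permutation_entropy(rng.normal(size=500), D)

# White-noise reference: scaled entropy deficit vs. chi-squared with D!-1 df.
deficit = 2 * N * (math.log(math.factorial(D)) - pe)
p_value = chi2.sf(deficit, df=math.factorial(D) - 1)
print(f"PE = {pe:.4f}, deficit statistic = {deficit:.2f}, p = {p_value:.3f}")
```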
Orr, H A
1998-01-01
Evolutionary biologists have long sought a way to determine whether a phenotypic difference between two taxa was caused by natural selection or random genetic drift. Here I argue that data from quantitative trait locus (QTL) analyses can be used to test the null hypothesis of neutral phenotypic evolution. I propose a sign test that compares the observed number of plus and minus alleles in the "high line" with that expected under neutrality, conditioning on the known phenotypic difference between the taxa. Rejection of the null hypothesis implies a role for directional natural selection. This test is applicable to any character in any organism in which QTL analysis can be performed. PMID:9691061
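A simplified version of the sign-test idea can be sketched as a binomial test: under the neutral null, each QTL allele fixed in the "high line" is plus or minus with roughly equal probability. Orr's actual test conditions on the known phenotypic difference between taxa, which this simplified sketch omits; the counts are hypothetical.

```python
# Simplified sketch of the QTL sign-test idea: compare the observed count of "plus"
# alleles in the high line with Binomial(n, 1/2) under the neutral null. The conditioning
# on the known phenotypic difference used by Orr's test is omitted here.
from scipy.stats import binomtest

n_qtl = 12          # illustrative: QTLs detected for the trait
n_plus = 11         # illustrative: QTLs where the high line carries the "plus" allele

result = binomtest(n_plus, n_qtl, p=0.5, alternative="two-sided")
print(f"two-sided p = {result.pvalue:.4f}")  # small p suggests directional selection
```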
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
Buu, Anne; Williams, L Keoki; Yang, James J
2018-03-01
We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not otherwise be chosen using marginal tests.
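The sketch below shows the Fisher's combination statistic for one SNP against mixed phenotypes, with a brute-force permutation null for comparison; the paper's contribution is a faster numerical estimate of that null, which is not reproduced here. The data and the per-phenotype correlation tests are illustrative stand-ins.

```python
# Sketch of Fisher's combination statistic for mixed phenotypes, with a brute-force
# permutation null. Data and per-phenotype tests are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, n)                       # SNP coded 0/1/2
continuous = 0.2 * genotype + rng.normal(size=n)       # associated continuous phenotype
binary = (rng.normal(size=n) > 0).astype(float)        # unassociated binary phenotype
phenotypes = np.column_stack([continuous, binary])

def fisher_stat(g, Y):
    """T = -2 * sum(log p_i) over per-phenotype association tests (correlation tests here)."""
    pvals = [stats.pearsonr(g, Y[:, j])[1] for j in range(Y.shape[1])]
    return -2 * np.sum(np.log(pvals))

t_obs = fisher_stat(genotype, phenotypes)

# Permutation null: shuffling the genotype breaks the association but keeps the
# correlation structure among the phenotypes intact.
t_null = np.array([fisher_stat(rng.permutation(genotype), phenotypes) for _ in range(999)])
p_value = (1 + np.sum(t_null >= t_obs)) / (1 + len(t_null))
print(f"T = {t_obs:.2f}, permutation p = {p_value:.3f}")
```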
Giofrè, David; Cumming, Geoff; Fresc, Luca; Boedker, Ingrid; Tressoldi, Patrizio
2017-01-01
From January 2014, Psychological Science introduced new submission guidelines that encouraged the use of effect sizes, estimation, and meta-analysis (the "new statistics"), required extra detail of methods, and offered badges for use of open science practices. We investigated the use of these practices in empirical articles published by Psychological Science and, for comparison, by the Journal of Experimental Psychology: General, during the period of January 2013 to December 2015. The use of null hypothesis significance testing (NHST) was extremely high at all times and in both journals. In Psychological Science, the use of confidence intervals increased markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the availability of open data (3 to 39%) and open materials (7 to 31%). The other journal showed smaller or much smaller changes. Our findings suggest that journal-specific submission guidelines may encourage desirable changes in authors' practices.
Booth, Brian G; Keijsers, Noël L W; Sijbers, Jan; Huysmans, Toon
2018-05-03
Pedobarography produces large sets of plantar pressure samples that are routinely subsampled (e.g. using regions of interest) or aggregated (e.g. center of pressure trajectories, peak pressure images) in order to simplify statistical analysis and provide intuitive clinical measures. We hypothesize that these data reductions discard gait information that can be used to differentiate between groups or conditions. To test the hypothesis of null information loss, we created an implementation of statistical parametric mapping (SPM) for dynamic plantar pressure datasets (i.e. plantar pressure videos). Our SPM software framework brings all plantar pressure videos into anatomical and temporal correspondence, then performs statistical tests at each sampling location in space and time. Novelly, we introduce non-linear temporal registration into the framework in order to normalize for timing differences within the stance phase. We refer to our software framework as STAPP: spatiotemporal analysis of plantar pressure measurements. Using STAPP, we tested our hypothesis on plantar pressure videos from 33 healthy subjects walking at different speeds. As walking speed increased, STAPP was able to identify significant decreases in plantar pressure at mid-stance from the heel through the lateral forefoot. The extent of these plantar pressure decreases has not previously been observed using existing plantar pressure analysis techniques. We therefore conclude that the subsampling of plantar pressure videos - a task which led to the discarding of gait information in our study - can be avoided using STAPP. Copyright © 2018 Elsevier B.V. All rights reserved.
Effects of Gum Chewing on Appetite and Digestion
2013-05-28
The null hypothesis is that food rheology will have no effect on these indices. The alternate hypothesis is that increased mechanical stimulation will result in stronger satiation/satiety and reduced energy intake. Further, it is hypothesized that the effects of mastication will be less evident in obese compared to lean individuals.
NASA Technical Reports Server (NTRS)
Goepfert, T. M.; McCarthy, M.; Kittrell, F. S.; Stephens, C.; Ullrich, R. L.; Brinkley, B. R.; Medina, D.
2000-01-01
Mammary epithelial cells from p53 null mice have been shown recently to exhibit an increased risk for tumor development. Hormonal stimulation markedly increased tumor development in p53 null mammary cells. Here we demonstrate that mammary tumors arising in p53 null mammary cells are highly aneuploid, with greater than 70% of the tumor cells containing altered chromosome number and a mean chromosome number of 56. Normal mammary cells of p53 null genotype and aged less than 14 wk do not exhibit aneuploidy in primary cell culture. Significantly, the hormone progesterone, but not estrogen, increases the incidence of aneuploidy in morphologically normal p53 null mammary epithelial cells. Such cells exhibited 40% aneuploidy and a mean chromosome number of 54. The increase in aneuploidy measured in p53 null tumor cells or hormonally stimulated normal p53 null cells was not accompanied by centrosome amplification. These results suggest that normal levels of progesterone can facilitate chromosomal instability in the absence of the tumor suppressor gene, p53. The results support the emerging hypothesis based both on human epidemiological and animal model studies that progesterone markedly enhances mammary tumorigenesis.
The Importance of Proving the Null
ERIC Educational Resources Information Center
Gallistel, C. R.
2009-01-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is…
On the importance of avoiding shortcuts in applying cognitive models to hierarchical data.
Boehm, Udo; Marsman, Maarten; Matzke, Dora; Wagenmakers, Eric-Jan
2018-06-12
Psychological experiments often yield data that are hierarchically structured. A number of popular shortcut strategies in cognitive modeling do not properly accommodate this structure and can result in biased conclusions. To gauge the severity of these biases, we conducted a simulation study for a two-group experiment. We first considered a modeling strategy that ignores the hierarchical data structure. In line with theoretical results, our simulations showed that Bayesian and frequentist methods that rely on this strategy are biased towards the null hypothesis. Secondly, we considered a modeling strategy that takes a two-step approach by first obtaining participant-level estimates from a hierarchical cognitive model and subsequently using these estimates in a follow-up statistical test. Methods that rely on this strategy are biased towards the alternative hypothesis. Only hierarchical models of the multilevel data lead to correct conclusions. Our results are particularly relevant for the use of hierarchical Bayesian parameter estimates in cognitive modeling.
The ranking probability approach and its usage in design and analysis of large-scale studies.
Kuo, Chia-Ling; Zaykin, Dmitri
2013-01-01
In experiments with many statistical tests there is need to balance type I and type II error rates while taking multiplicity into account. In the traditional approach, the nominal α-level, such as 0.05, is adjusted by the number of tests, m, i.e., as 0.05/m. Assuming that some proportion of tests represent "true signals", that is, originate from a scenario where the null hypothesis is false, power depends on the number of true signals and the respective distribution of effect sizes. One way to define power is for it to be the probability of making at least one correct rejection at the assumed α-level. We advocate an alternative way of establishing how "well-powered" a study is. In our approach, useful for studies with multiple tests, the ranking probability is controlled, defined as the probability of making at least k correct rejections while rejecting the hypotheses with the k smallest P-values. The two approaches are statistically related. The probability that the smallest P-value is a true signal (i.e., k = 1) is equal to the power at the adjusted level, to a very good approximation. Ranking probabilities are also related to the false discovery rate and to the Bayesian posterior probability of the null hypothesis. We study properties of our approach when the effect size distribution is replaced for convenience by a single "typical" value taken to be the mean of the underlying distribution. We conclude that its performance is often satisfactory under this simplification; however, substantial imprecision is to be expected when the number of tests is very large and the typical effect size is small. Precision is largely restored when three values with the respective abundances are used instead of a single typical effect size value.
Mayhew, Terry M; Lucocq, John M
2011-03-01
Various methods for quantifying cellular immunogold labelling on transmission electron microscope thin sections are currently available. All rely on sound random sampling principles and are applicable to single immunolabelling across compartments within a given cell type or between different experimental groups of cells. Although methods are also available to test for colocalization in double/triple immunogold labelling studies, so far, these have relied on making multiple measurements of gold particle densities in defined areas or of inter-particle nearest neighbour distances. Here, we present alternative two-step approaches to codistribution and colocalization assessment that merely require raw counts of gold particles in distinct cellular compartments. For assessing codistribution over aggregate compartments, initial statistical evaluation involves combining contingency table and chi-squared analyses to provide predicted gold particle distributions. The observed and predicted distributions allow testing of the appropriate null hypothesis, namely, that there is no difference in the distribution patterns of proteins labelled by different sizes of gold particle. In short, the null hypothesis is that of colocalization. The approach for assessing colabelling recognises that, on thin sections, a compartment is made up of a set of sectional images (profiles) of cognate structures. The approach involves identifying two groups of compartmental profiles that are unlabelled and labelled for one gold marker size. The proportions in each group that are also labelled for the second gold marker size are then compared. Statistical analysis now uses a 2 × 2 contingency table combined with the Fisher exact probability test. Having identified double labelling, the profiles can be analysed further in order to identify characteristic features that might account for the double labelling. In each case, the approach is illustrated using synthetic and/or experimental datasets and can be refined to correct observed labelling patterns to specific labelling patterns. These simple and efficient approaches should be of more immediate utility to those interested in codistribution and colocalization in multiple immunogold labelling investigations.
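The two analyses can be illustrated with made-up gold-particle counts: a chi-squared test comparing compartment distributions of two marker sizes (codistribution), and a Fisher exact test on a 2 × 2 table of profiles labelled or unlabelled by each marker (colabelling). The counts below are purely illustrative.

```python
# Toy illustration of the two analyses described above, using made-up counts.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Codistribution: rows = gold marker size (small, large); columns = cell compartments.
codistribution_counts = np.array([[120, 45, 30, 15],
                                  [110, 50, 25, 20]])
chi2_stat, p_codist, dof, expected = chi2_contingency(codistribution_counts)
print(f"codistribution: chi2 = {chi2_stat:.2f}, df = {dof}, p = {p_codist:.3f}")

# Colabelling: rows = profiles labelled / unlabelled by marker 1;
# columns = labelled / unlabelled by marker 2.
colabel_counts = np.array([[28, 12],
                           [9, 41]])
odds_ratio, p_colabel = fisher_exact(colabel_counts)
print(f"colabelling: odds ratio = {odds_ratio:.2f}, p = {p_colabel:.4f}")
```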
A test to evaluate the earthquake prediction algorithm, M8
Healy, John H.; Kossobokov, Vladimir G.; Dewey, James W.
1992-01-01
A test of the algorithm M8 is described. The test is constructed to meet four rules, which we propose to be applicable to the test of any method for earthquake prediction: 1. An earthquake prediction technique should be presented as a well documented, logical algorithm that can be used by investigators without restrictions. 2. The algorithm should be coded in a common programming language and implementable on widely available computer systems. 3. A test of the earthquake prediction technique should involve future predictions with a black box version of the algorithm in which potentially adjustable parameters are fixed in advance. The source of the input data must be defined and ambiguities in these data must be resolved automatically by the algorithm. 4. At least one reasonable null hypothesis should be stated in advance of testing the earthquake prediction method, and it should be stated how this null hypothesis will be used to estimate the statistical significance of the earthquake predictions. The M8 algorithm has successfully predicted several destructive earthquakes, in the sense that the earthquakes occurred inside regions with linear dimensions from 384 to 854 km that the algorithm had identified as being in times of increased probability for strong earthquakes. In addition, M8 has successfully "post predicted" high percentages of strong earthquakes in regions to which it has been applied in retroactive studies. The statistical significance of previous predictions has not been established, however, and post-prediction studies in general are notoriously subject to success-enhancement through hindsight. Nor has it been determined how much more precise an M8 prediction might be than forecasts and probability-of-occurrence estimates made by other techniques. We view our test of M8 both as a means to better determine the effectiveness of M8 and as an experimental structure within which to make observations that might lead to improvements in the algorithm or conceivably lead to a radically different approach to earthquake prediction.
The Skillings-Mack test (Friedman test when there are missing data).
Chatfield, Mark; Mander, Adrian
2009-04-01
The Skillings-Mack statistic (Skillings and Mack, 1981, Technometrics 23: 171-177) is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure. The missing data can be either missing by design, for example, an incomplete block design, or missing completely at random. The Skillings-Mack test is equivalent to the Friedman test when there are no missing data in a balanced complete block design, and the Skillings-Mack test is equivalent to the test suggested in Durbin (1951, British Journal of Psychology, Statistical Section 4: 85-90) for a balanced incomplete block design. The Friedman test was implemented in Stata by Goldstein (1991, Stata Technical Bulletin 3: 26-27) and further developed in Goldstein (2005, Stata Journal 5: 285). This article introduces the skilmack command, which performs the Skillings-Mack test. The skilmack command is also useful when there are many ties or equal ranks (N.B. the Friedman statistic compared with the χ2 distribution will give a conservative result), as well as for small samples; appropriate results can be obtained by simulating the distribution of the test statistic under the null hypothesis.
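For the complete-data special case in which the Skillings-Mack statistic reduces to the Friedman statistic, a short Python illustration is given below; the measurements are toy values, and the missing-data extension provided by the skilmack command is not reproduced.

```python
# Friedman test on a complete block design (no missing data), the special case in which
# the Skillings-Mack statistic reduces to the Friedman statistic. Toy measurements.
import numpy as np
from scipy.stats import friedmanchisquare

# Rows: blocks (e.g., subjects); columns: three treatments.
blocks = np.array([[7.1, 7.9, 8.3],
                   [6.4, 6.8, 7.0],
                   [5.9, 6.5, 7.2],
                   [7.3, 7.1, 8.0],
                   [6.0, 6.9, 7.4]])

stat, p_value = friedmanchisquare(blocks[:, 0], blocks[:, 1], blocks[:, 2])
print(f"Friedman chi2 = {stat:.2f}, p = {p_value:.3f}")  # null: no treatment differences
```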
Xie, Yufen; Wang, Yingchun; Sun, Tong; Wang, Fangfei; Trostinskaia, Anna; Puscheck, Elizabeth; Rappolee, Daniel A
2005-05-01
Mitogen-activated protein kinase (MAPK) signaling pathways play an important role in controlling embryonic proliferation and differentiation. It has been demonstrated that null mutants of sequential lipophilic signal transduction mediators that participate in the MAPK pathway are post-implantation lethal. It is not clear why the lethality of these null mutants arises after implantation and not before. One hypothesis is that the gene products of these post-implantation lethal null mutants are not present before implantation in normal embryos and do not function until after implantation. To test this hypothesis, we selected a set of lipophilic genes mediating MAPK signal transduction pathways whose null mutants result in early peri-implantation or placental lethality. These included FRS2alpha, GAB1, GRB2, SOS1, Raf-B, and Raf1. Products of these selected genes were detected and their locations and functions indicated by indirect immunocytochemistry and Western blotting for proteins and RT-polymerase chain reaction (PCR) for mRNA transcription. We report here that all six signal mediators are detected at the protein level in the preimplantation mouse embryo, placental trophoblasts, and in cultured trophoblast stem cells (TSC). Proteins are all detected in E3.5 embryos at a time when the first known mitogenic intercellular communication has been documented. mRNA transcripts of two post-implantation lethal null mutant genes are expressed in mouse preimplantation embryos and unfertilized eggs. These mRNA transcripts were detected as maternal mRNA in unfertilized eggs that could delay the lethality of null mutants. All of the proteins were detected in the cytoplasm or in the cell membrane. This study of spatial and temporal expression revealed that all six of these post-implantation lethal null mutant genes in the MAPK pathway are expressed and, where tested, phosphorylated/activated proteins are detected in the blastocyst. Studies on RNA expression using RT-PCR suggest that maternal RNA could play an important role in delaying the appearance of the lethal phenotype of null mutations. Copyright (c) 2005 Wiley-Liss, Inc.
On the scaling of the distribution of daily price fluctuations in the Mexican financial market index
NASA Astrophysics Data System (ADS)
Alfonso, Léster; Mansilla, Ricardo; Terrero-Escalante, César A.
2012-05-01
In this paper, a statistical analysis of log-return fluctuations of the IPC, the Mexican Stock Market Index, is presented. A sample of daily data covering the period 04/09/2000-04/09/2010 was analyzed and fitted to different distributions. Tests of the goodness of fit were performed in order to quantitatively assess the quality of the estimation. Special attention was paid to the impact of the size of the sample on the estimated decay of the distribution's tail. In this study a forceful rejection of normality was obtained. On the other hand, the null hypothesis that the log-fluctuations are fitted to an α-stable Lévy distribution cannot be rejected at the 5% significance level.
Acar, Elif F; Sun, Lei
2013-06-01
Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.
Assessing significance in a Markov chain without mixing.
Chikina, Maria; Frieze, Alan; Pegden, Wesley
2017-03-14
We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
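A minimal sketch of the test described above, run on a toy reversible chain (a lazy random walk on a cycle) with an arbitrary value function; the chain, the number of steps, and the variable names are illustrative assumptions, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 1000
value = np.sin(2 * np.pi * np.arange(n_states) / n_states)   # arbitrary value function

def step(state):
    """One step of a lazy simple random walk on a cycle (a reversible chain)."""
    r = rng.random()
    if r < 0.5:
        return state                                          # lazy: stay put
    return (state + 1) % n_states if r < 0.75 else (state - 1) % n_states

presented = int(np.argmin(value))        # a state that is extreme for the value function
walk = [presented]
for _ in range(20000):                   # the test allows any number of steps
    walk.append(step(walk[-1]))

walk_values = value[np.array(walk)]
eps = np.mean(walk_values <= value[presented])   # presented state is an eps-outlier
print(f"eps = {eps:.4f}; significant at p = sqrt(2*eps) = {np.sqrt(2 * eps):.4f}")
```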
Cadenaro, Milena; Breschi, Lorenzo; Nucci, Cesare; Antoniolli, Francesca; Visintini, Erika; Prati, Carlo; Matis, Bruce A; Di Lenarda, Roberto
2008-01-01
This study evaluated the morphological effects produced in vivo by two in-office bleaching agents on enamel surface roughness using a noncontact profilometric analysis of epoxy replicas. The null hypothesis tested was that there would be no difference in the micromorphology of the enamel surface during or after bleaching with two different bleaching agents. Eighteen subjects were selected and randomly assigned to two treatment groups (n=9). The tooth whitening materials tested were 38% hydrogen peroxide (HP) (Opalescence Xtra Boost) and 35% carbamide peroxide (CP) (Rembrandt Quik Start). The bleaching agents were applied in accordance with manufacturer protocols. The treatments were repeated four times at one-week intervals. High precision impressions of the upper right incisor were taken at baseline as the control (CTRL) and after each bleaching treatment (T0: first application, T1: second application at one week, T2: third application at two weeks and T3: fourth application at three weeks). Epoxy resin replicas were poured from impressions, and the surface roughness was analyzed by means of a non-contact profilometer (Talysurf CLI 1000). Epoxy replicas were then observed using SEM. All data were statistically analyzed using ANOVA and differences were determined with a t-test. No significant differences in surface roughness were found on enamel replicas using either 38% hydrogen peroxide or 35% carbamide peroxide in vivo. This in vivo study supports the null hypothesis that two in-office bleaching agents, with either a high concentration of hydrogen or carbamide peroxide, do not alter enamel surface roughness, even after multiple applications.
Watanabe, Hiroshi; Nomura, Yoshikazu; Kuribayashi, Ami; Kurabayashi, Tohru
2018-02-01
We aimed to employ the Radia diagnostic software with the SEDENTEXCT (Safety and Efficacy of a New and Emerging Dental X-ray Modality) image quality (IQ) phantom in CT, and to evaluate its validity. The SEDENTEXCT IQ phantom and Radia diagnostic software were employed. The phantom was scanned using one medical full-body CT and two dentomaxillofacial cone beam CTs. The obtained images were imported into the Radia software, and the spatial resolution outputs were evaluated. The oversampling method was employed using our original wire phantom as a reference, and the resultant modulation transfer function (MTF) curves were compared. The null hypothesis was that the MTF curves generated using both methods would be in agreement. One-way analysis of variance tests were applied to the f50 and f10 values from the MTF curves. The f10 values were subjectively confirmed by observing the line pair modules. The Radia software reported the MTF curves on the xy-plane of the CT scans, but could not return f50 and f10 values on the z-axis. The null hypothesis concerning the reported MTF curves on the xy-plane was rejected: there were significant differences between the results of the Radia software and our reference method, except for the f10 values in CS9300. These findings were consistent with our line pair observations. We evaluated the validity of the Radia software with the SEDENTEXCT IQ phantom. The software provided data semi-automatically, albeit with problems, and its results differed statistically from our reference. We hope the manufacturer will overcome these limitations.
Abad-Grau, Mara M; Medina-Medina, Nuria; Montes-Soldado, Rosana; Matesanz, Fuencisla; Bafna, Vineet
2012-01-01
Multimarker Transmission/Disequilibrium Tests (TDTs) are association tests that are very robust to population admixture and structure and may be used to identify susceptibility loci in genome-wide association studies. Multimarker TDTs using several markers may increase power by capturing high-degree associations. However, there is also a risk of spurious associations and power reduction due to the increase in degrees of freedom. In this study we show that associations found by tests built on simple null hypotheses are highly reproducible in a second independent data set regardless of the number of markers. As a test exhibiting this feature to its maximum, we introduce the multimarker 2-Groups TDT (mTDT(2G)), a test which, under the hypothesis of no linkage, asymptotically follows a χ2 distribution with 1 degree of freedom regardless of the number of markers. The statistic requires the division of parental haplotypes into two groups: disease susceptibility and disease protective haplotype groups. We assessed the test behavior by performing an extensive simulation study as well as a real-data study using several data sets of two complex diseases. We show that the mTDT(2G) test is highly efficient and achieves the highest power among all the tests used, even when the null hypothesis is tested in a second independent data set. Therefore, mTDT(2G) turns out to be a very promising multimarker TDT to perform genome-wide searches for disease susceptibility loci that may be used as a preprocessing step in the construction of more accurate genetic models to predict individual susceptibility to complex diseases.
Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values
Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu
2013-01-01
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762
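For orientation, the sketch below contrasts a classic Storey-type estimate of π0 at a single λ with a crude straight-line fit to the empirical p-value distribution on (λ, 1]; it is only a simplified stand-in for the pattern-based estimators proposed in the paper, and the simulated p-values are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, m1 = 10000, 2000                       # 20% of the hypotheses are truly non-null
p_null = rng.uniform(size=m - m1)
p_alt = stats.norm.sf(rng.normal(loc=2.5, size=m1))   # one-sided p-values under a shift
pvals = np.concatenate([p_null, p_alt])

lam = 0.5
pi0_storey = np.mean(pvals > lam) / (1 - lam)          # classic lambda-based estimate

# Crude pattern-based stand-in: the empirical CDF of p-values should be roughly
# linear with slope pi0 on (lambda, 1], so fit a line there and use its slope.
grid = np.linspace(lam, 1.0, 51)
ecdf = np.array([np.mean(pvals <= t) for t in grid])
slope, _ = np.polyfit(grid, ecdf, 1)

print(f"true pi0          = {(m - m1) / m:.3f}")
print(f"Storey estimate   = {pi0_storey:.3f}")
print(f"line-fit estimate = {slope:.3f}")
```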
The Influence of Color and Illumination on the Interpretation of Emotions.
ERIC Educational Resources Information Center
Kohn, Imre Ransome
Research is presented that is derived from the hypothesis that a person's interpretation of an emotional stimulus is affected by the painted hue and the light intensity of the visual environment. The reported experiment in part supported a null hypothesis; it was suggested that, within the considered variables of the experiment, either a person's…
Towards a framework for testing general relativity with extreme-mass-ratio-inspiral observations
NASA Astrophysics Data System (ADS)
Chua, A. J. K.; Hee, S.; Handley, W. J.; Higson, E.; Moore, C. J.; Gair, J. R.; Hobson, M. P.; Lasenby, A. N.
2018-07-01
Extreme-mass-ratio-inspiral observations from future space-based gravitational-wave detectors such as LISA will enable strong-field tests of general relativity with unprecedented precision, but at prohibitive computational cost if existing statistical techniques are used. In one such test that is currently employed for LIGO black hole binary mergers, generic deviations from relativity are represented by N deformation parameters in a generalized waveform model; the Bayesian evidence for each of its 2^N combinatorial submodels is then combined into a posterior odds ratio for modified gravity over relativity in a null-hypothesis test. We adapt and apply this test to a generalized model for extreme-mass-ratio inspirals constructed on deformed black hole spacetimes, and focus our investigation on how computational efficiency can be increased through an evidence-free method of model selection. This method is akin to the algorithm known as product-space Markov chain Monte Carlo, but uses nested sampling and improved error estimates from a rethreading technique. We perform benchmarking and robustness checks for the method, and find order-of-magnitude computational gains over regular nested sampling in the case of synthetic data generated from the null model.
Genealogies of rapidly adapting populations
Neher, Richard A.; Hallatschek, Oskar
2013-01-01
The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak or infrequent evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Towards this goal, we explore the properties of genealogies in a model of continual adaptation in asexual populations. We show that lineages trace back to a small pool of highly fit ancestors, in which almost simultaneous coalescence of more than two lineages frequently occurs. Whereas such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is nonmonotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Because multiple merger coalescents emerge in many models of rapid adaptation, we argue that they should be considered as a null model for adapting populations. PMID:23269838
A Continuous Threshold Expectile Model.
Zhang, Feipeng; Li, Qunhua
2017-12-01
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency. A weighted CUSUM type test statistic is proposed for the existence of a threshold at a given expectile, and its asymptotic properties are derived under both the null and the local alternative models. This test only requires fitting the model under the null hypothesis in the absence of a threshold, so it is computationally more efficient than likelihood-ratio type tests. Simulation studies show that the proposed estimators and test have desirable finite sample performance in both homoscedastic and heteroscedastic cases. Application of the proposed method to a Dutch growth dataset and a baseball pitcher salary dataset reveals interesting insights. The proposed method is implemented in the R package cthreshER.
NASA Astrophysics Data System (ADS)
Goff, J.; Zahirovic, S.; Müller, D.
2017-12-01
Recently published spectral analyses of seafloor bathymetry concluded that abyssal hills, highly linear ridges that are formed along seafloor spreading centers, exhibit periodicities that correspond to Milankovitch cycles - variations in Earth's orbit that affect climate on periods of 23, 41 and 100 thousand years. These studies argue that this correspondence could be explained by modulation of volcanic output at the mid-ocean ridge due to lithostatic pressure variations associated with rising and falling sea level. If true, then the implications are substantial: mapping the topography of the seafloor with sonar could be used as a way to investigate past climate change. This "Milankovitch cycle" hypothesis predicts that the rise and fall of abyssal hills will be correlated to crustal age, which can be tested by stacking, or averaging, bathymetry as a function of age; stacking will enhance any age-dependent signal while suppressing random components, such as fault-generated topography. We apply age-stacking to data flanking the Southeast Indian Ridge (~3.6 cm/yr half rate), northern East Pacific Rise (~5.4 cm/yr half rate) and southern East Pacific Rise (~7.8 cm/yr half rate), where multibeam bathymetric coverage is extensive on the ridge flanks. At the greatest precision possible given magnetic anomaly data coverage, we have revised digital crustal age models in these regions with updated axis and magnetic anomaly traces. We also utilize known 2nd-order spatial statistical properties of abyssal hills to predict the variability of the age-stack under the null hypothesis that abyssal hills are entirely random with respect to crustal age; the age-stacked profile is significantly different from zero only if it exceeds this expected variability by a large margin. Our results indicate, however, that the null hypothesis satisfactorily explains the age-stacking results in all three regions of study, thus providing no support for the Milankovitch cycle hypothesis. The random nature of abyssal hills is consistent with a primarily faulted origin.
Han, Buhm; Kang, Hyun Min; Eskin, Eleazar
2009-01-01
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
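The sketch below illustrates the general idea of MVN-based multiple-testing correction (sampling null statistics from a multivariate normal with the marker correlation matrix and using the max-|Z| distribution); it is not SLIDE's sliding-window algorithm, and the correlation structure and observed statistic are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
m = 200                                         # number of correlated markers
idx = np.arange(m)
corr = 0.8 ** np.abs(idx[:, None] - idx[None, :])   # toy AR(1)-style marker correlation

# Draw null association statistics jointly, then use the max-|Z| distribution.
z_null = rng.multivariate_normal(np.zeros(m), corr, size=20000)
max_abs_null = np.abs(z_null).max(axis=1)

z_obs = 4.0                                     # best observed association statistic
p_corrected = np.mean(max_abs_null >= z_obs)
p_bonferroni = min(1.0, m * 2 * norm.sf(z_obs)) # ignores correlation, overly strict
print(f"MVN-corrected p = {p_corrected:.4f}, Bonferroni p = {p_bonferroni:.4f}")
```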
Jacob, Laurent; Combes, Florence; Burger, Thomas
2018-06-18
We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous homology sequences, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperform state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.
Intra-fraction motion of the prostate is a random walk
NASA Astrophysics Data System (ADS)
Ballhausen, H.; Li, M.; Hegemann, N.-S.; Ganswindt, U.; Belka, C.
2015-01-01
A random walk model for intra-fraction motion has been proposed, where at each step the prostate moves a small amount from its current position in a random direction. Online tracking data from perineal ultrasound is used to validate or reject this model against alternatives. Intra-fraction motion of a prostate was recorded by 4D ultrasound (Elekta Clarity system) during 84 fractions of external beam radiotherapy of six patients. In total, the center of the prostate was tracked for 8 h in intervals of 4 s. Maximum likelihood model parameters were fitted to the data. The null hypothesis of a random walk was tested with the Dickey-Fuller test. The null hypothesis of stationarity was tested by the Kwiatkowski-Phillips-Schmidt-Shin test. The increase of variance in prostate position over time and the variability in motility between fractions were analyzed. Intra-fraction motion of the prostate was best described as a stochastic process with an auto-correlation coefficient of ρ = 0.92 ± 0.13. The random walk hypothesis (ρ = 1) could not be rejected (p = 0.27). The static noise hypothesis (ρ = 0) was rejected (p < 0.001). The Dickey-Fuller test rejected the null hypothesis ρ = 1 in 25% to 32% of cases. On average, the Kwiatkowski-Phillips-Schmidt-Shin test rejected the null hypothesis ρ = 0 with a probability of 93% to 96%. The variance in prostate position increased linearly over time (r² = 0.9 ± 0.1). Variance kept increasing and did not settle at a maximum as would be expected from a stationary process. There was substantial variability in motility between fractions and patients with maximum aberrations from isocenter ranging from 0.5 mm to over 10 mm in one patient alone. In conclusion, evidence strongly suggests that intra-fraction motion of the prostate is a random walk and neither static (like inter-fraction setup errors) nor stationary (like a cyclic motion such as breathing, for example). The prostate tends to drift away from the isocenter during a fraction, and this variance increases with time, such that shorter fractions are beneficial to the problem of intra-fraction motion. As a consequence, fixed safety margins (which would over-compensate at the beginning and under-compensate at the end of a fraction) cannot optimally account for intra-fraction motion. Instead, online tracking and position correction on-the-fly should be considered as the preferred approach to counter intra-fraction motion.
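A minimal sketch of the two complementary tests mentioned above, applied to a simulated one-dimensional random-walk trace using the statsmodels implementations; the simulated series and step scale are assumptions, not the tracked prostate positions.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(4)
n = 1200                                               # toy trace, one sample every 4 s
position = np.cumsum(rng.normal(scale=0.05, size=n))   # simulated random-walk drift (mm)

adf_stat, adf_p, *_ = adfuller(position)
kpss_stat, kpss_p, *_ = kpss(position, regression="c", nlags="auto")

# For a true random walk we expect ADF not to reject its unit-root null (large p)
# while KPSS rejects its stationarity null (small p).
print(f"ADF  p-value = {adf_p:.3f}  (null: unit root, i.e. a random walk)")
print(f"KPSS p-value = {kpss_p:.3f}  (null: stationarity)")
```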
Statistical analysis of secondary particle distributions in relativistic nucleus-nucleus collisions
NASA Technical Reports Server (NTRS)
Mcguire, Stephen C.
1987-01-01
The use is described of several statistical techniques to characterize structure in the angular distributions of secondary particles from nucleus-nucleus collisions in the energy range 24 to 61 GeV/nucleon. The objective of this work was to determine whether there are correlations between emitted particle intensity and angle that may be used to support the existence of the quark gluon plasma. The techniques include chi-square null hypothesis tests, the method of discrete Fourier transform analysis, and fluctuation analysis. We have also used the method of composite unit vectors to test for azimuthal asymmetry in a data set of 63 JACEE-3 events. Each method is presented in a manner that provides the reader with some practical detail regarding its application. Of those events with relatively high statistics, Fe approaches 0 at 55 GeV/nucleon was found to possess an azimuthal distribution with a highly non-random structure. No evidence of non-statistical fluctuations was found in the pseudo-rapidity distributions of the events studied. It is seen that the most effective application of these methods relies upon the availability of many events or single events that possess very high multiplicities.
Siller, Saul S.; Broadie, Kendal
2011-01-01
SUMMARY Fragile X syndrome (FXS), caused by loss of the fragile X mental retardation 1 (FMR1) product (FMRP), is the most common cause of inherited intellectual disability and autism spectrum disorders. FXS patients suffer multiple behavioral symptoms, including hyperactivity, disrupted circadian cycles, and learning and memory deficits. Recently, a study in the mouse FXS model showed that the tetracycline derivative minocycline effectively remediates the disease state via a proposed matrix metalloproteinase (MMP) inhibition mechanism. Here, we use the well-characterized Drosophila FXS model to assess the effects of minocycline treatment on multiple neural circuit morphological defects and to investigate the MMP hypothesis. We first treat Drosophila Fmr1 (dfmr1) null animals with minocycline to assay the effects on mutant synaptic architecture in three disparate locations: the neuromuscular junction (NMJ), clock neurons in the circadian activity circuit and Kenyon cells in the mushroom body learning and memory center. We find that minocycline effectively restores normal synaptic structure in all three circuits, promising therapeutic potential for FXS treatment. We next tested the MMP hypothesis by assaying the effects of overexpressing the sole Drosophila tissue inhibitor of MMP (TIMP) in dfmr1 null mutants. We find that TIMP overexpression effectively prevents defects in the NMJ synaptic architecture in dfmr1 mutants. Moreover, co-removal of dfmr1 similarly rescues TIMP overexpression phenotypes, including cellular tracheal defects and lethality. To further test the MMP hypothesis, we generated dfmr1;mmp1 double null mutants. Null mmp1 mutants are 100% lethal and display cellular tracheal defects, but co-removal of dfmr1 allows adult viability and prevents tracheal defects. Conversely, co-removal of mmp1 ameliorates the NMJ synaptic architecture defects in dfmr1 null mutants, despite the lack of detectable difference in MMP1 expression or gelatinase activity between the single dfmr1 mutants and controls. These results support minocycline as a promising potential FXS treatment and suggest that it might act via MMP inhibition. We conclude that FMRP and TIMP pathways interact in a reciprocal, bidirectional manner. PMID:21669931
Analysis of the Einstein sample of early-type galaxies
NASA Technical Reports Server (NTRS)
Eskridge, Paul B.; Fabbiano, Giuseppina
1993-01-01
The EINSTEIN galaxy catalog contains x-ray data for 148 early-type (E and S0) galaxies. A detailed analysis of the global properties of this sample is presented. By comparing the x-ray properties with other tracers of the ISM, as well as with observables related to the stellar dynamics and populations of the sample, we expect to determine more clearly the physical relationships that determine the evolution of early-type galaxies. Previous studies with smaller samples have explored the relationships between x-ray luminosity (L_X) and luminosities in other bands. Using our larger sample and the statistical techniques of survival analysis, a number of these earlier analyses were repeated. For our full sample, a strong statistical correlation is found between L_X and L_B (the probability that the null hypothesis is upheld is P < 10^-4 from a variety of rank correlation tests). Regressions with several algorithms yield consistent results.
Order-restricted inference for means with missing values.
Wang, Heng; Zhong, Ping-Shou
2017-09-01
Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.
Goudouri, Ourania-Menti; Kontonasaki, Eleana; Papadopoulou, Lambrini; Manda, Marianthi; Kavouras, Panagiotis; Triantafyllidis, Konstantinos S; Stefanidou, Maria; Koidis, Petros; Paraskevopoulos, Konstantinos M
2017-02-01
The aim of this study was the evaluation of the textural characteristics of an experimental sol-gel derived feldspathic dental ceramic, which has already been proven bioactive, and the investigation of its flexural strength through Weibull statistical analysis. The null hypothesis was that the flexural strength of the experimental and the commercial dental ceramic would be of the same order, resulting in a dental ceramic with apatite forming ability and adequate mechanical integrity. Although the flexural strength of the experimental ceramic was not statistically significantly different from that of the commercial one, the amount of blind pores due to processing was greater. The textural characteristics of the experimental ceramic were in accordance with the standard low porosity levels reported for dental ceramics used for fixed prosthetic restorations. Feldspathic dental ceramics with typical textural characteristics and advanced mechanical properties as well as enhanced apatite forming ability can be synthesized through the sol-gel method. Copyright © 2016 Elsevier Ltd. All rights reserved.
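For readers unfamiliar with Weibull statistical analysis of strength data, the sketch below fits a two-parameter Weibull model to made-up flexural-strength values with scipy; it is not the study's data or exact analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical flexural strengths in MPa (not the study's measurements).
strengths = np.array([62.1, 68.4, 71.0, 73.5, 75.2, 78.9,
                      80.3, 83.6, 85.1, 88.7, 90.2, 94.8])

# Two-parameter Weibull fit: fix the location parameter at zero.
shape, loc, scale = stats.weibull_min.fit(strengths, floc=0)
print(f"Weibull modulus (shape) m = {shape:.2f}")
print(f"characteristic strength sigma_0 = {scale:.1f} MPa")
print(f"P(strength < 70 MPa) = {stats.weibull_min.cdf(70, shape, loc=0, scale=scale):.3f}")
```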
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.
Lu, Kaifeng
2016-05-01
We consider blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for evaluating the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
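A rough simulation sketch of the blinded procedure's type I error under the null hypothesis, using the lumped one-sample variance at the interim look; the design constants are arbitrary, and the sketch does not reproduce the exact final-analysis distribution derived in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha, power, delta = 0.05, 0.80, 0.5        # design effect size in SD units
n_pilot = 20                                  # per arm, internal pilot
z_a, z_b = stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(power)

n_sim, rejections = 10000, 0
for _ in range(n_sim):
    # The null hypothesis is true: both arms share the same distribution.
    x1, x2 = rng.normal(size=n_pilot), rng.normal(size=n_pilot)
    s2_blinded = np.concatenate([x1, x2]).var(ddof=1)   # lumped one-sample variance
    n_arm = max(n_pilot, int(np.ceil(2 * s2_blinded * (z_a + z_b) ** 2 / delta ** 2)))
    x1 = np.concatenate([x1, rng.normal(size=n_arm - n_pilot)])
    x2 = np.concatenate([x2, rng.normal(size=n_arm - n_pilot)])
    rejections += stats.ttest_ind(x1, x2).pvalue < alpha

print(f"empirical type I error = {rejections / n_sim:.4f} (nominal {alpha})")
```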
Prum, Richard O
2010-11-01
The Fisher-inspired, arbitrary intersexual selection models of Lande (1981) and Kirkpatrick (1982), including both stable and unstable equilibrium conditions, provide the appropriate null model for the evolution of traits and preferences by intersexual selection. Like the Hardy–Weinberg equilibrium, the Lande–Kirkpatrick (LK) mechanism arises as an intrinsic consequence of genetic variation in trait and preference in the absence of other evolutionary forces. The LK mechanism is equivalent to other intersexual selection mechanisms in the absence of additional selection on preference and with additional trait-viability and preference-viability correlations equal to zero. The LK null model predicts the evolution of arbitrary display traits that are neither honest nor dishonest, indicate nothing other than mating availability, and lack any meaning or design other than their potential to correspond to mating preferences. The current standard for demonstrating an arbitrary trait is impossible to meet because it requires proof of the null hypothesis. The LK null model makes distinct predictions about the evolvability of traits and preferences. Examples of recent intersexual selection research document the confirmationist pitfalls of lacking a null model. Incorporation of the LK null into intersexual selection will contribute to serious examination of the extent to which natural selection on preferences shapes signals.
2011-05-24
…of community similarity (Legendre and Legendre 1998). Permutational Multivariate Analysis of Variance (PerMANOVA) (McArdle… null hypothesis can be rejected with a type I error rate of α. We used an implementation of PerMANOVA that involved sequential removal… TEXTURE and HABITAT. The null distribution for PerMANOVA tests for site-scale effects was generated using a restricted…
Castro, Marcelo P; Pataky, Todd C; Sole, Gisela; Vilas-Boas, Joao Paulo
2015-07-16
Ground reaction force (GRF) data from men and women are commonly pooled for analyses. However, it may not be justifiable to pool sexes on the basis of discrete parameters extracted from continuous GRF gait waveforms because this can miss continuous effects. Forty healthy participants (20 men and 20 women) walked at a cadence of 100 steps per minute across two force plates, recording GRFs. Two statistical methods were used to test the null hypothesis of no mean GRF differences between sexes: (i) Statistical Parametric Mapping-using the entire three-component GRF waveform; and (ii) traditional approach-using the first and second vertical GRF peaks. Statistical Parametric Mapping results suggested large sex differences, which post-hoc analyses suggested were due predominantly to higher anterior-posterior and vertical GRFs in early stance in women compared to men. Statistically significant differences were observed for the first GRF peak and similar values for the second GRF peak. These contrasting results emphasise that different parts of the waveform have different signal strengths and thus that one may use the traditional approach to choose arbitrary metrics and make arbitrary conclusions. We suggest that researchers and clinicians consider both the entire gait waveforms and sex-specificity when analysing GRF data. Copyright © 2015 Elsevier Ltd. All rights reserved.
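The sketch below illustrates one simple way to compare entire waveforms rather than discrete peaks: a max-|t| permutation test on synthetic GRF-like curves. It is not the random-field-theory Statistical Parametric Mapping procedure used in the study, and the curves are simulated.

```python
import numpy as np

rng = np.random.default_rng(6)
n_per_group, n_time = 20, 101
t_axis = np.linspace(0, 1, n_time)              # normalized stance phase

# Synthetic vertical-GRF-like curves; one group gets a slightly higher early peak.
base = np.sin(np.pi * t_axis) + 0.3 * np.sin(3 * np.pi * t_axis)
men = base + rng.normal(scale=0.08, size=(n_per_group, n_time))
women = (base + 0.08 * np.exp(-((t_axis - 0.2) / 0.08) ** 2)
         + rng.normal(scale=0.08, size=(n_per_group, n_time)))

def tstat_curve(a, b):
    """Pointwise two-sample t statistic along the waveform."""
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

t_obs = tstat_curve(women, men)
pooled = np.vstack([women, men])
max_null = np.empty(2000)
for i in range(len(max_null)):
    perm = rng.permutation(len(pooled))
    max_null[i] = np.abs(tstat_curve(pooled[perm[:n_per_group]],
                                     pooled[perm[n_per_group:]])).max()

crit = np.quantile(max_null, 0.95)               # family-wise 5% threshold
n_sig = int(np.sum(np.abs(t_obs) > crit))
print(f"critical |t| = {crit:.2f}; significant time points: {n_sig} of {n_time}")
```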
Brown, Angus M
2010-04-01
The objective of the method described in this paper is to develop a spreadsheet template for the purpose of comparing multiple sample means. An initial analysis of variance (ANOVA) test on the data returns F, the test statistic. If F is larger than the critical F value drawn from the F distribution at the appropriate degrees of freedom, convention dictates rejection of the null hypothesis and allows subsequent multiple comparison testing to determine where the inequalities between the sample means lie. A variety of multiple comparison methods are described that return the 95% confidence intervals for differences between means using an inclusive pairwise comparison of the sample means. 2009 Elsevier Ireland Ltd. All rights reserved.
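The same workflow can be reproduced outside a spreadsheet; the sketch below runs a one-way ANOVA and then Tukey HSD pairwise comparisons with 95% confidence intervals on made-up data (Tukey HSD is one common choice among the multiple comparison methods referred to above).

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(7)
a = rng.normal(10.0, 1.0, 12)
b = rng.normal(10.5, 1.0, 12)
c = rng.normal(12.0, 1.0, 12)

F, p = f_oneway(a, b, c)                     # the initial ANOVA step
print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")

values = np.concatenate([a, b, c])
groups = np.repeat(["A", "B", "C"], 12)
# Pairwise comparisons with 95% confidence intervals for the mean differences.
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```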
Statistics for X-chromosome associations.
Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor
2018-06-13
In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.
Spiegelhalter, D J; Freedman, L S
1986-01-01
The 'textbook' approach to determining sample size in a clinical trial has some fundamental weaknesses which we discuss. We describe a new predictive method which takes account of prior clinical opinion about the treatment difference. The method adopts the point of clinical equivalence (determined by interviewing the clinical participants) as the null hypothesis. Decision rules at the end of the study are based on whether the interval estimate of the treatment difference (classical or Bayesian) includes the null hypothesis. The prior distribution is used to predict the probabilities of making the decisions to use one or other treatment or to reserve final judgement. It is recommended that sample size be chosen to control the predicted probability of the last of these decisions. An example is given from a multi-centre trial of superficial bladder cancer.
Testing an earthquake prediction algorithm
Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.
1997-01-01
A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991 the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, which success ratio would have been achieved in 53% of random trials with the null hypothesis.
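A toy Monte Carlo version of such a random-assignment null hypothesis is sketched below; the assumed alarm coverage fraction is an illustrative guess, not a figure taken from the M8 test.

```python
import numpy as np

rng = np.random.default_rng(8)
q = 0.5                       # assumed fraction of space-time covered by alarms
n_events, n_trials = 10, 200000

hits = rng.random((n_trials, n_events)) < q       # each event falls in an alarm w.p. q
frac = np.mean(hits.sum(axis=1) >= 8)
print(f"fraction of random trials with at least 8 of {n_events} events predicted: {frac:.4f}")
```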
Unification of field theory and maximum entropy methods for learning probability densities
NASA Astrophysics Data System (ADS)
Kinney, Justin B.
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
A phenological mid-domain effect in flowering diversity.
Morales, Manuel A; Dodge, Gary J; Inouye, David W
2005-01-01
In this paper, we test the mid-domain hypothesis as an explanation for observed patterns of flowering diversity in two sub-alpine communities of insect-pollinated plants. Observed species richness patterns showed an early-season increase in richness, a mid-season peak, and a late-season decrease. We show that a "mid-domain" null model can qualitatively match this pattern of flowering species richness, with R² values typically greater than 60%. We find significant or marginally significant departures from expected patterns of diversity for only 3 out of 12 year-site combinations. On the other hand, we do find a consistent pattern of departure when comparing observed versus null-model predicted flowering diversity averaged across years. Our results therefore support the hypothesis that ecological factors shape patterns of flowering phenology, but that the strength or nature of these environmental forcings may differ between years or the two habitats we studied, or may depend on species-specific characteristics of these plant communities. We conclude that mid-domain null models provide an important baseline from which to test departure of expected patterns of flowering diversity across temporal domains. Geometric constraints should be included first in the list of factors that drive seasonal patterns of flowering diversity.
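A minimal mid-domain null model of the kind referred to above can be simulated directly: flowering intervals with random durations are placed uniformly at random within a bounded season and the expected richness curve is accumulated. The season length and duration distribution below are illustrative assumptions, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(9)
season_days, n_species, n_reps = 120, 60, 2000

richness = np.zeros(season_days)
for _ in range(n_reps):
    durations = rng.integers(5, 40, size=n_species)            # flowering spans in days
    for d in durations:
        start = rng.integers(0, season_days - d + 1)           # uniform placement in season
        richness[start:start + d] += 1
richness /= n_reps

peak_day = int(np.argmax(richness))
print(f"expected richness peaks on day {peak_day} of {season_days} (a mid-domain effect)")
```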
A Bayesian Approach to the Paleomagnetic Conglomerate Test
NASA Astrophysics Data System (ADS)
Heslop, David; Roberts, Andrew P.
2018-02-01
The conglomerate test has served the paleomagnetic community for over 60 years as a means to detect remagnetizations. The test states that if a suite of clasts within a bed have uniformly random paleomagnetic directions, then the conglomerate cannot have experienced a pervasive event that remagnetized the clasts in the same direction. The current form of the conglomerate test is based on null hypothesis testing, which results in a binary "pass" (uniformly random directions) or "fail" (nonrandom directions) outcome. We have recast the conglomerate test in a Bayesian framework with the aim of providing more information concerning the level of support a given data set provides for a hypothesis of uniformly random paleomagnetic directions. Using this approach, we place the conglomerate test in a fully probabilistic framework that allows for inconclusive results when insufficient information is available to draw firm conclusions concerning the randomness or nonrandomness of directions. With our method, sample sets larger than those typically employed in paleomagnetism may be required to achieve strong support for a hypothesis of random directions. Given the potentially detrimental effect of unrecognized remagnetizations on paleomagnetic reconstructions, it is important to provide a means to draw statistically robust data-driven inferences. Our Bayesian analysis provides a means to do this for the conglomerate test.
Low-dimensional attractor for neural activity from local field potentials in optogenetic mice
Oprisan, Sorinel A.; Lynn, Patrick E.; Tompa, Tamas; Lavin, Antonieta
2015-01-01
We used optogenetic mice to investigate possible nonlinear responses of the medial prefrontal cortex (mPFC) local network to light stimuli delivered by a 473 nm laser through fiber optics. Every 2 s, a brief 10 ms light pulse was applied and the local field potentials (LFPs) were recorded with a 10 kHz sampling rate. The experiment was repeated 100 times and we only retained and analyzed data from six animals that showed stable and repeatable response to optical stimulations. The presence of nonlinearity in our data was checked using the null hypothesis that the data were linearly correlated in the temporal domain, but were random otherwise. For each trial, 100 surrogate data sets were generated and both time reversal asymmetry and false nearest neighbor (FNN) were used as discriminating statistics for the null hypothesis. We found that nonlinearity is present in all LFP data. The first 0.5 s of each 2 s LFP recording were dominated by the transient response of the networks. For each trial, we used the last 1.5 s of steady activity to measure the phase resetting induced by the brief 10 ms light stimulus. After correcting the LFPs for the effect of phase resetting, additional preprocessing was carried out using dendrograms to identify “similar” groups among LFP trials. We found that the steady dynamics of mPFC in response to light stimuli could be reconstructed in a three-dimensional phase space with topologically similar “8”-shaped attractors across different animals. Our results also open the possibility of designing a low-dimensional model for optical stimulation of the mPFC local network. PMID:26483665
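The sketch below shows the generic surrogate-data recipe named above: phase-randomized (Fourier) surrogates compared against the time-reversal asymmetry statistic on a toy nonlinear series; it is not the exact preprocessing or FNN analysis applied to the LFP recordings.

```python
import numpy as np

rng = np.random.default_rng(10)

def trev(x, lag=1):
    """Time-reversal asymmetry statistic; near zero for linear Gaussian processes."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def phase_randomized(x):
    """Surrogate with the same power spectrum but randomized Fourier phases."""
    xf = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0, 2 * np.pi, size=xf.size)
    phases[0] = 0.0                                    # keep the mean term real
    return np.fft.irfft(np.abs(xf) * np.exp(1j * phases), n=len(x))

# Toy nonlinear test series: a chaotic logistic map plus small measurement noise.
n = 4000
x = np.empty(n)
x[0] = 0.4
for i in range(1, n):
    x[i] = 3.9 * x[i - 1] * (1 - x[i - 1])
x += rng.normal(scale=0.01, size=n)

t_obs = trev(x)
t_null = np.array([trev(phase_randomized(x)) for _ in range(200)])
p = (np.sum(np.abs(t_null) >= abs(t_obs)) + 1) / (len(t_null) + 1)
print(f"time-reversal asymmetry = {t_obs:.3f}, surrogate p-value = {p:.3f}")
```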
Del Fabbro, Egidio; Dev, Rony; Hui, David; Palmer, Lynn; Bruera, Eduardo
2013-04-01
Prior studies have suggested that melatonin, a frequently used integrative medicine, can attenuate weight loss, anorexia, and fatigue in patients with cancer. These studies were limited by a lack of blinding and absence of placebo controls. The primary purpose of this study was to compare melatonin with placebo for appetite improvement in patients with cancer cachexia. We performed a randomized, double-blind, 28-day trial of melatonin 20 mg versus placebo in patients with advanced lung or GI cancer, appetite scores ≥ 4 on a 0 to 10 scale (10 = worst appetite), and history of weight loss ≥ 5%. Assessments included weight, symptoms by the Edmonton Symptom Assessment Scale, and quality of life by the Functional Assessment of Anorexia/Cachexia Therapy (FAACT) questionnaire. Differences between groups from baseline to day 28 were analyzed using one-sided, two-sample t tests or Wilcoxon two-sample tests. Interim analysis halfway through the trial had a Lan-DeMets monitoring boundary with an O'Brien-Fleming stopping rule. Decision boundaries were to accept the null hypothesis of futility if the test statistic z < 0.39 (P ≥ .348) and reject the null hypothesis if z > 2.54 (P ≤ .0056). After interim analysis of 48 patients, the study was closed for futility. There were no significant differences between groups for appetite (P = .78) or other symptoms, weight (P = .17), FAACT score (P = .95), toxicity, or survival from baseline to day 28. In cachectic patients with advanced cancer, oral melatonin 20 mg at night did not improve appetite, weight, or quality of life compared with placebo.
Karadeniz, Ersan I; Gonzales, Carmen; Turk, Tamer; Isci, Devrim; Sahin-Saglam, Aynur M; Alkis, Huseyin; Elekdag-Turk, Selma; Darendeliler, M Ali
2013-05-01
To evaluate the null hypothesis that fluoride intake via drinking water has no effect on orthodontic root resorption in humans after orthodontic force application for 4 weeks and 12 weeks of retention. Forty-eight patients who required maxillary premolar extractions as part of their orthodontic treatment were selected from two cities in Turkey. These cities had a high and a low fluoride concentration in public water of ≥2 ppm and ≤0.05 ppm, respectively. The patients were randomly separated into four groups of 12 each: group 1HH, high fluoride (≥2 ppm) and heavy force (225 g); group 2LH, low fluoride (≤0.05 ppm) and heavy force; group 3HL, high fluoride and light force (25 g); and group 4LL, low fluoride and light force. Light or heavy buccal tipping force was applied on the upper first premolars for 28 days. At day 28, the left premolars were extracted (positive control side); the right premolars (experimental side) were extracted after 12 weeks of retention. The samples were analyzed with microcomputed tomography. On the positive control side, under heavy force application, the high fluoride groups exhibited less root resorption (P = .015). On the experimental side, it was found that fluoride reduced the total volume of root resorption craters; however, this effect was not statistically significant (P = .237). Moreover, the results revealed that under heavy force application experimental teeth exhibited more root resorption than positive control teeth. The null hypothesis could not be rejected. High fluoride intake from public water did not have a beneficial effect on the severity of root resorption after a 4-week orthodontic force application and 12 weeks of passive retention.
Affected sib pair tests in inbred populations.
Liu, W; Weir, B S
2004-11-01
The affected-sib-pair (ASP) method for detecting linkage between a disease locus and marker loci was first established 50 years ago, and since then numerous modifications have been made. We modify two identity-by-state (IBS) test statistics of Lange (Lange, 1986a, 1986b) to allow for inbreeding in the population. We evaluate the power and false positive rates of the modified tests under three disease models, using simulated data. Before estimating false positive rates, we demonstrate that IBS tests are tests of both linkage and linkage disequilibrium between marker and disease loci. Therefore, the null hypothesis of IBS tests should be no linkage and no LD. When the population inbreeding coefficient is large, the false positive rates of Lange's tests become much larger than the nominal value, while those of our modified tests remain close to the nominal value. To estimate power with a controlled false positive rate, we choose the cutoff values based on simulated datasets under the null hypothesis, so that both Lange's tests and the modified tests generate same false positive rate. The powers of Lange's z-test and our modified z-test are very close and do not change much with increasing inbreeding. The power of the modified chi-square test also stays stable when the inbreeding coefficient increases. However, the power of Lange's chi-square test increases with increasing inbreeding, and is larger than that of our modified chi-square test for large inbreeding coefficients. The power is high under a recessive disease model for both Lange's tests and the modified tests, though the power is low for additive and dominant disease models. Allowing for inbreeding is therefore appropriate, at least for diseases known to be recessive.
Two Bayesian tests of the GLOMOsys Model.
Field, Sarahanne M; Wagenmakers, Eric-Jan; Newell, Ben R; Zeelenberg, René; van Ravenzwaaij, Don
2016-12-01
Priming is arguably one of the key phenomena in contemporary social psychology. Recent retractions and failed replication attempts have led to a division in the field between proponents and skeptics and have reinforced the importance of confirming certain priming effects through replication. In this study, we describe the results of 2 preregistered replication attempts of 1 experiment by Förster and Denzler (2012). In both experiments, participants first processed letters either globally or locally, then were tested using a typicality rating task. Bayes factor hypothesis tests were conducted for both experiments: Experiment 1 (N = 100) yielded an indecisive Bayes factor of 1.38, indicating that the in-lab data are 1.38 times more likely to have occurred under the null hypothesis than under the alternative. Experiment 2 (N = 908) yielded a Bayes factor of 10.84, indicating strong support for the null hypothesis that global priming does not affect participants' mean typicality ratings. The failure to replicate this priming effect challenges existing support for the GLOMOsys model. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Hypothesis testing and earthquake prediction.
Jackson, D D
1996-04-30
Requirements for testing include advance specification of the conditional rate density (probability per unit time, area, and magnitude) or, alternatively, probabilities for specified intervals of time, space, and magnitude. Here I consider testing fully specified hypotheses, with no parameter adjustments or arbitrary decisions allowed during the test period. Because it may take decades to validate prediction methods, it is worthwhile to formulate testable hypotheses carefully in advance. Earthquake prediction generally implies that the probability will be temporarily higher than normal. Such a statement requires knowledge of "normal behavior"--that is, it requires a null hypothesis. Hypotheses can be tested in three ways: (i) by comparing the number of actual earthquakes to the number predicted, (ii) by comparing the likelihood score of actual earthquakes to the predicted distribution, and (iii) by comparing the likelihood ratio to that of a null hypothesis. The first two tests are purely self-consistency tests, while the third is a direct comparison of two hypotheses. Predictions made without a statement of probability are very difficult to test, and any test must be based on the ratio of earthquakes in and out of the forecast regions.
Really a Matter of Data: A Reply to Solomon.
ERIC Educational Resources Information Center
Sroufe, L. Alan
1980-01-01
Replies to Solomon's paper that basic criticisms made earlier of Shaffran and Decaries' study still apply. Views the study as essentially a confirmation of the null hypothesis based on weak measures. (Author/RH)
DETECTING UNSPECIFIED STRUCTURE IN LOW-COUNT IMAGES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stein, Nathan M.; Dyk, David A. van; Kashyap, Vinay L.
Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tail probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.
A Gaussian Mixture Model for Nulling Pulsars
NASA Astrophysics Data System (ADS)
Kaplan, D. L.; Swiggum, J. K.; Fichtenbauer, T. D. J.; Vallisneri, M.
2018-03-01
The phenomenon of pulsar nulling—where pulsars occasionally turn off for one or more pulses—provides insight into pulsar-emission mechanisms and the processes by which pulsars turn off when they cross the “death line.” However, while ever more pulsars are found that exhibit nulling behavior, the statistical techniques used to measure nulling are biased, with limited utility and precision. In this paper, we introduce an improved algorithm, based on Gaussian mixture models, for measuring pulsar nulling behavior. We demonstrate this algorithm on a number of pulsars observed as part of a larger sample of nulling pulsars, and show that it performs considerably better than existing techniques, yielding better precision and no bias. We further validate our algorithm on simulated data. Our algorithm is widely applicable to a large number of pulsars even if they do not show obvious nulls. Moreover, it can be used to derive nulling probabilities for individual pulses, which can be used for in-depth studies.
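A minimal sketch of the two-component mixture idea, assuming simulated per-pulse intensities (the nulling fraction, means, and widths below are invented); this is not the authors' exact algorithm, only an illustration of reading a nulling fraction and per-pulse nulling probabilities off a fitted mixture.

```python
# Sketch: estimate a pulsar nulling fraction with a two-component Gaussian
# mixture fitted to simulated on-pulse intensities.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
true_nulling_fraction = 0.3                       # assumed for the simulation
n_pulses = 5000
is_null = rng.random(n_pulses) < true_nulling_fraction
intensity = np.where(is_null,
                     rng.normal(0.0, 1.0, n_pulses),   # nulls: pure noise
                     rng.normal(5.0, 2.0, n_pulses))   # emission: noise + signal

gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(intensity.reshape(-1, 1))

null_comp = np.argmin(gmm.means_.ravel())         # component centred near zero
print("estimated nulling fraction:", gmm.weights_[null_comp])
# per-pulse nulling probabilities, analogous to the individual-pulse
# probabilities mentioned in the abstract:
p_null = gmm.predict_proba(intensity.reshape(-1, 1))[:, null_comp]
```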
Collins, Ryan L; Hu, Ting; Wejse, Christian; Sirugo, Giorgio; Williams, Scott M; Moore, Jason H
2013-02-18
Identifying high-order genetics associations with non-additive (i.e. epistatic) effects in population-based studies of common human diseases is a computational challenge. Multifactor dimensionality reduction (MDR) is a machine learning method that was designed specifically for this problem. The goal of the present study was to apply MDR to mining high-order epistatic interactions in a population-based genetic study of tuberculosis (TB). The study used a previously published data set consisting of 19 candidate single-nucleotide polymorphisms (SNPs) in 321 pulmonary TB cases and 347 healthy controls from Guinea-Bissau in Africa. The ReliefF algorithm was applied first to generate a smaller set of the five most informative SNPs. MDR with 10-fold cross-validation was then applied to look at all possible combinations of two, three, four and five SNPs. The MDR model with the best testing accuracy (TA) consisted of SNPs rs2305619, rs187084, and rs11465421 (TA = 0.588) in PTX3, TLR9 and DC-Sign, respectively. A general 1000-fold permutation test of the null hypothesis of no association confirmed the statistical significance of the model (p = 0.008). An additional 1000-fold permutation test designed specifically to test the linear null hypothesis that the association effects are only additive confirmed the presence of non-additive (i.e. nonlinear) or epistatic effects (p = 0.013). An independent information-gain measure corroborated these results with a third-order epistatic interaction that was stronger than any lower-order associations. We have identified statistically significant evidence for a three-way epistatic interaction that is associated with susceptibility to TB. This interaction is stronger than any previously described one-way or two-way associations. This study highlights the importance of using machine learning methods that are designed to embrace, rather than ignore, the complexity of common diseases such as TB. We recommend future studies of the genetics of TB take into account the possibility that high-order epistatic interactions might play an important role in disease susceptibility.
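The permutation logic behind such a test can be sketched generically: shuffle the case/control labels many times and compare the observed cross-validated testing accuracy with the distribution obtained under shuffling. The sketch below uses simulated genotypes and a generic classifier in place of the MDR software, so every value in it is an illustrative assumption.

```python
# Label-permutation test for a cross-validated classification accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n, p = 400, 5
X = rng.integers(0, 3, size=(n, p))      # SNP genotypes coded 0/1/2
y = rng.integers(0, 2, size=n)           # case/control labels (null data)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
observed = cross_val_score(clf, X, y, cv=10).mean()

n_perm = 200                             # use 1000 in practice, as in the study
null_acc = np.empty(n_perm)
for b in range(n_perm):
    y_perm = rng.permutation(y)
    null_acc[b] = cross_val_score(clf, X, y_perm, cv=10).mean()

p_value = (1 + np.sum(null_acc >= observed)) / (n_perm + 1)
print(f"testing accuracy = {observed:.3f}, permutation p = {p_value:.3f}")
```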
Knösel, Michael; Ellenberger, David; Göldner, Yvonne; Sandoval, Paulo; Wiechmann, Dirk
2015-04-15
Sealant application for enamel protection during orthodontic treatment with fixed appliances is common; however, reliable data on its durability in vivo are rare. This study aims at assessing the durability of a sealant (OpalSeal, Ultradent) for protection against white-spot lesion formation in orthodontic patients over 26 weeks in vivo, taking into account the provision or absence of adequate oral hygiene. We tested the null hypotheses of (1) no significant abatement of the sealant after 26 weeks in fixed orthodontic treatment compared to baseline, and (2) no significant influence of the factor of brushing and oral hygiene (as screened by the approximal plaque index, API) on the abatement of the sealant. Integrity and abatement of OpalSeal applied directly following bracketing were assessed in thirty-six consecutive patients (n(teeth) = 796) undergoing orthodontic treatment with fixed appliances (male/female 12/24; mean age/SD 14.4/1.33 years). Preservation of the fluorescing sealant was assessed with a black-light lamp, using a classification conceived in analogy to the ARI index (3 = sealant completely preserved; 2 = >50% preserved; 1 = <50% preserved; 0 = no sealant observable) immediately following application (baseline, T0) and after 2 (T1), 8 (T2), 14 (T3), 20 (T4) and 26 weeks (T5). API was assessed at T0 and T1. Statistical analysis was by non-parametric repeated-measures ANOVA (α = 5%, power >80%). At baseline, 43.4% of teeth had a positive API. Oral hygiene deteriorated significantly after bracketing (T1, 53%). Null hypothesis (1) was rejected, while (2) was accepted: mean scores of both the well-brushed and the non-brushed anterior teeth fell below 1 at T3 (week 14). Despite slightly better preservation of the sealant before and after T3 in insufficiently brushed (API-positive) teeth, this difference was not statistically significant. One single application of OpalSeal is unlikely to last throughout the entire fixed-appliance treatment stage. On average, re-application of the sealant can be expected to be necessary after 3.5 months (week 14) of treatment.
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Rasch-model-fitting data were simulated for 25-item dichotomous scales with sample sizes ranging from N = 50 to N = 2500 and analysed with and without algebraically adjusted sample sizes. Results suggest that Type I errors occur with N ≤ 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
Underpowered samples, false negatives, and unconscious learning.
Vadillo, Miguel A; Konstantinidis, Emmanouil; Shanks, David R
2016-02-01
The scientific community has witnessed growing concern about the high rate of false positives and unreliable results within the psychological literature, but the harmful impact of false negatives has been largely ignored. False negatives are particularly concerning in research areas where demonstrating the absence of an effect is crucial, such as studies of unconscious or implicit processing. Research on implicit processes seeks evidence of above-chance performance on some implicit behavioral measure at the same time as chance-level performance (that is, a null result) on an explicit measure of awareness. A systematic review of 73 studies of contextual cuing, a popular implicit learning paradigm, involving 181 statistical analyses of awareness tests, reveals how underpowered studies can lead to failure to reject a false null hypothesis. Among the studies that reported sufficient information, the meta-analytic effect size across awareness tests was dz = 0.31 (95% CI 0.24-0.37), showing that participants' learning in these experiments was conscious. The unusually large number of positive results in this literature cannot be explained by selective publication. Instead, our analyses demonstrate that these tests are typically insensitive and underpowered to detect medium to small, but true, effects in awareness tests. These findings challenge a widespread and theoretically important claim about the extent of unconscious human cognition.
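For intuition about the power problem described above, the sketch below computes the power of a two-sided one-sample t test for a within-subject effect of dz = 0.31 at several illustrative sample sizes using the noncentral t distribution; the sample sizes are assumptions, not values taken from the review.

```python
# Power of a two-sided one-sample t-test for a small within-subject effect.
import numpy as np
from scipy import stats

def one_sample_power(dz, n, alpha=0.05):
    df = n - 1
    nc = dz * np.sqrt(n)                          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(reject) under the noncentral t alternative
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

for n in (16, 24, 32, 64, 105):
    print(f"n = {n:3d}: power = {one_sample_power(0.31, n):.2f}")
```

With typical awareness-test sample sizes the computed power for an effect of this size stays well below conventional targets, which is the insensitivity the authors document.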
Zou, W; Ouyang, H
2016-02-01
We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximal likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA and a conditional likelihood adjustment (CLA) method that models threshold selection bias. We observed that MEA effectively reduced MSE from MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
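The following is a minimal normal-normal shrinkage sketch, assuming simulated per-SNP estimates with known standard errors; it only illustrates the general idea of pulling region-wise estimates toward a regional mean and is not the authors' MEA model.

```python
# Shrink per-SNP effect estimates toward a regional mean (empirical Bayes).
import numpy as np

rng = np.random.default_rng(2)
k = 20
true_effects = np.zeros(k)                       # a null region
se = np.full(k, 0.15)                            # standard errors of the MLEs
beta_hat = rng.normal(true_effects, se)          # per-SNP MLE estimates

regional_mean = beta_hat.mean()
# method-of-moments estimate of the between-SNP variance tau^2
tau2 = max(beta_hat.var(ddof=1) - np.mean(se**2), 0.0)
shrinkage = tau2 / (tau2 + se**2)                # weight on each raw estimate
beta_shrunk = regional_mean + shrinkage * (beta_hat - regional_mean)

print("mean squared error, raw MLE :", np.mean((beta_hat - true_effects)**2))
print("mean squared error, shrunken:", np.mean((beta_shrunk - true_effects)**2))
```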
Do Men Produce Higher Quality Ejaculates When Primed With Thoughts of Partner Infidelity?
Pham, Michael N; Barbaro, Nicole; Holub, Andrew M; Holden, Christopher J; Mogilski, Justin K; Lopes, Guilherme S; Nicolas, Sylis C A; Sela, Yael; Shackelford, Todd K; Zeigler-Hill, Virgil; Welling, Lisa L M
2018-01-01
Sperm competition theory can be used to generate the hypothesis that men alter the quality of their ejaculates as a function of sperm competition risk. Using a repeated measures experimental design, we investigated whether men produce a higher quality ejaculate when primed with cues to sperm competition (i.e., imagined partner infidelity) relative to a control prime. Men (n = 45) submitted two masturbatory ejaculates, one sample for each condition (i.e., sperm competition and control conditions). Ejaculates were assessed on 17 clinical parameters. The results did not support the hypothesis: Men did not produce higher quality ejaculates in the sperm competition condition relative to the control condition. Despite the null results of the current research, there is evidence for psychological and physiological adaptations to sperm competition in humans. We discuss methodological limitations that may have produced the null results and present methodological suggestions for research on human sperm competition.
Moro, Francesca; Morciano, Andrea; Tropea, Anna; Sagnella, Francesca; Palla, Carola; Scarinci, Elisa; Ciardulli, Andrea; Martinez, Daniela; Familiari, Alessandra; Liuzzo, Giovanna; Tritarelli, Alessandra; Cosentino, Nicola; Niccoli, Giampaolo; Crea, Filippo; Lanzone, Antonio; Apa, Rosanna
2013-12-01
To evaluate the long-term effects of drospirenone (DRSP)/ethinylestradiol (EE) alone, metformin alone, and DRSP/EE-metformin on CD4(+)CD28(null) T-lymphocyte frequency, a cardiovascular risk marker, in patients with hyperinsulinemic polycystic ovary syndrome (PCOS). Randomized clinical trial. Ninety-three patients with hyperinsulinemic PCOS were age matched and body mass index matched and randomized to receive a 6-month daily treatment with DRSP (3 mg)/EE (0.03 mg), or metformin (1500 mg), or DRSP/EE combined with metformin. CD4(+)CD28(null) T-cell frequencies. The DRSP/EE and metformin groups did not show any significant change in the CD4(+)CD28(null) frequency compared to the baseline. Interestingly, a statistically significant decrease in CD4(+)CD28(null) frequency occurred after 6 months of DRSP/EE-metformin (median decreased from 3 to 1.5; P < .01). Of note, this statistically significant association was confirmed after adjusting for baseline values in the DRSP/EE-metformin group by analysis of covariance (P < .05). In women with hyperinsulinemic PCOS, combined therapy with DRSP/EE and metformin may reduce cardiovascular risk.
Frömke, Cornelia; Hothorn, Ludwig A; Kropf, Siegfried
2008-01-27
In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek to find a significant difference of location parameters from zero, or from one for ratios thereof, for each variable. However, in some studies a significant deviation of the difference in locations from zero (or 1 in terms of the ratio) is biologically meaningless. A relevant difference or ratio is sought in such cases. This article addresses the use of relevance-shifted tests on ratios for a multivariate parallel two-sample group design. Two empirical procedures are proposed which embed the relevance-shifted test on ratios. As both procedures test a hypothesis for each variable, the resulting multiple testing problem has to be considered. Hence, the procedures include a multiplicity correction. Both procedures are extensions of available procedures for point null hypotheses achieving exact control of the familywise error rate. Whereas the shift of the null hypothesis alone would give straightforward solutions, the problems that are the reason for the empirical considerations discussed here arise from the fact that the shift is considered in both directions and the whole parameter space in between these two limits has to be accepted as the null hypothesis. The first procedure uses a permutation algorithm and is appropriate for designs with a moderately large number of observations. However, many experiments have limited sample sizes. Then the second procedure might be more appropriate, where multiplicity is corrected according to a concept of data-driven order of hypotheses.
NASA Astrophysics Data System (ADS)
Psaltis, Dimitrios; Özel, Feryal; Chan, Chi-Kwan; Marrone, Daniel P.
2015-12-01
The half opening angle of a Kerr black hole shadow is always equal to (5 ± 0.2)GM/Dc², where M is the mass of the black hole and D is its distance from the Earth. Therefore, measuring the size of a shadow and verifying whether it is within this 4% range constitutes a null hypothesis test of general relativity. We show that the black hole in the center of the Milky Way, Sgr A*, is the optimal target for performing this test with upcoming observations using the Event Horizon Telescope (EHT). We use the results of optical/IR monitoring of stellar orbits to show that the mass-to-distance ratio for Sgr A* is already known to an accuracy of ∼4%. We investigate our prior knowledge of the properties of the scattering screen between Sgr A* and the Earth, the effects of which will need to be corrected for in order for the black hole shadow to appear sharp against the background emission. Finally, we explore an edge detection scheme for interferometric data and a pattern matching algorithm based on the Hough/Radon transform and demonstrate that the shadow of the black hole at 1.3 mm can be localized, in principle, to within ∼9%. All these results suggest that our prior knowledge of the properties of the black hole, of scattering broadening, and of the accretion flow can only limit this general relativistic null hypothesis test with EHT observations of Sgr A* to ≲10%.
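A quick order-of-magnitude check of the quoted relation can be done with round, assumed values for the Sgr A* mass and distance; the numbers below are illustrative inputs, not values taken from the paper.

```python
# Evaluate the shadow-size relation, half-angle ~ 5 GM / (D c^2).
import math

G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
M_sun = 1.989e30       # kg
pc = 3.086e16          # m

M = 4.1e6 * M_sun      # assumed Sgr A* mass
D = 8.2e3 * pc         # assumed distance

half_angle_rad = 5 * G * M / (D * c**2)
microarcsec = math.degrees(half_angle_rad) * 3600e6
print(f"predicted shadow half-opening angle ~ {microarcsec:.0f} microarcseconds")
# -> roughly 25 microarcseconds, i.e. a ~50 microarcsecond shadow diameter,
#    which is the angular scale the EHT resolves at 1.3 mm.
```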
van Reenen, Mari; Westerhuis, Johan A; Reinecke, Carolus J; Venter, J Hendrik
2017-02-02
ERp is a variable selection and classification method for metabolomics data. ERp uses minimized classification error rates, based on data from a control and experimental group, to test the null hypothesis of no difference between the distributions of variables over the two groups. If the associated p-values are significant they indicate discriminatory variables (i.e. informative metabolites). The p-values are calculated assuming a common continuous strictly increasing cumulative distribution under the null hypothesis. This assumption is violated when zero-valued observations can occur with positive probability, a characteristic of GC-MS metabolomics data, disqualifying ERp in this context. This paper extends ERp to address two sources of zero-valued observations: (i) zeros reflecting the complete absence of a metabolite from a sample (true zeros); and (ii) zeros reflecting a measurement below the detection limit. This is achieved by allowing the null cumulative distribution function to take the form of a mixture between a jump at zero and a continuous strictly increasing function. The extended ERp approach is referred to as XERp. XERp is no longer non-parametric, but its null distributions depend only on one parameter, the true proportion of zeros. Under the null hypothesis this parameter can be estimated by the proportion of zeros in the available data. XERp is shown to perform well with regard to bias and power. To demonstrate the utility of XERp, it is applied to GC-MS data from a metabolomics study on tuberculosis meningitis in infants and children. We find that XERp is able to provide an informative shortlist of discriminatory variables, while attaining satisfactory classification accuracy for new subjects in a leave-one-out cross-validation context. XERp takes into account the distributional structure of data with a probability mass at zero without requiring any knowledge of the detection limit of the metabolomics platform. XERp is able to identify variables that discriminate between two groups by simultaneously extracting information from the difference in the proportion of zeros and shifts in the distributions of the non-zero observations. XERp uses simple rules to classify new subjects and a weight pair to adjust for unequal sample sizes or sensitivity and specificity requirements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dawson, E.; Powell, J.F.; Sham, P.
1995-10-09
We describe a method of systematically searching for major genes in disorders of unknown mode of inheritance, using linkage analysis. Our method is designed to minimize the probability of missing linkage due to inadequate exploration of data. We illustrate this method with the results of a search for a locus for schizophrenia on chromosome 12 using 22 highly polymorphic markers in 23 high density pedigrees. The markers span approximately 85-90% of the chromosome and are on average 9.35 cM apart. We have analysed the data using the most plausible current genetic models and allowing for the presence of genetic heterogeneity. None of the markers was supportive of linkage and the distribution of the heterogeneity statistics was in accordance with the null hypothesis. 53 refs., 2 figs., 4 tabs.
Schumm, Walter R
2010-11-01
Ten narrative studies involving family histories of 262 children of gay fathers and lesbian mothers were evaluated statistically in response to Morrison's (2007) concerns about Cameron's (2006) research that had involved three narrative studies. Despite numerous attempts to bias the results in favour of the null hypothesis and allowing for up to 20 (of 63, 32%) coding errors, Cameron's (2006) hypothesis that gay and lesbian parents would be more likely to have gay, lesbian, bisexual or unsure (of sexual orientation) sons and daughters was confirmed. Percentages of children of gay and lesbian parents who adopted non-heterosexual identities ranged between 16% and 57%, with odds ratios of 1.7 to 12.1, depending on the mix of child and parent genders. Daughters of lesbian mothers were most likely (33% to 57%; odds ratios from 4.5 to 12.1) to report non-heterosexual identities. Data from ethnographic sources and from previous studies on gay and lesbian parenting were re-examined and found to support the hypothesis that social and parental influences may influence the expression of non-heterosexual identities and/or behaviour. Thus, evidence is presented from three different sources, contrary to most previous scientific opinion, even most previous scientific consensus, that suggests intergenerational transfer of sexual orientation can occur at statistically significant and substantial rates, especially for female parents or female children. In some analyses for sons, intergenerational transfer was not significant. Further research is needed with respect to pathways by which intergenerational transfer of sexual orientation may occur. The results confirm an evolving tendency among scholars to cite the possibility of some degree of intergenerational crossover of sexual orientation.
NASA Astrophysics Data System (ADS)
Sahoo, Ramendra; Jain, Vikrant
2017-04-01
Morphology of the landscape and its derived features are regarded as an important tool for inferring tectonic activity in an area, since surface exposures of the underlying subsurface processes may not be available or may be eroded away over time. This has led to extensive research on the application of non-planar morphological attributes, such as river long profiles and hypsometry, for tectonic studies, whereas the drainage network as a proxy for tectonic activity has not been explored as thoroughly. Significant work has nonetheless been done on drainage network patterns, beginning in a qualitative manner and evolving over the years to incorporate more quantitative aspects, such as studying the evolution of a network under external and internal controls. The Random Topology (RT) model is one of these concepts; it elucidates the connection between the evolution of a drainage network pattern and the entropy of the drainage system, and states that, in the absence of any geological controls, a natural population of channel networks will be topologically random. We have used the entropy maximization principle to provide a theoretical structure for the RT model. Furthermore, analysis was carried out on the drainage network structures around the Jwalamukhi thrust in the Kangra reentrant of the western Himalayas, India, to investigate tectonic activity in the region. Around one thousand networks were extracted from the foot-wall (fw) and hanging-wall (hw) regions of the thrust sheet and later categorized by magnitude. We adopted a goodness-of-fit test for comparing the network patterns in the fw and hw drainage with those derived using the RT model. The null hypothesis for the test was that, for any given magnitude, the drainage networks in the fw are statistically more similar than those in the hw to the network patterns derived using the RT model. The test results favour the null hypothesis for networks with smaller magnitudes (< 9), whereas for larger magnitudes both hw and fw networks were found to be statistically dissimilar to the model network patterns. Calculation of pattern frequencies for each magnitude and the subsequent hypothesis testing were carried out using Matlab (v R2015a). Our results will help to establish the drainage network pattern as a geomorphic proxy for identifying tectonically active areas. This study also serves as supplementary evidence of neo-tectonic control on the morphology of the landscape and its derivatives around the Jwalamukhi thrust. Additionally, it will help to verify the theory of probabilistic evolution of drainage networks.
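A goodness-of-fit comparison of this kind can be sketched as a chi-square test of observed topology-class counts against the equal-probability expectation of the RT model; the counts and the number of classes below are invented for illustration, and the authors' actual Matlab analysis is not reproduced.

```python
# Chi-square goodness-of-fit of observed drainage topologies against a
# Random Topology null in which every distinct topology is equally likely.
import numpy as np
from scipy import stats

observed = np.array([34, 29, 41, 22, 31])        # hypothetical pattern counts
k = len(observed)
expected = np.full(k, observed.sum() / k)        # equal probability under RT

chi2, p = stats.chisquare(observed, expected)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
# A small p would argue against the RT null of topologically random networks,
# e.g. because of structural (tectonic) controls on the drainage pattern.
```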
Phase II design with sequential testing of hypotheses within each stage.
Poulopoulou, Stavroula; Karlis, Dimitris; Yiannoutsos, Constantin T; Dafni, Urania
2014-01-01
The main goal of a Phase II clinical trial is to decide whether a particular therapeutic regimen is effective enough to warrant further study. The hypothesis tested by Fleming's Phase II design (Fleming, 1982) is H0: p ≤ p0 versus H1: p ≥ p1, with level α and with power 1 − β at p = p1, where p0 is chosen to represent the response probability achievable with standard treatment and p1 is chosen such that the difference p1 − p0 represents a targeted improvement with the new treatment. This hypothesis creates a misinterpretation, mainly among clinicians, that rejection of the null hypothesis is tantamount to accepting the alternative, and vice versa. As mentioned by Storer (1992), this introduces ambiguity in the evaluation of type I and II errors and in the choice of the appropriate decision at the end of the study. Instead of testing this hypothesis, an alternative class of designs is proposed in which two hypotheses are tested sequentially. The hypothesis H0: p ≤ p0 versus H1: p > p0 is tested first. If this null hypothesis is rejected, the hypothesis H0: p < p1 versus H1: p ≥ p1 is tested next, in order to examine whether the therapy is effective enough to consider further testing in a Phase III study. For the derivation of the proposed design the exact binomial distribution is used to calculate the decision cut-points. The optimal design parameters are chosen so as to minimize the average sample number (ASN) under specified upper bounds on the error levels.
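As a simplified illustration of exact binomial cut-points (not the sequential two-hypothesis design or its ASN optimisation), one can scan response counts for the smallest cut-off whose exact size under p0 stays below α; the values of p0, p1, n and α below are placeholders.

```python
# Exact binomial cut-point for a single-stage rule that rejects
# H0: p <= p0 when the number of responses is large.
from scipy import stats

p0, p1, n, alpha = 0.20, 0.40, 40, 0.05

# smallest r with P(X >= r | p0) <= alpha
for r in range(n + 1):
    if stats.binom.sf(r - 1, n, p0) <= alpha:
        break

size = stats.binom.sf(r - 1, n, p0)      # exact type I error at p0
power = stats.binom.sf(r - 1, n, p1)     # exact power at p1
print(f"reject H0 if responses >= {r}: size = {size:.3f}, power = {power:.3f}")
```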
Sobiecki, Jakub G
2017-08-01
Despite the consistent findings of lower total cancer incidence in vegetarians than in meat-eaters in the UK, the results of studies of colorectal cancer (CRC) risk in British vegetarians have largely been null. This was in contrast to the hypothesis of a decreased risk of CRC in this population due to zero intake of red and processed meats and increased intake of fibre. Although the data are inconsistent, it has been suggested that selenium (Se) status may influence CRC risk. A literature review was performed of studies on CRC risk in vegetarians, Se intakes and status in vegetarians, and changes of Se intakes and status in the UK throughout the follow-up periods of studies on CRC risk in British vegetarians. Vegetarians in the UK and other low-Se areas were found to have low Se intakes and status compared to non-vegetarians. There was some evidence of a reverse J-shaped curve of Se intakes and status in the UK throughout the last three decades. These presumed patterns were followed by the changes in CRC mortality or incidence in British vegetarians during this period. Available data on Se intake and status in British vegetarians, as well as the relationship between their secular changes in the UK and changes in CRC risk in this dietary group, are compatible with the hypothesis that low Se status may contribute to the largely null results of studies of CRC risk in vegetarians in the UK.
Clark, Cameron M; Lawlor-Savage, Linette; Goghari, Vina M
2017-01-01
Training of working memory as a method of increasing working memory capacity and fluid intelligence has received much attention in recent years. This burgeoning field remains highly controversial, with empirically-backed disagreements at all levels of evidence, including individual studies, systematic reviews, and even meta-analyses. The current study investigated the effect of a randomized six-week online working memory intervention on untrained cognitive abilities in a community-recruited sample of healthy young adults, in relation to both a processing-speed-training active control condition and a no-contact control condition. Results of traditional null hypothesis significance testing, as well as Bayes factor analyses, revealed support for the null hypothesis across all cognitive tests administered before and after training. Importantly, all three groups were similar at pre-training for a variety of individual variables purported to moderate transfer of training to fluid intelligence, including personality traits, motivation to train, and expectations of cognitive improvement from training. Because these results are consistent with experimental trials of equal or greater methodological rigor, we suggest that future research re-focus on (1) other promising interventions known to increase memory performance in healthy young adults, and (2) examining sub-populations or alternative populations in which working memory training may be efficacious.
The Importance of Proving the Null
Gallistel, C. R.
2010-01-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? PMID:19348549
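The sensitivity analysis can be sketched for question (b), "is performance at chance?": fix the null at a success probability of 0.5, spread a uniform prior over effects up to an assumed maximum under the alternative, and track the resulting Bayes factor as that maximum varies. The data below are invented for illustration.

```python
# Bayes-factor sensitivity analysis for "is performance at chance?".
import numpy as np
from scipy import stats
from scipy.integrate import quad

k, n = 52, 100                                   # hypothetical near-chance data

def bf_null_vs_alternative(max_effect):
    like_null = stats.binom.pmf(k, n, 0.5)
    # marginal likelihood under a uniform prior on p in (0.5, 0.5 + max_effect]
    marg_alt, _ = quad(lambda p: stats.binom.pmf(k, n, p) / max_effect,
                       0.5, 0.5 + max_effect)
    return like_null / marg_alt                  # odds in favour of the null

for max_effect in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"max effect {max_effect:.2f}: BF(null/alt) = "
          f"{bf_null_vs_alternative(max_effect):.2f}")
# The vaguer the alternative (larger max_effect), the more the null is favoured,
# which is the behaviour the sensitivity analysis is meant to expose.
```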
Quantifying lead-time bias in risk factor studies of cancer through simulation.
Jansen, Rick J; Alexander, Bruce H; Anderson, Kristin E; Church, Timothy R
2013-11-01
Lead-time is inherent in early detection and creates bias in observational studies of screening efficacy, but its potential to bias effect estimates in risk factor studies is not always recognized. We describe a form of this bias that conventional analyses cannot address and develop a model to quantify it. Surveillance Epidemiology and End Results (SEER) data form the basis for estimates of age-specific preclinical incidence, and log-normal distributions describe the preclinical duration distribution. Simulations assume a joint null hypothesis of no effect of either the risk factor or screening on the preclinical incidence of cancer, and then quantify the bias as the risk-factor odds ratio (OR) from this null study. This bias can be used as a factor to adjust observed OR in the actual study. For this particular study design, as average preclinical duration increased, the bias in the total-physical activity OR monotonically increased from 1% to 22% above the null, but the smoking OR monotonically decreased from 1% above the null to 5% below the null. The finding of nontrivial bias in fixed risk-factor effect estimates demonstrates the importance of quantitatively evaluating it in susceptible studies. Copyright © 2013 Elsevier Inc. All rights reserved.
The Many Null Distributions of Person Fit Indices.
ERIC Educational Resources Information Center
Molenaar, Ivo W.; Hoijtink, Herbert
1990-01-01
Statistical properties of person fit indices are reviewed as indicators of the extent to which a person's score pattern is in agreement with a measurement model. Distribution of a fit index and ability-free fit evaluation are discussed. The null distribution was simulated for a test of 20 items. (SLD)
Fully Bayesian tests of neutrality using genealogical summary statistics.
Drummond, Alexei J; Suchard, Marc A
2008-10-31
Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.
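The logic of posterior predictive simulation can be illustrated with a deliberately simple toy example (a Poisson null model checked against overdispersed counts); this stands in for, and is far simpler than, the coalescent models and genealogical summary statistics used in the paper.

```python
# Posterior predictive p-value: fit a Poisson "null" model, simulate replicate
# data from the posterior, and compare a discrepancy statistic (variance/mean
# ratio) between replicates and the observed data.
import numpy as np

rng = np.random.default_rng(3)
data = rng.negative_binomial(n=5, p=0.3, size=200)      # overdispersed counts

def dispersion(x):
    return x.var(ddof=1) / x.mean()

obs_stat = dispersion(data)

# Conjugate Gamma(1, 1) prior on the Poisson rate -> Gamma posterior
post_shape, post_rate = 1 + data.sum(), 1 + len(data)

n_rep = 2000
rep_stats = np.empty(n_rep)
for b in range(n_rep):
    lam = rng.gamma(post_shape, 1 / post_rate)          # draw from the posterior
    rep = rng.poisson(lam, size=len(data))              # posterior predictive data
    rep_stats[b] = dispersion(rep)

ppp = np.mean(rep_stats >= obs_stat)                    # posterior predictive p
print(f"observed dispersion = {obs_stat:.2f}, posterior predictive p = {ppp:.3f}")
```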
Natural killer T cell facilitated engraftment of rat skin but not islet xenografts in mice.
Gordon, Ethel J; Kelkar, Vinaya
2009-01-01
We have studied cellular components required for xenograft survival mediated by anti-CD154 monoclonal antibody (mAb) and a transfusion of donor spleen cells and found that the elimination of CD4(+) but not CD8(+) cells significantly improves graft survival. A contribution of other cellular components, such as natural killer (NK) cells and natural killer T (NKT) cells, for costimulation blockade-induced xenograft survival has not been clearly defined. We therefore tested the hypothesis that NK or NKT cells would promote rat islet and skin xenograft acceptance in mice. Lewis rat islets or skin was transplanted into wild type B6 mice or into B6 mice that were Jalpha18(null), CD1(null), or beta2 microglobulin (beta2M)(null) NK 1.1 depleted, or perforin(null). Graft recipients were pretreated with an infusion of donor derived spleen cells and a brief course of anti-CD154 mAb treatments. Additional groups received mAb or cells only. We first observed that the depletion of NK1.1 cells does not significantly interfere with graft survival in C57BL/6 (B6) mice. We used NKT cell deficient B6 mice to test the hypothesis that NKT cells are involved in islet and skin xenograft survival in our model. These mice bear a null mutation in the gene for the Jalpha18 component of the T-cell receptor. The component is uniquely associated with NKT cells. We found no difference in islet xenograft survival between Jalpha18(null) and wild type B6 mice. In contrast, median skin graft survival appeared shorter in Jalpha18(null) recipients. These data imply a role for Jalpha18(+) NKT cells in skin xenograft survival in treated mice. In order to confirm this inference, we tested skin xenograft survival in B6 CD1(null) mice because NKT cells are CD1 restricted. Results of these trials demonstrate that the absence of CD1(+) cells adversely affects rat skin graft survival. An additional assay in beta2M(null) mice demonstrated a requirement for major histocompatibility complex (MHC) class I expression in the graft host, and we demonstrate that CD1 is the requisite MHC component. We further demonstrated that, unlike reports for allograft survival, skin xenograft survival does not require perforin-secreting NK cells. We conclude that MHC class I(+) CD1(+) Jalpha18(+) NKT cells promote the survival of rat skin but not rat islet xenografts. These studies implicate different mechanisms for inducing and maintaining islet vs. skin xenograft survival in mice treated with donor antigen and anti-CD154 mAb, and further indicate a role for NKT cells but not NK cells in skin xenograft survival.
Techniques for recognizing identity of several response functions from the data of visual inspection
NASA Astrophysics Data System (ADS)
Nechval, Nicholas A.
1996-08-01
The purpose of this paper is to present some efficient techniques for recognizing from the observed data whether several response functions are identical to each other. For example, in an industrial setting the problem may be to determine whether the production coefficients established in a small-scale pilot study apply to each of several large-scale production facilities. The techniques proposed here combine sensor information from automated visual inspection of manufactured products, which is carried out by means of pixel-by-pixel comparison of the sensed image of the product to be inspected with some reference pattern (or image). Let a1, ..., am be p-dimensional parameters associated with m response models of the same type. This study is concerned with the simultaneous comparison of a1, ..., am. A generalized maximum likelihood ratio (GMLR) test is derived for testing equality of these parameters, where each of the parameters represents a corresponding vector of regression coefficients. The GMLR test reduces to an equivalent test based on a statistic that has an F distribution. The main advantage of the test lies in its relative simplicity and the ease with which it can be applied. Another interesting test for the same problem is an application of Fisher's method of combining independent test statistics, which can be considered as a parallel procedure to the GMLR test. The combination of independent test statistics does not appear to have been used very much in applied statistics. There does, however, seem to be potential data-analytic value in techniques for combining distributional assessments in relation to statistically independent samples which are of joint experimental relevance. In addition, a new iterated test for the problem defined above is presented. A rejection of the null hypothesis by this test provides some reason to believe that the parameters are not all equal. A numerical example is discussed in the context of the proposed procedures for hypothesis testing.
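The classical F statistic for equality of regression-coefficient vectors across groups compares a pooled fit with separate per-group fits; the sketch below implements that textbook construction on simulated data and is not necessarily identical to the GMLR derivation in the paper.

```python
# F test for equality of regression coefficients across m groups
# (pooled vs separate least-squares fits).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
m, n_per, p = 3, 50, 2                            # groups, obs/group, predictors

def design(n):
    return np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

X = [design(n_per) for _ in range(m)]
beta = np.array([2.0, 1.0])                       # same coefficients in each group
y = [Xg @ beta + rng.normal(scale=1.0, size=n_per) for Xg in X]

def rss(Xg, yg):
    coef = np.linalg.lstsq(Xg, yg, rcond=None)[0]
    return np.sum((yg - Xg @ coef) ** 2)

rss_separate = sum(rss(Xg, yg) for Xg, yg in zip(X, y))
rss_pooled = rss(np.vstack(X), np.concatenate(y))

df1 = (m - 1) * p
df2 = m * n_per - m * p
F = ((rss_pooled - rss_separate) / df1) / (rss_separate / df2)
print(f"F = {F:.2f}, p = {stats.f.sf(F, df1, df2):.3f}")
```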
When decision heuristics and science collide.
Yu, Erica C; Sprenger, Amber M; Thomas, Rick P; Dougherty, Michael R
2014-04-01
The ongoing discussion among scientists about null-hypothesis significance testing and Bayesian data analysis has led to speculation about the practices and consequences of "researcher degrees of freedom." This article advances this debate by asking the broader questions that we, as scientists, should be asking: How do scientists make decisions in the course of doing research, and what is the impact of these decisions on scientific conclusions? We asked practicing scientists to collect data in a simulated research environment, and our findings show that some scientists use data collection heuristics that deviate from prescribed methodology. Monte Carlo simulations show that data collection heuristics based on p values lead to biases in estimated effect sizes and Bayes factors and to increases in both false-positive and false-negative rates, depending on the specific heuristic. We also show that using Bayesian data collection methods does not eliminate these biases. Thus, our study highlights the little appreciated fact that the process of doing science is a behavioral endeavor that can bias statistical description and inference in a manner that transcends adherence to any particular statistical framework.
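One of the p-value-based heuristics at issue, optional stopping, is easy to simulate: keep adding observations and retesting until p < .05 or a maximum sample size is reached. All parameters below are illustrative assumptions; under a true null the false-positive rate should come out well above the nominal 5%.

```python
# Monte Carlo sketch of optional stopping based on p values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_sim, n_start, n_max, batch = 2000, 10, 100, 5
false_positives = 0

for _ in range(n_sim):
    x = list(rng.normal(size=n_start))            # true effect is zero
    while True:
        p = stats.ttest_1samp(x, 0.0).pvalue
        if p < 0.05:
            false_positives += 1
            break
        if len(x) >= n_max:
            break
        x.extend(rng.normal(size=batch))          # collect a few more and retest

print(f"false-positive rate with optional stopping: {false_positives / n_sim:.3f}")
```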
Robust misinterpretation of confidence intervals.
Hoekstra, Rink; Morey, Richard D; Rouder, Jeffrey N; Wagenmakers, Eric-Jan
2014-10-01
Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers' performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.
Circumpulsar Asteroids: Inferences from Nulling Statistics and High Energy Correlations
NASA Astrophysics Data System (ADS)
Shannon, Ryan; Cordes, J. M.
2006-12-01
We have proposed that some classes of radio pulsar variability are associated with the entry of neutral asteroidal material into the pulsar magnetosphere. The region surrounding neutron stars is polluted with supernova fall-back material, which collapses and condenses into an asteroid-bearing disk that is stable for millions of years. Over time, collisional and radiative processes cause the asteroids to migrate inward until they are heated to the point of ionization. For older and cooler pulsars, asteroids ionize within the large magnetospheres and inject a sufficient amount of charged particles to alter the electrodynamics of the gap regions and modulate emission processes. This extrinsic model unifies many observed phenomena of variability that occur on time scales that are disparate from the much shorter time scales associated with pulsars and their magnetospheres. One such type of variability is nulling, in which certain pulsars exhibit episodes of quiescence that may be as short as a few pulse periods for some objects but longer than days for others. Here, in the context of this model, we examine the nulling phenomenon. We analyze the relationship between in-falling material and the statistics of nulling. In addition, as motivation for further high energy observations, we consider the relationship between nulling and other magnetospheric processes.
Occupational exposure assessment: Practices in Malaysian nuclear agency
NASA Astrophysics Data System (ADS)
Sarowi, S. Muhd; Ramli, S. A.; Kontol, K. Mohamad; Rahman, N. A. H. Abd.
2016-01-01
The Malaysian Nuclear Agency (Nuclear Malaysia) is the leading agency in introducing and promoting the application of nuclear science and technology in Malaysia. The agency provides major nuclear facilities for research and commercialisation, such as a reactor, irradiation plants and a radioisotope production laboratory. When dealing with ionizing radiation, there is an obligatory requirement to monitor and assess the radiation exposure of workers. The personal doses of radiation workers were monitored monthly by assessing their thermoluminescence dosimeter (TLD) readings. This paper discusses the current practice in managing, assessing, record keeping and reporting of occupational exposure in Nuclear Malaysia, including the roles and challenges of the Health Physics Group. Statistics on the occupational radiation exposure of monitored workers in different fields in Nuclear Malaysia from 2011 to 2013 are also presented. The results show that the null hypothesis (H₀), which states that the population means are all equal, i.e. do not differ significantly, was accepted. This indicates that the dose exposure received by the radiation workers in Nuclear Malaysia was similar and that there were no significant changes from 2011 to 2013. The radiation monitoring programme is consistent with the requirements of our national law, the Atomic Energy Licensing Act 1984 (Act 304).
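The comparison reported above is a standard one-way ANOVA across years; the sketch below runs it on simulated dose records (the dose distributions are assumptions, not Nuclear Malaysia data).

```python
# One-way ANOVA testing equality of mean occupational doses across years.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
doses_2011 = rng.gamma(shape=2.0, scale=0.4, size=120)   # mSv, illustrative
doses_2012 = rng.gamma(shape=2.0, scale=0.4, size=120)
doses_2013 = rng.gamma(shape=2.0, scale=0.4, size=120)

F, p = stats.f_oneway(doses_2011, doses_2012, doses_2013)
print(f"F = {F:.2f}, p = {p:.3f}")   # a large p means we fail to reject equal means
```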
A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies
Sun, Jianping; Zheng, Yingye; Hsu, Li
2013-01-01
In rare-variant association analysis, because of the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651
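A burden test, one of the special cases mentioned above, can be sketched by collapsing rare variants into a carrier count and regressing case/control status on that count; the genotype frequencies and sample below are simulated, and this is not the authors' mixed-effects score test.

```python
# Simple burden test: aggregate rare alleles per subject and test the
# aggregated count in a logistic regression of case/control status.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, n_variants = 1000, 25
maf = rng.uniform(0.001, 0.01, n_variants)             # rare allele frequencies
genotypes = rng.binomial(2, maf, size=(n, n_variants)) # 0/1/2 allele counts
burden = genotypes.sum(axis=1)                         # aggregated rare-allele count
y = rng.binomial(1, 0.3, size=n)                       # phenotype, no true effect

fit = sm.Logit(y, sm.add_constant(burden)).fit(disp=0)
print("burden coefficient:", fit.params[1], "p-value:", fit.pvalues[1])
```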
2015-01-01
Objective: A study to compare the use of throat swab testing for leukocyte esterase on a test strip (urine dipstick/multistick) with the rapid strep test for rapid diagnosis of Group A beta-hemolytic streptococci in cases of acute pharyngitis in children. Hypothesis: Testing of a throat swab for leukocyte esterase on the test strip currently used for urine testing may be used to detect throat infection and might be as useful as the rapid strep test. Methods: All patients who presented with a complaint of sore throat and fever were examined clinically for erythema of the pharynx and tonsils and for any exudates. Informed consent was obtained from the parents and assent from the subjects. Three swabs were taken from the pharyngo-tonsillar region for culture, rapid strep and leukocyte esterase (LE) testing. Results: The total number was 100. Cultures: 9 positive. Rapid strep: 84 negative and 16 positive. LE: 80 negative and 20 positive. Statistics: The data configuration shows that the rapid strep and LE results are not randomly (independently) distributed but strongly aligned; the two tests give very agreeable results. The calculated chi-squared value exceeds the tabulated value at 1 degree of freedom (P < 0.0001), so the null hypothesis is rejected and the alternative concluded. Conclusions: Leukocyte esterase testing of a throat swab, on the test strip currently used for urine dipsticks, is as useful as the rapid strep test for the rapid diagnosis of strep pharyngitis causing acute pharyngitis in children. PMID:27335975
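The chi-squared comparison can be reproduced in form (not in the actual counts) with a hypothetical 2x2 cross-tabulation that matches the reported margins; the joint counts below are assumptions, since the abstract reports only the marginal totals.

```python
# Chi-squared test of association between rapid strep and LE results on a
# hypothetical 2x2 table with margins 16/84 (rapid strep) and 20/80 (LE).
import numpy as np
from scipy import stats

#                 LE+   LE-
table = np.array([[15,   1],    # rapid strep +
                  [ 5,  79]])   # rapid strep -

chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)
print(f"chi-squared = {chi2:.1f}, df = {dof}, p = {p:.2g}")
```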
Turned versus anodised dental implants: a meta-analysis.
Chrcanovic, B R; Albrektsson, T; Wennerberg, A
2016-09-01
The aim of this meta-analysis was to test the null hypothesis of no difference in the implant failure rates, marginal bone loss (MBL) and post-operative infection for patients being rehabilitated by turned versus anodised-surface implants, against the alternative hypothesis of a difference. An electronic search without time or language restrictions was undertaken in November 2015. Eligibility criteria included clinical human studies, either randomised or not. Thirty-eight publications were included. The results suggest a risk ratio of 2·82 (95% CI 1·95-4·06, P < 0·00001) for failure of turned implants, when compared to anodised-surface implants. Sensitivity analyses showed similar results when only the studies inserting implants in maxillae or mandibles were pooled. There were no statistically significant effects of turned implants on the MBL (mean difference, MD, 0·02; 95% CI -0·16 to 0·20; P = 0·82) in comparison to anodised implants. The results of a meta-regression considering the follow-up period as a covariate suggested an increase of the MD with the increase in the follow-up time (MD increase of 0·012 mm year(-1)), although without statistical significance (P = 0·813). Due to a lack of satisfactory information, meta-analysis for the outcome 'post-operative infection' was not performed. The results have to be interpreted with caution due to the presence of several confounding factors in the included studies. © 2016 John Wiley & Sons Ltd.
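The generic calculation behind such a pooled risk ratio is inverse-variance weighting of per-study log risk ratios; the sketch below uses invented study counts, not data from the 38 included publications, and a fixed-effect model for simplicity.

```python
# Fixed-effect inverse-variance pooling of study risk ratios on the log scale.
import numpy as np
from scipy import stats

# (events_turned, total_turned, events_anodised, total_anodised) per study
studies = [(8, 120, 3, 118), (5, 90, 2, 92), (12, 200, 4, 195)]

log_rr, weights = [], []
for a, n1, c, n2 in studies:
    rr = (a / n1) / (c / n2)
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2        # variance of the log risk ratio
    log_rr.append(np.log(rr))
    weights.append(1 / var)

log_rr, weights = np.array(log_rr), np.array(weights)
pooled = np.sum(weights * log_rr) / np.sum(weights)
se = 1 / np.sqrt(np.sum(weights))
ci = np.exp(pooled + np.array([-1.96, 1.96]) * se)
z = pooled / se
print(f"pooled RR = {np.exp(pooled):.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), "
      f"p = {2 * stats.norm.sf(abs(z)):.4f}")
```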
Ioannidis, John P. A.
2017-01-01
A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. “Convincing” may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular, with p < .05 labeled significant) is pervasive in clinical research. Therefore, in this paper, we use simulations to calculate what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases in which two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved and fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, the evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend the use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications. PMID:28273140
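A minimal normal-approximation sketch of the point being made is given below: two trials, each just significant at p < .05, can still yield a combined Bayes factor well below 20. This is not the paper's simulation framework; the per-arm sample size, observed effects, and the H1 prior scale tau are assumptions chosen for illustration.

import numpy as np
from scipy.stats import norm, multivariate_normal

n = 64                               # per-arm sample size (assumed)
se = np.sqrt(2 / n)                  # SE of a standardized mean difference
d = np.array([0.35, 0.36])           # observed effects; both give p < .05
tau = 0.5                            # H1 prior: common true effect ~ N(0, tau^2)

for di in d:
    print(f"d = {di}: p = {2 * norm.sf(abs(di) / se):.3f}")

# Marginal likelihood under H0: the two estimates are independent N(0, se^2)
m0 = multivariate_normal(mean=[0, 0], cov=np.eye(2) * se**2).pdf(d)
# Under H1 a shared true effect induces covariance tau^2 between the trials
cov1 = np.eye(2) * se**2 + np.full((2, 2), tau**2)
m1 = multivariate_normal(mean=[0, 0], cov=cov1).pdf(d)
print(f"BF10 = {m1 / m0:.1f}")       # well below 20 despite two 'significant' trials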
Are secular correlations between sunspots, geomagnetic activity, and global temperature significant?
NASA Astrophysics Data System (ADS)
Love, Jeffrey J.; Mursula, Kalevi; Tsai, Victor C.; Perkins, David M.
2011-11-01
Recent studies have led to speculation that solar-terrestrial interaction, measured by sunspot number and geomagnetic activity, has played an important role in global temperature change over the past century or so. We treat this possibility as a hypothesis for testing. We examine the statistical significance of cross-correlations between sunspot number, geomagnetic activity, and global surface temperature for the years 1868-2008, solar cycles 11-23. The data contain substantial autocorrelation and nonstationarity, properties that are incompatible with standard measures of cross-correlational significance but which can be largely removed by averaging over solar cycles and first-difference detrending. Treated data show an expected statistically significant correlation between sunspot number and geomagnetic activity, Pearson p < 10⁻⁴, but correlations between global temperature and sunspot number (geomagnetic activity) are not significant, p = 0.9954 (p = 0.8171). In other words, straightforward analysis does not support widely cited suggestions that these data record a prominent role for solar-terrestrial interaction in global climate change. With respect to the sunspot-number, geomagnetic-activity, and global-temperature data, three alternative hypotheses remain difficult to reject: (1) the role of solar-terrestrial interaction in recent climate change is contained wholly in long-term trends and not in any shorter-term secular variation; or (2) an anthropogenic signal is hiding correlation between solar-terrestrial variables and global temperature; or (3) the null hypothesis: recent climate change has not been influenced by solar-terrestrial interaction.
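The detrend-then-correlate step described above can be sketched in a few lines of Python; the two series below are synthetic stand-ins for the solar-cycle-averaged data, so the numbers carry no scientific meaning.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
t = np.arange(13)                                       # e.g., solar cycles 11-23
sunspots = 80 + 5 * t + rng.normal(0, 10, t.size)       # trending series (synthetic)
temperature = 0.01 * t**2 + rng.normal(0, 0.05, t.size) # trending series (synthetic)

# First-difference detrending removes slowly varying trends and reduces
# autocorrelation before the cross-correlation is tested
ds, dT = np.diff(sunspots), np.diff(temperature)
r, p = pearsonr(ds, dT)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")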
DOE Office of Scientific and Technical Information (OSTI.GOV)
Psaltis, Dimitrios; Özel, Feryal; Chan, Chi-Kwan
2015-12-01
The half opening angle of a Kerr black hole shadow is always equal to (5 ± 0.2)GM/Dc², where M is the mass of the black hole and D is its distance from the Earth. Therefore, measuring the size of a shadow and verifying whether it is within this 4% range constitutes a null hypothesis test of general relativity. We show that the black hole in the center of the Milky Way, Sgr A*, is the optimal target for performing this test with upcoming observations using the Event Horizon Telescope (EHT). We use the results of optical/IR monitoring of stellar orbits to show that the mass-to-distance ratio for Sgr A* is already known to an accuracy of ∼4%. We investigate our prior knowledge of the properties of the scattering screen between Sgr A* and the Earth, the effects of which will need to be corrected for in order for the black hole shadow to appear sharp against the background emission. Finally, we explore an edge detection scheme for interferometric data and a pattern matching algorithm based on the Hough/Radon transform and demonstrate that the shadow of the black hole at 1.3 mm can be localized, in principle, to within ∼9%. All these results suggest that our prior knowledge of the properties of the black hole, of scattering broadening, and of the accretion flow can only limit this general relativistic null hypothesis test with EHT observations of Sgr A* to ≲10%.
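A back-of-envelope evaluation of the quoted 5 GM/(Dc²) half opening angle is sketched below in Python. The mass and distance used are commonly cited values for Sgr A*, inserted here as assumptions rather than the paper's adopted numbers.

import numpy as np

G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
M_sun = 1.989e30       # kg
pc = 3.086e16          # m

M = 4.3e6 * M_sun      # assumed Sgr A* mass
D = 8.3e3 * pc         # assumed distance

half_angle_rad = 5 * G * M / (D * c**2)
microarcsec = half_angle_rad * (180 / np.pi) * 3600 * 1e6
print(f"half opening angle ~ {microarcsec:.0f} microarcsec "
      f"(shadow diameter ~ {2 * microarcsec:.0f} microarcsec)")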
Dewar, Alastair; Camplin, William; Barry, Jon; Kennedy, Paul
2014-12-01
Since the cessation of phosphoric acid production (in 1992) and subsequent closure and decommissioning (2004) of the Rhodia Consumer Specialties Limited plant in Whitehaven, the concentration levels of polonium-210 (²¹⁰Po) in local marine materials have declined towards a level more typical of natural background. However, enhanced concentrations of ²¹⁰Po and lead-210 (²¹⁰Pb), due to this historic industrial activity (plant discharges and ingrowth of ²¹⁰Po from ²¹⁰Pb), have been observed in fish and shellfish samples collected from this area over the last 20 years. The results of this monitoring, and assessments of the dose from these radionuclides, to high-rate aquatic food consumers are published annually in the Radioactivity in Food and the Environment (RIFE) report series. The RIFE assessment uses a simple approach to determine whether and by how much activity is enhanced above the normal background. As a potential tool to improve the assessment of enhanced concentrations of ²¹⁰Po in routine dose assessments, a formal statistical test, where the null hypothesis is that the Whitehaven area is contaminated with ²¹⁰Po, was applied to sample data. This statistical, modified "green", test has been used in assessments of chemicals by the OSPAR commission. It involves comparison of the reported environmental concentrations of ²¹⁰Po in a given aquatic species against its corresponding Background Assessment Concentration (BAC), which is based upon environmental samples collected from regions assumed to be not enhanced by industrial sources of ²¹⁰Po, over the period for which regular monitoring data are available (1990-2010). Unlike RIFE, these BAC values take account of the variability of the natural background level. As an example, for 2010 data, crab, lobster, mussels and winkles passed the modified "green" test (i.e. the null hypothesis is rejected) and as such are deemed not to be enhanced. Since the cessation of phosphoric acid production in 1992, the modified "green" test pass rate for crustaceans is ∼53% and ∼64% for molluscs. Results of dose calculations are made (i) using the RIFE approach and (ii) with the application of the modified "green" test, where samples passing the modified "green" test are assumed to have background levels and hence zero enhancement of ²¹⁰Po. Applying the modified "green" test reduces the dose on average by 44% over the period of this study (1990-2010). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Null Hypothesis Significance Testing and "p" Values
ERIC Educational Resources Information Center
Travers, Jason C.; Cook, Bryan G.; Cook, Lysandra
2017-01-01
"p" values are commonly reported in quantitative research, but are often misunderstood and misinterpreted by research consumers. Our aim in this article is to provide special educators with guidance for appropriately interpreting "p" values, with the broader goal of improving research consumers' understanding and interpretation…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robinowitz, R.; Roberts, W.R.; Dolan, M.P.
1989-09-01
This study asked: what are the psychological characteristics of Vietnam combat veterans who claim Agent Orange exposure, compared with combat-experienced cohorts who do not report such contamination? The question was researched among 153 heroin addicts, polydrug abusers, and chronic alcoholics who were seeking treatment: 58 reported moderate to high defoliant exposure while in combat; 95 reported minimal to no exposure while in Vietnam. The null hypothesis was accepted for measures of childhood and present family social climate, premilitary backgrounds, reasons for seeking treatment, patterns and types of illicit drug and alcohol use, interpersonal problems, intellectual functioning, and short-term memory. The null hypothesis was rejected for personality differences, however: those who self-reported high Agent Orange exposure scored significantly higher on MMPI scales F, Hypochondriasis, Depression, Paranoia, Psychasthenia, Schizophrenia, Mania, and Social Introversion. The results suggest that clinicians carefully assess attributional processing of those who report traumatic experience.
TMJ symptoms reduce chewing amplitude and velocity, and increase variability.
Radke, John C; Kamyszek, Greg J; Kull, Robert S; Velasco, Gerardo R
2017-09-04
The null hypothesis was that mandibular amplitude, velocity, and variability during gum chewing are not altered in subjects with temporomandibular joint (TMJ) internal derangements (ID). Thirty symptomatic subjects with confirmed ID consented to chew gum on their left and right sides while being tracked by an incisor-point jaw tracker. A gender- and age-matched control group (p > 0.67) volunteered to be recorded in the same way. Student's t-test compared the ID group's mean values to those of the control group. The control group opened wider (p < 0.05) and chewed faster (p < 0.05) than the ID group. The mean cycle time of the ID group (0.929 s) was longer than that of the control group (0.751 s; p < 0.05) and more variable (p < 0.05). The ID group exhibited reduced amplitude and velocity but increased variability during chewing. The null hypothesis was rejected. Further study of patients' adaptation to ID should be pursued.
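The group comparison described above reduces to a two-sample t-test on cycle times; a minimal Python sketch follows. The values are simulated to resemble the reported group means (roughly 0.93 s vs 0.75 s) and are not the study's raw data; Welch's version is used as a conservative default when variances may differ.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
id_group = rng.normal(0.929, 0.20, 30)   # simulated chewing-cycle times (s)
control = rng.normal(0.751, 0.10, 30)

t, p = ttest_ind(id_group, control, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")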
Rothmann, Mark
2005-01-01
When testing the equality of means from two different populations, a t-test or large-sample normal test tends to be performed. For these tests, when the sample size or design for the second sample depends on the results of the first sample, the type I error probability is altered for each specific possibility in the null hypothesis. We examine the impact on the type I error probabilities of two confidence interval procedures and of procedures using test statistics when the design for the second sample or experiment depends on the results from the first sample or experiment (or series of experiments). Ways of controlling a desired maximum type I error probability or a desired type I error rate are discussed. Results are applied to the setting of noninferiority comparisons in active controlled trials where the use of a placebo is unethical.
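The phenomenon is easy to reproduce by simulation. The Python sketch below uses one illustrative (and deliberately naive) two-stage rule, not the procedures studied in the paper: test at stage 1, and if not significant let the second sample size depend on the stage-1 estimate and test the pooled data, all at the nominal 0.05 level.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n1, reps, alpha = 20, 20000, 0.05
rejections = 0
for _ in range(reps):
    x1, y1 = rng.normal(size=n1), rng.normal(size=n1)       # H0 is true
    if ttest_ind(x1, y1).pvalue < alpha:                     # stage-1 test
        rejections += 1
        continue
    # second-stage size chosen from the stage-1 estimate (data-dependent)
    n2 = 40 if abs(x1.mean() - y1.mean()) > 0.2 else 10
    x2, y2 = rng.normal(size=n2), rng.normal(size=n2)
    if ttest_ind(np.r_[x1, x2], np.r_[y1, y2]).pvalue < alpha:  # pooled test
        rejections += 1
print(f"empirical type I error = {rejections / reps:.3f} (nominal {alpha})")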
Random Positions of Dendritic Spines in Human Cerebral Cortex
Morales, Juan; Benavides-Piccione, Ruth; Dar, Mor; Fernaud, Isabel; Rodríguez, Angel; Anton-Sanchez, Laura; Bielza, Concha; Larrañaga, Pedro; DeFelipe, Javier
2014-01-01
Dendritic spines establish most excitatory synapses in the brain and, on Purkinje cell dendrites, are located along helical paths, perhaps maximizing the probability of contacting different axons. To test whether spine helixes also occur in neocortex, we reconstructed >500 dendritic segments from adult human cortex obtained from autopsies. With Fourier analysis and spatial statistics, we analyzed spine position along apical and basal dendrites of layer 3 pyramidal neurons from frontal, temporal, and cingulate cortex. Although we occasionally detected helical positioning, for the great majority of dendrites we could not reject the null hypothesis of spatial randomness in spine locations, either in apical or basal dendrites, in neurons of different cortical areas, or among spines of different volumes and lengths. We conclude that in adult human neocortex spine positions are mostly random. We discuss the relevance of these results for spine formation and plasticity and their functional impact on cortical circuits. PMID:25057209
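A minimal version of a spatial-randomness test for a single dendritic segment is sketched below: under the null, spine positions are uniform along the segment, which a Kolmogorov-Smirnov test can assess. The positions and segment length are simulated; the study itself used Fourier analysis and richer spatial statistics over hundreds of segments.

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)
L = 60.0                                     # segment length (microns, assumed)
positions = np.sort(rng.uniform(0, L, 120))  # simulated spine locations

stat, p = kstest(positions / L, "uniform")   # compare to U(0, 1)
print(f"KS statistic = {stat:.3f}, p = {p:.3f}")
# A small p would argue against spatial randomness; here a large p is expected.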
From Discovery to Justification: Outline of an Ideal Research Program in Empirical Psychology
Witte, Erich H.; Zenker, Frank
2017-01-01
The gold standard for an empirical science is the replicability of its research results. But the estimated average replicability rate of key effects reported in top-tier psychology journals falls between 36 and 39% (objective vs. subjective rate; Open Science Collaboration, 2015). So the standard mode of applying null-hypothesis significance testing (NHST) fails to adequately separate stable from random effects. Therefore, NHST does not fully convince as a statistical inference strategy. We argue that the replicability crisis is “home-made” because more sophisticated strategies can deliver results whose successful replication is sufficiently probable. Thus, we can overcome the replicability crisis by integrating empirical results into genuine research programs. Instead of continuing to narrowly evaluate only the stability of data against random fluctuations (discovery context), such programs evaluate rival hypotheses against stable data (justification context). PMID:29163256
De Meeûs, Thierry
2014-03-01
In population genetics data analysis, researchers are often faced with the problem of making a decision from a series of tests of the same null hypothesis. This is the case when one wants to test differentiation between pathogens found on different host species sampled from different locations (as many tests as there are locations). Many procedures are available to date, but not all apply to all situations. Determining which individual tests are significant, or whether the whole series is significant, requires different procedures depending on whether the tests are independent. In this note I describe several procedures, among the simplest and easiest to undertake, that should allow decision making in most (if not all) situations population geneticists (or biologists) are likely to meet, in particular in host-parasite systems. Copyright © 2014 Elsevier B.V. All rights reserved.
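Two of the simplest procedures for a series of tests of the same null hypothesis are sketched in Python below: a Holm correction to decide which individual tests remain significant, and a binomial test of whether more tests fall below alpha than chance would allow (a series-level judgement that assumes independent tests). The p-values are made up for illustration and are not this note's recommended toolkit in full.

import numpy as np
from scipy.stats import binomtest
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.020, 0.049, 0.240, 0.370, 0.510, 0.700, 0.940])
alpha = 0.05

reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="holm")
print("Holm-adjusted p-values:", np.round(p_adj, 3))
print("individually significant after correction:", reject)

# Series-level question: are more tests below alpha than expected by chance?
k = int((pvals < alpha).sum())
res = binomtest(k, n=len(pvals), p=alpha, alternative="greater")
print(f"{k}/{len(pvals)} tests below alpha; series-level p = {res.pvalue:.4f}")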
Kelly, Clint D
2006-09-01
That empirical evidence is replicable is the foundation of science. Ronald Fisher, a founding father of biostatistics, recommended that a null hypothesis be rejected more than once because "no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon" (Fisher 1974:14). Despite this demand, animal behaviorists and behavioral ecologists seldom replicate studies. This practice is not part of our scientific culture, as it is in chemistry or physics, due to a number of factors, including a general disdain by journal editors and thesis committees for unoriginal work. I outline why and how we should replicate empirical studies and which studies should be given priority, and then elaborate on why we do not engage in this necessary endeavor. I also explain how to employ various statistics to test the replicability of a series of studies and illustrate these using published studies from the literature.
Greene, Leslie E; Riederer, Anne M; Marcus, Michele; Lkhasuren, Oyuntogos
2010-01-01
To our knowledge, reproductive health effects among male leather tannery workers have not been previously investigated. Tannery work involves exposure to chromium, solvents, and other chemicals, which has been associated with adverse pregnancy and fertility outcomes in animals or humans in some studies. This study retrospectively investigates the association of male leather tannery work with preterm delivery, spontaneous abortion, time to pregnancy, and infertility by comparing tannery employees to other workers in Ulaanbaatar, Mongolia. Participants were randomly selected from current employee rosters at eight tanneries and two bread-making companies. The results of this research suggest that tannery work may be associated with reduced fertility in males. The study had limited statistical power, and some factors are likely to have biased findings toward the null hypothesis; other limitations and possible sources of undetermined bias give reason for cautious interpretation. Additional studies should be conducted to further examine fertility among tannery workers.
Surveillance of the colorectal cancer disparities among demographic subgroups: a spatial analysis.
Hsu, Chiehwen Ed; Mas, Francisco Soto; Hickey, Jessica M; Miller, Jerry A; Lai, Dejian
2006-09-01
The literature suggests that colorectal cancer mortality in Texas is distributed inhomogeneously among specific demographic subgroups and in certain geographic regions over an extended period. To understand the extent of the demographic and geographic disparities, the present study examined colorectal cancer mortality in 15 demographic groups in Texas counties between 1990 and 2001. The Spatial Scan Statistic was used to assess the standardized mortality ratio, duration and age-adjusted rates of excess mortality, and their respective p-values for testing the null hypothesis of homogeneity of geographic and temporal distribution. The study confirmed the excess mortality in some Texas counties found in the literature, identified 13 additional excess mortality regions, and found 4 health regions with persistent excess mortality involving several population subgroups. Health disparities of colorectal cancer mortality continue to exist in Texas demographic subpopulations. Health education and intervention programs should be directed to the at-risk subpopulations in the identified regions.
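For orientation, the Spatial Scan Statistic maximizes a Poisson likelihood ratio over many candidate windows and obtains p-values by Monte Carlo. The Python sketch below shows only the single-window likelihood-ratio core under a Poisson model, with invented counts; it is a simplified illustration, not a scan implementation, and the exact form used by the study's software may differ.

import numpy as np

def poisson_llr(c, e, C):
    """Log-likelihood ratio for excess risk inside one window with c observed
    and e expected cases out of C total (0 when there is no excess)."""
    if c <= e:
        return 0.0
    return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

C_total = 5000          # hypothetical total deaths in the study area
c_in, e_in = 320, 250   # hypothetical observed vs expected deaths in the window
print(f"SMR = {c_in / e_in:.2f}, LLR = {poisson_llr(c_in, e_in, C_total):.1f}")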
Replicates in high dimensions, with applications to latent variable graphical models.
Tan, Kean Ming; Ning, Yang; Witten, Daniela M; Liu, Han
2016-12-01
In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.
Patterns in the English language: phonological networks, percolation and assembly models
NASA Astrophysics Data System (ADS)
Stella, Massimo; Brede, Markus
2015-05-01
In this paper we provide a quantitative framework for the study of phonological networks (PNs) for the English language by carrying out principled comparisons to null models, either based on site percolation, randomization techniques, or network growth models. In contrast to previous work, we mainly focus on null models that reproduce lower order characteristics of the empirical data. We find that artificial networks matching connectivity properties of the English PN are exceedingly rare: this leads to the hypothesis that the word repertoire might have been assembled over time by preferentially introducing new words which are small modifications of old words. Our null models are able to explain the ‘power-law-like’ part of the degree distributions and generally retrieve qualitative features of the PN such as high clustering, high assortativity coefficient and small-world characteristics. However, the detailed comparison to expectations from null models also points out significant differences, suggesting the presence of additional constraints in word assembly. Key constraints we identify are the avoidance of large degrees, the avoidance of triadic closure and the avoidance of large non-percolating clusters.
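The null-model comparison described above can be sketched with networkx: generate an ensemble of degree-preserving rewirings of the observed network and compare clustering and assortativity against it. The synthetic small-world graph below stands in for a real phonological network, and the ensemble size is kept small for speed.

import networkx as nx

observed = nx.watts_strogatz_graph(500, 6, 0.1, seed=0)   # stand-in network

def summary(G):
    return nx.average_clustering(G), nx.degree_assortativity_coefficient(G)

obs_clust, obs_assort = summary(observed)

null_clust, null_assort = [], []
for seed in range(20):                     # small ensemble of null networks
    G = observed.copy()
    nx.double_edge_swap(G, nswap=10 * G.number_of_edges(),
                        max_tries=10**6, seed=seed)       # preserves degrees
    c, a = summary(G)
    null_clust.append(c)
    null_assort.append(a)

print(f"clustering: observed={obs_clust:.3f}, "
      f"null mean={sum(null_clust) / len(null_clust):.3f}")
print(f"assortativity: observed={obs_assort:.3f}, "
      f"null mean={sum(null_assort) / len(null_assort):.3f}")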
The importance of proving the null.
Gallistel, C R
2009-04-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? © 2009 APA, all rights reserved
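A minimal Python version of the sensitivity analysis described above is sketched below, with the alternative modeled as a uniform prior on the effect between 0 and a maximum value that is swept over a grid. The observed effect and its standard error are made up, and the uniform form of the vague alternative is an assumption for illustration.

import numpy as np
from scipy.stats import norm

d_obs, se = 0.05, 0.10          # observed effect and its standard error (assumed)

def bf01(max_effect):
    """Bayes factor for H0 (effect = 0) vs H1 (effect ~ Uniform(0, max_effect))."""
    # closed-form marginal likelihood under H1 for a normal likelihood
    m1 = (norm.cdf(d_obs / se) - norm.cdf((d_obs - max_effect) / se)) / max_effect
    m0 = norm.pdf(d_obs, 0, se)
    return m0 / m1

for m in (0.1, 0.25, 0.5, 1.0, 2.0):
    print(f"max effect {m:>4}: odds for the null = {bf01(m):.2f}")
# The vaguer the alternative (larger max effect), the more the null is favored.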
Share Market Analysis Using Various Economical Determinants to Predict Decision of Investors
NASA Astrophysics Data System (ADS)
Ghosh, Arijit; Roy, Samrat; Bandyopadhyay, Gautam; Choudhuri, Kripasindhu
2010-10-01
This paper develops six major hypotheses concerning the Bombay Stock Exchange (BSE) in India and tests them using field data on six economic determinants: oil prices, gold prices, the Cash Reserve Ratio, food price inflation, the call money rate and the dollar price. These data are used as indicators to identify the relationship with, and the level of influence on, BSE share prices by rejecting or accepting the corresponding null hypotheses.
Heightened risk of preterm birth and growth restriction after a first-born son.
Bruckner, Tim A; Mayo, Jonathan A; Gould, Jeffrey B; Stevenson, David K; Lewis, David B; Shaw, Gary M; Carmichael, Suzan L
2015-10-01
In Scandinavia, delivery of a first-born son elevates the risk of preterm delivery and intrauterine growth restriction of the next-born infant. External validity of these results remains unclear. We test this hypothesis for preterm delivery and growth restriction using the linked California birth cohort file. We examined the hypothesis separately by race and/or ethnicity. We retrieved data on 2,852,976 births to 1,426,488 mothers with at least two live births. Our within-mother tests applied Cox proportional hazards (preterm delivery, defined as less than 37 weeks gestation) and linear regression models (birth weight for gestational age percentiles). For non-Hispanic whites, Hispanics, Asians, and American Indian and/or Alaska Natives, analyses indicate heightened risk of preterm delivery and growth restriction after a first-born male. The race-specific hazard ratios for preterm delivery range from 1.07 to 1.18. Regression coefficients for birth weight for gestational age percentile range from -0.73 to -1.49. The 95% confidence intervals for all these estimates do not contain the null. By contrast, we could not reject the null for non-Hispanic black mothers. Whereas California findings generally support those from Scandinavia, the null results among non-Hispanic black mothers suggest that we do not detect adverse outcomes after a first-born male in all racial and/or ethnic groups. Copyright © 2015 Elsevier Inc. All rights reserved.
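A Cox proportional hazards sketch in the spirit of the analysis above is given below, using the lifelines library on synthetic data. The coding of gestational age as the duration and preterm delivery as the event, as well as all effect sizes and sample sizes, are assumptions for illustration; the real analysis used about 2.9 million linked California births, within-mother comparisons, and race/ethnicity-specific models.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 5000
firstborn_male = rng.integers(0, 2, n)
# gestational age (weeks) of the second birth, slightly shorter on average
# after a first-born male in this simulation
gest_age = 39.0 - 0.15 * firstborn_male + rng.normal(0, 1.5, n)
preterm = (gest_age < 37).astype(int)     # event; term births are censored

df = pd.DataFrame({"duration": gest_age, "event": preterm,
                   "firstborn_male": firstborn_male})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()   # hazard ratio for firstborn_male should exceed 1 here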
Lawlor-Savage, Linette; Goghari, Vina M.
2017-01-01
Training of working memory as a method of increasing working memory capacity and fluid intelligence has received much attention in recent years. This burgeoning field remains highly controversial, with empirically backed disagreements at all levels of evidence, including individual studies, systematic reviews, and even meta-analyses. The current study investigated the effect of a randomized six-week online working memory intervention on untrained cognitive abilities in a community-recruited sample of healthy young adults, in relation to both a processing-speed training active control condition and a no-contact control condition. Results of traditional null hypothesis significance testing, as well as Bayes factor analyses, revealed support for the null hypothesis across all cognitive tests administered before and after training. Importantly, all three groups were similar at pre-training for a variety of individual variables purported to moderate transfer of training to fluid intelligence, including personality traits, motivation to train, and expectations of cognitive improvement from training. Because these results are consistent with experimental trials of equal or greater methodological rigor, we suggest that future research re-focus on (1) other promising interventions known to increase memory performance in healthy young adults, and (2) examining sub-populations or alternative populations in which working memory training may be efficacious. PMID:28558000
A search for optical beacons: implications of null results.
Blair, David G; Zadnik, Marjan G
2002-01-01
Over the past few years a series of searches for interstellar radio beacons have taken place using the Parkes radio telescope. Here we report hitherto unpublished results from a search for optical beacons from 60 solar-type stars using the Perth-Lowell telescope. We discuss the significance of the null results from these searches, all of which were based on the interstellar contact channel hypothesis. While the null results of all searches to date can be explained simply by the nonexistence of electromagnetically communicating life elsewhere in the Milky Way, four other possible explanations that do not preclude its existence are proposed: (1) Extraterrestrial civilizations desiring to make contact through the use of electromagnetic beacons have a very low density in the Milky Way. (2) The interstellar contact channel hypothesis is incorrect, and beacons exist at frequencies that have not yet been searched. (3) The search has been incomplete in terms of sensitivity and/or target directions: Beacons exist, but more sensitive equipment and/or more searching is needed to achieve success. (4) The search has occurred before beacon signals can be expected to have arrived at the Earth, and beacon signals may be expected in the future. Based on consideration of the technology required for extraterrestrial civilizations to identify target planets, we argue that the fourth possibility is likely to be valid and that powerful, easily detectable beacons could be received in coming centuries.
Dougherty, Michael R; Hamovitz, Toby; Tidwell, Joe W
2016-02-01
A recent meta-analysis by Au et al. (Psychonomic Bulletin & Review, 22, 366-377, 2015) reviewed the n-back training paradigm for working memory (WM) and evaluated whether (when aggregating across existing studies) there was evidence that gains obtained for training tasks transferred to gains in fluid intelligence (Gf). Their results revealed an overall effect size of g = 0.24 for the effect of n-back training on Gf. We reexamine the data through a Bayesian lens, to evaluate the relative strength of the evidence for the alternative versus null hypotheses, contingent on the type of control condition used. We find that studies using a noncontact (passive) control group strongly favor the alternative hypothesis that training leads to transfer, but that studies using active-control groups show modest evidence in favor of the null. We discuss these findings in the context of placebo effects.
Universality hypothesis breakdown at one-loop order
NASA Astrophysics Data System (ADS)
Carvalho, P. R. S.
2018-05-01
We probe the universality hypothesis by analytically computing corrections through at least two loops to the critical exponents of q-deformed O(N) self-interacting λφ⁴ scalar field theories, using six distinct and independent field-theoretic renormalization group methods and ε-expansion techniques. We show that the effect of q-deformation on the one-loop corrections to the q-deformed critical exponents is null, so the universality hypothesis breaks down at this loop order. The effect emerges only at the two-loop and higher levels, where the validity of the universality hypothesis is restored. The q-deformed critical exponents obtained through the six methods are the same and, furthermore, reduce to their nondeformed values in the appropriate limit.
Accelerated aging effects on surface hardness and roughness of lingual retainer adhesives.
Ramoglu, Sabri Ilhan; Usumez, Serdar; Buyukyilmaz, Tamer
2008-01-01
To test the null hypothesis that accelerated aging has no effect on the surface microhardness and roughness of two light-cured lingual retainer adhesives. Ten samples each of two light-cured materials, Transbond Lingual Retainer (3M Unitek) and Light Cure Retainer (Reliance), were cured with a halogen light for 40 seconds. Vickers hardness and surface roughness were measured before and after accelerated aging of 300 hours in a weathering tester. Differences between mean values were analyzed for statistical significance using a t-test. The level of statistical significance was set at P < .05. The mean Vickers hardness of Transbond Lingual Retainer was 62.8 +/- 3.5 and 79.6 +/- 4.9 before and after aging, respectively. The mean Vickers hardness of Light Cure Retainer was 40.3 +/- 2.6 and 58.3 +/- 4.3 before and after aging, respectively. Differences in both groups were statistically significant (P < .001). Following aging, mean surface roughness changed from 0.039 μm to 0.121 μm for Transbond Lingual Retainer and from 0.021 μm to 0.031 μm for Light Cure Retainer. The roughening of Transbond Lingual Retainer with aging was statistically significant (P < .05), while the change in the surface roughness of Light Cure Retainer was not (P > .05). Accelerated aging significantly increased the surface microhardness of both light-cured retainer adhesives tested. It also significantly increased the surface roughness of the Transbond Lingual Retainer.
Effect of Computer-Based Video Games on Children: An Experimental Study
ERIC Educational Resources Information Center
Chuang, Tsung-Yen; Chen, Wei-Fan
2009-01-01
This experimental study investigated whether computer-based video games facilitate children's cognitive learning. In comparison to traditional computer-assisted instruction (CAI), this study explored the impact of the varied types of instructional delivery strategies on children's learning achievement. One major research null hypothesis was…
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
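For context, the plain Fisher combination of marginal p-values is sketched below in Python; the p-values are made up. Note that the paper's contribution is precisely to calibrate this statistic when the phenotypes (and hence the p-values) are correlated, whereas the chi-square reference distribution used here assumes independence.

import numpy as np
from scipy.stats import combine_pvalues, chi2

pvals = np.array([0.04, 0.12, 0.30])     # marginal tests of three phenotypes

stat, p_combined = combine_pvalues(pvals, method="fisher")
print(f"Fisher chi2 = {stat:.2f} on {2 * len(pvals)} df, combined p = {p_combined:.4f}")

# Equivalent by hand: -2 * sum(log p) ~ chi2 with 2k df under independence
stat_manual = -2 * np.log(pvals).sum()
print(f"manual p = {chi2.sf(stat_manual, 2 * len(pvals)):.4f}")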
Fordyce, James A
2010-07-23
Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification-rate variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a slowdown, is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineage-through-time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect a rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent decreases in diversification rate than early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
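A Python sketch of the gamma statistic (following my reading of Pybus and Harvey's 2000 formulation) computed directly from internode intervals is given below, together with a quick pure-birth simulation; g[k] is assumed to be the time during which exactly k lineages existed. Under a constant-rate pure-birth process gamma is approximately standard normal, and strongly negative values are the signal usually read as early, rapid cladogenesis.

import numpy as np

def gamma_statistic(g):
    """Gamma from internode intervals g (for k = 2..n lineages)."""
    n = len(g) + 1                        # number of tips
    k = np.arange(2, n + 1)
    kg = k * np.asarray(g, dtype=float)
    T = kg.sum()
    partial = np.cumsum(kg)[:-1]          # cumulative sums for i = 2..n-1
    return (partial.mean() - T / 2) / (T * np.sqrt(1 / (12 * (n - 2))))

# Pure-birth (Yule) simulation: the waiting time while k lineages exist is
# exponential with rate k * lambda
rng = np.random.default_rng(6)
lam, n_tips = 1.0, 50
k = np.arange(2, n_tips + 1)
g = rng.exponential(1.0 / (k * lam))
print(f"gamma = {gamma_statistic(g):.2f}  (expected near 0 under pure birth)")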
NASA Astrophysics Data System (ADS)
Kehagias, A.; Riotto, A.
2016-05-01
Symmetries play an interesting role in cosmology. They are useful in characterizing the cosmological perturbations generated during inflation and lead to consistency relations involving the soft limit of the statistical correlators of large-scale structure dark matter and galaxy overdensities. On the other hand, in observational cosmology the carriers of the information about these large-scale statistical distributions are light rays traveling on null geodesics. Motivated by this simple consideration, we study the structure of null infinity and the associated BMS symmetry in a cosmological setting. For decelerating Friedmann-Robertson-Walker backgrounds, for which future null infinity exists, we identify the BMS transformations that leave the asymptotic metric invariant to leading order. Contrary to the asymptotically flat case, the BMS transformations in cosmology generate Goldstone modes corresponding to scalar, vector and tensor degrees of freedom which may exist at null infinity and perturb the asymptotic data. Therefore, BMS transformations generate physically inequivalent vacua, as they populate the universe at null infinity with these physical degrees of freedom. We also discuss the gravitational memory effect when cosmological expansion is taken into account. In this case, there are extra contributions to the gravitational memory due to the tail of the retarded Green functions, which are supported not only on the light-cone but also in its interior. The gravitational memory effect can also be understood from an asymptotic point of view as a transition among cosmological BMS-related vacua.
Null-space and statistical significance of first-arrival traveltime inversion
NASA Astrophysics Data System (ADS)
Morozov, Igor B.
2004-03-01
The strong uncertainty inherent in the traveltime inversion of first arrivals from surface sources is usually removed by using a priori constraints or regularization. This leads to the null-space (data-independent model variability) being inadequately sampled, and consequently, model uncertainties may be underestimated in traditional (such as checkerboard) resolution tests. To measure the full null-space model uncertainties, we use unconstrained Monte Carlo inversion and examine the statistics of the resulting model ensembles. In an application to 1-D first-arrival traveltime inversion, the τ-p method is used to build a set of models that are equivalent to the IASP91 model within small, ~0.02 per cent, time deviations. The resulting velocity variances are much larger, ~2-3 per cent within the regions above the mantle discontinuities, and are interpreted as being due to the null-space. Depth-variant depth averaging is required for constraining the velocities within meaningful bounds, and the averaging scalelength could also be used as a measure of depth resolution. Velocity variances show structure-dependent, negative correlation with the depth-averaging scalelength. Neither the smoothest (Herglotz-Wiechert) nor the mean velocity-depth functions reproduce the discontinuities in the IASP91 model; however, the discontinuities can be identified by the increased null-space velocity (co-)variances. Although derived for a 1-D case, the above conclusions also relate to higher dimensions.
Dackor, J.; Strunk, K. E.; Wehmeyer, M. M.; Threadgill, D. W.
2007-01-01
Homozygosity for the Egfrtm1Mag null allele in mice leads to genetic-background-dependent placental abnormalities and embryonic lethality. Molecular mechanisms or genetic modifiers that differentiate strains with surviving versus non-surviving Egfr nullizygous embryos have yet to be identified. Egfr transcripts in wildtype placenta were quantified by ribonuclease protection assay (RPA), and the lowest level of Egfr mRNA expression was found to coincide with Egfrtm1Mag homozygous lethality. Immunohistochemical analysis of the ERBB family receptors ERBB2, ERBB3, and ERBB4 showed similar expression between Egfr wildtype and null placentas, indicating that Egfr null trophoblast do not up-regulate these receptors to compensate for EGFR deficiency. Significantly fewer bromodeoxyuridine (BrdU)-positive trophoblast were observed in Egfr nullizygous placentas, and Cdc25a and Myc, genes associated with proliferation, were significantly down-regulated in null placentas. However, strains with both mild and severe placental phenotypes exhibit reduced proliferation, suggesting that this defect alone does not account for strain-specific embryonic lethality. Consistent with this hypothesis, intercrosses generating mice null for cell cycle checkpoint genes (Trp53, Rb1, Cdkn1a, Cdkn1b or Cdkn2c) in combination with Egfr deficiency did not increase survival of Egfr nullizygous embryos. Since complete development of the spongiotrophoblast compartment is not required for survival of Egfr nullizygous embryos, the reduction of this layer that is commonly observed in Egfr nullizygous placentas likely accounts for the decrease in proliferation. PMID:17822758
White, Rhonda; Chileshe, Modesta; Dawson, Liza; Donnell, Deborah; Hillier, Sharon; Morar, Neetha; Noguchi, Lisa; Dixon, Dennis
2011-02-01
Most trials of interventions are designed to address the traditional null hypothesis of no benefit. VOICE, a phase 2B HIV prevention trial funded by NIH and conducted in Africa, is designed to assess if the intervention will prevent a substantial fraction of infections. Planned interim analysis may provide conclusive evidence against the traditional null hypothesis without establishing substantial benefit. At this interim point, the Data and Safety Monitoring Board would then face the dilemma of knowing the product has some positive effect, but perhaps not as great an effect as the protocol has declared necessary. In March 2008, NIH program staff recommended that the VOICE protocol team discuss the stopping rules with stakeholders prior to initiating the protocol. The goals of the workshop were to inform community representatives about the potential ethical dilemma associated with stopping rules and engage in dialogue about these issues. We describe the resulting community consultation and summarize the outcomes. A 2-day workshop was convened with the goal of having a clear and transparent consultation with the stakeholders around the question, 'Given emerging evidence that a product could prevent some infections, would the community support a decision to continue accruing to the trial?' Participants included research staff and community stakeholders. Lectures with visual aids, discussions, and exercises using interactive learning tasks were used, with a focus on statistics and interpreting data from trials, particularly interim data. Results of oral and written evaluations by participants were reviewed. The feedback was mostly positive, with some residual confusion regarding statistical concepts. However, discussions with attendees later revealed that not all felt prepared to engage fully in the workshop. This was the presenters' first experience facilitating a formal discussion with an audience that had no advanced science, research, or mathematics training. Community representatives' concern regarding speaking for their communities without consulting them also created a challenge for the workshop. Open discussion around trial stopping rules requires that all discussants have an understanding of trial design concepts and feel a sense of empowerment to ask and answer questions. The VOICE CWG workshop was a first step toward the goal of open discussion regarding trial stopping rules and interim results for the study; however, ongoing education and dialogue must occur to ensure that all stakeholders fully participate in the process.
Faure, Eric
2008-01-01
Background: In Europe, the north-south downhill cline in the frequency of the chemokine receptor CCR5 allele with a 32-bp deletion (CCR5-Δ32) raises interesting questions for evolutionary biologists. We first suggested that, in the past, European colonizers, principally the Romans, might have been instrumental in a progressive southward decrease of the frequency. Indeed, statistical analyses suggested strong negative correlations between the allele frequency and historical parameters, including the dates of colonization by Mediterranean civilisations. Gene flow from colonizers to native populations was extremely low, but colonizers were responsible for the spread of several diseases, suggesting that the dissemination of parasites in naive populations could have ruptured the fragile pathocenosis, changing the balance among diseases. The new equilibrium state was then reached through negative selection of the null allele. Results: Most human diseases are zoonoses, and the cat might have been instrumental in the decrease of the allele frequency, because its diffusion through Europe was a gradual process due principally to the Romans, and several cat zoonoses can be transmitted to man. The possible implication of a feline lentivirus (FIV), which does not use CCR5 as a co-receptor, is discussed. This virus can infect primate cells in vitro and induces clinical signs in the macaque. Moreover, most of the historical regions with null or low frequency of the CCR5-Δ32 allele coincide with the historical range of the wild felid species that harbor species-specific FIVs. Conclusion: We propose the hypothesis that the present European CCR5 allelic frequencies are the result of negative selection due to the spread of a disease; a cat zoonosis is the most plausible hypothesis. Future studies could establish whether CCR5 plays an antimicrobial role in FIV pathogenesis. Moreover, studies of ancient DNA could provide more evidence regarding the implication of zoonoses in the present CCR5-Δ32 distribution. PMID:18925940
Long memory behavior of returns after intraday financial jumps
NASA Astrophysics Data System (ADS)
Behfar, Stefan Kambiz
2016-11-01
In this paper, the characterization of intraday financial jumps and the time dynamics of returns after jumps are investigated. It is shown, analytically and empirically, that intraday jumps are power-law distributed with exponent 1 < μ < 2 and that returns after jumps show long-memory behavior. In the theory of finance, it is important to be able to distinguish between jumps and continuous sample-path price movements, and this can be achieved by introducing a statistical test based on sums of products of returns over small periods of time. When a jump is present, the null hypothesis of normality is rejected; this is based on the idea that returns are composed of a mixture of normally distributed and power-law distributed data (∼ 1/r^(1+μ)). The probability of rejecting the null hypothesis is a function of μ and approaches one for 1 < μ < 2 for large intraday sample size M. To test this idea empirically, we downloaded S&P500 index data for the periods 1997-1998 and 2014-2015 and showed that the complementary cumulative distribution function of jump returns is power-law distributed with exponent 1 < μ < 2. There are far more jumps in 1997-1998 than in the later period, and the estimated power-law exponent is larger in the later period than in 1997-1998. Assuming that i.i.d. returns generally follow a Poisson distribution, if the jump is a causal factor then high returns after jumps are the effect; we show that returns caused by a jump decay as a power law. To test this idea empirically, we average the time dynamics over all days; the superposed time dynamics after jumps follow a power law, which indicates long memory with a power-law distribution of returns after jumps.
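A maximum-likelihood (Hill-type) estimate of the tail exponent μ for a Pareto-tailed sample, of the kind that would be applied to absolute returns at detected jump times, is sketched below in Python. The data are simulated from a known Pareto law so the estimator can be checked against the true value; the threshold r_min is an assumption.

import numpy as np

def tail_exponent(x, r_min):
    """MLE of mu for a Pareto tail with CCDF(r) ~ (r / r_min)^(-mu)."""
    x = np.asarray(x)
    tail = x[x >= r_min]
    return len(tail) / np.log(tail / r_min).sum()

rng = np.random.default_rng(7)
mu_true, r_min = 1.5, 0.01
jumps = r_min * (1 - rng.uniform(size=5000)) ** (-1 / mu_true)  # Pareto draws

print(f"estimated mu = {tail_exponent(jumps, r_min):.2f} (true {mu_true})")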
Age affects severity of venous gas emboli on decompression from 14.7 to 4.3 psia
NASA Technical Reports Server (NTRS)
Conkin, Johnny; Powell, Michael R.; Gernhardt, Michael L.
2003-01-01
INTRODUCTION: Variables that define who we are, such as age, weight and fitness level, influence the risk of decompression sickness (DCS) and venous gas emboli (VGE) from diving and aviation decompressions. We focus on age since astronauts who perform space walks are approximately 10 yr older than our test subjects. Our null hypothesis is that age is not statistically associated with the VGE outcomes from decompression to 4.3 psia. METHODS: Our data are from 7 different NASA tests in which 188 men and 50 women performed light exercise at 4.3 psia for planned exposures of no less than 4 h. Prebreathe (PB) time on 100% oxygen ranged from 150-270 min, including ascent time, with exercise of different intensity and length being performed during the PB in four of the seven tests with 150 min of PB. Subjects were monitored for VGE in the pulmonary artery using a Doppler ultrasound bubble detector for a 4-min period every 12 min. There were six design variables (the presence or absence of lower body adynamia and five PB variables) plus five concomitant variables on physical characteristics (age, weight, height, body mass index, and gender) available for logistic regression (LR). We used LR models for the probability of DCS and VGE, and multinomial logit (ML) models for the probability of Spencer VGE Grades 0-IV, at exposure times of 61, 95, 131, and 183 min and for the entire exposure. RESULTS: Age was significantly associated with VGE in both the LR and ML models, so we reject the null hypothesis. Lower body adynamia was significant for all responses. CONCLUSIONS: Our selection of tests produced a wide range of the explanatory variables, but only age, lower body adynamia, height, and total PB time were helpful, in various combinations, in modeling the probability of DCS and VGE.
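A logistic-regression sketch in the spirit of the LR models above is given below, fitted with statsmodels to simulated data; the coefficients, age distribution, and adynamia prevalence are assumptions, and the real analysis also included additional design variables and multinomial models for VGE grade.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 238                                        # 188 men + 50 women
age = rng.normal(32, 7, n)
adynamia = rng.integers(0, 2, n)
logit_p = -4.0 + 0.08 * age + 0.9 * adynamia   # assumed coefficients
vge = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"vge": vge, "age": age, "adynamia": adynamia})
model = smf.logit("vge ~ age + adynamia", data=df).fit(disp=False)
print(model.params.round(3))     # log-odds per year of age / for adynamia
print(model.pvalues.round(4))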
Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates
NASA Astrophysics Data System (ADS)
Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul
2012-03-01
Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Ref. [7,38], time-frequency analysis is used to test stationarity of any signal. The proposed method consists of a comparison between global and local time-frequency features. The originality is to make use of a family of stationary surrogate signals for defining the null hypothesis of stationarity and, based upon this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik’s support vector machines (SVM) [35], they offer now a wide class of supervised and unsupervised learning algorithms. In Ref. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method to generate stationarized signals, namely, the null hypothesis of stationarity. The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order to derive a stationarity test in Section 22.4. The relevance of the latter is illustrated by simulation results in Section 22.5.
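A stripped-down Python sketch of the surrogate-plus-one-class-SVM idea is given below: build phase-randomized surrogates (which keep the spectrum but define the stationary null), describe each signal by a crude feature vector, train a one-class SVM on the surrogate features, and ask whether the original signal falls outside their support. The per-segment log-variance feature is a simplifying stand-in for the chapter's time-frequency features.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(9)

def surrogate(x):
    """Phase-randomized surrogate with the same power spectrum as x."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, X.size)
    phases[0] = 0.0                       # keep the mean unchanged
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def features(x, n_seg=16):
    """Log-variance per time segment: a rough local energy profile."""
    return np.log([s.var() for s in np.array_split(x, n_seg)])

# Nonstationary test signal: noise whose amplitude grows over time
t = np.linspace(0, 1, 2048)
signal = (0.5 + 2.0 * t) * rng.normal(size=t.size)

train = np.array([features(surrogate(signal)) for _ in range(200)])
clf = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(train)

print("surrogates flagged as outliers:", (clf.predict(train) == -1).mean())
print("original signal an outlier?    ", clf.predict([features(signal)])[0] == -1)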
DOE Office of Scientific and Technical Information (OSTI.GOV)
Toloba, Elisa; Guhathakurta, Puragra; Li, Biao
2016-05-01
We analyze the kinematics of six Virgo cluster dwarf early-type galaxies (dEs) from their globular cluster (GC) systems. We present new Keck/DEIMOS spectroscopy for three of them and re-analyze the data found in the literature for the remaining three. We use two independent methods to estimate the rotation amplitude (V_rot) and velocity dispersion (σ_GC) of the GC systems and evaluate their statistical significance by simulating non-rotating GC systems with the same number of GC satellites and velocity uncertainties. Our measured kinematics agree with the published values for the three galaxies from the literature and, in all cases, some rotation is measured. However, our simulations show that the null hypothesis of being non-rotating GC systems cannot be ruled out. In the case of VCC 1861, the measured V_rot and the simulations indicate that it is not rotating. In the case of VCC 1528, the null hypothesis can be marginally ruled out, and thus it might be rotating although further confirmation is needed. In our analysis, we find that, in general, the measured V_rot tends to be overestimated and the measured σ_GC tends to be underestimated by amounts that depend on the intrinsic V_rot/σ_GC, the number of observed GCs (N_GC), and the velocity uncertainties. The bias is negligible when N_GC ≳ 20. In those cases where a large N_GC is not available, it is imperative to obtain data with small velocity uncertainties. For instance, errors of ≤2 km s⁻¹ lead to V_rot < 10 km s⁻¹ for a system that is intrinsically not rotating.
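The Monte Carlo significance test described above can be sketched in Python: fit a rotation curve to the GC line-of-sight velocities, then ask how often a non-rotating mock system with the same N_GC, dispersion, and velocity errors yields an equally large fitted amplitude. All numerical values below are illustrative assumptions, and the simple sinusoid fit stands in for the paper's two estimation methods.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(10)

def rot_model(phi, v_rot, phi0):
    return v_rot * np.sin(phi - phi0)

def fit_amplitude(phi, v):
    popt, _ = curve_fit(rot_model, phi, v, p0=[10.0, 0.0])
    return abs(popt[0])

n_gc, sigma, verr = 25, 30.0, 5.0            # GCs, intrinsic dispersion, errors
phi = rng.uniform(0, 2 * np.pi, n_gc)        # position angles of the GCs
v_obs = rot_model(phi, 12.0, 0.5) + rng.normal(0, np.hypot(sigma, verr), n_gc)

v_rot_obs = fit_amplitude(phi, v_obs)
null_amps = [fit_amplitude(phi, rng.normal(0, np.hypot(sigma, verr), n_gc))
             for _ in range(1000)]           # non-rotating mock systems
p_null = np.mean(np.array(null_amps) >= v_rot_obs)
print(f"fitted V_rot = {v_rot_obs:.1f} km/s, null-hypothesis p = {p_null:.3f}")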
Evidence for a postreproductive phase in female false killer whales Pseudorca crassidens.
Photopoulou, Theoni; Ferreira, Ines M; Best, Peter B; Kasuya, Toshio; Marsh, Helene
2017-01-01
A substantial period of life after reproduction ends, known as postreproductive lifespan (PRLS), is at odds with classical life history theory, and its causes and mechanisms have puzzled evolutionary biologists for decades. Prolonged PRLS has been confirmed in only two non-human mammals, both odontocete cetaceans in the family Delphinidae. We investigate the evidence for PRLS in a third species, the false killer whale, Pseudorca crassidens, using a quantitative measure of PRLS and morphological evidence from reproductive tissues. We examined specimens from false killer whales from combined strandings (South Africa, 1981) and harvest (Japan, 1979-80) and found morphological evidence of changes in the activity of the ovaries in relation to age. Ovulation had ceased in 50% of whales over 45 years, and all whales over 55 years old had ovaries classified as postreproductive. We also calculated a measure of PRLS, known as postreproductive representation (PrR), as an indication of the effect of inter-population demographic variability. PrR for the combined sample was 0.14, whereas the mean of the simulated distribution for PrR under the null hypothesis of no PRLS was 0.02. The 99th percentile of the simulated distribution was 0.08 and no simulated value exceeded 0.13. These results suggest that PrR was convincingly different from the measures simulated under the null hypothesis. We found morphological and statistical evidence for PRLS in South African and Japanese pods of false killer whales, suggesting that this species is the third non-human mammal in which this phenomenon has been demonstrated in wild populations. Nonetheless, our estimate for PrR in false killer whales (0.14) is lower than the single values available for the short-finned pilot whale (0.28) and the killer whale (0.22) and is more similar to working Asian elephants (0.13).
Zhang, Jing; Boyes, Victoria; Festy, Frederic; Lynch, Richard J M; Watson, Timothy F; Banerjee, Avijit
2018-05-08
To test the null hypothesis that chitosan application has no impact on the remineralisation of artificial incipient enamel white spot lesions (WSLs). 66 artificial enamel WSLs were assigned to 6 experimental groups (n=11): (1) bioactive glass slurry (BG), (2) bioactive glass containing polyacrylic acid (BG+PAA) slurry, (3) chitosan pre-treated WSLs with BG slurry (CS-BG), (4) chitosan pre-treated WSLs with BG+PAA slurry (CS-BG+PAA), (5) remineralisation solution (RS) and (6) de-ionised water (negative control, NC). Surface and cross-sectional Raman intensity mapping (960 cm⁻¹) was performed on 5 samples/group to assess mineral content. Raman spectroscopy and attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) were used to identify the type of newly formed minerals. Surface and cross-sectional Knoop microhardness were used to evaluate the mechanical properties after remineralisation. Surface morphologies and Ca/P ratio were observed using scanning electron microscopy (SEM) coupled with energy dispersive X-ray spectroscopy (EDX). Data were statistically analysed using one-way ANOVA with Tukey's test. BG+PAA, CS-BG and RS presented significantly higher mineral regain compared to NC on lesion surfaces, while CS-BG+PAA had higher subsurface mineral content. Newly mineralised crystals consist of type-B hydroxycarbonate apatite. CS-BG+PAA showed the greatest hardness recovery, followed by CS-BG, both significantly higher than the other groups. SEM observations showed altered surface morphologies in all experimental groups except NC post-treatment. EDX suggested a higher content of carbon, oxygen and silicon in the precipitates in the CS-BG+PAA group. There was no significant difference between groups in terms of Ca/P ratio. The null hypothesis was rejected: chitosan pre-treatment enhanced WSL remineralisation with either BG alone or BG+PAA complexes. A further investigation using a dynamic remineralisation/demineralisation system is required with regard to clinical application. Copyright © 2018 The Academy of Dental Materials. Published by Elsevier Inc. All rights reserved.
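The one-way ANOVA with Tukey's post-hoc comparisons used above can be sketched in Python as follows; the hardness-recovery values are simulated with arbitrary group means and are not the study's data.

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(11)
groups = ["BG", "BG+PAA", "CS-BG", "CS-BG+PAA", "RS", "NC"]
means = [35, 40, 55, 65, 45, 20]                 # illustrative group means
data = {g: rng.normal(m, 8, 11) for g, m in zip(groups, means)}  # n = 11/group

F, p = f_oneway(*data.values())
print(f"ANOVA: F = {F:.1f}, p = {p:.2e}")

values = np.concatenate(list(data.values()))
labels = np.repeat(groups, 11)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))  # all pairwise contrasts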
NASA Astrophysics Data System (ADS)
Miller-Ricks, Karen A.
Educational reform efforts in Science, Technology, Engineering, and Math (STEM) place emphasis on teachers as conduits for student achievement. The purpose of this study was to use TIMSS 2011 data to examine relationships between Science-Technology-Society (STS) instructional practices (student-centered instruction designed to promote learning through real-world applications), teacher preparedness, and student achievement, and to identify variations in achievement between and among eighth-grade science and math classes. The research was framed by both Harper's Anti-Deficit Achievement Theory and Bronfenbrenner's Ecological Systems Theory (BEST). 501 U.S. schools contributed to the TIMSS 2011 data from both the teacher questionnaires and student booklets. Chi-square tests, Spearman correlations, and 2-level hierarchical linear modeling (HLM) were used to analyze data about teachers' preparedness to teach science and math, frequency of using STS instructional practices, and student achievement. The chi-square null hypothesis for math teachers was rejected, supporting the conclusion that there was an association between the frequency of using STS instruction in math and teacher preparedness. However, the chi-square null hypothesis for science teachers was not rejected, indicating no significant association between the frequency of using STS instruction in science and science teacher preparedness. The Spearman correlations revealed statistically significant positive associations between STS instruction and science achievement, as well as between teacher preparedness and science achievement. The HLM results suggested that 33% of the variance of mathematics achievement was at the individual level and 66% was at the group level. The results for science teachers suggested that 54% of the variance of science achievement was at the individual level and 46% of the variance was at the group level. The data findings support the conclusion that secondary STEM teachers who are more prepared to teach within the STEM content domains and implement STS instructional practices into lessons have students with higher achievement scores.
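The variance split reported from the HLM (for example, roughly one third individual-level versus two thirds group-level for mathematics) corresponds to an intraclass correlation from a two-level random-intercept ("null") model. A minimal sketch with statsmodels is below; the data frame, column names, and variance values are hypothetical, not TIMSS data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical two-level data: students nested within schools.
n_schools, n_students = 50, 20
school = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0, 8, n_schools)[school]                 # group-level variance
achievement = 500 + school_effect + rng.normal(0, 6, school.size)   # student-level noise

df = pd.DataFrame({"school": school, "achievement": achievement})

# Intercept-only two-level model (the HLM "null" model).
model = smf.mixedlm("achievement ~ 1", df, groups=df["school"]).fit()
between = float(model.cov_re.iloc[0, 0])   # school-level variance component
within = model.scale                       # student-level (residual) variance
icc = between / (between + within)
print(f"group-level share = {icc:.2f}, individual-level share = {1 - icc:.2f}")
```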
Using Bayes factors to evaluate evidence for no effect: examples from the SIPS project.
Dienes, Zoltan; Coulton, Simon; Heather, Nick
2018-02-01
To illustrate how Bayes factors are important for determining the effectiveness of interventions. We consider a case where inappropriate conclusions were drawn publicly based on significance testing, namely the SIPS project (Screening and Intervention Programme for Sensible drinking), a pragmatic, cluster-randomized controlled trial in each of two health-care settings and in the criminal justice system. We show how Bayes factors can disambiguate the non-significant findings from the SIPS project and thus determine whether the findings represent evidence of absence or absence of evidence. We show how to model the sort of effects that could be expected, and how to check the robustness of the Bayes factors. The findings from the three SIPS trials taken individually are largely uninformative but, when data from these trials are combined, there is moderate evidence for a null hypothesis (H0) and thus for a lack of effect of brief intervention compared with simple clinical feedback and an alcohol information leaflet (B = 0.24, P = 0.43). Scientists who find non-significant results should suspend judgement unless they calculate a Bayes factor to indicate either that there is evidence for a null hypothesis (H0) over a (well-justified) alternative hypothesis (H1), or that more data are needed. © 2017 Society for the Study of Addiction.
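A Bayes factor of this kind compares how well the observed estimate is predicted by H0 versus by an H1 that models "the sort of effects that could be expected". A minimal sketch of one common variant, using a half-normal prior on the effect under H1 (in the spirit of Dienes-style calculators, though not necessarily the exact model used in the paper), is shown below; the numbers passed in are purely illustrative. Values below roughly 1/3 are conventionally read as moderate evidence for H0, consistent with the B = 0.24 reported above.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def bayes_factor_half_normal(mean_diff, se, expected_effect):
    """Bayes factor B(H1 vs H0) for a normally distributed estimate,
    with H1 represented by a half-normal prior on the effect size
    (scale = expected_effect) and H0 by a point null at zero."""
    def integrand(theta):
        prior = 2 * stats.norm.pdf(theta, 0, expected_effect)   # half-normal, theta >= 0
        likelihood = stats.norm.pdf(mean_diff, theta, se)
        return prior * likelihood

    marginal_h1, _ = quad(integrand, 0, np.inf)
    marginal_h0 = stats.norm.pdf(mean_diff, 0, se)
    return marginal_h1 / marginal_h0

# Hypothetical numbers: a small observed difference with a modest expected effect.
print(bayes_factor_half_normal(mean_diff=0.5, se=1.2, expected_effect=2.0))
```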
Monitoring Statistics Which Have Increased Power over a Reduced Time Range.
ERIC Educational Resources Information Center
Tang, S. M.; MacNeill, I. B.
1992-01-01
The problem of monitoring trends for changes at unknown times is considered. Statistics that permit one to focus high power on a segment of the monitored period are studied. Numerical procedures are developed to compute the null distribution of these statistics. (Author)
Quality of statistical reporting in developmental disability journals.
Namasivayam, Aravind K; Yan, Tina; Wong, Wing Yiu Stephanie; van Lieshout, Pascal
2015-12-01
Null hypothesis significance testing (NHST) dominates quantitative data analysis, but its use is controversial and has been heavily criticized. The American Psychological Association has advocated the reporting of effect sizes (ES), confidence intervals (CIs), and statistical power analysis to complement NHST results to provide a more comprehensive understanding of research findings. The aim of this paper is to carry out a sample survey of statistical reporting practices in two journals with the highest h5-index scores in the areas of developmental disability and rehabilitation. Using a checklist that includes critical recommendations by the American Psychological Association, we examined 100 randomly selected articles out of 456 articles reporting inferential statistics in the year 2013 in the Journal of Autism and Developmental Disorders (JADD) and Research in Developmental Disabilities (RDD). The results showed that for both journals, ES were reported only about half the time (JADD 59.3%; RDD 55.87%). These findings are similar to psychology journals, but are in stark contrast to ES reporting in educational journals (73%). Furthermore, a priori power and sample size determination (JADD 10%; RDD 6%), along with reporting and interpreting precision measures (CI: JADD 13.33%; RDD 16.67%), were the least reported metrics in these journals, but not dissimilar to journals in other disciplines. To advance the science in developmental disability and rehabilitation and to bridge the research-to-practice divide, reforms in statistical reporting, such as providing supplemental measures to NHST, are clearly needed.
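The supplements to NHST recommended here (effect sizes, confidence intervals, and a priori power analysis) are straightforward to compute alongside a test. A minimal sketch, with simulated scores standing in for real study data, is below.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(2)
group_a = rng.normal(10.0, 2.0, 40)   # hypothetical scores, group A
group_b = rng.normal(11.2, 2.0, 40)   # hypothetical scores, group B

# Effect size: Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd

# 95% confidence interval for the mean difference.
diff = group_b.mean() - group_a.mean()
se = np.sqrt(group_a.var(ddof=1) / group_a.size + group_b.var(ddof=1) / group_b.size)
dof = group_a.size + group_b.size - 2
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, dof) * se

# A priori sample size for 80% power to detect d = 0.5 at alpha = 0.05.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)

print(f"d = {d:.2f}, 95% CI for the difference = [{ci[0]:.2f}, {ci[1]:.2f}], "
      f"n per group for 80% power = {np.ceil(n_per_group):.0f}")
```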
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berman, D.W.; Allen, B.C.; Van Landingham, C.B.
1998-12-31
The decision rules commonly employed to determine the need for cleanup are evaluated both to identify conditions under which they lead to erroneous conclusions and to quantify the rate that such errors occur. Their performance is also compared with that of other applicable decision rules. The authors based the evaluation of decision rules on simulations. Results are presented as power curves. These curves demonstrate that the degree of statistical control achieved is independent of the form of the null hypothesis. The loss of statistical control that occurs when a decision rule is applied to a data set that does not satisfy the rule's validity criteria is also clearly demonstrated. Some of the rules evaluated do not offer the formal statistical control that is an inherent design feature of other rules. Nevertheless, results indicate that such informal decision rules may provide superior overall control of error rates, when their application is restricted to data exhibiting particular characteristics. The results reported here are limited to decision rules applied to uncensored and lognormally distributed data. To optimize decision rules, it is necessary to evaluate their behavior when applied to data exhibiting a range of characteristics that bracket those common to field data. The performance of decision rules applied to data sets exhibiting a broader range of characteristics is reported in the second paper of this study.
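The simulation-based power curves described here can be reproduced in outline by generating lognormal site data over a range of true concentrations and recording how often a given rule declares the site clean. The sketch below uses a one-sided t-test on log-transformed data against a log-scale action level purely as a stand-in rule; the authors' actual decision rules, the action level, and the distribution parameters are not specified in the abstract and are invented here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
action_level = 10.0          # hypothetical cleanup threshold (concentration units)
n_samples, n_sims = 20, 2000

def declares_site_clean(true_median, rng):
    """Stand-in decision rule: one-sided t-test on log concentrations,
    H0: median >= action level (cleanup presumed needed)."""
    data = rng.lognormal(mean=np.log(true_median), sigma=0.8, size=n_samples)
    t_stat, p_two = stats.ttest_1samp(np.log(data), np.log(action_level))
    p_one = p_two / 2 if t_stat < 0 else 1 - p_two / 2   # one-sided, "less than"
    return p_one < 0.05

# One point on the power curve per true median concentration.
for true_median in (2, 5, 8, 10, 15):
    power = np.mean([declares_site_clean(true_median, rng) for _ in range(n_sims)])
    print(f"true median = {true_median:>3}: P(declare site clean) = {power:.2f}")
```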
Gontscharuk, Veronika; Landwehr, Sandra; Finner, Helmut
2015-01-01
The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov-Smirnov statistic, has a long history, dating back to the mid-seventies. Originally, HC statistics were used in connection with goodness of fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high dimensional data. The continuing interest in HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov-Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence they are favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency between the asymptotic and finite-sample behavior of the HC statistic prompts us to provide a new HC test that has better finite-sample properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
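For readers unfamiliar with the statistic, a compact Donoho-Jin-style version of higher criticism maximizes the normalized gap between ordered p-values and their uniform expectation over an intermediate range. The sketch below is a simplified illustration (no tie handling, and the restriction to the first half of the ordered p-values is one common convention), not the variant studied in this paper.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Donoho-Jin style higher criticism: maximize the standardized
    discrepancy between ordered p-values and the uniform expectation
    over the first alpha0 fraction of the ordered p-values."""
    p = np.sort(np.asarray(pvalues))
    n = p.size
    i = np.arange(1, n + 1)
    hc_terms = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    upper = max(1, int(alpha0 * n))
    return hc_terms[:upper].max()

# Under the global null, p-values are uniform; a sparse set of very small
# p-values (a weak, sparse signal) pushes HC upward.
rng = np.random.default_rng(4)
null_p = rng.uniform(size=1000)
sparse_p = np.concatenate([rng.uniform(size=990), rng.uniform(0, 1e-4, size=10)])
print(higher_criticism(null_p), higher_criticism(sparse_p))
```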
ERIC Educational Resources Information Center
Dana, Richard H., Ed.
This collection of papers includes: (1) "An Assessment-Intervention Model for Research and Practice with Multicultural Populations" (Richard H. Dana); (2) "An Africentric Perspective for Clinical Research and Practice" (Edward F. Morris); (3) "Myths about the Null Hypothesis and the Path to Reform" (Robert G.…
Testing 40 Predictions from the Transtheoretical Model Again, with Confidence
ERIC Educational Resources Information Center
Velicer, Wayne F.; Brick, Leslie Ann D.; Fava, Joseph L.; Prochaska, James O.
2013-01-01
Testing Theory-based Quantitative Predictions (TTQP) represents an alternative to traditional Null Hypothesis Significance Testing (NHST) procedures and is more appropriate for theory testing. The theory generates explicit effect size predictions and these effect size estimates, with related confidence intervals, are used to test the predictions.…
Ultimate Attainment of Anaphora Resolution in L2 Chinese
ERIC Educational Resources Information Center
Zhao, Lucy Xia
2014-01-01
The current study tests the Interface Hypothesis through forward and backward anaphora in complex sentences with temporal subordinate clauses in highly proficient English-speaking learners' second-language (L2) Chinese. Forward anaphora is involved when the overt pronoun "ta" "he/she" or a null element appears in the subject…
USDA-ARS?s Scientific Manuscript database
Conservation tillage practices have combined genetically modified glyphosate resistant corn crops along with applications of the herbicide glyphosate. We tested the null hypothesis that the soil process of nitrification and the distribution of archaeal and bacterial nitrifying communities would not ...
Estimating equivalence with quantile regression
Cade, B.S.
2011-01-01
Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. © 2011 by the Ecological Society of America.
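The core move described here, placing confidence intervals on quantile regression estimates and comparing them with an a priori equivalence region, can be sketched with statsmodels. The data below are simulated two-group concentrations with heterogeneous spread, and the equivalence bounds are arbitrary placeholders; a CI falling entirely inside the region is the usual criterion for declaring equivalence at that quantile.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Hypothetical two-group data with heterogeneous variances
# (e.g., reference versus treated site concentrations).
n = 200
df = pd.DataFrame({
    "group": np.repeat([0, 1], n),
    "y": np.concatenate([rng.lognormal(2.0, 0.4, n),     # reference site
                         rng.lognormal(2.05, 0.6, n)]),  # treated site
})

equiv_region = (-2.0, 2.0)   # a priori equivalence bounds on the group difference

for q in (0.25, 0.5, 0.75, 0.9):
    res = smf.quantreg("y ~ group", df).fit(q=q)
    lo, hi = res.conf_int().loc["group"]
    inside = equiv_region[0] < lo and hi < equiv_region[1]
    print(f"q = {q:.2f}: difference CI = ({lo:.2f}, {hi:.2f}), equivalent = {inside}")
```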
Role of CYP2B in Phenobarbital-Induced Hepatocyte Proliferation in Mice.
Li, Lei; Bao, Xiaochen; Zhang, Qing-Yu; Negishi, Masahiko; Ding, Xinxin
2017-08-01
Phenobarbital (PB) promotes liver tumorigenesis in rodents, in part through activation of the constitutive androstane receptor (CAR) and the consequent changes in hepatic gene expression and increases in hepatocyte proliferation. A typical effect of CAR activation by PB is a marked induction of Cyp2b10 expression in the liver; the latter has been suspected to be vital for PB-induced hepatocellular proliferation. This hypothesis was tested here by using a Cyp2a(4/5)bgs -null (null) mouse model in which all Cyp2b genes are deleted. Adult male and female wild-type (WT) and null mice were treated intraperitoneally with PB at 50 mg/kg once daily for 5 successive days and tested on day 6. The liver-to-body weight ratio, an indicator of liver hypertrophy, was increased by 47% in male WT mice, but by only 22% in male Cyp2a(4/5)bgs -null mice, by the PB treatment. The fractions of bromodeoxyuridine-positive hepatocyte nuclei, assessed as a measure of the rate of hepatocyte proliferation, were also significantly lower in PB-treated male null mice compared with PB-treated male WT mice. However, whereas few proliferating hepatocytes were detected in saline-treated mice, many proliferating hepatocytes were still detected in PB-treated male null mice. In contrast, female WT mice were much less sensitive than male WT mice to PB-induced hepatocyte proliferation, and PB-treated female WT and PB-treated female null mice did not show significant difference in rates of hepatocyte proliferation. These results indicate that CYP2B induction plays a significant, but partial, role in PB-induced hepatocyte proliferation in male mice. U.S. Government work not protected by U.S. copyright.
Increased occurrence of dental anomalies associated with infraocclusion of deciduous molars.
Shalish, Miriam; Peck, Sheldon; Wasserstein, Atalia; Peck, Leena
2010-05-01
To test the null hypothesis that there is no relationship between infraocclusion and the occurrence of other dental anomalies in subjects selected for clear-cut infraocclusion of one or more deciduous molars. The experimental sample consisted of 99 orthodontic patients (43 from Boston, Mass, United States; 56 from Jerusalem, Israel) with at least one deciduous molar in infraocclusion greater than 1 mm vertical discrepancy, measured from the mesial marginal ridge of the first permanent molar. Panoramic radiographs and dental casts were used to determine the presence of other dental anomalies, including agenesis of permanent teeth, microdontia of maxillary lateral incisors, palatally displaced canines (PDC), and distal angulation of the mandibular second premolars (MnP2-DA). Comparative prevalence reference values were utilized and statistical testing was performed using the chi-square test (P < .05) and odds ratio. The studied dental anomalies showed two to seven times greater prevalence in the infraocclusion samples, compared with reported prevalence in reference samples. In most cases, the infraoccluded deciduous molar exfoliated eventually and the underlying premolar erupted spontaneously. In some severe phenotypes (10%), the infraoccluded deciduous molar was extracted and space was regained to allow uncomplicated eruption of the associated premolar. Statistically significant associations were observed between the presence of infraocclusion and the occurrence of tooth agenesis, microdontia of maxillary lateral incisors, PDC, and MnP2-DA. These associations support a hypothesis favoring shared causal genetic factors. Clinically, infraocclusion may be considered an early marker for the development of later appearing dental anomalies, such as tooth agenesis and PDC.
A search for evidence of solar rotation in Super-Kamiokande solar neutrino dataset
NASA Astrophysics Data System (ADS)
Desai, Shantanu; Liu, Dawei W.
2016-09-01
We apply the generalized Lomb-Scargle (LS) periodogram, proposed by Zechmeister and Kurster, to the solar neutrino data from Super-Kamiokande (Super-K) using data from its first five years. For each peak in the LS periodogram, we evaluate the statistical significance in two different ways. The first method involves calculating the False Alarm Probability (FAP) using non-parametric bootstrap resampling, and the second involves calculating the difference in Bayesian Information Criterion (BIC) between the null hypothesis, viz. that the data contain only noise, and the hypothesis that the data contain a peak at a given frequency. Using these methods, we scan the frequency range between 7-14 cycles per year to look for any peaks caused by solar rotation, since this is the proposed explanation for the statistically significant peaks found by Sturrock and collaborators in the Super-K dataset. From our analysis, we confirm that, similar to Sturrock et al., the maximum peak occurs at a frequency of 9.42/year, corresponding to a period of 38.75 days. The FAP for this peak is about 1.5% and the difference in BIC (between pure white noise and this peak) is about 4.8. We note that the significance depends on the frequency band used to search for peaks and hence it is important to use a search band appropriate for solar rotation. However, the significance of this peak based on the value of BIC is marginal and more data are needed to confirm whether the peak persists and is real.
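The significance assessment used here, a Lomb-Scargle peak judged against a bootstrap false alarm probability within a restricted frequency band, can be sketched as follows. The time series is synthetic, the 7-14 cycles/yr band follows the abstract, and the permutation-based FAP is a simplified stand-in for the authors' resampling scheme (and the generalized, floating-mean periodogram of Zechmeister and Kurster may differ in detail from the basic astropy call used here).

```python
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(6)

# Synthetic measurements over ~5 years (times in years), with a weak
# injected signal near 9.42 cycles/yr.
t = np.sort(rng.uniform(0, 5, 500))
y = 1.0 + 0.05 * np.sin(2 * np.pi * 9.42 * t) + rng.normal(0, 0.3, t.size)

# Scan only the band relevant to solar rotation: 7-14 cycles per year.
freqs = np.linspace(7, 14, 2000)
power = LombScargle(t, y).power(freqs)
peak_freq, peak_power = freqs[np.argmax(power)], power.max()

# Bootstrap-style FAP: permute the measurements to destroy any periodicity
# and record the maximum power found in the same band.
n_boot = 500
max_null = np.array([LombScargle(t, rng.permutation(y)).power(freqs).max()
                     for _ in range(n_boot)])
fap = np.mean(max_null >= peak_power)

print(f"peak at {peak_freq:.2f} cycles/yr (period {365.25 / peak_freq:.1f} d), "
      f"bootstrap FAP = {fap:.3f}")
```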
Classification of HCV and HIV-1 Sequences with the Branching Index
Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas
2009-01-01
Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
The effects of temperature on sex determination in the bloater Coregonus hoyi: a hypothesis test
Eck, Gary W.; Allen, Jeffrey D.
1995-01-01
The hypothesis that temperature was an epigamic factor in bloater (Coregonus hoyi) sex determination in Lake Michigan was tested by rearing bloater larvae in the laboratory at 6, 11, and 15 degrees C for the first 80 days after hatching. The percentages of females among fish exposed to the three treatment temperatures did not differ significantly from the expected 50%. Therefore, the null hypothesis, that temperature did not influence bloater sex determination within the confines of this study, could not be rejected. Our study of bloater sex determination was an attempt to explain the extreme female predominance (> 95%) that occurred in the Lake Michigan bloater population during the 1960s.
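The comparison against an expected 50% sex ratio is an exact binomial test per temperature group. A minimal sketch is below; the counts are hypothetical, not the study's data, and scipy.stats.binomtest requires SciPy 1.7 or later.

```python
from scipy import stats

# Hypothetical counts of females among fish reared at each temperature:
# (females, total fish) per rearing temperature.
reared = {"6 C": (24, 50), "11 C": (27, 50), "15 C": (22, 50)}

for label, (females, total) in reared.items():
    result = stats.binomtest(females, total, p=0.5)   # H0: sex ratio is 50%
    print(f"{label}: {females}/{total} female, p = {result.pvalue:.3f}")
```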
Adiponectin deficiency impairs liver regeneration through attenuating STAT3 phosphorylation in mice.
Shu, Run-Zhe; Zhang, Feng; Wang, Fang; Feng, De-Chun; Li, Xi-Hua; Ren, Wei-Hua; Wu, Xiao-Lin; Yang, Xue; Liao, Xiao-Dong; Huang, Lei; Wang, Zhu-Gang
2009-09-01
Liver regeneration is a very complex and well-orchestrated process associated with signaling cascades involving cytokines, growth factors, and metabolic pathways. Adiponectin is an adipocytokine secreted by mature adipocytes, and its receptors are widely distributed in many tissues, including the liver. Adiponectin has direct actions in the liver with prominent roles to improve hepatic insulin sensitivity, increase fatty acid oxidation, and decrease inflammation. To test the hypothesis that adiponectin is required for normal progress of liver regeneration, 2/3 partial hepatectomy (PH) was performed on wild-type and adiponectin-null mice. Compared to wild-type mice, adiponectin-null mice displayed decreased liver mass regrowth, impeded hepatocyte proliferation, and increased hepatic lipid accumulation. Gene expression analysis revealed that adiponectin regulated the gene transcription related to lipid metabolism. Furthermore, the suppressed hepatocyte proliferation was accompanied with reduced signal transducer and activator of transcription protein 3 (STAT3) activity and enhanced suppressor of cytokine signaling 3 (Socs3) transcription. In conclusion, adiponectin-null mice exhibit impaired liver regeneration and increased hepatic steatosis. Increased expression of Socs3 and subsequently reduced activation of STAT3 in adiponectin-null mice may contribute to the alteration of the liver regeneration capability and hepatic lipid metabolism after PH.
On Determining the Rise, Size, and Duration Classes of a Sunspot Cycle
NASA Astrophysics Data System (ADS)
Wilson, Robert M.; Hathaway, David H.; Reichmann, Edwin J.
1996-09-01
The behavior of ascent duration, maximum amplitude, and period for cycles 1 to 21 suggests that they are not mutually independent. Analysis of the resultant three-dimensional contingency table for cycles divided according to rise time (ascent duration), size (maximum amplitude), and duration (period) yields a chi-square statistic (= 18.59) that is larger than the test statistic (= 9.49 for 4 degrees of freedom at the 5-percent level of significance), thereby implying that the null hypothesis (mutual independence) can be rejected. Analysis of individual 2 by 2 contingency tables (based on Fisher's exact test) for these parameters shows that, while ascent duration is strongly related to maximum amplitude in the negative sense (inverse correlation) - the Waldmeier effect, it also is related (marginally) to period, but in the positive sense (direct correlation). No significant (or marginally significant) correlation is found between period and maximum amplitude. Using cycle 22 as a test case, we show that by the 12th month following conventional onset, cycle 22 appeared highly likely to be a fast-rising, larger-than-average-size cycle. Because of the inferred correlation between ascent duration and period, it also seems likely that it will have a period shorter than average length.
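The two analyses described here, a chi-square test on a contingency table and Fisher's exact test on 2 by 2 tables, map directly onto standard library calls. The table below is hypothetical (cycles cross-classified by fast/slow rise and large/small amplitude), not the authors' data.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows = fast/slow rise, columns = large/small amplitude.
table = np.array([[8, 2],
                  [3, 8]])

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square = {chi2:.2f} (dof = {dof}), p = {p_chi2:.3f}")
print(f"Fisher exact: odds ratio = {odds_ratio:.2f}, p = {p_fisher:.3f}")
```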
Memory and Trend of Precipitation in China during 1966-2013
NASA Astrophysics Data System (ADS)
Du, M.; Sun, F.; Liu, W.
2017-12-01
As climate change has had a significant impact on the water cycle, the characteristics and variability of precipitation under climate change have become a major focus in hydrology. This study aims to analyze the trend and memory (both short-term and long-term) of precipitation in China. To do that, we apply statistical tests (including the Mann-Kendall test, Ljung-Box test and Hurst exponent) to annual precipitation (P), frequency of rainy days (λ) and mean daily rainfall on days when precipitation occurs (α) in China (1966-2013). We also use a resampling approach to determine the field significance. From there, we evaluate the spatial distribution and percentages of stations with significant memory or trend. We find that the percentages of significant downtrends for λ and significant uptrends for α are significantly larger than the critical values at the 95% field significance level, probably caused by global warming. From these results, we conclude that extra care is necessary when significant results are obtained using statistical tests, because the null hypothesis could be rejected by chance, and, according to the resampling results, this is more likely to occur when spatial correlation is ignored.
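The trend component of this analysis rests on the Mann-Kendall test; a compact implementation (normal approximation, no tie correction) is sketched below with a synthetic annual precipitation series. The field-significance step would then repeat this over all stations and compare the fraction of significant results with a resampled reference distribution.

```python
import numpy as np
from scipy import stats

def mann_kendall(x):
    """Mann-Kendall trend test (normal approximation, ignoring ties):
    returns the S statistic, the Z score, and a two-sided p-value."""
    x = np.asarray(x)
    n = x.size
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * stats.norm.sf(abs(z))
    return s, z, p

# Hypothetical annual precipitation series, 1966-2013, with a weak downtrend.
rng = np.random.default_rng(7)
years = np.arange(1966, 2014)
precip = 900 - 1.5 * (years - 1966) + rng.normal(0, 60, years.size)

s, z, p = mann_kendall(precip)
print(f"S = {s}, Z = {z:.2f}, p = {p:.3f}")
```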
On Determining the Rise, Size, and Duration Classes of a Sunspot Cycle
NASA Technical Reports Server (NTRS)
Wilson, Robert M.; Hathaway, David H.; Reichmann, Edwin J.
1996-01-01
The behavior of ascent duration, maximum amplitude, and period for cycles 1 to 21 suggests that they are not mutually independent. Analysis of the resultant three-dimensional contingency table for cycles divided according to rise time (ascent duration), size (maximum amplitude), and duration (period) yields a chi-square statistic (= 18.59) that is larger than the test statistic (= 9.49 for 4 degrees of freedom at the 5-percent level of significance), thereby implying that the null hypothesis (mutual independence) can be rejected. Analysis of individual 2 by 2 contingency tables (based on Fisher's exact test) for these parameters shows that, while ascent duration is strongly related to maximum amplitude in the negative sense (inverse correlation) - the Waldmeier effect, it also is related (marginally) to period, but in the positive sense (direct correlation). No significant (or marginally significant) correlation is found between period and maximum amplitude. Using cycle 22 as a test case, we show that by the 12th month following conventional onset, cycle 22 appeared highly likely to be a fast-rising, larger-than-average-size cycle. Because of the inferred correlation between ascent duration and period, it also seems likely that it will have a period shorter than average length.
Sunspot random walk and 22-year variation
Love, Jeffrey J.; Rigler, E. Joshua
2012-01-01
We examine two stochastic models for consistency with observed long-term secular trends in sunspot number and a faint, but semi-persistent, 22-yr signal: (1) a null hypothesis, a simple one-parameter random-walk model of sunspot-number cycle-to-cycle change, and, (2) an alternative hypothesis, a two-parameter random-walk model with an imposed 22-yr alternating amplitude. The observed secular trend in sunspots, seen from solar cycle 5 to 23, would not be an unlikely result of the accumulation of multiple random-walk steps. Statistical tests show that a 22-yr signal can be resolved in historical sunspot data; that is, the probability is low that it would be realized from random data. On the other hand, the 22-yr signal has a small amplitude compared to random variation, and so it has a relatively small effect on sunspot predictions. Many published predictions for cycle 24 sunspots fall within the dispersion of previous cycle-to-cycle sunspot differences. The probability is low that the Sun will, with the accumulation of random steps over the next few cycles, walk down to a Dalton-like minimum. Our models support published interpretations of sunspot secular variation and 22-yr variation resulting from cycle-to-cycle accumulation of dynamo-generated magnetic energy.
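The one-parameter random-walk null model described here is straightforward to simulate: accumulate zero-mean cycle-to-cycle steps and ask how often a drift as large as an apparent secular increase arises by chance. The sketch below is purely illustrative; the number of cycles, step standard deviation, and "observed" rise are invented placeholders, not fits to the historical sunspot record.

```python
import numpy as np

rng = np.random.default_rng(8)

n_cycles = 19          # roughly cycles 5 through 23
step_sd = 35.0         # hypothetical cycle-to-cycle standard deviation
observed_rise = 80.0   # hypothetical secular increase over those cycles

# Null model: accumulate independent, zero-mean cycle-to-cycle steps.
walks = rng.normal(0.0, step_sd, size=(100_000, n_cycles)).cumsum(axis=1)
net_change = walks[:, -1]

prob = np.mean(np.abs(net_change) >= observed_rise)
print(f"P(|net change| >= {observed_rise}) under the random walk = {prob:.2f}")
```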
Observation-Oriented Modeling: Going beyond "Is It All a Matter of Chance"?
ERIC Educational Resources Information Center
Grice, James W.; Yepez, Maria; Wilson, Nicole L.; Shoda, Yuichi
2017-01-01
An alternative to null hypothesis significance testing is presented and discussed. This approach, referred to as observation-oriented modeling, is centered on model building in an effort to explicate the structures and processes believed to generate a set of observations. In terms of analysis, this novel approach complements traditional methods…
Three New Methods for Analysis of Answer Changes
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.
2017-01-01
In a pioneering research article, Wollack and colleagues suggested the "erasure detection index" (EDI) to detect test tampering. The EDI can be used with or without a continuity correction and is assumed to follow the standard normal distribution under the null hypothesis of no test tampering. When used without a continuity correction,…
ERIC Educational Resources Information Center
Ejionueme, L. K.; Oyoyo, Anthonia Oluchi
2015-01-01
The study was conducted to investigate the application of Total Quality Management (TQM) in secondary school administration in Umuahia Education Zone. Three research questions and one null hypothesis guided the study. Descriptive survey design was employed for the study. The population of the study comprised 1365 administrators. Multi-stage…
Fraternity as "Enabling Environment:" Does Membership Lead to Gambling Problems?
ERIC Educational Resources Information Center
Biddix, J. Patrick; Hardy, Thomas W.
2008-01-01
Researchers have suggested that fraternity membership is the most reliable predictor of gambling and gambling problems on campus. The purpose of this study was to determine if problematic gambling could be linked to specific aspects of fraternity membership. Though the null hypothesis (no enabling environment) failed to be rejected, descriptive…
Remediating Misconception on Climate Change among Secondary School Students in Malaysia
ERIC Educational Resources Information Center
Karpudewan, Mageswary; Roth, Wolff-Michael; Chandrakesan, Kasturi
2015-01-01
Existing studies report on secondary school students' misconceptions related to climate change; they also report on the methods of teaching as reinforcing misconceptions. This quasi-experimental study was designed to test the null hypothesis that a curriculum based on constructivist principles does not lead to greater understanding and fewer…
ERIC Educational Resources Information Center
Dunst, Carl J.; Hamby, Deborah W.
2012-01-01
This paper includes a nontechnical description of methods for calculating effect sizes in intellectual and developmental disability studies. Different hypothetical studies are used to illustrate how null hypothesis significance testing (NHST) and effect size findings can result in quite different outcomes and therefore conflicting results. Whereas…
How Often Is p[subscript rep] Close to the True Replication Probability?
ERIC Educational Resources Information Center
Trafimow, David; MacDonald, Justin A.; Rice, Stephen; Clason, Dennis L.
2010-01-01
Largely due to dissatisfaction with the standard null hypothesis significance testing procedure, researchers have begun to consider alternatives. For example, Killeen (2005a) has argued that researchers should calculate p[subscript rep] that is purported to indicate the probability that, if the experiment in question were replicated, the obtained…
Unsecure School Environment and School Phobic Behavior
ERIC Educational Resources Information Center
Tukur, Abubakar Hamman; Muhammad, Khadijatu
2017-01-01
This study determines the level of students' school phobic behavior as a result of insecurity of the school environment. The study was guided by one research question and one null hypothesis. The population of the study comprised all the secondary schools in Maiduguri, Borno State; the sample of the study was senior secondary students in…
Disadvantages of the Horsfall-Barratt Scale for estimating severity of citrus canker
USDA-ARS?s Scientific Manuscript database
Direct visual estimation of disease severity to the nearest percent was compared to using the Horsfall-Barratt (H-B) scale. Data from a simulation model designed to sample two diseased populations were used to investigate the probability of the two methods to reject a null hypothesis (H0) using a t-...
Use of the disease severity index for null hypothesis testing
USDA-ARS?s Scientific Manuscript database
A disease severity index (DSI) is a single number for summarizing a large amount of disease severity information. It is used to indicate relative resistance of cultivars, to relate disease severity to yield loss, or to compare treatments. The DSI has most often been based on a special type of ordina...
Association between Propionibacterium acnes and frozen shoulder: a pilot study.
Bunker, Tim D; Boyd, Matthew; Gallacher, Sian; Auckland, Cressida R; Kitson, Jeff; Smith, Chris D
2014-10-01
Frozen shoulder has not previously been shown to be associated with infection. The present study set out to confirm the null hypothesis that there is no relationship between infection and frozen shoulder using two modern scientific methods, extended culture and polymerase chain reaction (PCR) for bacterial nucleic acids. A prospective cohort of 10 patients undergoing arthroscopic release for stage II idiopathic frozen shoulder had two biopsies of tissue taken from the affected shoulder joint capsule at the time of surgery, along with control biopsies of subdermal fat. The biopsies and controls were examined with extended culture and PCR for microbial nucleic acid. Eight of the 10 patients had positive findings on extended culture in their shoulder capsule and, in six of these, Propionibacterium acnes was present. The findings mean that we must reject the null hypothesis that there is no relationship between infection and frozen shoulder. More studies are urgently needed to confirm or refute these findings. If they are confirmed, this could potentially lead to new and effective treatments for this common, painful and disabling condition. Could P. acnes be the Helicobacter of frozen shoulder?
Incorporation of metal and color alteration of enamel in the presence of orthodontic appliances.
Maia, Lúcio Henrique E Gurgel; Filho, Hibernon Lopes de Lima; Araújo, Marcus Vinícius Almeida; Ruellas, Antônio Carlos de Oliveira; Araújo, Mônica Tirre de Souza
2012-09-01
To test the null hypothesis that it is not possible to incorporate metal ions arising from orthodontic appliance corrosion into tooth enamel with resulting tooth color change. This in vitro study used atomic absorption spectrophotometry to evaluate the presence of nickel, chromium, and iron ions in tooth enamel in three groups: a group submitted to cyclic demineralization and remineralization processes with solutions in which orthodontic appliances were previously immersed and corroded, releasing metallic ions; a control group; and another group, submitted to cycling only, without the presence of orthodontic appliances. The influence of the incorporation of these metals on a possible alteration in color was measured with a portable digital spectrophotometer using the CIE LAB system. At the end of the experiment, a significantly higher concentration of chromium and nickel (P < .05) was found in the group in which corrosion was present, and in this group, there was significantly greater color alteration (P ≤ .001). There was chromium and nickel incorporation into enamel and tooth color change when corrosion of orthodontic appliances was associated with cycling process. The null hypothesis is rejected.
Guzman-Rojas, Liliana; Rangel, Roberto; Salameh, Ahmad; Edwards, Julianna K; Dondossola, Eleonora; Kim, Yun-Gon; Saghatelian, Alan; Giordano, Ricardo J; Kolonin, Mikhail G; Staquicini, Fernanda I; Koivunen, Erkki; Sidman, Richard L; Arap, Wadih; Pasqualini, Renata
2012-01-31
Processes that promote cancer progression such as angiogenesis require a functional interplay between malignant and nonmalignant cells in the tumor microenvironment. The metalloprotease aminopeptidase N (APN; CD13) is often overexpressed in tumor cells and has been implicated in angiogenesis and cancer progression. Our previous studies of APN-null mice revealed impaired neoangiogenesis in model systems without cancer cells and suggested the hypothesis that APN expressed by nonmalignant cells might promote tumor growth. We tested this hypothesis by comparing the effects of APN deficiency in allografted malignant (tumor) and nonmalignant (host) cells on tumor growth and metastasis in APN-null mice. In two independent tumor graft models, APN activity in both the tumors and the host cells cooperate to promote tumor vascularization and growth. Loss of APN expression by the host and/or the malignant cells also impaired lung metastasis in experimental mouse models. Thus, cooperation in APN expression by both cancer cells and nonmalignant stromal cells within the tumor microenvironment promotes angiogenesis, tumor growth, and metastasis.
Whitworth, John Martin; Kanaa, Mohammad Dib; Corbett, Ian Porter; Meechan, John Gerald
2007-10-01
This randomized, double-blind trial tested the null hypothesis that speed of deposition has no influence on the injection discomfort, efficacy, distribution, and duration of pulp anesthesia after incisive/mental nerve block in adult volunteers. Thirty-eight subjects received incisive/mental nerve blocks of 2.0 mL lidocaine with 1:80,000 epinephrine slowly over 60 seconds or rapidly over 15 seconds at least 1 week apart. Pulp anesthesia was assessed electronically to 45 minutes after injection. Injection discomfort was self-recorded on visual analogue scales. Overall, 48.7% of volunteers developed pulp anesthesia in first molars, 81.8% in bicuspids, and 38.5% in lateral incisors. The mean duration of pulp anesthesia was 19.1 minutes for first molars, 28.5 minutes for bicuspids, and 19.0 minutes for lateral incisors. Speed of injection had no significant influence on anesthetic success or duration of anesthesia for individual teeth. Slow injection was significantly more comfortable than rapid injection (P < .001). The null hypothesis was supported, although slow injection was more comfortable.
Size, time, and asynchrony matter: the species-area relationship for parasites of freshwater fishes.
Zelmer, Derek A
2014-10-01
The tendency to attribute species-area relationships to "island biogeography" effectively bypasses the examination of specific mechanisms that act to structure parasite communities. Positive covariation between fish size and infrapopulation richness should not be examined within the typical extinction-based paradigm, but rather should be addressed from the standpoint of differences in colonization potential among individual hosts. Although most mechanisms producing the aforementioned pattern constitute some variation of passive sampling, the deterministic aspects of the accumulation of parasite individuals by fish hosts makes untenable the suggestion that infracommunities of freshwater fishes are stochastic assemblages. At the component community level, application of extinction-dependent mechanisms might be appropriate, given sufficient time for colonization, but these structuring forces likely act indirectly through their effects on the host community to increase the probability of parasite persistence. At all levels, the passive sampling hypothesis is a relevant null model. The tendency for mechanisms that produce species-area relationships to produce nested subset patterns means that for most systems, the passive sampling hypothesis can be addressed through the application of appropriate null models of nested subset structure.
Seebacher, Frank
2005-10-01
Biological functions are dependent on the temperature of the organism. Animals may respond to fluctuation in the thermal environment by regulating their body temperature and by modifying physiological and biochemical rates. Phenotypic flexibility (reversible phenotypic plasticity, acclimation, or acclimatisation) in rate functions occurs in all major taxonomic groups and may be considered as an ancestral condition. Within the Reptilia, representatives from all major groups show phenotypic flexibility in response to long-term or chronic changes in the thermal environment. Acclimation or acclimatisation in reptiles is most commonly assessed by measuring whole animal responses such as oxygen consumption, but whole animal responses are comprised of variation in individual traits such as enzyme activities, hormone expression, and cardiovascular functions. The challenge now lies in connecting the changes in the components to the functioning of the whole animal and its fitness. Experimental designs in research on reptilian thermal physiology should incorporate the capacity for reversible phenotypic plasticity as a null hypothesis, because the significance of differential body temperature-performance relationships (thermal reaction norms) between individuals, populations, or species cannot be assessed without testing that null hypothesis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blume-Kohout, Robin J; Scholten, Travis L.
Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.
Qu, Long; Guennel, Tobias; Marshall, Scott L
2013-12-01
Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.
Isabwe, Alain; Yang, Jun R; Wang, Yongming; Liu, Lemian; Chen, Huihuang; Yang, Jun
2018-07-15
Although the influence of microbial community assembly processes on aquatic ecosystem function and biodiversity is well known, the processes that govern planktonic communities in human-impacted rivers remain largely unstudied. Here, we used multivariate statistics and a null model approach to test the hypothesis that environmental conditions and obstructed dispersal opportunities dictate a deterministic community assembly for phytoplankton and bacterioplankton across contrasting hydrographic conditions in a subtropical mid-sized river (Jiulong River, southeast China). Variation partitioning analysis showed that the explanatory power of local environmental variables was larger than that of the spatial variables for both plankton communities during the dry season. During the wet season, phytoplankton community variation was mainly explained by local environmental variables, whereas the variance in bacterioplankton was explained by both environmental and spatial predictors. The null model based on Raup-Crick coefficients for both planktonic groups suggested little evidence of stochastic processes involving dispersal and random distribution. Our results showed that hydrological change and landscape structure act together to cause divergence in communities along the river channel, thereby dictating a deterministic assembly, and that selection exceeds dispersal limitation during the dry season. Therefore, to protect the ecological integrity of human-impacted rivers, watershed managers should consider not only local environmental conditions but also dispersal routes to account for the effect of the regional species pool on local communities. Copyright © 2018 Elsevier B.V. All rights reserved.
The data-driven null models for information dissemination tree in social networks
NASA Astrophysics Data System (ADS)
Zhang, Zhiwei; Wang, Zhenyu
2017-10-01
For the purpose of detecting relatedness and co-occurrence between users, as well as the distribution features of nodes in the spreading paths of a social network, this paper explores topological characteristics of information dissemination trees (IDT), which can be employed indirectly to probe the laws of information dissemination within social networks. Three different null models of IDT are presented in this article: the statistical-constrained 0-order IDT null model, the random-rewire-broken-edge 0-order IDT null model and the random-rewire-broken-edge 2-order IDT null model. These null models first generate the corresponding randomized copy of an actual IDT; then the extended significance profile, which is developed by adding the cascade ratio of information dissemination paths, is exploited not only to evaluate the degree correlation of the two nodes associated with an edge, but also to assess the cascade ratio of information dissemination paths of different lengths. The empirical analysis of several SinaWeibo IDTs and Twitter IDTs indicates that the IDT null models presented in this paper perform well in terms of node degree correlation and dissemination path cascade ratio, and are better able to reveal the features of information dissemination and to fit the situation of real social networks.
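The random-rewire-broken-edge null models described here follow the common practice of producing randomized copies of a network that preserve chosen properties (such as the degree sequence) while scrambling everything else. A minimal sketch with networkx is below; the tree is a toy example and, unlike the paper's models, no tree-specific or order-specific constraints are enforced.

```python
import networkx as nx

# Toy information dissemination tree: a balanced cascade rooted at node 0.
idt = nx.balanced_tree(r=2, h=3)   # 15 nodes, 14 edges

# 0-order-style randomized copy: rewire edges while preserving the degree
# sequence via degree-preserving double edge swaps. Note the result need
# not remain a tree, so this is only a rough analogue of the paper's models.
null_copy = idt.copy()
nx.double_edge_swap(null_copy, nswap=30, max_tries=3000, seed=42)

print(sorted(d for _, d in idt.degree()) == sorted(d for _, d in null_copy.degree()))
print(nx.is_tree(null_copy))   # may be False after rewiring
```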
Revisiting tests for neglected nonlinearity using artificial neural networks.
Cho, Jin Seo; Ishida, Isao; White, Halbert
2011-05-01
Tests for regression neglected nonlinearity based on artificial neural networks (ANNs) have so far been studied by separately analyzing the two ways in which the null of regression linearity can hold. This implies that the asymptotic behavior of general ANN-based tests for neglected nonlinearity is still an open question. Here we analyze a convenient ANN-based quasi-likelihood ratio statistic for testing neglected nonlinearity, paying careful attention to both components of the null. We derive the asymptotic null distribution under each component separately and analyze their interaction. Somewhat remarkably, it turns out that the previously known asymptotic null distribution for the type 1 case still applies, but under somewhat stronger conditions than previously recognized. We present Monte Carlo experiments corroborating our theoretical results and showing that standard methods can yield misleading inference when our new, stronger regularity conditions are violated.
Improved Statistics for Genome-Wide Interaction Analysis
Ueki, Masao; Cordell, Heather J.
2012-01-01
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670
Applying Statistical Process Quality Control Methodology to Educational Settings.
ERIC Educational Resources Information Center
Blumberg, Carol Joyce
A subset of Statistical Process Control (SPC) methodology known as Control Charting is introduced. SPC methodology is a collection of graphical and inferential statistics techniques used to study the progress of phenomena over time. The types of control charts covered are the X-bar (mean), R (range), X (individual observations), MR (moving…
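Control charting of the kind introduced here rests on simple limit formulas; a sketch for X-bar and R charts using the standard constants for subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.114) follows, with simulated measurements standing in for classroom data.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical process data: 25 subgroups of 5 measurements each.
subgroups = rng.normal(50, 2, size=(25, 5))

xbar = subgroups.mean(axis=1)                        # subgroup means
r = subgroups.max(axis=1) - subgroups.min(axis=1)    # subgroup ranges

# Standard control chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

xbar_center, r_center = xbar.mean(), r.mean()
xbar_limits = (xbar_center - A2 * r_center, xbar_center + A2 * r_center)
r_limits = (D3 * r_center, D4 * r_center)

print(f"X-bar chart: center {xbar_center:.2f}, limits {xbar_limits}")
print(f"R chart:     center {r_center:.2f}, limits {r_limits}")
print("out-of-control subgroups:",
      np.where((xbar < xbar_limits[0]) | (xbar > xbar_limits[1]))[0])
```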
Organic Anion Transporting Polypeptide 1a1 Null Mice Are Sensitive to Cholestatic Liver Injury
Zhang, Youcai; Csanaky, Iván L.; Cheng, Xingguo; Lehman-McKeeman, Lois D.; Klaassen, Curtis D.
2012-01-01
Organic anion transporting polypeptide 1a1 (Oatp1a1) is predominantly expressed in livers of mice and is thought to transport bile acids (BAs) from blood into liver. Because Oatp1a1 expression is markedly decreased in mice after bile duct ligation (BDL), we hypothesized that Oatp1a1-null mice would be protected against liver injury during BDL-induced cholestasis, due largely to reduced hepatic uptake of BAs. To evaluate this hypothesis, BDL surgeries were performed in both male wild-type (WT) and Oatp1a1-null mice. At 24 h after BDL, Oatp1a1-null mice showed higher serum alanine aminotransferase levels and more severe liver injury than WT mice, and all Oatp1a1-null mice died within 4 days after BDL, whereas all WT mice survived. At 24 h after BDL, surprisingly, Oatp1a1-null mice had higher total BA concentrations in livers than WT mice, suggesting that loss of Oatp1a1 did not prevent BA accumulation in the liver. In addition, secondary BAs dramatically increased in serum of Oatp1a1-null BDL mice but not in WT BDL mice. Oatp1a1-null BDL mice had similar basolateral BA-uptake (Na+-taurocholate cotransporting polypeptide and Oatp1b2) and BA-efflux (multidrug resistance-associated protein [Mrp]-3, Mrp4, and organic solute transporter α/β) transporters, as well as BA-synthetic enzyme (Cyp7a1), in livers as WT BDL mice. Hepatic expression of small heterodimer partner, Cyp3a11, Cyp4a14, and Nqo1, which are target genes of farnesoid X receptor, pregnane X receptor, peroxisome proliferator-activated receptor alpha, and NF-E2-related factor 2, respectively, was increased in WT BDL mice but not in Oatp1a1-null BDL mice. These results demonstrate that loss of Oatp1a1 function exacerbates cholestatic liver injury in mice and suggest that Oatp1a1 plays a unique role in liver adaptive responses to obstructive cholestasis. PMID:22461449
NASA Astrophysics Data System (ADS)
Willenbring, J. K.; Jerolmack, D. J.
2015-12-01
At the largest time and space scales, the pace of erosion and chemical weathering is determined by tectonic uplift rates. Deviations from this equilibrium condition arise from the transient response of landscape denudation to climatic and tectonic perturbations, and may be long lived. We posit that the constraint of mass balance, however, makes it unlikely that such disequilibrium persists at the global scale over millions of years, as has been proposed for late Cenozoic erosion. To support this contention, we synthesize existing data for weathering fluxes, global sedimentation rates, sediment yields and tectonic motions. The records show a remarkable constancy in the pace of Earth-surface evolution over the last 10 million years. These findings provide strong support for the null hypothesis; that global rates of landscape change have remained constant over the last ten million years, despite global climate change and massive mountain building events. Two important implications are: (1) global climate change may not change global denudation rates, because the nature and sign of landscape responses are varied; and (2) tectonic and climatic perturbations are accommodated in the long term by changes in landscape form. This work undermines the hypothesis that increased weathering due to late Cenozoic mountain building or climate change was the primary agent for a decrease in global temperatures.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
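The working example here, building a criterion-keyed scale from a large item pool while minimizing expected prediction error rather than within-sample fit, corresponds in practice to fitting a penalized model whose tuning parameter is chosen by cross-validation. A minimal sketch of the regularization route with scikit-learn is below; the item responses and mortality outcome are simulated placeholders, and this is only one of the three algorithm families the paper discusses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(10)

# Hypothetical item pool: 500 people answering 300 personality items,
# with only a handful of items truly related to the outcome.
n_people, n_items = 500, 300
items = rng.normal(size=(n_people, n_items))
true_weights = np.zeros(n_items)
true_weights[:10] = 0.6
logits = items @ true_weights - 1.0
died = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# L1-penalized logistic regression with the penalty strength chosen by
# 5-fold cross-validation: an explicit attempt to minimize expected
# prediction error instead of maximizing within-sample likelihood.
model = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear",
                             scoring="roc_auc", max_iter=2000)
model.fit(items, died)

selected = np.flatnonzero(model.coef_[0])
print(f"items retained for the scale: {selected.size}")
```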
Motion versus position in the perception of head-centred movement.
Freeman, Tom C A; Sumnall, Jane H
2002-01-01
Observers can recover motion with respect to the head during an eye movement by comparing signals encoding retinal motion and the velocity of pursuit. Evidently there is a mismatch between these signals, because perceived head-centred motion is not always veridical. One example is the Filehne illusion, in which a stationary object appears to move in the opposite direction to pursuit. Like the motion aftereffect, the phenomenal experience of the Filehne illusion is one in which the stimulus moves but does not seem to go anywhere. This raises problems when measuring the illusion by motion nulling, because the more traditional technique confounds perceived motion with changes in perceived position. We devised a new nulling technique using global-motion stimuli that degraded familiar position cues but preserved cues to motion. Stimuli consisted of random-dot patterns comprising signal and noise dots that moved at the same retinal 'base' speed. Noise moved in random directions. In an eye-stationary speed-matching experiment we found that noise slowed perceived retinal speed as 'coherence strength' (i.e., percentage of signal) was reduced. The effect occurred over the two-octave range of base speeds studied and well above direction threshold. When the same stimuli were combined with pursuit, observers were able to null the Filehne illusion by adjusting coherence. A power law relating coherence to retinal base speed fit the data well with a negative exponent. Eye-movement recordings showed that pursuit was quite accurate. We then tested the hypothesis that the stimuli found at the null points appeared to move at the same retinal speed. Two observers supported the hypothesis, a third partially, and a fourth showed a small linear trend. In addition, the retinal speed found by the traditional Filehne technique was similar to the matches obtained with the global-motion stimuli. The results provide support for the idea that speed is the critical cue in head-centred motion perception.
Pan, Luyuan; Broadie, Kendal S
2007-11-07
A current hypothesis proposes that fragile X mental retardation protein (FMRP), an RNA-binding translational regulator, acts downstream of glutamatergic transmission, via metabotropic glutamate receptor (mGluR) G(q)-dependent signaling, to modulate protein synthesis critical for trafficking ionotropic glutamate receptors (iGluRs) at synapses. However, direct evidence linking FMRP and mGluR function with iGluR synaptic expression is limited. In this study, we use the Drosophila fragile X model to test this hypothesis at the well characterized glutamatergic neuromuscular junction (NMJ). Two iGluR classes reside at this synapse, each containing common GluRIIC (III), IID and IIE subunits, and variable GluRIIA (A-class) or GluRIIB (B-class) subunits. In Drosophila fragile X mental retardation 1 (dfmr1) null mutants, A-class GluRs accumulate and B-class GluRs are lost, whereas total GluR levels do not change, resulting in a striking change in GluR subclass ratio at individual synapses. The sole Drosophila mGluR, DmGluRA, is also expressed at the NMJ. In dmGluRA null mutants, both iGluR classes increase, resulting in an increase in total synaptic GluR content at individual synapses. Targeted postsynaptic dmGluRA overexpression causes the exact opposite GluR phenotype to the dfmr1 null, confirming postsynaptic GluR subtype-specific regulation. In dfmr1; dmGluRA double null mutants, there is an additive increase in A-class GluRs, and a similar additive impact on B-class GluRs, toward normal levels in the double mutants. These results show that both dFMRP and DmGluRA differentially regulate the abundance of different GluR subclasses in a convergent mechanism within individual postsynaptic domains.
Moshtagh-Khorasani, Majid; Akbarzadeh-T, Mohammad-R; Jahangiri, Nader; Khoobdel, Mehdi
2009-01-01
BACKGROUND: Aphasia diagnosis is particularly challenging due to the linguistic uncertainty and vagueness, inconsistencies in the definition of aphasic syndromes, large number of measurements with imprecision, natural diversity and subjectivity in test objects as well as in opinions of experts who diagnose the disease. METHODS: Fuzzy probability is proposed here as the basic framework for handling the uncertainties in medical diagnosis and particularly aphasia diagnosis. To efficiently construct this fuzzy probabilistic mapping, statistical analysis is performed that constructs input membership functions as well as determines an effective set of input features. RESULTS: Considering the high sensitivity of performance measures to different distribution of testing/training sets, a statistical t-test of significance is applied to compare fuzzy approach results with NN results as well as author's earlier work using fuzzy logic. The proposed fuzzy probability estimator approach clearly provides better diagnosis for both classes of data sets. Specifically, for the first and second type of fuzzy probability classifiers, i.e. spontaneous speech and comprehensive model, P-values are 2.24E-08 and 0.0059, respectively, strongly rejecting the null hypothesis. CONCLUSIONS: The technique is applied and compared on both comprehensive and spontaneous speech test data for diagnosis of four Aphasia types: Anomic, Broca, Global and Wernicke. Statistical analysis confirms that the proposed approach can significantly improve accuracy using fewer Aphasia features. PMID:21772867
NASA Astrophysics Data System (ADS)
Smith, Tony E.; Lee, Ka Lok
2012-01-01
There is a common belief that the presence of residual spatial autocorrelation in ordinary least squares (OLS) regression leads to inflated significance levels in beta coefficients and, in particular, inflated levels relative to the more efficient spatial error model (SEM). However, our simulations show that this is not always the case. Hence, the purpose of this paper is to examine this question from a geometric viewpoint. The key idea is to characterize the OLS test statistic in terms of angle cosines and examine the geometric implications of this characterization. Our first result is to show that if the explanatory variables in the regression exhibit no spatial autocorrelation, then the distribution of test statistics for individual beta coefficients in OLS is independent of any spatial autocorrelation in the error term. Hence, inferences about betas exhibit all the optimality properties of the classic uncorrelated error case. However, a second more important series of results show that if spatial autocorrelation is present in both the dependent and explanatory variables, then the conventional wisdom is correct. In particular, even when an explanatory variable is statistically independent of the dependent variable, such joint spatial dependencies tend to produce "spurious correlation" that results in over-rejection of the null hypothesis. The underlying geometric nature of this problem is clarified by illustrative examples. The paper concludes with a brief discussion of some possible remedies for this problem.
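A minimal simulation conveys the over-rejection result, using serial (AR(1)) correlation in one dimension as a crude stand-in for spatial autocorrelation; it is not the authors' geometric analysis, and all settings are illustrative.

```python
# Sketch: over-rejection of H0 when both y and x are autocorrelated but
# mutually independent (1-D AR(1) stand-in for spatial autocorrelation).
import numpy as np
from scipy import stats

def ar1(n, rho, rng):
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(1)
n, rho, reps, alpha = 100, 0.9, 2000, 0.05
rejections = 0
for _ in range(reps):
    x = ar1(n, rho, rng)
    y = ar1(n, rho, rng)              # independent of x by construction
    res = stats.linregress(x, y)      # OLS slope test assumes iid errors
    rejections += res.pvalue < alpha
print("empirical rejection rate:", rejections / reps)  # well above the nominal 0.05
```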
Lemire, Mathieu; Roslin, Nicole M.; Laprise, Catherine; Hudson, Thomas J.; Morgan, Kenneth
2004-01-01
We studied the effect of transmission-ratio distortion (TRD) on tests of linkage based on allele sharing in affected sib pairs. We developed and implemented a discrete-trait allele-sharing test statistic, S_ad, analogous to the S_pairs test statistic of Whittemore and Halpern, that evaluates an excess sharing of alleles at autosomal loci in pairs of affected siblings, as well as a lack of sharing in phenotypically discordant relative pairs, where available. Under the null hypothesis of no linkage, nuclear families with at least two affected siblings and one unaffected sibling have a contribution to S_ad that is unbiased with respect to the effects of TRD, independent of the disease under study. If more distantly related unaffected individuals are studied, the bias of S_ad is generally reduced compared with that of S_pairs, but not completely. Moreover, S_ad has higher power in some circumstances because of the availability of unaffected relatives, who are ignored in affected-only analyses. We discuss situations in which it may be an efficient use of resources to genotype unaffected relatives, which would give insights for promising study designs. The method is applied to a sample of pedigrees ascertained for asthma in a chromosomal region in which TRD has been reported. Results are consistent with the presence of transmission distortion in that region. PMID:15322985
Lee, Seong Min; Pike, J Wesley
2016-11-01
The vitamin D receptor (VDR) is a critical mediator of the biological actions of 1,25-dihydroxyvitamin D3 (1,25(OH)2D3). As a nuclear receptor, ligand activation of the VDR leads to the protein's binding to specific sites on the genome that results in the modulation of target gene expression. The VDR is also known to play a role in the hair cycle, an action that appears to be 1,25(OH)2D3-independent. Indeed, in the absence of the VDR, as in hereditary 1,25-dihydroxyvitamin D resistant rickets (HVDRR), both skin defects and alopecia emerge. Recently, we generated a mouse model of HVDRR without alopecia wherein a mutant human VDR lacking 1,25(OH)2D3-binding activity was expressed in the absence of endogenous mouse VDR. While 1,25(OH)2D3 failed to induce gene expression in these mice, resulting in an extensive skeletal phenotype, the receptor was capable of restoring normal hair cycling. We also noted a level of secondary hyperparathyroidism that was much higher than that seen in the VDR null mouse and was associated with an exaggerated bone phenotype as well. This suggested that the VDR might play a role in parathyroid hormone (PTH) regulation independent of 1,25(OH)2D3. To evaluate this hypothesis further, we contrasted PTH levels in the HVDRR mouse model with those seen in Cyp27b1 null mice, where the VDR was present but the hormone was absent. The data revealed that PTH was indeed higher in Cyp27b1 null mice compared to VDR null mice. To evaluate the mechanism of action underlying such a hypothesis, we measured the expression levels of a number of VDR target genes in the duodena of wildtype mice and in transgenic mice expressing either normal or hormone-binding-deficient mutant VDRs. We also compared expression levels of these genes between VDR null mice and Cyp27b1 null mice. In a subset of cases, the expression of VDR target genes was lower in mice containing the VDR as opposed to mice that did not. We suggest that the VDR may function as a selective suppressor/de-repressor of gene expression in the absence of 1,25(OH)2D3. Copyright © 2015 Elsevier Ltd. All rights reserved.
McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.
2017-01-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957
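For readers unfamiliar with MDMR, the following sketch shows the generic permutation-test version whose computational burden the paper's asymptotic theory is designed to relieve; the pseudo-F construction and all data below are illustrative assumptions, not the authors' code.

```python
# Sketch: permutation test for multivariate distance matrix regression
# (generic pseudo-F; the paper derives an asymptotic null that avoids
# these permutations).
import numpy as np

def mdmr_pseudo_f(D, X):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J                   # Gower-centred inner products
    Xc = np.column_stack([np.ones(n), X])
    H = Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T     # hat matrix of the predictors
    m = Xc.shape[1] - 1
    num = np.trace(H @ G @ H) / m
    den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - m - 1)
    return num / den

rng = np.random.default_rng(2)
n = 60
Y = rng.normal(size=(n, 5))                        # multivariate outcome profiles
X = rng.normal(size=(n, 2))                        # predictors (null: unrelated)
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)   # Euclidean distances

obs = mdmr_pseudo_f(D, X)
perm = np.array([mdmr_pseudo_f(D, X[rng.permutation(n)]) for _ in range(999)])
p_value = (np.sum(perm >= obs) + 1) / (perm.size + 1)
print("pseudo-F:", round(obs, 3), "permutation p:", round(p_value, 3))
```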
ERIC Educational Resources Information Center
Bahrick, Lorraine E.; Hernandez-Reif, Maria; Pickens, Jeffrey N.
1997-01-01
Tested hypothesis from Bahrick and Pickens' infant attention model that retrieval cues increase memory accessibility and shift visual preferences toward greater novelty to resemble recent memories. Found that after retention intervals associated with remote or intermediate memory, previous familiarity preferences shifted to null or novelty…
Replicating Peer-Led Team Learning in Cyberspace: Research, Opportunities, and Challenges
ERIC Educational Resources Information Center
Smith, Joshua; Wilson, Sarah Beth; Banks, Julianna; Zhu, Lin; Varma-Nelson, Pratibha
2014-01-01
This quasi-experimental, mixed methods study examined the transfer of a well-established pedagogical strategy, Peer-Led Team Learning (PLTL), to an online workshop environment (cPLTL) in a general chemistry course at a research university in the Midwest. The null hypothesis guiding the study was that no substantive differences would emerge between…
Kinase-Mediated Regulation of 40S Ribosome Assembly in Human Breast Cancer
2017-02-01
Major Task 2 and Major Task 3 (i) CRISPR/Cas9-deleted Ltv1-null TNBC clones that have been purified by Dr. Karbstein by growth in the presence of...cells using CRISPR/Cas9 technology proved difficult, as our hypothesis was correct, where Ltv1 is essential for proper growth of TNBC. In particular
Management by Objectives (MBO) Imperatives for Transforming Higher Education for a Globalised World
ERIC Educational Resources Information Center
Ofojebe, Wenceslaus N.; Olibie, Eyiuche Ifeoma
2014-01-01
This study was conducted to determine the extent to which the stipulations and visions of Management by Objectives (MBO) would be integrated in higher education institutions in South Eastern Nigeria to enhance higher education transformation in a globalised world. Four research questions and a null hypothesis guided the study. A sample of 510…
Mini-Versus Traditional: An Experimental Study of High School Social Studies Curricula.
ERIC Educational Resources Information Center
Roberts, Arthur D.; Gable, Robert K.
This study assessed some of the cognitive and affective elements for both the traditional and mini curricula. The hypothesis, stated in the null form, was that there would be no difference between students in the mini-course curriculum and the traditional curriculum on a number of stated cognitive variables (focusing on critical thinking and reading…
A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.
2013-01-01
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
ERIC Educational Resources Information Center
Sehati, Samira; Khodabandehlou, Morteza
2017-01-01
The present investigation was an attempt to study on the effect of power point enhanced teaching (visual input) on Iranian Intermediate EFL learners' listening comprehension ability. To that end, a null hypothesis was formulated as power point enhanced teaching (visual input) has no effect on Iranian Intermediate EFL learners' listening…
On the Directionality Test of Peer Effects in Social Networks
ERIC Educational Resources Information Center
An, Weihua
2016-01-01
One interesting idea in social network analysis is the directionality test that utilizes the directions of social ties to help identify peer effects. The null hypothesis of the test is that if contextual factors are the only force that affects peer outcomes, the estimated peer effects should not differ, if the directions of social ties are…
ERIC Educational Resources Information Center
Eze, Ogwa Christopher
2015-01-01
This research was conducted to ascertain teachers' and students' perceptions of instructional supervision in relation to capacity building in the electrical installation trade in technical colleges. Three research questions and a null hypothesis were employed to guide the study. A descriptive survey design was adopted. A 23-item questionnaire was used to elicit…
USDA-ARS?s Scientific Manuscript database
A disease severity index (DSI) is a single number for summarizing a large amount of information on disease severity. It has been used to indicate the performance of a cultivar in regard to disease resistance at a particular location, to relate disease severity to yield loss, to determine the effecti...
Brownian model of transcriptome evolution and phylogenetic network visualization between tissues.
Gu, Xun; Ruan, Hang; Su, Zhixi; Zou, Yangyun
2017-09-01
While phylogenetic analysis of transcriptomes of the same tissue is usually congruent with the species tree, controversy emerges when multiple tissues are included, that is, whether samples of the same tissue cluster together across species, or different tissues from the same species cluster together. Recent studies have suggested that a phylogenetic network approach may shed some light on our understanding of multi-tissue transcriptome evolution; yet the underlying evolutionary mechanism remains unclear. In this paper we develop a Brownian-based model of transcriptome evolution under the phylogenetic network that can statistically distinguish between the patterns of species-clustering and tissue-clustering. Our model can be used as a null hypothesis (neutral transcriptome evolution) for testing any correlation in tissue evolution, can be applied to cancer transcriptome evolution to study whether two tumors of an individual appeared independently or via metastasis, and can be useful to detect convergent evolution at the transcriptional level. Copyright © 2017. Published by Elsevier Inc.
Random positions of dendritic spines in human cerebral cortex.
Morales, Juan; Benavides-Piccione, Ruth; Dar, Mor; Fernaud, Isabel; Rodríguez, Angel; Anton-Sanchez, Laura; Bielza, Concha; Larrañaga, Pedro; DeFelipe, Javier; Yuste, Rafael
2014-07-23
Dendritic spines establish most excitatory synapses in the brain and are located in Purkinje cell's dendrites along helical paths, perhaps maximizing the probability to contact different axons. To test whether spine helixes also occur in neocortex, we reconstructed >500 dendritic segments from adult human cortex obtained from autopsies. With Fourier analysis and spatial statistics, we analyzed spine position along apical and basal dendrites of layer 3 pyramidal neurons from frontal, temporal, and cingulate cortex. Although we occasionally detected helical positioning, for the great majority of dendrites we could not reject the null hypothesis of spatial randomness in spine locations, either in apical or basal dendrites, in neurons of different cortical areas or among spines of different volumes and lengths. We conclude that in adult human neocortex spine positions are mostly random. We discuss the relevance of these results for spine formation and plasticity and their functional impact for cortical circuits. Copyright © 2014 the authors 0270-6474/14/3410078-07$15.00/0.
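The logic of testing spine positions against spatial randomness can be conveyed with a simple one-dimensional Monte Carlo comparison of a nearest-neighbour statistic; the segment length, spine count, and "observed" positions below are placeholders, and the study itself used Fourier analysis and richer spatial statistics.

```python
# Sketch: Monte Carlo test of spatial randomness for spine positions along a
# dendritic segment (1-D nearest-neighbour statistic; illustrative data only).
import numpy as np

def mean_nn_distance(pos):
    s = np.sort(pos)
    gaps = np.diff(s)
    # Nearest-neighbour distance of each point is the smaller adjacent gap.
    nn = np.minimum(np.r_[gaps, np.inf], np.r_[np.inf, gaps])
    return nn.mean()

rng = np.random.default_rng(3)
length, n_spines = 50.0, 80                            # hypothetical segment (um)
observed = np.sort(rng.uniform(0, length, n_spines))   # placeholder "data"

obs_stat = mean_nn_distance(observed)
null_stats = np.array([mean_nn_distance(rng.uniform(0, length, n_spines))
                       for _ in range(2000)])
# Two-sided Monte Carlo p-value: extremes in either direction (clustering or
# over-dispersion) count as evidence against spatial randomness.
p = 2 * min((null_stats <= obs_stat).mean(), (null_stats >= obs_stat).mean())
print("mean NN distance:", round(obs_stat, 3), "Monte Carlo p:", round(min(p, 1.0), 3))
```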
Grossman, R A
1995-09-01
The purpose of this study was to determine whether women can discriminate better from less effective paracervical block techniques applied to opposite sides of the cervix. If this discrimination could be made, it would be possible to compare different techniques and thus improve the quality of paracervical anesthesia. Two milliliters of local anesthetic was applied to one side and 6 ml to the other side of volunteers' cervices before cervical dilation. Statistical examination was by sequential analysis. The study was stopped after 47 subjects had entered, when sequential analysis found that there was no significant difference in women's perception of pain. Nine women reported more pain on the side with more anesthesia and eight reported more pain on the side with less anesthesia. Because the amount of anesthesia did not make a difference, the null hypothesis (that women cannot discriminate between different anesthetic techniques) was accepted. Women are not able to discriminate different doses of local anesthetic when applied to opposite sides of the cervix.
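Setting the sequential design aside, the discordance counts reported above (nine versus eight) can be checked with an ordinary two-sided sign (binomial) test, sketched here purely for illustration.

```python
# Sketch: the 9-vs-8 discordant reports analysed as a simple two-sided sign
# test (the original study used a sequential design, which this ignores).
from scipy.stats import binomtest

result = binomtest(9, n=9 + 8, p=0.5, alternative="two-sided")
print("two-sided p-value:", round(result.pvalue, 3))   # far from significance
```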
Frequentist Analysis of SLAC Rosenbluth Data
NASA Astrophysics Data System (ADS)
Higinbotham, Douglas; McClellan, Evan; Shamaiengar, Stephen
2017-01-01
Analysis of the SLAC NE-11 elastic electron-proton scattering data typically assumes that the 1.6 GeV spectrometer has a systematic normalization offset as compared to the well-known 8 GeV spectrometer, yet such an offset should have been observed globally. A review of doctoral theses from the period finds that analyses using high-statistics inelastic data saw no significant normalization difference. Moreover, the unique kinematics utilized to match the two spectrometers for normalization required the 8 GeV spectrometer to be rotated beyond its well-understood angular range. We try to quantify the confidence level of rejecting the null hypothesis, i.e. that the 1.6 GeV spectrometer normalization is correct, and will show the result of simply analyzing the cross section data as obtained. This is a critical study, as the 1.6 GeV spectrometer data drives the epsilon lever arm in Rosenbluth extractions, and therefore can have a significant impact on form factor extractions at high momentum transfer.
Cumming, Geoff; Fresc, Luca; Boedker, Ingrid; Tressoldi, Patrizio
2017-01-01
From January 2014, Psychological Science introduced new submission guidelines that encouraged the use of effect sizes, estimation, and meta-analysis (the “new statistics”), required extra detail of methods, and offered badges for use of open science practices. We investigated the use of these practices in empirical articles published by Psychological Science and, for comparison, by the Journal of Experimental Psychology: General, during the period of January 2013 to December 2015. The use of null hypothesis significance testing (NHST) was extremely high at all times and in both journals. In Psychological Science, the use of confidence intervals increased markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the availability of open data (3 to 39%) and open materials (7 to 31%). The other journal showed smaller or much smaller changes. Our findings suggest that journal-specific submission guidelines may encourage desirable changes in authors’ practices. PMID:28414751
Saha, Ranajit; Pan, Sudip; Chattaraj, Pratim K
2016-11-05
The validity of the maximum hardness principle (MHP) is tested in the cases of 50 chemical reactions, most of which are organic in nature and exhibit anomeric effect. To explore the effect of the level of theory on the validity of MHP in an exothermic reaction, B3LYP/6-311++G(2df,3pd) and LC-BLYP/6-311++G(2df,3pd) (def2-QZVP for iodine and mercury) levels are employed. Different approximations like the geometric mean of hardness and combined hardness are considered in case there are multiple reactants and/or products. It is observed that, based on the geometric mean of hardness, while 82% of the studied reactions obey the MHP at the B3LYP level, 84% of the reactions follow this rule at the LC-BLYP level. Most of the reactions possess the hardest species on the product side. A 50% null hypothesis is rejected at a 1% level of significance.
NASA Astrophysics Data System (ADS)
Ma, Pengcheng; Li, Daye; Li, Shuo
2016-02-01
Using one-minute high-frequency data of the Shanghai Composite Index (SHCI) and the Shenzhen Composite Index (SZCI) (2007-2008), we employ detrended fluctuation analysis (DFA) and detrended cross-correlation analysis (DCCA) with a rolling-window approach to observe the evolution of market efficiency and cross-correlation in the pre-crisis and crisis periods. Considering the fat-tail distribution of the return time series, a statistical test based on a shuffling method is conducted to verify the null hypothesis of no long-term dependence. Our empirical research displays three main findings. First, Shanghai equity market efficiency deteriorated while Shenzhen equity market efficiency improved with the advent of the financial crisis. Second, the highly positive dependence between SHCI and SZCI varies with time scale. Third, the financial crisis saw a significant increase of dependence between SHCI and SZCI at shorter time scales but a lack of significant change at longer time scales, providing evidence of contagion and absence of interdependence during the crisis.
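A compact sketch of DFA with a shuffle-based null conveys the general approach; it is not the authors' rolling-window DFA/DCCA pipeline, and the fat-tailed "returns" below are simulated placeholders.

```python
# Sketch: detrended fluctuation analysis (DFA) with a shuffle-based null for
# long-range dependence (generic implementation, illustrative data).
import numpy as np

def dfa_exponent(x, scales):
    y = np.cumsum(x - x.mean())                 # integrated profile
    flucts = []
    for s in scales:
        n_seg = len(y) // s
        segs = y[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        f2 = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)        # local linear detrending
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope                                 # ~0.5 for an uncorrelated series

rng = np.random.default_rng(4)
returns = rng.standard_t(df=4, size=5000)        # fat-tailed placeholder returns
scales = np.unique(np.logspace(1, 3, 15).astype(int))

alpha_obs = dfa_exponent(returns, scales)
alpha_null = np.array([dfa_exponent(rng.permutation(returns), scales)
                       for _ in range(200)])
p = (np.sum(np.abs(alpha_null - 0.5) >= abs(alpha_obs - 0.5)) + 1) / 201
print("DFA exponent:", round(alpha_obs, 3), "shuffle-test p:", round(p, 3))
```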
Soave, David; Sun, Lei
2017-09-01
We generalize Levene's test for variance (scale) heterogeneity between k groups for more complex data, when there are sample correlation and group membership uncertainty. Following a two-stage regression framework, we show that least absolute deviation regression must be used in the stage 1 analysis to ensure a correct asymptotic χ²_(k-1)/(k-1) distribution of the generalized scale (gS) test statistic. We then show that the proposed gS test is independent of the generalized location test, under the joint null hypothesis of no mean and no variance heterogeneity. Consequently, we generalize the recently proposed joint location-scale (gJLS) test, valuable in settings where there is an interaction effect but one interacting variable is not available. We evaluate the proposed method via an extensive simulation study and two genetic association application studies. © 2017 The Authors Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
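The two-stage logic can be sketched as follows, assuming a single covariate, independent observations, and known group membership (so it omits the sample-correlation and group-uncertainty features that motivate the gS test): stage 1 removes location effects by median (LAD) regression, and stage 2 applies a Levene-type test to the absolute residuals.

```python
# Sketch of the two-stage idea behind a generalized scale test (simplified:
# independent observations, known groups; not the authors' implementation).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n, k = 300, 3
group = rng.integers(0, k, size=n)
covariate = rng.normal(size=n)
# Simulate variance heterogeneity across groups (scale effect, no mean effect).
y = 1.0 + 0.5 * covariate + rng.normal(scale=1.0 + 0.4 * group, size=n)

# Stage 1: least absolute deviation (median) regression on covariate and group.
X = sm.add_constant(np.column_stack([covariate, group == 1, group == 2]).astype(float))
lad = sm.QuantReg(y, X).fit(q=0.5)
abs_resid = np.abs(y - lad.predict(X))

# Stage 2: test scale heterogeneity via the absolute residuals (Levene-type).
groups = [abs_resid[group == g] for g in range(k)]
F, p = stats.f_oneway(*groups)
print("scale-heterogeneity test: F =", round(F, 2), "p =", round(p, 4))
```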
Phelps, G.A.
2008-01-01
This report describes some simple spatial statistical methods to explore the relationships of scattered points to geologic or other features, represented by points, lines, or areas. It also describes statistical methods to search for linear trends and clustered patterns within the scattered point data. Scattered points are often contained within irregularly shaped study areas, necessitating the use of methods largely unexplored in the point pattern literature. The methods take advantage of the power of modern GIS toolkits to numerically approximate the null hypothesis of randomly located data within an irregular study area. Observed distributions can then be compared with the null distribution of a set of randomly located points. The methods are non-parametric and are applicable to irregularly shaped study areas. Patterns within the point data are examined by comparing the distribution of the orientation of the set of vectors defined by each pair of points within the data with the equivalent distribution for a random set of points within the study area. A simple model is proposed to describe linear or clustered structure within scattered data. A scattered data set of damage to pavement and pipes, recorded after the 1989 Loma Prieta earthquake, is used as an example to demonstrate the analytical techniques. The damage is found to be preferentially located nearer a set of mapped lineaments than randomly scattered damage, suggesting range-front faulting along the base of the Santa Cruz Mountains is related to both the earthquake damage and the mapped lineaments. The damage also exhibits two non-random patterns: a single cluster of damage centered in the town of Los Gatos, California, and a linear alignment of damage along the range front of the Santa Cruz Mountains, California. The linear alignment of damage is strongest between 45° and 50° northwest. This agrees well with the mean trend of the mapped lineaments, measured as 49° northwest.
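A simplified version of the Monte Carlo logic, with an assumed polygonal study area and placeholder points, is sketched below: random points are generated inside the irregular area by rejection sampling, and an observed pair-orientation statistic is compared with its null distribution.

```python
# Sketch: comparing pairwise-orientation structure of scattered points with a
# null of randomly located points inside an irregular study area (rejection
# sampling in a polygon; the report's GIS-based workflow is only analogous).
import numpy as np
from matplotlib.path import Path

def random_points_in_polygon(poly, n, rng):
    path = Path(poly)
    xmin, ymin = poly.min(axis=0)
    xmax, ymax = poly.max(axis=0)
    pts = []
    while len(pts) < n:
        cand = rng.uniform([xmin, ymin], [xmax, ymax], size=(n, 2))
        pts.extend(cand[path.contains_points(cand)])   # keep points inside polygon
    return np.array(pts[:n])

def pair_orientations(pts):
    d = pts[:, None, :] - pts[None, :, :]
    iu = np.triu_indices(len(pts), k=1)
    return np.degrees(np.arctan2(d[..., 1], d[..., 0]))[iu] % 180.0

def frac_near(angles, centre=135.0, half_width=5.0):
    # Fraction of pair orientations within +/- half_width degrees of the
    # northwest direction (135 degrees in math convention).
    return np.mean(np.abs(((angles - centre + 90) % 180) - 90) <= half_width)

rng = np.random.default_rng(6)
polygon = np.array([[0, 0], [10, 0], [12, 6], [5, 9], [-1, 4]])  # irregular area
observed = random_points_in_polygon(polygon, 40, rng)            # placeholder data

obs = frac_near(pair_orientations(observed))
null = [frac_near(pair_orientations(random_points_in_polygon(polygon, 40, rng)))
        for _ in range(999)]
p = (np.sum(np.array(null) >= obs) + 1) / 1000
print("fraction of pairs near NW orientation:", round(obs, 3), "Monte Carlo p:", round(p, 3))
```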
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lund, Amie K.; Goens, M. Beth; Nunez, Bethany A.
2006-04-15
The aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor characterized to play a role in detection and adaptation to environmental stimuli. Genetic deletion of AhR results in hypertension, and cardiac hypertrophy and fibrosis, associated with elevated plasma angiotensin II (Ang II) and endothelin-1 (ET-1); thus AhR appears to contribute to cardiovascular homeostasis. In these studies, we tested the hypothesis that ET-1 mediates cardiovascular pathology in AhR null mice via ET(A) receptor activation. First, we determined the time courses of cardiac hypertrophy, and of plasma and tissue ET-1 expression, in AhR wildtype and null mice. AhR null mice exhibited increases in heart-to-body weight ratio and age-related expression of the cardiac hypertrophy markers β-myosin heavy chain (β-MHC) and atrial natriuretic factor (ANF), which were significant at 2 months. Similarly, plasma and tissue ET-1 expression was significantly elevated at 2 months and increased further with age. Second, AhR null mice were treated with the ET(A) receptor antagonist BQ-123 (100 nmol/kg/day) for 7, 28, or 58 days and blood pressure, cardiac fibrosis, and cardiac hypertrophy assessed, respectively. BQ-123 for 7 days significantly reduced mean arterial pressure in conscious, catheterized mice. BQ-123 for 28 days significantly reduced the histological appearance of cardiac fibrosis. Treatment for 58 days significantly reduced cardiac mass, assessed by heart weight, echocardiography, and β-MHC and ANF expression, and reduced cardiac fibrosis as determined by osteopontin and collagen I mRNA expression. These findings establish ET-1 and the ET(A) receptor as primary determinants of hypertension and cardiac pathology in AhR null mice.
Repicky, Sarah; Broadie, Kendal
2009-02-01
Loss of the mRNA-binding protein FMRP results in the most common inherited form of both mental retardation and autism spectrum disorders: fragile X syndrome (FXS). The leading FXS hypothesis proposes that metabotropic glutamate receptor (mGluR) signaling at the synapse controls FMRP function in the regulation of local protein translation to modulate synaptic transmission strength. In this study, we use the Drosophila FXS disease model to test the relationship between Drosophila FMRP (dFMRP) and the sole Drosophila mGluR (dmGluRA) in regulation of synaptic function, using two-electrode voltage-clamp recording at the glutamatergic neuromuscular junction (NMJ). Null dmGluRA mutants show minimal changes in basal synapse properties but pronounced defects during sustained high-frequency stimulation (HFS). The double null dfmr1;dmGluRA mutant shows repression of enhanced augmentation and delayed onset of premature long-term facilitation (LTF) and strongly reduces grossly elevated post-tetanic potentiation (PTP) phenotypes present in dmGluRA-null animals. Null dfmr1 mutants show features of synaptic hyperexcitability, including multiple transmission events in response to a single stimulus and cyclic modulation of transmission amplitude during prolonged HFS. The double null dfmr1;dmGluRA mutant shows amelioration of these defects but does not fully restore wildtype properties in dfmr1-null animals. These data suggest that dmGluRA functions in a negative feedback loop in which excess glutamate released during high-frequency transmission binds the glutamate receptor to dampen synaptic excitability, and dFMRP functions to suppress the translation of proteins regulating this synaptic excitability. Removal of the translational regulator partially compensates for loss of the receptor and, similarly, loss of the receptor weakly compensates for loss of the translational regulator.
Comparison of body weight and gene expression in amelogenin null and wild-type mice.
Li, Yong; Yuan, Zhi-An; Aragon, Melissa A; Kulkarni, Ashok B; Gibson, Carolyn W
2006-05-01
Amelogenin (AmelX) null mice develop hypomineralized enamel lacking normal prism structure, but are healthy and fertile. Because these mice are smaller than wild-type mice prior to weaning, we undertook a detailed analysis of the weight of mice and analyzed AmelX expression in non-dental tissues. Wild-type mice had a greater average weight each day within the 3-wk period. Using reverse transcription-polymerase chain reaction (RT-PCR), products of approximately 200 bp in size were generated from wild-type teeth, brain, eye, and calvariae. DNA sequence analysis of RT-PCR products from calvariae indicated that the small amelogenin leucine-rich amelogenin peptide (LRAP), both with and without exon 4, was expressed. No products were obtained from any of the samples from the AmelX null mice. We also isolated mRNAs that included AmelX exons 8 and 9, and identified a duplication within the murine AmelX gene with 91% homology. Our results add additional support to the hypothesis that amelogenins are multifunctional proteins, with potential roles in non-ameloblasts and in non-mineralizing tissues during development. The smaller size of AmelX null mice could potentially be explained by the lack of LRAP expression in some of these tissues, leading to a delay in development.
Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity
NASA Astrophysics Data System (ADS)
Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.
As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards proving the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.
Interpreting Null Findings from Trials of Alcohol Brief Interventions
Heather, Nick
2014-01-01
The effectiveness of alcohol brief intervention (ABI) has been established by a succession of meta-analyses but, because the effects of ABI are small, null findings from randomized controlled trials are often reported and can sometimes lead to skepticism regarding the benefits of ABI in routine practice. This article first explains why null findings are likely to occur under null hypothesis significance testing (NHST) due to the phenomenon known as “the dance of the p-values.” A number of misconceptions about null findings are then described, using as an example the way in which the results of the primary care arm of a recent cluster-randomized trial of ABI in England (the SIPS project) have been misunderstood. These misinterpretations include the fallacy of “proving the null hypothesis” that lack of a significant difference between the means of sample groups can be taken as evidence of no difference between their population means, and the possible effects of this and related misunderstandings of the SIPS findings are examined. The mistaken inference that reductions in alcohol consumption seen in control groups from baseline to follow-up are evidence of real effects of control group procedures is then discussed and other possible reasons for such reductions, including regression to the mean, research participation effects, historical trends, and assessment reactivity, are described. From the standpoint of scientific progress, the chief problem about null findings under the conventional NHST approach is that it is not possible to distinguish “evidence of absence” from “absence of evidence.” By contrast, under a Bayesian approach, such a distinction is possible and it is explained how this approach could classify ABIs in particular settings or among particular populations as either truly ineffective or as of unknown effectiveness, thus accelerating progress in the field of ABI research. PMID:25076917
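The "dance of the p-values" is easy to reproduce with a small simulation, assuming a true standardized effect of 0.2 and illustrative sample sizes; even though the intervention works in every replicate, many trials return null findings.

```python
# Sketch of the "dance of the p-values": repeated trials of a genuinely
# effective but small intervention yield wildly varying p-values, so null
# findings are expected even when the effect is real (illustrative numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, sd, n_per_arm = 0.2, 1.0, 100      # small standardized effect
pvals = []
for _ in range(25):                             # 25 replicate trials
    control = rng.normal(0.0, sd, n_per_arm)
    brief_intervention = rng.normal(true_effect, sd, n_per_arm)
    pvals.append(stats.ttest_ind(brief_intervention, control).pvalue)
print("p-values:", np.round(sorted(pvals), 3))
print("proportion 'significant':", np.mean(np.array(pvals) < 0.05))
```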
Flowable composites for bonding orthodontic retainers.
Tabrizi, Sama; Salemis, Elio; Usumez, Serdar
2010-01-01
The aim was to test the null hypothesis that there are no statistically significant differences between flowables and an orthodontic adhesive in terms of shear bond strength (SBS) and pullout resistance. To test SBS, Light Bond, FlowTain, Filtek Supreme, and Tetric Flow were applied to the enamel surfaces of 15 teeth. Using matrices for application, each composite material was cured for 40 seconds and subjected to SBS testing. To test pullout resistance, 15 samples were prepared for each composite in which a wire was embedded; then the composite was cured for 40 seconds. Later, the ends of the wire were drawn up and tensile stress was applied until the resin failed. Findings were analyzed using an ANOVA and a Tukey HSD test. The SBS values for Light Bond, FlowTain, Filtek Supreme, and Tetric Flow were 19.0 +/- 10.9, 14.7 +/- 9.3, 22.4 +/- 16.3, and 16.8 +/- 11.8 MPa, respectively, and mean pullout values were 42.2 +/- 13.0, 24.0 +/- 6.9, 26.3 +/- 9.4, and 33.8 +/- 18.0 N, respectively. No statistically significant differences were found among the groups in terms of SBS (P > .05). On the other hand, Light Bond yielded significantly higher pullout values compared with the flowables Filtek Supreme and FlowTain (P < .01). However, there were no significant differences among the pullout values of the flowables, nor between Light Bond and Tetric Flow (P > .05). The hypothesis is rejected. Light Bond yielded significantly higher pullout values compared with the flowables Filtek Supreme and FlowTain. However, flowable composites provided satisfactory SBS and wire pullout values, comparable to a standard orthodontic resin, and therefore can be used as an alternative for direct bonding of lingual retainers.
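The ANOVA-plus-Tukey analysis described above can be sketched generically as follows; the group means and spreads are simulated from the reported summary values, not the study's raw data.

```python
# Sketch: one-way ANOVA followed by Tukey HSD on bond-strength measurements
# (simulated data loosely matching the reported group means; illustrative only).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(8)
means = {"LightBond": 19.0, "FlowTain": 14.7, "FiltekSupreme": 22.4, "TetricFlow": 16.8}
values, labels = [], []
for name, mu in means.items():
    values.append(rng.normal(mu, 11.0, 15))     # n = 15 per group, large spread
    labels += [name] * 15
values = np.concatenate(values)

F, p = stats.f_oneway(*[values[np.array(labels) == g] for g in means])
print("ANOVA: F =", round(F, 2), "p =", round(p, 3))
print(pairwise_tukeyhsd(values, labels))        # pairwise comparisons
```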
Rietschel, Marcella; Mattheisen, Manuel; Breuer, René; Schulze, Thomas G.; Nöthen, Markus M.; Levinson, Douglas; Shi, Jianxin; Gejman, Pablo V.; Cichon, Sven; Ophoff, Roel A.
2012-01-01
Recent studies suggest that variation in complex disorders (e.g., schizophrenia) is explained by a large number of genetic variants with small effect size (Odds Ratio∼1.05–1.1). The statistical power to detect these genetic variants in Genome Wide Association (GWA) studies with large numbers of cases and controls (∼15,000) is still low. As it will be difficult to further increase sample size, we decided to explore an alternative method for analyzing GWA data in a study of schizophrenia, dramatically reducing the number of statistical tests. The underlying hypothesis was that at least some of the genetic variants related to a common outcome are collocated in segments of chromosomes at a wider scale than single genes. Our approach was therefore to study the association between relatively large segments of DNA and disease status. An association test was performed for each SNP and the number of nominally significant tests in a segment was counted. We then performed a permutation-based binomial test to determine whether this region contained significantly more nominally significant SNPs than expected under the null hypothesis of no association, taking linkage into account. Genome Wide Association data of three independent schizophrenia case/control cohorts with European ancestry (Dutch, German, and US) using segments of DNA with variable length (2 to 32 Mbp) was analyzed. Using this approach we identified a region at chromosome 5q23.3-q31.3 (128–160 Mbp) that was significantly enriched with nominally associated SNPs in three independent case-control samples. We conclude that considering relatively wide segments of chromosomes may reveal reliable relationships between the genome and schizophrenia, suggesting novel methodological possibilities as well as raising theoretical questions. PMID:22723893
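Ignoring linkage disequilibrium (which the permutation procedure above is designed to handle), the enrichment question for a single segment reduces to a binomial comparison against the nominal 5% rate; the counts below are hypothetical.

```python
# Sketch: is a segment enriched with nominally significant SNPs? A plain
# binomial test is shown; the study used permutations to respect linkage
# disequilibrium, which this ignores.
from scipy.stats import binomtest

n_snps_in_segment = 400        # hypothetical number of SNPs in the segment
n_nominal_hits = 35            # hypothetical SNPs with p < 0.05 in the segment
expected_rate = 0.05           # expectation under no association

result = binomtest(n_nominal_hits, n_snps_in_segment, expected_rate,
                   alternative="greater")
print("enrichment p-value (ignoring LD):", result.pvalue)
```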
Sutton, Wesley K; Knight, Alec; Underhill, Peter A; Neulander, Judith S; Disotell, Todd R; Mountain, Joanna L
2006-01-01
The ethnic heritage of northernmost New Spain, including present-day northern New Mexico and southernmost Colorado, USA, is intensely debated. Local Spanish-American folkways and anecdotal narratives led to claims that the region was colonized primarily by secret- or crypto-Jews. Despite ethnographic criticisms, the notion of substantial crypto-Jewish ancestry among Spanish-Americans persists. We tested the null hypothesis that Spanish-Americans of northern New Mexico carry essentially the same profile of paternally inherited DNA variation as the peoples of Iberia, and the relevant alternative hypothesis that the sampled Spanish-Americans possess inherited DNA variation that reflects Jewish ancestry significantly greater than that in present-day Iberia. We report frequencies of 19 Y-chromosome unique event polymorphism (UEP) biallelic markers for 139 men from across northern New Mexico and southern Colorado, USA, who self-identify as 'Spanish-American'. We used three different statistical tests of differentiation to compare frequencies of major UEP-defined clades or haplogroups with published data for Iberians, Jews, and other Mediterranean populations. We also report frequencies of derived UEP markers within each major haplogroup, compared with published data for relevant populations. All tests of differentiation showed that, for frequencies of the major UEP-defined clades, Spanish-Americans and Iberians are statistically indistinguishable. All other pairwise comparisons, including between Spanish-Americans and Jews, and Iberians and Jews, revealed highly significant differences in UEP frequencies. Our results indicate that paternal genetic inheritance of Spanish-Americans is indistinguishable from that of Iberians and refute the popular and widely publicized scenario of significant crypto-Jewish ancestry of the Spanish-American population.
Kim, Young Kyung; Park, Hyo-Sang; Kim, Kyo-Han; Kwon, Tae-Yub
2015-10-01
To test the null hypothesis that neither the flexural properties of orthodontic adhesive resins nor the enamel pre-treatment methods would affect metal bracket debonding behaviours, including enamel fracture. A dimethacrylate-based resin (Transbond XT, TX) and two methyl methacrylate (MMA)-based resins (Super-Bond C&B, SB; an experimental light-cured resin, EXP) were tested. Flexural strength and flexural modulus for each resin were measured by a three-point-bending test. Metal brackets were bonded to human enamel pretreated with total-etch (TE) or self-etch adhesive using one of the three resins (a total of six groups, n = 15). After 24 hours of storage in water at 37°C, a shear bond strength (SBS) test was performed using the wire loop method. After debonding, remaining resin on the enamel surfaces and occurrence of enamel fracture were assessed. Statistical significance was set at P < 0.05. The two MMA resins exhibited substantially lower flexural strength and modulus values than the TX resin. The mean SBS values of all groups (10.15-11.09 MPa) were statistically equivalent to one another (P > 0.05), except for the TE-TX group (13.51 MPa, P < 0.05). The two EXP groups showed less resin remnant. Only in the two TX groups were enamel fractures observed (three cases for each group). The results were drawn only from ex vivo experiments. The hypothesis is rejected. This study suggests that a more flexible MMA resin is favourable for avoiding enamel fracture during metal bracket debonding. © The Author 2014. Published by Oxford University Press on behalf of the European Orthodontic Society. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Behavior of the maximum likelihood in quantum state tomography
NASA Astrophysics Data System (ADS)
Scholten, Travis L.; Blume-Kohout, Robin
2018-02-01
Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.
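The classical null theory that the abstract contrasts with the constrained quantum case can be illustrated with an unconstrained Gaussian example, where the Wilks chi-squared law for the loglikelihood ratio holds; the dimension, sample size, and replicate count below are arbitrary.

```python
# Sketch: the classical Wilks null theory for the loglikelihood ratio in an
# unconstrained Gaussian model (the abstract explains why this breaks down
# under the positivity constraint of quantum state space).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
d, n, reps = 4, 200, 5000
llr = []
for _ in range(reps):
    x = rng.normal(size=(n, d))                  # data generated under H0: mean = 0
    xbar = x.mean(axis=0)
    # 2*log LR for H0: mu = 0 vs H1: mu free, with known unit variance.
    llr.append(n * np.sum(xbar ** 2))
llr = np.array(llr)
# Empirical quantiles should track chi-squared(d) quantiles (Wilks theorem).
for q in (0.5, 0.9, 0.95):
    print(q, round(np.quantile(llr, q), 2), round(stats.chi2.ppf(q, df=d), 2))
```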