[Dilemma of the null hypothesis in experimental tests of ecological hypotheses].
Li, Ji
2016-06-01
Experimental testing is one of the major ways of testing ecological hypotheses, though it has drawn many arguments because of the null hypothesis. Quinn and Dunham (1983) analyzed the hypothesis-deduction model of Platt (1964) and concluded that there is no null hypothesis in ecology that can be strictly tested by experiments. Fisher's falsificationism and the non-decisivity of the Neyman-Pearson (N-P) framework prevent statistical null hypotheses from being strictly tested. Moreover, because the null hypothesis H0 (α=1, β=0) and the alternative hypothesis H1' (α'=1, β'=0) in ecological processes differ from those of classical physics, the ecological null hypothesis likewise cannot be strictly tested experimentally. These dilemmas of the null hypothesis can be relieved by reducing the P value, carefully selecting the null hypothesis, non-centralizing the non-null hypothesis, and using two-tailed tests. However, statistical null hypothesis significance testing (NHST) should not be equated with the logical test of causality in ecological hypotheses. Hence, findings and conclusions of methodological studies and experimental tests based on NHST are not always logically reliable.
P value and the theory of hypothesis testing: an explanation for new researchers.
Biau, David Jean; Jolles, Brigitte M; Porcher, Raphaël
2010-03-01
In the 1920s, Ronald Fisher developed the theory behind the p value and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers with important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed, presuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators will select a threshold p value below which they will reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine some critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different theories that often are misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the difference observed, or one that is more extreme, considering the null is true. Another concern is the risk that a substantial proportion of statistically significant results are falsely significant. Researchers should have a minimum understanding of these two theories so that they are better able to plan, conduct, interpret, and report scientific experiments.
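As an illustration of the distinction drawn above, the short Python sketch below computes a Fisher-style p value for simulated two-group data and then applies a Neyman-Pearson-style decision at a pre-chosen alpha; the data, group sizes, and alpha are arbitrary assumptions, not taken from the paper.

# Contrast Fisher's p value with a Neyman-Pearson accept/reject decision.
# Data below are made up for the sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=10.0, scale=2.0, size=30)
treated = rng.normal(loc=11.2, scale=2.0, size=30)

# Fisher-style evidence: the p value as a continuous measure of evidence
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Neyman-Pearson-style decision: fix alpha in advance, then decide
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"At alpha = {alpha}: {decision}")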
A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.
ERIC Educational Resources Information Center
Liu, Tung; Stone, Courtenay C.
1999-01-01
Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…
Explorations in statistics: hypothesis tests and P values.
Curran-Everett, Douglas
2009-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
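The question posed in the abstract ("if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got?") can be answered directly by simulation; the sketch below, with invented data and a one-sample t-type statistic, is one minimal way to do so.

# Simulate the null distribution of a test statistic and read off the P value
# as a tail proportion. The "observed" sample is made up.
import numpy as np

rng = np.random.default_rng(0)
n = 20
observed = rng.normal(loc=0.5, scale=1.0, size=n)   # hypothetical sample
t_obs = observed.mean() / (observed.std(ddof=1) / np.sqrt(n))

# Re-draw samples from the null world (true mean 0) and recompute the statistic
t_null = np.empty(10_000)
for i in range(t_null.size):
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    t_null[i] = x.mean() / (x.std(ddof=1) / np.sqrt(n))

# Two-sided P value: proportion of null statistics at least as extreme as t_obs
p_sim = np.mean(np.abs(t_null) >= abs(t_obs))
print(f"t_obs = {t_obs:.2f}, simulated P = {p_sim:.4f}")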
Testing the null hypothesis: the forgotten legacy of Karl Popper?
Wilkinson, Mick
2013-01-01
Testing of the null hypothesis is a fundamental aspect of the scientific method and has its basis in the falsification theory of Karl Popper. Null hypothesis testing makes use of deductive reasoning to ensure that the truth of conclusions is irrefutable. In contrast, attempting to demonstrate the new facts on the basis of testing the experimental or research hypothesis makes use of inductive reasoning and is prone to the problem of the Uniformity of Nature assumption described by David Hume in the eighteenth century. Despite this issue and the well documented solution provided by Popper's falsification theory, the majority of publications are still written such that they suggest the research hypothesis is being tested. This is contrary to accepted scientific convention and possibly highlights a poor understanding of the application of conventional significance-based data analysis approaches. Our work should remain driven by conjecture and attempted falsification such that it is always the null hypothesis that is tested. The write up of our studies should make it clear that we are indeed testing the null hypothesis and conforming to the established and accepted philosophical conventions of the scientific method.
A statistical test to show negligible trend
Philip M. Dixon; Joseph H.K. Pechmann
2005-01-01
The usual statistical tests of trend are inappropriate for demonstrating the absence of trend. This is because failure to reject the null hypothesis of no trend does not prove that null hypothesis. The appropriate statistical method is based on an equivalence test. The null hypothesis is that the trend is not zero, i.e., outside an a priori specified equivalence region...
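A minimal sketch of the kind of equivalence test described, implemented as two one-sided tests (TOST) on a regression slope; the simulated data and the equivalence margin delta are assumptions for illustration, not the authors' procedure.

# Equivalence-style test for a negligible trend:
# H0: |slope| >= delta (non-negligible) vs H1: |slope| < delta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
years = np.arange(20)
counts = 50 + rng.normal(0, 3, size=years.size)   # population index, no real trend

res = stats.linregress(years, counts)
slope, se = res.slope, res.stderr
df = years.size - 2
delta = 1.0   # a priori margin: trends smaller than this are "negligible"

# Two one-sided tests against the margins -delta and +delta
t_lower = (slope + delta) / se          # tests H0: slope <= -delta
t_upper = (slope - delta) / se          # tests H0: slope >= +delta
p_lower = 1 - stats.t.cdf(t_lower, df)
p_upper = stats.t.cdf(t_upper, df)
p_tost = max(p_lower, p_upper)          # reject non-equivalence only if both reject
print(f"slope = {slope:.3f} +/- {se:.3f}, TOST p = {p_tost:.4f}")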
An omnibus test for the global null hypothesis.
Futschik, Andreas; Taus, Thomas; Zehetmayer, Sonja
2018-01-01
Global hypothesis tests are a useful tool in the context of clinical trials, genetic studies, or meta-analyses, when researchers are not interested in testing individual hypotheses, but in testing whether none of the hypotheses is false. There are several possibilities how to test the global null hypothesis when the individual null hypotheses are independent. If it is assumed that many of the individual null hypotheses are false, combination tests have been recommended to maximize power. If, however, it is assumed that only one or a few null hypotheses are false, global tests based on individual test statistics are more powerful (e.g. Bonferroni or Simes test). However, usually there is no a priori knowledge on the number of false individual null hypotheses. We therefore propose an omnibus test based on cumulative sums of the transformed p-values. We show that this test yields an impressive overall performance. The proposed method is implemented in an R-package called omnibus.
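For context, the sketch below contrasts two classical global-null tests mentioned in the abstract (a Bonferroni-type test and Fisher's combination test) on a handful of hypothetical independent p-values; it is not the authors' cumulative-sum omnibus test or their R package.

# Two classical ways to test the global null that all individual nulls are true.
import numpy as np
from scipy import stats

p = np.array([0.21, 0.04, 0.33, 0.008, 0.47])   # hypothetical independent p-values

# Bonferroni-type global test: powerful when only one or a few nulls are false
p_bonferroni = min(1.0, p.size * p.min())

# Fisher's combination test: powerful when many nulls are slightly false
chi2_stat = -2 * np.sum(np.log(p))
p_fisher = stats.chi2.sf(chi2_stat, df=2 * p.size)

print(f"Bonferroni global p = {p_bonferroni:.4f}, Fisher combination p = {p_fisher:.4f}")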
ERIC Educational Resources Information Center
Trafimow, David
2017-01-01
There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a…
Null but not void: considerations for hypothesis testing.
Shaw, Pamela A; Proschan, Michael A
2013-01-30
Standard statistical theory teaches us that once the null and alternative hypotheses have been defined for a parameter, the choice of the statistical test is clear. Standard theory does not teach us how to choose the null or alternative hypothesis appropriate to the scientific question of interest. Neither does it tell us that in some cases, depending on which alternatives are realistic, we may want to define our null hypothesis differently. Problems in statistical practice are frequently not as pristinely summarized as the classic theory in our textbooks. In this article, we present examples in statistical hypothesis testing in which seemingly simple choices are in fact rich with nuance that, when given full consideration, make the choice of the right hypothesis test much less straightforward. Published 2012. This article is a US Government work and is in the public domain in the USA.
Chiba, Yasutaka
2017-09-01
Fisher's exact test is commonly used to compare two groups when the outcome is binary in randomized trials. In the context of causal inference, this test explores the sharp causal null hypothesis (i.e. the causal effect of treatment is the same for all subjects), but not the weak causal null hypothesis (i.e. the causal risks are the same in the two groups). Therefore, in general, rejection of the null hypothesis by Fisher's exact test does not mean that the causal risk difference is not zero. Recently, Chiba (Journal of Biometrics and Biostatistics 2015; 6: 244) developed a new exact test for the weak causal null hypothesis when the outcome is binary in randomized trials; the new test is not based on any large sample theory and does not require any assumption. In this paper, we extend the new test; we create a version of the test applicable to a stratified analysis. The stratified exact test that we propose is general in nature and can be used in several approaches toward the estimation of treatment effects after adjusting for stratification factors. The stratified Fisher's exact test of Jung (Biometrical Journal 2014; 56: 129-140) tests the sharp causal null hypothesis. This test applies a crude estimator of the treatment effect and can be regarded as a special case of our proposed exact test. Our proposed stratified exact test can be straightforwardly extended to analysis of noninferiority trials and to construct the associated confidence interval. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
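For reference, a minimal example of the standard (unstratified) Fisher's exact test that the abstract takes as its starting point; the 2x2 counts are invented, and this is not the proposed stratified exact test for the weak causal null hypothesis.

# Standard Fisher's exact test on a hypothetical 2x2 table.
from scipy.stats import fisher_exact

#                events  non-events
table = [[12,      18],      # treatment arm
         [ 5,      25]]      # control arm
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")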
Bayes factor and posterior probability: Complementary statistical evidence to p-value.
Lin, Ruitao; Yin, Guosheng
2015-09-01
As a convention, a p-value is often computed in hypothesis testing and compared with the nominal level of 0.05 to determine whether to reject the null hypothesis. Although the smaller the p-value, the more significant the statistical test, it is difficult to perceive the p-value in a probability scale and quantify it as the strength of the data against the null hypothesis. In contrast, the Bayesian posterior probability of the null hypothesis has an explicit interpretation of how strong the data support the null. We make a comparison of the p-value and the posterior probability by considering a recent clinical trial. The results show that even when we reject the null hypothesis, there is still a substantial probability (around 20%) that the null is true. Not only should we examine whether the data would have rarely occurred under the null hypothesis, but we also need to know whether the data would be rare under the alternative. As a result, the p-value only provides one side of the information, for which the Bayes factor and posterior probability may offer complementary evidence. Copyright © 2015 Elsevier Inc. All rights reserved.
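A rough way to see the abstract's point is to convert a p value into a bound on the posterior probability of the null. The sketch below uses the Sellke-Bayarri-Berger lower bound on the Bayes factor in favor of H0, -e p ln(p), together with an assumed 50:50 prior; the numbers are illustrative and are not the trial data analyzed in the paper.

# Convert a p value into a lower bound on Pr(H0 | data), assuming 50:50 prior odds.
import math

p = 0.03
prior_prob_h0 = 0.5                      # assumed prior probability of the null
bf01_bound = -math.e * p * math.log(p)   # bound on BF(H0 vs H1), valid for p < 1/e
prior_odds = prior_prob_h0 / (1 - prior_prob_h0)
post_odds = bf01_bound * prior_odds
post_prob_h0 = post_odds / (1 + post_odds)
print(f"p = {p}: lower bound on Pr(H0 | data) is about {post_prob_h0:.2f}")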
ERIC Educational Resources Information Center
Wilcox, Rand R.; Serang, Sarfaraz
2017-01-01
The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…
Hypothesis Testing Using Spatially Dependent Heavy Tailed Multisensor Data
2014-12-01
Report snippets: ...consistent with the null hypothesis of linearity and can be used to estimate the distribution of a test statistic that can discriminate between the null... Figure caption: Test for nonlinearity; the histogram is generated using the surrogate data, and the statistic of the original time series is represented by the solid line.
The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing.
Lash, Timothy L
2017-09-15
In the last few years, stakeholders in the scientific community have raised alarms about a perceived lack of reproducibility of scientific results. In reaction, guidelines for journals have been promulgated and grant applicants have been asked to address the rigor and reproducibility of their proposed projects. Neither solution addresses a primary culprit, which is the culture of null hypothesis significance testing that dominates statistical analysis and inference. In an innovative research enterprise, selection of results for further evaluation based on null hypothesis significance testing is doomed to yield a low proportion of reproducible results and a high proportion of effects that are initially overestimated. In addition, the culture of null hypothesis significance testing discourages quantitative adjustments to account for systematic errors and quantitative incorporation of prior information. These strategies would otherwise improve reproducibility and have not been previously proposed in the widely cited literature on this topic. Without discarding the culture of null hypothesis significance testing and implementing these alternative methods for statistical analysis and inference, all other strategies for improving reproducibility will yield marginal gains at best. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Marmolejo-Ramos, Fernando; Cousineau, Denis
2017-01-01
The number of articles showing dissatisfaction with the null hypothesis statistical testing (NHST) framework has been progressively increasing over the years. Alternatives to NHST have been proposed and the Bayesian approach seems to have achieved the highest amount of visibility. In this last part of the special issue, a few alternative…
An Extension of RSS-based Model Comparison Tests for Weighted Least Squares
2012-08-22
use the model comparison test statistic to analyze the null hypothesis. Under the null hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS^H) = 10.3040 × 10^6. Under the alternative hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS) = 8.8394 × 10^6. Thus the model
Bundschuh, Mirco; Newman, Michael C; Zubrod, Jochen P; Seitz, Frank; Rosenfeldt, Ricki R; Schulz, Ralf
2015-03-01
We argued recently that the positive predictive value (PPV) and the negative predictive value (NPV) are valuable metrics to include during null hypothesis significance testing: They inform the researcher about the probability of statistically significant and non-significant test outcomes actually being true. Although commonly misunderstood, a reported p value estimates only the probability of obtaining the results or more extreme results if the null hypothesis of no effect was true. Calculations of the more informative PPV and NPV require a priori estimate of the probability (R). The present document discusses challenges of estimating R.
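The sketch below shows how PPV and NPV can be computed from alpha, power, and an assumed prior odds R that a real effect exists, following the standard formulas; the values of alpha, power, and R are illustrative assumptions.

# PPV and NPV of a significance test, given alpha, power, and prior odds R.
def ppv(alpha, power, R):
    """Probability that a statistically significant result is a true effect."""
    return power * R / (power * R + alpha)

def npv(alpha, power, R):
    """Probability that a non-significant result reflects a true null."""
    beta = 1 - power
    return (1 - alpha) / ((1 - alpha) + beta * R)

for R in (0.1, 0.5, 1.0):   # assumed prior odds that a real effect exists
    print(f"R = {R}: PPV = {ppv(0.05, 0.8, R):.2f}, NPV = {npv(0.05, 0.8, R):.2f}")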
Killeen's (2005) "p[subscript rep]" Coefficient: Logical and Mathematical Problems
ERIC Educational Resources Information Center
Maraun, Michael; Gabriel, Stephanie
2010-01-01
In his article, "An Alternative to Null-Hypothesis Significance Tests," Killeen (2005) urged the discipline to abandon the practice of "p[subscript obs]"-based null hypothesis testing and to quantify the signal-to-noise characteristics of experimental outcomes with replication probabilities. He described the coefficient that he…
ERIC Educational Resources Information Center
Tryon, Warren W.; Lewis, Charles
2008-01-01
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis and failing to reject H[subscript 0] is not evidence…
The Need for Nuance in the Null Hypothesis Significance Testing Debate
ERIC Educational Resources Information Center
Häggström, Olle
2017-01-01
Null hypothesis significance testing (NHST) provides an important statistical toolbox, but there are a number of ways in which it is often abused and misinterpreted, with bad consequences for the reliability and progress of science. Parts of contemporary NHST debate, especially in the psychological sciences, is reviewed, and a suggestion is made…
Shaping Up the Practice of Null Hypothesis Significance Testing.
ERIC Educational Resources Information Center
Wainer, Howard; Robinson, Daniel H.
2003-01-01
Discusses criticisms of null hypothesis significance testing (NHST), suggesting that historical use of NHST was reasonable, and current users should read Sir Ronald Fisher's applied work. Notes that modifications to NHST and interpretations of its outcomes might better suit the needs of modern science. Concludes that NHST is most often useful as…
Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing
ERIC Educational Resources Information Center
García-Pérez, Miguel A.
2017-01-01
Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…
The Impact of Economic Factors and Acquisition Reforms on the Cost of Defense Weapon Systems
2006-03-01
test for homoskedasticity, the Breusch-Pagan test is employed. The null hypothesis of the Breusch-Pagan test is that the error variance is constant... made. Using the Breusch-Pagan test shown in Table 19 below, the prob>chi2 is greater than α = .05; therefore we fail to reject the null hypothesis... Table residue: Breusch-Pagan test (Ho = Constant Variance), estimated results (variance, standard deviation) for overrunpercentfp100/overrunpercent100.
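For readers unfamiliar with the test referred to in the excerpt, the following is a small, self-contained Breusch-Pagan check on simulated heteroskedastic data using statsmodels; the report's own regression variables are not available here.

# Breusch-Pagan test for heteroskedasticity on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1 + 0.3 * x)   # error variance grows with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)

# Null hypothesis: homoskedastic (constant) error variance
print(f"Breusch-Pagan LM p-value = {lm_pvalue:.4f}")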
Concerns regarding a call for pluralism of information theory and hypothesis testing
Lukacs, P.M.; Thompson, W.L.; Kendall, W.L.; Gould, W.R.; Doherty, P.F.; Burnham, K.P.; Anderson, D.R.
2007-01-01
1. Stephens et al. (2005) argue for 'pluralism' in statistical analysis, combining null hypothesis testing and information-theoretic (I-T) methods. We show that I-T methods are more informative even in single variable problems and we provide an ecological example. 2. I-T methods allow inferences to be made from multiple models simultaneously. We believe multimodel inference is the future of data analysis, which cannot be achieved with null hypothesis-testing approaches. 3. We argue for a stronger emphasis on critical thinking in science in general and less reliance on exploratory data analysis and data dredging. Deriving alternative hypotheses is central to science; deriving a single interesting science hypothesis and then comparing it to a default null hypothesis (e.g. 'no difference') is not an efficient strategy for gaining knowledge. We think this single-hypothesis strategy has been relied upon too often in the past. 4. We clarify misconceptions presented by Stephens et al. (2005). 5. We think inference should be made about models, directly linked to scientific hypotheses, and their parameters conditioned on data, Prob(Hj | data). I-T methods provide a basis for this inference. Null hypothesis testing merely provides a probability statement about the data conditioned on a null model, Prob(data | H0). 6. Synthesis and applications. I-T methods provide a more informative approach to inference. I-T methods provide a direct measure of evidence for or against hypotheses and a means to consider simultaneously multiple hypotheses as a basis for rigorous inference. Progress in our science can be accelerated if modern methods can be used intelligently; this includes various I-T and Bayesian methods.
Suggestions for presenting the results of data analyses
Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.
2001-01-01
We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on non-testing aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.
Testing of Hypothesis in Equivalence and Non Inferiority Trials-A Concept.
Juneja, Atul; Aggarwal, Abha R; Adhikari, Tulsi; Pandey, Arvind
2016-04-01
Establishing the appropriate hypothesis is one of the important steps in carrying out statistical tests and analyses, and understanding it is important for interpreting the results of statistical analysis. The current communication presents the concept of hypothesis testing in non-inferiority and equivalence trials, where the null hypothesis is the reverse of that set up for conventional superiority trials. As in superiority trials, the null hypothesis is the one the researcher seeks to reject in order to establish what he or she intends to prove. It is important to note that equivalence or non-inferiority cannot be demonstrated by accepting the null hypothesis of no difference. Hence, establishing the appropriate statistical hypothesis is extremely important for arriving at meaningful conclusions for the stated research objectives.
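A minimal numerical sketch of the reversed hypothesis setup described above, for a non-inferiority comparison of two means: the null hypothesis is that the new treatment is worse than the standard by at least the margin M, and rejecting it supports non-inferiority. The data, the margin, and the simple pooled-degrees-of-freedom approximation are assumptions for illustration.

# One-sided non-inferiority test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
standard = rng.normal(0.70, 0.10, size=80)    # e.g. response scores, standard arm
new = rng.normal(0.68, 0.10, size=80)         # e.g. response scores, new arm
M = 0.05                                      # pre-specified non-inferiority margin

diff = new.mean() - standard.mean()
se = np.sqrt(new.var(ddof=1) / new.size + standard.var(ddof=1) / standard.size)
t_stat = (diff + M) / se                      # tests H0: diff <= -M
df = new.size + standard.size - 2             # simple approximation for the sketch
p_one_sided = 1 - stats.t.cdf(t_stat, df)
print(f"difference = {diff:.3f}, one-sided p (H0: diff <= -M) = {p_one_sided:.4f}")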
Optimizing Aircraft Availability: Where to Spend Your Next O&M Dollar
2010-03-01
patterns of variance are present. In addition, we use the Breusch-Pagan test to statistically determine whether homoscedasticity exists. For this... Breusch-Pagan test, large p-values are preferred so that we may accept the null hypothesis of normality. Failure to meet the fourth assumption is... Next, we show the residual-by-predicted plot and the Breusch-Pagan test for constant variance of the residuals. The null hypothesis is that the
In the Beginning-There Is the Introduction-and Your Study Hypothesis.
Vetter, Thomas R; Mascha, Edward J
2017-05-01
Writing a manuscript for a medical journal is very akin to writing a newspaper article, albeit a scholarly one. Like any journalist, you have a story to tell. You need to tell your story in a way that is easy to follow and makes a compelling case to the reader. Although recommended since the beginning of the 20th century, the conventional Introduction-Methods-Results-And-Discussion (IMRAD) scientific reporting structure has only been the standard since the 1980s. The Introduction should be focused and succinct in communicating the significance, background, rationale, study aims or objectives, and the primary (and secondary, if appropriate) study hypotheses. Hypothesis testing involves posing both a null and an alternative hypothesis. The null hypothesis proposes that no difference or association exists on the outcome variable of interest between the interventions or groups being compared. The alternative hypothesis is the opposite of the null hypothesis and thus typically proposes that a difference in the population does exist between the groups being compared on the parameter of interest. Most investigators seek to reject the null hypothesis because of their expectation that the studied intervention does result in a difference between the study groups or that the association of interest does exist. Therefore, in most clinical and basic science studies and manuscripts, the alternative hypothesis is stated, not the null hypothesis. Also, in the Introduction, the alternative hypothesis is typically stated in the direction of interest, or the expected direction. However, when assessing the association of interest, researchers typically look in both directions (ie, favoring 1 group or the other) by conducting a 2-tailed statistical test because the true direction of the effect is typically not known, and either direction would be important to report.
ERIC Educational Resources Information Center
Paek, Insu
2010-01-01
Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level [alpha] for the decision on an item with respect to DIF. The standard MH…
ERIC Educational Resources Information Center
LeMire, Steven D.
2010-01-01
This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…
Explorations in Statistics: Power
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2010-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fifth installment of "Explorations in Statistics" revisits power, a concept fundamental to the test of a null hypothesis. Power is the probability that we reject the null hypothesis when it is false. Four…
What Constitutes Science and Scientific Evidence: Roles of Null Hypothesis Testing
ERIC Educational Resources Information Center
Chang, Mark
2017-01-01
We briefly discuss the philosophical basis of science, causality, and scientific evidence, by introducing the hidden but most fundamental principle of science: the similarity principle. The principle's use in scientific discovery is illustrated with Simpson's paradox and other examples. In discussing the value of null hypothesis statistical…
Unicorns do exist: a tutorial on "proving" the null hypothesis.
Streiner, David L
2003-12-01
Introductory statistics classes teach us that we can never prove the null hypothesis; all we can do is reject or fail to reject it. However, there are times when it is necessary to try to prove the nonexistence of a difference between groups. This most often happens within the context of comparing a new treatment against an established one and showing that the new intervention is not inferior to the standard. This article first outlines the logic of "noninferiority" testing by differentiating between the null hypothesis (that which we are trying to nullify) and the "nil" hypothesis (there is no difference), reversing the role of the null and alternate hypotheses, and defining an interval within which groups are said to be equivalent. We then work through an example and show how to calculate sample sizes for noninferiority studies.
Hemenway, David
2009-09-01
Hypothesis testing can be misused and misinterpreted in various ways. Limitations in the research design, for example, can make it almost impossible to reject the null hypothesis that a policy has no effect. This article discusses two examples of such experimental designs and analyses, in which, unfortunately, the researchers touted their null results as strong evidence of no effect.
2011-03-01
Levene's test output residue: Tests the null hypothesis that the error variance of the dependent variable is equal across groups. a. Design: Intercept... design also limited the number of intended treatments. The experimental design originally was supposed to test all three adverse events that threaten
Unadjusted Bivariate Two-Group Comparisons: When Simpler is Better.
Vetter, Thomas R; Mascha, Edward J
2018-01-01
Hypothesis testing involves posing both a null hypothesis and an alternative hypothesis. This basic statistical tutorial discusses the appropriate use, including their so-called assumptions, of the common unadjusted bivariate tests for hypothesis testing and thus comparing study sample data for a difference or association. The appropriate choice of a statistical test is predicated on the type of data being analyzed and compared. The unpaired or independent samples t test is used to test the null hypothesis that the 2 population means are equal against the alternative hypothesis that the 2 population means are not equal. The unpaired t test is intended for comparing independent continuous (interval or ratio) data from 2 study groups. A common mistake is to apply several unpaired t tests when comparing data from 3 or more study groups. In this situation, an analysis of variance with post hoc (posttest) intragroup comparisons should instead be applied. Another common mistake is to apply a series of unpaired t tests when comparing sequentially collected data from 2 study groups. In this situation, a repeated-measures analysis of variance, with tests for group-by-time interaction, and post hoc comparisons, as appropriate, should instead be applied in analyzing data from sequential collection points. The paired t test is used to assess the difference in the means of 2 study groups when the sample observations have been obtained in pairs, often before and after an intervention in each study subject. The Pearson chi-square test is widely used to test the null hypothesis that 2 unpaired categorical variables, each with 2 or more nominal levels (values), are independent of each other. When the null hypothesis is rejected, one concludes that there is a probable association between the 2 unpaired categorical variables. When comparing 2 groups on an ordinal or nonnormally distributed continuous outcome variable, the 2-sample t test is usually not appropriate. The Wilcoxon-Mann-Whitney test is instead preferred. When making paired comparisons on data that are ordinal, or continuous but nonnormally distributed, the Wilcoxon signed-rank test can be used. In analyzing their data, researchers should consider the continued merits of these simple yet equally valid unadjusted bivariate statistical tests. However, the appropriate use of an unadjusted bivariate test still requires a solid understanding of its utility, assumptions (requirements), and limitations. This understanding will mitigate the risk of misleading findings, interpretations, and conclusions.
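As a small companion to the paired comparisons described above, the sketch below runs a paired t test and its nonparametric counterpart, the Wilcoxon signed-rank test, on simulated before/after data.

# Paired comparisons on simulated before/after measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
before = rng.normal(100, 10, size=30)
after = before - rng.normal(4, 6, size=30)   # intervention lowers values on average

t_res = stats.ttest_rel(before, after)
w_res = stats.wilcoxon(before, after)
print(f"paired t test p = {t_res.pvalue:.4f}, Wilcoxon signed-rank p = {w_res.pvalue:.4f}")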
Shi, Haolun; Yin, Guosheng
2018-02-21
Simon's two-stage design is one of the most commonly used methods in phase II clinical trials with binary endpoints. The design tests the null hypothesis that the response rate is less than an uninteresting level, versus the alternative hypothesis that the response rate is greater than a desirable target level. From a Bayesian perspective, we compute the posterior probabilities of the null and alternative hypotheses given that a promising result is declared in Simon's design. Our study reveals that because the frequentist hypothesis testing framework places its focus on the null hypothesis, a potentially efficacious treatment identified by rejecting the null under Simon's design could have only less than 10% posterior probability of attaining the desirable target level. Due to the indifference region between the null and alternative, rejecting the null does not necessarily mean that the drug achieves the desirable response level. To clarify such ambiguity, we propose a Bayesian enhancement two-stage (BET) design, which guarantees a high posterior probability of the response rate reaching the target level, while allowing for early termination and sample size saving in case that the drug's response rate is smaller than the clinically uninteresting level. Moreover, the BET design can be naturally adapted to accommodate survival endpoints. We conduct extensive simulation studies to examine the empirical performance of our design and present two trial examples as applications. © 2018, The International Biometric Society.
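The following sketch illustrates the kind of calculation the abstract describes: for an assumed two-stage design (the design parameters n1, r1, n, r and the Jeffreys prior below are illustrative choices, not the paper's), it computes the frequentist probability of declaring the drug promising and a Beta-posterior probability that the response rate exceeds the target level.

# Operating characteristics of an assumed two-stage design, plus a Beta posterior.
from scipy.stats import binom, beta

n1, r1 = 13, 3     # stage 1: stop early if <= r1 responses in n1 patients
n, r = 27, 9       # overall: declare "promising" if > r responses in n patients
p0, p1 = 0.20, 0.40

def prob_reject(p):
    """Probability the design declares the drug promising when the true rate is p."""
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):                    # continue past stage 1
        total += binom.pmf(x1, n1, p) * binom.sf(r - x1, n - n1, p)
    return total

print(f"type I error ~ {prob_reject(p0):.3f}, power ~ {prob_reject(p1):.3f}")

# Posterior probability that the response rate exceeds p1, given a borderline result
x_obs = r + 1                                           # just enough to reject H0
post = beta(0.5 + x_obs, 0.5 + n - x_obs)               # Jeffreys prior (assumed)
print(f"Pr(rate > p1 | {x_obs}/{n} responses) = {post.sf(p1):.3f}")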
Hypothesis Testing in the Real World
ERIC Educational Resources Information Center
Miller, Jeff
2017-01-01
Critics of null hypothesis significance testing suggest that (a) its basic logic is invalid and (b) it addresses a question that is of no interest. In contrast to (a), I argue that the underlying logic of hypothesis testing is actually extremely straightforward and compelling. To substantiate that, I present examples showing that hypothesis…
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
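A bare-bones version of the minimum p-value idea, with invented data: two candidate statistics (t test and Wilcoxon-Mann-Whitney) are combined by taking the smaller p-value, and that minimum is calibrated by permuting group labels so the type I error rate stays at its nominal level.

# Permutation-calibrated minimum p-value over two candidate test statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, size=25)
b = rng.normal(0.6, 1.0, size=25)

def min_p(x, y):
    return min(stats.ttest_ind(x, y).pvalue,
               stats.mannwhitneyu(x, y, alternative="two-sided").pvalue)

observed = min_p(a, b)
pooled = np.concatenate([a, b])
count = 0
n_perm = 2000
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    if min_p(perm[:a.size], perm[a.size:]) <= observed:
        count += 1
p_adjusted = (count + 1) / (n_perm + 1)
print(f"min p = {observed:.4f}, permutation-adjusted p = {p_adjusted:.4f}")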
When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment.
Szucs, Denes; Ioannidis, John P A
2017-01-01
Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology, and biomedical science in general. We review these shortcomings and suggest that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak we should not rely on all or nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published. The current statistics lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out.
A large scale test of the gaming-enhancement hypothesis.
Przybylski, Andrew K; Wang, John C
2016-01-01
A growing research literature suggests that regular electronic game play and game-based training programs may confer practically significant benefits to cognitive functioning. Most evidence supporting this idea, the gaming-enhancement hypothesis , has been collected in small-scale studies of university students and older adults. This research investigated the hypothesis in a general way with a large sample of 1,847 school-aged children. Our aim was to examine the relations between young people's gaming experiences and an objective test of reasoning performance. Using a Bayesian hypothesis testing approach, evidence for the gaming-enhancement and null hypotheses were compared. Results provided no substantive evidence supporting the idea that having preference for or regularly playing commercially available games was positively associated with reasoning ability. Evidence ranged from equivocal to very strong in support for the null hypothesis over what was predicted. The discussion focuses on the value of Bayesian hypothesis testing for investigating electronic gaming effects, the importance of open science practices, and pre-registered designs to improve the quality of future work.
Power Enhancement in High Dimensional Cross-Sectional Tests
Fan, Jianqing; Liao, Yuan; Yao, Jiawei
2016-01-01
We propose a novel technique to boost the power of testing a high-dimensional vector H0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846
Bayesian Methods for Determining the Importance of Effects
USDA-ARS?s Scientific Manuscript database
Criticisms have plagued the frequentist null-hypothesis significance testing (NHST) procedure since the day it was created from the Fisher Significance Test and Hypothesis Test of Jerzy Neyman and Egon Pearson. Alternatives to NHST exist in frequentist statistics, but competing methods are also avai...
Testing for purchasing power parity in the long-run for ASEAN-5
NASA Astrophysics Data System (ADS)
Choji, Niri Martha; Sek, Siok Kun
2017-04-01
For more than a decade, there has been substantial interest in empirically testing the validity of the purchasing power parity (PPP) hypothesis. This paper tests for long-run relative purchasing power parity for a group of ASEAN-5 countries over the period 1996-2016 using monthly data. For this purpose, we used the Pedroni co-integration method to test the long-run hypothesis of purchasing power parity. We first tested for the stationarity of the variables and found that the variables are non-stationary in levels but stationary in first differences. Results of the Pedroni test rejected the null hypothesis of no co-integration, meaning that there is enough evidence to support PPP in the long run for the ASEAN-5 countries over the period 1996-2016. In other words, the rejection of the null hypothesis implies a long-run relation between nominal exchange rates and relative prices.
Biostatistics Series Module 2: Overview of Hypothesis Testing.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Hypothesis testing (or statistical inference) is one of the major applications of biostatistics. Much of medical research begins with a research question that can be framed as a hypothesis. Inferential statistics begins with a null hypothesis that reflects the conservative position of no change or no difference in comparison to baseline or between groups. Usually, the researcher has reason to believe that there is some effect or some difference which is the alternative hypothesis. The researcher therefore proceeds to study samples and measure outcomes in the hope of generating evidence strong enough for the statistician to be able to reject the null hypothesis. The concept of the P value is almost universally used in hypothesis testing. It denotes the probability of obtaining by chance a result at least as extreme as that observed, even when the null hypothesis is true and no real difference exists. Usually, if P is < 0.05 the null hypothesis is rejected and sample results are deemed statistically significant. With the increasing availability of computers and access to specialized statistical software, the drudgery involved in statistical calculations is now a thing of the past, once the learning curve of the software has been traversed. The life sciences researcher is therefore free to devote oneself to optimally designing the study, carefully selecting the hypothesis tests to be applied, and taking care in conducting the study well. Unfortunately, selecting the right test seems difficult initially. Thinking of the research hypothesis as addressing one of five generic research questions helps in selection of the right hypothesis test. In addition, it is important to be clear about the nature of the variables (e.g., numerical vs. categorical; parametric vs. nonparametric) and the number of groups or data sets being compared (e.g., two or more than two) at a time. The same research question may be explored by more than one type of hypothesis test. While this may be of utility in highlighting different aspects of the problem, merely reapplying different tests to the same issue in the hope of finding a P < 0.05 is a wrong use of statistics. Finally, it is becoming the norm that an estimate of the size of any effect, expressed with its 95% confidence interval, is required for meaningful interpretation of results. A large study is likely to have a small (and therefore "statistically significant") P value, but a "real" estimate of the effect would be provided by the 95% confidence interval. If the intervals overlap between two interventions, then the difference between them is not so clear-cut even if P < 0.05. The two approaches are now considered complementary to one another.
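The closing recommendation above (report the effect size with its 95% confidence interval alongside the p value) looks like this in a minimal two-group example with simulated data.

# Report the p value together with the effect estimate and its 95% CI.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
group_a = rng.normal(120, 15, size=40)   # e.g. systolic BP under treatment A
group_b = rng.normal(126, 15, size=40)   # e.g. systolic BP under treatment B

res = stats.ttest_ind(group_a, group_b)          # pooled-variance t test
diff = group_a.mean() - group_b.mean()
df = group_a.size + group_b.size - 2
sp2 = ((group_a.size - 1) * group_a.var(ddof=1) +
       (group_b.size - 1) * group_b.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / group_a.size + 1 / group_b.size))
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se

print(f"p = {res.pvalue:.4f}, difference = {diff:.1f} "
      f"(95% CI {ci[0]:.1f} to {ci[1]:.1f})")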
The Importance of Teaching Power in Statistical Hypothesis Testing
ERIC Educational Resources Information Center
Olinsky, Alan; Schumacher, Phyllis; Quinn, John
2012-01-01
In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…
Huang, Peng; Ou, Ai-hua; Piantadosi, Steven; Tan, Ming
2014-11-01
We discuss the problem of properly defining treatment superiority through the specification of hypotheses in clinical trials. The need to precisely define the notion of superiority in a one-sided hypothesis test problem has been well recognized by many authors. Ideally, the null and alternative hypotheses should correspond to a partition of all possible scenarios of underlying true probability models P={P(ω):ω∈Ω}, such that the alternative hypothesis Ha={P(ω):ω∈Ωa} can be inferred upon the rejection of the null hypothesis Ho={P(ω):ω∈Ωo}. However, in many cases, tests are carried out and recommendations are made without a precise definition of superiority or a specification of the alternative hypothesis. Moreover, in some applications, the union of probability models specified by the chosen null and alternative hypotheses does not constitute the complete model collection P (i.e., Ho∪Ha is smaller than P). This not only imposes a strong, unvalidated assumption about the underlying true models, but also leads to different superiority claims depending on which test is used, rather than on scientific plausibility. Different ways to partition P for testing treatment superiority often have different implications for sample size, power, and significance in both efficacy and comparative effectiveness trial design. Such differences are often overlooked. We provide a theoretical framework for evaluating the statistical properties of different specifications of superiority in typical hypothesis testing. This can help investigators select proper hypotheses for treatment comparison in clinical trial design. Copyright © 2014 Elsevier Inc. All rights reserved.
To P or Not to P: Backing Bayesian Statistics.
Buchinsky, Farrel J; Chadha, Neil K
2017-12-01
In biomedical research, it is imperative to differentiate chance variation from truth before we generalize what we see in a sample of subjects to the wider population. For decades, we have relied on null hypothesis significance testing, where we calculate P values for our data to decide whether to reject a null hypothesis. This methodology is subject to substantial misinterpretation and errant conclusions. Instead of working backward by calculating the probability of our data if the null hypothesis were true, Bayesian statistics allow us instead to work forward, calculating the probability of our hypothesis given the available data. This methodology gives us a mathematical means of incorporating our "prior probabilities" from previous study data (if any) to produce new "posterior probabilities." Bayesian statistics tell us how confidently we should believe what we believe. It is time to embrace and encourage their use in our otolaryngology research.
The frequentist implications of optional stopping on Bayesian hypothesis tests.
Sanborn, Adam N; Hills, Thomas T
2014-04-01
Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite (taking multiple parameter values), such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
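The frequentist side of the optional-stopping problem described above is easy to demonstrate by simulation; the sketch below repeatedly peeks at accumulating null data and stops at the first p < .05, which inflates the false-positive rate well above the nominal 5% (the look schedule and sample sizes are arbitrary).

# Optional stopping under NHST: peek every few observations, stop at first p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n_max, look_every = 2000, 100, 10
false_positives = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, size=n_max)   # the null is true: the mean really is 0
    for n in range(look_every, n_max + 1, look_every):
        if stats.ttest_1samp(x[:n], 0.0).pvalue < 0.05:
            false_positives += 1
            break
print(f"false-positive rate with optional stopping ~ {false_positives / n_sims:.3f}")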
Saraf, Sanatan; Mathew, Thomas; Roy, Anindya
2015-01-01
For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small sample asymptotics, and their performances are numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.
Testing for nonlinearity in time series: The method of surrogate data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, J.; Galdrikian, B.; Longtin, A.
1991-01-01
We describe a statistical approach for identifying nonlinearity in time series; in particular, we want to avoid claims of chaos when simpler models (such as linearly correlated noise) can explain the data. The method requires a careful statement of the null hypothesis which characterizes a candidate linear process, the generation of an ensemble of "surrogate" data sets which are similar to the original time series but consistent with the null hypothesis, and the computation of a discriminating statistic for the original and for each of the surrogate data sets. The idea is to test the original time series against the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets. We present algorithms for generating surrogate data under various null hypotheses, and we show the results of numerical experiments on artificial data using correlation dimension, Lyapunov exponent, and forecasting error as discriminating statistics. Finally, we consider a number of experimental time series -- including sunspots, electroencephalogram (EEG) signals, and fluid convection -- and evaluate the statistical significance of the evidence for nonlinear structure in each case. 56 refs., 8 figs.
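A minimal phase-randomization version of the surrogate-data procedure sketched in the abstract follows; it uses a toy series and a single simple discriminating statistic (time-reversal asymmetry), and it is not the authors' full set of algorithms.

# Phase-randomized surrogates: same power spectrum, consistent with a linear null.
import numpy as np

rng = np.random.default_rng(8)
n = 512
x = np.cumsum(rng.normal(size=n))          # toy time series

def phase_randomized_surrogate(series, rng):
    spectrum = np.fft.rfft(series)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.size)
    new_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    new_spectrum[0] = spectrum[0]          # preserve the mean (DC component)
    return np.fft.irfft(new_spectrum, n=series.size)

def discriminating_statistic(series):
    # time-reversal asymmetry, a simple nonlinearity-sensitive statistic
    return np.mean((series[1:] - series[:-1]) ** 3)

stat_obs = discriminating_statistic(x)
stats_surr = [discriminating_statistic(phase_randomized_surrogate(x, rng))
              for _ in range(200)]
z = (stat_obs - np.mean(stats_surr)) / np.std(stats_surr)
print(f"observed statistic deviates from surrogates by {z:.2f} standard deviations")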
Orr, H A
1998-01-01
Evolutionary biologists have long sought a way to determine whether a phenotypic difference between two taxa was caused by natural selection or random genetic drift. Here I argue that data from quantitative trait locus (QTL) analyses can be used to test the null hypothesis of neutral phenotypic evolution. I propose a sign test that compares the observed number of plus and minus alleles in the "high line" with that expected under neutrality, conditioning on the known phenotypic difference between the taxa. Rejection of the null hypothesis implies a role for directional natural selection. This test is applicable to any character in any organism in which QTL analysis can be performed. PMID:9691061
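A simplified version of the sign-test idea (ignoring the conditioning on the observed phenotypic difference that the proposed test uses) compares the number of "plus" alleles fixed in the high line with a Binomial(n, 1/2) null; the QTL counts below are hypothetical.

# Sign-test flavour of the neutrality check, using a binomial test.
from scipy.stats import binomtest

n_qtl = 12          # hypothetical number of detected QTL
n_plus = 11         # hypothetical number whose trait-increasing allele is in the high line
result = binomtest(n_plus, n_qtl, p=0.5, alternative="greater")
print(f"P(neutrality gives >= {n_plus} plus alleles out of {n_qtl}) = {result.pvalue:.4f}")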
Explorations in Statistics: Hypothesis Tests and P Values
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of "Explorations in Statistics" delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what…
Hypothesis testing of a change point during cognitive decline among Alzheimer's disease patients.
Ji, Ming; Xiong, Chengjie; Grundman, Michael
2003-10-01
In this paper, we present a statistical hypothesis test for detecting a change point over the course of cognitive decline among Alzheimer's disease patients. The model under the null hypothesis assumes a constant rate of cognitive decline over time and the model under the alternative hypothesis is a general bilinear model with an unknown change point. When the change point is unknown, however, the null distribution of the test statistic is not analytically tractable and has to be simulated by parametric bootstrap. When the alternative hypothesis that a change point exists is accepted, we propose an estimate of its location based on Akaike's Information Criterion. We applied our method to a data set from the Neuropsychological Database Initiative, analyzing Mini-Mental State Exam (MMSE) scores with a random-slope and random-intercept model with a bilinear fixed effect. Our result shows that, despite a large amount of missing data, accelerated decline did occur in MMSE scores among AD patients. Our finding supports the clinical belief of the existence of a change point during cognitive decline among AD patients and suggests the use of change point models for the longitudinal modeling of cognitive decline in AD research.
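A simplified sketch of the parametric-bootstrap step for a single series: a fixed-effects bilinear model rather than the paper's random-effects model, with synthetic data and an assumed grid of candidate change points.

```python
# Parametric bootstrap of a change-point statistic: compare the best bilinear
# fit over a grid of change points against the linear null model.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(0.0, 10.0, 0.25)            # time in years (hypothetical)
y = 28.0 - 0.8 * t - 1.5 * np.maximum(t - 6.0, 0.0) + rng.normal(scale=1.0, size=t.size)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def lr_stat(t, y, grid):
    """Largest relative RSS reduction of a bilinear fit over candidate change points."""
    X0 = np.column_stack([np.ones_like(t), t])
    rss0 = rss(X0, y)
    rss1 = min(rss(np.column_stack([X0, np.maximum(t - c, 0.0)]), y) for c in grid)
    return (rss0 - rss1) / rss0

grid = np.arange(2.0, 8.5, 0.5)
observed = lr_stat(t, y, grid)

# Simulate the null (constant linear decline) from the fitted null model.
X0 = np.column_stack([np.ones_like(t), t])
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
sigma0 = np.sqrt(np.sum((y - X0 @ beta0) ** 2) / (t.size - 2))
null_stats = np.array([
    lr_stat(t, X0 @ beta0 + rng.normal(scale=sigma0, size=t.size), grid)
    for _ in range(500)
])
p = (np.sum(null_stats >= observed) + 1) / (null_stats.size + 1)
print(f"observed statistic {observed:.3f}, bootstrap p ≈ {p:.3f}")
```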
Statistical modeling, detection, and segmentation of stains in digitized fabric images
NASA Astrophysics Data System (ADS)
Gururajan, Arunkumar; Sari-Sarraf, Hamed; Hequet, Eric F.
2007-02-01
This paper describes a novel and automated system, based on a computer vision approach, for the objective evaluation of stain release on cotton fabrics. Digitized color images of the stained fabrics are obtained, and the pixel values in the color and intensity planes of these images are probabilistically modeled as a Gaussian Mixture Model (GMM). Stain detection is posed as a decision-theoretic problem, where the null hypothesis corresponds to the absence of a stain. The null and alternative hypotheses translate into a first-order GMM and a second-order GMM, respectively. The parameters of the GMM are estimated using a modified Expectation-Maximization (EM) algorithm. Minimum Description Length (MDL) is then used as the test statistic to decide whether the null hypothesis holds. The stain is then segmented by a decision rule based on the probability map generated by the EM algorithm. The proposed approach was tested on a dataset of 48 fabric images soiled with stains of ketchup, corn oil, mustard, Ragu sauce, Revlon makeup, and grape juice. The decision-theoretic part of the algorithm produced a correct detection rate (true positive rate) of 93% and a false alarm rate of 5% on this set of images.
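A conceptual sketch of the detection step, with BIC standing in for the paper's MDL criterion and synthetic two-dimensional pixel features standing in for the colour/intensity planes of a fabric image.

```python
# One- versus two-component Gaussian mixture on synthetic pixel features,
# with BIC standing in for the MDL criterion described above.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
background = rng.normal(loc=[0.7, 0.1], scale=0.05, size=(4000, 2))
stain = rng.normal(loc=[0.55, 0.25], scale=0.05, size=(600, 2))
pixels = np.vstack([background, stain])       # stand-in for colour/intensity features

gmm1 = GaussianMixture(n_components=1, random_state=0).fit(pixels)
gmm2 = GaussianMixture(n_components=2, random_state=0).fit(pixels)

if gmm2.bic(pixels) < gmm1.bic(pixels):
    # Segment by assigning pixels to the smaller (stain-like) component.
    stain_comp = int(np.argmin(gmm2.weights_))
    stain_mask = gmm2.predict_proba(pixels)[:, stain_comp] > 0.5
    print(f"stain detected; {stain_mask.sum()} of {pixels.shape[0]} pixels flagged")
else:
    print("no stain detected (one-component model preferred)")
```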
The researcher and the consultant: from testing to probability statements.
Hamra, Ghassan B; Stang, Andreas; Poole, Charles
2015-09-01
In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.
Use of Pearson's Chi-Square for Testing Equality of Percentile Profiles across Multiple Populations.
Johnson, William D; Beyl, Robbie A; Burton, Jeffrey H; Johnson, Callie M; Romer, Jacob E; Zhang, Lei
2015-08-01
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
Statistical power analysis in wildlife research
Steidl, R.J.; Hayes, J.P.
1997-01-01
Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be ≤0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true. We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to be reported, then the α-level, effect sizes, and sample sizes used in calculations must also be reported.
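A minimal example of the prospective (a priori) power analysis recommended here; the effect size, significance level, and target power are hypothetical design-stage choices.

```python
# Prospective power analysis: sample size per group for a two-sample t test,
# using a hypothetical minimum biologically significant effect size.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,   # Cohen's d
                                          alpha=0.05,
                                          power=0.80,
                                          alternative="two-sided")
print(f"required sample size per group: {n_per_group:.1f}")
```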
Hypothesis testing and earthquake prediction.
Jackson, D D
1996-04-30
Requirements for testing include advance specification of the conditional rate density (probability per unit time, area, and magnitude) or, alternatively, probabilities for specified intervals of time, space, and magnitude. Here I consider testing fully specified hypotheses, with no parameter adjustments or arbitrary decisions allowed during the test period. Because it may take decades to validate prediction methods, it is worthwhile to formulate testable hypotheses carefully in advance. Earthquake prediction generally implies that the probability will be temporarily higher than normal. Such a statement requires knowledge of "normal behavior"--that is, it requires a null hypothesis. Hypotheses can be tested in three ways: (i) by comparing the number of actual earthquakes to the number predicted, (ii) by comparing the likelihood score of actual earthquakes to the predicted distribution, and (iii) by comparing the likelihood ratio to that of a null hypothesis. The first two tests are purely self-consistency tests, while the third is a direct comparison of two hypotheses. Predictions made without a statement of probability are very difficult to test, and any test must be based on the ratio of earthquakes in and out of the forecast regions.
Test of association: which one is the most appropriate for my study?
Gonzalez-Chica, David Alejandro; Bastos, João Luiz; Duquia, Rodrigo Pereira; Bonamigo, Renan Rangel; Martínez-Mesa, Jeovany
2015-01-01
Hypothesis tests are statistical tools widely used for assessing whether or not there is an association between two or more variables. These tests provide a probability of the type 1 error (p-value), which is used to accept or reject the null study hypothesis. The aim of this article is to provide a practical guide to help researchers select the most appropriate procedure to answer their research question. We discuss the logic of hypothesis testing and present the prerequisites of each procedure based on practical examples.
Testing an earthquake prediction algorithm
Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.
1997-01-01
A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate-term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991, the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, a success ratio that would have been achieved in 53% of random trials under the null hypothesis.
Two-sample binary phase 2 trials with low type I error and low sample size
Litwin, Samuel; Basickes, Stanley; Ross, Eric A.
2017-01-01
Summary We address design of two-stage clinical trials comparing experimental and control patients. Our end-point is success or failure, however measured, with null hypothesis that the chance of success in both arms is p0 and alternative that it is p0 among controls and p1 > p0 among experimental patients. Standard rules will have the null hypothesis rejected when the number of successes in the (E)xperimental arm, E, sufficiently exceeds C, that among (C)ontrols. Here, we combine one-sample rejection decision rules, E ≥ m, with two-sample rules of the form E – C > r to achieve two-sample tests with low sample number and low type I error. We find designs with sample numbers not far from the minimum possible using standard two-sample rules, but with type I error of 5% rather than 15% or 20% associated with them, and of equal power. This level of type I error is achieved locally, near the stated null, and increases to 15% or 20% when the null is significantly higher than specified. We increase the attractiveness of these designs to patients by using 2:1 randomization. Examples of the application of this new design covering both high and low success rates under the null hypothesis are provided. PMID:28118686
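A single-stage numerical illustration of the combined rejection rule E ≥ m and E − C > r (the designs in the paper are two-stage); the sample sizes follow 2:1 randomization and all numbers are hypothetical.

```python
# Exact type I error of the single-stage rule "reject if E >= m and E - C > r"
# when both arms truly have success probability p0; all numbers hypothetical.
import numpy as np
from scipy.stats import binom

n_e, n_c = 40, 20               # experimental and control sample sizes (2:1)
p0 = 0.20                       # common success probability under the null
m, r = 12, 4                    # rejection thresholds

pe = binom.pmf(np.arange(n_e + 1), n_e, p0)     # P(E = e) under the null
pc = binom.pmf(np.arange(n_c + 1), n_c, p0)     # P(C = c) under the null

E = np.arange(n_e + 1)[:, None]
C = np.arange(n_c + 1)[None, :]
reject = (E >= m) & (E - C > r)
type1 = float(np.sum(pe[:, None] * pc[None, :] * reject))
print(f"exact type I error of the combined rule: {type1:.4f}")
```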
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
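A conceptual sketch of the general approach of modelling test statistics directly and controlling a Bayesian FDR, using a two-component Gaussian mixture on synthetic z-scores; the mixture form is a simplifying assumption and not the authors' exact model.

```python
# Two-component Gaussian mixture on z-scores, with the component centred
# near zero treated as the null; hypotheses are rejected while the running
# mean of posterior null probabilities (an estimated Bayesian FDR) stays
# below 5%. The z-scores are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
z = np.concatenate([rng.normal(0, 1, 9000),     # null genes
                    rng.normal(3, 1, 1000)])    # differentially expressed genes

gmm = GaussianMixture(n_components=2, random_state=0).fit(z.reshape(-1, 1))
null_comp = int(np.argmin(np.abs(gmm.means_.ravel())))
p_null = gmm.predict_proba(z.reshape(-1, 1))[:, null_comp]

order = np.argsort(p_null)                      # most "alternative-like" first
bayes_fdr = np.cumsum(p_null[order]) / np.arange(1, z.size + 1)
n_reject = int(np.sum(bayes_fdr <= 0.05))
print(f"{n_reject} hypotheses rejected at an estimated Bayesian FDR of 5%")
```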
Invited Commentary: Can Issues With Reproducibility in Science Be Blamed on Hypothesis Testing?
Weinberg, Clarice R.
2017-01-01
Abstract In the accompanying article (Am J Epidemiol. 2017;186(6):646–647), Dr. Timothy Lash makes a forceful case that the problems with reproducibility in science stem from our “culture” of null hypothesis significance testing. He notes that when attention is selectively given to statistically significant findings, the estimated effects will be systematically biased away from the null. Here I revisit the recent history of genetic epidemiology and argue for retaining statistical testing as an important part of the tool kit. Particularly when many factors are considered in an agnostic way, in what Lash calls “innovative” research, investigators need a selection strategy to identify which findings are most likely to be genuine, and hence worthy of further study. PMID:28938713
Significance tests for functional data with complex dependence structure.
Staicu, Ana-Maria; Lahiri, Soumen N; Carroll, Raymond J
2015-01-01
We propose an L2-norm-based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing when data have such a complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.
Intra-fraction motion of the prostate is a random walk
NASA Astrophysics Data System (ADS)
Ballhausen, H.; Li, M.; Hegemann, N.-S.; Ganswindt, U.; Belka, C.
2015-01-01
A random walk model for intra-fraction motion has been proposed, where at each step the prostate moves a small amount from its current position in a random direction. Online tracking data from perineal ultrasound is used to validate or reject this model against alternatives. Intra-fraction motion of a prostate was recorded by 4D ultrasound (Elekta Clarity system) during 84 fractions of external beam radiotherapy of six patients. In total, the center of the prostate was tracked for 8 h in intervals of 4 s. Maximum likelihood model parameters were fitted to the data. The null hypothesis of a random walk was tested with the Dickey-Fuller test. The null hypothesis of stationarity was tested by the Kwiatkowski-Phillips-Schmidt-Shin test. The increase of variance in prostate position over time and the variability in motility between fractions were analyzed. Intra-fraction motion of the prostate was best described as a stochastic process with an auto-correlation coefficient of ρ = 0.92 ± 0.13. The random walk hypothesis (ρ = 1) could not be rejected (p = 0.27). The static noise hypothesis (ρ = 0) was rejected (p < 0.001). The Dickey-Fuller test rejected the null hypothesis ρ = 1 in 25% to 32% of cases. On average, the Kwiatkowski-Phillips-Schmidt-Shin test rejected the null hypothesis ρ = 0 with a probability of 93% to 96%. The variance in prostate position increased linearly over time (r2 = 0.9 ± 0.1). Variance kept increasing and did not settle at a maximum as would be expected from a stationary process. There was substantial variability in motility between fractions and patients with maximum aberrations from isocenter ranging from 0.5 mm to over 10 mm in one patient alone. In conclusion, evidence strongly suggests that intra-fraction motion of the prostate is a random walk and neither static (like inter-fraction setup errors) nor stationary (like a cyclic motion such as breathing, for example). The prostate tends to drift away from the isocenter during a fraction, and this variance increases with time, such that shorter fractions are beneficial to the problem of intra-fraction motion. As a consequence, fixed safety margins (which would over-compensate at the beginning and under-compensate at the end of a fraction) cannot optimally account for intra-fraction motion. Instead, online tracking and position correction on-the-fly should be considered as the preferred approach to counter intra-fraction motion.
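A sketch of the two stationarity checks used above, applied to a simulated random-walk displacement trace (the clinical tracking data are not reproduced here); step size and duration are assumptions.

```python
# Augmented Dickey-Fuller and KPSS tests applied to a simulated random-walk
# displacement trace; step scale and sampling are hypothetical stand-ins for
# the 4 s ultrasound tracking data.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(5)
steps = rng.normal(scale=0.1, size=7200)        # 4 s steps over 8 h, in mm
position = np.cumsum(steps)                     # a genuine random walk

adf_p = adfuller(position)[1]                   # H0: unit root (random walk)
kpss_p = kpss(position, regression="c", nlags="auto")[1]   # H0: stationarity
print(f"ADF p-value (H0: random walk): {adf_p:.3f}")
print(f"KPSS p-value (H0: stationary): {kpss_p:.3f}")
```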
Nikolakopoulou, Adriani; Mavridis, Dimitris; Furukawa, Toshi A; Cipriani, Andrea; Tricco, Andrea C; Straus, Sharon E; Siontis, George C M; Egger, Matthias; Salanti, Georgia
2018-02-28
To examine whether the continuous updating of networks of prospectively planned randomised controlled trials (RCTs) ("living" network meta-analysis) provides strong evidence against the null hypothesis in comparative effectiveness of medical interventions earlier than the updating of conventional, pairwise meta-analysis. Empirical study of the accumulating evidence about the comparative effectiveness of clinical interventions. Database of network meta-analyses of RCTs identified through searches of Medline, Embase, and the Cochrane Database of Systematic Reviews until 14 April 2015. Network meta-analyses published after January 2012 that compared at least five treatments and included at least 20 RCTs. Clinical experts were asked to identify in each network the treatment comparison of greatest clinical interest. Comparisons were excluded for which direct and indirect evidence disagreed, based on side, or node, splitting test (P<0.10). Cumulative pairwise and network meta-analyses were performed for each selected comparison. Monitoring boundaries of statistical significance were constructed and the evidence against the null hypothesis was considered to be strong when the monitoring boundaries were crossed. A significance level was defined as α=5%, power of 90% (β=10%), and an anticipated treatment effect to detect equal to the final estimate from the network meta-analysis. The frequency and time to strong evidence was compared against the null hypothesis between pairwise and network meta-analyses. 49 comparisons of interest from 44 networks were included; most (n=39, 80%) were between active drugs, mainly from the specialties of cardiology, endocrinology, psychiatry, and rheumatology. 29 comparisons were informed by both direct and indirect evidence (59%), 13 by indirect evidence (27%), and 7 by direct evidence (14%). Both network and pairwise meta-analysis provided strong evidence against the null hypothesis for seven comparisons, but for an additional 10 comparisons only network meta-analysis provided strong evidence against the null hypothesis (P=0.002). The median time to strong evidence against the null hypothesis was 19 years with living network meta-analysis and 23 years with living pairwise meta-analysis (hazard ratio 2.78, 95% confidence interval 1.00 to 7.72, P=0.05). Studies directly comparing the treatments of interest continued to be published for eight comparisons after strong evidence had become evident in network meta-analysis. In comparative effectiveness research, prospectively planned living network meta-analyses produced strong evidence against the null hypothesis more often and earlier than conventional, pairwise meta-analyses. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Phase II design with sequential testing of hypotheses within each stage.
Poulopoulou, Stavroula; Karlis, Dimitris; Yiannoutsos, Constantin T; Dafni, Urania
2014-01-01
The main goal of a Phase II clinical trial is to decide whether a particular therapeutic regimen is effective enough to warrant further study. The hypothesis tested by Fleming's Phase II design (Fleming, 1982) is H0: p ≤ p0 versus H1: p ≥ p1, with level α and with power 1 − β at p = p1, where p0 is chosen to represent the response probability achievable with standard treatment and p1 is chosen such that the difference p1 − p0 represents a targeted improvement with the new treatment. This hypothesis creates a misinterpretation, mainly among clinicians, that rejection of the null hypothesis is tantamount to accepting the alternative, and vice versa. As mentioned by Storer (1992), this introduces ambiguity in the evaluation of type I and II errors and the choice of the appropriate decision at the end of the study. Instead of testing this hypothesis, an alternative class of designs is proposed in which two hypotheses are tested sequentially. The hypothesis H0: p ≤ p0 versus H1: p > p0 is tested first. If this null hypothesis is rejected, the hypothesis H0: p ≤ p1 versus H1: p > p1 is tested next, in order to examine whether the therapy is effective enough to consider further testing in a Phase III study. For the derivation of the proposed design the exact binomial distribution is used to calculate the decision cut-points. The optimal design parameters are chosen so as to minimize the average sample number (ASN) under specific upper bounds for the error levels. The optimal values for the design were found using a simulated annealing method.
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
2006-03-01
Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternative hypothesis is that the residuals do not ... variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173 ... entire test sample.
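For reference, the same diagnostic can be run with statsmodels on synthetic regression data (the thesis's cost data are not reproduced here):

```python
# Breusch-Pagan test of constant residual variance on synthetic data; the
# regression below is a stand-in for the cost model discussed above.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(120, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=120)

fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
# A large p-value means the constant-variance null is not rejected.
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
```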
Testing 40 Predictions from the Transtheoretical Model Again, with Confidence
ERIC Educational Resources Information Center
Velicer, Wayne F.; Brick, Leslie Ann D.; Fava, Joseph L.; Prochaska, James O.
2013-01-01
Testing Theory-based Quantitative Predictions (TTQP) represents an alternative to traditional Null Hypothesis Significance Testing (NHST) procedures and is more appropriate for theory testing. The theory generates explicit effect size predictions and these effect size estimates, with related confidence intervals, are used to test the predictions.…
The continuum of hydroclimate variability in western North America during the last millennium
Ault, Toby R.; Cole, Julia E.; Overpeck, Jonathan T.; Pederson, Gregory T.; St. George, Scott; Otto-Bliesner, Bette; Woodhouse, Connie A.; Deser, Clara
2013-01-01
The distribution of climatic variance across the frequency spectrum has substantial importance for anticipating how climate will evolve in the future. Here we estimate power spectra and power laws (β) from instrumental, proxy, and climate model data to characterize the hydroclimate continuum in western North America (WNA). We test the significance of our estimates of spectral densities and β against the null hypothesis that they reflect solely the effects of local (non-climate) sources of autocorrelation at the monthly timescale. Although tree-ring based hydroclimate reconstructions are generally consistent with this null hypothesis, values of β calculated from long moisture-sensitive chronologies (as opposed to reconstructions), and other types of hydroclimate proxies, exceed null expectations. We therefore argue that there is more low-frequency variability in hydroclimate than monthly autocorrelation alone can generate. Coupled model results archived as part of the Coupled Model Intercomparison Project 5 (CMIP5) are consistent with the null hypothesis and appear unable to generate variance in hydroclimate commensurate with paleoclimate records. Consequently, at decadal to multidecadal timescales there is more variability in instrumental and proxy data than in the models, suggesting that the risk of prolonged droughts under climate change may be underestimated by CMIP5 simulations of the future.
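A sketch of how a spectral power law β (with S(f) ∝ f^(−β)) can be estimated from a monthly series by fitting a line to the log-log periodogram; the AR(1) series below is only a stand-in for an instrumental hydroclimate record.

```python
# Fit a power law S(f) ∝ f^(-beta) to the periodogram of a monthly series;
# the AR(1) process here is only a stand-in for an instrumental record.
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(7)
x = np.empty(1200)                      # 100 years of monthly values
x[0] = 0.0
for t in range(1, x.size):
    x[t] = 0.6 * x[t - 1] + rng.normal()    # lag-1 autocorrelation as the null generator

freqs, power = periodogram(x, detrend="linear")
mask = freqs > 0
slope, intercept = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
print(f"estimated spectral power law beta ≈ {-slope:.2f}")
```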
Calculating p-values and their significances with the Energy Test for large datasets
NASA Astrophysics Data System (ADS)
Barter, W.; Burr, C.; Parkes, C.
2018-04-01
The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
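A related but simpler construction, shown here only to illustrate the permutation logic: a two-sample test based on the plain (Székely-Rizzo) energy distance. The paper's T-value uses a specific distance-weighting function and a scaling method for large samples, which this sketch does not reproduce.

```python
# Permutation test of "same underlying population" using the energy distance.
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(a, b):
    # Mean pairwise distances; the zero self-distances on the diagonal are
    # included for simplicity, which has a negligible effect in a sketch.
    return 2.0 * cdist(a, b).mean() - cdist(a, a).mean() - cdist(b, b).mean()

rng = np.random.default_rng(8)
sample1 = rng.normal(0.0, 1.0, size=(300, 3))
sample2 = rng.normal(0.1, 1.0, size=(300, 3))          # slightly shifted population

observed = energy_distance(sample1, sample2)
pooled = np.vstack([sample1, sample2])
n1 = sample1.shape[0]

perm = np.array([
    energy_distance(pooled[idx[:n1]], pooled[idx[n1:]])
    for idx in (rng.permutation(pooled.shape[0]) for _ in range(200))
])
p = (np.sum(perm >= observed) + 1) / (perm.size + 1)
print(f"energy distance {observed:.4f}, permutation p ≈ {p:.3f}")
```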
Developing the research hypothesis.
Toledo, Alexander H; Flikkema, Robert; Toledo-Pereyra, Luis H
2011-01-01
The research hypothesis is needed for a sound and well-developed research study. The research hypothesis contributes to the solution of the research problem. Types of research hypotheses include inductive and deductive, directional and non-directional, and null and alternative hypotheses. Rejecting the null hypothesis and accepting the alternative hypothesis is the basis for building a good research study. This work reviews the most important aspects of organizing and establishing an efficient and complete hypothesis.
Précis of statistical significance: rationale, validity, and utility.
Chow, S L
1998-04-01
The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
A robust null hypothesis for the potential causes of megadrought in western North America
NASA Astrophysics Data System (ADS)
Ault, T.; St George, S.; Smerdon, J. E.; Coats, S.; Mankin, J. S.; Cruz, C. C.; Cook, B.; Stevenson, S.
2017-12-01
The western United States was affected by several megadroughts during the last 1200 years, most prominently during the Medieval Climate Anomaly (MCA: 800 to 1300 CE). A null hypothesis is developed to test the possibility that, given a sufficiently long period of time, these events are inevitable and occur purely as a consequence of internal climate variability. The null distribution of this hypothesis is populated by a linear inverse model (LIM) constructed from global sea-surface temperature anomalies and self-calibrated Palmer Drought Severity Index data for North America. Despite being trained only on seasonal data from the late 20th century, the LIM produces megadroughts that are comparable in duration, spatial scale, and magnitude to the most severe events of the last 12 centuries. The null hypothesis therefore cannot be rejected with much confidence when considering these features of megadrought, meaning that similar events are possible today, even without any changes to boundary conditions. In contrast, the observed clustering of megadroughts in the MCA, as well as the change in mean hydroclimate between the MCA and the 1500-2000 period, are more likely to have been caused by either external forcing or internal climate variability not well sampled during the latter half of the twentieth century. Finally, the results demonstrate the LIM is a viable tool for determining whether events in paleoclimate reconstructions should be ascribed to external forcings or to "out of sample" climate mechanisms, or whether they are consistent with the variability observed during the recent period.
Frömke, Cornelia; Hothorn, Ludwig A; Kropf, Siegfried
2008-01-27
In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek, for each variable, a significant difference of the location parameter from zero (or from one in the case of ratios). However, in some studies a significant deviation of the difference in locations from zero (or 1 in terms of the ratio) is biologically meaningless. A relevant difference or ratio is sought in such cases. This article addresses the use of relevance-shifted tests on ratios for a multivariate parallel two-sample group design. Two empirical procedures are proposed which embed the relevance-shifted test on ratios. As both procedures test a hypothesis for each variable, the resulting multiple testing problem has to be considered. Hence, the procedures include a multiplicity correction. Both procedures are extensions of available procedures for point null hypotheses achieving exact control of the familywise error rate. Whereas the shift of the null hypothesis alone would give straightforward solutions, the problems that are the reason for the empirical considerations discussed here arise from the fact that the shift is considered in both directions and the whole parameter space in between these two limits has to be accepted as the null hypothesis. The first algorithm to be discussed uses a permutation algorithm, and is appropriate for designs with a moderately large number of observations. However, many experiments have limited sample sizes. Then the second procedure might be more appropriate, where multiplicity is corrected according to a concept of data-driven order of hypotheses.
Two Bayesian tests of the GLOMOsys Model.
Field, Sarahanne M; Wagenmakers, Eric-Jan; Newell, Ben R; Zeelenberg, René; van Ravenzwaaij, Don
2016-12-01
Priming is arguably one of the key phenomena in contemporary social psychology. Recent retractions and failed replication attempts have led to a division in the field between proponents and skeptics and have reinforced the importance of confirming certain priming effects through replication. In this study, we describe the results of 2 preregistered replication attempts of 1 experiment by Förster and Denzler (2012). In both experiments, participants first processed letters either globally or locally, then were tested using a typicality rating task. Bayes factor hypothesis tests were conducted for both experiments: Experiment 1 (N = 100) yielded an indecisive Bayes factor of 1.38, indicating that the in-lab data are 1.38 times more likely to have occurred under the null hypothesis than under the alternative. Experiment 2 (N = 908) yielded a Bayes factor of 10.84, indicating strong support for the null hypothesis that global priming does not affect participants' mean typicality ratings. The failure to replicate this priming effect challenges existing support for the GLOMOsys model. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
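A rough sketch of quantifying evidence for a null of no group difference with a Bayes factor. Instead of the default-prior Bayes factors used in the study, this uses the common BIC approximation BF01 ≈ exp((BIC1 − BIC0)/2) on synthetic ratings.

```python
# BIC-approximate Bayes factor for "no difference in means" versus
# "a difference in means", on simulated null data. This is the standard
# approximation BF01 ≈ exp((BIC1 - BIC0)/2), not the default-prior Bayes
# factors reported in the paper.
import numpy as np

rng = np.random.default_rng(9)
global_prime = rng.normal(5.0, 1.5, 450)     # simulated typicality ratings
local_prime = rng.normal(5.0, 1.5, 450)      # same distribution: a true null

y = np.concatenate([global_prime, local_prime])
n = y.size

def bic(rss, n_mean_params):
    # Gaussian model with the error variance profiled out; the variance
    # parameter is common to both models and cancels in the comparison.
    return n * np.log(rss / n) + n_mean_params * np.log(n)

rss0 = np.sum((y - y.mean()) ** 2)                              # one common mean
rss1 = (np.sum((global_prime - global_prime.mean()) ** 2)
        + np.sum((local_prime - local_prime.mean()) ** 2))      # separate group means
bf01 = np.exp((bic(rss1, 2) - bic(rss0, 1)) / 2.0)
print(f"approximate BF01 (evidence for the null): {bf01:.2f}")
```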
A single test for rejecting the null hypothesis in subgroups and in the overall sample.
Lin, Yunzhi; Zhou, Kefei; Ganju, Jitendra
2017-01-01
In clinical trials, some patient subgroups are likely to demonstrate larger effect sizes than other subgroups. For example, the effect size, or informally the benefit with treatment, is often greater in patients with a moderate condition of a disease than in those with a mild condition. A limitation of the usual method of analysis is that it does not incorporate this ordering of effect size by patient subgroup. We propose a test statistic which supplements the conventional test by including this information and simultaneously tests the null hypothesis in pre-specified subgroups and in the overall sample. It results in more power than the conventional test when the differences in effect sizes across subgroups are at least moderately large; otherwise it loses power. The method involves combining p-values from models fit to pre-specified subgroups and the overall sample in a manner that assigns greater weight to subgroups in which a larger effect size is expected. Results are presented for randomized trials with two and three subgroups.
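One simple way to combine subgroup and overall p-values with pre-specified weights is a weighted inverse-normal (Stouffer) combination, sketched below; the weights and p-values are hypothetical and the authors' statistic may differ in detail.

```python
# Weighted inverse-normal (Stouffer) combination of one-sided p-values from
# pre-specified subgroups and the overall sample.
import numpy as np
from scipy.stats import norm

p_values = np.array([0.04, 0.20, 0.03])   # e.g. moderate subgroup, mild subgroup, overall
weights = np.array([0.5, 0.2, 0.3])       # larger weight where a larger effect is expected

z = norm.isf(p_values)                    # convert p-values to z-scores
z_combined = np.sum(weights * z) / np.sqrt(np.sum(weights ** 2))
p_combined = norm.sf(z_combined)
print(f"combined one-sided p-value: {p_combined:.4f}")
```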
A more powerful test based on ratio distribution for retention noninferiority hypothesis.
Deng, Ling; Chen, Gang
2013-03-11
Rothmann et al. (2003) proposed a method for the statistical inference of the fraction retention noninferiority (NI) hypothesis. A fraction retention hypothesis is defined as a ratio of the new treatment effect versus the control effect in the context of a time to event endpoint. One of the major concerns using this method in the design of an NI trial is that with a limited sample size, the power of the study is usually very low. This makes an NI trial not applicable, particularly when using a time to event endpoint. To improve power, Wang et al. (2006) proposed a ratio test based on asymptotic normality theory. Under a strong assumption (equal variance of the NI test statistic under null and alternative hypotheses), the sample size using Wang's test was much smaller than that using Rothmann's test. However, in practice, the assumption of equal variance is generally questionable for an NI trial design. This assumption is removed in the ratio test proposed in this article, which is derived directly from a Cauchy-like ratio distribution. In addition, using this method, the fundamental assumption used in Rothmann's test, that the observed control effect is always positive, that is, the observed hazard ratio for placebo over the control is greater than 1, is no longer necessary. Without assuming equal variance under null and alternative hypotheses, the sample size required for an NI trial can be significantly reduced if using the proposed ratio test for a fraction retention NI hypothesis.
Bayes Factor Approaches for Testing Interval Null Hypotheses
ERIC Educational Resources Information Center
Morey, Richard D.; Rouder, Jeffrey N.
2011-01-01
Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against one another. One issue…
NASA Astrophysics Data System (ADS)
Psaltis, Dimitrios; Özel, Feryal; Chan, Chi-Kwan; Marrone, Daniel P.
2015-12-01
The half opening angle of a Kerr black hole shadow is always equal to (5 ± 0.2) GM/Dc², where M is the mass of the black hole and D is its distance from the Earth. Therefore, measuring the size of a shadow and verifying whether it is within this 4% range constitutes a null hypothesis test of general relativity. We show that the black hole in the center of the Milky Way, Sgr A*, is the optimal target for performing this test with upcoming observations using the Event Horizon Telescope (EHT). We use the results of optical/IR monitoring of stellar orbits to show that the mass-to-distance ratio for Sgr A* is already known to an accuracy of ∼4%. We investigate our prior knowledge of the properties of the scattering screen between Sgr A* and the Earth, the effects of which will need to be corrected for in order for the black hole shadow to appear sharp against the background emission. Finally, we explore an edge detection scheme for interferometric data and a pattern matching algorithm based on the Hough/Radon transform and demonstrate that the shadow of the black hole at 1.3 mm can be localized, in principle, to within ∼9%. All these results suggest that our prior knowledge of the properties of the black hole, of scattering broadening, and of the accretion flow can only limit this general relativistic null hypothesis test with EHT observations of Sgr A* to ≲10%.
Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.
Sayyari, Erfan; Mirarab, Siavash
2018-02-28
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
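A sketch of the underlying intuition: under the multi-species coalescent, a zero-length internal branch makes the three possible quartet topologies around that branch equally frequent, which can be checked with a chi-square goodness-of-fit test. The counts are hypothetical, and ASTRAL's implementation handles further details (such as the effective number of gene trees per quartet).

```python
# Chi-square check that three quartet-topology counts are consistent with the
# equal frequencies expected around a polytomy; counts are hypothetical.
from scipy.stats import chisquare

quartet_counts = [412, 395, 393]          # gene trees supporting each topology
stat, p = chisquare(quartet_counts)       # expected frequencies default to equal
print(f"chi-square = {stat:.2f}, p = {p:.3f} (a large p-value retains the polytomy)")
```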
Assessment of resampling methods for causality testing: A note on the US inflation behavior.
Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees
2017-01-01
Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms.
A SIGNIFICANCE TEST FOR THE LASSO
Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert
2014-01-01
In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ₁² distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ₁² under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1). PMID:25574062
Sources of Error and the Statistical Formulation of M_S:m_b Seismic Event Screening Analysis
NASA Astrophysics Data System (ADS)
Anderson, D. N.; Patton, H. J.; Taylor, S. R.; Bonner, J. L.; Selby, N. D.
2014-03-01
The Comprehensive Nuclear-Test-Ban Treaty (CTBT), a global ban on nuclear explosions, is currently in a ratification phase. Under the CTBT, an International Monitoring System (IMS) of seismic, hydroacoustic, infrasonic and radionuclide sensors is operational, and the data from the IMS is analysed by the International Data Centre (IDC). The IDC provides CTBT signatories basic seismic event parameters and a screening analysis indicating whether an event exhibits explosion characteristics (for example, shallow depth). An important component of the screening analysis is a statistical test of the null hypothesis H0: explosion characteristics using empirical measurements of seismic energy (magnitudes). The established magnitude used for event size is the body-wave magnitude (denoted m_b) computed from the initial segment of a seismic waveform. IDC screening analysis is applied to events with m_b greater than 3.5. The Rayleigh wave magnitude (denoted M_S) is a measure of later arriving surface wave energy. Magnitudes are measurements of seismic energy that include adjustments (physical correction model) for path and distance effects between event and station. Relative to m_b, earthquakes generally have a larger M_S magnitude than explosions. This article proposes a hypothesis test (screening analysis) using M_S and m_b that expressly accounts for physical correction model inadequacy in the standard error of the test statistic. With this hypothesis test formulation, the 2009 Democratic People's Republic of Korea announced nuclear weapon test fails to reject the null hypothesis H0: explosion characteristics.
A Ratio Test of Interrater Agreement with High Specificity
ERIC Educational Resources Information Center
Cousineau, Denis; Laurencelle, Louis
2015-01-01
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
A shift from significance test to hypothesis test through power analysis in medical research.
Singh, G
2006-01-01
Until recently, the medical research literature showed a substantial dominance of Fisher's significance-test approach to statistical inference, which concentrates on the probability of a type I error, over the Neyman-Pearson hypothesis test, which considers the probabilities of both type I and type II errors. Fisher's approach dichotomises results into significant or non-significant on the basis of a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Built on the same underlying theory, the two approaches address the same objective and reach conclusions in their own way. Advances in computing techniques and the availability of statistical software have led to increasing application of power calculations in medical research, so that the results of significance tests are now often reported in the light of the power of the test as well. The significance-test approach, when it incorporates power analysis, contains the essence of the hypothesis-test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis-test procedure.
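As an illustration of the power analysis the abstract refers to, a short sketch of the normal-approximation power of a two-sided two-sample comparison; the effect size, sample sizes and alpha below are illustrative.

```python
# Power of a two-sided two-sample comparison via the normal approximation,
# power ~ Phi(d*sqrt(n/2) - z_{1-alpha/2}); d, n and alpha are illustrative.
import numpy as np
from scipy import stats

def approx_power(d, n_per_group, alpha=0.05):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ncp = d * np.sqrt(n_per_group / 2)     # noncentrality for equal group sizes
    return stats.norm.sf(z_crit - ncp) + stats.norm.cdf(-z_crit - ncp)

for n in (20, 50, 100, 200):
    print(n, round(approx_power(d=0.4, n_per_group=n), 3))
# Reporting power alongside the P value moves a Fisher-style significance test
# toward the Neyman-Pearson framing of type I and type II error.
```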
Bayesian evaluation of effect size after replicating an original study
van Aert, Robbie C. M.; van Assen, Marcel A. L. M.
2017-01-01
The vast majority of published results in the literature are statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. Both the original study and its replication were statistically significant for 36.1% of studies in RPP and 68.8% in EE-RP, suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the method's performance, and demonstrate the necessity to control for the original study's significance to enable the accumulation of evidence for a true zero effect. We then apply the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of the included studies, especially in RPP, are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication, akin to power analysis in null hypothesis significance testing, and present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method. PMID:28388646
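A stripped-down sketch of the core idea, posterior model probabilities over a zero, small, medium and large effect with equal prior weights, without the publication-bias adjustment that snapshot hybrid applies; the observed correlation and sample size are illustrative.

```python
# Posterior probabilities of a zero, small, medium and large effect given an
# observed correlation, with equal prior weights. This omits the publication-bias
# adjustment of the actual snapshot hybrid method; r_obs and n are illustrative.
import numpy as np
from scipy import stats

r_obs, n = 0.15, 80
z_obs, se = np.arctanh(r_obs), 1 / np.sqrt(n - 3)     # Fisher z and its standard error
effects = {"zero": 0.0, "small": 0.1, "medium": 0.3, "large": 0.5}

lik = {k: stats.norm.pdf(z_obs, loc=np.arctanh(rho), scale=se) for k, rho in effects.items()}
total = sum(lik.values())
print({k: round(v / total, 3) for k, v in lik.items()})
```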
Pridemore, William Alex; Freilich, Joshua D
2007-12-01
Since Roe v. Wade, most states have passed laws either restricting or further protecting reproductive rights. During a wave of anti-abortion violence in the early 1990s, several states also enacted legislation protecting abortion clinics, staff, and patients. One hypothesis drawn from the theoretical literature predicts that these laws provide a deterrent effect and thus fewer anti-abortion crimes in states that protect clinics and reproductive rights. An alternative hypothesis drawn from the literature expects a backlash effect from radical members of the movement and thus more crimes in states with protective legislation. We tested these competing hypotheses by taking advantage of unique data sets that gauge the strength of laws protecting clinics and reproductive rights and that provide self-report victimization data from clinics. Employing logistic regression and controlling for several potential covariates, we found null effects and thus no support for either hypothesis. The null findings were consistent across a number of different types of victimization. Our discussion contextualizes these results in terms of previous research on crimes against abortion providers, discusses alternative explanations for the null findings, and considers the implications for future policy development and research.
One-way ANOVA based on interval information
NASA Astrophysics Data System (ADS)
Hesamian, Gholamreza
2016-08-01
This paper extends one-way analysis of variance (ANOVA) to the case where the observed data are represented by closed intervals rather than real numbers. In this approach, a notion of interval random variable is first introduced. In particular, a normal distribution with interval parameters is introduced to investigate hypotheses about the equality of interval means or to test the assumption of homogeneity of interval variances. Moreover, the least significant difference (LSD) method for multiple comparisons of interval means is developed for use when the null hypothesis of equal means is rejected. Then, at a given interval significance level, an index is applied to compare the interval test statistic with the related interval critical value as a criterion for accepting or rejecting the null interval hypothesis of interest. Finally, the decision-making procedure yields degrees of acceptance or rejection of the interval hypotheses. An applied example illustrates the performance of the method.
Students' Understanding of Conditional Probability on Entering University
ERIC Educational Resources Information Center
Reaburn, Robyn
2013-01-01
An understanding of conditional probability is essential for students of inferential statistics as it is used in Null Hypothesis Tests. Conditional probability is also used in Bayes' theorem, in the interpretation of medical screening tests and in quality control procedures. This study examines the understanding of conditional probability of…
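A worked instance of the conditional-probability reasoning mentioned above, applied to a medical screening test with illustrative sensitivity, specificity and prevalence.

```python
# Bayes' theorem for a medical screening test (illustrative numbers): even with
# high sensitivity and specificity, low prevalence gives a modest positive
# predictive value, the conditional probability P(disease | positive).
sensitivity = 0.95   # P(positive | disease)
specificity = 0.95   # P(negative | no disease)
prevalence = 0.01    # P(disease)

p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_pos
print(f"P(disease | positive) = {ppv:.3f}")   # roughly 0.16, not 0.95
```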
Conservative Tests under Satisficing Models of Publication Bias
McCrary, Justin; Christensen, Garret; Fanelli, Daniele
2016-01-01
Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834
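One way to reproduce the roughly "reject at t > 3" adjustment under a pure file-drawer model (only |t| > 1.96 results published); the calculation below is a simplified reconstruction, not necessarily the paper's exact derivation.

```python
# If only |t| > 1.96 results are published, the critical value giving a 5%
# rejection rate *conditional on publication* solves
# P(|T| > c) = 0.05 * P(|T| > 1.96) for a standard normal T.
from scipy import stats

alpha, publish_cut = 0.05, 1.96
tail = alpha * 2 * stats.norm.sf(publish_cut)   # target two-sided tail probability
c = stats.norm.isf(tail / 2)                    # adjusted two-sided critical value
print(round(c, 2))                              # about 3.0, matching the "t > 3" rule
```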
A phenological mid-domain effect in flowering diversity.
Morales, Manuel A; Dodge, Gary J; Inouye, David W
2005-01-01
In this paper, we test the mid-domain hypothesis as an explanation for observed patterns of flowering diversity in two sub-alpine communities of insect-pollinated plants. Observed species richness patterns showed an early-season increase in richness, a mid-season peak, and a late-season decrease. We show that a "mid-domain" null model can qualitatively match this pattern of flowering species richness, with R² values typically greater than 60%. We find significant or marginally significant departures from expected patterns of diversity for only 3 out of 12 year-site combinations. On the other hand, we do find a consistent pattern of departure when comparing observed versus null-model predicted flowering diversity averaged across years. Our results therefore support the hypothesis that ecological factors shape patterns of flowering phenology, but that the strength or nature of these environmental forcings may differ between years or the two habitats we studied, or may depend on species-specific characteristics of these plant communities. We conclude that mid-domain null models provide an important baseline from which to test departure of expected patterns of flowering diversity across temporal domains. Geometric constraints should be included first in the list of factors that drive seasonal patterns of flowering diversity.
Accuracy of maxillary positioning after standard and inverted orthognathic sequencing.
Ritto, Fabio G; Ritto, Thiago G; Ribeiro, Danilo Passeado; Medeiros, Paulo José; de Moraes, Márcio
2014-05-01
This study aimed to compare the accuracy of maxillary positioning after bimaxillary orthognathic surgery, using 2 sequences. A total of 80 cephalograms (40 preoperative and 40 postoperative) from 40 patients were analyzed. Group 1 included radiographs of patients who underwent the conventional sequence, whereas group 2 patients underwent the inverted sequence. The final position of the maxillary central incisor was obtained after vertical and horizontal measurements of the tracings, and it was compared with what had been planned. The null hypothesis, which stated that there would be no difference between the groups, was tested. After applying the Welch t test to compare the mean differences between the desired and achieved maxillary position, at a 5% significance level with a 2-tailed test, the null hypothesis was not rejected (P > .05). Thus, there was no difference in the accuracy of maxillary positioning between groups. Conventional and inverted sequencing proved to be reliable in positioning the maxilla after LeFort I osteotomy in bimaxillary orthognathic surgeries. Copyright © 2014 Elsevier Inc. All rights reserved.
Significance levels for studies with correlated test statistics.
Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S
2008-07-01
When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
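A generic sketch of the permutation procedure described, the global null assessed through the largest-magnitude statistic, on synthetic data; the conditioning step proposed by the authors is not reproduced here.

```python
# Permutation test of the global null using the largest-magnitude statistic
# (two-group comparison across many features); illustrative data, not the paper's.
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_features, n_perm = 20, 500, 1000
x = rng.standard_normal((2 * n_per_group, n_features))   # rows = samples, columns = features
labels = np.array([0] * n_per_group + [1] * n_per_group)

def max_abs_t(data, lab):
    a, b = data[lab == 0], data[lab == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.max(np.abs((a.mean(axis=0) - b.mean(axis=0)) / se))

obs = max_abs_t(x, labels)
null = np.array([max_abs_t(x, rng.permutation(labels)) for _ in range(n_perm)])
p_global = (1 + np.sum(null >= obs)) / (1 + n_perm)
print(f"observed max |t| = {obs:.2f}, permutation p = {p_global:.3f}")
# Shi et al. argue this unconditional p can mislead when features are correlated,
# and propose conditioning on a measure of the spread of the observed histogram.
```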
DOE Office of Scientific and Technical Information (OSTI.GOV)
Psaltis, Dimitrios; Özel, Feryal; Chan, Chi-Kwan
2015-12-01
The half opening angle of a Kerr black hole shadow is always equal to (5 ± 0.2)GM/Dc², where M is the mass of the black hole and D is its distance from the Earth. Therefore, measuring the size of a shadow and verifying whether it is within this 4% range constitutes a null hypothesis test of general relativity. We show that the black hole in the center of the Milky Way, Sgr A*, is the optimal target for performing this test with upcoming observations using the Event Horizon Telescope (EHT). We use the results of optical/IR monitoring of stellar orbits to show that the mass-to-distance ratio for Sgr A* is already known to an accuracy of ∼4%. We investigate our prior knowledge of the properties of the scattering screen between Sgr A* and the Earth, the effects of which will need to be corrected for in order for the black hole shadow to appear sharp against the background emission. Finally, we explore an edge detection scheme for interferometric data and a pattern matching algorithm based on the Hough/Radon transform and demonstrate that the shadow of the black hole at 1.3 mm can be localized, in principle, to within ∼9%. All these results suggest that our prior knowledge of the properties of the black hole, of scattering broadening, and of the accretion flow can only limit this general relativistic null hypothesis test with EHT observations of Sgr A* to ≲10%.
2011-05-24
of community similarity (Legendre and Legendre 1998). Permutational Multivariate Analysis of Variance (PerMANOVA) (McArdle... null hypothesis can be rejected with a type I error rate of α. We used an implementation of PerMANOVA that involved sequential removal... TEXTURE, and HABITAT. The null distribution for PerMANOVA tests for site-scale effects was generated using a restricted
On the Directionality Test of Peer Effects in Social Networks
ERIC Educational Resources Information Center
An, Weihua
2016-01-01
One interesting idea in social network analysis is the directionality test that utilizes the directions of social ties to help identify peer effects. The null hypothesis of the test is that if contextual factors are the only force that affects peer outcomes, the estimated peer effects should not differ, if the directions of social ties are…
Statistical significance versus clinical relevance.
van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G
2017-04-01
In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
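An illustrative two-group comparison showing why a confidence interval is more informative than the P-value alone; the data are synthetic and the clinical framing is only an example.

```python
# The p-value alone says nothing about magnitude, whereas the 95% confidence
# interval conveys the estimated effect and its precision. Synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
control = rng.normal(140, 15, 400)    # e.g. systolic blood pressure, illustrative
treated = rng.normal(138, 15, 400)

t, p = stats.ttest_ind(treated, control)
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(f"p = {p:.3f}; difference = {diff:.1f} mmHg, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
# A 'significant' p whose CI spans only clinically trivial differences is
# statistically, but not clinically, relevant.
```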
UNIFORMLY MOST POWERFUL BAYESIAN TESTS
Johnson, Valen E.
2014-01-01
Uniformly most powerful tests are statistical hypothesis tests that provide the greatest power against a fixed null hypothesis among all tests of a given size. In this article, the notion of uniformly most powerful tests is extended to the Bayesian setting by defining uniformly most powerful Bayesian tests to be tests that maximize the probability that the Bayes factor, in favor of the alternative hypothesis, exceeds a specified threshold. Like their classical counterpart, uniformly most powerful Bayesian tests are most easily defined in one-parameter exponential family models, although extensions outside of this class are possible. The connection between uniformly most powerful tests and uniformly most powerful Bayesian tests can be used to provide an approximate calibration between p-values and Bayes factors. Finally, issues regarding the strong dependence of resulting Bayes factors and p-values on sample size are discussed. PMID:24659829
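A sketch of the p-value/Bayes-factor calibration in the simplest setting the abstract mentions, a one-sided z-test for a normal mean with known variance; the formula below follows from maximizing the rejection probability over point alternatives and is offered as an assumption-laden illustration, not a reproduction of the article's general results.

```python
# For a one-sided z-test of a normal mean with known sigma, the UMPBT(gamma)
# rejection region z > sqrt(2*log(gamma)) coincides with the classical size-alpha
# test when gamma = exp(z_alpha**2 / 2). Sketch under those assumptions only.
import numpy as np
from scipy import stats

def gamma_for_alpha(alpha):
    """Bayes-factor threshold whose UMPBT rejection region matches the size-alpha z-test."""
    z_alpha = stats.norm.isf(alpha)        # one-sided critical value
    return float(np.exp(z_alpha ** 2 / 2))

for alpha in (0.05, 0.01, 0.005):
    print(alpha, round(gamma_for_alpha(alpha), 1))
# alpha = 0.05 corresponds to a Bayes-factor threshold of only about 3.9, one
# reason p ~ 0.05 is weaker evidence than commonly assumed.
```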
Xie, Yufen; Wang, Yingchun; Sun, Tong; Wang, Fangfei; Trostinskaia, Anna; Puscheck, Elizabeth; Rappolee, Daniel A
2005-05-01
Mitogen-activated protein kinase (MAPK) signaling pathways play an important role in controlling embryonic proliferation and differentiation. It has been demonstrated that null mutants of several sequential lipophilic signal transduction mediators in the MAPK pathway are post-implantation lethal. It is not clear why the lethality of these null mutants arises after implantation and not before. One hypothesis is that the gene products of these post-implantation lethal null mutants are not present before implantation in normal embryos and have no function until after implantation. To test this hypothesis, we selected a set of lipophilic genes mediating MAPK signal transduction pathways whose null mutants result in early peri-implantation or placental lethality. These included FRS2alpha, GAB1, GRB2, SOS1, Raf-B, and Raf1. Products of these selected genes were detected, and their locations and functions indicated, by indirect immunocytochemistry and Western blotting for proteins and RT-polymerase chain reaction (PCR) for mRNA transcription. We report here that all six signal mediators are detected at the protein level in the preimplantation mouse embryo, in placental trophoblasts, and in cultured trophoblast stem cells (TSC). All of the proteins are detected in E3.5 embryos, at a time when the first known mitogenic intercellular communication has been documented. mRNA transcripts of two post-implantation null mutant genes are expressed in mouse preimplantation embryos and unfertilized eggs. These mRNA transcripts were detected as maternal mRNA in unfertilized eggs, which could delay the lethality of null mutants. All of the proteins were detected in the cytoplasm or in the cell membrane. This study of spatial and temporal expression revealed that all six of these post-implantation lethal genes in the MAPK pathway are expressed and, where tested, their phosphorylated/activated proteins are detected in the blastocyst. Studies on RNA expression using RT-PCR suggest that maternal RNA could play an important role in delaying the appearance of the lethal phenotype of null mutations. Copyright (c) 2005 Wiley-Liss, Inc.
Hypothesis testing for band size detection of high-dimensional banded precision matrices.
An, Baiguo; Guo, Jianhua; Liu, Yufeng
2014-06-01
Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, cross-validation is commonly used; however, we show that cross-validation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.
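A sketch of the Cholesky-based banded precision estimator referred to in the abstract, for a candidate band size k; the proposed band-size test itself is not reproduced here, and the data are synthetic.

```python
# For a candidate band size k, regress each variable on its k predecessors and
# assemble Omega = T' D^{-1} T, which has bandwidth k (modified Cholesky idea).
# Choosing k is the problem the proposed hypothesis test addresses.
import numpy as np

def banded_precision(X, k):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Xc[:, 0].var()
    for j in range(1, p):
        lo = max(0, j - k)
        A = Xc[:, lo:j]
        coef, *_ = np.linalg.lstsq(A, Xc[:, j], rcond=None)
        T[j, lo:j] = -coef
        d[j] = (Xc[:, j] - A @ coef).var()
    return T.T @ np.diag(1.0 / d) @ T     # estimated precision matrix, bandwidth k

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))        # illustrative data
print(np.round(banded_precision(X, k=2)[:4, :4], 2))
```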
Three New Methods for Analysis of Answer Changes
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.
2017-01-01
In a pioneering research article, Wollack and colleagues suggested the "erasure detection index" (EDI) to detect test tampering. The EDI can be used with or without a continuity correction and is assumed to follow the standard normal distribution under the null hypothesis of no test tampering. When used without a continuity correction,…
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's method, we adopt a Monte Carlo test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation with independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
Rosales, Corina; Patel, Niket; Gillard, Baiba K.; Yelamanchili, Dedipya; Yang, Yaliu; Courtney, Harry S.; Santos, Raul D.; Gotto, Antonio M.; Pownall, Henry J.
2016-01-01
The reaction of Streptococcal serum opacity factor (SOF) against plasma high-density lipoproteins (HDL) produces a large cholesteryl ester-rich microemulsion (CERM), a smaller neo HDL that is apolipoprotein (apo) AI-poor, and lipid-free apo AI. SOF is active vs. both human and mouse plasma HDL. In vivo injection of SOF into mice reduces plasma cholesterol ~40% in 3 hours while forming the same products observed in vitro, but at different ratios. Previous studies supported the hypothesis that labile apo AI is required for the SOF reaction vs. HDL. Here we further tested that hypothesis by studies of SOF against HDL from apo AI-null mice. When injected into apo AI-null mice, SOF reduced plasma cholesterol ~35% in three hours. The reaction of SOF vs. apo AI-null HDL in vitro produced a CERM and neo HDL, but no lipid-free apo. Moreover, according to the rate of CERM formation, the extent and rate of the SOF reaction vs. apo AI-null mouse HDL was less than that against wild-type (WT) mouse HDL. Chaotropic perturbation studies using guanidine hydrochloride showed that apo AI-null HDL was more stable than WT HDL. Human apo AI added to apo AI-null HDL was quantitatively incorporated, giving reconstituted HDL. Both SOF and guanidine hydrochloride displaced apo AI from the reconstituted HDL. These results support the conclusion that apo AI-null HDL is more stable than WT HDL because it lacks apo AI, a labile protein that is readily displaced by physico-chemical and biochemical perturbations. Thus, apo AI-null HDL is less SOF-reactive than WT HDL. The properties of apo AI-null HDL can be partially restored to those of WT HDL by the spontaneous incorporation of human apo AI. It remains to be determined what other HDL functions are affected by apo AI deletion. PMID:25790332
Map LineUps: Effects of spatial structure on graphical inference.
Beecham, Roger; Dykes, Jason; Meulemans, Wouter; Slingsby, Aidan; Turkay, Cagatay; Wood, Jo
2017-01-01
Fundamental to the effective use of visualization as an analytic and descriptive tool is the assurance that presenting data visually provides the capability of making inferences from what we see. This paper explores two related approaches to quantifying the confidence we may have in making visual inferences from mapped geospatial data. We adapt Wickham et al.'s 'Visual Line-up' method as a direct analogy with Null Hypothesis Significance Testing (NHST) and propose a new approach for generating more credible spatial null hypotheses. Rather than using as a spatial null hypothesis the unrealistic assumption of complete spatial randomness, we propose spatially autocorrelated simulations as alternative nulls. We conduct a set of crowdsourced experiments (n=361) to determine the just noticeable difference (JND) between pairs of choropleth maps of geographic units controlling for spatial autocorrelation (Moran's I statistic) and geometric configuration (variance in spatial unit area). Results indicate that people's abilities to perceive differences in spatial autocorrelation vary with baseline autocorrelation structure and the geometric configuration of geographic units. These results allow us, for the first time, to construct a visual equivalent of statistical power for geospatial data. Our JND results add to those provided in recent years by Klippel et al. (2011), Harrison et al. (2014) and Kay & Heer (2015) for correlation visualization. Importantly, they provide an empirical basis for an improved construction of visual line-ups for maps and the development of theory to inform geospatial tests of graphical inference.
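A sketch of the kind of spatially autocorrelated null the authors advocate: a simultaneous autoregressive (SAR) field simulated on a regular grid, summarized by Moran's I. The grid, contiguity rule and rho value are illustrative stand-ins for real geographic units.

```python
# Simulate a SAR(rho) field on a regular grid with rook contiguity and compute
# Moran's I; such simulations can serve as spatially autocorrelated null maps.
import numpy as np

def rook_weights(side):
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[i, rr * side + cc] = 1
    return W / W.sum(axis=1, keepdims=True)     # row-standardised weights

def morans_I(z, W):
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(4)
side, rho = 20, 0.7
W = rook_weights(side)
x = np.linalg.solve(np.eye(side * side) - rho * W, rng.standard_normal(side * side))
print(f"Moran's I of simulated null map: {morans_I(x, W):.2f}")
```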
Random variability explains apparent global clustering of large earthquakes
Michael, A.J.
2011-01-01
The occurrence of 5 Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well-described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.
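A Monte Carlo sketch of the null model described (independent events at a constant rate, with aftershocks ignored); the rate, window length and catalogue duration below are illustrative, not the values used in the paper.

```python
# How often does a random catalogue place 5 or more events in some 7-year window?
# Rate, window and catalogue length are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
years, rate, window, nsim = 111.0, 16 / 111.0, 7.0, 20000   # ~16 events per 111 yr (illustrative)
count = 0
for _ in range(nsim):
    n_events = rng.poisson(rate * years)
    times = np.sort(rng.uniform(0, years, n_events))
    # maximum number of events in any sliding window of the given length
    max_in_window = max((np.searchsorted(times, t + window) - i
                         for i, t in enumerate(times)), default=0)
    if max_in_window >= 5:
        count += 1
print(f"P(>=5 events in some {window:.0f}-yr window | null) ~ {count / nsim:.2f}")
```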
Interpreting null findings from trials of alcohol brief interventions.
Heather, Nick
2014-01-01
The effectiveness of alcohol brief intervention (ABI) has been established by a succession of meta-analyses but, because the effects of ABI are small, null findings from randomized controlled trials are often reported and can sometimes lead to skepticism regarding the benefits of ABI in routine practice. This article first explains why null findings are likely to occur under null hypothesis significance testing (NHST) due to the phenomenon known as "the dance of the p-values." A number of misconceptions about null findings are then described, using as an example the way in which the results of the primary care arm of a recent cluster-randomized trial of ABI in England (the SIPS project) have been misunderstood. These misinterpretations include the fallacy of "proving the null hypothesis" that lack of a significant difference between the means of sample groups can be taken as evidence of no difference between their population means, and the possible effects of this and related misunderstandings of the SIPS findings are examined. The mistaken inference that reductions in alcohol consumption seen in control groups from baseline to follow-up are evidence of real effects of control group procedures is then discussed and other possible reasons for such reductions, including regression to the mean, research participation effects, historical trends, and assessment reactivity, are described. From the standpoint of scientific progress, the chief problem about null findings under the conventional NHST approach is that it is not possible to distinguish "evidence of absence" from "absence of evidence." By contrast, under a Bayesian approach, such a distinction is possible and it is explained how this approach could classify ABIs in particular settings or among particular populations as either truly ineffective or as of unknown effectiveness, thus accelerating progress in the field of ABI research.
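As one concrete way to quantify "evidence of absence" in a two-arm trial, a BIC-based approximation to the Bayes factor for the null (after Wagenmakers, 2007); this is a generic illustration on synthetic data, not the Bayesian analysis the article itself proposes.

```python
# BIC-based approximation to the Bayes factor for "no group difference" vs
# "some difference" in a two-arm trial; unlike a nonsignificant p-value, it can
# quantify evidence *for* the null. Data are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(6)
control = rng.normal(20.0, 10.0, 150)     # illustrative weekly alcohol consumption
brief_int = rng.normal(19.5, 10.0, 150)

y = np.concatenate([control, brief_int])
g = np.concatenate([np.zeros(len(control)), np.ones(len(brief_int))])
n = len(y)

rss0 = np.sum((y - y.mean()) ** 2)                                       # null: common mean
rss1 = sum(np.sum((y[g == k] - y[g == k].mean()) ** 2) for k in (0, 1))  # alternative
bic0 = n * np.log(rss0 / n) + 2 * np.log(n)    # mean + variance
bic1 = n * np.log(rss1 / n) + 3 * np.log(n)    # two means + variance
bf01 = np.exp((bic1 - bic0) / 2)               # Bayes factor in favour of the null
print(f"approximate BF01 = {bf01:.1f}")
```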
SANABRIA, FEDERICO; KILLEEN, PETER R.
2008-01-01
Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose p_rep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766
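A sketch of p_rep as it is usually computed from a reported two-tailed p-value, under Killeen's (2005) assumptions of an exact replication with equal sampling variance.

```python
# p_rep = Phi( z_obs / sqrt(2) ), with z_obs recovered from the two-tailed p-value;
# this follows Killeen's assumptions of an exact replication with equal variance.
import numpy as np
from scipy import stats

def p_rep(p_two_tailed):
    z_obs = stats.norm.isf(p_two_tailed / 2)       # observed z from the two-tailed p
    return float(stats.norm.cdf(z_obs / np.sqrt(2)))

for p in (0.05, 0.01, 0.001):
    print(p, round(p_rep(p), 3))    # about 0.92 for p = 0.05
```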
The impact of p53 protein core domain structural alteration on ovarian cancer survival.
Rose, Stephen L; Robertson, Andrew D; Goodheart, Michael J; Smith, Brian J; DeYoung, Barry R; Buller, Richard E
2003-09-15
Although survival with a p53 missense mutation is highly variable, p53-null mutation is an independent adverse prognostic factor for advanced stage ovarian cancer. By evaluating ovarian cancer survival based upon a structure function analysis of the p53 protein, we tested the hypothesis that not all missense mutations are equivalent. The p53 gene was sequenced from 267 consecutive ovarian cancers. The effect of individual missense mutations on p53 structure was analyzed using the International Agency for Research on Cancer p53 Mutational Database, which specifies the effects of p53 mutations on p53 core domain structure. Mutations in the p53 core domain were classified as either explained or not explained in structural or functional terms by their predicted effects on protein folding, protein-DNA contacts, or mutation in highly conserved residues. Null mutations were classified by their mechanism of origin. Mutations were sequenced from 125 tumors. Effects of 62 of the 82 missense mutations (76%) could be explained by alterations in the p53 protein. Twenty-three (28%) of the explained mutations occurred in highly conserved regions of the p53 core protein. Twenty-two nonsense point mutations and 21 frameshift null mutations were sequenced. Survival was independent of missense mutation type and mechanism of null mutation. The hypothesis that not all missense mutations are equivalent is, therefore, rejected. Furthermore, p53 core domain structural alteration secondary to missense point mutation is not functionally equivalent to a p53-null mutation. The poor prognosis associated with p53-null mutation is independent of the mutation mechanism.
A two-hypothesis approach to establishing a life detection/biohazard protocol for planetary samples
NASA Astrophysics Data System (ADS)
Conley, Catharine; Steele, Andrew
2016-07-01
The COSPAR policy on performing a biohazard assessment on samples brought from Mars to Earth is framed in the context of a concern for false-positive results. However, as noted during the 2012 Workshop for Life Detection in Samples from Mars (Kminek et al., 2014), a more significant concern for planetary samples brought to Earth is false-negative results, because an undetected biohazard could increase risk to the Earth. This is the reason that stringent contamination control must be a high priority for all Category V Restricted Earth Return missions. A useful conceptual framework for addressing these concerns involves two complementary 'null' hypotheses: testing both of them, together, would allow statistical and community confidence to be developed regarding one or the other conclusion. As noted above, false negatives are of primary concern for safety of the Earth, so the 'Earth Safety null hypothesis' -- that must be disproved to assure low risk to the Earth from samples introduced by Category V Restricted Earth Return missions -- is 'There is native life in these samples.' False positives are of primary concern for Astrobiology, so the 'Astrobiology null hypothesis' -- that must be disproved in order to demonstrate the existence of extraterrestrial life -- is 'There is no life in these samples.' The presence of Earth contamination would render both of these hypotheses more difficult to disprove. Both these hypotheses can be tested following a strict science protocol: analyse, interpret, test the hypotheses and repeat. The science measurements undertaken are then done in an iterative fashion that responds to discovery, with both hypotheses testable from interpretation of the scientific data. This is a robust, community-involved activity that ensures maximum science return with minimal sample use.
Siller, Saul S.; Broadie, Kendal
2011-01-01
SUMMARY Fragile X syndrome (FXS), caused by loss of the fragile X mental retardation 1 (FMR1) product (FMRP), is the most common cause of inherited intellectual disability and autism spectrum disorders. FXS patients suffer multiple behavioral symptoms, including hyperactivity, disrupted circadian cycles, and learning and memory deficits. Recently, a study in the mouse FXS model showed that the tetracycline derivative minocycline effectively remediates the disease state via a proposed matrix metalloproteinase (MMP) inhibition mechanism. Here, we use the well-characterized Drosophila FXS model to assess the effects of minocycline treatment on multiple neural circuit morphological defects and to investigate the MMP hypothesis. We first treat Drosophila Fmr1 (dfmr1) null animals with minocycline to assay the effects on mutant synaptic architecture in three disparate locations: the neuromuscular junction (NMJ), clock neurons in the circadian activity circuit and Kenyon cells in the mushroom body learning and memory center. We find that minocycline effectively restores normal synaptic structure in all three circuits, promising therapeutic potential for FXS treatment. We next tested the MMP hypothesis by assaying the effects of overexpressing the sole Drosophila tissue inhibitor of MMP (TIMP) in dfmr1 null mutants. We find that TIMP overexpression effectively prevents defects in the NMJ synaptic architecture in dfmr1 mutants. Moreover, co-removal of dfmr1 similarly rescues TIMP overexpression phenotypes, including cellular tracheal defects and lethality. To further test the MMP hypothesis, we generated dfmr1;mmp1 double null mutants. Null mmp1 mutants are 100% lethal and display cellular tracheal defects, but co-removal of dfmr1 allows adult viability and prevents tracheal defects. Conversely, co-removal of mmp1 ameliorates the NMJ synaptic architecture defects in dfmr1 null mutants, despite the lack of detectable difference in MMP1 expression or gelatinase activity between the single dfmr1 mutants and controls. These results support minocycline as a promising potential FXS treatment and suggest that it might act via MMP inhibition. We conclude that FMRP and TIMP pathways interact in a reciprocal, bidirectional manner. PMID:21669931
Clark, Cameron M; Lawlor-Savage, Linette; Goghari, Vina M
2017-01-01
Training of working memory as a method of increasing working memory capacity and fluid intelligence has received much attention in recent years. This burgeoning field remains highly controversial, with empirically backed disagreements at all levels of evidence, including individual studies, systematic reviews, and even meta-analyses. The current study investigated the effect of a randomized six-week online working memory intervention on untrained cognitive abilities in a community-recruited sample of healthy young adults, in relation to both a processing-speed-training active control condition and a no-contact control condition. Results of traditional null hypothesis significance testing, as well as Bayes factor analyses, revealed support for the null hypothesis across all cognitive tests administered before and after training. Importantly, all three groups were similar at pre-training for a variety of individual variables purported to moderate transfer of training to fluid intelligence, including personality traits, motivation to train, and expectations of cognitive improvement from training. Because these results are consistent with experimental trials of equal or greater methodological rigor, we suggest that future research re-focus on: 1) other promising interventions known to increase memory performance in healthy young adults, and 2) examining sub-populations or alternative populations in which working memory training may be efficacious.
Elaborating Selected Statistical Concepts with Common Experience.
ERIC Educational Resources Information Center
Weaver, Kenneth A.
1992-01-01
Presents ways of elaborating statistical concepts so as to make course material more meaningful for students. Describes examples using exclamations, circus and cartoon characters, and falling leaves to illustrate variability, null hypothesis testing, and confidence interval. Concludes that the exercises increase student comprehension of the text…
Null Hypothesis Significance Testing and "p" Values
ERIC Educational Resources Information Center
Travers, Jason C.; Cook, Bryan G.; Cook, Lysandra
2017-01-01
"p" values are commonly reported in quantitative research, but are often misunderstood and misinterpreted by research consumers. Our aim in this article is to provide special educators with guidance for appropriately interpreting "p" values, with the broader goal of improving research consumers' understanding and interpretation…
Predicting Cost and Schedule Growth for Military and Civil Space Systems
2008-03-01
the Shapiro-Wilk Test, and testing the residuals for constant variance using the Breusch-Pagan test. For logistic models, diagnostics include...the Breusch-Pagan Test. With this test, a p-value below 0.05 rejects the null hypothesis that the residuals have constant variance. Thus, similar...to the Shapiro-Wilk Test, because the optimal model will have constant variance of its residuals, this requires Breusch-Pagan p-values over 0.05
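An illustrative version of the residual diagnostics described, Shapiro-Wilk for normality and Breusch-Pagan for constant variance, on a synthetic regression; p-values above 0.05 fail to reject the corresponding null hypotheses.

```python
# Residual diagnostics on a synthetic OLS fit: Shapiro-Wilk (normality) and
# Breusch-Pagan (constant variance). Data and model are illustrative stand-ins.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 120)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 120)     # homoscedastic, normal errors

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

sw_stat, sw_p = stats.shapiro(fit.resid)
bp_lm, bp_p, _, _ = het_breuschpagan(fit.resid, X)
print(f"Shapiro-Wilk p = {sw_p:.2f} (normality), Breusch-Pagan p = {bp_p:.2f} (constant variance)")
```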
Model error in covariance structure models: Some implications for power and Type I error
Coffman, Donna L.
2010-01-01
The present study investigated the degree to which violation of the parameter drift assumption affects the Type I error rate for the test of close fit and power analysis procedures proposed by MacCallum, Browne, and Sugawara (1996) for both the test of close fit and the test of exact fit. The parameter drift assumption states that as sample size increases both sampling error and model error (i.e. the degree to which the model is an approximation in the population) decrease. Model error was introduced using a procedure proposed by Cudeck and Browne (1992). The empirical power for both the test of close fit, in which the null hypothesis specifies that the Root Mean Square Error of Approximation (RMSEA) ≤ .05, and the test of exact fit, in which the null hypothesis specifies that RMSEA = 0, is compared with the theoretical power computed using the MacCallum et al. (1996) procedure. The empirical power and theoretical power for both the test of close fit and the test of exact fit are nearly identical under violations of the assumption. The results also indicated that the test of close fit maintains the nominal Type I error rate under violations of the assumption. PMID:21331302
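A sketch of the MacCallum, Browne and Sugawara (1996) style power computation for the test of close fit using noncentral chi-squared distributions; the degrees of freedom, sample sizes and alternative RMSEA below are illustrative.

```python
# Power for the test of close fit (H0: RMSEA <= 0.05) against an alternative RMSEA,
# using noncentral chi-squared distributions with ncp = (n - 1) * df * RMSEA**2.
from scipy import stats

def rmsea_power(n, df, rmsea0=0.05, rmsea_a=0.08, alpha=0.05):
    ncp0 = (n - 1) * df * rmsea0 ** 2          # noncentrality under the null RMSEA
    ncp_a = (n - 1) * df * rmsea_a ** 2        # noncentrality under the alternative
    crit = stats.ncx2.ppf(1 - alpha, df, ncp0)
    return stats.ncx2.sf(crit, df, ncp_a)

for n in (100, 200, 400, 800):
    print(n, round(rmsea_power(n, df=50), 3))
```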
The effects of temperature on sex determination in the bloater Coregonus hoyi: a hypothesis test
Eck, Gary W.; Allen, Jeffrey D.
1995-01-01
The hypothesis that temperature was an epigamic factor in bloater (Coregonus hoyi) sex determination in Lake Michigan was tested by rearing bloater larvae in the laboratory at 6, 11, and 15 degrees C for the first 80 days after hatching. The percentages of females of fish exposed to the three treatment temperatures did not differ significantly from the expected, 50%. Therefore, the null hypothesis, that temperature did not influence bloater sex determination within the confines of this study, could not be rejected. Our study of bloater sex determination was an attempt to explain the extreme female predominance (> 95%) that occurred in the Lake Michigan bloater population during the 1960s.
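The reported comparison amounts to an exact binomial test of H0: P(female) = 0.5; the sketch below uses illustrative counts, not the study's data.

```python
# Exact binomial test of a 50% female proportion; counts are illustrative.
from scipy import stats

n_fish, n_female = 120, 66
result = stats.binomtest(n_female, n_fish, p=0.5)
print(f"observed proportion = {n_female / n_fish:.2f}, two-sided p = {result.pvalue:.2f}")
```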
Performing Inferential Statistics Prior to Data Collection
ERIC Educational Resources Information Center
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
On Some Assumptions of the Null Hypothesis Statistical Testing
ERIC Educational Resources Information Center
Patriota, Alexandre Galvão
2017-01-01
Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…
Building Intuitions about Statistical Inference Based on Resampling
ERIC Educational Resources Information Center
Watson, Jane; Chance, Beth
2012-01-01
Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…
A Bayesian bird's eye view of ‘Replications of important results in social psychology’
Schönbrodt, Felix D.; Yao, Yuling; Gelman, Andrew; Wagenmakers, Eric-Jan
2017-01-01
We applied three Bayesian methods to reanalyse the preregistered contributions to the Social Psychology special issue ‘Replications of Important Results in Social Psychology’ (Nosek & Lakens. 2014 Registered reports: a method to increase the credibility of published results. Soc. Psychol. 45, 137–141. (doi:10.1027/1864-9335/a000192)). First, individual-experiment Bayesian parameter estimation revealed that for directed effect size measures, only three out of 44 central 95% credible intervals did not overlap with zero and fell in the expected direction. For undirected effect size measures, only four out of 59 credible intervals contained values greater than 0.10 (10% of variance explained) and only 19 intervals contained values larger than 0.05. Second, a Bayesian random-effects meta-analysis for all 38 t-tests showed that only one out of the 38 hierarchically estimated credible intervals did not overlap with zero and fell in the expected direction. Third, a Bayes factor hypothesis test was used to quantify the evidence for the null hypothesis against a default one-sided alternative. Only seven out of 60 Bayes factors indicated non-anecdotal support in favour of the alternative hypothesis (BF10>3), whereas 51 Bayes factors indicated at least some support for the null hypothesis. We hope that future analyses of replication success will embrace a more inclusive statistical approach by adopting a wider range of complementary techniques. PMID:28280547
ERIC Educational Resources Information Center
Umesh, U. N.; Mishra, Sanjay
1990-01-01
Major issues related to index-of-fit conjoint analysis were addressed in this simulation study. Goals were to develop goodness-of-fit criteria for conjoint analysis; develop tests to determine the significance of conjoint analysis results; and calculate the power of the test of the null hypothesis of random data distribution. (SLD)
ERIC Educational Resources Information Center
Adani, Anthony; Eskay, Michael; Onu, Victoria
2012-01-01
This quasi-experimental study examined the effect of self-instruction strategy on the achievement in algebra of students with learning difficulty in mathematics. Two research questions and one null hypothesis were formulated to guide the study. The study adopted a non-randomized pre-test and post-test control group design with one experimental…
Overgaard, Morten; Lindeløv, Jonas; Svejstrup, Stinna; Døssing, Marianne; Hvid, Tanja; Kauffmann, Oliver; Mouridsen, Kim
2013-01-01
This paper reports an experiment intended to test a particular hypothesis derived from blindsight research, which we name the “source misidentification hypothesis.” According to this hypothesis, a subject may be correct about a stimulus without being correct about how she had access to this knowledge (whether the stimulus was visual, auditory, or something else). We test this hypothesis in healthy subjects, asking them to report whether a masked stimulus was presented auditorily or visually, what the stimulus was, and how clearly they experienced the stimulus using the Perceptual Awareness Scale (PAS). We suggest that knowledge about perceptual modality may be a necessary precondition in order to issue correct reports of which stimulus was presented. Furthermore, we find that PAS ratings correlate with correctness, and that subjects are at chance level when reporting no conscious experience of the stimulus. To demonstrate that particular levels of reporting accuracy are obtained, we employ a statistical strategy, which operationally tests the hypothesis of non-equality, such that the usual rejection of the null-hypothesis admits the conclusion of equivalence. PMID:23508677
Earthquake likelihood model testing
Schorlemmer, D.; Gerstenberger, M.C.; Wiemer, S.; Jackson, D.D.; Rhoades, D.A.
2007-01-01
INTRODUCTION: The Regional Earthquake Likelihood Models (RELM) project aims to produce and evaluate alternate models of earthquake potential (probability per unit volume, magnitude, and time) for California. Based on differing assumptions, these models are produced to test the validity of their assumptions and to explore which models should be incorporated in seismic hazard and risk evaluation. Tests based on physical and geological criteria are useful but we focus on statistical methods using future earthquake catalog data only. We envision two evaluations: a test of consistency with observed data and a comparison of all pairs of models for relative consistency. Both tests are based on the likelihood method, and both are fully prospective (i.e., the models are not adjusted to fit the test data). To be tested, each model must assign a probability to any possible event within a specified region of space, time, and magnitude. For our tests the models must use a common format: earthquake rates in specified “bins” with location, magnitude, time, and focal mechanism limits. Seismology cannot yet deterministically predict individual earthquakes; however, it should seek the best possible models for forecasting earthquake occurrence. This paper describes the statistical rules of an experiment to examine and test earthquake forecasts. The primary purposes of the tests described below are to evaluate physical models for earthquakes, assure that source models used in seismic hazard and risk studies are consistent with earthquake data, and provide quantitative measures by which models can be assigned weights in a consensus model or be judged as suitable for particular regions. In this paper we develop a statistical method for testing earthquake likelihood models. A companion paper (Schorlemmer and Gerstenberger 2007, this issue) discusses the actual implementation of these tests in the framework of the RELM initiative. Statistical testing of hypotheses is a common task and a wide range of possible testing procedures exist. Jolliffe and Stephenson (2003) present different forecast verifications from atmospheric science, among them likelihood testing of probability forecasts and testing the occurrence of binary events. Testing binary events requires that for each forecasted event, the spatial, temporal and magnitude limits be given. Although major earthquakes can be considered binary events, the models within the RELM project express their forecasts on a spatial grid and in 0.1 magnitude units; thus the results are a distribution of rates over space and magnitude. These forecasts can be tested with likelihood tests. In general, likelihood tests assume a valid null hypothesis against which a given hypothesis is tested. The outcome is either a rejection of the null hypothesis in favor of the test hypothesis or a nonrejection, meaning the test hypothesis cannot outperform the null hypothesis at a given significance level. Within RELM, there is no accepted null hypothesis and thus the likelihood test needs to be expanded to allow comparable testing of equipollent hypotheses. To test models against one another, we require that forecasts are expressed in a standard format: the average rate of earthquake occurrence within pre-specified limits of hypocentral latitude, longitude, depth, magnitude, time period, and focal mechanisms. Focal mechanisms should either be described as the inclination of P-axis, declination of P-axis, and inclination of the T-axis, or as strike, dip, and rake angles.
Schorlemmer and Gerstenberger (2007, this issue) designed classes of these parameters such that similar models will be tested against each other. These classes make the forecasts comparable between models. Additionally, we are limited to testing only what is precisely defined and consistently reported in earthquake catalogs. Therefore it is currently not possible to test such information as fault rupture length or area, asperity location, etc. Also, to account for data quality issues, we allow for location and magnitude uncertainties as well as the probability that an event is dependent on another event. As we mentioned above, only models with comparable forecasts can be tested against each other. Our current tests are designed to examine grid-based models. This requires that any fault-based model be adapted to a grid before testing is possible. While this is a limitation of the testing, it is an inherent difficulty in any such comparative testing. Please refer to appendix B for a statistical evaluation of the application of the Poisson hypothesis to fault-based models. The testing suite we present consists of three different tests: L-Test, N-Test, and R-Test. These tests are defined similarly to Kagan and Jackson (1995). The first two tests examine the consistency of the hypotheses with the observations while the last test compares the spatial performances of the models.
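A sketch of an L-test style consistency check as described above: the joint Poisson log-likelihood of observed bin counts under a forecast, ranked against log-likelihoods of catalogues simulated from that forecast. The forecast rates and observed counts are synthetic.

```python
# L-test style consistency check: compare the observed log-likelihood with the
# distribution of log-likelihoods of catalogues simulated from the forecast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
forecast = rng.uniform(0.01, 0.5, size=200)   # expected counts per space-magnitude bin
observed = rng.poisson(forecast)              # stand-in for a real observed catalogue

def log_lik(counts, rates):
    return float(np.sum(stats.poisson.logpmf(counts, rates)))

ll_obs = log_lik(observed, forecast)
ll_sim = np.array([log_lik(rng.poisson(forecast), forecast) for _ in range(1000)])
quantile = np.mean(ll_sim <= ll_obs)
print(f"observed log-likelihood quantile = {quantile:.2f}")
# A very low quantile indicates the observed catalogue is inconsistent with the forecast.
```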
Unscaled Bayes factors for multiple hypothesis testing in microarray experiments.
Bertolino, Francesco; Cabras, Stefano; Castellanos, Maria Eugenia; Racugno, Walter
2015-12-01
Multiple hypothesis testing comprises a series of techniques usually based on p-values as a summary of the available evidence from many statistical tests. In hypothesis testing, under a Bayesian perspective, the evidence for a specified hypothesis against an alternative, conditionally on data, is given by the Bayes factor. In this study, we approach multiple hypothesis testing based on both Bayes factors and p-values, regarding multiple hypothesis testing as a multiple model selection problem. To obtain the Bayes factors we assume default priors that are typically improper. In this case, the Bayes factor is usually undetermined due to the ratio of prior pseudo-constants. We show that ignoring prior pseudo-constants leads to unscaled Bayes factors, which do not invalidate the inferential procedure in multiple hypothesis testing, because they are used within a comparative scheme. In fact, using partial information from the p-values, we are able to approximate the sampling null distribution of the unscaled Bayes factor and use it within Efron's multiple testing procedure. The simulation study suggests that under a normal sampling model, and even with small sample sizes, our approach provides false positive and false negative proportions that are lower than those of other common multiple hypothesis testing approaches based only on p-values. The proposed procedure is illustrated in two simulation studies, and the advantages of its use are shown in the analysis of two microarray experiments. © The Author(s) 2011.
ERIC Educational Resources Information Center
Stallings, William M.
In the educational research literature alpha, the a priori level of significance, and p, the a posteriori probability of obtaining a test statistic of at least a certain value when the null hypothesis is true, are often confused. Explanations for this confusion are offered. Paradoxically, alpha retains a prominent place in textbook discussions of…
Ultimate Attainment of Anaphora Resolution in L2 Chinese
ERIC Educational Resources Information Center
Zhao, Lucy Xia
2014-01-01
The current study tests the Interface Hypothesis through forward and backward anaphora in complex sentences with temporal subordinate clauses in highly proficient English-speaking learners' second-language (L2) Chinese. Forward anaphora is involved when the overt pronoun "ta" "he/she" or a null element appears in the subject…
Three Strategies for the Critical Use of Statistical Methods in Psychological Research
ERIC Educational Resources Information Center
Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando
2017-01-01
We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…
Explorations in Statistics: Permutation Methods
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2012-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eighth installment of "Explorations in Statistics" explores permutation methods, empiric procedures we can use to assess an experimental result--to test a null hypothesis--when we are reluctant to trust statistical…
USDA-ARS's Scientific Manuscript database
Conservation tillage practices have combined genetically modified glyphosate resistant corn crops along with applications of the herbicide glyphosate. We tested the null hypothesis that the soil process of nitrification and the distribution of archaeal and bacterial nitrifying communities would not ...
Natural killer T cell facilitated engraftment of rat skin but not islet xenografts in mice.
Gordon, Ethel J; Kelkar, Vinaya
2009-01-01
We have studied cellular components required for xenograft survival mediated by anti-CD154 monoclonal antibody (mAb) and a transfusion of donor spleen cells and found that the elimination of CD4(+) but not CD8(+) cells significantly improves graft survival. A contribution of other cellular components, such as natural killer (NK) cells and natural killer T (NKT) cells, for costimulation blockade-induced xenograft survival has not been clearly defined. We therefore tested the hypothesis that NK or NKT cells would promote rat islet and skin xenograft acceptance in mice. Lewis rat islets or skin was transplanted into wild type B6 mice or into B6 mice that were Jalpha18(null), CD1(null), or beta2 microglobulin (beta2M)(null) NK 1.1 depleted, or perforin(null). Graft recipients were pretreated with an infusion of donor derived spleen cells and a brief course of anti-CD154 mAb treatments. Additional groups received mAb or cells only. We first observed that the depletion of NK1.1 cells does not significantly interfere with graft survival in C57BL/6 (B6) mice. We used NKT cell deficient B6 mice to test the hypothesis that NKT cells are involved in islet and skin xenograft survival in our model. These mice bear a null mutation in the gene for the Jalpha18 component of the T-cell receptor. The component is uniquely associated with NKT cells. We found no difference in islet xenograft survival between Jalpha18(null) and wild type B6 mice. In contrast, median skin graft survival appeared shorter in Jalpha18(null) recipients. These data imply a role for Jalpha18(+) NKT cells in skin xenograft survival in treated mice. In order to confirm this inference, we tested skin xenograft survival in B6 CD1(null) mice because NKT cells are CD1 restricted. Results of these trials demonstrate that the absence of CD1(+) cells adversely affects rat skin graft survival. An additional assay in beta2M(null) mice demonstrated a requirement for major histocompatibility complex (MHC) class I expression in the graft host, and we demonstrate that CD1 is the requisite MHC component. We further demonstrated that, unlike reports for allograft survival, skin xenograft survival does not require perforin-secreting NK cells. We conclude that MHC class I(+) CD1(+) Jalpha18(+) NKT cells promote the survival of rat skin but not rat islet xenografts. These studies implicate different mechanisms for inducing and maintaining islet vs. skin xenograft survival in mice treated with donor antigen and anti-CD154 mAb, and further indicate a role for NKT cells but not NK cells in skin xenograft survival.
What is too much variation? The null hypothesis in small-area analysis.
Diehr, P; Cain, K; Connell, F; Volinn, E
1990-01-01
A small-area analysis (SAA) in health services research often calculates surgery rates for several small areas, compares the largest rate to the smallest, notes that the difference is large, and attempts to explain this discrepancy as a function of service availability, physician practice styles, or other factors. SAAs are often difficult to interpret because there is little theoretical basis for determining how much variation would be expected under the null hypothesis that all of the small areas have similar underlying surgery rates and that the observed variation is due to chance. We developed a computer program to simulate the distribution of several commonly used descriptive statistics under the null hypothesis, and used it to examine the variability in rates among the counties of the state of Washington. The expected variability when the null hypothesis is true is surprisingly large, and becomes worse for procedures with low incidence, for smaller populations, when there is variability among the populations of the counties, and when readmissions are possible. The characteristics of four descriptive statistics were studied and compared. None was uniformly good, but the chi-square statistic had better performance than the others. When we reanalyzed five journal articles that presented sufficient data, the results were usually statistically significant. Since SAA research today is tending to deal with low-incidence events, smaller populations, and measures where readmissions are possible, more research is needed on the distribution of small-area statistics under the null hypothesis. New standards are proposed for the presentation of SAA results. PMID:2312306
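A toy version of such a null simulation, with invented area populations and a common underlying rate, illustrates how large the largest-to-smallest rate ratio can be purely by chance; none of the numbers come from the Washington data.

```python
import numpy as np

rng = np.random.default_rng(1)
populations = np.array([5_000, 20_000, 80_000, 150_000])   # hypothetical small areas
common_rate = 2.0 / 1000                                    # identical underlying rate under H0

ratios = []
for _ in range(5_000):
    counts = rng.poisson(populations * common_rate)         # chance variation only
    rates = counts / populations
    if rates.min() > 0:
        ratios.append(rates.max() / rates.min())

# Even with identical underlying rates, the extremal ratio is often well above 1,
# and it grows as incidence or population size shrinks.
print(np.percentile(ratios, [50, 95]))
```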
Perneger, Thomas V; Combescure, Christophe
2017-07-01
Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) high proportion of P-values <0.001, especially in observational studies, (2) excess of P-values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.
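The two reference distributions used above can be reproduced with a small simulation: under the null hypothesis the p-values of a two-sample t-test are uniform, while under a fixed alternative they pile up near zero. The effect size and sample size below are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def one_p_value(delta, n=30):
    """p-value of a two-sample t-test when the true mean difference is delta."""
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(delta, 1.0, n)
    return stats.ttest_ind(x, y).pvalue

p_null = np.array([one_p_value(0.0) for _ in range(2_000)])   # approximately uniform on (0, 1)
p_alt = np.array([one_p_value(0.5) for _ in range(2_000)])    # concentrated near 0
print(np.mean(p_null < 0.05), np.mean(p_alt < 0.05))
```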
Heightened risk of preterm birth and growth restriction after a first-born son.
Bruckner, Tim A; Mayo, Jonathan A; Gould, Jeffrey B; Stevenson, David K; Lewis, David B; Shaw, Gary M; Carmichael, Suzan L
2015-10-01
In Scandinavia, delivery of a first-born son elevates the risk of preterm delivery and intrauterine growth restriction of the next-born infant. External validity of these results remains unclear. We test this hypothesis for preterm delivery and growth restriction using the linked California birth cohort file. We examined the hypothesis separately by race and/or ethnicity. We retrieved data on 2,852,976 births to 1,426,488 mothers with at least two live births. Our within-mother tests applied Cox proportional hazards (preterm delivery, defined as less than 37 weeks gestation) and linear regression models (birth weight for gestational age percentiles). For non-Hispanic whites, Hispanics, Asians, and American Indian and/or Alaska Natives, analyses indicate heightened risk of preterm delivery and growth restriction after a first-born male. The race-specific hazard ratios for preterm delivery range from 1.07 to 1.18. Regression coefficients for birth weight for gestational age percentile range from -0.73 to -1.49. The 95% confidence intervals for all these estimates do not contain the null. By contrast, we could not reject the null for non-Hispanic black mothers. Whereas California findings generally support those from Scandinavia, the null results among non-Hispanic black mothers suggest that we do not detect adverse outcomes after a first-born male in all racial and/or ethnic groups. Copyright © 2015 Elsevier Inc. All rights reserved.
Lawlor-Savage, Linette; Goghari, Vina M.
2017-01-01
Training of working memory as a method of increasing working memory capacity and fluid intelligence has received much attention in recent years. This burgeoning field remains highly controversial with empirically-backed disagreements at all levels of evidence, including individual studies, systematic reviews, and even meta-analyses. The current study investigated the effect of a randomized six week online working memory intervention on untrained cognitive abilities in a community-recruited sample of healthy young adults, in relation to both a processing speed training active control condition, as well as a no-contact control condition. Results of traditional null hypothesis significance testing, as well as Bayesian factor analyses, revealed support for the null hypothesis across all cognitive tests administered before and after training. Importantly, all three groups were similar at pre-training for a variety of individual variables purported to moderate transfer of training to fluid intelligence, including personality traits, motivation to train, and expectations of cognitive improvement from training. Because these results are consistent with experimental trials of equal or greater methodological rigor, we suggest that future research re-focus on: 1) other promising interventions known to increase memory performance in healthy young adults, and; 2) examining sub-populations or alternative populations in which working memory training may be efficacious. PMID:28558000
Statistical analysis of particle trajectories in living cells
NASA Astrophysics Data System (ADS)
Briane, Vincent; Kervrann, Charles; Vimond, Myriam
2018-06-01
Recent advances in molecular biology and fluorescence microscopy imaging have made possible the inference of the dynamics of molecules in living cells. Such inference allows us to understand and determine the organization and function of the cell. The trajectories of particles (e.g., biomolecules) in living cells, computed with the help of object tracking methods, can be modeled with diffusion processes. Three types of diffusion are considered: (i) free diffusion, (ii) subdiffusion, and (iii) superdiffusion. The mean-square displacement (MSD) is generally used to discriminate the three types of particle dynamics. We propose here a nonparametric three-decision test as an alternative to the MSD method. The rejection of the null hypothesis, i.e., free diffusion, is accompanied by claims of the direction of the alternative (subdiffusion or superdiffusion). We study the asymptotic behavior of the test statistic under the null hypothesis and under parametric alternatives which are currently considered in the biophysics literature. In addition, we adapt the multiple-testing procedure of Benjamini and Hochberg to fit with the three-decision-test setting, in order to apply the test procedure to a collection of independent trajectories. The performance of our procedure is much better than the MSD method as confirmed by Monte Carlo experiments. The method is demonstrated on real data sets corresponding to protein dynamics observed in fluorescence microscopy.
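For orientation, the MSD baseline that the proposed three-decision test is compared against can be computed in a few lines; the trajectory below is a simulated free-diffusion (Brownian) path, not real tracking data.

```python
import numpy as np

rng = np.random.default_rng(3)
steps = rng.normal(scale=0.1, size=(500, 2))
trajectory = np.cumsum(steps, axis=0)          # simulated 2-D Brownian trajectory

def msd(traj, max_lag=50):
    """Mean-square displacement as a function of time lag."""
    return np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                     for lag in range(1, max_lag + 1)])

# Free diffusion gives an approximately linear MSD curve; subdiffusion bends it
# downward and superdiffusion bends it upward, which is what the MSD method exploits.
print(msd(trajectory)[:5])
```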
Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research
Bakker, Marjan; Wicherts, Jelte M.
2014-01-01
Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606
The potential for increased power from combining P-values testing the same hypothesis.
Ganju, Jitendra; Julie Ma, Guoguang
2017-02-01
The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect but we do not know which one is the most powerful. Rather than relying on a single p-value, combining p-values from prespecified multiple test statistics can be used for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations when the model includes treatment by covariate interaction.
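Two of the combining functions mentioned (Fisher's combination and the minimum p-value) are easy to state in code; the three p-values are hypothetical, and the chi-square reference distribution shown here assumes independent tests, whereas the paper obtains its reference distribution by randomization.

```python
import numpy as np
from scipy import stats

p_values = np.array([0.08, 0.03, 0.20])     # from three prespecified test statistics

# Fisher's combination: -2 * sum(log p) ~ chi-square with 2k degrees of freedom under H0
fisher_stat = -2.0 * np.log(p_values).sum()
p_fisher = stats.chi2.sf(fisher_stat, df=2 * len(p_values))

# Minimum p-value with a Bonferroni-style bound
p_min = min(1.0, len(p_values) * p_values.min())

print(p_fisher, p_min)
```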
WASP (Write a Scientific Paper) using Excel - 8: t-Tests.
Grech, Victor
2018-06-01
t-Testing is a common component of inferential statistics when comparing two means. This paper explains the central limit theorem and the concept of the null hypothesis as well as types of errors. On the practical side, this paper outlines how different t-tests may be performed in Microsoft Excel, for different purposes, both statically as well as dynamically, with Excel's functions. Copyright © 2018 Elsevier B.V. All rights reserved.
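For readers working outside Excel, the same comparisons can be run with SciPy; the numbers below are invented, and whether the independent-samples or paired version applies depends on the design, mirroring Excel's t-test "types".

```python
from scipy import stats

sample_1 = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
sample_2 = [12.8, 12.2, 13.5, 13.1, 12.6, 13.2]

# Independent-samples t-test (Welch's version, not assuming equal variances)
t_ind, p_ind = stats.ttest_ind(sample_1, sample_2, equal_var=False)

# Paired t-test, appropriate when the two columns are repeated measures on the same units
t_rel, p_rel = stats.ttest_rel(sample_1, sample_2)

print(t_ind, p_ind, t_rel, p_rel)
```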
Effects of DoD Engagements in Collaborative Humanitarian Assistance
2013-09-01
Breusch-Pagan (BP) test, which tests for heteroscedasticity in panel data using Lagrange multipliers. The null hypothesis for the BP test is that homoscedasticity is present (Breusch & Pagan, 1979, p. 1288). Each fixed effect, "CountryName," "FiscalYear," and the combined effect of both variables, was
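A Breusch-Pagan test of this kind can be sketched with statsmodels; the regressors and response below are simulated with deliberately non-constant error variance and have nothing to do with the report's panel data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.normal(size=(200, 2))
# Error variance grows with the first regressor, so the simulated data are heteroscedastic.
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(scale=1.0 + np.abs(x[:, 0]))

X = sm.add_constant(x)
residuals = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residuals, X)
print(lm_pvalue)   # a small p-value rejects the null hypothesis of homoscedasticity
```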
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is offered, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, since it bears on the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Globigerinoides ruber morphotypes in the Gulf of Mexico: a test of null hypothesis
Thirumalai, Kaustubh; Richey, Julie N.; Quinn, Terrence M.; Poore, Richard Z.
2014-01-01
Planktic foraminifer Globigerinoides ruber (G. ruber), due to its abundance and ubiquity in the tropical/subtropical mixed layer, has been the workhorse of paleoceanographic studies investigating past sea-surface conditions on a range of timescales. Recent geochemical work on the two principal white G. ruber (W) morphotypes, sensu stricto (ss) and sensu lato (sl), has hypothesized differences in seasonal preferences or calcification depths, implying that reconstructions using a non-selective mixture of morphotypes could potentially be biased. Here, we test these hypotheses by performing stable isotope and abundance measurements on the two morphotypes in sediment trap, core-top, and downcore samples from the northern Gulf of Mexico. As a test of null hypothesis, we perform the same analyses on couplets of G. ruber (W) specimens with attributes intermediate to the holotypic ss and sl morphologies. We find no systematic or significant offsets in coeval ss-sl δ18O, and δ13C. These offsets are no larger than those in the intermediate pairs. Coupling our results with foraminiferal statistical model INFAUNAL, we find that contrary to previous work elsewhere, there is no evidence for discrepancies in ss-sl calcifying depth habitat or seasonality in the Gulf of Mexico.
Does McNemar's test compare the sensitivities and specificities of two diagnostic tests?
Kim, Soeun; Lee, Woojoo
2017-02-01
McNemar's test is often used in practice to compare the sensitivities and specificities for the evaluation of two diagnostic tests. For correct evaluation of accuracy, an intuitive recommendation is to test the diseased and the non-diseased groups separately, so that sensitivities can be compared among the diseased and specificities can be compared among the healthy group of people. This paper provides a rigorous theoretical framework for this argument and studies the validity of McNemar's test regardless of the conditional independence assumption. We derive McNemar's test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence. We then perform power analyses to show how the result is affected by the amount of conditional dependence under the alternative hypothesis.
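In code, the recommendation amounts to running McNemar's test on the paired results within the diseased group alone (for sensitivities) and within the healthy group alone (for specificities); the 2x2 table of hypothetical counts below is only an illustration.

```python
from statsmodels.stats.contingency_tables import mcnemar

# Paired results of test A (rows) and test B (columns) among diseased subjects only,
# so the discordant cells compare the two sensitivities.
table = [[45, 12],
         [5, 8]]

result = mcnemar(table, exact=True)   # exact binomial test on the discordant pairs
print(result.statistic, result.pvalue)
```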
The Epistemology of Mathematical and Statistical Modeling: A Quiet Methodological Revolution
ERIC Educational Resources Information Center
Rodgers, Joseph Lee
2010-01-01
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the…
Observation-Oriented Modeling: Going beyond "Is It All a Matter of Chance"?
ERIC Educational Resources Information Center
Grice, James W.; Yepez, Maria; Wilson, Nicole L.; Shoda, Yuichi
2017-01-01
An alternative to null hypothesis significance testing is presented and discussed. This approach, referred to as observation-oriented modeling, is centered on model building in an effort to explicate the structures and processes believed to generate a set of observations. In terms of analysis, this novel approach complements traditional methods…
Remediating Misconception on Climate Change among Secondary School Students in Malaysia
ERIC Educational Resources Information Center
Karpudewan, Mageswary; Roth, Wolff-Michael; Chandrakesan, Kasturi
2015-01-01
Existing studies report on secondary school students' misconceptions related to climate change; they also report on the methods of teaching as reinforcing misconceptions. This quasi-experimental study was designed to test the null hypothesis that a curriculum based on constructivist principles does not lead to greater understanding and fewer…
ERIC Educational Resources Information Center
Dunst, Carl J.; Hamby, Deborah W.
2012-01-01
This paper includes a nontechnical description of methods for calculating effect sizes in intellectual and developmental disability studies. Different hypothetical studies are used to illustrate how null hypothesis significance testing (NHST) and effect size findings can result in quite different outcomes and therefore conflicting results. Whereas…
How Often Is p[subscript rep] Close to the True Replication Probability?
ERIC Educational Resources Information Center
Trafimow, David; MacDonald, Justin A.; Rice, Stephen; Clason, Dennis L.
2010-01-01
Largely due to dissatisfaction with the standard null hypothesis significance testing procedure, researchers have begun to consider alternatives. For example, Killeen (2005a) has argued that researchers should calculate p[subscript rep] that is purported to indicate the probability that, if the experiment in question were replicated, the obtained…
ERIC Educational Resources Information Center
Spinella, Sarah
2011-01-01
As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
Spatial autocorrelation in growth of undisturbed natural pine stands across Georgia
Raymond L. Czaplewski; Robin M. Reich; William A. Bechtold
1994-01-01
Moran's I statistic measures the spatial autocorrelation in a random variable measured at discrete locations in space. Permutation procedures test the null hypothesis that the observed Moran's I value is no greater than that expected by chance. The spatial autocorrelation of gross basal area increment is analyzed for undisturbed, naturally regenerated stands...
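A stripped-down version of that permutation procedure for Moran's I, on an invented 5 x 5 lattice with rook-contiguity weights, looks like this; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
values = rng.normal(size=25)                       # attribute measured at 25 grid locations
coords = np.array([(i, j) for i in range(5) for j in range(5)])

# Binary rook-contiguity weights: 1 for orthogonal neighbors on the grid, 0 otherwise.
manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
w = (manhattan == 1).astype(float)

def morans_i(x, w):
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

observed = morans_i(values, w)
perms = np.array([morans_i(rng.permutation(values), w) for _ in range(999)])
p_value = (np.sum(perms >= observed) + 1) / (len(perms) + 1)   # one-sided permutation p-value
print(observed, p_value)
```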
Use of the disease severity index for null hypothesis testing
USDA-ARS?s Scientific Manuscript database
A disease severity index (DSI) is a single number for summarizing a large amount of disease severity information. It is used to indicate relative resistance of cultivars, to relate disease severity to yield loss, or to compare treatments. The DSI has most often been based on a special type of ordina...
Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling
ERIC Educational Resources Information Center
Banjanovic, Erin S.; Osborne, Jason W.
2016-01-01
Confidence intervals for effect sizes (CIES) provide readers with an estimate of the strength of a reported statistic as well as the relative precision of the point estimate. These statistics offer more information and context than null hypothesis statistic testing. Although confidence intervals have been recommended by scholars for many years,…
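A percentile-bootstrap confidence interval for a standardized mean difference (Cohen's d) can be sketched in a few lines; the data are simulated, and the percentile method is only one of the bootstrap variants the authors discuss.

```python
import numpy as np

rng = np.random.default_rng(6)
group_a = rng.normal(0.0, 1.0, 40)
group_b = rng.normal(0.6, 1.0, 40)

def cohens_d(x, y):
    pooled_sd = np.sqrt(((len(x) - 1) * x.var(ddof=1) +
                         (len(y) - 1) * y.var(ddof=1)) / (len(x) + len(y) - 2))
    return (y.mean() - x.mean()) / pooled_sd

boot = [cohens_d(rng.choice(group_a, len(group_a)),      # resample each group
                 rng.choice(group_b, len(group_b)))      # with replacement
        for _ in range(5_000)]

print(cohens_d(group_a, group_b), np.percentile(boot, [2.5, 97.5]))
```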
Diaz, Francisco J.; McDonald, Peter R.; Pinter, Abraham; Chaguturu, Rathnam
2018-01-01
Biomolecular screening research frequently searches for the chemical compounds that are most likely to make a biochemical or cell-based assay system produce a strong continuous response. Several doses are tested with each compound and it is assumed that, if there is a dose-response relationship, the relationship follows a monotonic curve, usually a version of the median-effect equation. However, the null hypothesis of no relationship cannot be statistically tested using this equation. We used a linearized version of this equation to define a measure of pharmacological effect size, and use this measure to rank the investigated compounds in order of their overall capability to produce strong responses. The null hypothesis that none of the examined doses of a particular compound produced a strong response can be tested with this approach. The proposed approach is based on a new statistical model of the important concept of response detection limit, a concept that is usually neglected in the analysis of dose-response data with continuous responses. The methodology is illustrated with data from a study searching for compounds that neutralize the infection by a human immunodeficiency virus of brain glioblastoma cells. PMID:24905187
Accuracy of Time Phasing Aircraft Development using the Continuous Distribution Function
2015-03-26
Breusch-Pagan test; the reported p-value of 0.5264 fails to reject the null hypothesis of constant variance... Breusch-Pagan test p-value: 0.6911. Shapiro-Wilk W test Prob. < W: 0.9849... Weibull scale parameter β, constant-variance Breusch-Pagan test p-value: 0.5176. Beta shape parameter α, influential data.
Ling, Zhi-Qiang; Wang, Yi; Mukaisho, Kenichi; Hattori, Takanori; Tatsuta, Takeshi; Ge, Ming-Hua; Jin, Li; Mao, Wei-Min; Sugihara, Hiroyuki
2010-06-01
Tests of differentially expressed genes (DEGs) from microarray experiments are based on the null hypothesis that genes that are irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis is not always true, as there can be several transcriptomic background differences between target and control samples, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to increased false positives, which have little biological/medical significance. In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data allowing transcriptomic background differences between these samples by introducing a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method using our own triplicate microarray experiment, followed by validations with reverse transcription-polymerase chain reaction (RT-PCR) and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) results in less false positive and false negative results, as measured by the degree of agreement with RT-PCR of the same samples, (ii) can be applied to different microarray platforms and results in better reproducibility as measured by the degree of DEG identification concordance both intra- and inter-platforms and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEG, but also reduces the power-cost tradeoff problem in the microarray field. Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.
Yang, Yang; DeGruttola, Victor
2016-01-01
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584
Narayana, Sai Sathya; Deepa, Vinoth Kumar; Ahamed, Shafie; Sathish, Emmanuel Solomon; Meyappan, R; Satheesh Kumar, K S
2014-01-01
The objective of this study was to investigate the efficacy of a bioactive glass-containing product on remineralization of artificially induced carious enamel lesions and to compare its efficiency with other remineralization products using an in-vitro pH cycling method. The null hypothesis tested was that bioactive glass has no effect on enamel remineralization. A total of 20 enamel samples of human molar teeth were subjected to artificial caries lesion formation using the pH cycling method, which was verified using a high-resolution scanning electron microscope (HRSEM). The demineralized samples were then divided into five test groups each containing twenty: Group A - Bioactive glass (SHY-NM), Group B - Fluoride tooth paste (Amflor), Group C - CPP-ACP (Tooth mousse), Group D - CPP-ACPF (Tooth mousse plus), Group E - control. All the test groups were exposed to the pH cycling regime; the remineralizing agents were applied for 10 min, except in the control. After the 10-day period, all the test groups were evaluated with HRSEM and assessed quantitatively by energy dispersive X-ray spectroscopy. The obtained data were analyzed statistically using one-way ANOVA, Student's t-test and Tukey's multiple comparison tests; P ≤ 0.05 was considered significant. The results led to rejection of the null hypothesis and highlight the concept of biomimetic bioactive glass as an effective remineralizing agent, underscoring the importance of minimally invasive treatment of incipient carious lesions by remineralization.
Some controversial multiple testing problems in regulatory applications.
Hung, H M James; Wang, Sue-Jane
2009-01-01
Multiple testing problems in regulatory applications are often more challenging than the problems of handling a set of mathematical symbols representing multiple null hypotheses under testing. In the union-intersection setting, it is important to define a family of null hypotheses relevant to the clinical questions at issue. The distinction between primary endpoint and secondary endpoint needs to be considered properly in different clinical applications. Without proper consideration, the widely used sequential gate keeping strategies often impose too many logical restrictions to make sense, particularly to deal with the problem of testing multiple doses and multiple endpoints, the problem of testing a composite endpoint and its component endpoints, and the problem of testing superiority and noninferiority in the presence of multiple endpoints. Partitioning the null hypotheses involved in closed testing into clinical relevant orderings or sets can be a viable alternative to resolving the illogical problems requiring more attention from clinical trialists in defining the clinical hypotheses or clinical question(s) at the design stage. In the intersection-union setting there is little room for alleviating the stringency of the requirement that each endpoint must meet the same intended alpha level, unless the parameter space under the null hypothesis can be substantially restricted. Such restriction often requires insurmountable justification and usually cannot be supported by the internal data. Thus, a possible remedial approach to alleviate the possible conservatism as a result of this requirement is a group-sequential design strategy that starts with a conservative sample size planning and then utilizes an alpha spending function to possibly reach the conclusion early.
Null Effects and Publication Bias in Special Education Research
ERIC Educational Resources Information Center
Cook, Bryan G.; Therrien, William J.
2017-01-01
Researchers sometimes conduct a study and find that the predicted relation between variables did not exist or that the intervention did not have a positive impact on student outcomes; these are referred to as null findings because they fail to disconfirm the null hypothesis. Rather than consider such studies as failures and disregard the null…
ERIC Educational Resources Information Center
Bahrick, Lorraine E.; Hernandez-Reif, Maria; Pickens, Jeffrey N.
1997-01-01
Tested hypothesis from Bahrick and Pickens' infant attention model that retrieval cues increase memory accessibility and shift visual preferences toward greater novelty to resemble recent memories. Found that after retention intervals associated with remote or intermediate memory, previous familiarity preferences shifted to null or novelty…
ERIC Educational Resources Information Center
Hoekstra, Rink; Johnson, Addie; Kiers, Henk A. L.
2012-01-01
The use of confidence intervals (CIs) as an addition or as an alternative to null hypothesis significance testing (NHST) has been promoted as a means to make researchers more aware of the uncertainty that is inherent in statistical inference. Little is known, however, about whether presenting results via CIs affects how readers judge the…
ERIC Educational Resources Information Center
Dunkel, Curtis S.; Harbke, Colin R.; Papini, Dennis R.
2009-01-01
The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce…
A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.
2013-01-01
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
USDA-ARS?s Scientific Manuscript database
A disease severity index (DSI) is a single number for summarizing a large amount of information on disease severity. It has been used to indicate the performance of a cultivar in regard to disease resistance at a particular location, to relate disease severity to yield loss, to determine the effecti...
Deblauwe, Vincent; Kennel, Pol; Couteron, Pierre
2012-01-01
Background Independence between observations is a standard prerequisite of traditional statistical tests of association. This condition is, however, violated when autocorrelation is present within the data. In the case of variables that are regularly sampled in space (i.e. lattice data or images), such as those provided by remote-sensing or geographical databases, this problem is particularly acute. Because analytic derivation of the null probability distribution of the test statistic (e.g. Pearson's r) is not always possible when autocorrelation is present, we propose instead the use of a Monte Carlo simulation with surrogate data. Methodology/Principal Findings The null hypothesis that two observed mapped variables are the result of independent pattern generating processes is tested here by generating sets of random image data while preserving the autocorrelation function of the original images. Surrogates are generated by matching the dual-tree complex wavelet spectra (and hence the autocorrelation functions) of white noise images with the spectra of the original images. The generated images can then be used to build the probability distribution function of any statistic of association under the null hypothesis. We demonstrate the validity of a statistical test of association based on these surrogates with both actual and synthetic data and compare it with a corrected parametric test and three existing methods that generate surrogates (randomization, random rotations and shifts, and iterative amplitude adjusted Fourier transform). Type I error control was excellent, even with strong and long-range autocorrelation, which is not the case for alternative methods. Conclusions/Significance The wavelet-based surrogates are particularly appropriate in cases where autocorrelation appears at all scales or is direction-dependent (anisotropy). We explore the potential of the method for association tests involving a lattice of binary data and discuss its potential for validation of species distribution models. An implementation of the method in Java for the generation of wavelet-based surrogates is available online as supporting material. PMID:23144961
Role of CYP2B in Phenobarbital-Induced Hepatocyte Proliferation in Mice.
Li, Lei; Bao, Xiaochen; Zhang, Qing-Yu; Negishi, Masahiko; Ding, Xinxin
2017-08-01
Phenobarbital (PB) promotes liver tumorigenesis in rodents, in part through activation of the constitutive androstane receptor (CAR) and the consequent changes in hepatic gene expression and increases in hepatocyte proliferation. A typical effect of CAR activation by PB is a marked induction of Cyp2b10 expression in the liver; the latter has been suspected to be vital for PB-induced hepatocellular proliferation. This hypothesis was tested here by using a Cyp2a(4/5)bgs -null (null) mouse model in which all Cyp2b genes are deleted. Adult male and female wild-type (WT) and null mice were treated intraperitoneally with PB at 50 mg/kg once daily for 5 successive days and tested on day 6. The liver-to-body weight ratio, an indicator of liver hypertrophy, was increased by 47% in male WT mice, but by only 22% in male Cyp2a(4/5)bgs -null mice, by the PB treatment. The fractions of bromodeoxyuridine-positive hepatocyte nuclei, assessed as a measure of the rate of hepatocyte proliferation, were also significantly lower in PB-treated male null mice compared with PB-treated male WT mice. However, whereas few proliferating hepatocytes were detected in saline-treated mice, many proliferating hepatocytes were still detected in PB-treated male null mice. In contrast, female WT mice were much less sensitive than male WT mice to PB-induced hepatocyte proliferation, and PB-treated female WT and PB-treated female null mice did not show significant difference in rates of hepatocyte proliferation. These results indicate that CYP2B induction plays a significant, but partial, role in PB-induced hepatocyte proliferation in male mice. U.S. Government work not protected by U.S. copyright.
Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha
2017-01-01
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ 2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Confidence intervals for single-case effect size measures based on randomization test inversion.
Michiels, Bart; Heyvaert, Mieke; Meulders, Ann; Onghena, Patrick
2017-02-01
In the current paper, we present a method to construct nonparametric confidence intervals (CIs) for single-case effect size measures in the context of various single-case designs. We use the relationship between a two-sided statistical hypothesis test at significance level α and a 100 (1 - α) % two-sided CI to construct CIs for any effect size measure θ that contain all point null hypothesis θ values that cannot be rejected by the hypothesis test at significance level α. This method of hypothesis test inversion (HTI) can be employed using a randomization test as the statistical hypothesis test in order to construct a nonparametric CI for θ. We will refer to this procedure as randomization test inversion (RTI). We illustrate RTI in a situation in which θ is the unstandardized and the standardized difference in means between two treatments in a completely randomized single-case design. Additionally, we demonstrate how RTI can be extended to other types of single-case designs. Finally, we discuss a few challenges for RTI as well as possibilities when using the method with other effect size measures, such as rank-based nonoverlap indices. Supplementary to this paper, we provide easy-to-use R code, which allows the user to construct nonparametric CIs according to the proposed method.
2016-12-01
KS and AD Statistical Power via Monte Carlo Simulation. Statistical power is the probability of correctly rejecting the null hypothesis when the... Determining the Statistical Power... real-world data to test the accuracy of the simulation. Statistical comparison of these metrics can be necessary when making such a determination
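Estimating the power of a Kolmogorov-Smirnov test by Monte Carlo simulation follows the pattern below; the alternative distribution, sample size, and significance level are assumptions for illustration, not values from the report.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, n, n_sim = 0.05, 50, 2_000

rejections = 0
for _ in range(n_sim):
    sample = rng.normal(loc=0.3, scale=1.0, size=n)   # data generated under a specific alternative
    _, p = stats.kstest(sample, "norm")               # H0: sample comes from the standard normal
    if p < alpha:
        rejections += 1

print(rejections / n_sim)   # estimated power = P(reject H0 | this alternative is true)
```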
Forecasting Flying Hour Costs of the B-1, B-2, and the B-52 Bomber Aircraft
2008-03-01
reject the null hypothesis that the residuals are normally distributed. Likewise, in the Breusch-Pagan test, a p-value greater than 0.05 means we... normality or constant variance, it will be noted in the results tables in Chapter IV. The Shapiro-Wilk and Breusch-Pagan tests are also very... the model; and the results of the Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson tests. Summary: This chapter outlines the methodology used in
Tressoldi, Patrizio E.
2011-01-01
Starting from the famous phrase “extraordinary claims require extraordinary evidence,” we will present the evidence supporting the concept that human visual perception may have non-local properties, in other words, that it may operate beyond the space and time constraints of sensory organs, in order to discuss which criteria can be used to define evidence as extraordinary. This evidence has been obtained from seven databases which are related to six different protocols used to test the reality and the functioning of non-local perception, analyzed using both a frequentist and a new Bayesian meta-analysis statistical procedure. According to a frequentist meta-analysis, the null hypothesis can be rejected for all six protocols even if the effect sizes range from 0.007 to 0.28. According to Bayesian meta-analysis, the Bayes factors provides strong evidence to support the alternative hypothesis (H1) over the null hypothesis (H0), but only for three out of the six protocols. We will discuss whether quantitative psychology can contribute to defining the criteria for the acceptance of new scientific ideas in order to avoid the inconclusive controversies between supporters and opponents. PMID:21713069
Mathematical Capture of Human Data for Computer Model Building and Validation
2014-04-03
weapon. The Projectile, the VDE, and the IDE weapons had effects of financial loss for the targeted participant, while the MRAD yielded its own... for LE, Centroid, and TE for the baseline and the VDE weapon conditions, since p-values exceeded α. All other conditions rejected the null hypothesis except the LE for the VDE weapon. The K-S statistics were correspondingly lower for the measures that failed to reject the null hypothesis. The CDF
Clinical trial designs for testing biomarker-based personalized therapies
Lai, Tze Leung; Lavori, Philip W; Shih, Mei-Chiung I; Sikic, Branimir I
2014-01-01
Background Advances in molecular therapeutics in the past decade have opened up new possibilities for treating cancer patients with personalized therapies, using biomarkers to determine which treatments are most likely to benefit them, but there are difficulties and unresolved issues in the development and validation of biomarker-based personalized therapies. We develop a new clinical trial design to address some of these issues. The goal is to capture the strengths of the frequentist and Bayesian approaches to address this problem in the recent literature and to circumvent their limitations. Methods We use generalized likelihood ratio tests of the intersection null and enriched strategy null hypotheses to derive a novel clinical trial design for the problem of advancing promising biomarker-guided strategies toward eventual validation. We also investigate the usefulness of adaptive randomization (AR) and futility stopping proposed in the recent literature. Results Simulation studies demonstrate the advantages of testing both the narrowly focused enriched strategy null hypothesis related to validating a proposed strategy and the intersection null hypothesis that can accommodate to a potentially successful strategy. AR and early termination of ineffective treatments offer increased probability of receiving the preferred treatment and better response rates for patients in the trial, at the expense of more complicated inference under small-to-moderate total sample sizes and some reduction in power. Limitations The binary response used in the development phase may not be a reliable indicator of treatment benefit on long-term clinical outcomes. In the proposed design, the biomarker-guided strategy (BGS) is not compared to ‘standard of care’, such as physician’s choice that may be informed by patient characteristics. Therefore, a positive result does not imply superiority of the BGS to ‘standard of care’. The proposed design and tests are valid asymptotically. Simulations are used to examine small-to-moderate sample properties. Conclusion Innovative clinical trial designs are needed to address the difficulties and issues in the development and validation of biomarker-based personalized therapies. The article shows the advantages of using likelihood inference and interim analysis to meet the challenges in the sample size needed and in the constantly evolving biomarker landscape and genomic and proteomic technologies. PMID:22397801
TMJ symptoms reduce chewing amplitude and velocity, and increase variability.
Radke, John C; Kamyszek, Greg J; Kull, Robert S; Velasco, Gerardo R
2017-09-04
The null hypothesis was that mandibular amplitude, velocity, and variability during gum chewing are not altered in subjects with temporomandibular joint (TMJ) internal derangements (ID). Thirty symptomatic subjects with confirmed ID consented to chew gum on their left and right sides while being tracked by an incisor-point jaw tracker. A gender and age matched control group (p > 0.67) volunteered to be likewise recorded. Student's t-test compared the ID group's mean values to the control group. The control group opened wider (p < 0.05) and chewed faster (p < 0.05) than the ID group. The mean cycle time of the ID group (0.929 s) was longer than the control group (0.751 s; p < 0.05) and more variable (p < 0.05). The ID group exhibited reduced amplitude and velocity but increased variability during chewing. The null hypothesis was rejected. Further study of adaptation to ID by patients should be pursued.
Testing Small Variance Priors Using Prior-Posterior Predictive p Values.
Hoijtink, Herbert; van de Schoot, Rens
2017-04-03
Muthén and Asparouhov (2012) propose to evaluate model fit in structural equation models based on approximate (using small variance priors) instead of exact equality of (combinations of) parameters to zero. This is an important development that adequately addresses Cohen's (1994) The Earth is Round (p < .05), which stresses that point null-hypotheses are so precise that small and irrelevant differences from the null-hypothesis may lead to their rejection. It is tempting to evaluate small variance priors using readily available approaches like the posterior predictive p value and the DIC. However, as will be shown, both are not suited for the evaluation of models based on small variance priors. In this article, a well behaving alternative, the prior-posterior predictive p value, will be introduced. It will be shown that it is consistent, the distributions under the null and alternative hypotheses will be elaborated, and it will be applied to testing whether the difference between 2 means and the size of a correlation are relevantly different from zero. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
ERIC Educational Resources Information Center
Anuna, M. C.; Mbonu, F. O.; Amanchukwu, R. N.
2013-01-01
The purpose of this study is to determine whether violation of students' legal rights is related to organizational climate in secondary schools in Imo State, Nigeria. Three research questions and a null hypothesis were put forward and tested in order to reach decisions on the issues investigated. Relevant literature to the study was…
USDA-ARS?s Scientific Manuscript database
We report results of the last two years of a 7-year (2008-2014) field experiment designed to test the null hypothesis that applications of glyphosate on glyphosate resistant corn (Zea mays L.) as a routine weed control practice under both conventional and reduced tillage practices would have no effe...
High Impact = High Statistical Standards? Not Necessarily So
Tressoldi, Patrizio E.; Giofré, David; Sella, Francesco; Cumming, Geoff
2013-01-01
What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors. PMID:23418533
Sequential parallel comparison design with binary and time-to-event outcomes.
Silverman, Rachel Kloss; Ivanova, Anastasia; Fine, Jason
2018-04-30
Sequential parallel comparison design (SPCD) has been proposed to increase the likelihood of success of clinical trials, especially trials with a possibly high placebo effect. Sequential parallel comparison design is conducted in 2 stages. Participants are randomized between active therapy and placebo in stage 1. Then, stage 1 placebo nonresponders are rerandomized between active therapy and placebo. Data from the 2 stages are pooled to yield a single P value. We consider SPCD with binary and with time-to-event outcomes. For time-to-event outcomes, response is defined as a favorable event prior to the end of follow-up for a given stage of SPCD. We show that for these cases, the usual test statistics from stages 1 and 2 are asymptotically normal and uncorrelated under the null hypothesis, leading to a straightforward combined testing procedure. In addition, we show that the estimators of the treatment effects from the 2 stages are asymptotically normal and uncorrelated under the null and alternative hypotheses, yielding confidence interval procedures with correct coverage. Simulations and real data analysis demonstrate the utility of the binary and time-to-event SPCD. Copyright © 2018 John Wiley & Sons, Ltd.
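A minimal sketch of the pooled SPCD analysis for a binary outcome, assuming a prespecified stage-1 weight and invented counts (neither is taken from the paper): because the two stage-wise z-statistics are asymptotically independent and standard normal under the null, a weighted combination is again standard normal and yields the single P value mentioned above.

import numpy as np
from scipy.stats import norm

def two_prop_z(x_t, n_t, x_c, n_c):
    # Pooled two-proportion z-statistic (treatment minus control).
    p_pool = (x_t + x_c) / (n_t + n_c)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (x_t / n_t - x_c / n_c) / se

# Stage 1: all randomized participants (hypothetical counts).
z1 = two_prop_z(x_t=45, n_t=100, x_c=30, n_c=100)
# Stage 2: stage-1 placebo nonresponders, rerandomized (hypothetical counts).
z2 = two_prop_z(x_t=20, n_t=35, x_c=12, n_c=35)

w = 0.6                                        # assumed design weight for stage 1
z_combined = w * z1 + np.sqrt(1 - w**2) * z2   # still N(0,1) under H0
p_value = norm.sf(z_combined)                  # one-sided p-value for benefit
print(round(z_combined, 3), round(p_value, 4))

For the time-to-event version, the same combination rule would be applied to stage-wise survival test statistics rather than two-proportion z-statistics.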
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hearin, Andrew P.; Zentner, Andrew R., E-mail: aph15@pitt.edu, E-mail: zentner@pitt.edu
Forthcoming projects such as the Dark Energy Survey, Joint Dark Energy Mission, and the Large Synoptic Survey Telescope, aim to measure weak lensing shear correlations with unprecedented accuracy. Weak lensing observables are sensitive to both the distance-redshift relation and the growth of structure in the Universe. If the cause of accelerated cosmic expansion is dark energy within general relativity, both cosmic distances and structure growth are governed by the properties of dark energy. Consequently, one may use lensing to check for this consistency and test general relativity. After reviewing the phenomenology of such tests, we address a major challenge to such a program. The evolution of the baryonic component of the Universe is highly uncertain and can influence lensing observables, manifesting as modified structure growth for a fixed cosmic distance scale. Using two proposed methods, we show that one could be led to reject the null hypothesis of general relativity when it is the true theory if this uncertainty in baryonic processes is neglected. Recent simulations suggest that we can correct for baryonic effects using a parameterized model in which the halo mass-concentration relation is modified. The correction suffices to render biases small compared to statistical uncertainties. We study the ability of future weak lensing surveys to constrain the internal structures of halos and test the null hypothesis of general relativity simultaneously. Compared to alternative methods which null information from small scales to mitigate sensitivity to baryonic physics, this internal calibration program should provide limits on deviations from general relativity that are several times more constraining. Specifically, we find that limits on general relativity in the case of internal calibration are degraded by only approximately 30% or less compared to the case of perfect knowledge of nonlinear structure.
NASA Astrophysics Data System (ADS)
Cannas, Barbara; Fanni, Alessandra; Murari, Andrea; Pisano, Fabio; Contributors, JET
2018-02-01
In this paper, the dynamic characteristics of type-I ELM time-series from the JET tokamak, the world’s largest magnetic confinement plasma physics experiment, have been investigated. The dynamic analysis has been focused on the detection of nonlinear structure in Dα radiation time series. Firstly, the method of surrogate data has been applied to evaluate the statistical significance of the null hypothesis of static nonlinear distortion of an underlying Gaussian linear process. Several nonlinear statistics have been evaluated, such as the time-delayed mutual information, the correlation dimension and the maximal Lyapunov exponent. The obtained results allow us to reject the null hypothesis, giving evidence of underlying nonlinear dynamics. Moreover, no evidence of low-dimensional chaos has been found; indeed, the analysed time series are better characterized by the power-law sensitivity to initial conditions, which can suggest a motion at the ‘edge of chaos’, at the border between chaotic and regular non-chaotic dynamics. This uncertainty makes it necessary to investigate the nature of the nonlinear dynamics further. For this purpose, a second surrogate test to distinguish chaotic orbits from pseudo-periodic orbits has been applied. In this case, we cannot reject the null hypothesis, which means that the ELM time series is possibly pseudo-periodic. In order to reproduce pseudo-periodic dynamical properties, a periodic state-of-the-art model, proposed to reproduce the ELM cycle, has been corrupted by dynamical noise, obtaining time series qualitatively in agreement with the experimental ones.
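A rough sketch of the first surrogate test described above, using plain FFT phase-randomized surrogates; these preserve the linear correlation structure and address a Gaussian linear null, whereas the amplitude-adjusted variants needed for the full static-nonlinear-distortion null are a refinement not shown here. The discriminating statistic below is a simple time-reversal asymmetry measure chosen for brevity, and the input file name is a placeholder.

import numpy as np

rng = np.random.default_rng(0)

def phase_randomized(x, rng):
    # Surrogate with the same power spectrum but randomized Fourier phases.
    spec = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, spec.size)
    phases[0] = 0.0                  # keep the mean
    if x.size % 2 == 0:
        phases[-1] = 0.0             # keep the Nyquist component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

def trev(x, lag=1):
    # Time-reversal asymmetry; close to zero for linear Gaussian processes.
    d = x[lag:] - x[:-lag]
    return np.mean(d**3) / np.mean(d**2) ** 1.5

x = np.loadtxt("d_alpha_series.txt")   # hypothetical file holding the time series
n_surr = 99
stat_obs = abs(trev(x))
stat_surr = np.array([abs(trev(phase_randomized(x, rng))) for _ in range(n_surr)])
p = (1 + np.sum(stat_surr >= stat_obs)) / (n_surr + 1)   # rank-based p-value
print(stat_obs, p)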
Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing.
Perezgonzalez, Jose D
2015-01-01
Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social and biomedical teaching and research. Little change seems possible once the procedure becomes well ingrained in the minds and current practice of researchers; thus, the optimal opportunity for such change is at the time the procedure is taught, be this at undergraduate or at postgraduate levels. This paper presents a tutorial for the teaching of data testing procedures, often referred to as hypothesis testing theories. The first procedure introduced is Fisher's approach to data testing-tests of significance; the second is Neyman-Pearson's approach-tests of acceptance; the final procedure is the incongruent combination of the previous two theories into the current approach-NHST. For those researchers sticking with the latter, two compromise solutions on how to improve NHST conclude the tutorial.
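A small illustration of how the two logics differ in practice for a two-sample comparison (simulated data; the alpha, planned effect size, and group sizes are arbitrary choices for the sketch): Fisher's approach reports the exact p-value as graded evidence, whereas the Neyman-Pearson approach fixes alpha and a specific alternative in advance, computes the power, and returns only an accept/reject decision.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(0.4, 1.0, 40)

# Fisher: report the observed p-value as a continuous measure of evidence.
t, p = stats.ttest_ind(a, b)
print(f"Fisher-style report: t = {t:.2f}, p = {p:.3f}")

# Neyman-Pearson: prespecify alpha and the alternative, then decide.
alpha, d_alt, n = 0.05, 0.5, 40          # planned effect size and group size
df = 2 * n - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)
ncp = d_alt * np.sqrt(n / 2)             # noncentrality under the alternative
power = stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
decision = "reject H0" if abs(t) > t_crit else "accept H0"
print(f"N-P decision at alpha = {alpha}: {decision} (planned power {power:.2f})")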
An asymptotic analysis of the logrank test.
Strawderman, R L
1997-01-01
Asymptotic expansions for the null distribution of the logrank statistic and its distribution under local proportional hazards alternatives are developed in the case of iid observations. The results, which are derived from the work of Gu (1992) and Taniguchi (1992), are easy to interpret, and provide some theoretical justification for many behavioral characteristics of the logrank test that have been previously observed in simulation studies. We focus primarily upon (i) the inadequacy of the usual normal approximation under treatment group imbalance; and, (ii) the effects of treatment group imbalance on power and sample size calculations. A simple transformation of the logrank statistic is also derived based on results in Konishi (1991) and is found to substantially improve the standard normal approximation to its distribution under the null hypothesis of no survival difference when there is treatment group imbalance.
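For reference, a bare-bones two-group logrank statistic under right censoring (a generic textbook implementation with made-up data, not the asymptotic expansions derived in the paper); the imbalance issues discussed above concern how well a standard normal reference describes this Z.

import numpy as np

def logrank_z(time, event, group):
    # Two-group logrank statistic; group is 0/1, event is 1 = failure, 0 = censored.
    time, event, group = map(np.asarray, (time, event, group))
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)

# Hypothetical data: survival times, event indicators, group labels.
time  = [5, 8, 12, 14, 20, 21, 25, 30, 33, 40]
event = [1, 1,  1,  0,  1,  1,  0,  1,  1,  0]
group = [0, 1,  0,  0,  1,  1,  1,  0,  1,  0]
print(logrank_z(time, event, group))   # compared with N(0,1) under the null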
Tests of Hypotheses Arising in the Correlated Random Coefficient Model
Heckman, James J.; Schmierer, Daniel
2010-01-01
This paper examines the correlated random coefficient model. It extends the analysis of Swamy (1971), who pioneered the uncorrelated random coefficient model in economics. We develop the properties of the correlated random coefficient model and derive a new representation of the variance of the instrumental variable estimator for that model. We develop tests of the validity of the correlated random coefficient model against the null hypothesis of the uncorrelated random coefficient model. PMID:21170148
Invited Commentary: The Need for Cognitive Science in Methodology.
Greenland, Sander
2017-09-15
There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecessary and misleading, as in inferences based on whether a P value is "statistically significant"; 2) nullism, the tendency to privilege the hypothesis of no difference or no effect when there is no scientific basis for doing so, as when testing only the null hypothesis; and 3) statistical reification, treating hypothetical data distributions and statistical models as if they reflect known physical laws rather than speculative assumptions for thought experiments. As commonly misused, null-hypothesis significance testing combines these cognitive problems to produce highly distorted interpretation and reporting of study results. Interval estimation has so far proven to be an inadequate solution because it involves dichotomization, an avenue for nullism. Sensitivity and bias analyses have been proposed to address reproducibility problems (Am J Epidemiol. 2017;186(6):646-647); these methods can indeed address reification, but they can also introduce new distortions via misleading specifications for bias parameters. P values can be reframed to lessen distortions by presenting them without reference to a cutoff, providing them for relevant alternatives to the null, and recognizing their dependence on all assumptions used in their computation; they nonetheless require rescaling for measuring evidence. I conclude that methodological development and training should go beyond coverage of mechanistic biases (e.g., confounding, selection bias, measurement error) to cover distortions of conclusions produced by statistical methods and psychosocial forces. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Bayesian analysis of multimethod ego-depletion studies favours the null hypothesis.
Etherton, Joseph L; Osborne, Randall; Stephenson, Katelyn; Grace, Morgan; Jones, Chas; De Nadai, Alessandro S
2018-04-01
Ego-depletion refers to the purported decrease in performance on a task requiring self-control after engaging in a previous task involving self-control, with self-control proposed to be a limited resource. Despite many published studies consistent with this hypothesis, recurrent null findings within our laboratory and indications of publication bias have called into question the validity of the depletion effect. This project used three depletion protocols involving three different depleting initial tasks followed by three different self-control tasks as dependent measures (total n = 840). For each method, effect sizes were not significantly different from zero. When data were aggregated across the three different methods and examined meta-analytically, the pooled effect size was not significantly different from zero (for all priors evaluated, Hedges' g = 0.10 with 95% credibility interval of [-0.05, 0.24]) and Bayes factors reflected strong support for the null hypothesis (Bayes factor > 25 for all priors evaluated). © 2018 The British Psychological Society.
Abad-Grau, Mara M; Medina-Medina, Nuria; Montes-Soldado, Rosana; Matesanz, Fuencisla; Bafna, Vineet
2012-01-01
Multimarker Transmission/Disequilibrium Tests (TDTs) are association tests that are highly robust to population admixture and structure and may be used to identify susceptibility loci in genome-wide association studies. Multimarker TDTs using several markers may increase power by capturing high-degree associations. However, there is also a risk of spurious associations and power reduction due to the increase in degrees of freedom. In this study we show that associations found by tests built on simple null hypotheses are highly reproducible in a second independent data set regardless of the number of markers. As a test exhibiting this feature to its maximum, we introduce the multimarker 2-Groups TDT (mTDT(2G)), a test which, under the hypothesis of no linkage, asymptotically follows a χ2 distribution with 1 degree of freedom regardless of the number of markers. The statistic requires the division of parental haplotypes into two groups: disease susceptibility and disease protective haplotype groups. We assessed the test behavior by performing an extensive simulation study as well as a real-data study using several data sets of two complex diseases. We show that the mTDT(2G) test is highly efficient and achieves the highest power among all the tests used, even when the null hypothesis is tested in a second independent data set. Therefore, mTDT(2G) turns out to be a very promising multimarker TDT for performing genome-wide searches for disease susceptibility loci that may be used as a preprocessing step in the construction of more accurate genetic models to predict individual susceptibility to complex diseases.
Affected sib pair tests in inbred populations.
Liu, W; Weir, B S
2004-11-01
The affected-sib-pair (ASP) method for detecting linkage between a disease locus and marker loci was first established 50 years ago, and since then numerous modifications have been made. We modify two identity-by-state (IBS) test statistics of Lange (Lange, 1986a, 1986b) to allow for inbreeding in the population. We evaluate the power and false positive rates of the modified tests under three disease models, using simulated data. Before estimating false positive rates, we demonstrate that IBS tests are tests of both linkage and linkage disequilibrium between marker and disease loci. Therefore, the null hypothesis of IBS tests should be no linkage and no LD. When the population inbreeding coefficient is large, the false positive rates of Lange's tests become much larger than the nominal value, while those of our modified tests remain close to the nominal value. To estimate power with a controlled false positive rate, we choose the cutoff values based on simulated datasets under the null hypothesis, so that both Lange's tests and the modified tests generate same false positive rate. The powers of Lange's z-test and our modified z-test are very close and do not change much with increasing inbreeding. The power of the modified chi-square test also stays stable when the inbreeding coefficient increases. However, the power of Lange's chi-square test increases with increasing inbreeding, and is larger than that of our modified chi-square test for large inbreeding coefficients. The power is high under a recessive disease model for both Lange's tests and the modified tests, though the power is low for additive and dominant disease models. Allowing for inbreeding is therefore appropriate, at least for diseases known to be recessive.
Long working hours and use of psychotropic medicine: a follow-up study with register linkage.
Hannerz, Harald; Albertsen, Karen
2016-03-01
This study aimed to investigate the possibility of a prospective association between long working hours and use of psychotropic medicine. Survey data drawn from random samples of the general working population of Denmark in the time period 1995-2010 were linked to national registers covering all inhabitants. The participants were followed for first occurrence of redeemed prescriptions for psychotropic medicine. The primary analysis included 25,959 observations (19,259 persons) and yielded a total of 2914 new cases of psychotropic drug use in 99,018 person-years at risk. Poisson regression was used to model incidence rates of redeemed prescriptions for psychotropic medicine as a function of working hours (32-40, 41-48, >48 hours/week). The analysis was controlled for gender, age, sample, shift work, and socioeconomic status. A likelihood ratio test was used to test the null hypothesis, which stated that the incidence rates were independent of weekly working hours. The likelihood ratio test did not reject the null hypothesis (P=0.085). The rate ratio (RR) was 1.04 [95% confidence interval (95% CI) 0.94-1.15] for the contrast 41-48 versus 32-40 work hours/week and 1.15 (95% CI 1.02-1.30) for >48 versus 32-40 hours/week. None of the rate ratios that were estimated in the present study were statistically significant after adjustment for multiple testing. However, stratified analyses, in which 30 RR were estimated, generated the hypothesis that overtime work (>48 hours/week) might be associated with an increased risk among night or shift workers (RR=1.51, 95% CI 1.15-1.98). The present study did not find a statistically significant association between long working hours and incidence of psychotropic drug usage among Danish employees.
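A compact sketch of this kind of analysis, assuming statsmodels is available: a Poisson model for counts of first prescriptions with log person-years as an offset and working-hour band as the exposure, followed by a likelihood ratio test of the null that the incidence rates do not depend on weekly working hours. The counts and person-years are placeholders, and the adjustment for gender, age, sample, shift work, and socioeconomic status used in the study is omitted.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

# Hypothetical aggregated data: cases and person-years by working-hour band.
df = pd.DataFrame({
    "hours":  ["32-40", "41-48", ">48"],
    "cases":  [2000, 600, 314],
    "pyears": [70000.0, 20000.0, 9018.0],
})
dummies = pd.get_dummies(df["hours"], drop_first=True).astype(float)  # 32-40 is the baseline

X_full = sm.add_constant(dummies)
X_null = np.ones((len(df), 1))
offset = np.log(df["pyears"])

fit_full = sm.GLM(df["cases"], X_full, family=sm.families.Poisson(), offset=offset).fit()
fit_null = sm.GLM(df["cases"], X_null, family=sm.families.Poisson(), offset=offset).fit()

lr = 2 * (fit_full.llf - fit_null.llf)
p = chi2.sf(lr, df=2)                 # two extra working-hour parameters
print(np.exp(fit_full.params))        # exponentiated slopes are rate ratios vs 32-40 hours/week
print(f"LR test of 'rates independent of hours': chi2 = {lr:.2f}, p = {p:.3f}")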
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-tests and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
A Bayesian Perspective on the Reproducibility Project: Psychology.
Etz, Alexander; Vandekerckhove, Joachim
2016-01-01
We revisit the results of the recent Reproducibility Project: Psychology by the Open Science Collaboration. We compute Bayes factors-a quantity that can be used to express comparative evidence for an hypothesis but also for the null hypothesis-for a large subset (N = 72) of the original papers and their corresponding replication attempts. In our computation, we take into account the likely scenario that publication bias had distorted the originally published results. Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak (i.e., Bayes factor < 10). The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication, and no replication attempts provided strong evidence in favor of the null. In all cases where the original paper provided strong evidence but the replication did not (15%), the sample size in the replication was smaller than the original. Where the replication provided strong evidence but the original did not (10%), the replication sample size was larger. We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature. We further conclude that traditional sample sizes are insufficient and that a more widespread adoption of Bayesian methods is desirable.
ERIC Educational Resources Information Center
Martuza, Victor R.; Engel, John D.
Results from classical power analysis (Brewer, 1972) suggest that a researcher should not set α=p (when p is less than α) in an a posteriori fashion when a study yields statistically significant results because of a resulting decrease in power. The purpose of the present report is to use Bayesian theory in examining the validity of this…
ERIC Educational Resources Information Center
Wagstaff, David A.; Elek, Elvira; Kulis, Stephen; Marsiglia, Flavio
2009-01-01
A nonparametric bootstrap was used to obtain an interval estimate of Pearson's "r," and test the null hypothesis that there was no association between 5th grade students' positive substance use expectancies and their intentions to not use substances. The students were participating in a substance use prevention program in which the unit of…
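A minimal version of such a bootstrap (simulated scores standing in for the expectancy and intention measures): resample participant pairs with replacement, form a percentile interval for Pearson's r, and treat an interval that excludes zero as evidence against the null of no association. The clustered design mentioned above would need additional handling that this sketch ignores.

import numpy as np

rng = np.random.default_rng(42)
n = 120
expectancies = rng.normal(size=n)
intentions = 0.25 * expectancies + rng.normal(size=n)   # simulated scores

def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]

boot = np.empty(5000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)                # resample pairs with replacement
    boot[b] = pearson_r(expectancies[idx], intentions[idx])

r_obs = pearson_r(expectancies, intentions)
lo, hi = np.percentile(boot, [2.5, 97.5])      # 95% percentile interval
print(f"r = {r_obs:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
print("H0 of no association", "rejected" if lo > 0 or hi < 0 else "not rejected")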
On the Model-Based Bootstrap with Missing Data: Obtaining a "P"-Value for a Test of Exact Fit
ERIC Educational Resources Information Center
Savalei, Victoria; Yuan, Ke-Hai
2009-01-01
Evaluating the fit of a structural equation model via bootstrap requires a transformation of the data so that the null hypothesis holds exactly in the sample. For complete data, such a transformation was proposed by Beran and Srivastava (1985) for general covariance structure models and applied to structural equation modeling by Bollen and Stine…
Genomic Analysis of Complex Microbial Communities in Wounds
2012-01-01
…thoroughly in the ecology literature. Permutation Multivariate Analysis of Variance (PerMANOVA). We used PerMANOVA to test the null hypothesis of no difference between the bacterial communities found within a single wound compared to those from different patients (α = 0.05). PerMANOVA is a permutation-based version of the multivariate analysis of variance (MANOVA). PerMANOVA uses the distances between samples to partition variance and…
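A stripped-down permutation test in the spirit of PerMANOVA, assuming a plain Euclidean distance matrix and a one-way grouping (real community analyses typically use ecological distances such as Bray-Curtis and dedicated software): the pseudo-F is computed from within-group and total sums of squared distances and referred to its permutation distribution obtained by shuffling sample labels.

import numpy as np

rng = np.random.default_rng(7)

def pseudo_f(dist, labels):
    # PerMANOVA-style pseudo-F from a squared-distance decomposition.
    n = len(labels)
    groups = np.unique(labels)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        ss_within += d2[np.ix_(idx, idx)][np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

# Hypothetical community profiles: 6 samples x 4 taxa, two groups of 3 samples.
profiles = rng.poisson(5, size=(6, 4)).astype(float)
labels = np.array([0, 0, 0, 1, 1, 1])
dist = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=2)

f_obs = pseudo_f(dist, labels)
perms = np.array([pseudo_f(dist, rng.permutation(labels)) for _ in range(999)])
p = (1 + np.sum(perms >= f_obs)) / (len(perms) + 1)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p:.3f}")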
ERIC Educational Resources Information Center
DeSantis, Larisa
2009-01-01
Clarifying ancient environments millions of years ago is necessary to better understand how ecosystems change over time, providing insight as to the potential impacts of current global warming. This module engages middle school students in the scientific process, asking them to use tooth measurement to test the null hypothesis that horse and tapir…
Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank
2018-01-01
In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363
Default "Gunel and Dickey" Bayes factors for contingency tables.
Jamil, Tahira; Ly, Alexander; Morey, Richard D; Love, Jonathon; Marsman, Maarten; Wagenmakers, Eric-Jan
2017-04-01
The analysis of R×C contingency tables usually features a test for independence between row and column counts. Throughout the social sciences, the adequacy of the independence hypothesis is generally evaluated by the outcome of a classical p-value null-hypothesis significance test. Unfortunately, however, the classical p-value comes with a number of well-documented drawbacks. Here we outline an alternative, Bayes factor method to quantify the evidence for and against the hypothesis of independence in R×C contingency tables. First we describe different sampling models for contingency tables and provide the corresponding default Bayes factors as originally developed by Gunel and Dickey (Biometrika, 61(3):545-557 (1974)). We then illustrate the properties and advantages of a Bayes factor analysis of contingency tables through simulations and practical examples. Computer code is available online and has been incorporated in the "BayesFactor" R package and the JASP program (jasp-stats.org).
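To give the flavour of a Bayes factor for a contingency table without reproducing the Gunel and Dickey derivations, here is a conjugate sketch for a 2x2 table under independent binomial (fixed row totals) sampling with uniform Beta(1, 1) priors. The default priors and the additional sampling models covered in the paper, as implemented in the BayesFactor R package and JASP, differ from this toy version.

import numpy as np
from scipy.special import betaln

def bf01_2x2(y1, n1, y2, n2, a=1.0, b=1.0):
    # Bayes factor for H0 (equal row proportions) vs H1 (independent proportions).
    log_m0 = betaln(a + y1 + y2, b + (n1 - y1) + (n2 - y2)) - betaln(a, b)
    log_m1 = (betaln(a + y1, b + n1 - y1) - betaln(a, b)
              + betaln(a + y2, b + n2 - y2) - betaln(a, b))
    return np.exp(log_m0 - log_m1)   # binomial coefficients cancel in the ratio

# Hypothetical table: 18/40 successes in row 1 and 9/40 in row 2.
print(f"BF01 = {bf01_2x2(18, 40, 9, 40):.2f} (values below 1 favour dependence)")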
On computation of p-values in parametric linkage analysis.
Kurbasic, Azra; Hössjer, Ola
2004-01-01
Parametric linkage analysis is usually used to find chromosomal regions linked to a disease (phenotype) that is described with a specific genetic model. This is done by investigating the relations between the disease and genetic markers, that is, well-characterized loci of known position with a clear Mendelian mode of inheritance. Assume we have found an interesting region on a chromosome that we suspect is linked to the disease. Then we want to test the hypothesis of no linkage versus the alternative one of linkage. As a measure we use the maximal lod score Zmax. It is well known that the maximal lod score asymptotically has a (2 ln 10)^(-1) × (1/2 χ2(0) + 1/2 χ2(1)) distribution under the null hypothesis of no linkage when only one point (one marker) on the chromosome is studied. In this paper, we show, both by simulations and theoretical arguments, that the null hypothesis distribution of Zmax has no simple form when more than one marker is used (multipoint analysis). In fact, the distribution of Zmax depends on the number of families, their structure, the assumed genetic model, marker denseness, and marker informativity. This means that a constant critical limit of Zmax leads to tests associated with different significance levels. Because of the above-mentioned problems, from the statistical point of view the maximal lod score should be supplemented by a p-value when results are reported. Copyright (c) 2004 S. Karger AG, Basel.
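The quoted single-marker asymptotics translate into a one-line p-value formula; the paper's point is precisely that this shortcut does not carry over to multipoint analysis, so the snippet below (with an arbitrary lod score) only illustrates the one-point case.

from math import log
from scipy.stats import chi2

def lod_pvalue_onepoint(z_max):
    # Asymptotic one-point p-value for a maximum lod score under no linkage:
    # Zmax ~ (2 ln 10)^(-1) * (1/2 chi2(0) + 1/2 chi2(1)); the chi2(0) half is a point mass at 0.
    if z_max <= 0:
        return 1.0
    return 0.5 * chi2.sf(2 * log(10) * z_max, df=1)

print(lod_pvalue_onepoint(3.0))   # the classical lod = 3 threshold, about 1e-4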
Seasonal variation of sudden infant death syndrome in Hawaii.
Mage, David T
2004-11-01
To test whether the sudden infant death syndrome (SIDS) rate displays the universal winter maximum and summer minimum in Hawaii, where there is no appreciable seasonal variation of temperature. The null hypothesis is tested that there is no seasonal variation of necropsied SIDS in Hawaii. The numbers of live births and SIDS cases by month for the years 1979 to 2002 were collected and the monthly SIDS distribution is predicted based on the age at death distribution. The state of Hawaii, located in the midst of the Pacific Ocean, has a semi-tropical climate with temperatures fluctuating diurnally around 25 +/- 5 degrees C throughout the year. Therefore homes are unheated and infants are not excessively swaddled. The Hawaii State Department of Health maintains vital statistics of all infant births and deaths. The results reject the null hypothesis of no seasonal variation of SIDS (p = 0.026). An explanation for the seasonal effect of the winter maximum and summer minimum for Hawaiian SIDS is that it arises from the cycle of the school session and summer vacation periods, which represent variable intensity of a possible viral infection vector. SIDS rates in both Hawaii and the United States increase with parity, also indicating a possible role of school age siblings as carriers. The winter peak of SIDS in Hawaii supports the hypothesis that a low grade viral infection, insufficient by itself to be a visible cause of death at necropsy, may be implicated as contributing to SIDS in vulnerable infants.
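One simple way to formalize the stated test, assuming monthly SIDS counts and monthly live-birth totals are available (the numbers below are invented placeholders, not the Hawaii data): expected counts are taken proportional to births, and a chi-square goodness-of-fit test evaluates the null of no seasonal variation. The published analysis further adjusted the expectation for the age-at-death distribution, which this sketch omits.

import numpy as np
from scipy.stats import chisquare

# Hypothetical monthly totals, January through December.
sids   = np.array([16, 14, 13, 11, 10,  9,  8,  9, 11, 12, 14, 15])
births = np.array([1510, 1420, 1530, 1480, 1550, 1500,
                   1560, 1580, 1540, 1520, 1470, 1490], dtype=float)

expected = births / births.sum() * sids.sum()   # same total as the observed counts
stat, p = chisquare(sids, f_exp=expected)       # df = 12 - 1 = 11
print(f"chi2 = {stat:.2f}, p = {p:.3f}")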
NASA Astrophysics Data System (ADS)
Lehmann, Rüdiger; Lösler, Michael
2017-12-01
Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred. It is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is the usage of a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution for the test problem, we propose the p-value approach. Another approach arises from information theory. Here, the Akaike information criterion (AIC) or some alternative is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that they work but behave differently, sometimes even producing different results. Hypothesis tests are well-established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria such as the AIC.
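A toy comparison of the two routes on a one-parameter deformation problem (simulated displacement observations with known measurement noise; all numbers are invented): the hypothesis-test route checks the null of no deformation at a chosen alpha, while the information-theoretic route computes the AIC of the null and alternative models and selects the smaller value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sigma = 2.0                                       # known measurement std (mm)
obs = rng.normal(loc=1.5, scale=sigma, size=8)    # repeated displacement estimates

# Route 1: classical test of H0 (no deformation, mean displacement = 0).
z = obs.mean() / (sigma / np.sqrt(obs.size))
p = 2 * stats.norm.sf(abs(z))

# Route 2: AIC for the null model (no free parameters) vs the shift model (one).
def gauss_loglik(x, mu):
    return stats.norm.logpdf(x, mu, sigma).sum()

aic_null = -2 * gauss_loglik(obs, 0.0)
aic_alt = 2 * 1 - 2 * gauss_loglik(obs, obs.mean())
choice = "deformation model" if aic_alt < aic_null else "null model"
print(f"z = {z:.2f}, p = {p:.3f}; AIC null = {aic_null:.1f}, AIC alt = {aic_alt:.1f} -> {choice}")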
Taroni, F; Biedermann, A; Bozza, S
2016-02-01
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to fuel extensive debate in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals evaluate statistical evidence on the basis of so-called p-values, it is of interest to bring the discussion of this journal's decision to the attention of the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Multi-arm group sequential designs with a simultaneous stopping rule.
Urach, S; Posch, M
2016-12-30
Multi-arm group sequential clinical trials are efficient designs to compare multiple treatments to a control. They allow one to test for treatment effects already in interim analyses and can have a lower average sample number than fixed sample designs. Their operating characteristics depend on the stopping rule: We consider simultaneous stopping, where the whole trial is stopped as soon as for any of the arms the null hypothesis of no treatment effect can be rejected, and separate stopping, where only recruitment to arms for which a significant treatment effect could be demonstrated is stopped, but the other arms are continued. For both stopping rules, the family-wise error rate can be controlled by the closed testing procedure applied to group sequential tests of intersection and elementary hypotheses. The group sequential boundaries for the separate stopping rule also control the family-wise error rate if the simultaneous stopping rule is applied. However, we show that for the simultaneous stopping rule, one can apply improved, less conservative stopping boundaries for local tests of elementary hypotheses. We derive corresponding improved Pocock and O'Brien type boundaries as well as optimized boundaries to maximize the power or average sample number and investigate the operating characteristics and small sample properties of the resulting designs. To control the power to reject at least one null hypothesis, the simultaneous stopping rule requires a lower average sample number than the separate stopping rule. This comes at the cost of a lower power to reject all null hypotheses. Some of this loss in power can be regained by applying the improved stopping boundaries for the simultaneous stopping rule. The procedures are illustrated with clinical trials in systemic sclerosis and narcolepsy. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Using Bayes factors to evaluate evidence for no effect: examples from the SIPS project.
Dienes, Zoltan; Coulton, Simon; Heather, Nick
2018-02-01
To illustrate how Bayes factors are important for determining the effectiveness of interventions. We consider a case where inappropriate conclusions were drawn publicly based on significance testing, namely the SIPS project (Screening and Intervention Programme for Sensible drinking), a pragmatic, cluster-randomized controlled trial in each of two health-care settings and in the criminal justice system. We show how Bayes factors can disambiguate the non-significant findings from the SIPS project and thus determine whether the findings represent evidence of absence or absence of evidence. We show how to model the sort of effects that could be expected, and how to check the robustness of the Bayes factors. The findings from the three SIPS trials taken individually are largely uninformative but, when data from these trials are combined, there is moderate evidence for a null hypothesis (H0) and thus for a lack of effect of brief intervention compared with simple clinical feedback and an alcohol information leaflet (B = 0.24, P = 0.43). Scientists who find non-significant results should suspend judgement-unless they calculate a Bayes factor to indicate either that there is evidence for a null hypothesis (H0) over a (well-justified) alternative hypothesis (H1), or that more data are needed. © 2017 Society for the Study of Addiction.
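A minimal calculator in the spirit of that analysis, under common simplifying assumptions: the observed effect estimate is treated as normal around the true effect with known standard error, H0 fixes the effect at zero, and H1 models the predicted effect as a half-normal whose scale reflects what the intervention could plausibly achieve. The numbers are illustrative rather than the SIPS estimates, and robustness to the choice of scale should be checked as the authors recommend.

import numpy as np
from scipy import stats
from scipy.integrate import quad

def bf10_halfnormal(effect_obs, se_obs, h1_scale):
    # Bayes factor for H1 (effect ~ HalfNormal(h1_scale)) against H0 (effect = 0).
    def integrand(delta):
        prior = 2 * stats.norm.pdf(delta, 0.0, h1_scale)   # half-normal on delta >= 0
        likelihood = stats.norm.pdf(effect_obs, delta, se_obs)
        return prior * likelihood
    marginal_h1, _ = quad(integrand, 0.0, 10 * h1_scale)
    marginal_h0 = stats.norm.pdf(effect_obs, 0.0, se_obs)
    return marginal_h1 / marginal_h0

# Illustrative numbers: a small observed effect with a wide standard error.
print(round(bf10_halfnormal(effect_obs=0.05, se_obs=0.12, h1_scale=0.30), 2))

Values well below 1 support H0, values well above 1 support H1, and values near 1 indicate that more data are needed.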
van Reenen, Mari; Westerhuis, Johan A; Reinecke, Carolus J; Venter, J Hendrik
2017-02-02
ERp is a variable selection and classification method for metabolomics data. ERp uses minimized classification error rates, based on data from a control and experimental group, to test the null hypothesis of no difference between the distributions of variables over the two groups. If the associated p-values are significant they indicate discriminatory variables (i.e. informative metabolites). The p-values are calculated assuming a common continuous strictly increasing cumulative distribution under the null hypothesis. This assumption is violated when zero-valued observations can occur with positive probability, a characteristic of GC-MS metabolomics data, disqualifying ERp in this context. This paper extends ERp to address two sources of zero-valued observations: (i) zeros reflecting the complete absence of a metabolite from a sample (true zeros); and (ii) zeros reflecting a measurement below the detection limit. This is achieved by allowing the null cumulative distribution function to take the form of a mixture between a jump at zero and a continuous strictly increasing function. The extended ERp approach is referred to as XERp. XERp is no longer non-parametric, but its null distributions depend only on one parameter, the true proportion of zeros. Under the null hypothesis this parameter can be estimated by the proportion of zeros in the available data. XERp is shown to perform well with regard to bias and power. To demonstrate the utility of XERp, it is applied to GC-MS data from a metabolomics study on tuberculosis meningitis in infants and children. We find that XERp is able to provide an informative shortlist of discriminatory variables, while attaining satisfactory classification accuracy for new subjects in a leave-one-out cross-validation context. XERp takes into account the distributional structure of data with a probability mass at zero without requiring any knowledge of the detection limit of the metabolomics platform. XERp is able to identify variables that discriminate between two groups by simultaneously extracting information from the difference in the proportion of zeros and shifts in the distributions of the non-zero observations. XERp uses simple rules to classify new subjects and a weight pair to adjust for unequal sample sizes or sensitivity and specificity requirements.
Kilinç, Delal Dara; Sayar, Gülşilay
2018-04-07
The aim of this study was to evaluate the effect of total surface sandblasting on the shear bond strength of two different retainer wires. The null hypothesis was that there is no difference in the bond strength of the two types of lingual retainer wires when they are sandblasted. One hundred and sixty human premolar teeth were equally divided into four groups (n=40). A pair of teeth was embedded in self-curing acrylic resin and polished. Retainer wires were applied to the etched and rinsed surfaces of the teeth. Four retainers were used: group 1: braided retainer (0.010×0.028″, Ortho Technology); group 2: sandblasted braided retainer (0.010×0.028″, Ortho Technology); group 3: coaxial retainer (0.0215″ Coaxial, 3M) and group 4: sandblasted coaxial retainer (0.0215″ Coaxial, 3M). The specimens were tested using a universal test machine in shear mode with a crosshead speed of 1 mm/min. One-way analysis of variance (ANOVA) was used to determine whether there were significant differences among the groups. There was no significant difference (P=0.117) among the groups according to this test. The null hypothesis was accepted. There was no statistically significant difference among the shear bond strength values of the four groups. Copyright © 2018 CEO. Published by Elsevier Masson SAS. All rights reserved.
Robustness of survival estimates for radio-marked animals
Bunck, C.M.; Chen, C.-L.
1992-01-01
Telemetry techniques are often used to study the survival of birds and mammals, particularly when mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have been developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and an estimator based on the assumption of constant hazard, and to evaluate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-location probability of less than one are described and evaluated. Generally the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The powers of the two-sample tests were similar.
Replication Unreliability in Psychology: Elusive Phenomena or “Elusive” Statistical Power?
Tressoldi, Patrizio E.
2012-01-01
The focus of this paper is to analyze whether the unreliability of results related to certain controversial psychological phenomena may be a consequence of their low statistical power. Under Null Hypothesis Statistical Testing (NHST), still the most widely used statistical approach, unreliability derives from the failure to refute the null hypothesis, in particular when exact or quasi-exact replications of experiments are carried out. Taking as examples the results of meta-analyses related to four different controversial phenomena, subliminal semantic priming, incubation effect for problem solving, unconscious thought theory, and non-local perception, it was found that, except for semantic priming on categorization, the statistical power to detect the expected effect size (ES) of the typical study is low or very low. The low power in most studies undermines the use of NHST to study phenomena with moderate or low ESs. We conclude by providing some suggestions on how to increase the statistical power or use different statistical approaches to help discriminate whether the results obtained may or may not be used to support or to refute the reality of a phenomenon with small ES. PMID:22783215
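For readers who want to check such power claims, a small routine for the prospective power of a two-sample t-test at a given effect size, using the noncentral t distribution; the effect sizes and group sizes below are arbitrary illustrations, not the meta-analytic values discussed in the paper.

import numpy as np
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    # Two-sided power of an independent-samples t-test for Cohen's d.
    df = 2 * n_per_group - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    ncp = d * np.sqrt(n_per_group / 2)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

for d in (0.2, 0.5, 0.8):                         # small, medium, large effects
    n_needed = next(n for n in range(4, 3000) if power_two_sample_t(d, n) >= 0.80)
    print(f"d = {d}: power with n = 30/group is {power_two_sample_t(d, 30):.2f}; "
          f"n per group for 80% power is {n_needed}")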
Incorporation of metal and color alteration of enamel in the presence of orthodontic appliances.
Maia, Lúcio Henrique E Gurgel; Filho, Hibernon Lopes de Lima; Araújo, Marcus Vinícius Almeida; Ruellas, Antônio Carlos de Oliveira; Araújo, Mônica Tirre de Souza
2012-09-01
To test the null hypothesis that it is not possible to incorporate metal ions arising from orthodontic appliance corrosion into tooth enamel with resulting tooth color change. This in vitro study used atomic absorption spectrophotometry to evaluate the presence of nickel, chromium, and iron ions in tooth enamel in three groups: a group submitted to cyclic demineralization and remineralization processes with solutions in which orthodontic appliances were previously immersed and corroded, releasing metallic ions; a control group; and another group, submitted to cycling only, without the presence of orthodontic appliances. The influence of the incorporation of these metals on a possible alteration in color was measured with a portable digital spectrophotometer using the CIE LAB system. At the end of the experiment, a significantly higher concentration of chromium and nickel (P < .05) was found in the group in which corrosion was present, and in this group, there was significantly greater color alteration (P ≤ .001). There was chromium and nickel incorporation into enamel and tooth color change when corrosion of orthodontic appliances was associated with cycling process. The null hypothesis is rejected.
Guzman-Rojas, Liliana; Rangel, Roberto; Salameh, Ahmad; Edwards, Julianna K; Dondossola, Eleonora; Kim, Yun-Gon; Saghatelian, Alan; Giordano, Ricardo J; Kolonin, Mikhail G; Staquicini, Fernanda I; Koivunen, Erkki; Sidman, Richard L; Arap, Wadih; Pasqualini, Renata
2012-01-31
Processes that promote cancer progression such as angiogenesis require a functional interplay between malignant and nonmalignant cells in the tumor microenvironment. The metalloprotease aminopeptidase N (APN; CD13) is often overexpressed in tumor cells and has been implicated in angiogenesis and cancer progression. Our previous studies of APN-null mice revealed impaired neoangiogenesis in model systems without cancer cells and suggested the hypothesis that APN expressed by nonmalignant cells might promote tumor growth. We tested this hypothesis by comparing the effects of APN deficiency in allografted malignant (tumor) and nonmalignant (host) cells on tumor growth and metastasis in APN-null mice. In two independent tumor graft models, APN activity in both the tumors and the host cells cooperate to promote tumor vascularization and growth. Loss of APN expression by the host and/or the malignant cells also impaired lung metastasis in experimental mouse models. Thus, cooperation in APN expression by both cancer cells and nonmalignant stromal cells within the tumor microenvironment promotes angiogenesis, tumor growth, and metastasis.
Whitworth, John Martin; Kanaa, Mohammad Dib; Corbett, Ian Porter; Meechan, John Gerald
2007-10-01
This randomized, double-blind trial tested the null hypothesis that speed of deposition has no influence on the injection discomfort, efficacy, distribution, and duration of pulp anesthesia after incisive/mental nerve block in adult volunteers. Thirty-eight subjects received incisive/mental nerve blocks of 2.0 mL lidocaine with 1:80,000 epinephrine slowly over 60 seconds or rapidly over 15 seconds at least 1 week apart. Pulp anesthesia was assessed electronically to 45 minutes after injection. Injection discomfort was self-recorded on visual analogue scales. Overall, 48.7% of volunteers developed pulp anesthesia in first molars, 81.8% in bicuspids, and 38.5% in lateral incisors. The mean duration of pulp anesthesia was 19.1 minutes for first molars, 28.5 minutes for bicuspids, and 19.0 minutes for lateral incisors. Speed of injection had no significant influence on anesthetic success or duration of anesthesia for individual teeth. Slow injection was significantly more comfortable than rapid injection (P < .001). The null hypothesis was supported, although slow injection was more comfortable.
Seebacher, Frank
2005-10-01
Biological functions are dependent on the temperature of the organism. Animals may respond to fluctuation in the thermal environment by regulating their body temperature and by modifying physiological and biochemical rates. Phenotypic flexibility (reversible phenotypic plasticity, acclimation, or acclimatisation) in rate functions occurs in all major taxonomic groups and may be considered as an ancestral condition. Within the Reptilia, representatives from all major groups show phenotypic flexibility in response to long-term or chronic changes in the thermal environment. Acclimation or acclimatisation in reptiles is most commonly assessed by measuring whole animal responses such as oxygen consumption, but whole animal responses comprise variation in individual traits such as enzyme activities, hormone expression, and cardiovascular functions. The challenge now lies in connecting the changes in the components to the functioning of the whole animal and its fitness. Experimental designs in research on reptilian thermal physiology should incorporate the capacity for reversible phenotypic plasticity as a null-hypothesis, because the significance of differential body temperature-performance relationships (thermal reaction norms) between individuals, populations, or species cannot be assessed without testing that null-hypothesis.
Diffuse prior monotonic likelihood ratio test for evaluation of fused image quality measures.
Wei, Chuanming; Kaplan, Lance M; Burks, Stephen D; Blum, Rick S
2011-02-01
This paper introduces a novel method to score how well proposed fused image quality measures (FIQMs) indicate the effectiveness of humans to detect targets in fused imagery. The human detection performance is measured via human perception experiments. A good FIQM should relate to perception results in a monotonic fashion. The method computes a new diffuse prior monotonic likelihood ratio (DPMLR) to facilitate the comparison of the H(1) hypothesis that the intrinsic human detection performance is related to the FIQM via a monotonic function against the null hypothesis that the detection and image quality relationship is random. The paper discusses many interesting properties of the DPMLR and demonstrates the effectiveness of the DPMLR test via Monte Carlo simulations. Finally, the DPMLR is used to score FIQMs with test cases considering over 35 scenes and various image fusion algorithms.
Two-sample discrimination of Poisson means
NASA Technical Reports Server (NTRS)
Lampton, M.
1994-01-01
This paper presents a statistical test for detecting significant differences between two random count accumulations. The null hypothesis is that the two samples share a common random arrival process with a mean count proportional to each sample's exposure. The model represents the partition of N total events into two counts, A and B, as a sequence of N independent Bernoulli trials whose partition fraction, f, is determined by the ratio of the exposures of A and B. The detection of a significant difference is claimed when the background (null) hypothesis is rejected, which occurs when the observed sample falls in a critical region of (A, B) space. The critical region depends on f and the desired significance level, alpha. The model correctly takes into account the fluctuations in both the signals and the background data, including the important case of small numbers of counts in the signal, the background, or both. The significance can be exactly determined from the cumulative binomial distribution, which in turn can be inverted to determine the critical A(B) or B(A) contour. This paper gives efficient implementations of these tests, based on lookup tables. Applications include the detection of clustering of astronomical objects, the detection of faint emission or absorption lines in photon-limited spectroscopy, the detection of faint emitters or absorbers in photon-limited imaging, and dosimetry.
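The core of this test fits in a few lines, assuming SciPy 1.7 or later for scipy.stats.binomtest: condition on the total count N = A + B, set the partition fraction f from the exposures, and evaluate the observed A against Binomial(N, f). The counts and exposures below are made up for illustration.

from scipy.stats import binomtest

def poisson_two_sample_p(count_a, count_b, exposure_a, exposure_b):
    # Exact conditional test that two Poisson counts share a common rate.
    f = exposure_a / (exposure_a + exposure_b)   # expected share of counts in sample A
    result = binomtest(count_a, count_a + count_b, f, alternative="two-sided")
    return result.pvalue

# e.g. 23 photons in 1000 s on-source versus 11 photons in 800 s off-source.
print(round(poisson_two_sample_p(23, 11, 1000.0, 800.0), 4))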
Adiponectin deficiency impairs liver regeneration through attenuating STAT3 phosphorylation in mice.
Shu, Run-Zhe; Zhang, Feng; Wang, Fang; Feng, De-Chun; Li, Xi-Hua; Ren, Wei-Hua; Wu, Xiao-Lin; Yang, Xue; Liao, Xiao-Dong; Huang, Lei; Wang, Zhu-Gang
2009-09-01
Liver regeneration is a very complex and well-orchestrated process associated with signaling cascades involving cytokines, growth factors, and metabolic pathways. Adiponectin is an adipocytokine secreted by mature adipocytes, and its receptors are widely distributed in many tissues, including the liver. Adiponectin has direct actions in the liver with prominent roles to improve hepatic insulin sensitivity, increase fatty acid oxidation, and decrease inflammation. To test the hypothesis that adiponectin is required for normal progress of liver regeneration, 2/3 partial hepatectomy (PH) was performed on wild-type and adiponectin-null mice. Compared to wild-type mice, adiponectin-null mice displayed decreased liver mass regrowth, impeded hepatocyte proliferation, and increased hepatic lipid accumulation. Gene expression analysis revealed that adiponectin regulated the gene transcription related to lipid metabolism. Furthermore, the suppressed hepatocyte proliferation was accompanied with reduced signal transducer and activator of transcription protein 3 (STAT3) activity and enhanced suppressor of cytokine signaling 3 (Socs3) transcription. In conclusion, adiponectin-null mice exhibit impaired liver regeneration and increased hepatic steatosis. Increased expression of Socs3 and subsequently reduced activation of STAT3 in adiponectin-null mice may contribute to the alteration of the liver regeneration capability and hepatic lipid metabolism after PH.
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
Dicken, Cary L.; Israel, Davelene D.; Davis, Joe B.; Sun, Yan; Shu, Jun; Hardin, John; Neal-Perry, Genevieve
2012-01-01
The mechanism(s) by which vitamin D3 regulates female reproduction is minimally understood. We tested the hypothesis that peripubertal vitamin D3 deficiency disrupts hypothalamic-pituitary-ovarian physiology. To test this hypothesis, we used wild-type mice and Cyp27b1 (the rate-limiting enzyme in the synthesis of 1,25-dihydroxyvitamin D3) null mice to study the effect of vitamin D3 deficiency on puberty and reproductive physiology. At the time of weaning, mice were randomized to a vitamin D3-replete or -deficient diet supplemented with calcium. We assessed the age of vaginal opening and first estrus (puberty markers), gonadotropin levels, ovarian histology, ovarian responsiveness to exogenous gonadotropins, and estrous cyclicity. Peripubertal vitamin D3 deficiency significantly delayed vaginal opening without affecting the number of GnRH-immunopositive neurons or estradiol-negative feedback on gonadotropin levels during diestrus. Young adult females maintained on a vitamin D3-deficient diet after puberty had arrested follicular development and prolonged estrous cycles characterized by extended periods of diestrus. Ovaries of vitamin D3-deficient Cyp27b1 null mice responded to exogenous gonadotropins and deposited significantly more oocytes into the oviducts than mice maintained on a vitamin D3-replete diet. Estrous cycles were restored when vitamin D3-deficient Cyp27b1 null young adult females were transferred to a vitamin D3-replete diet. This study is the first to demonstrate that peripubertal vitamin D3 sufficiency is important for an appropriately timed pubertal transition and maintenance of normal female reproductive physiology. These data suggest vitamin D3 is a key regulator of neuroendocrine and ovarian physiology. PMID:22572998
Goovaerts, Pierre; Jacquez, Geoffrey M
2004-01-01
Background: Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results: We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion: The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
Toward "Constructing" the Concept of Statistical Power: An Optical Analogy.
ERIC Educational Resources Information Center
Rogers, Bruce G.
This paper presents a visual analogy that may be used by instructors to teach the concept of statistical power in statistical courses. Statistical power is mathematically defined as the probability of rejecting a null hypothesis when that null is false, or, equivalently, the probability of detecting a relationship when it exists. The analogy…
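As a concrete complement to the optical analogy, power can also be estimated directly by simulation: generate data under a specified alternative many times and record how often the null hypothesis is rejected. The effect size, sample size, and alpha below are illustrative choices, not values from the paper.

    # Monte Carlo sketch of statistical power: the probability of rejecting H0 when
    # a true effect of a given size exists. Effect size, n, and alpha are hypothetical.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, effect, alpha, n_sim = 30, 0.5, 0.05, 10_000

    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0.0, 1.0, n)          # control group, mean 0
        b = rng.normal(effect, 1.0, n)       # treatment group, true mean shift = 0.5 SD
        _, p = stats.ttest_ind(a, b)
        rejections += (p < alpha)

    print(f"estimated power ~ {rejections / n_sim:.2f}")   # roughly 0.47 for these settings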
A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis
Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.
2014-01-01
Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489
A General Class of Test Statistics for Van Valen's Red Queen Hypothesis.
Wiltshire, Jelani; Huffer, Fred W; Parker, William C
2014-09-01
Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material.
Yang, Songshan; Cranford, James A; Jester, Jennifer M; Li, Runze; Zucker, Robert A; Buu, Anne
2017-02-28
This study proposes a time-varying effect model for examining group differences in trajectories of zero-inflated count outcomes. The motivating example demonstrates that this zero-inflated Poisson model allows investigators to study group differences in different aspects of substance use (e.g., the probability of abstinence and the quantity of alcohol use) simultaneously. The simulation study shows that the accuracy of estimation of trajectory functions improves as the sample size increases; the accuracy under equal group sizes is only higher when the sample size is small (100). In terms of the performance of the hypothesis testing, the type I error rates are close to their corresponding significance levels under all settings. Furthermore, the power increases as the alternative hypothesis deviates more from the null hypothesis, and the rate of this increasing trend is higher when the sample size is larger. Moreover, the hypothesis test for the group difference in the zero component tends to be less powerful than the test for the group difference in the Poisson component. Copyright © 2016 John Wiley & Sons, Ltd.
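For readers unfamiliar with the zero-inflated Poisson model underlying this work, the sketch below shows its probability mass function, which combines a structural-zero probability (e.g., abstinence) with a Poisson count (e.g., quantity of use). The parameter values are hypothetical and unrelated to the study's estimates.

    # Minimal sketch of the zero-inflated Poisson (ZIP) probability mass function used
    # to model abstinence (structural zeros) and quantity of use jointly.
    # pi and lam below are hypothetical parameter values, not estimates from the study.
    import math

    def zip_pmf(k, pi, lam):
        """P(Y = k) for ZIP(pi, lam): pi = P(structural zero), lam = Poisson mean."""
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        if k == 0:
            return pi + (1 - pi) * poisson
        return (1 - pi) * poisson

    pi, lam = 0.4, 2.5
    print("P(zero count):      ", round(zip_pmf(0, pi, lam), 3))
    print("P(exactly 3 events):", round(zip_pmf(3, pi, lam), 3))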
NASA Astrophysics Data System (ADS)
Menne, Matthew J.; Williams, Claude N., Jr.
2005-10-01
An evaluation of three hypothesis test statistics that are commonly used in the detection of undocumented changepoints is described. The goal of the evaluation was to determine whether the use of multiple tests could improve undocumented, artificial changepoint detection skill in climate series. The use of successive hypothesis testing is compared to optimal approaches, both of which are designed for situations in which multiple undocumented changepoints may be present. In addition, the importance of the form of the composite climate reference series is evaluated, particularly with regard to the impact of undocumented changepoints in the various component series that are used to calculate the composite. In a comparison of single test changepoint detection skill, the composite reference series formulation is shown to be less important than the choice of the hypothesis test statistic, provided that the composite is calculated from the serially complete and homogeneous component series. However, each of the evaluated composite series is not equally susceptible to the presence of changepoints in its components, which may be erroneously attributed to the target series. Moreover, a reference formulation that is based on the averaging of the first-difference component series is susceptible to random walks when the composition of the component series changes through time (e.g., values are missing), and its use is, therefore, not recommended. When more than one test is required to reject the null hypothesis of no changepoint, the number of detected changepoints is reduced proportionately less than the number of false alarms in a wide variety of Monte Carlo simulations. Consequently, a consensus of hypothesis tests appears to improve undocumented changepoint detection skill, especially when reference series homogeneity is violated. A consensus of successive hypothesis tests using a semihierarchic splitting algorithm also compares favorably to optimal solutions, even when changepoints are not hierarchic.
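The sketch below illustrates the generic idea of detecting an undocumented changepoint with a hypothesis test statistic: scan candidate breakpoints, take the maximal two-sample t statistic, and calibrate it by permutation. It is not the composite-reference-series methodology evaluated in the paper; the simulated series and settings are illustrative only.

    # Minimal sketch of single undocumented changepoint detection: scan candidate
    # breakpoints, take the maximal two-sample t statistic, and calibrate it by
    # permutation. Generic illustration only, not the paper's reference-series method.
    import numpy as np

    def max_t_stat(x):
        n = len(x)
        best = 0.0
        for k in range(5, n - 5):                      # require a few points on each side
            a, b = x[:k], x[k:]
            s = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            best = max(best, abs(a.mean() - b.mean()) / s)
        return best

    rng = np.random.default_rng(2)
    series = np.concatenate([rng.normal(0, 1, 40), rng.normal(0.8, 1, 40)])  # shift at t=40

    t_obs = max_t_stat(series)
    perm = [max_t_stat(rng.permutation(series)) for _ in range(499)]
    p = (1 + sum(t >= t_obs for t in perm)) / 500
    print(f"max |t| = {t_obs:.2f}, permutation p = {p:.3f}")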
A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.
Stern, Hal S
2016-01-01
Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with insuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.
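A small sketch of the contrast drawn here: the same one-sample data summarized both by a p value and by an approximate Bayes factor. The Bayes factor uses a rough BIC-based approximation rather than a full Bayesian analysis, and the data are simulated, so the numbers are purely illustrative.

    # Sketch contrasting a p value with an (approximate) Bayes factor for a one-sample
    # test of H0: mu = 0. The Bayes factor uses the rough BIC approximation
    # BF01 ~ exp((BIC1 - BIC0)/2); data are simulated, not from any real study.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.normal(0.2, 1.0, 50)              # hypothetical sample with a small true effect
    n = len(x)

    t, p = stats.ttest_1samp(x, 0.0)

    def gaussian_loglik(x, mu):
        sigma2 = np.mean((x - mu) ** 2)       # ML variance given the mean
        return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

    bic0 = 1 * np.log(n) - 2 * gaussian_loglik(x, 0.0)        # H0: mu fixed at 0
    bic1 = 2 * np.log(n) - 2 * gaussian_loglik(x, x.mean())   # H1: mu estimated
    bf01 = np.exp((bic1 - bic0) / 2)          # >1 favours H0, <1 favours H1

    print(f"p = {p:.3f}, approximate BF01 = {bf01:.2f}")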
2010-06-01
models ... The Chi-Square test fails to reject the null hypothesis that there is no difference between 2008 and 2009 data (p-value = 0.601). This ... attributed to process performance modeling ... Table 4: Relationships between data quality and integrity activities and overall value attributed to ... data quality and integrity; staffing and resources devoted to the work; pertinent training and coaching; and the alignment of the models with
NASA Technical Reports Server (NTRS)
Carpenter, J. R.; Markley, F. L.; Alfriend, K. T.; Wright, C.; Arcido, J.
2011-01-01
Sequential probability ratio tests explicitly allow decision makers to incorporate false alarm and missed detection risks, and are potentially less sensitive to modeling errors than a procedure that relies solely on a probability of collision threshold. Recent work on constrained Kalman filtering has suggested an approach to formulating such a test for collision avoidance maneuver decisions: a filter bank with two norm-inequality-constrained epoch-state extended Kalman filters. One filter models the null hypothesis that the miss distance is inside the combined hard body radius at the predicted time of closest approach, and one filter models the alternative hypothesis. The epoch-state filter developed for this method explicitly accounts for any process noise present in the system. The method appears to work well using a realistic example based on an upcoming highly-elliptical orbit formation flying mission.
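The core idea, that the chosen false-alarm and missed-detection risks fix the decision thresholds, is captured by Wald's classical sequential probability ratio test. The sketch below is that generic test for two Gaussian mean hypotheses, not the constrained Kalman filter bank described in the abstract; all parameter values are hypothetical.

    # Generic Wald sequential probability ratio test (SPRT) sketch: the user-chosen
    # false-alarm (alpha) and missed-detection (beta) risks fix the two decision
    # thresholds. Not the paper's Kalman-filter bank, just the underlying idea.
    import numpy as np

    alpha, beta = 0.01, 0.05                 # hypothetical risk levels
    upper = np.log((1 - beta) / alpha)       # accept H1 when the cumulative LLR exceeds this
    lower = np.log(beta / (1 - alpha))       # accept H0 when it drops below this

    mu0, mu1, sigma = 0.0, 1.0, 1.0          # hypothetical Gaussian hypotheses
    rng = np.random.default_rng(4)
    llr, decision = 0.0, None
    for k in range(1, 1000):
        z = rng.normal(mu1, sigma)           # data actually generated under H1
        llr += ((z - mu0)**2 - (z - mu1)**2) / (2 * sigma**2)   # per-sample log likelihood ratio
        if llr >= upper:
            decision = ("accept H1", k); break
        if llr <= lower:
            decision = ("accept H0", k); break

    print(decision)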
Krypotos, Angelos-Miltiadis; Klugkist, Irene; Engelhard, Iris M.
2017-01-01
ABSTRACT Threat conditioning procedures have allowed the experimental investigation of the pathogenesis of Post-Traumatic Stress Disorder. The findings of these procedures have also provided stable foundations for the development of relevant intervention programs (e.g. exposure therapy). Statistical inference of threat conditioning procedures is commonly based on p-values and Null Hypothesis Significance Testing (NHST). Nowadays, however, there is a growing concern about this statistical approach, as many scientists point to the various limitations of p-values and NHST. As an alternative, the use of Bayes factors and Bayesian hypothesis testing has been suggested. In this article, we apply this statistical approach to threat conditioning data. In order to enable the easy computation of Bayes factors for threat conditioning data we present a new R package named condir, which can be used either via the R console or via a Shiny application. This article provides both a non-technical introduction to Bayesian analysis for researchers using the threat conditioning paradigm, and the necessary tools for computing Bayes factors easily. PMID:29038683
Price, James W
Does performing pre-employment hair drug testing subsequently affect the prevalence of positive random and postaccident urine drug tests? This cross-sectional study was designed to compare the prevalence of positive postaccident and random workplace urine drug tests between companies that perform pre-employment hair and urine drug testing and companies that perform only pre-employment urine drug testing. The Fisher exact test of independence indicated no significant association between pre-employment hair drug testing and overall US Department of Transportation random and postaccident urine drug test positivity rates. The analysis failed to reject the null hypothesis, suggesting that pre-employment hair drug testing had no effect upon random and postaccident urine drug test positivity rates.
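For illustration, the Fisher exact test of independence used in this study can be run on a 2x2 table of company type by urine-test outcome. The counts below are invented for the example and are not the study's data.

    # Fisher exact test of independence on a hypothetical 2x2 table:
    # rows = company performs pre-employment hair testing (yes/no),
    # columns = random/postaccident urine test result (positive/negative).
    # Counts are invented for illustration; they are not the study's data.
    from scipy.stats import fisher_exact

    table = [[12, 988],    # hair-testing companies: positives, negatives
             [15, 985]]    # urine-only companies:  positives, negatives

    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
    # A large p value means we fail to reject the null hypothesis of independence.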
On the insignificance of Herschel's sunspot correlation
NASA Astrophysics Data System (ADS)
Love, Jeffrey J.
2013-08-01
We examine William Herschel's hypothesis that solar-cycle variation of the Sun's irradiance has a modulating effect on the Earth's climate and that this is, specifically, manifested as an anticorrelation between sunspot number and the market price of wheat. Since Herschel first proposed his hypothesis in 1801, it has been regarded with both interest and skepticism. Recently, reports have been published that either support Herschel's hypothesis or rely on its validity. As a test of Herschel's hypothesis, we seek to reject a null hypothesis of a statistically random correlation between historical sunspot numbers, wheat prices in London and the United States, and wheat farm yields in the United States. We employ binary-correlation, Pearson-correlation, and frequency-domain methods. We test our methods using a historical geomagnetic activity index, well known to be causally correlated with sunspot number. As expected, the measured correlation between sunspot number and geomagnetic activity would be an unlikely realization of random data; the correlation is "statistically significant." On the other hand, measured correlations between sunspot number and wheat price and wheat yield data would be very likely realizations of random data; these correlations are "insignificant." Therefore, Herschel's hypothesis must be regarded with skepticism. We compare and contrast our results with those of other researchers. We discuss procedures for evaluating hypotheses that are formulated from historical data.
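A minimal sketch of the kind of null-hypothesis test described: compute a Pearson correlation and ask how often permuted data produce one at least as strong. The series below are synthetic stand-ins for the sunspot and wheat records, and simple permutation ignores the serial correlation that the authors address with frequency-domain methods.

    # Minimal sketch: Pearson correlation plus a permutation null. Series are
    # synthetic stand-ins, not the historical sunspot, wheat-price, or geomagnetic data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    sunspot_like = rng.normal(size=200)
    wheat_like = rng.normal(size=200)            # independent of sunspot_like by design

    r_obs, _ = stats.pearsonr(sunspot_like, wheat_like)
    perm_r = np.array([stats.pearsonr(sunspot_like, rng.permutation(wheat_like))[0]
                       for _ in range(999)])
    p = (1 + np.sum(np.abs(perm_r) >= abs(r_obs))) / 1000

    print(f"r = {r_obs:.3f}, permutation p = {p:.3f}")
    # A large p means the observed correlation is a likely realization of random data,
    # i.e. "insignificant" in the sense used above.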
A Bayesian Approach to the Paleomagnetic Conglomerate Test
NASA Astrophysics Data System (ADS)
Heslop, David; Roberts, Andrew P.
2018-02-01
The conglomerate test has served the paleomagnetic community for over 60 years as a means to detect remagnetizations. The test states that if a suite of clasts within a bed have uniformly random paleomagnetic directions, then the conglomerate cannot have experienced a pervasive event that remagnetized the clasts in the same direction. The current form of the conglomerate test is based on null hypothesis testing, which results in a binary "pass" (uniformly random directions) or "fail" (nonrandom directions) outcome. We have recast the conglomerate test in a Bayesian framework with the aim of providing more information concerning the level of support a given data set provides for a hypothesis of uniformly random paleomagnetic directions. Using this approach, we place the conglomerate test in a fully probabilistic framework that allows for inconclusive results when insufficient information is available to draw firm conclusions concerning the randomness or nonrandomness of directions. With our method, sample sets larger than those typically employed in paleomagnetism may be required to achieve strong support for a hypothesis of random directions. Given the potentially detrimental effect of unrecognized remagnetizations on paleomagnetic reconstructions, it is important to provide a means to draw statistically robust data-driven inferences. Our Bayesian analysis provides a means to do this for the conglomerate test.
A Bayesian Perspective on the Reproducibility Project: Psychology
Etz, Alexander; Vandekerckhove, Joachim
2016-01-01
We revisit the results of the recent Reproducibility Project: Psychology by the Open Science Collaboration. We compute Bayes factors—a quantity that can be used to express comparative evidence for an hypothesis but also for the null hypothesis—for a large subset (N = 72) of the original papers and their corresponding replication attempts. In our computation, we take into account the likely scenario that publication bias had distorted the originally published results. Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak (i.e., Bayes factor < 10). The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication, and no replication attempts provided strong evidence in favor of the null. In all cases where the original paper provided strong evidence but the replication did not (15%), the sample size in the replication was smaller than the original. Where the replication provided strong evidence but the original did not (10%), the replication sample size was larger. We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature. We further conclude that traditional sample sizes are insufficient and that a more widespread adoption of Bayesian methods is desirable. PMID:26919473
Statistical evaluation of synchronous spike patterns extracted by frequent item set mining
Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja
2013-01-01
We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is also prone to classifying patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF, and PSR allows reliable detection of active assemblies in massively parallel spike trains. PMID:24167487
Weinheimer-Haus, Eileen M.; Mirza, Rita E.; Koh, Timothy J.
2015-01-01
The Nod-like receptor protein (NLRP)-3 inflammasome/IL-1β pathway is involved in the pathogenesis of various inflammatory skin diseases, but its biological role in wound healing remains to be elucidated. Since inflammation is typically thought to impede healing, we hypothesized that loss of NLRP-3 activity would result in a downregulated inflammatory response and accelerated wound healing. NLRP-3 null mice, caspase-1 null mice and C57Bl/6 wild type control mice (WT) received four 8 mm excisional cutaneous wounds; inflammation and healing were assessed during the early stage of wound healing. Consistent with our hypothesis, wounds from NLRP-3 null and caspase-1 null mice contained lower levels of the pro-inflammatory cytokines IL-1β and TNF-α compared to WT mice and had reduced neutrophil and macrophage accumulation. Contrary to our hypothesis, re-epithelialization, granulation tissue formation, and angiogenesis were delayed in NLRP-3 null mice and caspase-1 null mice compared to WT mice, indicating that NLRP-3 signaling is important for early events in wound healing. Topical treatment of excisional wounds with recombinant IL-1β partially restored granulation tissue formation in wounds of NLRP-3 null mice, confirming the importance of NLRP-3-dependent IL-1β production during early wound healing. Despite the improvement in healing, angiogenesis and levels of the pro-angiogenic growth factor VEGF were further reduced in IL-1β treated wounds, suggesting that IL-1β has a negative effect on angiogenesis and that NLRP-3 promotes angiogenesis in an IL-1β-independent manner. These findings indicate that the NLRP-3 inflammasome contributes to the early inflammatory phase following skin wounding and is important for efficient healing. PMID:25793779
A Demographic Analysis of Suicide Among U.S. Navy Personnel
1997-08-01
estimates of a Poisson-distributed variable according to the procedure described in Lilienfeld and Lilienfeld.27 Based on averaged age-specific rates of ... n suicides, the total number of pairs will be n(n-1)/2. The Knox method tests the null hypothesis that the event of a pair of suicides being close ... significantly differ. It is likely, however, that the military’s required suicide prevention programs and psychological autopsies help to ascertain as
Lisovskiĭ, A A; Pavlinov, I Ia
2008-01-01
Any morphospace is partitioned by the forms of group variation; its structure is described by a set of scalar (range, overlap) and vector (direction) characteristics. These characteristics are analyzed quantitatively for sex and age variation in a sample of 200 pine marten skulls described by 14 measurable traits. Standard dispersion and variance components analyses are employed, accompanied by several resampling methods (randomization and bootstrap); effects of changes in the analysis design on the results of these methods are also considered. The maximum likelihood algorithm of variance components analysis is shown to give adequate estimates of the portions of particular forms of group variation within the overall disparity. It is quite stable with respect to changes of the analysis design and therefore can be used to explore real data with variously unbalanced designs. A new algorithm for estimating the co-directionality of particular forms of group variation within the overall disparity is elaborated, based on angle measures between eigenvectors of the covariation matrices of group-variation effects calculated by dispersion analysis. A null hypothesis that the portion of a given form of group variation is random can be tested by randomizing the respective grouping variable. A null hypothesis of equality of both portions and directionalities of different forms of group variation can be tested by means of the bootstrap procedure.
Fatigue Failure of External Hexagon Connections on Cemented Implant-Supported Crowns.
Malta Barbosa, João; Navarro da Rocha, Daniel; Hirata, Ronaldo; Freitas, Gileade; Bonfante, Estevam A; Coelho, Paulo G
2018-01-17
To evaluate the probability of survival and failure modes of different external hexagon connection systems restored with anterior cement-retained single-unit crowns. The postulated null hypothesis was that there would be no differences under accelerated life testing. Fifty-four external hexagon dental implants (∼4 mm diameter) were used for single cement-retained crown replacement and divided into 3 groups: (3i) Full OSSEOTITE, Biomet 3i (n = 18); (OL) OEX P4, Osseolife Implants (n = 18); and (IL) Unihex, Intra-Lock International (n = 18). Abutments were torqued to the implants, and maxillary central incisor crowns were cemented and subjected to step-stress-accelerated life testing in water. Use-level probability Weibull curves and probability of survival for a mission of 100,000 cycles at 200 N (95% 2-sided confidence intervals) were calculated. Stereo and scanning electron microscopes were used for failure inspection. The beta values for 3i, OL, and IL (1.60, 1.69, and 1.23, respectively) indicated that fatigue accelerated the failure of the 3 groups. Reliability for the 3i and OL groups (41% and 68%, respectively) did not differ from each other, but both were significantly lower than that of the IL group (98%). Abutment screw fracture was the failure mode consistently observed in all groups. Because reliability differed significantly among the 3 groups, the postulated null hypothesis was rejected.
Beyond statistical inference: a decision theory for science.
Killeen, Peter R
2006-08-01
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests--which place all value on the replicability of an effect and none on its magnitude--as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them for a single index of merit. The analysis incorporates opportunity cost and is consistent with alternate measures of effect size, such as r2 and information transmission, and with Bayesian model selection criteria. An alternate formulation is functionally equivalent to the formal theory, transparent, and easy to compute.
Towards a framework for testing general relativity with extreme-mass-ratio-inspiral observations
NASA Astrophysics Data System (ADS)
Chua, A. J. K.; Hee, S.; Handley, W. J.; Higson, E.; Moore, C. J.; Gair, J. R.; Hobson, M. P.; Lasenby, A. N.
2018-07-01
Extreme-mass-ratio-inspiral observations from future space-based gravitational-wave detectors such as LISA will enable strong-field tests of general relativity with unprecedented precision, but at prohibitive computational cost if existing statistical techniques are used. In one such test that is currently employed for LIGO black hole binary mergers, generic deviations from relativity are represented by N deformation parameters in a generalized waveform model; the Bayesian evidence for each of its 2N combinatorial submodels is then combined into a posterior odds ratio for modified gravity over relativity in a null-hypothesis test. We adapt and apply this test to a generalized model for extreme-mass-ratio inspirals constructed on deformed black hole spacetimes, and focus our investigation on how computational efficiency can be increased through an evidence-free method of model selection. This method is akin to the algorithm known as product-space Markov chain Monte Carlo, but uses nested sampling and improved error estimates from a rethreading technique. We perform benchmarking and robustness checks for the method, and find order-of-magnitude computational gains over regular nested sampling in the case of synthetic data generated from the null model.
Towards a framework for testing general relativity with extreme-mass-ratio-inspiral observations
NASA Astrophysics Data System (ADS)
Chua, A. J. K.; Hee, S.; Handley, W. J.; Higson, E.; Moore, C. J.; Gair, J. R.; Hobson, M. P.; Lasenby, A. N.
2018-04-01
Extreme-mass-ratio-inspiral observations from future space-based gravitational-wave detectors such as LISA will enable strong-field tests of general relativity with unprecedented precision, but at prohibitive computational cost if existing statistical techniques are used. In one such test that is currently employed for LIGO black-hole binary mergers, generic deviations from relativity are represented by N deformation parameters in a generalised waveform model; the Bayesian evidence for each of its 2N combinatorial submodels is then combined into a posterior odds ratio for modified gravity over relativity in a null-hypothesis test. We adapt and apply this test to a generalised model for extreme-mass-ratio inspirals constructed on deformed black-hole spacetimes, and focus our investigation on how computational efficiency can be increased through an evidence-free method of model selection. This method is akin to the algorithm known as product-space Markov chain Monte Carlo, but uses nested sampling and improved error estimates from a rethreading technique. We perform benchmarking and robustness checks for the method, and find order-of-magnitude computational gains over regular nested sampling in the case of synthetic data generated from the null model.
Gaus, Wilhelm
2014-09-02
The US National Toxicology Program (NTP) is assessed by a statistician. In the NTP program, groups of rodents are fed for a certain period of time with different doses of the substance under investigation. Then the animals are sacrificed and all organs are examined pathologically. Such an investigation facilitates many statistical tests. Technical Report TR 578 on Ginkgo biloba is used as an example. More than 4800 statistical tests are possible with the investigations performed. A simple thought experiment shows that more than 240 falsely significant tests would then be expected by chance alone. In actuality, 209 significant pathological findings were reported. The readers of Toxicology Letters should carefully distinguish between confirmative and explorative statistics. A confirmative interpretation of a significant test rejects the null hypothesis and delivers "statistical proof". It is only allowed if (i) a precise hypothesis was established independently from the data used for the test and (ii) the computed p-values are adjusted for multiple testing if more than one test was performed. Otherwise an explorative interpretation generates a hypothesis. We conclude that NTP reports - including TR 578 on Ginkgo biloba - deliver explorative statistics, i.e., they generate hypotheses but do not prove them. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
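The multiplicity arithmetic behind this conclusion, together with a standard adjustment, can be made explicit. The p values below are simulated under a global null; only the 4800-test count and the alpha = 0.05 level come from the abstract.

    # The multiplicity arithmetic behind the abstract, plus a standard adjustment.
    # With ~4800 tests and alpha = 0.05, about 240 "significant" results are expected
    # even if every null hypothesis is true. P values below are simulated, not NTP data.
    import numpy as np
    from statsmodels.stats.multitest import multipletests

    n_tests, alpha = 4800, 0.05
    print("expected false positives:", n_tests * alpha)        # 240.0

    rng = np.random.default_rng(6)
    pvals = rng.uniform(size=n_tests)                           # global null: all H0 true
    raw_hits = np.sum(pvals < alpha)
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="holm")

    print("unadjusted 'significant' tests:", raw_hits)          # close to 240
    print("significant after Holm adjustment:", reject.sum())   # typically 0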
Long memory and multifractality: A joint test
NASA Astrophysics Data System (ADS)
Goddard, John; Onali, Enrico
2016-06-01
The properties of statistical tests for hypotheses concerning the parameters of the multifractal model of asset returns (MMAR) are investigated, using Monte Carlo techniques. We show that, in the presence of multifractality, conventional tests of long memory tend to over-reject the null hypothesis of no long memory. Our test addresses this issue by jointly estimating long memory and multifractality. The estimation and test procedures are applied to exchange rate data for 12 currencies. Among the nested model specifications that are investigated, in 11 out of 12 cases, daily returns are most appropriately characterized by a variant of the MMAR that applies a multifractal time-deformation process to NIID returns. There is no evidence of long memory.
Sirota, Miroslav; Kostovičová, Lenka; Juanchich, Marie
2014-08-01
Knowing which properties of visual displays facilitate statistical reasoning bears practical and theoretical implications. Therefore, we studied the effect of one property of visual displays - iconicity (i.e., the resemblance of a visual sign to its referent) - on Bayesian reasoning. Two main accounts of statistical reasoning predict different effects of iconicity on Bayesian reasoning. The ecological-rationality account predicts a positive iconicity effect, because more highly iconic signs resemble more individuated objects, which tap better into an evolutionarily designed frequency-coding mechanism that, in turn, facilitates Bayesian reasoning. The nested-sets account predicts a null iconicity effect, because iconicity does not affect the salience of a nested-sets structure - the factor facilitating Bayesian reasoning processed by a general reasoning mechanism. In two well-powered experiments (N = 577), we found no support for a positive iconicity effect across different iconicity levels that were manipulated in different visual displays (meta-analytical overall effect: log OR = -0.13, 95% CI [-0.53, 0.28]). A Bayes factor analysis provided strong evidence in favor of the null hypothesis - the null iconicity effect. Thus, these findings corroborate the nested-sets rather than the ecological-rationality account of statistical reasoning.
How to talk about protein‐level false discovery rates in shotgun proteomics
The, Matthew; Tasnim, Ayesha
2016-01-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein‐level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein‐level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein‐level FDRs for both competing null hypotheses. PMID:27503675
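A minimal sketch of the decoy idea referred to above: hits to decoy entries estimate how many of the accepted target hits are false, so the FDR at a score threshold can be approximated by the decoy-to-target ratio. The scores below are simulated, and the sketch does not reproduce the paper's protein-inference simulations or either of its competing null hypotheses in detail.

    # Minimal sketch of a decoy-based FDR estimate at a score threshold: the number of
    # decoy hits above the threshold estimates the number of false target hits, so
    # FDR ~ (#decoys above t) / (#targets above t). Scores are simulated, not real data.
    import numpy as np

    rng = np.random.default_rng(7)
    target_scores = np.concatenate([rng.normal(3.0, 1.0, 300),   # true proteins
                                    rng.normal(0.0, 1.0, 700)])  # incorrect matches
    decoy_scores = rng.normal(0.0, 1.0, 1000)                    # decoys mimic incorrect matches

    threshold = 2.0
    n_target = np.sum(target_scores >= threshold)
    n_decoy = np.sum(decoy_scores >= threshold)
    fdr_est = n_decoy / max(n_target, 1)

    print(f"targets accepted: {n_target}, decoys above threshold: {n_decoy}, "
          f"estimated FDR ~ {fdr_est:.3f}")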
Towers, Sherry; Mubayi, Anuj; Castillo-Chavez, Carlos
2018-01-01
When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle.
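The binned-versus-unbinned contrast can be illustrated with a generic example: an unbinned likelihood-ratio test for an exponential rate uses every event time directly, whereas binning the same data into wide intervals (as in a chi-square test) discards the within-bin information. The data below are simulated and unrelated to the mass-killings data discussed above.

    # Generic illustration of an unbinned likelihood method: a likelihood-ratio test of
    # H0: exponential rate = lambda0 against a freely estimated rate, using every event
    # time directly. Data are simulated; this is not the analysis of Towers et al.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    lambda0 = 1.0
    waits = rng.exponential(scale=1 / 1.3, size=80)   # true rate 1.3 (a subtle deviation)

    n = len(waits)
    lam_hat = n / waits.sum()                          # MLE of the rate
    ll0 = n * np.log(lambda0) - lambda0 * waits.sum()  # log-likelihood under H0
    ll1 = n * np.log(lam_hat) - lam_hat * waits.sum()  # log-likelihood at the MLE

    lrt = 2 * (ll1 - ll0)
    p = stats.chi2.sf(lrt, df=1)                       # Wilks' theorem, 1 free parameter
    print(f"lambda_hat = {lam_hat:.2f}, LRT = {lrt:.2f}, p = {p:.3f}")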
Mubayi, Anuj; Castillo-Chavez, Carlos
2018-01-01
Background When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. Methods In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. Conclusions When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle. PMID:29742115
ERIC Educational Resources Information Center
Case, Catherine; Whitaker, Douglas
2016-01-01
In the criminal justice system, defendants accused of a crime are presumed innocent until proven guilty. Statistical inference in any context is built on an analogous principle: The null hypothesis--often a hypothesis of "no difference" or "no effect"--is presumed true unless there is sufficient evidence against it. In this…
Motion versus position in the perception of head-centred movement.
Freeman, Tom C A; Sumnall, Jane H
2002-01-01
Abstract. Observers can recover motion with respect to the head during an eye movement by comparing signals encoding retinal motion and the velocity of pursuit. Evidently there is a mismatch between these signals because perceived head-centred motion is not always veridical. One example is the Filehne illusion, in which a stationary object appears to move in the opposite direction to pursuit. Like the motion aftereffect, the phenomenal experience of the Filehne illusion is one in which the stimulus moves but does not seem to go anywhere. This raises problems when measuring the illusion by motion nulling because the more traditional technique confounds perceived motion with changes in perceived position. We devised a new nulling technique using global-motion stimuli that degraded familiar position cues but preserved cues to motion. Stimuli consisted of random-dot patterns comprising signal and noise dots that moved at the same retinal 'base' speed. Noise moved in random directions. In an eye-stationary speed-matching experiment we found noise slowed perceived retinal speed as 'coherence strength' (ie percentage of signal) was reduced. The effect occurred over the two-octave range of base speeds studied and well above direction threshold. When the same stimuli were combined with pursuit, observers were able to null the Filehne illusion by adjusting coherence. A power law relating coherence to retinal base speed fit the data well with a negative exponent. Eye-movement recordings showed that pursuit was quite accurate. We then tested the hypothesis that the stimuli found at the null-points appeared to move at the same retinal speed. Two observers supported the hypothesis, a third partially, and a fourth showed a small linear trend. In addition, the retinal speed found by the traditional Filehne technique was similar to the matches obtained with the global-motion stimuli. The results provide support for the idea that speed is the critical cue in head-centred motion perception.
Pan, Luyuan; Broadie, Kendal S
2007-11-07
A current hypothesis proposes that fragile X mental retardation protein (FMRP), an RNA-binding translational regulator, acts downstream of glutamatergic transmission, via metabotropic glutamate receptor (mGluR) G(q)-dependent signaling, to modulate protein synthesis critical for trafficking ionotropic glutamate receptors (iGluRs) at synapses. However, direct evidence linking FMRP and mGluR function with iGluR synaptic expression is limited. In this study, we use the Drosophila fragile X model to test this hypothesis at the well characterized glutamatergic neuromuscular junction (NMJ). Two iGluR classes reside at this synapse, each containing common GluRIIC (III), IID and IIE subunits, and variable GluRIIA (A-class) or GluRIIB (B-class) subunits. In Drosophila fragile X mental retardation 1 (dfmr1) null mutants, A-class GluRs accumulate and B-class GluRs are lost, whereas total GluR levels do not change, resulting in a striking change in GluR subclass ratio at individual synapses. The sole Drosophila mGluR, DmGluRA, is also expressed at the NMJ. In dmGluRA null mutants, both iGluR classes increase, resulting in an increase in total synaptic GluR content at individual synapses. Targeted postsynaptic dmGluRA overexpression causes the exact opposite GluR phenotype to the dfmr1 null, confirming postsynaptic GluR subtype-specific regulation. In dfmr1; dmGluRA double null mutants, there is an additive increase in A-class GluRs, and a similar additive impact on B-class GluRs, toward normal levels in the double mutants. These results show that both dFMRP and DmGluRA differentially regulate the abundance of different GluR subclasses in a convergent mechanism within individual postsynaptic domains.
A Continuous Threshold Expectile Model.
Zhang, Feipeng; Li, Qunhua
2017-12-01
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency. A weighted CUSUM type test statistic is proposed for the existence of a threshold at a given expectile, and its asymptotic properties are derived under both the null and the local alternative models. This test requires fitting only the model under the null hypothesis of no threshold, so it is computationally more efficient than likelihood-ratio type tests. Simulation studies show that the proposed estimators and test have desirable finite sample performance in both homoscedastic and heteroscedastic cases. The application of the proposed method to a Dutch growth data set and a baseball pitcher salary data set reveals interesting insights. The proposed method is implemented in the R package cthreshER.
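For readers less familiar with expectiles, the sketch below computes a sample expectile by asymmetric (iteratively reweighted) least squares; it illustrates the quantity being modeled, not the authors' continuous threshold estimator or the cthreshER package. The data are simulated.

    # Minimal sketch of the expectile itself (not the authors' threshold model): the
    # tau-expectile minimizes an asymmetrically weighted squared loss and can be found
    # by iteratively reweighted least squares. Data are simulated for illustration.
    import numpy as np

    def expectile(x, tau, tol=1e-10, max_iter=1000):
        e = x.mean()                                   # start at the mean (the tau = 0.5 case)
        for _ in range(max_iter):
            w = np.where(x > e, tau, 1 - tau)          # asymmetric weights
            e_new = np.sum(w * x) / np.sum(w)          # weighted-mean update
            if abs(e_new - e) < tol:
                break
            e = e_new
        return e

    rng = np.random.default_rng(9)
    x = rng.lognormal(mean=0.0, sigma=0.75, size=5000)
    print("mean (0.5-expectile):", round(x.mean(), 3))
    print("0.9-expectile:       ", round(expectile(x, 0.9), 3))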
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lund, Amie K.; Goens, M. Beth; Nunez, Bethany A.
2006-04-15
The aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor characterized to play a role in detection and adaptation to environmental stimuli. Genetic deletion of AhR results in hypertension, and cardiac hypertrophy and fibrosis, associated with elevated plasma angiotensin II (Ang II) and endothelin-1 (ET-1); thus AhR appears to contribute to cardiovascular homeostasis. In these studies, we tested the hypothesis that ET-1 mediates cardiovascular pathology in AhR null mice via ET(A) receptor activation. First, we determined the time courses of cardiac hypertrophy, and of plasma and tissue ET-1 expression, in AhR wildtype and null mice. AhR null mice exhibited increases in heart-to-body weight ratio and age-related expression of the cardiac hypertrophy markers β-myosin heavy chain (β-MHC) and atrial natriuretic factor (ANF), which were significant at 2 months. Similarly, plasma and tissue ET-1 expression was significantly elevated at 2 months and increased further with age. Second, AhR null mice were treated with the ET(A) receptor antagonist BQ-123 (100 nmol/kg/day) for 7, 28, or 58 days, and blood pressure, cardiac fibrosis, and cardiac hypertrophy were assessed, respectively. BQ-123 for 7 days significantly reduced mean arterial pressure in conscious, catheterized mice. BQ-123 for 28 days significantly reduced the histological appearance of cardiac fibrosis. Treatment for 58 days significantly reduced cardiac mass, assessed by heart weight, echocardiography, and β-MHC and ANF expression; and reduced cardiac fibrosis as determined by osteopontin and collagen I mRNA expression. These findings establish ET-1 and the ET(A) receptor as primary determinants of hypertension and cardiac pathology in AhR null mice.
Repicky, Sarah; Broadie, Kendal
2009-02-01
Loss of the mRNA-binding protein FMRP results in the most common inherited form of both mental retardation and autism spectrum disorders: fragile X syndrome (FXS). The leading FXS hypothesis proposes that metabotropic glutamate receptor (mGluR) signaling at the synapse controls FMRP function in the regulation of local protein translation to modulate synaptic transmission strength. In this study, we use the Drosophila FXS disease model to test the relationship between Drosophila FMRP (dFMRP) and the sole Drosophila mGluR (dmGluRA) in regulation of synaptic function, using two-electrode voltage-clamp recording at the glutamatergic neuromuscular junction (NMJ). Null dmGluRA mutants show minimal changes in basal synapse properties but pronounced defects during sustained high-frequency stimulation (HFS). The double null dfmr1;dmGluRA mutant shows repression of enhanced augmentation and delayed onset of premature long-term facilitation (LTF) and strongly reduces grossly elevated post-tetanic potentiation (PTP) phenotypes present in dmGluRA-null animals. Null dfmr1 mutants show features of synaptic hyperexcitability, including multiple transmission events in response to a single stimulus and cyclic modulation of transmission amplitude during prolonged HFS. The double null dfmr1;dmGluRA mutant shows amelioration of these defects but does not fully restore wildtype properties in dfmr1-null animals. These data suggest that dmGluRA functions in a negative feedback loop in which excess glutamate released during high-frequency transmission binds the glutamate receptor to dampen synaptic excitability, and dFMRP functions to suppress the translation of proteins regulating this synaptic excitability. Removal of the translational regulator partially compensates for loss of the receptor and, similarly, loss of the receptor weakly compensates for loss of the translational regulator.
A simple test of association for contingency tables with multiple column responses.
Decady, Y J; Thomas, D R
2000-09-01
Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.
Effects of Gum Chewing on Appetite and Digestion
2013-05-28
The null hypothesis is that food rheology will have no effect on these indices. The alternate hypothesis is that increased mechanical stimulation will result in stronger satiation/satiety and reduced energy intake. Further, it is hypothesized that the effects of mastication will be less evident in obese compared to lean individuals.
Assessing significance in a Markov chain without mixing.
Chikina, Maria; Frieze, Alan; Pegden, Wesley
2017-03-14
We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p=2ε under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p≈ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
Cadenaro, Milena; Breschi, Lorenzo; Nucci, Cesare; Antoniolli, Francesca; Visintini, Erika; Prati, Carlo; Matis, Bruce A; Di Lenarda, Roberto
2008-01-01
This study evaluated the morphological effects produced in vivo by two in-office bleaching agents on enamel surface roughness using a noncontact profilometric analysis of epoxy replicas. The null hypothesis tested was that there would be no difference in the micromorphology of the enamel surface during or after bleaching with two different bleaching agents. Eighteen subjects were selected and randomly assigned to two treatment groups (n=9). The tooth whitening materials tested were 38% hydrogen peroxide (HP) (Opalescence Xtra Boost) and 35% carbamide peroxide (CP) (Rembrandt Quik Start). The bleaching agents were applied in accordance with manufacturer protocols. The treatments were repeated four times at one-week intervals. High precision impressions of the upper right incisor were taken at baseline as the control (CTRL) and after each bleaching treatment (T0: first application, T1: second application at one week, T2: third application at two weeks and T3: fourth application at three weeks). Epoxy resin replicas were poured from impressions, and the surface roughness was analyzed by means of a non-contact profilometer (Talysurf CLI 1000). Epoxy replicas were then observed using SEM. All data were statistically analyzed using ANOVA and differences were determined with a t-test. No significant differences in surface roughness were found on enamel replicas using either 38% hydrogen peroxide or 35% carbamide peroxide in vivo. This in vivo study supports the null hypothesis that two in-office bleaching agents, with either a high concentration of hydrogen or carbamide peroxide, do not alter enamel surface roughness, even after multiple applications.
A comparison of dental ultrasonic technologies on subgingival calculus removal: a pilot study.
Silva, Lidia Brión; Hodges, Kathleen O; Calley, Kristin Hamman; Seikel, John A
2012-01-01
This pilot study compared the clinical endpoints of the magnetostrictive and piezoelectric ultrasonic instruments on calculus removal. The null hypothesis stated that there is no statistically significant difference in calculus removal between the 2 instruments. A quasi-experimental pre- and post-test design was used. Eighteen participants were included. The magnetostrictive and piezoelectric ultrasonic instruments were used in 2 assigned contra-lateral quadrants on each participant. A data collector, blind to treatment assignment, assessed the calculus on 6 predetermined tooth sites before and after ultrasonic instrumentation. Calculus size was evaluated using ordinal measurements on a 4 point scale (0, 1, 2, 3). Subjects were required to have size 2 or 3 calculus deposit on the 6 predetermined sites. One clinician instrumented the pre-assigned quadrants. A maximum time of 20 minutes of instrumentation was allowed with each technology. Immediately after instrumentation, the data collector then conducted the post-test calculus evaluation. The repeated analysis of variance (ANOVA) was used to analyze the pre- and post-test calculus data (p≤0.05). The null hypothesis was accepted indicating that there is no statistically significant difference in calculus removal when comparing technologies (p≤0.05). Therefore, under similar conditions, both technologies removed the same amount of calculus. This research design could be used as a foundation for continued research in this field. Future studies include implementing this study design with a larger sample size and/or modifying the study design to include multiple clinicians who are data collectors. Also, deposit removal with periodontal maintenance patients could be explored.
Assessing significance in a Markov chain without mixing
Chikina, Maria; Frieze, Alan; Pegden, Wesley
2017-01-01
We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p=2ε under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p≈ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
On the insignificance of Herschel's sunspot correlation
Love, Jeffrey J.
2013-01-01
We examine William Herschel's hypothesis that solar-cycle variation of the Sun's irradiance has a modulating effect on the Earth's climate and that this is, specifically, manifested as an anticorrelation between sunspot number and the market price of wheat. Since Herschel first proposed his hypothesis in 1801, it has been regarded with both interest and skepticism. Recently, reports have been published that either support Herschel's hypothesis or rely on its validity. As a test of Herschel's hypothesis, we seek to reject a null hypothesis of a statistically random correlation between historical sunspot numbers, wheat prices in London and the United States, and wheat farm yields in the United States. We employ binary-correlation, Pearson-correlation, and frequency-domain methods. We test our methods using a historical geomagnetic activity index, well known to be causally correlated with sunspot number. As expected, the measured correlation between sunspot number and geomagnetic activity would be an unlikely realization of random data; the correlation is “statistically significant.” On the other hand, measured correlations between sunspot number and wheat price and wheat yield data would be very likely realizations of random data; these correlations are “insignificant.” Therefore, Herschel's hypothesis must be regarded with skepticism. We compare and contrast our results with those of other researchers. We discuss procedures for evaluating hypotheses that are formulated from historical data.
NASA Technical Reports Server (NTRS)
Goepfert, T. M.; McCarthy, M.; Kittrell, F. S.; Stephens, C.; Ullrich, R. L.; Brinkley, B. R.; Medina, D.
2000-01-01
Mammary epithelial cells from p53 null mice have been shown recently to exhibit an increased risk for tumor development. Hormonal stimulation markedly increased tumor development in p53 null mammary cells. Here we demonstrate that mammary tumors arising in p53 null mammary cells are highly aneuploid, with greater than 70% of the tumor cells containing altered chromosome number and a mean chromosome number of 56. Normal mammary cells of p53 null genotype and aged less than 14 wk do not exhibit aneuploidy in primary cell culture. Significantly, the hormone progesterone, but not estrogen, increases the incidence of aneuploidy in morphologically normal p53 null mammary epithelial cells. Such cells exhibited 40% aneuploidy and a mean chromosome number of 54. The increase in aneuploidy measured in p53 null tumor cells or hormonally stimulated normal p53 null cells was not accompanied by centrosome amplification. These results suggest that normal levels of progesterone can facilitate chromosomal instability in the absence of the tumor suppressor gene, p53. The results support the emerging hypothesis based both on human epidemiological and animal model studies that progesterone markedly enhances mammary tumorigenesis.
A new modeling and inference approach for the Systolic Blood Pressure Intervention Trial outcomes.
Yang, Song; Ambrosius, Walter T; Fine, Lawrence J; Bress, Adam P; Cushman, William C; Raj, Dominic S; Rehman, Shakaib; Tamariz, Leonardo
2018-06-01
Background/aims: In clinical trials with time-to-event outcomes, the significance tests and confidence intervals are usually based on a proportional hazards model, so the temporal pattern of the treatment effect is not directly considered. This could be problematic if the proportional hazards assumption is violated, as such violation could impact both interim and final estimates of the treatment effect. Methods: We describe the application of inference procedures developed recently in the literature for time-to-event outcomes when the treatment effect may or may not be time-dependent. The inference procedures are based on a new model which contains the proportional hazards model as a sub-model. The temporal pattern of the treatment effect can then be expressed and displayed. The average hazard ratio is used as the summary measure of the treatment effect. The test of the null hypothesis uses adaptive weights that often lead to improvement in power over the log-rank test. Results: Without needing to assume proportional hazards, the new approach yields results consistent with previously published findings in the Systolic Blood Pressure Intervention Trial. It provides a visual display of the time course of the treatment effect. At four of the five scheduled interim looks, the new approach yields smaller p values than the log-rank test. The average hazard ratio and its confidence interval indicate a treatment effect nearly a year earlier than a restricted mean survival time-based approach. Conclusion: When the hazards are proportional between the comparison groups, the new methods yield results very close to the traditional approaches. When the proportional hazards assumption is violated, the new methods continue to be applicable and can potentially be more sensitive to departure from the null hypothesis.
Chandrasekaran, Srinivas Niranj; Yardimci, Galip Gürkan; Erdogan, Ozgün; Roach, Jeffrey; Carter, Charles W.
2013-01-01
We tested the idea that ancestral class I and II aminoacyl-tRNA synthetases arose on opposite strands of the same gene. We assembled excerpted 94-residue Urgenes for class I tryptophanyl-tRNA synthetase (TrpRS) and class II Histidyl-tRNA synthetase (HisRS) from a diverse group of species, by identifying and catenating three blocks coding for secondary structures that position the most highly conserved, active-site residues. The codon middle-base pairing frequency was 0.35 ± 0.0002 in all-by-all sense/antisense alignments for 211 TrpRS and 207 HisRS sequences, compared with frequencies between 0.22 ± 0.0009 and 0.27 ± 0.0005 for eight different representations of the null hypothesis. Clustering algorithms demonstrate further that profiles of middle-base pairing in the synthetase antisense alignments are correlated along the sequences from one species-pair to another, whereas this is not the case for similar operations on sets representing the null hypothesis. Most probable reconstructed sequences for ancestral nodes of maximum likelihood trees show that middle-base pairing frequency increases to approximately 0.42 ± 0.002 as bacterial trees approach their roots; ancestral nodes from trees including archaeal sequences show a less pronounced increase. Thus, contemporary and reconstructed sequences all validate important bioinformatic predictions based on descent from opposite strands of the same ancestral gene. They further provide novel evidence for the hypothesis that bacteria lie closer than archaea to the origin of translation. Moreover, the inverse polarity of genetic coding, together with a priori α-helix propensities suggest that in-frame coding on opposite strands leads to similar secondary structures with opposite polarity, as observed in TrpRS and HisRS crystal structures. PMID:23576570
Acar, Elif F; Sun, Lei
2013-06-01
Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.
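The probability-weighted generalization described above is implemented in the authors' GKW program; as a point of reference only, the sketch below runs the standard Kruskal-Wallis test with scipy on made-up data and checks its chi-square reference distribution with k - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical phenotype values for three hard-called genotype groups (k = 3)
groups = [rng.normal(loc=0.0, size=40),
          rng.normal(loc=0.2, size=35),
          rng.normal(loc=0.4, size=30)]

H, p = stats.kruskal(*groups)          # H ~ chi-square with k-1 df under H0
print(f"H = {H:.2f}, p = {p:.3g}")
# sanity check against the chi-square reference distribution
print(f"p from chi2 survival function: {stats.chi2.sf(H, df=len(groups) - 1):.3g}")
```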
Error analysis and system optimization of non-null aspheric testing system
NASA Astrophysics Data System (ADS)
Luo, Yongjie; Yang, Yongying; Liu, Dong; Tian, Chao; Zhuo, Yongmo
2010-10-01
A non-null aspheric testing system, which employs a partial null lens (PNL for short) and a reverse iterative optimization reconstruction (ROR for short) technique, is proposed in this paper. Based on system modeling in ray-tracing software, the parameters of each optical element are optimized, which makes the system model more precise. The systematic error of the non-null aspheric testing system is analyzed and categorized into two types: error due to the surface parameters of the PNL in the system model, and the remaining error from the non-null interferometer, which is isolated by an error-storage subtraction approach. Experimental results show that, after the systematic error is removed from the test result, the aspheric surface is precisely reconstructed by the ROR technique, and accounting for the systematic error greatly increases the test accuracy of the non-null aspheric testing system.
Davies, M H; Elias, E; Acharya, S; Cotton, W; Faulder, G C; Fryer, A A; Strange, R C
1993-01-01
Studies were carried out to test the hypothesis that the GSTM1 null phenotype at the mu (μ) class glutathione S-transferase 1 locus is associated with an increased predisposition to primary biliary cirrhosis. Starch gel electrophoresis was used to compare the prevalence of the GSTM1 0 (null) phenotype in patients with end-stage primary biliary cirrhosis and a group of controls without evidence of liver disease. The prevalence of the GSTM1 null phenotype in the primary biliary cirrhosis and control groups was similar: 39% and 45%, respectively. In the primary biliary cirrhosis group all subjects were of the common GSTM1 0, GSTM1 A, GSTM1 B or GSTM1 A,B phenotypes, while in the controls one subject showed an isoform with an anodal mobility compatible with it being a product of the putative GSTM1*3 allele. As the GSTM1 phenotype might be changed by the disease process, the polymerase chain reaction was used to amplify the exon 4-exon 5 region of GSTM1 and show that in 13 control subjects and 11 patients with primary biliary cirrhosis, GSTM1 positive and negative genotypes were associated with corresponding GSTM1 expressing and non-expressing phenotypes, respectively. The control subject with the GSTM1 3 phenotype showed a positive genotype. PMID:8491405
NASA Astrophysics Data System (ADS)
Jacob, Rinku; Harikrishnan, K. P.; Misra, R.; Ambika, G.
2018-01-01
Recurrence networks and the associated statistical measures have become important tools in the analysis of time series data. In this work, we test how effective the recurrence network measures are in analyzing real world data involving two main types of noise, white noise and colored noise. We use two prominent network measures as discriminating statistics for hypothesis testing using surrogate data for a specific null hypothesis that the data are derived from a linear stochastic process. We show that the characteristic path length is especially efficient as a discriminating measure, with the conclusions reasonably accurate even with a limited number of data points in the time series. We also highlight an additional advantage of the network approach in identifying the dimensionality of the system underlying the time series through a convergence measure derived from the probability distribution of the local clustering coefficients. As examples of real world data, we use the light curves from a prominent black hole system and show that a combined analysis using three primary network measures can provide vital information regarding the nature of temporal variability of light curves from different spectroscopic classes.
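The surrogate-data test described above can be sketched with standard tools. The toy example below is my own illustration, not the authors' pipeline: it builds an ε-recurrence network directly from a scalar series (without the phase-space embedding a real analysis would use), takes the characteristic path length as the discriminating statistic, and compares the observed value against phase-randomized surrogates that realize the linear-stochastic null hypothesis.

```python
import numpy as np
import networkx as nx

def recurrence_network_cpl(x, eps_quantile=0.1):
    """Characteristic path length of an epsilon-recurrence network built from a
    time series (1-D embedding kept for brevity)."""
    d = np.abs(x[:, None] - x[None, :])
    eps = np.quantile(d[np.triu_indices_from(d, k=1)], eps_quantile)
    A = d <= eps
    np.fill_diagonal(A, False)
    G = nx.from_numpy_array(A.astype(int))
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    return nx.average_shortest_path_length(giant)

def phase_randomized_surrogate(x, rng):
    """Surrogate consistent with the null hypothesis of a linear stochastic
    process: the power spectrum is (approximately) kept, the phases randomized."""
    xf = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, xf.size)
    phases[0] = 0.0
    return np.fft.irfft(np.abs(xf) * np.exp(1j * phases), n=x.size)

rng = np.random.default_rng(0)
x = np.empty(400)
x[0] = 0.4
for t in range(1, x.size):                      # toy nonlinear (logistic-map) signal
    x[t] = 3.9 * x[t - 1] * (1.0 - x[t - 1])
x += 0.01 * rng.standard_normal(x.size)         # small observational noise

obs = recurrence_network_cpl(x)
surr = np.array([recurrence_network_cpl(phase_randomized_surrogate(x, rng))
                 for _ in range(39)])
# rough two-sided rank-based p value for the surrogate test
p = (1 + np.sum(np.abs(surr - surr.mean()) >= abs(obs - surr.mean()))) / (surr.size + 1)
print(f"observed CPL = {obs:.3f}, surrogate mean = {surr.mean():.3f}, p = {p:.3f}")
```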
Root-Bernstein, Robert; Root-Bernstein, Meredith
2016-05-21
We have proposed that the ribosome may represent a missing link between prebiotic chemistries and the first cells. One of the predictions that follows from this hypothesis, which we test here, is that ribosomal RNA (rRNA) must have encoded the proteins necessary for ribosomal function. In other words, the rRNA also functioned pre-biotically as mRNA. Since these ribosome-binding proteins (rb-proteins) must bind to the rRNA, but the rRNA also functioned as mRNA, it follows that rb-proteins should bind to their own mRNA as well. This hypothesis can be contrasted to a "null" hypothesis in which rb-proteins evolved independently of the rRNA sequences and therefore there should be no necessary similarity between the rRNA to which rb-proteins bind and the mRNA that encodes the rb-protein. Five types of evidence reported here support the plausibility of the hypothesis that the mRNA encoding rb-proteins evolved from rRNA: (1) the ubiquity of rb-protein binding to their own mRNAs and autogenous control of their own translation; (2) the higher-than-expected incidence of Arginine-rich modules associated with RNA binding that occurs in rRNA-encoded proteins; (3) the fact that rRNA-binding regions of rb-proteins are homologous to their mRNA binding regions; (4) the higher-than-expected incidence of rb-protein sequences encoded in rRNA that are of a high degree of homology to their mRNA as compared with a random selection of other proteins; and (5) rRNA in modern prokaryotes and eukaryotes encodes functional proteins. None of these results can be explained by the null hypothesis that assumes independent evolution of rRNA and the mRNAs encoding ribosomal proteins. Also noteworthy is that very few proteins bind their own mRNAs that are not associated with ribosome function. Further tests of the hypothesis are suggested: (1) experimental testing of whether rRNA-encoded proteins bind to rRNA at their coding sites; (2) whether tRNA synthetases, which are also known to bind to their own mRNAs, are encoded by the tRNA sequences themselves; and (3) the prediction that archaeal and prokaryotic (DNA-based) genomes were built around rRNA "genes" so that rRNA-related sequences will be found to make up an unexpectedly high proportion of these genomes. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Mojica, Celina; Bai, Yu; Lotfipour, Shahrdad
2018-06-01
The objective of the current study is to test the hypothesis that deletion of alpha(α)2* nicotinic acetylcholine receptors (nAChRs) (encoded by the Chrna2 gene) ablates maternal nicotine-induced learning and memory deficits in adolescent mice. We use a pre-exposure-dependent contextual fear conditioning behavioral paradigm that is highly hippocampus-dependent. Adolescent wild type and α2-null mutant offspring receive vehicle or maternal nicotine exposure (200 μg/ml, expressed as base, in the drinking water) throughout pregnancy until weaning. Adolescent male offspring are tested for alterations in growth and development characteristics as well as modifications in locomotion, anxiety, shock reactivity, and learning and memory. As expected, maternal nicotine exposure has no effect on pup number or weight gain and only modestly reduces fluid intake, by 19%. Behaviorally, maternal nicotine exposure impedes extinction learning in adolescent wild type mice, a consequence that is abolished in α2-null mutant mice. The effects on learning and memory are not confounded by alterations in stereotypy, locomotion, anxiety or sensory shock reactivity. Overall, the findings highlight that deletion of α2* nAChRs eliminates the effects of maternal nicotine exposure on learning and memory in adolescent mice. Copyright © 2018 Elsevier Ltd. All rights reserved.
How to talk about protein-level false discovery rates in shotgun proteomics.
The, Matthew; Tasnim, Ayesha; Käll, Lukas
2016-09-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses. © 2016 The Authors. Proteomics Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
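A minimal sketch of the decoy idea behind such FDR estimates is shown below, on simulated scores. It illustrates only the basic target-decoy principle at the score level and does not distinguish the two competing protein-level null hypotheses discussed in the paper; all score distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical protein-level scores: targets are a mixture of true hits and noise,
# decoys are noise only (the decoy model stands in for the null distribution).
target_scores = np.concatenate([rng.normal(3.0, 1.0, 300),   # "present" proteins
                                rng.normal(0.0, 1.0, 700)])  # absent targets
decoy_scores = rng.normal(0.0, 1.0, 1000)

def decoy_fdr(threshold):
    """FDR estimate at a score threshold: decoys passing / targets passing."""
    n_decoy = np.sum(decoy_scores >= threshold)
    n_target = np.sum(target_scores >= threshold)
    return n_decoy / max(n_target, 1)

for thr in (1.0, 2.0, 3.0):
    print(f"threshold {thr:.1f}: estimated FDR = {decoy_fdr(thr):.3f}")
```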
Interpreting Null Findings from Trials of Alcohol Brief Interventions
Heather, Nick
2014-01-01
The effectiveness of alcohol brief intervention (ABI) has been established by a succession of meta-analyses but, because the effects of ABI are small, null findings from randomized controlled trials are often reported and can sometimes lead to skepticism regarding the benefits of ABI in routine practice. This article first explains why null findings are likely to occur under null hypothesis significance testing (NHST) due to the phenomenon known as “the dance of the p-values.” A number of misconceptions about null findings are then described, using as an example the way in which the results of the primary care arm of a recent cluster-randomized trial of ABI in England (the SIPS project) have been misunderstood. These misinterpretations include the fallacy of “proving the null hypothesis” that lack of a significant difference between the means of sample groups can be taken as evidence of no difference between their population means, and the possible effects of this and related misunderstandings of the SIPS findings are examined. The mistaken inference that reductions in alcohol consumption seen in control groups from baseline to follow-up are evidence of real effects of control group procedures is then discussed and other possible reasons for such reductions, including regression to the mean, research participation effects, historical trends, and assessment reactivity, are described. From the standpoint of scientific progress, the chief problem about null findings under the conventional NHST approach is that it is not possible to distinguish “evidence of absence” from “absence of evidence.” By contrast, under a Bayesian approach, such a distinction is possible and it is explained how this approach could classify ABIs in particular settings or among particular populations as either truly ineffective or as of unknown effectiveness, thus accelerating progress in the field of ABI research. PMID:25076917
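The "dance of the p-values" is easy to reproduce by simulation. The sketch below uses illustrative numbers only: it reruns an identical, underpowered two-arm comparison many times and shows how widely the p value swings even though the true (small) effect never changes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d, n, reps = 0.2, 50, 20        # small true effect, modest per-arm sample size

pvals = []
for _ in range(reps):
    treated = rng.normal(d, 1.0, n)     # true standardized effect of 0.2
    control = rng.normal(0.0, 1.0, n)
    pvals.append(stats.ttest_ind(treated, control).pvalue)

print("p values across identical replications:",
      ", ".join(f"{p:.3f}" for p in sorted(pvals)))
print(f"fraction 'significant' at 0.05: {np.mean(np.array(pvals) < 0.05):.2f}")
```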
Detecting anomalies in CMB maps: a new method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com
2015-10-01
Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we "hand-pick" the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_{ℓm}'s with random coefficients, and they test the null hypothesis that the a_{ℓm}'s are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the "coherence" of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are thus much more generic than those that have been hitherto considered in the literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.
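As a toy illustration of the random-coefficient idea only (not the authors' construction, which couples multiple ℓ and m modes), the sketch below forms one linear and one quadratic statistic from simulated, whitened a_{ℓm}'s and evaluates them against their known null distributions; every number here is a made-up stand-in for real CMB coefficients.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical normalized a_lm's for one low multipole band: under the null they
# are independent, zero-mean normal with a common (m-independent) variance.
alm = rng.normal(0.0, 1.0, 25)           # stand-in for measured, whitened coefficients

# Linear statistic with random coefficients fixed a priori (not hand-picked).
c = rng.normal(size=alm.size)
c /= np.linalg.norm(c)
T_lin = c @ alm                           # N(0, 1) under the null
p_lin = 2 * stats.norm.sf(abs(T_lin))

# Quadratic statistic: chi-square with as many degrees of freedom as coefficients.
T_quad = np.sum(alm**2)
p_quad = stats.chi2.sf(T_quad, df=alm.size)

print(f"linear stat = {T_lin:.2f} (p = {p_lin:.2f}); quadratic stat = {T_quad:.1f} (p = {p_quad:.2f})")
```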
Jain, Shikha; Shetty, K Sadashiva; Jain, Shweta; Jain, Sachin; Prakash, A T; Agrawal, Mamta
2015-07-01
To assess the null hypothesis that there is no difference in the rate of dental development and the occurrence of selected developmental anomalies related to shape, number, structure, and position of teeth between subjects with impacted mandibular canines and those with normally erupted canines. Pretreatment records of 42 subjects diagnosed with mandibular canine impaction (impaction group: IG) were compared with those of 84 subjects serving as a control reference sample (control group: CG). Independent t-tests were used to compare mean dental ages between the groups. Intergroup differences in the distribution of subjects based on the rate of dental development and occurrence of selected dental anomalies were assessed using χ² tests. Odds of late, normal, and early developers and various categories of developmental anomalies between the IG and the CG were evaluated in terms of odds ratios. Mean dental age for the IG was lower than that for the CG in general; specifically, this was true for girls (P < .05). Differences in the distribution of the subjects based on the rate of dental development and occurrence of positional anomalies also reached statistical significance (P < .05). The IG showed a higher frequency of late developers and positional anomalies compared with controls (odds ratios 3.00 and 2.82, respectively; P < .05). The null hypothesis was rejected. We identified a close association between female subjects in the IG and delayed dental development compared with female orthodontic controls. An increased frequency of positional developmental anomalies was also remarkable in the IG.
Yarotskyy, Viktor; Protasi, Feliciano; Dirksen, Robert T.
2013-01-01
Store-operated calcium entry (SOCE) channels play an important role in Ca2+ signaling. Recently, excessive SOCE was proposed to play a central role in the pathogenesis of malignant hyperthermia (MH), a pharmacogenic disorder of skeletal muscle. We tested this hypothesis by characterizing SOCE current (ISkCRAC) magnitude, voltage dependence, and rate of activation in myotubes derived from two mouse models of anesthetic- and heat-induced sudden death: 1) type 1 ryanodine receptor (RyR1) knock-in mice (Y524S/+) and 2) calsequestrin 1 and 2 double knock-out (dCasq-null) mice. ISkCRAC voltage dependence and magnitude at -80 mV were not significantly different in myotubes derived from wild type (WT), Y524S/+ and dCasq-null mice. However, the rate of ISkCRAC activation upon repetitive depolarization was significantly faster at room temperature in myotubes from Y524S/+ and dCasq-null mice. In addition, the maximum rate of ISkCRAC activation in dCasq-null myotubes was also faster than WT at more physiological temperatures (35-37°C). Azumolene (50 µM), a more water-soluble analog of dantrolene that is used to reverse MH crises, failed to alter ISkCRAC density or rate of activation. Together, these results indicate that an increased rate of ISkCRAC activation is a common characteristic of myotubes derived from Y524S/+ and dCasq-null mice and that the protective effects of azumolene are not due to a direct inhibition of SOCE channels. PMID:24143248
A test to evaluate the earthquake prediction algorithm, M8
Healy, John H.; Kossobokov, Vladimir G.; Dewey, James W.
1992-01-01
A test of the algorithm M8 is described. The test is constructed to meet four rules, which we propose to be applicable to the test of any method for earthquake prediction: 1. An earthquake prediction technique should be presented as a well documented, logical algorithm that can be used by investigators without restrictions. 2. The algorithm should be coded in a common programming language and implementable on widely available computer systems. 3. A test of the earthquake prediction technique should involve future predictions with a black box version of the algorithm in which potentially adjustable parameters are fixed in advance. The source of the input data must be defined and ambiguities in these data must be resolved automatically by the algorithm. 4. At least one reasonable null hypothesis should be stated in advance of testing the earthquake prediction method, and it should be stated how this null hypothesis will be used to estimate the statistical significance of the earthquake predictions. The M8 algorithm has successfully predicted several destructive earthquakes, in the sense that the earthquakes occurred inside regions with linear dimensions from 384 to 854 km that the algorithm had identified as being in times of increased probability for strong earthquakes. In addition, M8 has successfully "post predicted" high percentages of strong earthquakes in regions to which it has been applied in retroactive studies. The statistical significance of previous predictions has not been established, however, and post-prediction studies in general are notoriously subject to success-enhancement through hindsight. Nor has it been determined how much more precise an M8 prediction might be than forecasts and probability-of-occurrence estimates made by other techniques. We view our test of M8 both as a means to better determine the effectiveness of M8 and as an experimental structure within which to make observations that might lead to improvements in the algorithm or conceivably lead to a radically different approach to earthquake prediction.
Bowden, Vanessa K; Loft, Shayne
2016-06-01
In 2 experiments we examined the impact of memory for prior events on conflict detection in simulated air traffic control under conditions where individuals proactively controlled aircraft and completed concurrent tasks. Individuals were faster to detect conflicts that had repeatedly been presented during training (positive transfer). Bayesian statistics indicated strong evidence for the null hypothesis that conflict detection was not impaired for events that resembled an aircraft pair that had repeatedly come close to conflicting during training. This is likely because aircraft altitude (the feature manipulated between training and test) was attended to by participants when proactively controlling aircraft. In contrast, a minor change to the relative position of a repeated nonconflicting aircraft pair moderately impaired conflict detection (negative transfer). There was strong evidence for the null hypothesis that positive transfer was not impacted by dividing participant attention, which suggests that part of the information retrieved regarding prior aircraft events was perceptual (the new aircraft pair "looked" like a conflict based on familiarity). These findings extend the effects previously reported by Loft, Humphreys, and Neal (2004), answering the recent strong and unanimous calls across the psychological science discipline to formally establish the robustness and generality of previously published effects. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Ethanol Wet-bonding Challenges Current Anti-degradation Strategy
Sadek, F.T.; Braga, R.R.; Muench, A.; Liu, Y.; Pashley, D.H.; Tay, F.R.
2010-01-01
The long-term effectiveness of chlorhexidine as a matrix metalloproteinase (MMP) inhibitor may be compromised when water is incompletely removed during dentin bonding. This study challenged this anti-degradation strategy by testing the null hypothesis that wet-bonding with water or ethanol has no effect on the effectiveness of chlorhexidine in preventing hybrid layer degradation over an 18-month period. Acid-etched dentin was bonded under pulpal pressure simulation with Scotchbond MP and Single Bond 2, with water wet-bonding, or with a hydrophobic adhesive with ethanol wet-bonding, with or without pre-treatment with chlorhexidine diacetate (CHD). Resin-dentin beams were prepared for bond strength and TEM evaluation after 24 hrs and after aging in artificial saliva for 9 and 18 mos. Bonds made to ethanol-saturated dentin did not change over time, with preservation of hybrid layer integrity. Bonds made to CHD pre-treated acid-etched dentin with commercial adhesives with water wet-bonding were preserved after 9 mos but not after 18 mos, with severe hybrid layer degradation. The results led to rejection of the null hypothesis and highlight the concept of biomimetic water replacement from the collagen intrafibrillar compartments as the ultimate goal in extending the longevity of resin-dentin bonds. PMID:20940353
NASA Astrophysics Data System (ADS)
Harken, B.; Geiges, A.; Rubin, Y.
2013-12-01
There are several stages in any hydrological modeling campaign, including: formulation and analysis of a priori information, data acquisition through field campaigns, inverse modeling, and forward modeling and prediction of some environmental performance metric (EPM). The EPM being predicted could be, for example, contaminant concentration, plume travel time, or aquifer recharge rate. These predictions often have significant bearing on some decision that must be made. Examples include: how to allocate limited remediation resources between multiple contaminated groundwater sites, where to place a waste repository site, and what extraction rates can be considered sustainable in an aquifer. Providing an answer to these questions depends on predictions of EPMs using forward models as well as levels of uncertainty related to these predictions. Uncertainty in model parameters, such as hydraulic conductivity, leads to uncertainty in EPM predictions. Often, field campaigns and inverse modeling efforts are planned and undertaken with reduction of parametric uncertainty as the objective. The tool of hypothesis testing allows this to be taken one step further by considering uncertainty reduction in the ultimate prediction of the EPM as the objective and gives a rational basis for weighing costs and benefits at each stage. When using the tool of statistical hypothesis testing, the EPM is cast into a binary outcome. This is formulated as null and alternative hypotheses, which can be accepted or rejected with statistical formality. When accounting for all sources of uncertainty at each stage, the level of significance of this test provides a rational basis for planning, optimization, and evaluation of the entire campaign. Case-specific information, such as the consequences of prediction error and site-specific costs, can be used in establishing selection criteria based on what level of risk is deemed acceptable. This framework is demonstrated and discussed using various synthetic case studies. The case studies involve contaminated aquifers where a decision must be made based on prediction of when a contaminant will arrive at a given location. The EPM, in this case contaminant travel time, is cast into the hypothesis testing framework. The null hypothesis states that the contaminant plume will arrive at the specified location before a critical value of time passes, and the alternative hypothesis states that the plume will arrive after the critical time passes. Different field campaigns are analyzed based on effectiveness in reducing the probability of selecting the wrong hypothesis, which in this case corresponds to reducing uncertainty in the prediction of plume arrival time. To examine the role of inverse modeling in this framework, case studies involving both Maximum Likelihood parameter estimation and Bayesian inversion are used.
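A stripped-down version of casting the EPM into this binary framework can be sketched with Monte Carlo sampling. In the toy example below every hydraulic number is hypothetical; it only illustrates how reducing parametric uncertainty (for instance, after a field campaign) sharpens the probability assigned to the null hypothesis that the plume arrives before the critical time.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy 1-D advective setting (all numbers hypothetical): travel time to a
# compliance point at distance L, with uncertain hydraulic conductivity K.
L, gradient, porosity = 100.0, 0.01, 0.3          # m, -, -
t_crit = 25.0 * 365.25                            # critical time, days

def travel_time_samples(logK_mean, logK_sd, n=20000):
    K = 10 ** rng.normal(logK_mean, logK_sd, n)   # m/day, lognormal uncertainty
    velocity = K * gradient / porosity            # seepage velocity, m/day
    return L / velocity                           # days

# H0: the plume arrives before t_crit; its probability is estimated by Monte Carlo.
prior_tt = travel_time_samples(logK_mean=0.0, logK_sd=0.5)
posterior_tt = travel_time_samples(logK_mean=0.0, logK_sd=0.2)   # pretend a campaign narrowed log K

for label, tt in (("prior", prior_tt), ("after campaign", posterior_tt)):
    print(f"{label}: P(arrival before t_crit) = {np.mean(tt < t_crit):.2f}")
```

With the tighter parameter distribution the probability moves toward 0 or 1, which is what makes accepting or rejecting the null hypothesis less error-prone.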
NASA Astrophysics Data System (ADS)
Fajardo, Mario; Neel, Christopher; Lacina, David
2017-06-01
We report (null) results of experiments testing the hypothesis that mid-infrared (mid-IR) spectroscopy can be used to distinguish samples of poly[methyl methacrylate] (PMMA) obtained from different commercial suppliers. This work was motivated by the desire for a simple, non-destructive, and non-invasive test for pre-sorting PMMA samples prior to use in shock and high-strain-rate experiments, where PMMA is commonly used as a standard material. We discuss our choice of mid-IR external reflectance spectroscopy, our approach to recording reflectance spectra at near-normal (θ = 0° ± 5°) incidence, and our approach to extracting the wavelength-weighted absorption spectrum from the raw reflectance data via a Kramers-Kronig analysis. We employ extensive signal averaging, which necessitates adopting a special experimental protocol to mitigate the effects of instrumental drift. Finally, we report spectra of three PMMA samples with different commercial pedigrees and show that they are virtually identical (±1% error, 95% confidence), obviating the use of mid-IR reflectance spectroscopy to tell the samples apart.
Mebane, Christopher A.
2015-01-01
Criticisms of the uses of the no-observed-effect concentration (NOEC) and the lowest-observed-effect concentration (LOEC) and more generally the entire null hypothesis statistical testing scheme are hardly new or unique to the field of ecotoxicology [1-4]. Among the criticisms of NOECs and LOECs is that statistically similar LOECs (in terms of p value) can represent drastically different levels of effect. For instance, my colleagues and I found that a battery of chronic toxicity tests with different species and endpoints yielded LOECs with minimum detectable differences ranging from 3% to 48% reductions from controls [5].
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boadas-Vaello, Pere; Jover, Eric; Diez-Padrisa, Nuria
2007-12-15
Several alkylnitriles are toxic to sensory systems, including the vestibular system, through yet undefined mechanisms. This study addressed the hypothesis that the vestibular toxicity of cis-crotononitrile depends on CYP2E1-mediated bioactivation. Wild-type (129S1) and CYP2E1-null female mice were exposed to cis-crotononitrile at 0, 2, 2.25 or 2.5 mmol/kg (p.o.) in either a baseline condition or following exposure to 1% acetone in drinking water to induce CYP2E1 expression. The exposed animals were assessed for vestibular toxicity using a behavioral test battery and through surface observation of the vestibular sensory epithelia by scanning electron microscopy. In parallel groups, concentrations of cis-crotononitrile and cyanide were assessed in whole blood. Contrary to our hypothesis, CYP2E1-null mice were slightly more susceptible to the vestibular toxicity of cis-crotononitrile than were control 129S1 mice. Similarly, rather than enhance vestibular toxicity, acetone pretreatment actually reduced it slightly in 129S1 controls, although not in CYP2E1-null mice. In addition, significant differences in mortality were recorded, with the greatest mortality occurring in 129S1 mice after acetone pretreatment. The highest mortality recorded in the 129S1 + acetone mice was associated with the lowest blood concentrations of cis-crotononitrile and the highest concentrations of cyanide at 6 h after nitrile exposure, the time when deaths were initially recorded. We conclude that cis-crotononitrile is a CYP2E1 substrate as hypothesized, but that CYP2E1-mediated metabolism of this nitrile is not necessary for vestibular toxicity; rather, this metabolism constitutes a major pathway for cyanide release and subsequent lethality.
Rigor and academic achievement: Career academies versus traditional class structure
NASA Astrophysics Data System (ADS)
Kyees, Linda L.
The purpose of this study was to determine if students who attended high school Career Academy classes, as part of Career and Technical Education, showed greater academic achievement than students who attended traditional high school classes. While all participants attended schools in the same school district and were seeking the same goal of graduation with a standard diploma, the Career Academy students had the benefit of all classes being directed by a team of teachers who helped them connect their learning to their desired career through collaborative learning projects and assignments. The traditional high school classes taught each subject independent of other subjects and did not have specific connections to the desired career goals of the students. The study used a causal-comparative research design, and the participants included 1,142 students from 11th and 12th grades who attended 9 high schools in a diversely populated area of central Florida, with 571 enrolled in the Career Academies and 571 enrolled in traditional classes. The 10th-grade FCAT scores served as the dependent variable. All students attended similar classes with similar content, making the primary variable the difference in academic gains between students participating in the Career Academy design and the traditional design classes. A Mann-Whitney U test showed that the Career Academy group achieved higher scores overall, resulting in rejection of the first null hypothesis. Further examination determined that the 10th-grade FCAT scores were also greater for the average-student group, which comprised the largest portion of the participants, resulting in rejection of the second null hypothesis. Scores for the gifted and at-risk student groups resulted in failure to reject the third and fourth null hypotheses.
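For readers unfamiliar with the test used here, the sketch below runs a Mann-Whitney U comparison with scipy on simulated data of the same group sizes; the score distributions are invented purely for illustration and are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Hypothetical 10th-grade scale scores for the two class designs (n = 571 each)
academy = rng.normal(310, 25, 571).round()
traditional = rng.normal(305, 25, 571).round()

U, p = stats.mannwhitneyu(academy, traditional, alternative="two-sided")
print(f"U = {U:.0f}, p = {p:.4f}")
# the null hypothesis of equal score distributions is rejected when p <= 0.05
```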
On the Interpretation and Use of Mediation: Multiple Perspectives on Mediation Analysis.
Agler, Robert; De Boeck, Paul
2017-01-01
Mediation analysis has become a very popular approach in psychology, and it is one that is associated with multiple perspectives that are often at odds, often implicitly. Explicitly discussing these perspectives and their motivations, advantages, and disadvantages can help to provide clarity to conversations and research regarding the use and refinement of mediation models. We discuss five such pairs of perspectives on mediation analysis, their associated advantages and disadvantages, and their implications: with vs. without a mediation hypothesis, specific effects vs. a global model, directness vs. indirectness of causation, effect size vs. null hypothesis testing, and hypothesized vs. alternative explanations. Discussion of the perspectives is facilitated by a small simulation study. Some philosophical and linguistic considerations are briefly discussed, as well as some other perspectives we do not develop here.
The Influence of Color and Illumination on the Interpretation of Emotions.
ERIC Educational Resources Information Center
Kohn, Imre Ransome
Research is presented that is derived from the hypothesis that a person's interpretation of emotional stimulus is affected by the painted hue and the light intensity of the visual environment. The reported experiment proved in part a null hypothesis; it was suggested that, within the considered variables of the experiment, either a person's…
Underpowered samples, false negatives, and unconscious learning.
Vadillo, Miguel A; Konstantinidis, Emmanouil; Shanks, David R
2016-02-01
The scientific community has witnessed growing concern about the high rate of false positives and unreliable results within the psychological literature, but the harmful impact of false negatives has been largely ignored. False negatives are particularly concerning in research areas where demonstrating the absence of an effect is crucial, such as studies of unconscious or implicit processing. Research on implicit processes seeks evidence of above-chance performance on some implicit behavioral measure at the same time as chance-level performance (that is, a null result) on an explicit measure of awareness. A systematic review of 73 studies of contextual cuing, a popular implicit learning paradigm, involving 181 statistical analyses of awareness tests, reveals how underpowered studies can lead to failure to reject a false null hypothesis. Among the studies that reported sufficient information, the meta-analytic effect size across awareness tests was dz = 0.31 (95% CI 0.24-0.37), showing that participants' learning in these experiments was conscious. The unusually large number of positive results in this literature cannot be explained by selective publication. Instead, our analyses demonstrate that these tests are typically insensitive and underpowered to detect medium to small, but true, effects in awareness tests. These findings challenge a widespread and theoretically important claim about the extent of unconscious human cognition.
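The sensitivity problem described above can be quantified directly. The sketch below computes the power of a one-sample/paired t test for the reported meta-analytic effect of dz = 0.31 at several sample sizes, using the noncentral t distribution; the sample sizes are arbitrary illustrations rather than values from the review.

```python
import numpy as np
from scipy import stats

def paired_t_power(dz, n, alpha=0.05):
    """Two-sided power of a one-sample/paired t test for standardized effect dz."""
    df = n - 1
    nc = dz * np.sqrt(n)                          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

dz = 0.31   # meta-analytic awareness-test effect reported in the abstract
for n in (16, 24, 32, 64, 105):
    print(f"n = {n:>3}: power to detect dz = {dz} is {paired_t_power(dz, n):.2f}")
```

With typical awareness-test samples of a few dozen participants, power for an effect of this size stays well below the conventional 0.80 target, which is exactly how true effects end up as null results.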
On resilience studies of system detection and recovery techniques against stealthy insider attacks
NASA Astrophysics Data System (ADS)
Wei, Sixiao; Zhang, Hanlin; Chen, Genshe; Shen, Dan; Yu, Wei; Pham, Khanh D.; Blasch, Erik P.; Cruz, Jose B.
2016-05-01
With the explosive growth of network technologies, insider attacks have become a major concern to business operations that largely rely on computer networks. To better detect insider attacks that marginally manipulate network traffic over time, and to recover the system from attacks, in this paper we implement a temporal-based detection scheme using the sequential hypothesis testing technique. Two hypothetical states are considered: the null hypothesis that the collected information is from benign historical traffic and the alternative hypothesis that the network is under attack. The objective of such a detection scheme is to recognize the change within the shortest time by comparing the two defined hypotheses. In addition, once the attack is detected, a server migration-based system recovery scheme can be triggered to recover the system to the state prior to the attack. To understand mitigation of insider attacks, a multi-functional web display of the detection analysis was developed for real-time analytics. Experiments using real-world traffic traces evaluate the effectiveness of the Detection System and Recovery (DeSyAR) scheme. The evaluation data validate that the detection scheme based on sequential hypothesis testing and the server migration-based system recovery scheme perform well in effectively detecting insider attacks and recovering the system under attack.
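A minimal sketch of the sequential hypothesis testing idea is shown below, using Wald's classical sequential probability ratio test on a single Gaussian traffic feature. The feature, its parameters, and the error targets are all hypothetical; the paper's DeSyAR scheme is more elaborate than this.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical traffic feature (e.g., per-window request rate), modeled as normal.
mu0, mu1, sigma = 100.0, 110.0, 15.0      # H0: benign baseline, H1: under attack
alpha, beta = 0.01, 0.01                  # target false-alarm and miss rates
A, B = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))   # Wald's thresholds

def sprt(stream):
    """Sequential probability ratio test; returns (decision, samples used)."""
    llr = 0.0
    for i, x in enumerate(stream, start=1):
        llr += stats.norm.logpdf(x, mu1, sigma) - stats.norm.logpdf(x, mu0, sigma)
        if llr >= A:
            return "attack (reject H0)", i
        if llr <= B:
            return "benign (accept H0)", i
    return "undecided", len(stream)

print(sprt(rng.normal(mu0, sigma, 500)))   # benign traffic
print(sprt(rng.normal(mu1, sigma, 500)))   # marginally shifted (attack) traffic
```

The appeal of the sequential formulation is visible in the second output: a marginal shift is detected after a data-dependent, usually small, number of observations instead of a fixed test window.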
A closure test for time-specific capture-recapture data
Stanley, T.R.; Burnham, K.P.
1999-01-01
The assumption of demographic closure in the analysis of capture-recapture data under closed-population models is of fundamental importance. Yet, little progress has been made in the development of omnibus tests of the closure assumption. We present a closure test for time-specific data that, in principle, tests the null hypothesis of closed-population model M(t) against the open-population Jolly-Seber model as a specific alternative. This test is chi-square, and can be decomposed into informative components that can be interpreted to determine the nature of closure violations. The test is most sensitive to permanent emigration and least sensitive to temporary emigration, and is of intermediate sensitivity to permanent or temporary immigration. This test is a versatile tool for testing the assumption of demographic closure in the analysis of capture-recapture data.
Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John
2014-01-01
The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
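A minimal sketch of the kind of SVM-based note classifier described above is given below, using TF-IDF features and cross-validation against the 0.5 random baseline. The note snippets and hospital labels are entirely fabricated stand-ins for de-identified clinical text, and the pipeline is far simpler than the study's NLP system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical, fabricated progress-note snippets labeled by site of origin.
notes = [
    "focal seizures well controlled on levetiracetam, follow up in 6 months",
    "generalized tonic clonic events, EEG shows spike and wave discharges",
    "breakthrough absence seizures despite ethosuximide titration",
    "no seizures since last visit, continue oxcarbazepine at current dose",
    "new onset focal epilepsy, starting lamotrigine, counseled family",
    "refractory epilepsy, candidate for ketogenic diet evaluation",
] * 10                                        # repeated only so the toy cross-validation runs
labels = ["hospital_A", "hospital_B"] * 30    # alternating site labels for the toy example

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
scores = cross_val_score(clf, notes, labels, cv=5)
print("cross-validated accuracy vs. 0.5 random baseline:", scores.mean().round(3))
```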
Risk-Based, Hypothesis-Driven Framework for Hydrological Field Campaigns with Case Studies
NASA Astrophysics Data System (ADS)
Harken, B.; Rubin, Y.
2014-12-01
There are several stages in any hydrological modeling campaign, including: formulation and analysis of a priori information, data acquisition through field campaigns, inverse modeling, and prediction of some environmental performance metric (EPM). The EPM being predicted could be, for example, contaminant concentration or plume travel time. These predictions often have significant bearing on a decision that must be made. Examples include: how to allocate limited remediation resources between contaminated groundwater sites or where to place a waste repository site. Answering such questions depends on predictions of EPMs using forward models as well as levels of uncertainty related to these predictions. Uncertainty in EPM predictions stems from uncertainty in model parameters, which can be reduced by measurements taken in field campaigns. The costly nature of field measurements motivates a rational basis for determining a measurement strategy that is optimal with respect to the uncertainty in the EPM prediction. The tool of hypothesis testing allows this uncertainty to be quantified by computing the significance of the test resulting from a proposed field campaign. The significance of the test gives a rational basis for determining the optimality of a proposed field campaign. This hypothesis testing framework is demonstrated and discussed using various synthetic case studies. This study involves contaminated aquifers where a decision must be made based on prediction of when a contaminant will arrive at a specified location. The EPM, in this case contaminant travel time, is cast into the hypothesis testing framework. The null hypothesis states that the contaminant plume will arrive at the specified location before a critical amount of time passes, and the alternative hypothesis states that the plume will arrive after the critical time passes. The optimality of different field campaigns is assessed by computing the significance of the test resulting from each one. Evaluating the level of significance caused by a field campaign involves steps including likelihood-based inverse modeling and semi-analytical conditional particle tracking.
Del Fabbro, Egidio; Dev, Rony; Hui, David; Palmer, Lynn; Bruera, Eduardo
2013-04-01
Prior studies have suggested that melatonin, a frequently used integrative medicine, can attenuate weight loss, anorexia, and fatigue in patients with cancer. These studies were limited by a lack of blinding and absence of placebo controls. The primary purpose of this study was to compare melatonin with placebo for appetite improvement in patients with cancer cachexia. We performed a randomized, double-blind, 28-day trial of melatonin 20 mg versus placebo in patients with advanced lung or GI cancer, appetite scores ≥ 4 on a 0 to 10 scale (10 = worst appetite), and history of weight loss ≥ 5%. Assessments included weight, symptoms by the Edmonton Symptom Assessment Scale, and quality of life by the Functional Assessment of Anorexia/Cachexia Therapy (FAACT) questionnaire. Differences between groups from baseline to day 28 were analyzed using one-sided, two-sample t tests or Wilcoxon two-sample tests. Interim analysis halfway through the trial had a Lan-DeMets monitoring boundary with an O'Brien-Fleming stopping rule. Decision boundaries were to accept the null hypothesis of futility if the test statistic z < 0.39 (P ≥ .348) and reject the null hypothesis if z > 2.54 (P ≤ .0056). After interim analysis of 48 patients, the study was closed for futility. There were no significant differences between groups for appetite (P = .78) or other symptoms, weight (P = .17), FAACT score (P = .95), toxicity, or survival from baseline to day 28. In cachectic patients with advanced cancer, oral melatonin 20 mg at night did not improve appetite, weight, or quality of life compared with placebo.
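The quoted decision boundaries can be reproduced from the standard normal distribution; the sketch below simply converts the stated z boundaries to their one-sided p values.

```python
from scipy import stats

# One-sided correspondence between the interim-analysis z boundaries quoted in
# the abstract and their p values.
for z in (0.39, 2.54):
    print(f"z = {z:.2f}  ->  one-sided p = {stats.norm.sf(z):.4f}")
# prints ~0.3483 and ~0.0055, matching the futility (P >= .348) and efficacy
# (P <= .0056) boundaries
```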
Knösel, Michael; Mattysek, Simone; Jung, Klaus; Kubein-Meesenburg, Dietmar; Sadat-Khonsari, Reza; Ziebolz, Dirk
2010-07-01
To test the null hypothesis that there are no significant differences in the reusability of debonded brackets with regard to debonding technique and adhesive used. Ninety-six osteotomed third molars were randomly assigned to two study groups (n = 48) for bonding of a 0.018-inch bracket (Ormesh, Ormco) with either a composite adhesive (Mono-Lok2; RMO) or a glass ionomer cement (GIC; Fuji Ortho LC; GC). Each of these two groups was then randomly divided into four subgroups (n = 12) according to the method of debonding: (1) bracket removal pliers (BRP; Dentaurum), (2) a side cutter (SC; Dentaurum), (3) a lift-off debracketing instrument (LODI; 3M-Unitek), or (4) an air pressure pulse device (CoronaFlex; KaVo). The brackets were subsequently assessed visually for reusability and reworkability with 2x magnification and by pull testing with a 0.017- x 0.025-inch steel archwire. The proportions of reusable brackets were compared between modes of removal and between adhesives using the Fisher exact test (alpha = 5%). The null hypothesis was rejected. Irrespective of the debonding method, a significantly larger proportion of brackets bonded with GIC (81%; n = 39) than with composite (56%; n = 27) was judged reworkable (P < .01). All brackets in both adhesive groups removed with either the LODI or the CoronaFlex were found to be reusable, whereas 79% of the brackets removed with the BRP and 46% of those removed with the SC were not. The proportion of reusable brackets differed significantly between modes of removal (P < .01). With regard to bracket reusability, the SC and the BRP cannot be recommended for debonding brackets, especially in combination with a composite adhesive.
deHart, Gregory W; Healy, Kevin E; Jones, Jonathan C R
2003-02-01
Analyses of mice with targeted deletions in the genes for alpha3 and beta1 integrin suggest that the alpha3beta1 integrin heterodimer likely determines the organization of the extracellular matrix within the basement membrane of skin. Here we tested this hypothesis using keratinocytes derived from alpha3 integrin-null mice. We have compared the organizational state of laminin-5, a ligand of alpha3beta1 integrin, in the matrix of wild-type keratinocytes with that of laminin-5 in the matrix of alpha3 integrin-null cells. Laminin-5 distributes diffusely in arc structures in the matrix of wild-type mouse keratinocytes, whereas laminin-5 is organized into linear, spike-like arrays by the alpha3 integrin-null cells. The fact that alpha3 integrin-null cells are deficient in their ability to assemble a proper laminin-5 matrix is also shown by their failure to remodel laminin-5 when plated onto surfaces coated with purified laminin-5 protein. In sharp contrast, wild-type keratinocytes organize exogenously added laminin-5 into discrete ring-like organizations. These findings led us next to assess whether differences in laminin-5 organization in the matrix of the wild-type and alpha3 integrin-null cells impact cell behavior. Our results indicate that alpha3 integrin-null cells are more motile than their wild-type counterparts and leave extensive trails of laminin-5 over the surface on which they move. Moreover, HEK 293 cells migrate significantly more on the laminin-5-rich matrix derived from the alpha3 integrin-null cells than on the wild-type keratinocyte laminin-5 matrix. In addition, alpha3 integrin-null cells show low strength of adhesion to surfaces coated with purified laminin-5 compared to wild-type cells although both the wild type and the alpha3 integrin-null keratinocytes adhere equally strongly to laminin-5 that has been organized into arrays by other epithelial cells. These data suggest: (1) that alpha3beta1 integrin plays an important role in determining the incorporation of laminin-5 into its proper higher-order structure within the extracellular matrix of keratinocytes and (2) that the organizational state of laminin-5 has an influence on laminin-5 matrix function. Copyright 2003 Elsevier Science (USA)
Coffee, R. Lane; Williamson, Ashley J.; Adkins, Christopher M.; Gray, Marisa C.; Page, Terry L.; Broadie, Kendal
2012-01-01
Fragile X syndrome (FXS), caused by loss of the Fragile X Mental Retardation 1 (FMR1) gene product (FMRP), is the most common heritable cause of intellectual disability and autism spectrum disorders. It has been long hypothesized that the phosphorylation of serine 500 (S500) in human FMRP controls its function as an RNA-binding translational repressor. To test this hypothesis in vivo, we employed neuronally targeted expression of three human FMR1 transgenes, including wild-type (hFMR1), dephosphomimetic (S500A-hFMR1) and phosphomimetic (S500D-hFMR1), in the Drosophila FXS disease model to investigate phosphorylation requirements. At the molecular level, dfmr1 null mutants exhibit elevated brain protein levels due to loss of translational repressor activity. This defect is rescued for an individual target protein and across the population of brain proteins by the phosphomimetic, whereas the dephosphomimetic phenocopies the null condition. At the cellular level, dfmr1 null synapse architecture exhibits increased area, branching and bouton number. The phosphomimetic fully rescues these synaptogenesis defects, whereas the dephosphomimetic provides no rescue. The presence of Futsch-positive (microtubule-associated protein 1B) supernumerary microtubule loops is elevated in dfmr1 null synapses. The human phosphomimetic restores normal Futsch loops, whereas the dephosphomimetic provides no activity. At the behavioral level, dfmr1 null mutants exhibit strongly impaired olfactory associative learning. The human phosphomimetic targeted only to the brain-learning center restores normal learning ability, whereas the dephosphomimetic provides absolutely no rescue. We conclude that human FMRP S500 phosphorylation is necessary for its in vivo function as a neuronal translational repressor and regulator of synaptic architecture, and for the manifestation of FMRP-dependent learning behavior. PMID:22080836
NASA Astrophysics Data System (ADS)
Straka, Mika J.; Caldarelli, Guido; Squartini, Tiziano; Saracco, Fabio
2018-04-01
Bipartite networks provide an insightful representation of many systems, ranging from mutualistic networks of species interactions to investment networks in finance. The analyses of their topological structures have revealed the ubiquitous presence of properties which seem to characterize many—apparently different—systems. Nestedness, for example, has been observed in biological plant-pollinator as well as in country-product exportation networks. Due to the interdisciplinary character of complex networks, tools developed in one field, for example ecology, can greatly enrich other areas of research, such as economy and finance, and vice versa. With this in mind, we briefly review several entropy-based bipartite null models that have been recently proposed and discuss their application to real-world systems. The focus on these models is motivated by the fact that they show three very desirable features: analytical character, general applicability, and versatility. In this respect, entropy-based methods have been proven to perform satisfactorily both in providing benchmarks for testing evidence-based null hypotheses and in reconstructing unknown network configurations from partial information. Furthermore, entropy-based models have been successfully employed to analyze ecological as well as economic systems. As an example, the application of entropy-based null models has detected early-warning signals, both in economic and financial systems, of the 2007-2008 world crisis. Moreover, they have revealed a statistically-significant export specialization phenomenon of country export baskets in international trade, a result that seems to reconcile Ricardo's hypothesis in classical economics with recent findings on the (empirical) diversification of industrial production at the national level. Finally, these null models have shown that the information contained in the nestedness is already accounted for by the degree sequence of the corresponding graphs.
Baba, Shahid P; Zhang, Deqing; Singh, Mahavir; Dassanayaka, Sujith; Xie, Zhengzhi; Jagatheesan, Ganapathy; Zhao, Jingjing; Schmidtke, Virginia K; Brittian, Kenneth R; Merchant, Michael L; Conklin, Daniel J; Jones, Steven P; Bhatnagar, Aruni
2018-05-01
Pathological cardiac hypertrophy is associated with the accumulation of lipid peroxidation-derived aldehydes such as 4-hydroxy-trans-2-nonenal (HNE) and acrolein in the heart. These aldehydes are metabolized via several pathways, of which aldose reductase (AR) represents a broad-specificity route for their elimination. We tested the hypothesis that by preventing aldehyde removal, AR deficiency accentuates the pathological effects of transverse aortic constriction (TAC). We found that the levels of AR in the heart were increased in mice subjected to TAC for 2 weeks. In comparison with wild-type (WT), AR-null mice showed lower ejection fraction, which was exacerbated 2 weeks after TAC. Levels of atrial natriuretic peptide and myosin heavy chain were higher in AR-null than in WT TAC hearts. Deficiency of AR decreased urinary levels of the acrolein metabolite, 3-hydroxypropylmercapturic acid. Deletion of AR did not affect the levels of the other aldehyde-metabolizing enzyme, aldehyde dehydrogenase 2, in the heart, or of its urinary product, N-acetyl-S-(2-carboxyethyl)-l-cysteine. AR-null hearts subjected to TAC showed increased accumulation of HNE- and acrolein-modified proteins, as well as increased AMPK phosphorylation and autophagy. Superfusion with HNE led to a greater increase in p62, LC3II formation, and GFP-LC3-II punctae formation in AR-null than WT cardiac myocytes. Pharmacological inactivation of JNK decreased HNE-induced autophagy in AR-null cardiac myocytes. Collectively, these results suggest that during hypertrophy the accumulation of lipid peroxidation-derived aldehydes promotes pathological remodeling via excessive autophagy, and that metabolic detoxification of these aldehydes by AR may be essential for maintaining cardiac function during early stages of pressure overload. Published by Elsevier Ltd.
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.
Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N
2016-11-01
The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
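The snippet below sketches the core idea in a maximum-entropy style: synonymous codons are sampled with an exponential tilt on their GC count, and the tilt parameter is tuned so the expected GC content matches a target while the amino acid sequence is preserved exactly. It is not the NullSeq package itself; the truncated codon table, the demo protein sequence, and the crude grid search for the tilt parameter are all illustrative assumptions.

```python
import numpy as np

# Truncated codon table covering only the amino acids used in the demo (assumption: not the full code).
CODONS = {
    "M": ["ATG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}

def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)

def sample_cds(protein, lam, rng):
    """Sample one coding sequence; codon weights exp(lam * GC count) give a max-entropy-style tilt."""
    seq = []
    for aa in protein:
        codons = CODONS[aa]
        w = np.exp([lam * sum(b in "GC" for b in c) for c in codons])
        seq.append(rng.choice(codons, p=w / w.sum()))
    return "".join(seq)

def fit_lambda(protein, target_gc, rng, n_draws=200):
    """Crude grid search for the tilt parameter that matches the target GC fraction on average."""
    grid = np.linspace(-4, 4, 81)
    mean_gc = [np.mean([gc_fraction(sample_cds(protein, lam, rng)) for _ in range(n_draws)])
               for lam in grid]
    return grid[np.argmin(np.abs(np.array(mean_gc) - target_gc))]

rng = np.random.default_rng(0)
protein = "MGLKSSLKGMLG"          # amino acid composition to preserve (hypothetical)
lam = fit_lambda(protein, target_gc=0.55, rng=rng)
null_seq = sample_cds(protein, lam, rng)
print(null_seq, f"GC = {gc_fraction(null_seq):.2f}")
```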
Royston, Patrick; Parmar, Mahesh K B
2014-08-07
Most randomized controlled trials with a time-to-event outcome are designed and analysed under the proportional hazards assumption, with a target hazard ratio for the treatment effect in mind. However, the hazards may be non-proportional. We address how to design a trial under such conditions, and how to analyse the results. We propose to extend the usual approach, a logrank test, to also include the Grambsch-Therneau test of proportional hazards. We test the resulting composite null hypothesis using a joint test for the hazard ratio and for time-dependent behaviour of the hazard ratio. We compute the power and sample size for the logrank test under proportional hazards, and from that we compute the power of the joint test. For the estimation of relevant quantities from the trial data, various models could be used; we advocate adopting a pre-specified flexible parametric survival model that supports time-dependent behaviour of the hazard ratio. We present the mathematics for calculating the power and sample size for the joint test. We illustrate the methodology in real data from two randomized trials, one in ovarian cancer and the other in treating cellulitis. We show selected estimates and their uncertainty derived from the advocated flexible parametric model. We demonstrate in a small simulation study that when a treatment effect either increases or decreases over time, the joint test can outperform the logrank test in the presence of both patterns of non-proportional hazards. Those designing and analysing trials in the era of non-proportional hazards need to acknowledge that a more complex type of treatment effect is becoming more common. Our method for the design of the trial retains the tools familiar in the standard methodology based on the logrank test, and extends it to incorporate a joint test of the null hypothesis with power against non-proportional hazards. For the analysis of trial data, we propose the use of a pre-specified flexible parametric model that can represent a time-dependent hazard ratio if one is present.
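A minimal sketch of the joint-test idea follows, assuming the two component chi-square statistics are already available from the trial analysis and are approximately independent; the paper derives the combination and its power properties more carefully, and the numbers used here are hypothetical.

```python
from scipy.stats import chi2

# Hypothetical 1-df component statistics from an analysed trial:
chi2_logrank = 3.1   # ordinary logrank test of the overall treatment effect
chi2_gt = 5.4        # Grambsch-Therneau test of proportional hazards (time-dependence of the HR)

# Composite null: no treatment effect AND proportional hazards.
# Treating the components as approximately independent gives a 2-df reference distribution.
joint_stat = chi2_logrank + chi2_gt
print(f"joint chi-square = {joint_stat:.1f}, p = {chi2.sf(joint_stat, df=2):.4f}")
```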
Wilcoxon's signed-rank statistic: what null hypothesis and why it matters.
Li, Heng; Johnson, Terri
2014-01-01
In statistical literature, the term 'signed-rank test' (or 'Wilcoxon signed-rank test') has been used to refer to two distinct tests: a test for symmetry of distribution and a test for the median of a symmetric distribution, sharing a common test statistic. To avoid potential ambiguity, we propose to refer to those two tests by different names, as 'test for symmetry based on signed-rank statistic' and 'test for median based on signed-rank statistic', respectively. The utility of such terminological differentiation should become evident through our discussion of how those tests connect and contrast with sign test and one-sample t-test. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
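The distinction is easy to see in code: the same call produces the same statistic and p-value, and only the stated null hypothesis changes. The sketch below uses scipy on simulated paired differences; the data are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
diff = rng.normal(0.4, 1.0, size=30)   # paired differences (hypothetical)

stat, p = wilcoxon(diff)
# Reading (a): H0 = the distribution of differences is symmetric about zero.
# Reading (b): assuming symmetry holds, H0 = the median difference is zero.
# The computation is identical; only the assumption carried by the null differs.
print(f"signed-rank statistic = {stat:.1f}, p = {p:.4f}")
```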
Nature's style: Naturally trendy
Cohn, T.A.; Lins, H.F.
2005-01-01
Hydroclimatological time series often exhibit trends. While trend magnitude can be determined with little ambiguity, the corresponding statistical significance, sometimes cited to bolster scientific and political argument, is less certain because significance depends critically on the null hypothesis which in turn reflects subjective notions about what one expects to see. We consider statistical trend tests of hydroclimatological data in the presence of long-term persistence (LTP). Monte Carlo experiments employing FARIMA models indicate that trend tests which fail to consider LTP greatly overstate the statistical significance of observed trends when LTP is present. A new test is presented that avoids this problem. From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.
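The effect the authors describe can be reproduced with a small Monte Carlo experiment. The sketch below uses an AR(1) process as a simple short-memory stand-in for long-term persistence (the paper uses FARIMA models) and a Kendall-tau trend test as a Mann-Kendall-type test; rejection rates far above the nominal 5% under persistent noise illustrate the overstated significance.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(9)
n, phi, n_sim = 100, 0.9, 1000      # AR(1) coefficient phi is an assumed stand-in for LTP

def false_trend_rate(persistent):
    rejections = 0
    for _ in range(n_sim):
        e = rng.standard_normal(n)
        x = np.copy(e)
        if persistent:
            for t in range(1, n):
                x[t] = phi * x[t - 1] + e[t]      # trend-free but persistent series
        rejections += kendalltau(np.arange(n), x).pvalue < 0.05  # Mann-Kendall-type trend test
    return rejections / n_sim

print(f"false trend detections, iid noise:        {false_trend_rate(False):.2f}")
print(f"false trend detections, persistent noise: {false_trend_rate(True):.2f}")
```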
Qu, Long; Guennel, Tobias; Marshall, Scott L
2013-12-01
Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.
Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates
NASA Astrophysics Data System (ADS)
Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul
2012-03-01
Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Ref. [7,38], time-frequency analysis is used to test the stationarity of a signal. The proposed method consists of a comparison between global and local time-frequency features. The originality lies in using a family of stationary surrogate signals to define the null hypothesis of stationarity and, based upon this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik's support vector machines (SVM) [35], they now offer a wide class of supervised and unsupervised learning algorithms. In Ref. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method to generate stationarized signals, namely, the null hypothesis of stationarity. The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order to derive a stationarity test in Section 22.4. The relevance of the latter is illustrated by simulation results in Section 22.5.
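A compact sketch of the surrogate-plus-one-class-SVM idea is given below, under simplifying assumptions: surrogates are produced by phase randomization of the Fourier spectrum (which preserves the global spectrum but destroys temporal structure), and the "time-frequency" feature is just the log-variance per time frame rather than a full time-frequency representation. The test signal, the feature choice, and the SVM settings are illustrative, not those of the chapter.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def phase_surrogate(x, rng):
    """Stationarized surrogate: keep the amplitude spectrum, randomize the phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, X.size)
    phases[0] = 0.0                      # keep the mean
    if x.size % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist component real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def frame_features(x, n_frames=16):
    """Crude local feature: log-variance per frame (a stand-in for time-frequency features)."""
    return np.log([np.var(f) + 1e-12 for f in np.array_split(x, n_frames)])

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
# Nonstationary test signal: a tone whose amplitude jumps halfway through, plus noise.
signal = np.sin(2 * np.pi * 50 * t) * (1 + 2 * (t > 0.5)) + 0.3 * rng.standard_normal(t.size)

surrogate_features = np.array([frame_features(phase_surrogate(signal, rng)) for _ in range(200)])
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(surrogate_features)
is_stationary = detector.predict([frame_features(signal)])[0] == 1
print("stationary under the surrogate null?", is_stationary)
```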
On-Orbit Prospective Echocardiography on International Space Station Crew
NASA Technical Reports Server (NTRS)
Hamilton, Douglas R.; Sargsyan, Ashot E.; Martin, David S.; Garcia, Kathleen M.; Melton, Shannon L.; Feiveson, Alan; Dulchavsky, Scott A.
2010-01-01
Introduction: A prospective trial of echocardiography was conducted on six crewmembers onboard the International Space Station. The main objectives were to determine the efficacy of remotely guided tele-echocardiography, including just-in-time e-training methods, and to characterize "space normal" echocardiographic data. Methods: Each crewmember operator (n=6) had 2-hour preflight training. Baseline echocardiographic data were collected 55 to 167 days preflight. Similar equipment was used in each 60-minute in-flight session (mean microgravity exposure 114 days; range 34-190). On-orbit ultrasound operators used an e-learning system within 24 h of these sessions. Expert assistance was provided using ultrasound video downlink and two-way voice. Testing was repeated 5 to 16 days after landing. A separate ANOVA was used on each echocardiographic variable (n=33). Within each ANOVA, three tests were made: a) effect of mission phase (preflight, in-flight, post flight); b) effect of echo technician (two technicians independently analyzed the data); c) interaction between mission phase and technician. Results: Nine rejections of the null hypothesis (that mission phase, technician, or both had no effect) were found and considered for follow-up. Of these, six rejections reflected significant technician effects, not effects of space flight. Three rejections of the null hypothesis (aortic valve time-velocity integral, mitral E-wave velocity, and heart rate) were attributable to space flight but were determined not to be clinically significant. No rejections were due to the interaction between technician and space flight. Conclusion: No consistent clinically significant effects of long-duration space flight were seen in the echocardiographic variables of the given group of subjects.
Differential effects of cough, valsalva, and continence status on vesical neck movement.
Howard, D; Miller, J M; Delancey, J O; Ashton-Miller, J A
2000-04-01
We tested the null hypothesis that vesical neck descent is the same during a cough and during a Valsalva maneuver. We also tested the secondary null hypothesis that differences in vesical neck mobility would be independent of parity and continence status. Three groups were included: 17 nulliparous continent (31.3 +/- 5.6; range 22-42 years), 18 primiparous continent (30.4 +/- 4.3; 24-43), and 23 primiparous stress-incontinent (31.9 +/- 3.9; 25-38) women. Measures of vesical neck position at rest and during displacement were obtained by ultrasound. Abdominal pressures were recorded simultaneously using an intravaginal microtransducer catheter. To control for differing abdominal pressures, the stiffness of the vesical neck support was calculated by dividing the pressure exerted during a particular effort by the urethral descent during that effort. The primiparous stress-incontinent women displayed similar vesical neck mobility during a cough effort and during a Valsalva maneuver (13.8 mm compared with 14.8 mm; P =.49). The nulliparous continent women (8.2 mm compared with 12.4 mm; P =. 001) and the primiparous continent women (9.9 mm compared with 14.5 mm; P =.002) displayed less mobility during a cough than during a Valsalva maneuver despite greater abdominal pressure during cough. The nulliparas displayed greater pelvic floor stiffness during a cough compared with the continent and incontinent primiparas (22.7, 15.5, 12.2 cm H(2)O/mm, respectively; P =.001). There are quantifiable differences in vesical neck mobility during a cough and Valsalva maneuver in continent women. This difference is lost in the primiparous stress-incontinent women.
Orexin Receptor Antagonism Improves Sleep and Reduces Seizures in Kcna1-null Mice.
Roundtree, Harrison M; Simeone, Timothy A; Johnson, Chaz; Matthews, Stephanie A; Samson, Kaeli K; Simeone, Kristina A
2016-02-01
Comorbid sleep disorders occur in approximately one-third of people with epilepsy. Seizures and sleep disorders have an interdependent relationship where the occurrence of one can exacerbate the other. Orexin, a wake-promoting neuropeptide, is associated with sleep disorder symptoms. Here, we tested the hypothesis that orexin dysregulation plays a role in the comorbid sleep disorder symptoms in the Kcna1-null mouse model of temporal lobe epilepsy. Rest-activity was assessed using infrared beam actigraphy. Sleep architecture and seizures were assessed using continuous video-electroencephalography-electromyography recordings in Kcna1-null mice treated with vehicle or the dual orexin receptor antagonist, almorexant (100 mg/kg, intraperitoneally). Orexin levels in the lateral hypothalamus/perifornical region (LH/P) and hypothalamic pathology were assessed with immunohistochemistry and oxygen polarography. Kcna1-null mice have increased latency to rapid eye movement (REM) sleep onset, sleep fragmentation, and number of wake epochs. The numbers of REM and non-REM (NREM) sleep epochs are significantly reduced in Kcna1-null mice. Severe seizures propagate to the wake-promoting LH/P where injury is apparent (indicated by astrogliosis, blood-brain barrier permeability, and impaired mitochondrial function). The number of orexin-positive neurons is increased in the LH/P compared to wild-type LH/P. Treatment with a dual orexin receptor antagonist significantly increases the number and duration of NREM sleep epochs and reduces the latency to REM sleep onset. Further, almorexant treatment reduces the incidence of severe seizures and overall seizure burden. Interestingly, we report a significant positive correlation between latency to REM onset and seizure burden in Kcna1-null mice. Dual orexin receptor antagonists may be an effective sleeping aid in epilepsy, and warrant further study of their somnogenic and anti-seizure effects in other epilepsy models. © 2016 Associated Professional Sleep Societies, LLC.
Comparative study between EDXRF and ASTM E572 methods using two-way ANOVA
NASA Astrophysics Data System (ADS)
Krummenauer, A.; Veit, H. M.; Zoppas-Ferreira, J.
2018-03-01
Comparison with a reference method is one of the necessary requirements for the validation of non-standard methods. This comparison was made using a designed experiment analyzed with two-way ANOVA. In the ANOVA, the results obtained using the EDXRF method, to be validated, were compared with the results obtained using the ASTM E572-13 standard test method. Fisher's tests (F-tests) were used for the comparative study of the elements molybdenum, niobium, copper, nickel, manganese, chromium and vanadium. For all elements, the F-tests indicate that the null hypothesis (H0) was not rejected. As a result, there is no significant difference between the compared methods. Therefore, according to this study, it is concluded that the EDXRF method satisfied this method-comparison requirement.
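For readers who want to reproduce this kind of comparison, the sketch below runs a two-way ANOVA (method and element as factors) with statsmodels. The concentrations in the toy data frame are made up and the model omits an interaction term, so it is only a template, not the study's data or exact analysis.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical paired determinations (wt%) by the two methods for four elements.
df = pd.DataFrame({
    "method":  ["EDXRF"] * 4 + ["ASTM_E572"] * 4,
    "element": ["Mo", "Nb", "Cu", "Ni"] * 2,
    "result":  [0.31, 0.045, 0.12, 0.88, 0.30, 0.047, 0.11, 0.90],
})

model = ols("result ~ C(method) + C(element)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)
print(anova)   # the H0 of "no method effect" is retained if the method F-test p-value exceeds 0.05
```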
Hopcroft, Rosemary L; Martin, David O
2014-06-01
This paper tests the Trivers-Willard hypothesis that high-status individuals will invest more in sons and low-status individuals will invest more in daughters using data from the 2000 to 2010 General Social Survey and the 1979 National Longitudinal Survey of Youth. We argue that the primary investment U.S. parents make in their children is in their children's education, and this investment is facilitated by a diverse market of educational choices at every educational level. We examine two measures of this investment: children's years of education and the highest degree attained. Results show that sons of high-status fathers receive more years of education and higher degrees than daughters, whereas daughters of low-status fathers receive more years of education and higher degrees than sons. Further analyses of possible mechanisms for these findings yield null results. We also find that males are more likely to have high-status fathers than females.
A risk-based approach to flood management decisions in a nonstationary world
NASA Astrophysics Data System (ADS)
Rosner, Ana; Vogel, Richard M.; Kirshen, Paul H.
2014-03-01
Traditional approaches to flood management in a nonstationary world begin with a null hypothesis test of "no trend" and its likelihood, with little or no attention given to the likelihood that we might ignore a trend if it really existed. Concluding a trend exists when it does not, or rejecting a trend when it exists are known as type I and type II errors, respectively. Decision-makers are poorly served by statistical and/or decision methods that do not carefully consider both over- and under-preparation errors, respectively. Similarly, little attention is given to how to integrate uncertainty in our ability to detect trends into a flood management decision context. We show how trend hypothesis test results can be combined with an adaptation's infrastructure costs and damages avoided to provide a rational decision approach in a nonstationary world. The criterion of expected regret is shown to be a useful metric that integrates the statistical, economic, and hydrological aspects of the flood management problem in a nonstationary world.
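The expected-regret idea reduces to a small calculation once the trend probability and the costs are on the table. The sketch below uses entirely hypothetical numbers for the probability that the trend is real, the adaptation cost, and the avoidable damages, just to show how over-preparation (type I-like) and under-preparation (type II-like) errors enter the same decision metric.

```python
# Hypothetical inputs: all numbers are illustrative, not from the paper.
p_trend = 0.35          # assessed probability that an upward flood trend is real
cost_adapt = 2.0        # $M cost of upgrading infrastructure now
extra_damage = 9.0      # $M extra expected damages if a real trend goes unaddressed

# Regret = loss relative to the best decision once the true state is known.
regret_adapt = (1 - p_trend) * cost_adapt                 # over-preparation error
regret_wait = p_trend * (extra_damage - cost_adapt)       # under-preparation error

decision = "adapt now" if regret_adapt < regret_wait else "do not adapt"
print(f"E[regret | adapt] = {regret_adapt:.2f}, E[regret | wait] = {regret_wait:.2f} -> {decision}")
```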
Mudge, Joseph F; Penny, Faith M; Houlahan, Jeff E
2012-12-01
Setting optimal significance levels that minimize Type I and Type II errors allows for more transparent and well-considered statistical decision making compared to the traditional α = 0.05 significance level. We use the optimal α approach to re-assess conclusions reached by three recently published tests of the pace-of-life syndrome hypothesis, which attempts to unify occurrences of different physiological, behavioral, and life history characteristics under one theory, over different scales of biological organization. While some of the conclusions reached using optimal α were consistent with those previously reported using the traditional α = 0.05 threshold, opposing conclusions were also frequently reached. The optimal α approach reduced probabilities of Type I and Type II errors, and ensured statistical significance was associated with biological relevance. Biologists should seriously consider their choice of α when conducting null hypothesis significance tests, as there are serious disadvantages with consistent reliance on the traditional but arbitrary α = 0.05 significance level. Copyright © 2012 WILEY Periodicals, Inc.
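The optimal-α idea can be sketched in a few lines for the simplest case of a one-sided one-sample z-test with a known standardized effect size and sample size. The effect size, sample size, and error weights below are assumptions; the authors' procedure covers more general designs.

```python
import numpy as np
from scipy.stats import norm

def optimal_alpha(effect, n, w_type1=1.0, w_type2=1.0):
    """Grid-search the alpha that minimizes the weighted sum of Type I and Type II error
    for a one-sided one-sample z-test (a simplified stand-in for the settings in the paper)."""
    alphas = np.linspace(1e-4, 0.3, 3000)
    beta = norm.cdf(norm.ppf(1 - alphas) - effect * np.sqrt(n))   # Type II error at each alpha
    total = w_type1 * alphas + w_type2 * beta
    best = np.argmin(total)
    return alphas[best], beta[best]

a, b = optimal_alpha(effect=0.4, n=25)
print(f"optimal alpha ~ {a:.3f} (Type II error ~ {b:.3f}) rather than the default 0.05")
```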
Prum, Richard O
2010-11-01
The Fisher-inspired, arbitrary intersexual selection models of Lande (1981) and Kirkpatrick (1982), including both stable and unstable equilibrium conditions, provide the appropriate null model for the evolution of traits and preferences by intersexual selection. Like the Hardy–Weinberg equilibrium, the Lande–Kirkpatrick (LK) mechanism arises as an intrinsic consequence of genetic variation in trait and preference in the absence of other evolutionary forces. The LK mechanism is equivalent to other intersexual selection mechanisms in the absence of additional selection on preference and with additional trait-viability and preference-viability correlations equal to zero. The LK null model predicts the evolution of arbitrary display traits that are neither honest nor dishonest, indicate nothing other than mating availability, and lack any meaning or design other than their potential to correspond to mating preferences. The current standard for demonstrating an arbitrary trait is impossible to meet because it requires proof of the null hypothesis. The LK null model makes distinct predictions about the evolvability of traits and preferences. Examples of recent intersexual selection research document the confirmationist pitfalls of lacking a null model. Incorporation of the LK null into intersexual selection will contribute to serious examination of the extent to which natural selection on preferences shapes signals.
Three common misuses of P values
Kim, Jeehyoung; Bang, Heejung
2016-01-01
“Significance” has a specific meaning in science, especially in statistics. The p-value as a measure of statistical significance (evidence against a null hypothesis) has long been used in statistical inference and has served as a key player in science and research. Despite its clear mathematical definition and original purpose, and being just one of the many statistical measures/criteria, its role has been over-emphasized along with hypothesis testing. Observing and reflecting on this practice, some journals have attempted to ban reporting of p-values, and the American Statistical Association (for the first time in its 177-year history) released a statement on p-values in 2016. In this article, we intend to review the correct definition of the p-value as well as its common misuses, in the hope that our article is useful to clinicians and researchers. PMID:27695640
On the Interpretation and Use of Mediation: Multiple Perspectives on Mediation Analysis
Agler, Robert; De Boeck, Paul
2017-01-01
Mediation analysis has become a very popular approach in psychology, and it is one that is associated with multiple perspectives that are often at odds, often implicitly. Explicitly discussing these perspectives and their motivations, advantages, and disadvantages can help to provide clarity to conversations and research regarding the use and refinement of mediation models. We discuss five such pairs of perspectives on mediation analysis, their associated advantages and disadvantages, and their implications: with vs. without a mediation hypothesis, specific effects vs. a global model, directness vs. indirectness of causation, effect size vs. null hypothesis testing, and hypothesized vs. alternative explanations. Discussion of the perspectives is facilitated by a small simulation study. Some philosophical and linguistic considerations are briefly discussed, as well as some other perspectives we do not develop here. PMID:29187828
Fascin1-Dependent Filopodia are Required for Directional Migration of a Subset of Neural Crest Cells
Boer, Elena F.; Howell, Elizabeth D.; Schilling, Thomas F.; Jette, Cicely A.; Stewart, Rodney A.
2015-01-01
Directional migration of neural crest (NC) cells is essential for patterning the vertebrate embryo, including the craniofacial skeleton. Extensive filopodial protrusions in NC cells are thought to sense chemo-attractive/repulsive signals that provide directionality. To test this hypothesis, we generated null mutations in zebrafish fascin1a (fscn1a), which encodes an actin-bundling protein required for filopodia formation. Homozygous fscn1a zygotic null mutants have normal NC filopodia due to unexpected stability of maternal Fscn1a protein throughout NC development and into juvenile stages. In contrast, maternal/zygotic fscn1a null mutant embryos (fscn1a MZ) have severe loss of NC filopodia. However, only a subset of NC streams display migration defects, associated with selective loss of craniofacial elements and peripheral neurons. We also show that fscn1a-dependent NC migration functions through cxcr4a/cxcl12b chemokine signaling to ensure the fidelity of directional cell migration. These data show that fscn1a-dependent filopodia are required in a subset of NC cells to promote cell migration and NC derivative formation, and that perdurance of long-lived maternal proteins can mask essential zygotic gene functions during NC development. PMID:25607881
Beyond statistical inference: A decision theory for science
KILLEEN, PETER R.
2008-01-01
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests—which place all value on the replicability of an effect and none on its magnitude—as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them for a single index of merit. The analysis incorporates opportunity cost and is consistent with alternate measures of effect size, such as r2 and information transmission, and with Bayesian model selection criteria. An alternate formulation is functionally equivalent to the formal theory, transparent, and easy to compute. PMID:17201351
Estimating equivalence with quantile regression
Cade, B.S.
2011-01-01
Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. © 2011 by the Ecological Society of America.
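The sketch below illustrates the confidence-interval version of an equivalence test at several quantiles using statsmodels' QuantReg, in the spirit of the approach described above: a group effect is declared equivalent to zero at a quantile if its 90% interval lies inside an equivalence region. The simulated data, the ±1.5 equivalence margin, and the chosen quantiles are assumptions, not values from the case studies.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(3)
n = 200
group = np.repeat([0, 1], n)                       # 0 = reference site, 1 = treated/reclaimed site
y = np.concatenate([rng.gamma(2, 2.0, n),          # skewed, heterogeneous responses
                    rng.gamma(2, 2.4, n)])
X = sm.add_constant(group)
delta = 1.5                                        # assumed equivalence region: |difference| < delta

for q in (0.25, 0.50, 0.90):
    res = QuantReg(y, X).fit(q=q)
    lo, hi = res.conf_int(alpha=0.10)[1]           # 90% CI for the group effect (two one-sided tests)
    verdict = "equivalent" if (lo > -delta and hi < delta) else "not shown equivalent"
    print(f"q={q:.2f}: 90% CI for group effect = ({lo:.2f}, {hi:.2f}) -> {verdict}")
```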
Wagner, Michael R; Chen, Zhong
2004-12-01
The southwestern pine tip moth, Rhyacionia neomexicana (Dyar) (Lepidoptera: Tortricidae), is a native forest pest that attacks seedlings and saplings of ponderosa pine, Pinus ponderosa Dougl. ex Laws, in the southwestern United States. Repeated attacks can cause severe deformation of host trees and significant long-term growth loss. Alternatively, effective control of R. neomexicana, vegetative competition, or both in young pine plantations may increase survival and growth of trees for many years after treatments are applied. We test the null hypothesis that 4 yr of R. neomexicana and weed control with insecticide, weeding, and insecticide plus weeding would have no residual effect on survival and growth of trees in a ponderosa pine plantation in northern Arizona 14 yr post-treatment, when the trees were 18 yr old. Both the insecticide and weeding treatments increased tree growth and reduced the incidence of southwestern pine tip moth damage compared with the control. However, weeding alone also significantly increased tree survival, whereas insecticide alone did not. The insecticide plus weeding treatment had the greatest tree growth and survival, and the lowest rate of tip moth damage. Based on these results, we rejected our null hypothesis and concluded that there were detectable increases in the survival and growth of ponderosa pines 14 yr after treatments were applied to control R. neomexicana and weeds.
Orexin Receptor Antagonism Improves Sleep and Reduces Seizures in Kcna1-null Mice
Roundtree, Harrison M.; Simeone, Timothy A.; Johnson, Chaz; Matthews, Stephanie A.; Samson, Kaeli K.; Simeone, Kristina A.
2016-01-01
Study Objective: Comorbid sleep disorders occur in approximately one-third of people with epilepsy. Seizures and sleep disorders have an interdependent relationship where the occurrence of one can exacerbate the other. Orexin, a wake-promoting neuropeptide, is associated with sleep disorder symptoms. Here, we tested the hypothesis that orexin dysregulation plays a role in the comorbid sleep disorder symptoms in the Kcna1-null mouse model of temporal lobe epilepsy. Methods: Rest-activity was assessed using infrared beam actigraphy. Sleep architecture and seizures were assessed using continuous video-electroencephalography-electromyography recordings in Kcna1-null mice treated with vehicle or the dual orexin receptor antagonist, almorexant (100 mg/kg, intraperitoneally). Orexin levels in the lateral hypothalamus/perifornical region (LH/P) and hypothalamic pathology were assessed with immunohistochemistry and oxygen polarography. Results: Kcna1-null mice have increased latency to rapid eye movement (REM) sleep onset, sleep fragmentation, and number of wake epochs. The numbers of REM and non-REM (NREM) sleep epochs are significantly reduced in Kcna1-null mice. Severe seizures propagate to the wake-promoting LH/P where injury is apparent (indicated by astrogliosis, blood-brain barrier permeability, and impaired mitochondrial function). The number of orexin-positive neurons is increased in the LH/P compared to wild-type LH/P. Treatment with a dual orexin receptor antagonist significantly increases the number and duration of NREM sleep epochs and reduces the latency to REM sleep onset. Further, almorexant treatment reduces the incidence of severe seizures and overall seizure burden. Interestingly, we report a significant positive correlation between latency to REM onset and seizure burden in Kcna1-null mice. Conclusion: Dual orexin receptor antagonists may be an effective sleeping aid in epilepsy, and warrant further study of their somnogenic and anti-seizure effects in other epilepsy models. Citation: Roundtree HM, Simeone TA, Johnson C, Matthews SA, Samson KK, Simeone KA. Orexin receptor antagonism improves sleep and reduces seizures in Kcna1-null mice. SLEEP 2016;39(2):357–368. PMID:26446112
Spiegelhalter, D J; Freedman, L S
1986-01-01
The 'textbook' approach to determining sample size in a clinical trial has some fundamental weaknesses which we discuss. We describe a new predictive method which takes account of prior clinical opinion about the treatment difference. The method adopts the point of clinical equivalence (determined by interviewing the clinical participants) as the null hypothesis. Decision rules at the end of the study are based on whether the interval estimate of the treatment difference (classical or Bayesian) includes the null hypothesis. The prior distribution is used to predict the probabilities of making the decisions to use one or other treatment or to reserve final judgement. It is recommended that sample size be chosen to control the predicted probability of the last of these decisions. An example is given from a multi-centre trial of superficial bladder cancer.
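A small simulation conveys the predictive flavour of the method: draw the true treatment difference from the elicited prior, simulate the trial estimate, and classify the end-of-trial decision by whether the confidence interval excludes the point of clinical equivalence. The prior, outcome SD, equivalence point, and candidate sample sizes below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, d_eq = 10.0, 2.0          # outcome SD and point of clinical equivalence (assumed)
prior_mean, prior_sd = 4.0, 3.0  # elicited prior for the true treatment difference (assumed)

def predicted_decision_probs(n_per_arm, n_sim=20_000):
    delta = rng.normal(prior_mean, prior_sd, n_sim)    # draw the truth from the prior
    se = sigma * np.sqrt(2 / n_per_arm)
    diff_hat = rng.normal(delta, se)                   # predicted end-of-trial estimate
    lo, hi = diff_hat - 1.96 * se, diff_hat + 1.96 * se
    use_new = np.mean(lo > d_eq)                       # CI excludes equivalence point from above
    use_std = np.mean(hi < d_eq)                       # CI excludes equivalence point from below
    return use_new, use_std, 1.0 - use_new - use_std   # remainder: reserve final judgement

for n in (50, 100, 200, 400):
    p_new, p_std, p_reserve = predicted_decision_probs(n)
    print(f"n/arm={n}: P(use new)={p_new:.2f}  P(use standard)={p_std:.2f}  P(reserve)={p_reserve:.2f}")
```

Choosing the sample size then amounts to finding the smallest n for which the predicted probability of reserving judgement falls below a chosen limit.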
NASA Astrophysics Data System (ADS)
Hilburn, Monty D.
Successful lean manufacturing and cellular manufacturing execution rely upon a foundation of leadership commitment and strategic planning built upon solid data and robust analysis. The problem for this study was to create and employ a simple lean transformation planning model and review process that could be used to identify the functional support staff resources required to plan and execute lean manufacturing cells within aerospace assembly and manufacturing sites. The lean planning model was developed using the available literature on lean manufacturing kaizen best practices and validated through a Delphi panel of lean experts. The resulting model and a standardized review process were used to assess the state of lean transformation planning at five sites of an international aerospace manufacturing and assembly company. The results of the three-day, on-site reviews were compared with baseline plans collected from each of the five sites to determine whether they differed, with focus on three critical areas of lean planning: the number and type of manufacturing cells identified; the number, type, and duration of planned lean and continuous kaizen events; and the quantity and type of functional staffing resources planned to support the kaizen schedule. Summarized data from the baseline and on-site reviews were analyzed with descriptive statistics. ANOVAs and paired t-tests at the 95% confidence level were conducted on the means of the data sets to determine whether null hypotheses related to cells, kaizen events, and support resources could be rejected. The results of the research found significant differences between lean transformation plans developed by site leadership and plans developed utilizing the structured, on-site review process and lean transformation planning model. The null hypothesis that there was no difference between the means of pre-review and on-site cell counts was rejected, as was the null hypothesis that there was no significant difference in kaizen event plans. These factors are critical inputs into the support staffing resources calculation used by the lean planning model. The null hypothesis related to functional support staff resources was rejected for most functional groups, indicating that the baseline site plans inadequately provided for cross-functional staff involvement to support the lean transformation plan. Null hypotheses related to total lean transformation staffing could not be rejected, indicating that while total staffing plans were not significantly different from plans developed during the on-site review and through the use of the lean planning model, the allocation of staffing among various functional groups such as engineering, production, and materials planning was an issue. The on-site review process and the simple lean transformation planning model were determined to be useful in identifying shortcomings in lean transformation planning within aerospace manufacturing and assembly sites. It was concluded that the differences uncovered were likely contributing factors affecting the effectiveness of aerospace manufacturing sites' implementation of lean cellular manufacturing.
Tello, J. Sebastián; Myers, Jonathan A.; Macía, Manuel J.; Fuentes, Alfredo F.; Cayola, Leslie; Arellano, Gabriel; Loza, M. Isabel; Torrez, Vania; Cornejo, Maritza; Miranda, Tatiana B.; Jørgensen, Peter M.
2015-01-01
Despite long-standing interest in elevational-diversity gradients, little is known about the processes that cause changes in the compositional variation of communities (β-diversity) across elevations. Recent studies have suggested that β-diversity gradients are driven by variation in species pools, rather than by variation in the strength of local community assembly mechanisms such as dispersal limitation, environmental filtering, or local biotic interactions. However, tests of this hypothesis have been limited to very small spatial scales that limit inferences about how the relative importance of assembly mechanisms may change across spatial scales. Here, we test the hypothesis that scale-dependent community assembly mechanisms shape biogeographic β-diversity gradients using one of the most well-characterized elevational gradients of tropical plant diversity. Using an extensive dataset on woody plant distributions along a 4,000-m elevational gradient in the Bolivian Andes, we compared observed patterns of β-diversity to null-model expectations. β-deviations (standardized differences from null values) were used to measure the relative effects of local community assembly mechanisms after removing sampling effects caused by variation in species pools. To test for scale-dependency, we compared elevational gradients at two contrasting spatial scales that differed in the size of local assemblages and regions by at least an order of magnitude. Elevational gradients in β-diversity persisted after accounting for regional variation in species pools. Moreover, the elevational gradient in β-deviations changed with spatial scale. At small scales, local assembly mechanisms were detectable, but variation in species pools accounted for most of the elevational gradient in β-diversity. At large spatial scales, in contrast, local assembly mechanisms were a dominant force driving changes in β-diversity. In contrast to the hypothesis that variation in species pools alone drives β-diversity gradients, we show that local community assembly mechanisms contribute strongly to systematic changes in β-diversity across elevations. We conclude that scale-dependent variation in community assembly mechanisms underlies these iconic gradients in global biodiversity. PMID:25803846
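The β-deviation calculation itself is straightforward to sketch: compute an observed β-diversity, rebuild each local assemblage by random draws from the regional species pool while preserving local richness, and express the observed value as a z-score against that null distribution. Everything below (the toy pool, the gradient-like sampling windows, and Jaccard dissimilarity as the β metric) is an illustrative assumption, not the study's data or exact null model.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(11)

def mean_jaccard(sites):
    """Mean pairwise Jaccard dissimilarity among local assemblages (one possible beta metric)."""
    vals = [1 - len(a & b) / len(a | b) for a, b in combinations(sites, 2)]
    return float(np.mean(vals))

pool = [f"sp{i}" for i in range(120)]                  # regional species pool (toy)
# "Observed" assemblages: each site samples a window of the pool, mimicking turnover along a gradient.
observed = [set(rng.choice(pool[10 * i:10 * i + 60], size=15, replace=False)) for i in range(8)]
obs_beta = mean_jaccard(observed)

# Null model: same local richness, but species drawn at random from the whole regional pool.
null_betas = [mean_jaccard([set(rng.choice(pool, size=len(s), replace=False)) for s in observed])
              for _ in range(999)]

beta_deviation = (obs_beta - np.mean(null_betas)) / np.std(null_betas)
print(f"observed beta = {obs_beta:.3f}, beta-deviation (z-score vs pool null) = {beta_deviation:.2f}")
```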
Cantalapiedra, Juan L; Hernández Fernández, Manuel; Morales, Jorge
2011-01-01
The resource-use hypothesis proposed by E.S. Vrba predicts that specialist species have higher speciation and extinction rates than generalists because they are more susceptible to environmental changes and vicariance. In this work, we test some of the predictions derived from this hypothesis on the 197 extant and recently extinct species of Ruminantia (Cetartiodactyla, Mammalia) using the biomic specialization index (BSI) of each species, which is based on its distribution within different biomes. We ran 10000 Monte Carlo simulations of our data in order to get a null distribution of BSI values against which to contrast the observed data. Additionally, we drew on a supertree of the ruminants and a phylogenetic likelihood-based method (QuaSSE) for testing whether the degree of biomic specialization affects speciation rates in ruminant lineages. Our results are consistent with the predictions of the resource-use hypothesis, which foretells a higher speciation rate of lineages restricted to a single biome (BSI = 1) and higher frequency of specialist species in biomes that underwent high degree of contraction and fragmentation during climatic cycles. Bovids and deer present differential specialization across biomes; cervids show higher specialization in biomes with a marked hydric seasonality (tropical deciduous woodlands and schlerophyllous woodlands), while bovids present higher specialization in a greater variety of biomes. This might be the result of divergent physiological constraints as well as a different biogeographic and evolutionary history.
Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka
2015-01-01
The problem of establishing the noninferiority of a new treatment relative to a standard (control) treatment is discussed for ordinal categorical data. A measure of treatment effect is used and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed in which the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of existing ones, and the results show that the proposed test statistics are better in terms of deviation from the nominal level and power.
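A simplified sketch of the shifted-null idea for ordinal outcomes is shown below: the treatment effect is measured by the Mann-Whitney-type probability θ = P(new > standard) + 0.5·P(tie), and a Z statistic compares its estimate with 0.5 minus a noninferiority margin. The simulated ordinal scores, the margin, and the bootstrap standard error (used here in place of the paper's U-statistic variance under the shifted null) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def mw_theta(new, std):
    """U-statistic estimate of P(new > std) + 0.5*P(new = std) on an ordinal scale."""
    a, b = np.asarray(new)[:, None], np.asarray(std)[None, :]
    return (a > b).mean() + 0.5 * (a == b).mean()

# Hypothetical ordinal outcomes (1 = worst ... 5 = best) for the two arms.
new = rng.choice([1, 2, 3, 4, 5], size=120, p=[0.05, 0.15, 0.30, 0.30, 0.20])
std = rng.choice([1, 2, 3, 4, 5], size=120, p=[0.05, 0.15, 0.32, 0.30, 0.18])

margin = 0.05                                   # assumed noninferiority margin on the theta scale
theta_hat = mw_theta(new, std)

# Bootstrap standard error as a simple stand-in for the analytic U-statistic variance.
boot = [mw_theta(rng.choice(new, new.size), rng.choice(std, std.size)) for _ in range(2000)]
z = (theta_hat - (0.5 - margin)) / np.std(boot)
print(f"theta_hat = {theta_hat:.3f}, Z = {z:.2f}; reject H0 (inferiority) if Z > 1.645")
```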
Dunkel, Curtis S; Harbke, Colin R; Papini, Dennis R
2009-06-01
The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce error, the authors did not find support for the models. They discuss results in the context of the mixed-research findings regarding birth order and suggest further research on the proposed developmental dynamics that may produce birth-order effects.
NASA Astrophysics Data System (ADS)
Meyer, M. R.
2010-10-01
In this contribution I summarize some recent successes, and focus on remaining challenges, in understanding the formation and evolution of planetary systems in the context of the Blue Dots initiative. Because our understanding is incomplete, we cannot yet articulate a design reference mission engineering matrix suitable for an exploration mission where success is defined as obtaining a spectrum of a potentially habitable world around a nearby star. However, as progress accelerates, we can identify observational programs that would address fundamental scientific questions through hypothesis testing such that the null result is interesting.
A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Kravtsov, Sergey; Saunders, Rolando Olivas
2011-02-01
In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
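The point is easy to check by simulation: when the null of a uniform weekday distribution really holds, a chi-square test rejects at roughly the nominal rate no matter how many events are counted. The sketch below uses synthetic event counts; the sample size and number of replications are arbitrary.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)
n_events, n_reps = 100_000, 1000        # large samples, many replicate "catalogues"

rejections = 0
for _ in range(n_reps):
    # Uniform weekday null is TRUE by construction: each event lands on one of 7 days.
    counts = np.bincount(rng.integers(0, 7, size=n_events), minlength=7)
    rejections += chisquare(counts).pvalue < 0.05
print(f"rejection rate with n = {n_events:,} events per catalogue: {rejections / n_reps:.3f}")
```

The rejection rate stays near 0.05 however large the sample is; large n only increases power against nulls that are actually false.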
Wang, Yuanjia; Chen, Huaihou
2012-01-01
We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we imbed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10^8 simulations) and asymptotic approximation may be unreliable and conservative. PMID:23020801
Long memory behavior of returns after intraday financial jumps
NASA Astrophysics Data System (ADS)
Behfar, Stefan Kambiz
2016-11-01
In this paper, the characterization of intraday financial jumps and the time dynamics of returns after jumps are investigated; it is shown analytically and empirically that intraday jumps are power-law distributed with exponent 1 < μ < 2 and that returns after jumps show long-memory behavior. In the theory of finance, it is important to be able to distinguish between jumps and continuous sample-path price movements, and this can be achieved by introducing a statistical test based on sums of products of returns over small periods of time. In the presence of a jump, the null hypothesis of the normality test is rejected; this is based on the idea that returns are composed of a mixture of normally-distributed and power-law distributed data (∼ 1/r^(1+μ)). The probability of rejecting the null hypothesis is a function of μ, and equals one for 1 < μ < 2 for large intraday sample size M. To test this idea empirically, we downloaded S&P500 index data for the periods 1997-1998 and 2014-2015, and showed that the complementary cumulative distribution function of jump returns is power-law distributed with exponent 1 < μ < 2. There are far more jumps in 1997-1998 than in 2015-2016, and the power-law exponent in 2015-2016 is greater than the one in 1997-1998. Assuming that i.i.d. returns generally follow a Poisson distribution, if the jump is a causal factor then high returns after jumps are the effect; we show that returns caused by a jump decay as a power law. To test this idea empirically, we average over the time dynamics of all days; the superposed time dynamics after a jump follow a power law, which indicates that there is long memory with a power-law distribution of returns after a jump.
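Estimating the tail exponent μ is the quantitative core of the argument above, and a standard way to do it is the Hill estimator applied to the largest absolute returns. The sketch below runs it on synthetic data (a Gaussian bulk plus Pareto-distributed jumps with μ = 1.5); the tail fraction and all data are assumptions, not S&P500 results.

```python
import numpy as np

def hill_tail_exponent(returns, tail_fraction=0.005):
    """Hill estimator of the tail exponent mu for |returns| above a high order-statistic threshold."""
    x = np.sort(np.abs(returns))[::-1]
    k = max(int(tail_fraction * x.size), 10)
    tail, x_min = x[:k], x[k]
    return 1.0 / np.mean(np.log(tail / x_min))

rng = np.random.default_rng(8)
# Synthetic stand-in for intraday returns: small Gaussian bulk plus rare Pareto jumps with mu = 1.5.
bulk = 1e-4 * rng.standard_normal(50_000)
jumps = 1e-3 * (rng.pareto(1.5, 500) + 1) * rng.choice([-1, 1], 500)
returns = np.concatenate([bulk, jumps])

print(f"estimated tail exponent mu ~ {hill_tail_exponent(returns):.2f}")
```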
Age affects severity of venous gas emboli on decompression from 14.7 to 4.3 psia
NASA Technical Reports Server (NTRS)
Conkin, Johnny; Powell, Michael R.; Gernhardt, Michael L.
2003-01-01
INTRODUCTION: Variables that define who we are, such as age, weight, and fitness level, influence the risk of decompression sickness (DCS) and venous gas emboli (VGE) from diving and aviation decompressions. We focus on age since astronauts who perform space walks are approximately 10 yr older than our test subjects. Our null hypothesis is that age is not statistically associated with the VGE outcomes from decompression to 4.3 psia. METHODS: Our data are from 7 different NASA tests in which 188 men and 50 women performed light exercise at 4.3 psia for planned exposures of no less than 4 h. Prebreathe (PB) time on 100% oxygen ranged from 150-270 min, including ascent time, with exercise of different intensity and length being performed during the PB in four of the seven tests with 150 min of PB. Subjects were monitored for VGE in the pulmonary artery using a Doppler ultrasound bubble detector for a 4-min period every 12 min. There were six design variables (the presence or absence of lower body adynamia and five PB variables) plus five concomitant variables on physical characteristics (age, weight, height, body mass index, and gender) available for logistic regression (LR). We used LR models for the probability of DCS and VGE, and multinomial logit (ML) models for the probability of Spencer VGE Grades 0-IV at exposure times of 61, 95, 131, and 183 min, and for the entire exposure. RESULTS: Age was significantly associated with VGE in both the LR and ML models, so we reject the null hypothesis. Lower body adynamia was significant for all responses. CONCLUSIONS: Our selection of tests produced a wide range of the explanatory variables, but only age, lower body adynamia, height, and total PB time were helpful, in various combinations, for modeling the probability of DCS and VGE.
The ranking probability approach and its usage in design and analysis of large-scale studies.
Kuo, Chia-Ling; Zaykin, Dmitri
2013-01-01
In experiments with many statistical tests there is a need to balance Type I and Type II error rates while taking multiplicity into account. In the traditional approach, the nominal α-level such as 0.05 is adjusted by the number of tests, m, i.e., as 0.05/m. Assuming that some proportion of tests represent "true signals", that is, originate from a scenario where the null hypothesis is false, power depends on the number of true signals and the respective distribution of effect sizes. One way to define power is for it to be the probability of making at least one correct rejection at the assumed α-level. We advocate an alternative way of establishing how "well-powered" a study is. In our approach, useful for studies with multiple tests, the ranking probability P(k) is controlled, defined as the probability of making at least k correct rejections while rejecting the hypotheses with the k smallest P-values. The two approaches are statistically related. The probability that the smallest P-value is a true signal (i.e., P(1)) is equal to the power at the adjusted level, to a very good approximation. Ranking probabilities are also related to the false discovery rate and to the Bayesian posterior probability of the null hypothesis. We study properties of our approach when the effect size distribution is replaced for convenience by a single "typical" value taken to be the mean of the underlying distribution. We conclude that its performance is often satisfactory under this simplification; however, substantial imprecision is to be expected when the number of tests is very large and the proportion of true signals is small. Precision is largely restored when three values with the respective abundances are used instead of a single typical effect size value.
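A quick Monte Carlo illustrates the simplest ranking probability: the chance that the single smallest P-value in a multiple-testing experiment belongs to a true signal. The number of tests, number of true signals, and effect size below are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, m1, effect, n_sim = 1000, 50, 3.0, 5000   # m tests, m1 true signals with mean shift 'effect'

top_is_signal = 0
for _ in range(n_sim):
    z = rng.standard_normal(m)
    z[:m1] += effect                          # the first m1 hypotheses are the true signals
    pvals = 2 * norm.sf(np.abs(z))
    top_is_signal += np.argmin(pvals) < m1    # is the smallest P-value a true signal?

print(f"P(smallest P-value is a true signal) ~ {top_is_signal / n_sim:.3f}")
```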
Mayhew, Terry M; Lucocq, John M
2011-03-01
Various methods for quantifying cellular immunogold labelling on transmission electron microscope thin sections are currently available. All rely on sound random sampling principles and are applicable to single immunolabelling across compartments within a given cell type or between different experimental groups of cells. Although methods are also available to test for colocalization in double/triple immunogold labelling studies, so far, these have relied on making multiple measurements of gold particle densities in defined areas or of inter-particle nearest neighbour distances. Here, we present alternative two-step approaches to codistribution and colocalization assessment that merely require raw counts of gold particles in distinct cellular compartments. For assessing codistribution over aggregate compartments, initial statistical evaluation involves combining contingency table and chi-squared analyses to provide predicted gold particle distributions. The observed and predicted distributions allow testing of the appropriate null hypothesis, namely, that there is no difference in the distribution patterns of proteins labelled by different sizes of gold particle. In short, the null hypothesis is that of colocalization. The approach for assessing colabelling recognises that, on thin sections, a compartment is made up of a set of sectional images (profiles) of cognate structures. The approach involves identifying two groups of compartmental profiles that are unlabelled and labelled for one gold marker size. The proportions in each group that are also labelled for the second gold marker size are then compared. Statistical analysis now uses a 2 × 2 contingency table combined with the Fisher exact probability test. Having identified double labelling, the profiles can be analysed further in order to identify characteristic features that might account for the double labelling. In each case, the approach is illustrated using synthetic and/or experimental datasets and can be refined to correct observed labelling patterns to specific labelling patterns. These simple and efficient approaches should be of more immediate utility to those interested in codistribution and colocalization in multiple immunogold labelling investigations.
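The two-step logic maps directly onto standard contingency-table calls in scipy. The sketch below uses hypothetical gold particle counts: a chi-square test of a compartment-by-marker table for the codistribution step, then a Fisher exact test of a 2x2 profile table for the colabelling step.

```python
from scipy.stats import chi2_contingency, fisher_exact

# Step 1 (codistribution): gold particle counts per compartment for the two marker sizes (hypothetical).
counts = [[120, 45, 30],    # 10-nm gold over mitochondria, Golgi, plasma membrane
          [ 60, 50, 15]]    # 15-nm gold over the same compartments
chi2, p, dof, expected = chi2_contingency(counts)
print(f"codistribution: chi2 = {chi2:.1f}, p = {p:.3g}  (H0: both markers share one distribution)")

# Step 2 (colabelling): 2x2 table of compartment profiles cross-classified by the two markers.
table = [[34, 12],          # profiles labelled for marker 1: also labelled / not labelled for marker 2
         [ 9, 41]]          # profiles unlabelled for marker 1
odds_ratio, p2 = fisher_exact(table)
print(f"colabelling: Fisher exact p = {p2:.3g}")
```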
Goodman and Kruskal's TAU-B Statistics: A Fortran-77 Subroutine.
ERIC Educational Resources Information Center
Berry, Kenneth J.; Mielke, Paul W., Jr.
1986-01-01
An algorithm and associated FORTRAN-77 computer subroutine are described for computing Goodman and Kruskal's tau-b statistic along with the associated nonasymptotic probability value under the null hypothesis tau = 0. (Author)
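For readers without access to the FORTRAN-77 subroutine, the following Python sketch computes the asymmetric Goodman–Kruskal tau (proportional reduction in prediction error for the column variable) from a contingency table; the table is hypothetical, and the sketch does not reproduce the nonasymptotic probability computation described in the abstract.

```python
# Goodman and Kruskal's asymmetric tau from a contingency table, with the column
# variable treated as dependent (hypothetical counts; not the FORTRAN-77 routine).
import numpy as np

def goodman_kruskal_tau(table):
    """Proportional reduction in prediction error for the column variable."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    col_marg = table.sum(axis=1 - 1, keepdims=False) if False else table.sum(axis=0) / n
    row_tot = table.sum(axis=1)
    error_marginal = 1 - np.sum(col_marg ** 2)                       # predict from column margins only
    error_conditional = 1 - np.sum((table ** 2).sum(axis=1) / row_tot) / n  # predict within each row
    return (error_marginal - error_conditional) / error_marginal

table = [[20,  5,  5],
         [ 4, 18,  8],
         [ 6,  7, 27]]
print(round(goodman_kruskal_tau(table), 4))
```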
Really a Matter of Data: A Reply to Solomon.
ERIC Educational Resources Information Center
Sroufe, L. Alan
1980-01-01
Replies to Solomon's paper that basic criticisms made earlier of Shaffran and Decaries' study still apply. Views the study as essentially a confirmation of the null hypothesis based on weak measures. (Author/RH)
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good-fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypotheses is known in NCPA, but not in ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
Brown, Angus M
2010-04-01
The objective of the method described in this paper is to develop a spreadsheet template for the purpose of comparing multiple sample means. An initial analysis of variance (ANOVA) test on the data returns the test statistic F. If F is larger than the critical F value drawn from the F distribution at the appropriate degrees of freedom, convention dictates rejection of the null hypothesis and allows subsequent multiple comparison testing to determine where the inequalities between the sample means lie. A variety of multiple comparison methods are described that return the 95% confidence intervals for differences between means using an inclusive pairwise comparison of the sample means. 2009 Elsevier Ireland Ltd. All rights reserved.
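A minimal Python sketch of the workflow the abstract describes (ANOVA F statistic, comparison with the critical F value, then pairwise comparisons) is given below; it is not the spreadsheet template itself, the data are simulated, and the Tukey step assumes SciPy 1.8 or later.

```python
# One-way ANOVA on simulated data, comparison of F with the critical value, and,
# if significant, Tukey HSD pairwise 95% confidence intervals as a follow-up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(loc=mu, scale=1.0, size=12) for mu in (5.0, 5.2, 6.1)]

F, p = stats.f_oneway(*groups)
df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
F_crit = stats.f.ppf(0.95, df_between, df_within)

print(f"F = {F:.2f}, critical F = {F_crit:.2f}, p = {p:.4f}")
if F > F_crit:
    # Follow-up pairwise comparisons with simultaneous 95% confidence intervals
    res = stats.tukey_hsd(*groups)
    print(res.confidence_interval(confidence_level=0.95))
```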
Wu, Mixia; Shu, Yu; Li, Zhaohai; Liu, Aiyi
2016-01-01
A sequential design is proposed to test whether the accuracy of a binary diagnostic biomarker meets the minimal level of acceptance. The accuracy of a binary diagnostic biomarker is a linear combination of the marker’s sensitivity and specificity. The objective of the sequential method is to minimize the maximum expected sample size under the null hypothesis that the marker’s accuracy is below the minimal level of acceptance. The exact results of two-stage designs based on Youden’s index and efficiency indicate that the maximum expected sample sizes are smaller than the sample sizes of the fixed designs. Exact methods are also developed for estimation, confidence interval and p-value concerning the proposed accuracy index upon termination of the sequential testing. PMID:26947768
Bale, Laurie K; Conover, Cheryl A
2005-08-01
Pregnancy-associated plasma protein-A (PAPP-A), an insulin-like growth factor-binding protein (IGFBP) protease, increases insulin-like growth factor (IGF) activity through cleavage of inhibitory IGFBP-4 and the consequent release of IGF peptide for receptor activation. Mice homozygous for targeted disruption of the PAPP-A gene are born as proportional dwarfs and exhibit retarded bone ossification during fetal development. Phenotype and in vitro data support a model in which decreased IGF-II bioavailability during embryogenesis results in growth retardation and reduction in overall body size. To test the hypothesis that an increase in IGF-II during embryogenesis would overcome the growth deficiencies, PAPP-A-null mice were crossed with DeltaH19 mutant mice, which have increased IGF-II expression and fetal overgrowth due to disruption of IgfII imprinting. DeltaH19 mutant mice were 126% and PAPP-A-null mice were 74% the size of controls at birth. These size differences were evident at embryonic day 16.5. Importantly, double mutants were indistinguishable from controls both in terms of size and skeletal development. Body size programmed during embryo development persisted post-natally. Thus, disruption of IgfII imprinting and consequent elevation in IGF-II during fetal development was associated with rescue of the dwarf phenotype and ossification defects of PAPP-A-null mice. These data provide strong genetic evidence that PAPP-A plays an essential role in determining IGF-II bioavailability for optimal fetal growth and development.
De Meeûs, Thierry
2014-03-01
In population genetics data analysis, researchers are often faced with the problem of making a decision from a series of tests of the same null hypothesis. This is the case when one wants to test differentiation between pathogens found on different host species sampled from different locations (with as many tests as locations). Many procedures are available to date, but not all apply to all situations. Finding which tests are significant, or whether the whole series is significant, and handling independent versus non-independent tests do not require the same procedures. In this note I describe several procedures, among the simplest and easiest to undertake, that should allow decision making in most (if not all) situations population geneticists (or biologists) are likely to meet, in particular in host-parasite systems. Copyright © 2014 Elsevier B.V. All rights reserved.
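As an editorial illustration, the sketch below applies two of the simplest decision rules for a series of P-values testing the same null hypothesis: Fisher's combination test for whole-series significance and the Benjamini–Hochberg step-up rule for flagging individual tests. Both assume independent tests and the P-values are hypothetical, so this is not necessarily the exact set of procedures recommended in the note.

```python
# Two simple decision rules for a series of P-values of the same null hypothesis
# (assuming independent tests; hypothetical P-values).
import numpy as np
from scipy.stats import combine_pvalues

pvals = np.array([0.001, 0.020, 0.150, 0.300, 0.041, 0.700])

# 1) Is the whole series significant?  Fisher's combination test.
stat, p_global = combine_pvalues(pvals, method='fisher')
print(f"Fisher combined statistic = {stat:.2f}, global p = {p_global:.4f}")

# 2) Which individual tests are significant?  Benjamini-Hochberg at FDR q = 0.05.
def benjamini_hochberg(p, q=0.05):
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()      # largest rank satisfying the BH condition
        rejected[order[:k + 1]] = True
    return rejected

print("BH-significant tests:", benjamini_hochberg(pvals))
```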
Zou, W; Ouyang, H
2016-02-01
We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximum likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA, and a conditional likelihood adjustment (CLA) method that models threshold selection bias. We observed that MEA effectively reduced the MSE from MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
Do Men Produce Higher Quality Ejaculates When Primed With Thoughts of Partner Infidelity?
Pham, Michael N; Barbaro, Nicole; Holub, Andrew M; Holden, Christopher J; Mogilski, Justin K; Lopes, Guilherme S; Nicolas, Sylis C A; Sela, Yael; Shackelford, Todd K; Zeigler-Hill, Virgil; Welling, Lisa L M
2018-01-01
Sperm competition theory can be used to generate the hypothesis that men alter the quality of their ejaculates as a function of sperm competition risk. Using a repeated measures experimental design, we investigated whether men produce a higher quality ejaculate when primed with cues to sperm competition (i.e., imagined partner infidelity) relative to a control prime. Men (n = 45) submitted two masturbatory ejaculates, one sample for each condition (i.e., sperm competition and control conditions). Ejaculates were assessed on 17 clinical parameters. The results did not support the hypothesis: men did not produce higher quality ejaculates in the sperm competition condition relative to the control condition. Despite the null results of the current research, there is evidence for psychological and physiological adaptations to sperm competition in humans. We discuss methodological limitations that may have produced the null results and present methodological suggestions for research on human sperm competition.
Kuiper, Rebecca M; Nederhoff, Tim; Klugkist, Irene
2015-05-01
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number). © 2014 The British Psychological Society.
Solvent-induced dimensional changes in EDTA-demineralized dentin matrix.
Pashley, D H; Agee, K A; Nakajima, M; Tay, F R; Carvalho, R M; Terada, R S; Harmon, F J; Lee, W K; Rueggeberg, F A
2001-08-01
The purpose of this study was to test the null hypothesis that the re-expansion of the dried matrix and the shrinkage of moist, demineralized dentin are not influenced by polar solvents. Dentin disks were prepared from midcoronal dentin of extracted human third molars. After complete demineralization in 0.5 M EDTA (pH 7), the specimens were placed in the well of a device that measures changes in matrix height in real time. Dry, collapsed matrices were created by blowing dry N(2) on the specimens until they shrank to a stable plateau. Polar solvents [water, methanol, ethanol, n-propanol, n-butanol, formamide, ethylene glycol, hydroxyethyl methacrylate (HEMA), or mixtures of water-HEMA] as model primers then were added and the degree of re-expansion measured. These same solvents also were applied to moist, expanded matrices and the solvent-induced shrinkages measured. Regression analysis was used to test the correlations between matrix height and Hansen's dispersive, polar, hydrogen bonding, and total solubility parameters (delta(d), delta(p), delta(h), delta(t)). The results indicate that water-free polar solvents of low hydrogen bonding (H-bond) ability (e.g., neat HEMA) do not re-expand dried matrices and that they shrink moist matrices. When HEMA was mixed with progressively higher water concentrations, the model water-HEMA primers expanded the dried matrix in proportion to their water concentrations and they produced less shrinkage of moist matrices. Solvents with higher H-bonding capacities (methanol, ethanol, ethylene glycol, formamide, and water) re-expanded the dried matrix in proportion to their solubility parameters for H-bonding (delta(h)). They also induced small transient shrinkages of moist matrices, which slowly re-expanded. The results require rejection of the null hypothesis. Copyright 2001 John Wiley & Sons, Inc. J Biomed Mater Res 56: 273-281, 2001
A system for 3D representation of burns and calculation of burnt skin area.
Prieto, María Felicidad; Acha, Begoña; Gómez-Cía, Tomás; Fondón, Irene; Serrano, Carmen
2011-11-01
In this paper a computer-based system for burnt surface area estimation (BAI) is presented. First, a 3D model of the patient, adapted to age, weight, gender and constitution, is created. On this 3D model, physicians represent both the burns and their depth, allowing the burnt surface area to be calculated automatically by the system. Each patient's model, as well as photographs and burn area estimates, can be stored. Therefore, these data can be included in the patient's clinical records for further review. Validation of this system was performed. In a first experiment, artificial paper patches of known size were attached to different parts of the body in 37 volunteers. A panel of 5 experts estimated the extent of the patches using the Rule of Nines, and our system estimated the area of the "artificial burn". To test the null hypothesis, Student's t-test was applied to the collected data. In addition, the intraclass correlation coefficient (ICC) was calculated and a value of 0.9918 was obtained, demonstrating that the reliability of the program in calculating the area is 99%. In a second experiment, the burnt skin areas of 80 patients were calculated using the BAI system and the Rule of Nines. A comparison between these two measuring methods was performed via Student's t-test and the ICC. The hypothesis of no difference between the two measures holds only for deep dermal burns, and the ICC is significantly different, indicating that area estimation by the classical technique can result in a wrong diagnosis of the burnt surface. Copyright © 2011 Elsevier Ltd and ISBI. All rights reserved.
Rothmann, Mark
2005-01-01
When testing the equality of means from two different populations, a t-test or large-sample normal test tends to be performed. For these tests, when the sample size or design for the second sample is dependent on the results of the first sample, the type I error probability is altered for each specific possibility in the null hypothesis. We will examine the impact on the type I error probabilities for two confidence interval procedures and for procedures using test statistics when the design for the second sample or experiment is dependent on the results from the first sample or experiment (or series of experiments). Ways of controlling a desired maximum type I error probability or a desired type I error rate will be discussed. Results are applied to the setting of noninferiority comparisons in active controlled trials where the use of a placebo is unethical.
Sobiecki, Jakub G
2017-08-01
Despite the consistent findings of lower total cancer incidence in vegetarians than in meat-eaters in the UK, the results of studies of colorectal cancer (CRC) risk in British vegetarians have largely been null. This was in contrast to the hypothesis of a decreased risk of CRC in this population due to their zero intake of red and processed meats and increased intake of fibre. Although the data are inconsistent, it has been suggested that selenium (Se) status may influence CRC risk. A literature review was performed of studies on CRC risk in vegetarians, Se intakes and status in vegetarians, and changes in Se intakes and status in the UK throughout the follow-up periods of studies on CRC risk in British vegetarians. Vegetarians in the UK and other low-Se areas were found to have low Se intakes and status compared to non-vegetarians. There was some evidence of a reverse J-shaped curve of Se intakes and status in the UK over the last three decades. These presumed patterns were followed by changes in CRC mortality or incidence in British vegetarians during this period. Available data on Se intake and status in British vegetarians, as well as the relationship between their secular changes in the UK and changes in CRC risk in this dietary group, are compatible with the hypothesis that low Se status may contribute to the largely null results of studies of CRC risk in vegetarians in the UK.
Detecting Multifractal Properties in Asset Returns:
NASA Astrophysics Data System (ADS)
Lux, Thomas
It has become popular recently to apply the multifractal formalism of statistical physics (scaling analysis of structure functions and f(α) singularity spectrum analysis) to financial data. The outcome of such studies is a nonlinear shape of the structure function and a nontrivial behavior of the spectrum. Eventually, this literature has moved from basic data analysis to estimation of particular variants of multifractal models for asset returns via fitting of the empirical τ(q) and f(α) functions. Here, we reinvestigate earlier claims of multifractality using four long time series of important financial markets. Taking the recently proposed multifractal models of asset returns as our starting point, we show that the typical "scaling estimators" used in the physics literature are unable to distinguish between spurious and "true" multiscaling of financial data. Designing explicit tests for multiscaling, we can in no case reject the null hypothesis that the apparent curvature of both the scaling function and the Hölder spectrum is spuriously generated by the particular fat-tailed distribution of financial data. Given the well-known overwhelming evidence in favor of different degrees of long-term dependence in the powers of returns, we interpret this inability to reject the null hypothesis of multiscaling as a lack of discriminatory power of the standard approach rather than as a true rejection of multiscaling. However, the complete "failure" of the multifractal apparatus in this setting also raises the question whether results in other areas (like geophysics) suffer from similar shortcomings of the traditional methodology.
Quantifying lead-time bias in risk factor studies of cancer through simulation.
Jansen, Rick J; Alexander, Bruce H; Anderson, Kristin E; Church, Timothy R
2013-11-01
Lead-time is inherent in early detection and creates bias in observational studies of screening efficacy, but its potential to bias effect estimates in risk factor studies is not always recognized. We describe a form of this bias that conventional analyses cannot address and develop a model to quantify it. Surveillance Epidemiology and End Results (SEER) data form the basis for estimates of age-specific preclinical incidence, and log-normal distributions describe the preclinical duration distribution. Simulations assume a joint null hypothesis of no effect of either the risk factor or screening on the preclinical incidence of cancer, and then quantify the bias as the risk-factor odds ratio (OR) from this null study. This bias can be used as a factor to adjust observed OR in the actual study. For this particular study design, as average preclinical duration increased, the bias in the total-physical activity OR monotonically increased from 1% to 22% above the null, but the smoking OR monotonically decreased from 1% above the null to 5% below the null. The finding of nontrivial bias in fixed risk-factor effect estimates demonstrates the importance of quantitatively evaluating it in susceptible studies. Copyright © 2013 Elsevier Inc. All rights reserved.
After p Values: The New Statistics for Undergraduate Neuroscience Education.
Calin-Jageman, Robert J
2017-01-01
Statistical inference is a methodological cornerstone for neuroscience education. For many years this has meant inculcating neuroscience majors into null hypothesis significance testing with p values. There is increasing concern, however, about the pervasive misuse of p values. It is time to start planning statistics curricula for neuroscience majors that replace or de-emphasize p values. One promising alternative approach is what Cumming has dubbed the "New Statistics", an approach that emphasizes effect sizes, confidence intervals, meta-analysis, and open science. I give an example of the New Statistics in action and describe some of the key benefits of adopting this approach in neuroscience education.
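As an editorial illustration of the estimation-focused reporting advocated here, the sketch below summarizes two simulated groups with a mean difference, its 95% confidence interval, and Cohen's d rather than a bare p value.

```python
# Estimation-focused ("New Statistics") summary for two groups: mean difference
# with a 95% confidence interval and a standardized effect size (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, size=30)
treated = rng.normal(11.2, 2.0, size=30)

diff = treated.mean() - control.mean()
n1, n2 = len(treated), len(control)
sp = np.sqrt(((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
se = sp * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

cohens_d = diff / sp
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"Mean difference = {diff:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], d = {cohens_d:.2f}")
```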
Jorge, Inmaculada; Navarro, Pedro; Martínez-Acedo, Pablo; Núñez, Estefanía; Serrano, Horacio; Alfranca, Arántzazu; Redondo, Juan Miguel; Vázquez, Jesús
2009-01-01
Statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, particularly for data obtained by 16O/18O labeling. In addition, large-scale test experiments to validate the null hypothesis are lacking. Although the study of mechanisms underlying biological actions promoted by vascular endothelial growth factor (VEGF) on endothelial cells is of considerable interest, quantitative proteomics studies on this subject are scarce and have been performed after exposing cells to the factor for long periods of time. In this work we present the largest quantitative proteomics study to date on the short-term effects of VEGF on human umbilical vein endothelial cells by 18O/16O labeling. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in a large-scale test experiment performed on these cells, producing false expression changes. A random effects model was developed including four different sources of variance at the spectrum-fitting, scan, peptide, and protein levels. With the new model the number of outliers at scan and peptide levels was negligible in three large-scale experiments, and only one false protein expression change was observed in the test experiment among more than 1000 proteins. The new model allowed the detection of significant protein expression changes upon VEGF stimulation for 4 and 8 h. The consistency of the changes observed at 4 h was confirmed by a replica at a smaller scale and further validated by Western blot analysis of some proteins. Most of the observed changes have not been described previously and are consistent with a pattern of protein expression that dynamically changes over time following the evolution of the angiogenic response. With this statistical model the 18O labeling approach emerges as a very promising and robust alternative to perform quantitative proteomics studies at a depth of several thousand proteins. PMID:19181660
Basic biostatistics for post-graduate students
Dakhale, Ganesh N.; Hiware, Sachin K.; Shinde, Abhijit T.; Mahatme, Mohini S.
2012-01-01
Statistical methods are important for drawing valid conclusions from the data obtained. This article provides background information on fundamental methods and techniques in biostatistics for the use of postgraduate students. The main focus is on types of data, measures of central tendency and variation, and basic tests useful for the analysis of different types of observations. A few topics, such as the normal distribution, calculation of sample size, level of significance, the null hypothesis, indices of variability, and the different tests, are explained in detail with suitable examples. Using these guidelines, we are confident that postgraduate students will be able to classify the distribution of their data and apply the proper test. Information is also given on various free software programs and websites useful for statistical calculations. Thus, postgraduate students will benefit whether they opt for academia or for industry. PMID:23087501
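As a worked example of one item in this list, the sketch below applies the standard two-group sample-size formula for comparing means (two-sided α = 0.05, power 0.80); the effect size and standard deviation are illustrative only.

```python
# Worked example of a standard sample-size calculation for comparing two means
# via the two-sample z-approximation; the inputs below are illustrative only.
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """n per group to detect a mean difference `delta` with common SD `sigma`."""
    z_a = norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_b = norm.ppf(power)           # desired power
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

print(n_per_group(delta=5.0, sigma=10.0))   # -> 63 per group
```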
Buu, Anne; Williams, L Keoki; Yang, James J
2018-03-01
We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that would not otherwise be chosen using marginal tests.
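The sketch below illustrates the general idea only, not the authors' numerical estimation method: per-phenotype P-values are combined with Fisher's statistic and calibrated against an empirical null obtained by permuting the genotype, which preserves the correlation between the binary and continuous phenotypes. The data, models, and marginal tests are all hypothetical choices.

```python
# Fisher's combination statistic T = -2 * sum(log p) for one marker against two
# correlated phenotypes, calibrated with a genotype-permutation null (illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
geno = rng.binomial(2, 0.3, size=n)                              # additive genotype coding
y_cont = 0.2 * geno + rng.normal(size=n)                         # continuous phenotype
y_bin = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * geno - 0.5))))   # binary phenotype

def fisher_T(g):
    p1 = stats.linregress(g, y_cont).pvalue                          # continuous-trait test
    p2 = stats.mannwhitneyu(g[y_bin == 1], g[y_bin == 0]).pvalue     # binary-trait test
    return -2 * (np.log(p1) + np.log(p2))

T_obs = fisher_T(geno)
# Permuting the genotype breaks both associations but keeps the phenotype correlation.
T_null = np.array([fisher_T(rng.permutation(geno)) for _ in range(2000)])
print("empirical p =", (np.sum(T_null >= T_obs) + 1) / (len(T_null) + 1))
```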
Caveolins/caveolae protect adipocytes from fatty acid-mediated lipotoxicity.
Meshulam, Tova; Breen, Michael R; Liu, Libin; Parton, Robert G; Pilch, Paul F
2011-08-01
Mice and humans lacking functional caveolae are dyslipidemic and have reduced fat stores and smaller fat cells. To test the role of caveolins/caveolae in maintaining lipid stores and adipocyte integrity, we compared lipolysis in caveolin-1 (Cav1)-null fat cells to that in cells reconstituted for caveolae by caveolin-1 re-expression. We find that the Cav1-null cells have a modestly enhanced rate of lipolysis and reduced cellular integrity compared with reconstituted cells as determined by the release of lipid metabolites and lactic dehydrogenase, respectively, into the media. There are no apparent differences in the levels of lipolytic enzymes or hormonally stimulated phosphorylation events in the two cell lines. In addition, acute fasting, which dramatically raises circulating fatty acid levels in vivo, causes a significant upregulation of caveolar protein constituents. These results are consistent with the hypothesis that caveolae protect fat cells from the lipotoxic effects of elevated levels of fatty acids, which are weak detergents at physiological pH, by virtue of the property of caveolae to form detergent-resistant membrane domains.
Investigating Soil Moisture Feedbacks on Precipitation With Tests of Granger Causality
NASA Astrophysics Data System (ADS)
Salvucci, G. D.; Saleem, J. A.; Kaufmann, R.
2002-05-01
Granger causality (GC) is used in the econometrics literature to identify the presence of one- and two-way coupling between terms in noisy multivariate dynamical systems. Here we test for the presence of GC to identify a soil moisture (S) feedback on precipitation (P) using data from Illinois. In this framework S is said to Granger cause P if F(P_t | A_{t-dt}) ≠ F(P_t | (A−S)_{t-dt}), where F denotes the conditional distribution of P at time t, A_{t-dt} represents the set of all knowledge available at time t-dt, and (A−S)_{t-dt} represents all knowledge available at t-dt except S. Critical for land-atmosphere interaction research is that A_{t-dt} includes all past information on P as well as S. Therefore, that part of the relation between past soil moisture and current precipitation which results from precipitation autocorrelation and soil water balance will be accounted for and not attributed to causality. Tests for GC usually specify all relevant variables in a coupled vector autoregressive (VAR) model and then calculate the significance level of decreased predictability as various coupling coefficients are omitted. But because the data (daily precipitation and soil moisture) are distinctly non-Gaussian, we avoid using a VAR and instead express the daily precipitation events as a Markov model. We then test whether the probability of storm occurrence, conditioned on past information on precipitation, changes with information on soil moisture. Past information on precipitation is expressed both as the occurrence of previous-day precipitation (to account for storm-scale persistence) and as a simple soil moisture-like precipitation-wetness index derived solely from precipitation (to account for seasonal-scale persistence). In this way only those fluctuations in moisture not attributable to past fluctuations in precipitation (e.g., those due to temperature) can influence the outcome of the test. The null hypothesis (no moisture influence) is evaluated by comparing observed changes in storm probability to Monte Carlo-simulated differences generated with unconditional occurrence probabilities. The null hypothesis is not rejected (p > 0.5), suggesting that, contrary to recently published results, insufficient evidence exists to support an influence of soil moisture on precipitation in Illinois.
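A stripped-down sketch of the Monte Carlo logic is given below; it conditions only on soil wetness and omits the previous-day precipitation and precipitation-wetness index used in the study, and the data are simulated under the null purely for illustration.

```python
# Monte Carlo test of whether soil wetness changes the probability of storm
# occurrence, calibrated with simulations that use the unconditional storm
# probability (simplified illustration; simulated data).
import numpy as np

rng = np.random.default_rng(7)
n_days = 3000
rain = rng.random(n_days) < 0.3            # storm occurrence indicator
soil_wet = rng.random(n_days) < 0.5        # "wet soil" indicator (independent here)

def prob_difference(rain, soil_wet):
    """P(storm | wet soil yesterday) - P(storm | dry soil yesterday)."""
    nxt, prev_wet = rain[1:], soil_wet[:-1]
    return nxt[prev_wet].mean() - nxt[~prev_wet].mean()

obs = prob_difference(rain, soil_wet)
p_storm = rain.mean()                      # unconditional occurrence probability
null = np.array([prob_difference(rng.random(n_days) < p_storm, soil_wet)
                 for _ in range(5000)])
p_value = np.mean(np.abs(null) >= abs(obs))
print(f"observed difference = {obs:.3f}, Monte Carlo p = {p_value:.3f}")
```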
Wang, Hong-Qiang; Tsai, Chung-Jui
2013-01-01
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
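As an editorial illustration of the single-pair building block that such a test rests on (not the full CorSig procedure, which embeds Fisher's Z in a multiple-testing framework), the sketch below computes a one-sided p-value for H1: ρ > τ from an observed correlation r and sample size n.

```python
# Test H0: rho <= tau against H1: rho > tau for an observed Pearson correlation r
# from n samples, using Fisher's Z transformation (single-pair building block only).
import numpy as np
from scipy.stats import norm

def corr_greater_than_cutoff(r, n, tau):
    """One-sided p-value for H1: the true correlation exceeds the cutoff tau."""
    z = (np.arctanh(r) - np.arctanh(tau)) * np.sqrt(n - 3)
    return norm.sf(z)

print(corr_greater_than_cutoff(r=0.62, n=40, tau=0.5))   # p for "rho > 0.5"
print(corr_greater_than_cutoff(r=0.62, n=40, tau=0.0))   # reduces to the usual rho = 0 test
```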
The disruption of central CO2 chemosensitivity in a mouse model of Rett syndrome
Zhang, Xiaoli; Su, Junda; Cui, Ningren; Gai, Hongyu; Wu, Zhongying
2011-01-01
People with Rett syndrome (RTT) have breathing instability in addition to other neuropathological manifestations. The breathing disturbances contribute to the high incidence of unexplained death and abnormal brain development. However, the cellular mechanisms underlying the breathing abnormalities remain unclear. To test the hypothesis that the central CO2 chemoreception in these people is disrupted, we studied the CO2 chemosensitivity in a mouse model of RTT. The Mecp2-null mice showed a selective loss of their respiratory response to 1–3% CO2 (mild hypercapnia), whereas they displayed more regular breathing in response to 6–9% CO2 (severe hypercapnia). The defect was alleviated with the NE uptake blocker desipramine (10 mg·kg−1·day−1 ip, for 5–7 days). Consistent with the in vivo observations, in vitro studies in brain slices indicated that CO2 chemosensitivity of locus coeruleus (LC) neurons was impaired in Mecp2-null mice. Two major neuronal pH-sensitive Kir currents that resembled homomeric Kir4.1 and heteromeric Kir4.1/Kir5.1 channels were identified in the LC neurons. The screening of Kir channels with real-time PCR indicated the overexpression of Kir4.1 in the LC region of Mecp2-null mice. In a heterologous expression system, an overexpression of Kir4.1 resulted in a reduction in the pH sensitivity of the heteromeric Kir4.1-Kir5.1 channels. Given that Kir4.1 and Kir5.1 subunits are also expressed in brain stem respiration-related areas, the Kir4.1 overexpression may not allow CO2 to be detected until hypercapnia becomes severe, leading to periodic hyper- and hypoventilation in Mecp2-null mice and, perhaps, in people with RTT as well. PMID:21307341
Alterations in the cholinergic system of brain stem neurons in a mouse model of Rett syndrome.
Oginsky, Max F; Cui, Ningren; Zhong, Weiwei; Johnson, Christopher M; Jiang, Chun
2014-09-15
Rett syndrome is an autism-spectrum disorder resulting from mutations to the X-linked gene, methyl-CpG binding protein 2 (MeCP2), which causes abnormalities in many systems. It is possible that the body may develop certain compensatory mechanisms to alleviate the abnormalities. The norepinephrine system originating mainly in the locus coeruleus (LC) is defective in Rett syndrome and Mecp2-null mice. LC neurons are subject to modulation by GABA, glutamate, and acetylcholine (ACh), providing an ideal system to test the compensatory hypothesis. Here we show evidence for potential compensatory modulation of LC neurons by post- and presynaptic ACh inputs. We found that the postsynaptic currents of nicotinic ACh receptors (nAChR) were smaller in amplitude and longer in decay time in the Mecp2-null mice than in the wild type. Single-cell PCR analysis showed a decrease in the expression of α3-, α4-, α7-, and β3-subunits and an increase in the α5- and α6-subunits in the mutant mice. The α5-subunit was present in many of the LC neurons with slow-decay nAChR currents. The nicotinic modulation of spontaneous GABAA-ergic inhibitory postsynaptic currents in LC neurons was enhanced in Mecp2-null mice. In contrast, the nAChR manipulation of glutamatergic input to LC neurons was unaffected in both groups of mice. Our current-clamp studies showed that the modulation of LC neurons by ACh input was reduced moderately in Mecp2-null mice, despite the major decrease in nAChR currents, suggesting possible compensatory processes may take place, thus reducing the defects to a lesser extent in LC neurons. Copyright © 2014 the American Physiological Society.
Stevens, Karen E; Choo, Kevin S; Stitzel, Jerry A; Marks, Michael J; Adams, Catherine E
2014-03-13
Perinatal choline supplementation has produced several benefits in rodent models, from improved learning and memory to protection from the behavioral effects of fetal alcohol exposure. We have shown that supplemented choline through gestation and lactation produces long-term improvement in deficient sensory inhibition in DBA/2 mice which models a similar deficit in schizophrenia patients. The present study extends that research by feeding normal or supplemented choline diets to DBA/2 mice carrying the null mutation for the α7 nicotinic receptor gene (Chrna7). DBA/2 mice heterozygotic for Chrna7 were bred together. Dams were placed on supplemented (5 gm/kg diet) or normal (1.1 gm/kg diet) choline at mating and remained on the specific diet until offspring weaning. Thereafter, offspring were fed standard rodent chow. Adult offspring were assessed for sensory inhibition. Brains were obtained to ascertain hippocampal α7 nicotinic receptor levels. Choline-supplemented mice heterozygotic or null-mutant for Chrna7 failed to show improvement in sensory inhibition. Only wildtype choline-supplemented mice showed improvement with the effect solely through a decrease in test amplitude. This supports the hypothesis that gestational-choline supplementation is acting through the α7 nicotinic receptor to improve sensory inhibition. Although there was a significant gene-dose-related change in hippocampal α7 receptor numbers, binding studies did not reveal any choline-dose-related change in binding in any hippocampal region, the interaction being driven by a significant genotype main effect (wildtype>heterozygote>null mutant). These data parallel a human study wherein the offspring of pregnant women receiving choline supplementation during gestation, showed better sensory inhibition than offspring of women on placebo. Published by Elsevier B.V.
Gotelli, Nicholas J.; Dorazio, Robert M.; Ellison, Aaron M.; Grossman, Gary D.
2010-01-01
Quantifying patterns of temporal trends in species assemblages is an important analytical challenge in community ecology. We describe methods of analysis that can be applied to a matrix of counts of individuals that is organized by species (rows) and time-ordered sampling periods (columns). We first developed a bootstrapping procedure to test the null hypothesis of random sampling from a stationary species abundance distribution with temporally varying sampling probabilities. This procedure can be modified to account for undetected species. We next developed a hierarchical model to estimate species-specific trends in abundance while accounting for species-specific probabilities of detection. We analysed two long-term datasets on stream fishes and grassland insects to demonstrate these methods. For both assemblages, the bootstrap test indicated that temporal trends in abundance were more heterogeneous than expected under the null model. We used the hierarchical model to estimate trends in abundance and identified sets of species in each assemblage that were steadily increasing, decreasing or remaining constant in abundance over more than a decade of standardized annual surveys. Our methods of analysis are broadly applicable to other ecological datasets, and they represent an advance over most existing procedures, which do not incorporate effects of incomplete sampling and imperfect detection.
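The sketch below illustrates the flavour of the first, bootstrap step: counts are resampled from a fixed (stationary) relative-abundance distribution with the observed per-period totals, and a chi-square-type heterogeneity statistic is compared with its bootstrap distribution. The statistic and data are invented for illustration, and the published procedure additionally accounts for undetected species and imperfect detection.

```python
# Bootstrap test of random sampling from a stationary species abundance
# distribution with temporally varying sample sizes (toy data and statistic).
import numpy as np

rng = np.random.default_rng(3)
# Toy species-by-period count matrix (3 species, 10 sampling periods)
counts = rng.poisson(lam=np.array([30.0, 12.0, 5.0])[:, None], size=(3, 10))

def heterogeneity(mat):
    """Chi-square-type statistic against expected counts from the matrix margins."""
    mat = np.asarray(mat, dtype=float)
    expected = np.outer(mat.sum(axis=1) / mat.sum(), mat.sum(axis=0))
    return np.sum((mat - expected) ** 2 / expected)

obs = heterogeneity(counts)
pooled_p = counts.sum(axis=1) / counts.sum()     # stationary relative abundances
totals = counts.sum(axis=0)                      # per-period sample sizes (held fixed)
boot = np.array([
    heterogeneity(np.column_stack([rng.multinomial(t, pooled_p) for t in totals]))
    for _ in range(2000)
])
print("bootstrap p-value =", np.mean(boot >= obs))
```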
Pioglitazone in early Parkinson's disease: a phase 2, multicentre, double-blind, randomised trial
2015-01-01
Background: A systematic assessment of potential disease-modifying compounds for Parkinson's disease concluded that pioglitazone could hold promise for the treatment of patients with this disease. We assessed the effect of pioglitazone on the progression of Parkinson's disease in a multicentre, double-blind, placebo-controlled, futility clinical trial. Methods: Participants with the diagnosis of early Parkinson's disease on a stable regimen of 1 mg/day rasagiline or 10 mg/day selegiline were randomly assigned (1:1:1) to 15 mg/day pioglitazone, 45 mg/day pioglitazone, or placebo. Investigators were masked to the treatment assignment. Only the statistical centre and the central pharmacy knew the treatment name associated with the randomisation number. The primary outcome was the change in the total Unified Parkinson's Disease Rating Scale (UPDRS) score between the baseline and 44 weeks, analysed by intention to treat. The primary null hypothesis for each dose group was that the mean change in UPDRS was 3 points less than the mean change in the placebo group. The alternative hypothesis (of futility) was that pioglitazone is not meaningfully different from placebo. We rejected the null if there was significant evidence of futility at the one-sided alpha level of 0.10. The study is registered at ClinicalTrials.gov, number NCT01280123. Findings: 210 patients from 35 sites in the USA were enrolled between May 10, 2011, and July 31, 2013. The primary analysis included 72 patients in the 15 mg group, 67 in the 45 mg group, and 71 in the placebo group. The mean total UPDRS change at 44 weeks was 4.42 (95% CI 2.55–6.28) for 15 mg pioglitazone, 5.13 (95% CI 3.17–7.08) for 45 mg pioglitazone, and 6.25 (95% CI 4.35–8.15) for placebo (higher change scores are worse). The mean difference between the 15 mg and placebo groups was −1.83 (80% CI −3.56 to −0.10) and the null hypothesis could not be rejected (p=0.19). The mean difference between the 45 mg and placebo groups was −1.12 (80% CI −2.93 to 0.69) and the null hypothesis was rejected in favour of futility (p=0.09). Planned sensitivity analyses of the primary outcome, using last value carried forward (LVCF) to handle missing data and using the completers' only sample, suggested that the 15 mg dose is also futile (p=0.09 for LVCF, p=0.09 for completers) but failed to reject the null hypothesis for the 45 mg dose (p=0.12 for LVCF, p=0.19 for completers). Six serious adverse events occurred in the 15 mg group, nine in the 45 mg group, and three in the placebo group; none were thought to be definitely or probably related to the study interventions. Interpretation: These findings suggest that pioglitazone at the doses studied here is unlikely to modify progression in early Parkinson's disease. Further study of pioglitazone in a larger trial in patients with Parkinson's disease is not recommended. Funding: National Institute of Neurological Disorders and Stroke. PMID:26116315
A null model for microbial diversification
Straub, Timothy J.
2017-01-01
Whether prokaryotes (Bacteria and Archaea) are naturally organized into phenotypically and genetically cohesive units comparable to animal or plant species remains contested, frustrating attempts to estimate how many such units there might be, or to identify the ecological roles they play. Analyses of gene sequences in various closely related prokaryotic groups reveal that sequence diversity is typically organized into distinct clusters, and processes such as periodic selection and extensive recombination are understood to be drivers of cluster formation (“speciation”). However, observed patterns are rarely compared with those obtainable with simple null models of diversification under stochastic lineage birth and death and random genetic drift. Via a combination of simulations and analyses of core and phylogenetic marker genes, we show that patterns of diversity for the genera Escherichia, Neisseria, and Borrelia are generally indistinguishable from patterns arising under a null model. We suggest that caution should thus be taken in interpreting observed clustering as a result of selective evolutionary forces. Unknown forces do, however, appear to play a role in Helicobacter pylori, and some individual genes in all groups fail to conform to the null model. Taken together, we recommend the presented birth−death model as a null hypothesis in prokaryotic speciation studies. It is only when the real data are statistically different from the expectations under the null model that some speciation process should be invoked. PMID:28630293
2015-01-01
Objective: A study to compare throat swab testing for leukocyte esterase on a test strip (urine dipstick/multistick) with the rapid strep test for rapid diagnosis of Group A beta-hemolytic streptococci in cases of acute pharyngitis in children. Hypothesis: Testing a throat swab for leukocyte esterase on the test strip currently used for urine testing may detect throat infection and might be as useful as the rapid strep test. Methods: All patients presenting with a complaint of sore throat and fever were examined clinically for erythema of the pharynx and tonsils and for any exudates. Informed consent was obtained from the parents and assent from the subjects. Three swabs were taken from the pharyngo-tonsillar region for culture, rapid strep, and leukocyte esterase (LE) testing. Results: Of 100 subjects, 9 cultures were positive; the rapid strep test was negative in 84 and positive in 16; the LE test was negative in 80 and positive in 20. Statistics: The rapid strep and LE results were not independent but strongly aligned, and the two tests gave very similar results; the calculated chi-squared value exceeded the tabulated value at 1 degree of freedom (P < 0.0001), so the null hypothesis was rejected in favour of the alternative. Conclusions: Leukocyte esterase testing of a throat swab, on the test strip currently used for urine dipsticks, is as useful as the rapid strep test for rapid diagnosis of streptococcal pharyngitis in children. PMID:27335975
Han, Buhm; Kang, Hyun Min; Eskin, Eleazar
2009-01-01
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
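For orientation, the sketch below shows the basic MVN idea that such corrections build on: simulate correlated null statistics from N(0, R), where R is a marker correlation matrix, and adjust an observed per-marker p-value by the null distribution of the minimum p-value. SLIDE's sliding-window construction and its correction for departure from the asymptotic tail are not reproduced, and the correlation matrix here is a toy example.

```python
# MVN-based "minP" multiple-testing correction for correlated markers (toy example).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
m = 50
# Toy correlation matrix with exponentially decaying, LD-like structure
R = np.array([[0.8 ** abs(i - j) for j in range(m)] for i in range(m)])

Z = rng.multivariate_normal(np.zeros(m), R, size=20000)   # null test statistics
min_p_null = 2 * norm.sf(np.abs(Z)).min(axis=1)           # per-replicate minimum two-sided p

p_observed = 1e-3
p_corrected = np.mean(min_p_null <= p_observed)           # multiplicity-adjusted p-value
print(f"pointwise p = {p_observed:.1e}, corrected p = {p_corrected:.3f}")
print("Bonferroni would give", min(1.0, m * p_observed))  # ignores the correlation
```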
Gauran, Iris Ivy M; Park, Junyong; Lim, Johan; Park, DoHwan; Zylstra, John; Peterson, Thomas; Kann, Maricel; Spouge, John L
2017-09-22
In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches, since the latter have limitations in considering the functional context that the position of the mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model, in order to account for both the true zeros in the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assumed that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, so that only mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure in estimation of the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two-stage testing procedure has superior empirical power. 2017 The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
FUNGIBILITY AND CONSUMER CHOICE: EVIDENCE FROM COMMODITY PRICE SHOCKS.
Hastings, Justine S; Shapiro, Jesse M
2013-11-01
We formulate a test of the fungibility of money based on parallel shifts in the prices of different quality grades of a commodity. We embed the test in a discrete-choice model of product quality choice and estimate the model using panel microdata on gasoline purchases. We find that when gasoline prices rise consumers substitute to lower octane gasoline, to an extent that cannot be explained by income effects. Across a wide range of specifications, we consistently reject the null hypothesis that households treat "gas money" as fungible with other income. We compare the empirical fit of three psychological models of decision-making. A simple model of category budgeting fits the data well, with models of loss aversion and salience both capturing important features of the time series.
OSO 8 observational limits to the acoustic coronal heating mechanism
NASA Technical Reports Server (NTRS)
Bruner, E. C., Jr.
1981-01-01
An improved analysis of time-resolved line profiles of the C IV resonance line at 1548 A has been used to test the acoustic wave hypothesis of solar coronal heating. It is shown that the observed motions and brightness fluctuations are consistent with the existence of acoustic waves. Specific account is taken of the effect of photon statistics on the observed velocities, and a test is devised to determine whether the motions represent propagating or evanescent waves. It is found that, on average, about as much energy is carried upward as downward, such that the net acoustic flux density is statistically consistent with zero. The statistical uncertainty in this null result is three orders of magnitude lower than the flux level needed to heat the corona.
Status and Power Do Not Modulate Automatic Imitation of Intransitive Hand Movements
Farmer, Harry; Carr, Evan W.; Svartdal, Marita; Winkielman, Piotr; Hamilton, Antonia F. de C.
2016-01-01
The tendency to mimic the behaviour of others is affected by a variety of social factors, and it has been argued that such “mirroring” is often unconsciously deployed as a means of increasing affiliation during interpersonal interactions. However, the relationship between automatic motor imitation and status/power is currently unclear. This paper reports five experiments that investigated whether social status (Experiments 1, 2, and 3) or power (Experiments 4 and 5) had a moderating effect on automatic imitation (AI) in finger-movement tasks, using a series of different manipulations. Experiments 1 and 2 manipulated the social status of the observed person using an associative learning task. Experiment 3 manipulated social status via perceived competence at a simple computer game. Experiment 4 manipulated participants’ power (relative to the actors) in a card-choosing task. Finally, Experiment 5 primed participants using a writing task, to induce the sense of being powerful or powerless. No significant interactions were found between congruency and social status/power in any of the studies. Additionally, Bayesian hypothesis testing indicated that the null hypothesis should be favoured over the experimental hypothesis in all five studies. These findings are discussed in terms of their implications for AI tasks, social effects on mimicry, and the hypothesis of mimicry as a strategic mechanism to promote affiliation. PMID:27096167
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robinowitz, R.; Roberts, W.R.; Dolan, M.P.
1989-09-01
This study asked what the psychological characteristics are of Vietnam combat veterans who claim Agent Orange exposure compared with combat-experienced cohorts who do not report such contamination. The question was researched among 153 heroin addicts, polydrug abusers, and chronic alcoholics who were seeking treatment: 58 reported moderate to high defoliant exposure while in combat; 95 reported minimal to no exposure while in Vietnam. The null hypothesis was accepted for measures of childhood and present family social climate, premilitary backgrounds, reasons for seeking treatment, patterns and types of illicit drug and alcohol use, interpersonal problems, intellectual functioning, and short-term memory. The null hypothesis was rejected for personality differences, however: those who self-reported high Agent Orange exposure scored significantly higher on MMPI scales F, Hypochondriasis, Depression, Paranoia, Psychasthenia, Schizophrenia, Mania, and Social Introversion. The results suggest that clinicians should carefully assess the attributional processing of those who report traumatic experience.
Changing world extreme temperature statistics
NASA Astrophysics Data System (ADS)
Finkel, J. M.; Katz, J. I.
2018-04-01
We use the Global Historical Climatology Network-Daily database to calculate a nonparametric statistic that describes the rate at which all-time daily high and low temperature records have been set in nine geographic regions (continents or major portions of continents) during periods mostly from the mid-20th century to the present. This statistic was defined in our earlier work on temperature records in the 48 contiguous United States. In contrast to this earlier work, we find that in every region except North America all-time high records were set at a rate significantly (at least 3σ) higher than in the null hypothesis of a stationary climate. Except in Antarctica, all-time low records were set at a rate significantly lower than in the null hypothesis. In Europe, North Africa and North Asia the rate of setting new all-time highs increased suddenly in the 1990s, suggesting a change in regional climate regime; in most other regions there was a steadier increase.
Silva, Ivair R
2018-01-15
Type I error probability spending functions are commonly used for designing sequential analyses of binomial data in clinical trials, and their use is also quickly emerging in near-continuous sequential analysis for post-market drug and vaccine safety surveillance. It is well known that, for clinical trials, when the null hypothesis is not rejected, it is still important to minimize the sample size. In post-market drug and vaccine safety surveillance, by contrast, that is not the priority: especially when the surveillance involves identification of potential signals, the meaningful statistical performance measure to be minimized is the expected sample size when the null hypothesis is rejected. The present paper shows that, instead of the convex Type I error spending shape conventionally used in clinical trials, a concave shape is more indicated for post-market drug and vaccine safety surveillance. This is shown for both continuous and group sequential analysis. Copyright © 2017 John Wiley & Sons, Ltd.
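The shape distinction the abstract draws can be illustrated with the one-parameter power family of spending functions, α·t^ρ, which is convex for ρ > 1 and concave for ρ < 1. The sketch below (Python, illustrative values only, not the paper's design code) prints the cumulative error spent at a grid of information fractions under both shapes.

import numpy as np

def power_spending(t, alpha=0.05, rho=2.0):
    """Cumulative Type I error spent by information fraction t in [0, 1].

    rho > 1 -> convex shape (spends little error early),
    rho < 1 -> concave shape (spends more error early).
    """
    t = np.clip(np.asarray(t, dtype=float), 0.0, 1.0)
    return alpha * t ** rho

info_fractions = np.linspace(0.1, 1.0, 10)           # illustrative analysis times
convex = power_spending(info_fractions, rho=2.0)      # clinical-trial style
concave = power_spending(info_fractions, rho=0.5)     # surveillance style (assumed)
for t, c1, c2 in zip(info_fractions, convex, concave):
    print(f"t={t:.1f}  convex={c1:.4f}  concave={c2:.4f}")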
Patterns in the English language: phonological networks, percolation and assembly models
NASA Astrophysics Data System (ADS)
Stella, Massimo; Brede, Markus
2015-05-01
In this paper we provide a quantitative framework for the study of phonological networks (PNs) for the English language by carrying out principled comparisons to null models, either based on site percolation, randomization techniques, or network growth models. In contrast to previous work, we mainly focus on null models that reproduce lower order characteristics of the empirical data. We find that artificial networks matching connectivity properties of the English PN are exceedingly rare: this leads to the hypothesis that the word repertoire might have been assembled over time by preferentially introducing new words which are small modifications of old words. Our null models are able to explain the ‘power-law-like’ part of the degree distributions and generally retrieve qualitative features of the PN such as high clustering, high assortativity coefficient and small-world characteristics. However, the detailed comparison to expectations from null models also points out significant differences, suggesting the presence of additional constraints in word assembly. Key constraints we identify are the avoidance of large degrees, the avoidance of triadic closure and the avoidance of large non-percolating clusters.
Share Market Analysis Using Various Economical Determinants to Predict Decision of Investors
NASA Astrophysics Data System (ADS)
Ghosh, Arijit; Roy, Samrat; Bandyopadhyay, Gautam; Choudhuri, Kripasindhu
2010-10-01
The following paper develops six major hypotheses concerning the Bombay Stock Exchange (BSE) in India. The paper tests these hypotheses by collecting field data on six economic determinants: oil prices, gold price, Cash Reserve Ratio, food price inflation, call money rate and Dollar price. The research uses these data as indicators to identify the relationship with, and level of influence on, share prices of the Bombay Stock Exchange by rejecting or accepting the null hypotheses.
NASA Astrophysics Data System (ADS)
Al-Sarrani, Nauaf
The purpose of this study was to identify Science faculty concerns and professional development needs for adopting blended learning (BL) in their teaching at Taibah University. To answer the two research questions, the survey instrument was designed to collect quantitative and qualitative data from closed-ended and open-ended questions. The participants' general characteristics were presented first, followed by the quantitative results for the null hypotheses. The data analysis for research question one revealed a statistically significant difference in the participants' concerns in adopting BL by gender (sig = .0015). Significant differences were found at stage one (sig = .000) and stage five (sig = .006) for female faculty. Therefore, null hypothesis 1.1 was rejected (There are no statistically significant differences between Science faculty's gender and their concerns in adopting BL). The data analysis also indicated that there were no relationships between Science faculty's age, academic rank, nationality, country of graduation, or years of teaching experience and their concerns in adopting BL in their teaching, so null hypotheses 1.2-1.7 were accepted (There are no statistically significant differences between Science faculty's age and their concerns in adopting BL; there are no statistically significant differences between Science faculty's academic rank and their concerns in adopting BL; there are no statistically significant differences between Science faculty's nationality and their concerns in adopting BL; there are no statistically significant differences between Science faculty's content area and their concerns in adopting BL; there are no statistically significant differences between Science faculty's country of graduation and their concerns in adopting BL; and there are no statistically significant differences between Science faculty's years of teaching experience and their concerns in adopting BL). The data analyses for research question two revealed that there was a statistically significant difference between Science faculty's use of technology in teaching by department and their attitudes towards technology integration in the Science curriculum; the Lambda MANOVA test result was sig = .019 at the alpha = .05 level. Follow-up ANOVA results indicated that the Chemistry department differed significantly in computer-based technology use (sig = .049) and instructional technology use (sig = .041). Therefore, null hypothesis 2.1 was rejected (There are no statistically significant differences between Science faculty's attitudes towards technology integration in the Science curriculum and faculty's use of technology in teaching by department). The data also revealed that there was no statistically significant difference (p < .05) between Science faculty's use of technology in teaching by department and their instructional technology use on pedagogy. Therefore, null hypothesis 2.2 was accepted (There are no statistically significant differences between Science faculty's perceptions of the effects of faculty IT use on pedagogy and faculty's use of technology in teaching by department). The data also revealed that there was a statistically significant difference between Science faculty's use of technology in teaching by department and their professional development needs in adopting BL; the Lambda MANOVA test result was sig = .007 at the alpha = .05 level.
The follow-up ANOVA results showed that the significance value for Science faculty's professional development needs for adopting BL was smaller than .05 in the Chemistry department, with sig = .001 for instructional technology use. Therefore, null hypothesis 2.3 was rejected (There are no statistically significant differences between Science faculty's perceptions of technology professional development needs and faculty's use of technology in teaching by department). Qualitative measures included analyzing data based on answers to three open-ended questions, numbers thirty-six, seventy-four, and seventy-five. These three questions concerned comments on blended learning concerns (question 36, which had 10 units), professional development activities, support, or incentives requested (question 74, which had 28 units), and the most important professional development activities, support, or incentives (question 75, which had 37 units). These questions yielded 75 units, 23 categories, and 8 themes that triangulated with the quantitative data. These 8 themes were then combined to obtain overall themes for all qualitative questions in the study. The two most important overall themes were "Professional development", with three categories: professional development through workshops (10 units), workshops (10 units), and professional development (5 units); and "Technical support", with two categories: Internet connectivity (4 units) and technical support (4 units). Finally, based on the quantitative and qualitative data, the summary, conclusions, and recommendations for Taibah University regarding faculty adoption of BL in teaching were presented. The recommendations for future studies focused on Science faculty Level of Use and technology use in Saudi universities.
NASA Astrophysics Data System (ADS)
Goff, J.; Zahirovic, S.; Müller, D.
2017-12-01
Recently published spectral analyses of seafloor bathymetry concluded that abyssal hills, highly linear ridges that are formed along seafloor spreading centers, exhibit periodicities that correspond to Milankovitch cycles - variations in Earth's orbit that affect climate on periods of 23, 41 and 100 thousand years. These studies argue that this correspondence could be explained by modulation of volcanic output at the mid-ocean ridge due to lithostatic pressure variations associated with rising and falling sea level. If true, then the implications are substantial: mapping the topography of the seafloor with sonar could be used as a way to investigate past climate change. This "Milankovitch cycle" hypothesis predicts that the rise and fall of abyssal hills will be correlated with crustal age, which can be tested by stacking, or averaging, bathymetry as a function of age; stacking will enhance any age-dependent signal while suppressing random components, such as fault-generated topography. We apply age-stacking to data flanking the Southeast Indian Ridge (~3.6 cm/yr half rate), northern East Pacific Rise (~5.4 cm/yr half rate) and southern East Pacific Rise (~7.8 cm/yr half rate), where multibeam bathymetric coverage is extensive on the ridge flanks. At the greatest precision possible given magnetic anomaly data coverage, we have revised digital crustal age models in these regions with updated axis and magnetic anomaly traces. We also utilize known 2nd-order spatial statistical properties of abyssal hills to predict the variability of the age-stack under the null hypothesis that abyssal hills are entirely random with respect to crustal age; the age-stacked profile is significantly different from zero only if it exceeds this expected variability by a large margin. Our results indicate, however, that the null hypothesis satisfactorily explains the age-stacking results in all three regions of study, thus providing no support for the Milankovitch cycle hypothesis. The random nature of abyssal hills is consistent with a primarily faulted origin.
Soave, David; Sun, Lei
2017-09-01
We generalize Levene's test for variance (scale) heterogeneity between k groups for more complex data, when there are sample correlation and group membership uncertainty. Following a two-stage regression framework, we show that least absolute deviation regression must be used in the stage 1 analysis to ensure a correct asymptotic χ²_{k-1}/(k-1) distribution of the generalized scale (gS) test statistic. We then show that the proposed gS test is independent of the generalized location test, under the joint null hypothesis of no mean and no variance heterogeneity. Consequently, we generalize the recently proposed joint location-scale (gJLS) test, valuable in settings where there is an interaction effect but one interacting variable is not available. We evaluate the proposed method via an extensive simulation study and two genetic association application studies. © 2017 The Authors Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
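A rough two-stage sketch of a Levene-type scale test of the kind described, assuming independent observations and known group membership (so it omits the sample-correlation and membership-uncertainty features that motivate the gS test). Stage 1 uses least absolute deviation (median) regression, as the abstract requires; stage 2 applies a classical ANOVA to the absolute residuals. This is not the authors' implementation.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
k = 3
groups = np.repeat(np.arange(k), 50)
y = rng.normal(0, 1 + 0.5 * groups)             # variance differs across groups

# Stage 1: least absolute deviation (median) regression of y on group indicators.
X = sm.add_constant(np.eye(k)[groups][:, 1:])   # drop first dummy to avoid collinearity
lad_fit = sm.QuantReg(y, X).fit(q=0.5)
abs_resid = np.abs(y - lad_fit.fittedvalues)

# Stage 2: one-way ANOVA on the absolute residuals (a Levene-type scale test).
fstat, pval = stats.f_oneway(*(abs_resid[groups == g] for g in range(k)))
print(f"F = {fstat:.3f}, p = {pval:.4f}")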
Testing goodness of fit in regression: a general approach for specified alternatives.
Solari, Aldo; le Cessie, Saskia; Goeman, Jelle J
2012-12-10
When fitting generalized linear models or the Cox proportional hazards model, it is important to have tools to test for lack of fit. Because lack of fit comes in all shapes and sizes, distinguishing among different types of lack of fit is of practical importance. We argue that an adequate diagnosis of lack of fit requires a specified alternative model. Such specification identifies the type of lack of fit the test is directed against, so that if we reject the null hypothesis, we know the direction of the departure from the model. The goodness-of-fit approach of this paper allows different types of lack of fit to be treated within a unified general framework and considers many existing tests as special cases. Connections with penalized likelihood and random effects are discussed, and the application of the proposed approach is illustrated with medical examples. Tailored functions for goodness-of-fit testing have been implemented in the R package globaltest. Copyright © 2012 John Wiley & Sons, Ltd.
A search for optical beacons: implications of null results.
Blair, David G; Zadnik, Marjan G
2002-01-01
Over the past few years a series of searches for interstellar radio beacons have taken place using the Parkes radio telescope. Here we report hitherto unpublished results from a search for optical beacons from 60 solar-type stars using the Perth-Lowell telescope. We discuss the significance of the null results from these searches, all of which were based on the interstellar contact channel hypothesis. While the null results of all searches to date can be explained simply by the nonexistence of electromagnetically communicating life elsewhere in the Milky Way, four other possible explanations that do not preclude its existence are proposed: (1) Extraterrestrial civilizations desiring to make contact through the use of electromagnetic beacons have a very low density in the Milky Way. (2) The interstellar contact channel hypothesis is incorrect, and beacons exist at frequencies that have not yet been searched. (3) The search has been incomplete in terms of sensitivity and/or target directions: Beacons exist, but more sensitive equipment and/or more searching is needed to achieve success. (4) The search has occurred before beacon signals can be expected to have arrived at the Earth, and beacon signals may be expected in the future. Based on consideration of the technology required for extraterrestrial civilizations to identify target planets, we argue that the fourth possibility is likely to be valid and that powerful, easily detectable beacons could be received in coming centuries.
A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies
Sun, Jianping; Zheng, Yingye; Hsu, Li
2013-01-01
For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a priori defined sets (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effects (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that, apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651
Dougherty, Michael R; Hamovitz, Toby; Tidwell, Joe W
2016-02-01
A recent meta-analysis by Au et al. (Psychonomic Bulletin & Review, 22, 366-377, 2015) reviewed the n-back training paradigm for working memory (WM) and evaluated whether (when aggregating across existing studies) there was evidence that gains obtained for training tasks transferred to gains in fluid intelligence (Gf). Their results revealed an overall effect size of g = 0.24 for the effect of n-back training on Gf. We reexamine the data through a Bayesian lens, to evaluate the relative strength of the evidence for the alternative versus null hypotheses, contingent on the type of control condition used. We find that studies using a noncontact (passive) control group strongly favor the alternative hypothesis that training leads to transfer but that studies using active-control groups show modest evidence in favor of the null. We discuss these findings in the context of placebo effects.
de Mendoza, Guillermo; Traunspurger, Walter; Palomo, Alejandro; Catalan, Jordi
2017-05-01
Nematode species are widely tolerant of environmental conditions and disperse passively. Therefore, the species richness distribution in this group might largely depend on the topological distribution of the habitats and main aerial and aquatic dispersal pathways connecting them. If so, the nematode species richness distributions may serve as null models for evaluating that of other groups more affected by environmental gradients. We investigated this hypothesis in lakes across an altitudinal gradient in the Pyrenees. We compared the altitudinal distribution, environmental tolerance, and species richness of nematodes with that of three other invertebrate groups collected during the same sampling: oligochaetes, chironomids, and nonchironomid insects. We tested the altitudinal bias in distributions with t-tests and the significance of narrow-ranging altitudinal distributions with randomizations. We compared results between groups with Fisher's exact tests. We then explored the influence of environmental factors on species assemblages in all groups with redundancy analysis (RDA), using 28 environmental variables. And, finally, we analyzed species richness patterns across altitude with simple linear and quadratic regressions. Nematode species were rarely biased from random distributions (5% of species) in contrast with other groups (35%, 47%, and 50%, respectively). The altitudinal bias most often shifted toward low altitudes (85% of biased species). Nematodes showed a lower portion of narrow-ranging species than any other group, and differed significantly from nonchironomid insects (10% and 43%, respectively). Environmental variables barely explained nematode assemblages (RDA adjusted R² = 0.02), in contrast with other groups (0.13, 0.19 and 0.24). Despite these substantial differences in the response to environmental factors, species richness across altitude was unimodal, peaking at mid elevations, in all groups. This similarity indicates that the spatial distribution of lakes across altitude is a primary driver of invertebrate richness. Provided that nematodes are ubiquitous, their distribution offers potential null models to investigate species richness across environmental gradients in other ecosystem types and biogeographic regions.
Fienup, Daniel M; Critchfield, Thomas S
2010-01-01
Computerized lessons that reflect stimulus equivalence principles were used to teach college students concepts related to inferential statistics and hypothesis decision making. Lesson 1 taught participants concepts related to inferential statistics, and Lesson 2 taught them to base hypothesis decisions on a scientific hypothesis and the direction of an effect. Lesson 3 taught the conditional influence of inferential statistics over decisions regarding the scientific and null hypotheses. Participants entered the study with low scores on the targeted skills and left the study demonstrating a high level of accuracy on these skills, which involved mastering more relations than were taught formally. This study illustrates the efficiency of equivalence-based instruction in establishing academic skills in sophisticated learners. PMID:21358904
Universality hypothesis breakdown at one-loop order
NASA Astrophysics Data System (ADS)
Carvalho, P. R. S.
2018-05-01
We probe the universality hypothesis by analytically computing, at least to two-loop order, the corrections to the critical exponents for q-deformed O(N) self-interacting λφ⁴ scalar field theories through six distinct and independent field-theoretic renormalization group methods and ε-expansion techniques. We show that the effect of the q deformation on the one-loop corrections to the q-deformed critical exponents is null, so the universality hypothesis breaks down at this loop order. Such an effect emerges only at the two-loop and higher levels, and the validity of the universality hypothesis is restored. The q-deformed critical exponents obtained through the six methods are the same and, furthermore, reduce to their nondeformed values in the appropriate limit.
Collins, Ryan L; Hu, Ting; Wejse, Christian; Sirugo, Giorgio; Williams, Scott M; Moore, Jason H
2013-02-18
Identifying high-order genetic associations with non-additive (i.e. epistatic) effects in population-based studies of common human diseases is a computational challenge. Multifactor dimensionality reduction (MDR) is a machine learning method that was designed specifically for this problem. The goal of the present study was to apply MDR to mining high-order epistatic interactions in a population-based genetic study of tuberculosis (TB). The study used a previously published data set consisting of 19 candidate single-nucleotide polymorphisms (SNPs) in 321 pulmonary TB cases and 347 healthy controls from Guinea-Bissau in Africa. The ReliefF algorithm was applied first to generate a smaller set of the five most informative SNPs. MDR with 10-fold cross-validation was then applied to look at all possible combinations of two, three, four and five SNPs. The MDR model with the best testing accuracy (TA) consisted of SNPs rs2305619, rs187084, and rs11465421 (TA = 0.588) in PTX3, TLR9 and DC-SIGN, respectively. A general 1000-fold permutation test of the null hypothesis of no association confirmed the statistical significance of the model (p = 0.008). An additional 1000-fold permutation test designed specifically to test the linear null hypothesis that the association effects are only additive confirmed the presence of non-additive (i.e. nonlinear) or epistatic effects (p = 0.013). An independent information-gain measure corroborated these results with a third-order epistatic interaction that was stronger than any lower-order associations. We have identified statistically significant evidence for a three-way epistatic interaction that is associated with susceptibility to TB. This interaction is stronger than any previously described one-way or two-way associations. This study highlights the importance of using machine learning methods that are designed to embrace, rather than ignore, the complexity of common diseases such as TB. We recommend that future studies of the genetics of TB take into account the possibility that high-order epistatic interactions might play an important role in disease susceptibility.
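A generic sketch of the label-permutation logic behind the reported significance tests: the statistic is recomputed on data with shuffled case/control labels to build the null distribution. The statistic function here is a toy allele-mean difference; in practice it would be the cross-validated MDR testing accuracy (hypothetical placeholder, not the authors' code).

import numpy as np

def permutation_pvalue(statistic, genotypes, labels, n_perm=1000, seed=0):
    """P-value for the null of no genotype-phenotype association,
    estimated by recomputing `statistic` on label-shuffled data."""
    rng = np.random.default_rng(seed)
    observed = statistic(genotypes, labels)
    null_draws = np.empty(n_perm)
    for i in range(n_perm):
        null_draws[i] = statistic(genotypes, rng.permutation(labels))
    # Add-one correction keeps the estimate away from exactly zero.
    return (1 + np.sum(null_draws >= observed)) / (n_perm + 1)

# Toy statistic for illustration: absolute difference in case/control allele means.
toy_stat = lambda g, y: abs(g[y == 1].mean() - g[y == 0].mean())
g = np.random.default_rng(1).integers(0, 3, size=200).astype(float)
y = np.random.default_rng(2).integers(0, 2, size=200)
print(permutation_pvalue(toy_stat, g, y))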
Sequential Probability Ratio Test for Spacecraft Collision Avoidance Maneuver Decisions
NASA Technical Reports Server (NTRS)
Carpenter, J. Russell; Markley, F. Landis
2013-01-01
A document discusses sequential probability ratio tests that explicitly allow decision-makers to incorporate false alarm and missed detection risks, and are potentially less sensitive to modeling errors than a procedure that relies solely on a probability of collision threshold. Recent work on constrained Kalman filtering has suggested an approach to formulating such a test for collision avoidance maneuver decisions: a filter bank with two norm-inequality-constrained epoch-state extended Kalman filters. One filter models the null hypothesis that the miss distance is inside the combined hard body radius at the predicted time of closest approach, and one filter models the alternative hypothesis. The epoch-state filter developed for this method explicitly accounts for any process noise present in the system. The method appears to work well in a realistic example based on an upcoming, highly elliptical orbit formation flying mission.
Using potential performance theory to test five hypotheses about meta-attribution.
Trafimow, David; Hunt, Gayle; Rice, Stephen; Geels, Kasha
2011-01-01
Based on I. Kant's (1991) distinction between perfect and imperfect duties and the attribution literature pertaining to that distinction, the authors proposed and tested 5 hypotheses about meta-attribution. More specifically, violations of perfect duties have been shown to arouse both more negative affect and stronger correspondent inferences than do violations of imperfect duties (e.g., D. Trafimow, I. K. Bromgard, K. A. Finlay, & T. Ketelaar, 2005). But when it comes to making meta-attributions (that is, guessing the attributions others would make), is the affect differential an advantage or a disadvantage? In addition to the null hypothesis of no effect, the authors proposed and tested additional hypotheses about how negative affect might increase or decrease the effectiveness of people's meta-attribution strategies and how, even if there is no effect on strategy effectiveness, negative affect could increase or decrease the consistencies with which these strategies could be used.
Replicates in high dimensions, with applications to latent variable graphical models.
Tan, Kean Ming; Ning, Yang; Witten, Daniela M; Liu, Han
2016-12-01
In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.
Competition between trees and grasses for both soil water and mineral nitrogen in dry savannas.
Donzelli, D; De Michele, C; Scholes, R J
2013-09-07
The co-existence of trees and grasses in savannas in general can be the result of processes involving competition for resources (e.g. water and nutrients), of differential responses to disturbances such as fire, animals and human activities, or of a combination of both broad mechanisms. In moist savannas, tree-grass coexistence is mainly attributed to disturbances, while in dry savannas, limiting resources are considered the principal mechanism of co-existence. Virtually all theoretical explorations of tree-grass dynamics in dry savannas consider only competition for soil water. Here we investigate whether coexistence could result from a balanced competition for two resources, namely soil water and mineral nitrogen. We introduce a simple dynamical resource-competition model for trees and grasses. We consider two alternative hypotheses: (1) trees are the superior competitors for nitrogen while grasses are superior competitors for water, and (2) vice versa. We study the model properties under the two hypotheses and test each hypothesis against data from 132 dry savannas in Africa using Kendall's test of independence. We find that Hypothesis 1 gets much more support than Hypothesis 2, and more support than the null hypothesis that neither is operative. We further consider gradients of rainfall and nitrogen availability and find that the Hypothesis 1 model reproduces the observed patterns in nature. We do not consider our results to definitively show that tree-grass coexistence in dry savannas is due to balanced competition for water and nitrogen, but show that this mechanism is a possibility, which cannot be a priori excluded and should thus be considered along with the more traditional explanations. Copyright © 2013 Elsevier Ltd. All rights reserved.
Predictive Fusion of Geophysical Waveforms using Fisher's Method, under the Alternative Hypothesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carmichael, Joshua Daniel; Nemzek, Robert James; Webster, Jeremy David
2017-05-05
This presentation addresses how to combine different signatures from an event or source in a defensible way. The objective was to build a digital detector that continuously combines detection statistics recorded from explosions in order to screen sources of interest from null sources.
Effect of Computer-Based Video Games on Children: An Experimental Study
ERIC Educational Resources Information Center
Chuang, Tsung-Yen; Chen, Wei-Fan
2009-01-01
This experimental study investigated whether computer-based video games facilitate children's cognitive learning. In comparison to traditional computer-assisted instruction (CAI), this study explored the impact of the varied types of instructional delivery strategies on children's learning achievement. One major research null hypothesis was…
Watanabe, Hiroshi; Nomura, Yoshikazu; Kuribayashi, Ami; Kurabayashi, Tohru
2018-02-01
We aimed to employ the Radia diagnostic software with the Safety and Efficacy of a New Emerging Dental X-ray Modality (SEDENTEXCT) image quality (IQ) phantom in CT, and to evaluate its validity. The SEDENTEXCT IQ phantom and Radia diagnostic software were employed. The phantom was scanned using one medical full-body CT and two dentomaxillofacial cone beam CTs. The obtained images were imported into the Radia software, and the spatial resolution outputs were evaluated. The oversampling method was employed using our original wire phantom as a reference. The resultant modulation transfer function (MTF) curves were compared. The null hypothesis was that the MTF curves generated using both methods would be in agreement. One-way analysis of variance tests were applied to the f50 and f10 values from the MTF curves. The f10 values were subjectively confirmed by observing the line pair modules. The Radia software reported the MTF curves on the xy-plane of the CT scans, but could not return f50 and f10 values on the z-axis. The null hypothesis concerning the reported MTF curves on the xy-plane was rejected. There were significant differences between the results of the Radia software and our reference method, except for the f10 values in CS9300. These findings were consistent with our line pair observations. We evaluated the validity of the Radia software with the SEDENTEXCT IQ phantom. The software provided semi-automatic data, albeit with problems, and its results were statistically different from our reference. We hope the manufacturer will overcome these limitations.
A Statistical Test of Correlations and Periodicities in the Geological Records
NASA Astrophysics Data System (ADS)
Yabushita, S.
1997-09-01
Matsumoto & Kubotani argued that there is a positive and statistically significant correlation between cratering and mass extinction. This argument is critically examined by adopting the method of Ertel used by Matsumoto & Kubotani but applying it more directly to the extinction and cratering records. It is shown that under the null hypothesis of a random distribution of crater ages, the observed correlation has a probability of occurrence of 13%. However, when large craters are excluded whose ages agree with the times of peaks of the extinction rate of marine fauna, one obtains a negative correlation. This result strongly indicates that mass extinctions are not due to an accumulation of impacts but to isolated gigantic impacts.
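A hedged Monte Carlo sketch of the stated null (crater ages scattered at random over the record): random age sets are drawn, a closeness statistic against the extinction series is recomputed, and the fraction of draws beating the observed value estimates the probability. The peak times, crater ages, and statistic below are toy placeholders, not the Ertel statistic or the actual records.

import numpy as np

rng = np.random.default_rng(0)
record_length = 250.0                               # Myr window, illustrative
peak_times = np.array([65.0, 94.0, 145.0, 201.0])   # toy extinction-peak ages (Myr)
crater_ages_obs = np.array([64.0, 35.0, 145.5, 170.0, 214.0, 99.0])  # toy crater ages

def proximity_stat(crater_ages, peaks):
    """Toy closeness statistic: negative mean distance from each crater
    to its nearest extinction peak (larger = closer association)."""
    return -np.mean(np.min(np.abs(crater_ages[:, None] - peaks[None, :]), axis=1))

obs = proximity_stat(crater_ages_obs, peak_times)
n_sim = 10_000
null = np.array([
    proximity_stat(rng.uniform(0, record_length, size=crater_ages_obs.size), peak_times)
    for _ in range(n_sim)
])
p_value = (1 + np.sum(null >= obs)) / (n_sim + 1)
print(f"Monte Carlo p under the random-ages null: {p_value:.3f}")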
NASA Astrophysics Data System (ADS)
Sahoo, Ramendra; Jain, Vikrant
2017-04-01
Morphology of the landscape and derived features are regarded as an important tool for inferring tectonic activity in an area, since surface exposures of these subsurface processes may not be available or may be eroded away over time. This has led to extensive research into the application of non-planar morphological attributes like the river long profile and hypsometry to tectonic studies, whereas the drainage network as a proxy for tectonic activity has not been explored greatly. Significant work has nevertheless been done on drainage network patterns, starting in a qualitative manner and evolving over the years to incorporate more quantitative aspects, such as studying the evolution of a network under the influence of external and internal controls. The Random Topology (RT) model is one of these concepts; it elucidates the connection between the evolution of a drainage network pattern and the entropy of the drainage system, and it states that, in the absence of any geological controls, a natural population of channel networks will be topologically random. We have used the entropy maximization principle to provide a theoretical structure for the RT model. Furthermore, analysis was carried out on the drainage network structures around the Jwalamukhi thrust in the Kangra reentrant in the western Himalayas, India, to investigate the tectonic activity in the region. Around one thousand networks were extracted from the foot-wall (fw) and hanging-wall (hw) regions of the thrust sheet and later categorized based on their magnitudes. We adopted a goodness-of-fit test for comparing the network patterns in the fw and hw drainage with those derived using the RT model. The null hypothesis for the test was that, for any given magnitude, the drainage networks in the fw are statistically more similar than those in the hw to the network patterns derived using the RT model. The test results favour our null hypothesis for networks with smaller magnitudes (< 9), whereas for larger magnitudes, both hw and fw networks were found to be statistically dissimilar to the model network patterns. Calculation of the pattern frequency for each magnitude and subsequent hypothesis testing were carried out using Matlab (v R2015a). Our results will help to establish drainage network pattern as a geomorphic proxy for identifying tectonically active areas. This study also serves as supplementary evidence of the neo-tectonic control on the morphology of the landscape and its derivatives around the Jwalamukhi thrust. Additionally, it will help to verify the theory of probabilistic evolution of drainage networks.
On the importance of avoiding shortcuts in applying cognitive models to hierarchical data.
Boehm, Udo; Marsman, Maarten; Matzke, Dora; Wagenmakers, Eric-Jan
2018-06-12
Psychological experiments often yield data that are hierarchically structured. A number of popular shortcut strategies in cognitive modeling do not properly accommodate this structure and can result in biased conclusions. To gauge the severity of these biases, we conducted a simulation study for a two-group experiment. We first considered a modeling strategy that ignores the hierarchical data structure. In line with theoretical results, our simulations showed that Bayesian and frequentist methods that rely on this strategy are biased towards the null hypothesis. Secondly, we considered a modeling strategy that takes a two-step approach by first obtaining participant-level estimates from a hierarchical cognitive model and subsequently using these estimates in a follow-up statistical test. Methods that rely on this strategy are biased towards the alternative hypothesis. Only hierarchical models of the multilevel data lead to correct conclusions. Our results are particularly relevant for the use of hierarchical Bayesian parameter estimates in cognitive modeling.
NASA Astrophysics Data System (ADS)
Cianciara, Aleksander
2016-09-01
The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely the energy of phenomena and the inter-event time. It is understood that the emission under consideration is induced by natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study has been conducted using the method of statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of the data, because in these methods the measurement data are not used directly but through their statistical distributions, e.g., in the method based on hazard analysis or in the one that uses maximum-value statistics.
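A minimal sketch of the fit-and-test step described, using scipy: fit a two-parameter Weibull to (synthetic) inter-event times and apply the Kolmogorov-Smirnov test. The data, the fixed zero location, and the caveat about estimated parameters are assumptions of the sketch, not details from the paper.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
inter_event_times = stats.weibull_min.rvs(c=1.3, scale=2.0, size=500, random_state=rng)

# Fit a two-parameter Weibull (location fixed at zero) and test the fit.
shape, loc, scale = stats.weibull_min.fit(inter_event_times, floc=0)
ks_stat, p_value = stats.kstest(inter_event_times, 'weibull_min', args=(shape, loc, scale))
print(f"shape={shape:.2f} scale={scale:.2f}  KS={ks_stat:.3f}  p={p_value:.3f}")
# Caveat: because the parameters were estimated from the same sample, the nominal
# p-value is only approximate (a Lilliefors-type correction or bootstrap is stricter).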
Maside, X R; Naveira, H F
1996-10-01
The observation of segregation ratios of sterile and fertile males in offspring samples from backcrossed hybrid females is, in principle, a valid method to unveil the genetic basis of hybrid male sterility in Drosophila. When the female parent is heterozygous (hybrid) for a sterility factor with major effects, equal proportions of fertile and sterile sons are expected in her offspring. However, intact (not recombined) chromosome segments of considerable length are expected to give segregation ratios that cannot be easily differentiated from the 1:1 ratio expected from a single factor. When the phenotypic character under analysis can be determined by combinations of minor factors from the donor species spanning a certain chromosome length, very large offspring samples may be needed to test this alternative hypothesis against the null hypothesis of a single major factor. This is particularly the case for hybrid male sterility determinants in Drosophila.
Dackor, J.; Strunk, K. E.; Wehmeyer, M. M.; Threadgill, D. W.
2007-01-01
Homozygosity for the Egfrtm1Mag null allele in mice leads to genetic background dependent placental abnormalities and embryonic lethality. Molecular mechanisms or genetic modifiers that differentiate strains with surviving versus non-surviving Egfr nullizygous embryos have yet to be identified. Egfr transcripts in wildtype placenta were quantified by ribonuclease protection assay (RPA), and the lowest level of Egfr mRNA expression was found to coincide with Egfrtm1Mag homozygous lethality. Immunohistochemical analysis of ERBB family receptors, ERBB2, ERBB3, and ERBB4, showed similar expression between Egfr wildtype and null placentas, indicating that Egfr null trophoblast do not up-regulate these receptors to compensate for EGFR deficiency. Significantly fewer numbers of bromodeoxyuridine (BrdU) positive trophoblast were observed in Egfr nullizygous placentas, and Cdc25a and Myc, genes associated with proliferation, were significantly down-regulated in null placentas. However, strains with both mild and severe placental phenotypes exhibit reduced proliferation, suggesting that this defect alone does not account for strain-specific embryonic lethality. Consistent with this hypothesis, intercrosses generating mice null for cell cycle checkpoint genes (Trp53, Rb1, Cdkn1a, Cdkn1b or Cdkn2c) in combination with Egfr deficiency did not increase survival of Egfr nullizygous embryos. Since complete development of the spongiotrophoblast compartment is not required for survival of Egfr nullizygous embryos, reduction of this layer that is commonly observed in Egfr nullizygous placentas likely accounts for the decrease in proliferation. PMID:17822758
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
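The quantity the abstract probes, the probability that the hypothesis is true given a significant result, follows from Bayes' rule given a prior, power, and α. A small worked sketch with assumed numbers (not values from the paper):

def prob_h1_given_significant(prior, power, alpha=0.05):
    """P(H1 | p < alpha) for a single study, by Bayes' rule."""
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Illustrative values only: prior probability that H1 is true, and study power.
print(prob_h1_given_significant(prior=0.3, power=0.8))   # ~0.873
print(prob_h1_given_significant(prior=0.1, power=0.5))   # ~0.526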
Subliminal or not? Comparing null-hypothesis and Bayesian methods for testing subliminal priming.
Sand, Anders; Nilsson, Mats E
2016-08-01
A difficulty for reports of subliminal priming is demonstrating that participants who actually perceived the prime are not driving the priming effects. There are two conventional methods for testing this. One is to test whether a direct measure of stimulus perception is not significantly above chance on a group level. The other is to use regression to test if an indirect measure of stimulus processing is significantly above zero when the direct measure is at chance. Here we simulated samples in which we assumed that only participants who perceived the primes were primed by them. Conventional analyses applied to these samples had a very large error rate of falsely supporting subliminal priming. Calculating a Bayes factor for the samples very seldom falsely supported subliminal priming. We conclude that conventional tests are not reliable diagnostics of subliminal priming. Instead, we recommend that experimenters calculate a Bayes factor when investigating subliminal priming. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
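One simple way to obtain a Bayes factor for a one-sample comparison is the BIC approximation of Wagenmakers (2007), BF01 ≈ exp((BIC1 − BIC0)/2). The sketch below applies it to a simulated direct-measure sample centred at chance; it is only an illustration of the Bayes-factor idea, not the analysis pipeline of the paper.

import numpy as np

def bic_bayes_factor_01(x):
    """Approximate BF01 (evidence for 'mean = 0' over 'mean free') for a
    one-sample design, via BF01 ~= exp((BIC1 - BIC0) / 2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rss0 = np.sum(x ** 2)                    # null model: mean fixed at 0
    rss1 = np.sum((x - x.mean()) ** 2)       # alternative: mean estimated
    bic0 = n * np.log(rss0 / n)
    bic1 = n * np.log(rss1 / n) + np.log(n)  # one extra parameter (the mean)
    return np.exp((bic1 - bic0) / 2)

rng = np.random.default_rng(0)
chance_scores = rng.normal(0.0, 1.0, size=40)        # direct measure centred at chance
print(f"BF01 = {bic_bayes_factor_01(chance_scores):.2f}")   # >1 favours the null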
Order-restricted inference for means with missing values.
Wang, Heng; Zhong, Ping-Shou
2017-09-01
Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.
Active optics null test system based on a liquid crystal programmable spatial light modulator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ares, Miguel; Royo, Santiago; Sergievskaya, Irina
2010-11-10
We present an active null test system adapted to test lenses and wavefronts with complex shapes and strong local deformations. This system provides greater flexibility than conventional static null tests that match only a precisely positioned, individual wavefront. The system is based on a cylindrical Shack-Hartmann wavefront sensor, a commercial liquid crystal programmable phase modulator (PPM), which acts as the active null corrector, enabling the compensation of large strokes with high fidelity in a single iteration, and a spatial filter to remove unmodulated light when steep phase changes are compensated. We have evaluated the PPM's phase response at 635 nm and checked its performance by measuring its capability to generate different amounts of defocus aberration, finding root mean squared errors below λ/18 for spherical wavefronts with peak-to-valley heights of up to 78.7λ, which stands as the limit from which diffractive artifacts created by the PPM have been found to be critical under no spatial filtering. Results of a null test for a complex lens (an ophthalmic customized progressive addition lens) are presented and discussed.
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-01-01
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
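For concreteness, a hedged sketch of one of the resampling schemes the study criticises as often inadequate: red-noise (AR(1)) surrogates matched to the lag-1 autocorrelation of the series. The data-driven alternatives the authors recommend (hidden Markov model and 'beta-surrogate' resampling) are not shown here.

import numpy as np

def ar1_surrogates(x, n_surrogates=100, seed=0):
    """Red-noise surrogates sharing the mean, variance and lag-1
    autocorrelation of x (a common, but often inadequate, null model)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    phi = float(np.clip(np.corrcoef(xc[:-1], xc[1:])[0, 1], -0.99, 0.99))
    innov_sd = np.std(xc, ddof=1) * np.sqrt(1 - phi ** 2)
    surr = np.empty((n_surrogates, x.size))
    surr[:, 0] = rng.normal(0, np.std(xc, ddof=1), size=n_surrogates)
    for t in range(1, x.size):
        surr[:, t] = phi * surr[:, t - 1] + rng.normal(0, innov_sd, size=n_surrogates)
    return surr + x.mean()

# Wavelet quantities computed on each surrogate would then form the null distribution.
series = np.cumsum(np.random.default_rng(1).normal(size=200)) * 0.1
print(ar1_surrogates(series, n_surrogates=5).shape)   # (5, 200)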
ERIC Educational Resources Information Center
Dana, Richard H., Ed.
This collection of papers includes: (1) "An Assessment-Intervention Model for Research and Practice with Multicultural Populations" (Richard H. Dana); (2) "An Africentric Perspective for Clinical Research and Practice" (Edward F. Morris); (3) "Myths about the Null Hypothesis and the Path to Reform" (Robert G.…
Neuroimaging Research: from Null-Hypothesis Falsification to Out-Of-Sample Generalization
ERIC Educational Resources Information Center
Bzdok, Danilo; Varoquaux, Gaël; Thirion, Bertrand
2017-01-01
Brain-imaging technology has boosted the quantification of neurobiological phenomena underlying human mental operations and their disturbances. Since its inception, drawing inference on neurophysiological effects hinged on classical statistical methods, especially, the general linear model. The tens of thousands of variables per brain scan were…
Null alleles are ubiquitous at microsatellite loci in the Wedge Clam (Donax trunculus)
Cuesta, Jose Antonio; Drake, Pilar; Macpherson, Enrique; Bernatchez, Louis
2017-01-01
Recent studies have reported an unusually high frequency of nonamplifying alleles at microsatellite loci in bivalves. Null alleles have been associated with heterozygous deficits in many studies. While several studies have tested for its presence using different analytical tools, few have empirically tested for its consequences in estimating population structure and differentiation. We characterised 16 newly developed microsatellite loci and show that null alleles are ubiquitous in the wedge clam, Donax trunculus. We carried out several tests to demonstrate that the large heterozygous deficits observed in the newly characterised loci were most likely due to null alleles. We tested the robustness of microsatellite genotyping for population assignment by showing that well-recognised biogeographic regions of the south Atlantic and south Mediterranean coast of Spain harbour genetically different populations. PMID:28439464
Estimating the proportion of true null hypotheses when the statistics are discrete.
Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S
2015-07-15
In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support, and the null distribution may depend on an ancillary statistic, such as a table margin, that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. The methods are implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
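For continuous P values, the classic Storey-type estimator counts P values above a threshold λ; the abstract's point is that discrete tests can violate its uniformity assumption, which the regression and 'T' methods address. A minimal sketch of the continuous-case estimator only (illustrative data, not the article's methods):

import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Storey's estimator of the proportion of true nulls:
    pi0 = #{p > lam} / ((1 - lam) * m). Assumes p-values are uniform
    under the null, which can fail for discrete test statistics."""
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

rng = np.random.default_rng(0)
p_null = rng.uniform(size=800)                        # 80% true nulls
p_alt = rng.beta(0.5, 8.0, size=200)                  # 20% signals, mostly small p-values
print(storey_pi0(np.concatenate([p_null, p_alt])))    # close to 0.8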
Testing for carryover effects after cessation of treatments: a design approach.
Sturdevant, S Gwynn; Lumley, Thomas
2016-08-02
Recently, trials addressing noisy measurements with diagnosis occurring by exceeding thresholds (such as diabetes and hypertension) have been published which attempt to measure carryover - the impact that treatment has on an outcome after cessation. The design of these trials has been criticised, and simulations have been conducted which suggest that the parallel designs used are not adequate to test this hypothesis; two solutions are that either a differing parallel design or a cross-over design could allow for diagnosis of carryover. We undertook a systematic simulation study to determine the ability of a cross-over or a parallel-group trial design to detect carryover effects on incident hypertension in a population with prehypertension. We simulated blood pressure and focused on varying the criteria used to diagnose systolic hypertension. Using the difference in cumulative incidence of hypertension to analyse parallel-group or cross-over trials resulted in none of the designs having an acceptable Type I error rate: under the null hypothesis of no carryover, the error rate is well above the nominal 5 % level. When a treatment is effective during the intervention period, reliable testing for a carryover effect is difficult. Neither parallel-group nor cross-over designs using the difference in cumulative incidence appear to be a feasible approach. Future trials should ensure their design and analysis is validated by simulation.
Uno, Hajime; Tian, Lu; Claggett, Brian; Wei, L J
2015-12-10
With censored event time observations, the logrank test is the most popular tool for testing the equality of two underlying survival distributions. Although this test is asymptotically distribution free, it may not be powerful when the proportional hazards assumption is violated. Various other novel testing procedures have been proposed, which generally are derived by assuming a class of specific alternative hypotheses with respect to the hazard functions. The test considered by Pepe and Fleming (1989) is based on a linear combination of weighted differences of the two Kaplan-Meier curves over time and is a natural tool to assess the difference of two survival functions directly. In this article, we take a similar approach but choose weights that are proportional to the observed standardized difference of the estimated survival curves at each time point. The new proposal automatically makes weighting adjustments empirically. The new test statistic is aimed at a one-sided general alternative hypothesis and is distributed with a short right tail under the null hypothesis but with a heavy tail under the alternative. The results from extensive numerical studies demonstrate that the new procedure performs well under various general alternatives, with the caveat of a minor inflation of the type I error rate when the sample size is small or the number of observed events is small. The survival data from a recent cancer comparative study are utilized for illustrating the implementation of the process. Copyright © 2015 John Wiley & Sons, Ltd.
VizieR Online Data Catalog: Classification of 2XMM variable sources (Lo+, 2014)
NASA Astrophysics Data System (ADS)
Lo, K. K.; Farrell, S.; Murphy, T.; Gaensler, B. M.
2017-06-01
The 2XMMi-DR2 catalog (Cat. IX/40) consists of observations made with the XMM-Newton satellite between 2000 and 2008 and covers a sky area of about 420 deg². The observations were made using the European Photon Imaging Camera (EPIC) that consists of three CCD cameras - pn, MOS1, and MOS2 - and covers the energy range from 0.2 keV to 12 keV. There are 221012 unique sources in 2XMM-DR2, of which 2267 were flagged as variable by the XMM processing pipeline (Watson et al. 2009, J/A+A/493/339). The variability test used by the pipeline is a χ² test against the null hypothesis that the source flux is constant, with the probability threshold set at 10⁻⁵. (1 data file).
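The pipeline's variability flag is a standard χ² test of a light curve against the constant-flux model; a minimal sketch of that test follows, with made-up count rates and errors standing in for real EPIC photometry.

import numpy as np
from scipy import stats

def variability_pvalue(flux, flux_err):
    # Chi-square test of the null hypothesis that the source flux is constant.
    # The constant model is the error-weighted mean flux; under the null the
    # statistic has len(flux) - 1 degrees of freedom.
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    w = 1.0 / flux_err**2
    mean_flux = np.sum(w * flux) / np.sum(w)
    chi2 = np.sum(((flux - mean_flux) / flux_err) ** 2)
    return stats.chi2.sf(chi2, df=len(flux) - 1)

flux = np.array([1.2, 0.9, 1.4, 2.8, 1.1, 1.0])   # arbitrary example light curve
err = np.full_like(flux, 0.3)
p = variability_pvalue(flux, err)
print(p, "-> flagged variable" if p < 1e-5 else "-> not flagged")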
Testing for a Debt-Threshold Effect on Output Growth.
Lee, Sokbae; Park, Hyunmin; Seo, Myung Hwan; Shin, Youngki
2017-12-01
Using the Reinhart-Rogoff dataset, we find a debt threshold not around 90 per cent but around 30 per cent, above which the median real gross domestic product (GDP) growth falls abruptly. Our work is the first to formally test for threshold effects in the relationship between public debt and median real GDP growth. The null hypothesis of no threshold effect is rejected at the 5 per cent significance level for most cases. While we find no evidence of a threshold around 90 per cent, our findings from the post-war sample suggest that the debt threshold for economic growth may exist around a relatively small debt-to-GDP ratio of 30 per cent. Furthermore, countries with debt-to-GDP ratios above 30 per cent have GDP growth that is 1 percentage point lower at the median.
Testing for a Debt‐Threshold Effect on Output Growth†
Lee, Sokbae; Park, Hyunmin; Seo, Myung Hwan; Shin, Youngki
2017-01-01
Abstract Using the Reinhart–Rogoff dataset, we find a debt threshold not around 90 per cent but around 30 per cent, above which the median real gross domestic product (GDP) growth falls abruptly. Our work is the first to formally test for threshold effects in the relationship between public debt and median real GDP growth. The null hypothesis of no threshold effect is rejected at the 5 per cent significance level for most cases. While we find no evidence of a threshold around 90 per cent, our findings from the post‐war sample suggest that the debt threshold for economic growth may exist around a relatively small debt‐to‐GDP ratio of 30 per cent. Furthermore, countries with debt‐to‐GDP ratios above 30 per cent have GDP growth that is 1 percentage point lower at the median. PMID:29263562
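The threshold test described in the two records above can be sketched as a grid search over candidate debt thresholds, fitting a two-regime regression at each and comparing against the single-regime fit. All data below are simulated placeholders, the fit is least squares on the mean rather than the median regression the authors use, and proper p-values require resampling because the threshold is unidentified under the null.

import numpy as np

rng = np.random.default_rng(2)

# Simulated placeholder data: growth falls once debt/GDP exceeds 30.
debt = rng.uniform(0, 120, 400)
growth = 2.5 - 1.0 * (debt > 30) + rng.normal(0, 1.5, 400)

def ssr(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones_like(debt)
ssr_null = ssr(growth, np.column_stack([ones, debt]))          # no threshold

candidates = np.quantile(debt, np.linspace(0.15, 0.85, 71))
ssr_alt, tau_hat = min(
    (ssr(growth, np.column_stack([ones, debt,
                                  (debt > c).astype(float),
                                  (debt > c) * debt])), c)
    for c in candidates
)
sup_F = ((ssr_null - ssr_alt) / 2) / (ssr_alt / (len(growth) - 4))
print(f"estimated threshold: {tau_hat:.1f}, sup-F statistic: {sup_F:.1f}")
# Inference: compare sup_F to a bootstrap null distribution obtained by
# refitting on data simulated from the no-threshold model.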
Automation and Evaluation of the SOWH Test with SOWHAT.
Church, Samuel H; Ryan, Joseph F; Dunn, Casey W
2015-11-01
The Swofford-Olsen-Waddell-Hillis (SOWH) test evaluates statistical support for incongruent phylogenetic topologies. It is commonly applied to determine if the maximum likelihood tree in a phylogenetic analysis is significantly different from an alternative hypothesis. The SOWH test compares the observed difference in log-likelihood between two topologies to a null distribution of differences in log-likelihood generated by parametric resampling. The test is a well-established phylogenetic method for topology testing, but it is sensitive to model misspecification, it is computationally burdensome to perform, and its implementation requires the investigator to make several decisions that each have the potential to affect the outcome of the test. We analyzed the effects of multiple factors using seven data sets to which the SOWH test was previously applied. These factors include the number of sample replicates, the likelihood software, the introduction of gaps to simulated data, the use of distinct models of evolution for data simulation and likelihood inference, and a suggested test correction wherein an unresolved "zero-constrained" tree is used to simulate sequence data. To facilitate these analyses and future applications of the SOWH test, we wrote SOWHAT, a program that automates the SOWH test. We find that inadequate bootstrap sampling can change the outcome of the SOWH test. The results also show that using a zero-constrained tree for data simulation can result in a wider null distribution and higher p-values, but does not change the outcome of the SOWH test for most of the data sets tested here. These results will help others implement and evaluate the SOWH test and allow us to provide recommendations for future applications of the SOWH test. SOWHAT is available for download from https://github.com/josephryan/SOWHAT. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
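The core of the SOWH procedure, comparing an observed log-likelihood difference to a null distribution built by simulating data under the constrained hypothesis and re-optimizing each replicate, can be illustrated with a toy likelihood in place of tree likelihoods. The sketch below uses a normal mean standing in for the constrained topology; in the real test each replicate is a simulated sequence alignment generated on the zero-constrained or null tree, and the likelihoods come from phylogenetic software.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def delta_lnL(x, mu0=0.0):
    # Log-likelihood difference between the free-mean model (MLE mean and
    # standard deviation) and the model constrained to mean mu0.
    mu_hat, sigma_hat = x.mean(), x.std()
    lnL_free = stats.norm.logpdf(x, mu_hat, sigma_hat).sum()
    lnL_constrained = stats.norm.logpdf(x, mu0, np.sqrt(((x - mu0) ** 2).mean())).sum()
    return lnL_free - lnL_constrained

x = rng.normal(0.25, 1.0, 60)          # observed data (placeholder)
observed = delta_lnL(x)

# Parametric bootstrap: simulate under the constrained model fitted to the data,
# recompute the statistic for each replicate to build the null distribution.
sigma0 = np.sqrt(((x - 0.0) ** 2).mean())
null = np.array([delta_lnL(rng.normal(0.0, sigma0, x.size)) for _ in range(2000)])
p = (1 + np.sum(null >= observed)) / (1 + null.size)
print(f"delta lnL = {observed:.2f}, SOWH-style p = {p:.3f}")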
Bowden, Stephen C; Saklofske, Donald H; Weiss, Lawrence G
2011-06-01
Examination of measurement invariance provides a powerful method to evaluate the hypothesis that the same set of psychological constructs underlies a set of test scores in different populations. If measurement invariance is observed, then the same psychological meaning can be ascribed to scores in both populations. In this study, the measurement model including core and supplementary subtests of the Wechsler Adult Intelligence Scale-Fourth edition (WAIS-IV) was compared across the U.S. and Canadian standardization samples. Populations were compared on the 15 subtest version of the test in people aged 70 and younger and on the 12 subtest version in people aged 70 or older. Results indicated that a slightly modified version of the four-factor model reported in the WAIS-IV technical manual provided the best fit in both populations and in both age groups. The null hypothesis of measurement invariance across populations was not rejected, and the results provide direct evidence for the generalizability of convergent and discriminant validity studies with the WAIS-IV across populations. Small to medium differences in latent means favoring Canadians highlight the value of local norms.
The importance of topographically corrected null models for analyzing ecological point processes.
McDowall, Philip; Lynch, Heather J
2017-07-01
Analyses of point process patterns and related techniques (e.g., MaxEnt) make use of the expected number of occurrences per unit area and second-order statistics based on the distance between occurrences. Ecologists working with point process data often assume that points exist on a two-dimensional x-y plane or within a three-dimensional volume, when in fact many observed point patterns are generated on a two-dimensional surface existing within three-dimensional space. For many surfaces, however, such as the topography of landscapes, the projection from the surface to the x-y plane preserves neither area nor distance. As such, when these point patterns are implicitly projected to and analyzed in the x-y plane, our expectations of the point pattern's statistical properties may not be met. When used in hypothesis testing, we find that the failure to account for the topography of the generating surface may bias statistical tests that incorrectly identify clustering and, furthermore, may bias coefficients in inhomogeneous point process models that incorporate slope as a covariate. We demonstrate the circumstances under which this bias is significant, and present simple methods that allow point processes to be simulated with corrections for topography. These point patterns can then be used to generate "topographically corrected" null models against which observed point processes can be compared. © 2017 by the Ecological Society of America.
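A "topographically corrected" null model of complete spatial randomness can be simulated by rejection sampling: candidate points are drawn uniformly in the x-y plane and accepted with probability proportional to the local surface-area element sqrt(1 + z_x² + z_y²), so the accepted points are uniform on the surface rather than on its projection. The sketch below uses an assumed synthetic topography, not the authors' data or code.

import numpy as np

rng = np.random.default_rng(4)

# Assumed synthetic topography z(x, y) on the unit square.
def z(x, y):
    return 0.5 * np.sin(4 * np.pi * x) + 0.3 * y**2

def slope_factor(x, y, h=1e-4):
    # Local area element sqrt(1 + z_x^2 + z_y^2) via finite differences.
    zx = (z(x + h, y) - z(x - h, y)) / (2 * h)
    zy = (z(x, y + h) - z(x, y - h)) / (2 * h)
    return np.sqrt(1.0 + zx**2 + zy**2)

def csr_on_surface(n, n_batch=10000):
    # Rejection-sample n points uniform on the surface, returned as x-y coords.
    g = np.linspace(0, 1, 201)
    gx, gy = np.meshgrid(g, g)
    m = slope_factor(gx, gy).max()          # bound for the acceptance ratio
    batches = []
    while sum(len(b) for b in batches) < n:
        x, y, u = rng.uniform(size=(3, n_batch))
        keep = u < slope_factor(x, y) / m
        batches.append(np.column_stack([x[keep], y[keep]]))
    return np.concatenate(batches)[:n]

sim = csr_on_surface(500)
print(sim.shape)   # one null realization for comparison with an observed pattern

Repeating this simulation many times gives the null distribution of any chosen summary statistic, against which the observed point pattern can be compared.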
Zhang, Jing; Boyes, Victoria; Festy, Frederic; Lynch, Richard J M; Watson, Timothy F; Banerjee, Avijit
2018-05-08
To test the null hypothesis that chitosan application has no impact on the remineralisation of artificial incipient enamel white spot lesions (WSLs). 66 artificial enamel WSLs were assigned to 6 experimental groups (n=11): (1) bioactive glass slurry, (2) bioactive glass containing polyacrylic acid (BG+PAA) slurry, (3) chitosan pre-treated WSLs with BG slurry (CS-BG), (4) chitosan pre-treated WSLs with BG+PAA slurry (CS-BG+PAA), (5) remineralisation solution (RS) and (6) de-ionised water (negative control, NC). Surface and cross-sectional Raman intensity mapping (960 cm⁻¹) were performed on 5 samples/group to assess mineral content. Raman spectroscopy and attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) were used to identify the type of newly formed minerals. Surface and cross-sectional Knoop microhardness measurements were performed to evaluate the mechanical properties after remineralisation. Surface morphologies and Ca/P ratio were observed using scanning electron microscopy (SEM) coupled with energy dispersive X-ray spectroscopy (EDX). Data were statistically analysed using one-way ANOVA with Tukey's test. BG+PAA, CS-BG, RS presented significantly higher mineral regain compared to NC on lesion surfaces, while CS-BG+PAA had higher subsurface mineral content. Newly mineralised crystals consisted of type-B hydroxycarbonate apatite. CS-BG+PAA showed the greatest hardness recovery, followed by CS-BG, both significantly higher than other groups. SEM observations showed altered surface morphologies in all experimental groups except NC post-treatment. EDX suggested a higher content of carbon, oxygen and silicon in the precipitates in the CS-BG+PAA group. There was no significant difference among the groups in terms of Ca/P ratio. The null hypothesis was rejected. Chitosan pre-treatment enhanced WSL remineralisation with either BG only or with BG-PAA complexes. A further investigation using a dynamic remineralisation/demineralisation system is required with regard to clinical application. Copyright © 2018 The Academy of Dental Materials. Published by Elsevier Inc. All rights reserved.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
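One standard way to place a normalized ellipsoid in a five-dimensional vector space is to volume-normalize its tensor, take the matrix logarithm (which is then traceless and symmetric), and project onto an orthonormal basis of traceless symmetric matrices; ordinary multivariate tests such as Hotelling's T² then apply. The sketch below follows that recipe with invented strain tensors and is an illustration of the general approach, not the authors' exact procedure or the Cloos data.

import numpy as np
from scipy import stats

# Orthonormal basis (Frobenius inner product) of traceless symmetric 3x3 matrices.
BASIS = [
    np.diag([1.0, -1.0, 0.0]) / np.sqrt(2),
    np.diag([1.0, 1.0, -2.0]) / np.sqrt(6),
    np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0.0]]) / np.sqrt(2),
    np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0.0]]) / np.sqrt(2),
    np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0.0]]) / np.sqrt(2),
]

def ellipsoid_to_vector(E):
    # Volume-normalize, take the matrix logarithm via the eigendecomposition,
    # and project onto the five basis matrices.
    E = E / np.linalg.det(E) ** (1.0 / 3.0)
    w, V = np.linalg.eigh(E)
    L = V @ np.diag(np.log(w)) @ V.T
    return np.array([np.trace(L @ B) for B in BASIS])

def hotelling_t2(X, Y):
    # Two-sample Hotelling T^2 test of equal mean vectors; returns (T2, p-value).
    n1, n2, dim = len(X), len(Y), X.shape[1]
    d = X.mean(0) - Y.mean(0)
    S = ((n1 - 1) * np.cov(X.T) + (n2 - 1) * np.cov(Y.T)) / (n1 + n2 - 2)
    T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    F = (n1 + n2 - dim - 1) / (dim * (n1 + n2 - 2)) * T2
    return T2, stats.f.sf(F, dim, n1 + n2 - dim - 1)

rng = np.random.default_rng(5)

def random_ellipsoid(log_stretch, noise=0.05):
    # Noisy ellipsoid with principal axes scattered about the coordinate axes.
    s = np.exp(np.asarray(log_stretch) + rng.normal(0, noise, 3))
    theta = rng.normal(0, 0.1)
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, -si, 0], [si, c, 0], [0, 0, 1.0]])
    return R @ np.diag(s) @ R.T

north = np.array([ellipsoid_to_vector(random_ellipsoid([0.4, 0.0, -0.4])) for _ in range(20)])
south = np.array([ellipsoid_to_vector(random_ellipsoid([0.5, 0.0, -0.5])) for _ in range(20)])
print(hotelling_t2(north, south))   # tests the null that the two segments are the same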
Revisiting tests for neglected nonlinearity using artificial neural networks.
Cho, Jin Seo; Ishida, Isao; White, Halbert
2011-05-01
Tests for regression neglected nonlinearity based on artificial neural networks (ANNs) have so far been studied by separately analyzing the two ways in which the null of regression linearity can hold. This implies that the asymptotic behavior of general ANN-based tests for neglected nonlinearity is still an open question. Here we analyze a convenient ANN-based quasi-likelihood ratio statistic for testing neglected nonlinearity, paying careful attention to both components of the null. We derive the asymptotic null distribution under each component separately and analyze their interaction. Somewhat remarkably, it turns out that the previously known asymptotic null distribution for the type 1 case still applies, but under somewhat stronger conditions than previously recognized. We present Monte Carlo experiments corroborating our theoretical results and showing that standard methods can yield misleading inference when our new, stronger regularity conditions are violated.
Homogeneity tests of clustered diagnostic markers with applications to the BioCycle Study
Tang, Liansheng Larry; Liu, Aiyi; Schisterman, Enrique F.; Zhou, Xiao-Hua; Liu, Catherine Chun-ling
2014-01-01
Diagnostic trials often require the use of a homogeneity test among several markers. Such a test may be necessary to determine the power both during the design phase and in the initial analysis stage. However, no formal method is available for the power and sample size calculation when the number of markers is greater than two and marker measurements are clustered in subjects. This article presents two procedures for testing the accuracy among clustered diagnostic markers. The first procedure is a test of homogeneity among continuous markers based on a global null hypothesis of the same accuracy. The result under the alternative provides the explicit distribution for the power and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on weighted areas under the receiver operating characteristic curves. This test is particularly useful if a global difference among markers is found by the homogeneity test. We apply our procedures to the BioCycle Study designed to assess and compare the accuracy of hormone and oxidative stress markers in distinguishing women with ovulatory menstrual cycles from those without. PMID:22733707
On the occurrence of false positives in tests of migration under an isolation with migration model
Hey, Jody; Chung, Yujin; Sethuraman, Arun
2015-01-01
The population genetic study of divergence is often done using a Bayesian genealogy sampler, like those implemented in IMa2 and related programs, and these analyses frequently include a likelihood-ratio test of the null hypothesis of no migration between populations. Cruickshank and Hahn (2014, Molecular Ecology, 23, 3133–3157) recently reported a high rate of false positive test results with IMa2 for data simulated with small numbers of loci under models with no migration and recent splitting times. We confirm these findings and discover that they are caused by a failure of the assumptions underlying likelihood ratio tests that arises when using marginal likelihoods for a subset of model parameters. We also show that for small data sets, with little divergence between samples from two populations, an excellent fit can often be found by a model with a low migration rate and recent splitting time and a model with a high migration rate and a deep splitting time. PMID:26456794
ERIC Educational Resources Information Center
Ejionueme, L. K.; Oyoyo, Anthonia Oluchi
2015-01-01
The study was conducted to investigate the application of Total Quality Management (TQM) in secondary school administration in Umuahia Education Zone. Three research questions and one null hypothesis guided the study. Descriptive survey design was employed for the study. The population of the study comprised 1365 administrators. Multi-stage…
Fraternity as "Enabling Environment:" Does Membership Lead to Gambling Problems?
ERIC Educational Resources Information Center
Biddix, J. Patrick; Hardy, Thomas W.
2008-01-01
Researchers have suggested that fraternity membership is the most reliable predictor of gambling and gambling problems on campus. The purpose of this study was to determine if problematic gambling could be linked to specific aspects of fraternity membership. Though the null hypothesis (no enabling environment) failed to be rejected, descriptive…
Unsecure School Environment and School Phobic Behavior
ERIC Educational Resources Information Center
Tukur, Abubakar Hamman; Muhammad, Khadijatu
2017-01-01
This study determines the level of students' school phobic behavior as a result of insecurity of the school environment. The study was guided by one research question and one null hypothesis. The population of the study was all the secondary schools in Maiduguri, Borno State; the sample of the study was senior secondary students in…
I Am 95% Confident That the Earth is Round: An Interview about Statistics with Chris Spatz.
ERIC Educational Resources Information Center
Dillon, Kathleen M.
1999-01-01
Presents an interview with Chris Spatz, who is a professor of psychology at Hendrix College in Conway (Arkansas). Discusses null hypothesis significance tests (NHST) and the arguments for and against the use of NHST, the changes in research articles, textbook changes, and the Internet. (CMK)
Disadvantages of the Horsfall-Barratt Scale for estimating severity of citrus canker
USDA-ARS?s Scientific Manuscript database
Direct visual estimation of disease severity to the nearest percent was compared to using the Horsfall-Barratt (H-B) scale. Data from a simulation model designed to sample two diseased populations were used to investigate the probability of the two methods to reject a null hypothesis (H0) using a t-...
Association between Propionibacterium acnes and frozen shoulder: a pilot study.
Bunker, Tim D; Boyd, Matthew; Gallacher, Sian; Auckland, Cressida R; Kitson, Jeff; Smith, Chris D
2014-10-01
Frozen shoulder has not previously been shown to be associated with infection. The present study set out to confirm the null hypothesis that there is no relationship between infection and frozen shoulder using two modern scientific methods, extended culture and polymerase chain reaction (PCR) for bacterial nucleic acids. A prospective cohort of 10 patients undergoing arthroscopic release for stage II idiopathic frozen shoulder had two biopsies of tissue taken from the affected shoulder joint capsule at the time of surgery, along with control biopsies of subdermal fat. The biopsies and controls were examined with extended culture and PCR for microbial nucleic acid. Eight of the 10 patients had positive findings on extended culture in their shoulder capsule and, in six of these, Propionibacterium acnes was present. The findings mean that we must reject the null hypothesis that there is no relationship between infection and frozen shoulder. More studies are urgently needed to confirm or refute these findings. If they are confirmed, this could potentially lead to new and effective treatments for this common, painful and disabling condition. Could P. acnes be the Helicobacter of frozen shoulder?
Size, time, and asynchrony matter: the species-area relationship for parasites of freshwater fishes.
Zelmer, Derek A
2014-10-01
The tendency to attribute species-area relationships to "island biogeography" effectively bypasses the examination of specific mechanisms that act to structure parasite communities. Positive covariation between fish size and infrapopulation richness should not be examined within the typical extinction-based paradigm, but rather should be addressed from the standpoint of differences in colonization potential among individual hosts. Although most mechanisms producing the aforementioned pattern constitute some variation of passive sampling, the deterministic aspects of the accumulation of parasite individuals by fish hosts makes untenable the suggestion that infracommunities of freshwater fishes are stochastic assemblages. At the component community level, application of extinction-dependent mechanisms might be appropriate, given sufficient time for colonization, but these structuring forces likely act indirectly through their effects on the host community to increase the probability of parasite persistence. At all levels, the passive sampling hypothesis is a relevant null model. The tendency for mechanisms that produce species-area relationships to produce nested subset patterns means that for most systems, the passive sampling hypothesis can be addressed through the application of appropriate null models of nested subset structure.
Investigating soil moisture feedbacks on precipitation with tests of Granger causality
NASA Astrophysics Data System (ADS)
Salvucci, Guido D.; Saleem, Jennifer A.; Kaufmann, Robert
Granger causality (GC) is used in the econometrics literature to identify the presence of one- and two-way coupling between terms in noisy multivariate dynamical systems. Here we test for the presence of GC to identify a soil moisture (S) feedback on precipitation (P) using data from Illinois. In this framework S is said to Granger cause P if F(P_t | Ω_{t-Δt}) ≠ F(P_t | Ω_{t-Δt} - S_{t-Δt}), where F denotes the conditional distribution of P, Ω_{t-Δt} represents the set of all knowledge available at time t-Δt, and Ω_{t-Δt} - S_{t-Δt} represents all knowledge except S. Critical for land-atmosphere interaction research is that Ω_{t-Δt} includes all past information on P as well as S. Therefore that part of the relation between past soil moisture and current precipitation which results from precipitation autocorrelation and soil water balance will be accounted for and not attributed to causality. Tests for GC usually specify all relevant variables in a coupled vector autoregressive (VAR) model and then calculate the significance level of decreased predictability as various coupling coefficients are omitted. But because the data (daily precipitation and soil moisture) are distinctly non-Gaussian, we avoid using a VAR and instead express the daily precipitation events as a Markov model. We then test whether the probability of storm occurrence, conditioned on past information on precipitation, changes with information on soil moisture. Past information on precipitation is expressed both as the occurrence of previous day precipitation (to account for storm-scale persistence) and as a simple soil moisture-like precipitation-wetness index derived solely from precipitation (to account for seasonal-scale persistence). In this way only those fluctuations in moisture not attributable to past fluctuations in precipitation (e.g., those due to temperature) can influence the outcome of the test. The null hypothesis (no moisture influence) is evaluated by comparing observed changes in storm probability to Monte-Carlo simulated differences generated with unconditional occurrence probabilities. The null hypothesis is not rejected (p > 0.5), suggesting that, contrary to recently published results, insufficient evidence exists to support an influence of soil moisture on precipitation in Illinois.
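The testing idea, whether soil moisture changes the probability of storm occurrence once past precipitation information is already conditioned on, can be sketched with a Monte Carlo test on binary occurrence data. Everything below (the synthetic records, the conditioning on previous-day occurrence and a wetness index, the stratified-shuffle null, the test statistic) is an illustrative simplification of the paper's Markov-model framework, not its actual implementation.

import numpy as np

rng = np.random.default_rng(6)

# Synthetic daily records (placeholders): storm occurrence, an antecedent
# precipitation-based wetness index, and soil moisture that adds no extra
# information beyond that index (so the null of no feedback is true).
n = 3000
occ = rng.random(n) < 0.3
kernel = np.exp(-np.arange(1, 11) / 3.0)
wet_index = np.convolve(occ.astype(float), np.concatenate([[0.0], kernel]))[:n]
soil = 0.8 * wet_index + rng.normal(0, 0.5, n)

prev_occ = np.roll(occ, 1).astype(int)
wet_hi = (wet_index > np.median(wet_index)).astype(int)
soil_hi = soil > np.median(soil)

def statistic(occ, prev_occ, wet_hi, soil_hi):
    # Mean absolute difference in storm probability between high- and low-soil
    # days, computed within each (previous-day occurrence, wetness) stratum.
    diffs = []
    for p in (0, 1):
        for w in (0, 1):
            cell = (prev_occ == p) & (wet_hi == w)
            a, b = occ[cell & soil_hi], occ[cell & ~soil_hi]
            if len(a) > 10 and len(b) > 10:
                diffs.append(abs(a.mean() - b.mean()))
    return np.mean(diffs)

obs = statistic(occ[1:], prev_occ[1:], wet_hi[1:], soil_hi[1:])

# Null distribution: occurrence keeps its dependence on past precipitation but
# is independent of soil moisture, so shuffle the soil indicator within strata.
null = []
for _ in range(999):
    s = soil_hi[1:].copy()
    for p in (0, 1):
        for w in (0, 1):
            cell = (prev_occ[1:] == p) & (wet_hi[1:] == w)
            s[cell] = rng.permutation(s[cell])
    null.append(statistic(occ[1:], prev_occ[1:], wet_hi[1:], s))

p_value = (1 + np.sum(np.array(null) >= obs)) / 1000
print(f"observed difference {obs:.3f}, p = {p_value:.3f}")  # expect non-rejection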
Mickenautsch, Steffen; Yengopal, Veerasamy
2013-01-01
Background Naïve-indirect comparisons are comparisons between competing clinical interventions’ evidence from separate (uncontrolled) trials. Direct comparisons are comparisons within randomised control trials (RCTs). The objective of this empirical study is to test the null-hypothesis that trends and performance differences inferred from naïve-indirect comparisons and from direct comparisons/RCTs regarding the failure rates of amalgam and direct high-viscosity glass-ionomer cement (HVGIC) restorations in permanent posterior teeth have similar direction and magnitude. Methods A total of 896 citations were identified through a systematic literature search. From these, ten and two uncontrolled clinical longitudinal studies for HVGIC and amalgam, respectively, were included for naïve-indirect comparison and could be matched with three out of twenty RCTs. Summary effect sizes were computed as Odds ratios (OR; 95% Confidence intervals) and compared with those from RCTs. Trend directions were inferred from 95% Confidence interval overlaps and direction of point estimates; magnitudes of performance differences were inferred from the median point estimates (OR) with 25% and 75% percentile range, for both types of comparison. Mann-Whitney U test was applied to test for statistically significant differences between point estimates of both comparison types. Results Trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled clinical longitudinal studies and from direct comparisons based on RCT evidence are not the same. The distributions of the point estimates differed significantly for both comparison types (Mann–Whitney U = 25, n(indirect) = 26; n(direct) = 8; p = 0.0013, two-tailed). Conclusion The null-hypothesis was rejected. Trends and performance differences inferred from either comparison between HVGIC and amalgam restorations failure rates in permanent posterior teeth are not the same. It is recommended that clinical practice guidance regarding HVGICs should rest on direct comparisons via RCTs and not on naïve-indirect comparisons based on uncontrolled longitudinal studies in order to avoid inflation of effect estimates. PMID:24205220
Mickenautsch, Steffen; Yengopal, Veerasamy
2013-01-01
Naïve-indirect comparisons are comparisons between competing clinical interventions' evidence from separate (uncontrolled) trials. Direct comparisons are comparisons within randomised control trials (RCTs). The objective of this empirical study is to test the null-hypothesis that trends and performance differences inferred from naïve-indirect comparisons and from direct comparisons/RCTs regarding the failure rates of amalgam and direct high-viscosity glass-ionomer cement (HVGIC) restorations in permanent posterior teeth have similar direction and magnitude. A total of 896 citations were identified through a systematic literature search. From these, ten and two uncontrolled clinical longitudinal studies for HVGIC and amalgam, respectively, were included for naïve-indirect comparison and could be matched with three out of twenty RCTs. Summary effect sizes were computed as Odds ratios (OR; 95% Confidence intervals) and compared with those from RCTs. Trend directions were inferred from 95% Confidence interval overlaps and direction of point estimates; magnitudes of performance differences were inferred from the median point estimates (OR) with 25% and 75% percentile range, for both types of comparison. Mann-Whitney U test was applied to test for statistically significant differences between point estimates of both comparison types. Trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled clinical longitudinal studies and from direct comparisons based on RCT evidence are not the same. The distributions of the point estimates differed significantly for both comparison types (Mann-Whitney U = 25, n(indirect) = 26; n(direct) = 8; p = 0.0013, two-tailed). The null-hypothesis was rejected. Trends and performance differences inferred from either comparison between HVGIC and amalgam restorations failure rates in permanent posterior teeth are not the same. It is recommended that clinical practice guidance regarding HVGICs should rest on direct comparisons via RCTs and not on naïve-indirect comparisons based on uncontrolled longitudinal studies in order to avoid inflation of effect estimates.
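The key comparison, whether the point estimates from the two evidence bases come from the same distribution, is a two-sided Mann-Whitney U test; a minimal sketch follows with placeholder odds ratios, since the actual 26 indirect and 8 direct point estimates are not given in the abstract.

import numpy as np
from scipy import stats

# Placeholder odds-ratio point estimates (invented); the real values come from
# the 26 naive-indirect and 8 direct (RCT) comparisons in the study.
or_indirect = np.array([0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.2, 1.4, 1.6, 2.0,
                        2.3, 2.5, 0.4, 0.7, 1.0, 1.3, 1.8, 2.1, 0.6, 0.9,
                        1.5, 1.7, 2.4, 0.8, 1.1, 1.9])
or_direct = np.array([2.8, 3.1, 3.5, 4.0, 2.6, 3.8, 4.4, 3.3])

u, p = stats.mannwhitneyu(or_indirect, or_direct, alternative="two-sided")
print(f"U = {u:.0f}, two-tailed p = {p:.4f}")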
NASA Astrophysics Data System (ADS)
Yang, Zhongming; Dou, Jiantai; Du, Jinyu; Gao, Zhishan
2018-03-01
Non-null interferometry can be used to measure the radius of curvature (ROC); we previously presented a virtual quadratic Newton rings phase-shifting moiré-fringes measurement method for large ROC measurement (Yang et al., 2016). In this paper, we propose a large ROC measurement method based on the evaluation of the interferogram-quality metric by the non-null interferometer. With the multi-configuration model of the non-null interferometric system in ZEMAX, the retrace errors and the phase introduced by the test surface are reconstructed. The interferogram-quality metric is obtained by the normalized phase-shifted testing Newton rings with the spherical surface model in the non-null interferometric system. The radius of curvature of the test spherical surface is obtained when the minimum of the interferogram-quality metric is found. Simulations and experimental results verified the feasibility of the proposed method. For a spherical mirror with a ROC of 41,400 mm, the measurement accuracy is better than 0.13%.
Model-based phase-shifting interferometer
NASA Astrophysics Data System (ADS)
Liu, Dong; Zhang, Lei; Shi, Tu; Yang, Yongying; Chong, Shiyao; Miao, Liang; Huang, Wei; Shen, Yibing; Bai, Jian
2015-10-01
A model-based phase-shifting interferometer (MPI) is developed, in which a novel calculation technique is proposed instead of the traditional complicated system structure, to achieve versatile, high precision and quantitative surface tests. In the MPI, the partial null lens (PNL) is employed to implement the non-null test. With some alternative PNLs, similar to the transmission spheres in ZYGO interferometers, the MPI provides a flexible test for general spherical and aspherical surfaces. Based on modern computer modeling technique, a reverse iterative optimizing construction (ROR) method is employed for the retrace error correction of non-null test, as well as figure error reconstruction. A self-compiled ray-tracing program is set up for the accurate system modeling and reverse ray tracing. The surface figure error then can be easily extracted from the wavefront data in forms of Zernike polynomials by the ROR method. Experiments on spherical and aspherical tests are presented to validate the flexibility and accuracy. The test results are compared with those of a Zygo interferometer (null tests), which demonstrates the high accuracy of the MPI. With such accuracy and flexibility, the MPI would possess large potential in modern optical shop testing.
No Evidence of Periodic Variability in the Light Curve of Active Galaxy J0045+41
NASA Astrophysics Data System (ADS)
Barth, Aaron J.; Stern, Daniel
2018-05-01
Dorn-Wallenstein, Levesque, & Ruan recently presented the identification of a z = 0.215 active galaxy located behind M31 and claimed the detection of multiple periodic variations in the object’s light curve with as many as nine different periods. They interpreted these results as evidence of the presence of a binary supermassive black hole with an orbital separation of just a few hundred au, and estimated the gravitational-wave signal implied by such a system. We demonstrate that the claimed periodicities are based on a misinterpretation of the null hypothesis test simulations and an error in the method used to calculate the false alarm probabilities. There is no evidence of periodicity in the data.
Are the Effects of Response Inhibition on Gambling Long-Lasting?
Verbruggen, Frederick; Adams, Rachel C.; van ‘t Wout, Felice; Stevens, Tobias; McLaren, Ian P. L.; Chambers, Christopher D.
2013-01-01
A recent study has shown that short-term training in response inhibition can make people more cautious for up to two hours when making decisions. However, the longevity of such training effects is unclear. In this study we tested whether training in the stop-signal paradigm reduces risky gambling when the training and gambling task are separated by 24 hours. Two independent experiments revealed that the aftereffects of stop-signal training are negligible after 24 hours. This was supported by Bayes factors that provided strong support for the null hypothesis. These findings indicate the need to better optimise the parameters of inhibition training to achieve clinical efficacy, potentially by strengthening automatic associations between specific stimuli and stopping. PMID:23922948
On the scaling of the distribution of daily price fluctuations in the Mexican financial market index
NASA Astrophysics Data System (ADS)
Alfonso, Léster; Mansilla, Ricardo; Terrero-Escalante, César A.
2012-05-01
In this paper, a statistical analysis of log-return fluctuations of the IPC, the Mexican Stock Market Index, is presented. A sample of daily data covering the period from 04/09/2000 to 04/09/2010 was analyzed and fitted to different distributions. Tests of goodness of fit were performed in order to quantitatively assess the quality of the estimation. Special attention was paid to the impact of the size of the sample on the estimated decay of the distribution's tail. In this study a forceful rejection of normality was obtained. On the other hand, the null hypothesis that the log-fluctuations are fitted by an α-stable Lévy distribution cannot be rejected at the 5% significance level.
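The normality rejection corresponds to a goodness-of-fit test of the log-returns against a fitted Gaussian; the sketch below uses simulated heavy-tailed returns standing in for the IPC series. An analogous Kolmogorov-Smirnov test against a fitted α-stable law (e.g., via scipy.stats.levy_stable, which is slow to fit for large samples) is how the non-rejection of the Lévy hypothesis would be assessed; note also that using parameters estimated from the same sample makes the nominal KS p-value only approximate.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Heavy-tailed placeholder for daily log-returns (Student-t, not the IPC data).
log_returns = stats.t.rvs(df=3, scale=0.01, size=2500, random_state=rng)

# Kolmogorov-Smirnov test against the Gaussian fitted to the sample.
mu, sigma = log_returns.mean(), log_returns.std(ddof=1)
ks_stat, p_norm = stats.kstest(log_returns, "norm", args=(mu, sigma))
print(f"KS vs fitted normal: D = {ks_stat:.3f}, p = {p_norm:.2e}")  # forceful rejection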
Business cycles and mortality: results from Swedish microdata.
Gerdtham, Ulf-G; Johannesson, Magnus
2005-01-01
We assess the relationship between business cycles and mortality risk using a large individual level data set on over 40,000 individuals in Sweden who were followed for 10-16 years (leading to over 500,000 person-year observations). We test the effect of six alternative business cycle indicators on the mortality risk: the unemployment rate, the notification rate, the deviation from the GDP trend, the GDP change, the industry capacity utilization, and the industry confidence indicator. For men we find a significant countercyclical relationship between the business cycle and the mortality risk for four of the indicators and a non-significant effect for the other two indicators. For women we cannot reject the null hypothesis of no effect for any of the business cycle indicators.
Bayesian inference for psychology, part IV: parameter estimation and Bayes factors.
Rouder, Jeffrey N; Haaf, Julia M; Vandekerckhove, Joachim
2018-02-01
In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
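For a normal mean with known sampling variance, the spike-and-slab Bayes factor has a closed form (the marginal likelihood of the sample mean under a point null versus under a zero-centred normal slab), and the model-averaged estimate shrinks by the posterior probability of the slab, which is itself a function of the Bayes factor. A minimal sketch with assumed prior settings and simulated data:

import numpy as np
from scipy import stats

def spike_slab(x, sigma=1.0, tau=0.5, prior_null=0.5):
    # Spike: mu = 0.  Slab: mu ~ Normal(0, tau^2).  Data: x_i ~ Normal(mu, sigma^2).
    # The sample mean is Normal(0, sigma^2/n) under the spike and
    # Normal(0, sigma^2/n + tau^2) marginally under the slab.
    n, xbar = len(x), np.mean(x)
    m0 = stats.norm.pdf(xbar, 0.0, np.sqrt(sigma**2 / n))
    m1 = stats.norm.pdf(xbar, 0.0, np.sqrt(sigma**2 / n + tau**2))
    bf01 = m0 / m1                                   # Bayes factor for the null
    post_slab = (1 - prior_null) * m1 / ((1 - prior_null) * m1 + prior_null * m0)
    # Conjugate shrinkage of the mean under the slab, then averaged over models
    # (the spike contributes an estimate of exactly zero).
    cond_mean = xbar * tau**2 / (tau**2 + sigma**2 / n)
    return bf01, post_slab * cond_mean

rng = np.random.default_rng(8)
x = rng.normal(0.3, 1.0, 25)
bf01, estimate = spike_slab(x)
print(f"BF01 = {bf01:.2f}, model-averaged estimate = {estimate:.3f}, xbar = {x.mean():.3f}")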
Xu, Fen; Burk, David; Gao, Zhanguo; Yin, Jun; Zhang, Xia
2012-01-01
The histone deacetylase sirtuin 1 (SIRT1) inhibits adipocyte differentiation and suppresses inflammation by targeting the transcription factors peroxisome proliferator-activated receptor γ and nuclear factor κB. Although this suggests that adiposity and inflammation should be enhanced when SIRT1 activity is inactivated in the body, this hypothesis has not been tested in SIRT1 null (SIRT1−/−) mice. In this study, we addressed this issue by investigating the adipose tissue in SIRT1−/− mice. Compared with their wild-type littermates, SIRT1 null mice exhibited a significant reduction in body weight. In adipose tissue, the average size of adipocytes was smaller, the content of extracellular matrix was lower, adiponectin and leptin were expressed at 60% of normal level, and adipocyte differentiation was reduced. All of these changes were observed with a 50% reduction in capillary density that was determined using a three-dimensional imaging technique. Except for vascular endothelial growth factor, the expression of several angiogenic factors (Pdgf, Hgf, endothelin, apelin, and Tgf-β) was reduced by about 50%. Macrophage infiltration and inflammatory cytokine expression were 70% less in the adipose tissue of null mice and macrophage differentiation was significantly inhibited in SIRT1−/− mouse embryonic fibroblasts in vitro. In wild-type mice, macrophage deletion led to a reduction in vascular density. These data suggest that SIRT1 controls adipose tissue function through regulation of angiogenesis, whose deficiency is associated with macrophage malfunction in SIRT1−/− mice. The study supports the concept that inflammation regulates angiogenesis in the adipose tissue. PMID:22315447
Sim1 Neurons Are Sufficient for MC4R-Mediated Sexual Function in Male Mice.
Semple, Erin; Hill, Jennifer W
2018-01-01
Sexual dysfunction is a poorly understood condition that affects up to one-third of men around the world. Existing treatments that target the periphery do not work for all men. Previous studies have shown that central melanocortins, which are released by pro-opiomelanocortin neurons in the arcuate nucleus of the hypothalamus, can lead to male erection and increased libido. Several studies specifically implicate the melanocortin 4 receptor (MC4R) in the central control of sexual function, but the specific neural circuitry involved is unknown. We hypothesized that single-minded homolog 1 (Sim1) neurons play an important role in the melanocortin-mediated regulation of male sexual behavior. To test this hypothesis, we examined the sexual behavior of mice expressing MC4R only on Sim1-positive neurons (tbMC4Rsim1 mice) in comparison with tbMC4R null mice and wild-type controls. In tbMC4Rsim1 mice, MC4R reexpression was found in the medial amygdala and paraventricular nucleus of the hypothalamus. These mice were paired with sexually experienced females, and their sexual function and behavior was scored based on mounting, intromission, and ejaculation. tbMC4R null mice showed a longer latency to mount, a reduced intromission efficiency, and an inability to reach ejaculation. Expression of MC4R only on Sim1 neurons reversed the sexual deficits seen in tbMC4R null mice. This study implicates melanocortin signaling via the MC4R on Sim1 neurons in the central control of male sexual behavior. Copyright © 2018 Endocrine Society.
Corrosion behavior of self-ligating and conventional metal brackets.
Maia, Lúcio Henrique Esmeraldo Gurgel; Lopes Filho, Hibernon; Ruellas, Antônio Carlos de Oliveira; Araújo, Mônica Tirre de Souza; Vaitsman, Delmo Santiago
2014-01-01
To test the null hypothesis that the aging process in self-ligating brackets is not higher than in conventional brackets. Twenty-five conventional (GN-3M/Unitek; GE-GAC; VE-Aditek) and 25 self-ligating (SCs-3M/Unitek; INs-GAC; ECs-Aditek) metal brackets from three manufacturers (n = 150) were submitted to aging process in 0.9% NaCl solution at a constant temperature of 37 ± 1°C for 21 days. The content of nickel, chromium and iron ions in the solution collected at intervals of 7, 14 and 21 days was quantified by atomic absorption spectrophotometry. After the aging process, the brackets were analyzed by scanning electron microscopy (SEM) under 22X and 1,000X magnifications. Comparison of metal release in self-ligating and conventional brackets from the same manufacturer proved that the SCs group released more nickel (p < 0.05) than the GN group after 7 and 14 days, but less chromium (p < 0.05) after 14 days and less iron (p < 0.05) at the three experimental time intervals. The INs group released less iron (p < 0.05) than the GE group after 7 days and less nickel, chromium and iron (p < 0.05) after 14 and 21 days. The ECs group released more nickel, chromium and iron (p < 0.05) than the VE group after 14 days, but released less nickel and chromium (p < 0.05) after 7 days and less chromium and iron (p < 0.05) after 21 days. The SEM analysis revealed alterations on surface topography of conventional and self-ligating brackets. The aging process in self-ligating brackets was not greater than in conventional brackets from the same manufacturer. The null hypothesis was accepted.
Hydrophilicity of dentin bonding systems influences in vitro Streptococcus mutans biofilm formation
Brambilla, Eugenio; Ionescu, Andrei; Mazzoni, Annalisa; Cadenaro, Milena; Gagliani, Massimo; Ferraroni, Monica; Tay, Franklin; Pashley, David; Breschi, Lorenzo
2014-01-01
Objectives To evaluate in vitro Streptococcus mutans (S. mutans) biofilm formation on the surface of five light-curing experimental dental bonding systems (DBS) with increasing hydrophilicity. The null hypothesis tested was that resin chemical composition and hydrophilicity do not affect S. mutans biofilm formation. Methods Five light-curing versions of experimental resin blends with increasing hydrophilicity were investigated (R1, R2, R3, R4 and R5). R1 and R2 contained ethoxylated BisGMA/TEGDMA or BisGMA/TEGDMA, respectively, and were very hydrophobic, representative of pit-and-fissure bonding agents. R3 was representative of a typical two-step etch-and-rinse adhesive, while R4 and R5 were very hydrophilic resins analogous to self-etching adhesives. Twenty-eight disks were prepared for each resin blend. After a 24-h incubation at 37 °C, a multilayer monospecific biofilm of S. mutans was obtained on the surface of each disk. The adherent biomass was determined using the MTT assay and evaluated morphologically with confocal laser scanning microscopy (CLSM) and scanning electron microscopy (SEM). Results R2 and R3 surfaces showed the highest biofilm formation while R1 and R4 showed a similar intermediate biofilm formation. R5 was more hydrophilic and acidic and was significantly less colonized than all the other resins. A significant quadratic relationship between biofilm formation and hydrophilicity of the resin blends was found. CLSM and SEM evaluation confirmed MTT assay results. Conclusions The null hypothesis was rejected since S. mutans biofilm formation was influenced by hydrophilicity, surface acidity and chemical composition of the experimental resins. Further studies using a bioreactor are needed to confirm the results and clarify the role of the single factors. PMID:24954666
Brain, Richard A; Teed, R Scott; Bang, JiSu; Thorbek, Pernille; Perine, Jeff; Peranginangin, Natalia; Kim, Myoungwoo; Valenti, Ted; Chen, Wenlin; Breton, Roger L; Rodney, Sara I; Moore, Dwayne R J
2015-01-01
Simple, deterministic screening-level assessments that are highly conservative by design facilitate a rapid initial screening to determine whether a pesticide active ingredient has the potential to adversely affect threatened or endangered species. If a worst-case estimate of pesticide exposure is below a very conservative effects metric (e.g., the no observed effects concentration of the most sensitive tested surrogate species), then the potential risks are considered de minimis and unlikely to jeopardize the existence of a threatened or endangered species. Thus by design, such compounded layers of conservatism are intended to minimize potential Type II errors (failure to reject a false null hypothesis of de minimis risk), but correspondingly increase Type I errors (falsely rejecting a true null hypothesis of de minimis risk). Because of the conservatism inherent in screening-level risk assessments, higher-tier scientific information and analyses that provide additional environmental realism can be applied in cases where a potential risk has been identified. This information includes community-level effects data, environmental fate and exposure data, monitoring data, geospatial location and proximity data, species biology data, and probabilistic exposure and population models. Given that the definition of "risk" includes likelihood and magnitude of effect, higher-tier risk assessments should use probabilistic techniques that more accurately and realistically characterize risk. Moreover, where possible and appropriate, risk assessments should focus on effects at the population and community levels of organization rather than the more traditional focus on the organism level. This document provides a review of some types of higher-tier data and assessment refinements available to more accurately and realistically evaluate potential risks of pesticide use to threatened and endangered species. © 2014 SETAC.
Corrosion behavior of self-ligating and conventional metal brackets
Maia, Lúcio Henrique Esmeraldo Gurgel; Lopes Filho, Hibernon; Ruellas, Antônio Carlos de Oliveira; Araújo, Mônica Tirre de Souza; Vaitsman, Delmo Santiago
2014-01-01
Objective To test the null hypothesis that the aging process in self-ligating brackets is not higher than in conventional brackets. Methods Twenty-five conventional (GN-3M/Unitek; GE-GAC; VE-Aditek) and 25 self-ligating (SCs-3M/Unitek; INs-GAC; ECs-Aditek) metal brackets from three manufacturers (n = 150) were submitted to aging process in 0.9% NaCl solution at a constant temperature of 37 ± 1ºC for 21 days. The content of nickel, chromium and iron ions in the solution collected at intervals of 7, 14 and 21 days was quantified by atomic absorption spectrophotometry. After the aging process, the brackets were analyzed by scanning electron microscopy (SEM) under 22X and 1,000X magnifications. Results Comparison of metal release in self-ligating and conventional brackets from the same manufacturer proved that the SCs group released more nickel (p < 0.05) than the GN group after 7 and 14 days, but less chromium (p < 0.05) after 14 days and less iron (p < 0.05) at the three experimental time intervals. The INs group released less iron (p < 0.05) than the GE group after 7 days and less nickel, chromium and iron (p < 0.05) after 14 and 21 days. The ECs group released more nickel, chromium and iron (p < 0.05) than the VE group after 14 days, but released less nickel and chromium (p < 0.05) after 7 days and less chromium and iron (p < 0.05) after 21 days. The SEM analysis revealed alterations on surface topography of conventional and self-ligating brackets. Conclusions The aging process in self-ligating brackets was not greater than in conventional brackets from the same manufacturer. The null hypothesis was accepted. PMID:24945521
Interpreting observational studies: why empirical calibration is needed to correct p-values
Schuemie, Martijn J; Ryan, Patrick B; DuMouchel, William; Suchard, Marc A; Madigan, David
2014-01-01
Often the literature makes assertions of medical product effects on the basis of ‘ p < 0.05’. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from literature, representing a case–control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p < 0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we compute calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to literature suggests that at least 54% of findings with p < 0.05 are not actually statistically significant and should be reevaluated. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:23900808
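The calibration step can be sketched as fitting an empirical null distribution to the log effect estimates of the negative controls and then asking how extreme a new estimate is under that distribution. The version below ignores the per-estimate standard errors that the full method models, and every number in it is invented.

import numpy as np
from scipy import stats

# Log hazard-ratio estimates for negative-control drug-outcome pairs
# (invented numbers; the true effects are assumed to be null).
negative_controls = np.log([1.10, 0.85, 1.30, 1.45, 0.95, 1.25, 1.60, 1.05,
                            1.35, 0.90, 1.20, 1.50, 1.15, 0.80, 1.40])

# Empirical null: a normal distribution fitted to the negative-control estimates;
# its nonzero mean and extra spread capture systematic error that the nominal
# p-value ignores.
mu, sd = negative_controls.mean(), negative_controls.std(ddof=1)

def calibrated_p(log_estimate):
    # Two-sided probability of an estimate at least this extreme when the
    # true effect is absent, judged against the empirical null.
    z = (log_estimate - mu) / sd
    return 2 * stats.norm.sf(abs(z))

log_hr = np.log(1.5)    # effect estimate from the study of interest (invented)
se = 0.15               # its assumed standard error
p_nominal = 2 * stats.norm.sf(abs(log_hr / se))
print(f"nominal p = {p_nominal:.4f}, calibrated p = {calibrated_p(log_hr):.3f}")

With these invented inputs the nominal p-value is well below 0.05 while the calibrated p-value is not, which is the pattern the authors report for a majority of observational findings.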
Sheehan, Frances T; Borotikar, Bhushan S; Behnam, Abrahm J; Alter, Katharine E
2012-07-01
A potential source of patellofemoral pain, one of the most common problems of the knee, is believed to be altered patellofemoral kinematics due to a force imbalance around the knee. Although no definitive etiology for this imbalance has been found, a weak vastus medialis is considered a primary factor. Therefore, this study's purpose was to determine how the loss of vastus medialis obliquus force alters three-dimensional in vivo knee joint kinematics during a volitional extension task. Eighteen asymptomatic female subjects with no history of knee pain or pathology participated in this IRB approved study. Patellofemoral and tibiofemoral kinematics were derived from velocity data acquired using dynamic cine-phase contrast MRI. The same kinematics were then acquired immediately after administering a motor branch block to the vastus medialis obliquus using 3-5ml of 1% lidocaine. A repeated measures analysis of variance was used to test the null hypothesis that the post- and pre-injection kinematics were no different. The null hypothesis was rejected for patellofemoral lateral shift (P=0.003, max change=1.8mm, standard deviation=1.7mm), tibiofemoral lateral shift (P<0.001, max change=2.1mm, standard deviation=2.9mm), and tibiofemoral external rotation (P<0.001, max change=3.7°, standard deviation=4.4°). The loss of vastus medialis obliquus function produced kinematic changes that mirrored the axial plane kinematics seen in individuals with patellofemoral pain, but could not account for the full extent of these changes. Thus, vastus medialis weakness is likely a major factor in, but not the sole source of, altered patellofemoral kinematics in such individuals. Published by Elsevier Ltd.
Organic Anion Transporting Polypeptide 1a1 Null Mice Are Sensitive to Cholestatic Liver Injury
Zhang, Youcai; Csanaky, Iván L.; Cheng, Xingguo; Lehman-McKeeman, Lois D.; Klaassen, Curtis D.
2012-01-01
Organic anion transporting polypeptide 1a1 (Oatp1a1) is predominantly expressed in livers of mice and is thought to transport bile acids (BAs) from blood into liver. Because Oatp1a1 expression is markedly decreased in mice after bile duct ligation (BDL), we hypothesized that Oatp1a1-null mice would be protected against liver injury during BDL-induced cholestasis due largely to reduced hepatic uptake of BAs. To evaluate this hypothesis, BDL surgeries were performed in both male wild-type (WT) and Oatp1a1-null mice. At 24 h after BDL, Oatp1a1-null mice showed higher serum alanine aminotransferase levels and more severe liver injury than WT mice, and all Oatp1a1-null mice died within 4 days after BDL, whereas all WT mice survived. At 24 h after BDL, surprisingly Oatp1a1-null mice had higher total BA concentrations in livers than WT mice, suggesting that loss of Oatp1a1 did not prevent BA accumulation in the liver. In addition, secondary BAs dramatically increased in serum of Oatp1a1-null BDL mice but not in WT BDL mice. Oatp1a1-null BDL mice had similar basolateral BA uptake (Na+-taurocholate cotransporting polypeptide and Oatp1b2) and BA-efflux (multidrug resistance–associated protein [Mrp]-3, Mrp4, and organic solute transporter α/β) transporters, as well as BA-synthetic enzyme (Cyp7a1), in livers as WT BDL mice. Hepatic expression of small heterodimer partner, Cyp3a11, Cyp4a14, and Nqo1, which are target genes of farnesoid X receptor, pregnane X receptor, peroxisome proliferator-activated receptor alpha, and NF-E2-related factor 2, respectively, was increased in WT BDL mice but not in Oatp1a1-null BDL mice. These results demonstrate that loss of Oatp1a1 function exacerbates cholestatic liver injury in mice and suggest that Oatp1a1 plays a unique role in liver adaptive responses to obstructive cholestasis. PMID:22461449
NASA Astrophysics Data System (ADS)
Willenbring, J. K.; Jerolmack, D. J.
2015-12-01
At the largest time and space scales, the pace of erosion and chemical weathering is determined by tectonic uplift rates. Deviations from this equilibrium condition arise from the transient response of landscape denudation to climatic and tectonic perturbations, and may be long lived. We posit that the constraint of mass balance, however, makes it unlikely that such disequilibrium persists at the global scale over millions of years, as has been proposed for late Cenozoic erosion. To support this contention, we synthesize existing data for weathering fluxes, global sedimentation rates, sediment yields and tectonic motions. The records show a remarkable constancy in the pace of Earth-surface evolution over the last 10 million years. These findings provide strong support for the null hypothesis; that global rates of landscape change have remained constant over the last ten million years, despite global climate change and massive mountain building events. Two important implications are: (1) global climate change may not change global denudation rates, because the nature and sign of landscape responses are varied; and (2) tectonic and climatic perturbations are accommodated in the long term by changes in landscape form. This work undermines the hypothesis that increased weathering due to late Cenozoic mountain building or climate change was the primary agent for a decrease in global temperatures.
Alignment of optical system components using an ADM beam through a null assembly
NASA Technical Reports Server (NTRS)
Hayden, Joseph E. (Inventor); Olczak, Eugene G. (Inventor)
2010-01-01
A system for testing an optical surface includes a rangefinder configured to emit a light beam and a null assembly located between the rangefinder and the optical surface. The null assembly is configured to receive and to reflect the emitted light beam toward the optical surface. The light beam reflected from the null assembly is further reflected back from the optical surface toward the null assembly as a return light beam. The rangefinder is configured to measure a distance to the optical surface using the return light beam.
Test of Lorentz and CPT violation with short baseline neutrino oscillation excesses
NASA Astrophysics Data System (ADS)
MiniBooNE Collaboration; Aguilar-Arevalo, A. A.; Anderson, C. E.; Bazarko, A. O.; Brice, S. J.; Brown, B. C.; Bugel, L.; Cao, J.; Coney, L.; Conrad, J. M.; Cox, D. C.; Curioni, A.; Dharmapalan, R.; Djurcic, Z.; Finley, D. A.; Fleming, B. T.; Ford, R.; Garcia, F. G.; Garvey, G. T.; Grange, J.; Green, C.; Green, J. A.; Hart, T. L.; Hawker, E.; Huelsnitz, W.; Imlay, R.; Johnson, R. A.; Karagiorgi, G.; Kasper, P.; Katori, T.; Kobilarcik, T.; Kourbanis, I.; Koutsoliotas, S.; Laird, E. M.; Linden, S. K.; Link, J. M.; Liu, Y.; Liu, Y.; Louis, W. C.; Mahn, K. B. M.; Marsh, W.; Mauger, C.; McGary, V. T.; McGregor, G.; Metcalf, W.; Meyers, P. D.; Mills, F.; Mills, G. B.; Monroe, J.; Moore, C. D.; Mousseau, J.; Nelson, R. H.; Nienaber, P.; Nowak, J. A.; Osmanov, B.; Ouedraogo, S.; Patterson, R. B.; Pavlovic, Z.; Perevalov, D.; Polly, C. C.; Prebys, E.; Raaf, J. L.; Ray, H.; Roe, B. P.; Russell, A. D.; Sandberg, V.; Schirato, R.; Schmitz, D.; Shaevitz, M. H.; Shoemaker, F. C.; Smith, D.; Soderberg, M.; Sorel, M.; Spentzouris, P.; Spitz, J.; Stancu, I.; Stefanski, R. J.; Sung, M.; Tanaka, H. A.; Tayloe, R.; Tzanov, M.; Van de Water, R. G.; Wascko, M. O.; White, D. H.; Wilking, M. J.; Yang, H. J.; Zeller, G. P.; Zimmerman, E. D.
2013-01-01
The sidereal time dependence of MiniBooNE νe and ν̄e appearance data is analyzed to search for evidence of Lorentz and CPT violation. An unbinned Kolmogorov-Smirnov (K-S) test shows both the νe and ν̄e appearance data are compatible with the null sidereal variation hypothesis to more than 5%. Using an unbinned likelihood fit with a Lorentz-violating oscillation model derived from the Standard Model Extension (SME) to describe any excess events over background, we find that the νe appearance data prefer a sidereal time-independent solution, and the ν̄e appearance data slightly prefer a sidereal time-dependent solution. Limits of order 10⁻²⁰ GeV are placed on combinations of SME coefficients. These limits give the best limits on certain SME coefficients for νμ→νe and ν̄μ→ν̄e oscillations. The fit values and limits of combinations of SME coefficients are provided.
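The compatibility check with the null sidereal-variation hypothesis is an unbinned K-S test of event sidereal phases against a uniform distribution; the sketch below uses simulated placeholder event times and leaves out background subtraction and the Lorentz-violating likelihood fit.

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
SIDEREAL_DAY = 86164.1   # seconds

# Placeholder event times (seconds); under the null they carry no sidereal signal.
event_times = rng.uniform(0, 3.0e7, 400)

# Fold onto sidereal phase in [0, 1) and test against the uniform distribution.
phase = (event_times % SIDEREAL_DAY) / SIDEREAL_DAY
d_stat, p_value = stats.kstest(phase, "uniform")
print(f"K-S D = {d_stat:.3f}, p = {p_value:.3f}")  # compatible with no sidereal variation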
Developing a technique to master multiplication facts 6 to 9 for year 5 pupils
NASA Astrophysics Data System (ADS)
Ahmat, Norhayati; Mohamed, Nurul Huda; Azmee, Nor Afzalina; Adham, Sarah Mohd
2017-05-01
This study was performed to enhance the mastery of multiplication facts 6 to 9 amongst year 5 pupils. The samples of this study were 40 year 5 pupils from a primary school at Teluk Intan, Perak. The samples were divided into two groups, a control group and a treatment group, with 20 pupils in each group. In this study, a new multiplication facts technique, known as 'Teknik Sifir Jari', was introduced to the treatment group. The objectives of the study were to test the effectiveness of the new technique and to increase multiplication fact fluency among pupils. The instruments used in this study were achievement tests (pre-test and post-test). The data obtained were analyzed using SPSS version 21, and a t-test was carried out to test the null hypothesis. The results showed an improvement in achievement for the pupils who were taught using the 'Teknik Sifir Jari'. Therefore, the 'Teknik Sifir Jari' has proved its effectiveness in supporting pupils to master multiplication facts.
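The hypothesis test reported above is an independent-samples t-test comparing achievement scores between the treatment and control groups; a minimal sketch with invented scores (SPSS would report the equivalent output):

import numpy as np
from scipy import stats

# Invented post-test scores out of 100 for the two groups of 20 pupils.
treatment = np.array([78, 85, 90, 72, 88, 95, 80, 83, 91, 76,
                      84, 79, 87, 92, 74, 86, 81, 89, 77, 93])
control = np.array([65, 70, 58, 72, 61, 68, 75, 63, 66, 71,
                    59, 74, 62, 69, 64, 73, 60, 67, 70, 57])

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would lead to rejecting the null hypothesis of equal mean achievement.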
Proficiency Testing for Evaluating Aerospace Materials Test Anomalies
NASA Technical Reports Server (NTRS)
Hirsch, D.; Motto, S.; Peyton, S.; Beeson, H.
2006-01-01
ASTM G 86 and ASTM G 74 are commonly used to evaluate materials' susceptibility to ignition in liquid and gaseous oxygen systems. However, the methods have been known for their lack of repeatability. The problems inherent in the test logic do not allow precise identification of either the existence or the magnitude of problems related to running the tests, such as inconsistent system performance, lack of adherence to procedures, etc. Excessive variability leads to more frequent erroneous acceptance of the null hypothesis, and so to the false deduction that problems are nonexistent when they really do exist. This paper attempts to develop and recommend an approach that could lead to increased accuracy in problem diagnostics by using the 50% reactivity point, which has been shown to be more repeatable. The initial tests conducted indicate that PTFE and Viton A (for pneumatic impact) and Buna S (for mechanical impact) would be good choices for additional testing and consideration for inter-laboratory evaluations. The approach presented could also be used to evaluate variable effects with increased confidence and to optimize tolerances.
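One common way to estimate a 50% reactivity point from binary ignition outcomes is to fit a logistic dose-response curve and solve for the stimulus level at which the fitted probability equals 0.5. The sketch below uses made-up impact energies and outcomes; the paper does not prescribe this particular estimator.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical impact-energy levels (J) and binary ignition outcomes (1 = reaction).
energy  = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], dtype=float)
reacted = np.array([ 0,  0,  0,  1,  0,  1,  1,  1,  1,   1])

# Fit a logistic dose-response curve: P(reaction) = 1 / (1 + exp(-(b0 + b1*E))).
X = sm.add_constant(energy)
fit = sm.Logit(reacted, X).fit(disp=False)
b0, b1 = fit.params

# The 50% reactivity point is the energy at which the fitted probability is 0.5.
e50 = -b0 / b1
print(f"Estimated 50% reactivity point: {e50:.1f} J")
```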
Lee, Seong Min; Pike, J Wesley
2016-11-01
The vitamin D receptor (VDR) is a critical mediator of the biological actions of 1,25-dihydroxyvitamin D3 (1,25(OH)2D3). As a nuclear receptor, ligand activation of the VDR leads to the protein's binding to specific sites on the genome that results in the modulation of target gene expression. The VDR is also known to play a role in the hair cycle, an action that appears to be 1,25(OH)2D3-independent. Indeed, in the absence of the VDR as in hereditary 1,25-dihydroxyvitamin D-resistant rickets (HVDRR) both skin defects and alopecia emerge. Recently, we generated a mouse model of HVDRR without alopecia wherein a mutant human VDR lacking 1,25(OH)2D3-binding activity was expressed in the absence of endogenous mouse VDR. While 1,25(OH)2D3 failed to induce gene expression in these mice, resulting in an extensive skeletal phenotype, the receptor was capable of restoring normal hair cycling. We also noted a level of secondary hyperparathyroidism that was much higher than that seen in the VDR null mouse and was associated with an exaggerated bone phenotype as well. This suggested that the VDR might play a role in parathyroid hormone (PTH) regulation independent of 1,25(OH)2D3. To evaluate this hypothesis further, we contrasted PTH levels in the HVDRR mouse model with those seen in Cyp27b1 null mice where the VDR was present but the hormone was absent. The data revealed that PTH was indeed higher in Cyp27b1 null mice compared to VDR null mice. To evaluate the mechanism of action underlying such a hypothesis, we measured the expression levels of a number of VDR target genes in the duodena of wildtype mice and in transgenic mice expressing either normal or hormone-binding deficient mutant VDRs. We also compared expression levels of these genes between VDR null mice and Cyp27b1 null mice. In a subset of cases, the expression of VDR target genes was lower in mice containing the VDR as opposed to mice that did not. We suggest that the VDR may function as a selective suppressor/de-repressor of gene expression in the absence of 1,25(OH)2D3. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wildfire Selectivity for Land Cover Type: Does Size Matter?
Barros, Ana M. G.; Pereira, José M. C.
2014-01-01
Previous research has shown that fires burn certain land cover types disproportionally to their abundance. We used quantile regression to study land cover proneness to fire as a function of fire size, under the hypothesis that they are inversely related, for all land cover types. Using five years of fire perimeters, we estimated conditional quantile functions for lower (avoidance) and upper (preference) quantiles of fire selectivity for five land cover types - annual crops, evergreen oak woodlands, eucalypt forests, pine forests and shrublands. The slope of significant regression quantiles describes the rate of change in fire selectivity (avoidance or preference) as a function of fire size. We used Monte-Carlo methods to randomly permute fires in order to obtain a distribution of fire selectivity due to chance. This distribution was used to test the null hypotheses that 1) mean fire selectivity does not differ from that obtained by randomly relocating observed fire perimeters; 2) land cover proneness to fire does not vary with fire size. Our results show that land cover proneness to fire is higher for shrublands and pine forests than for annual crops and evergreen oak woodlands. As fire size increases, selectivity decreases for all land cover types tested. Moreover, the rate of change in selectivity with fire size is higher for preference than for avoidance. Comparison between observed and randomized data led us to reject both null hypotheses tested (α = 0.05) and to conclude that it is very unlikely the observed values of fire selectivity and change in selectivity with fire size are due to chance. PMID:24454747
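The combination of quantile regression and permutation testing described above can be sketched as follows. The fire-size and selectivity values are simulated stand-ins, and the 0.05/0.95 quantiles are assumed here as examples of the avoidance/preference quantiles.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
log_fire_size = rng.uniform(1, 5, n)          # hypothetical fire sizes (log hectares)
# Hypothetical selectivity index whose spread shrinks as fires get larger.
selectivity = (2.0 - 0.4 * log_fire_size) * rng.standard_normal(n)

X = sm.add_constant(log_fire_size)
for q in (0.05, 0.95):                        # lower quantile ~ avoidance, upper ~ preference
    res = sm.QuantReg(selectivity, X).fit(q=q)
    print(f"quantile {q}: slope = {res.params[1]:.3f}")

# Permutation test of the null of no size effect on the upper quantile slope:
# shuffle fire sizes, refit, and build a reference distribution for the slope.
obs = sm.QuantReg(selectivity, X).fit(q=0.95).params[1]
perm = [sm.QuantReg(selectivity,
                    sm.add_constant(rng.permutation(log_fire_size))).fit(q=0.95).params[1]
        for _ in range(200)]
p_val = np.mean(np.abs(perm) >= abs(obs))
print(f"permutation p-value for the 0.95 quantile slope: {p_val:.3f}")
```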
Measurement of steep aspheric surfaces using improved two-wavelength phase-shifting interferometer
NASA Astrophysics Data System (ADS)
Zhang, Liqiong; Wang, Shaopu; Hu, Yao; Hao, Qun
2017-10-01
Optical components with aspheric surfaces can improve the imaging quality of optical systems, and also provide extra advantages such as lighter weight, smaller volume and simpler structure. In order to satisfy these performance requirements, the surface error of aspheric surfaces, especially high-departure aspheric surfaces, must be measured accurately and conveniently. The major obstacle of traditional null interferometry is that specific and complex null optics must be designed to fully compensate for the normal aberration of the aspheric surface under test. However, non-null interferometry, which only partially compensates for the aspheric normal aberration, can test aspheric surfaces without specific null optics. In this work, a novel non-null test approach of measuring the deviation between aspheric surfaces and the best reference sphere by using an improved two-wavelength phase-shifting interferometer is described. With the help of a calibration based on reverse iteration optimization, we can effectively remove the retrace error and thus improve the accuracy. Simulation results demonstrate that this method can measure aspheric surfaces with a departure of tens of microns from the best reference sphere, which introduces approximately 500λ of wavefront aberration at the detector.
Replicating Peer-Led Team Learning in Cyberspace: Research, Opportunities, and Challenges
ERIC Educational Resources Information Center
Smith, Joshua; Wilson, Sarah Beth; Banks, Julianna; Zhu, Lin; Varma-Nelson, Pratibha
2014-01-01
This quasi-experimental, mixed methods study examined the transfer of a well-established pedagogical strategy, Peer-Led Team Learning (PLTL), to an online workshop environment (cPLTL) in a general chemistry course at a research university in the Midwest. The null hypothesis guiding the study was that no substantive differences would emerge between…
Kinase-Mediated Regulation of 40S Ribosome Assembly in Human Breast Cancer
2017-02-01
Major Task 2 and Major Task 3 (i) CRISPR/Cas9-deleted Ltv1-null TNBC clones that have been purified by Dr. Karbstein by growth in the presence of...cells using CRISPR/Cas9 technology proved difficult, as our hypothesis was correct, where Ltv1 is essential for proper growth of TNBC. In particular
Management by Objectives (MBO) Imperatives for Transforming Higher Education for a Globalised World
ERIC Educational Resources Information Center
Ofojebe, Wenceslaus N.; Olibie, Eyiuche Ifeoma
2014-01-01
This study was conducted to determine the extent to which the stipulations and visions of Management by Objectives (MBO) would be integrated in higher education institutions in South Eastern Nigeria to enhance higher education transformation in a globalised world. Four research questions and a null hypothesis guided the study. A sample of 510…
Mini-Versus Traditional: An Experimental Study of High School Social Studies Curricula.
ERIC Educational Resources Information Center
Roberts, Arthur D.; Gable, Robert K.
This study assessed some of the cognitive and affective elements of both the traditional and mini curricula. The hypothesis, stated in the null form, was that there would be no difference between students in the mini-course curriculum and the traditional curriculum on a number of stated cognitive variables (focusing on critical thinking and reading…
ERIC Educational Resources Information Center
Sehati, Samira; Khodabandehlou, Morteza
2017-01-01
The present investigation was an attempt to study the effect of power point enhanced teaching (visual input) on Iranian Intermediate EFL learners' listening comprehension ability. To that end, a null hypothesis was formulated stating that power point enhanced teaching (visual input) has no effect on Iranian Intermediate EFL learners' listening…
ERIC Educational Resources Information Center
Eze, Ogwa Christopher
2015-01-01
This research was conducted to ascertain teachers' and students' perceptions of instructional supervision in relation to capacity building in the electrical installation trade in technical colleges. Three research questions and a null hypothesis guided the study. A descriptive survey design was adopted. A 23-item questionnaire was used to elicit…
Dewar, Alastair; Camplin, William; Barry, Jon; Kennedy, Paul
2014-12-01
Since the cessation of phosphoric acid production (in 1992) and subsequent closure and decommissioning (2004) of the Rhodia Consumer Specialties Limited plant in Whitehaven, the concentration levels of polonium-210 ((210)Po) in local marine materials have declined towards a level more typical of natural background. However, enhanced concentrations of (210)Po and lead-210 ((210)Pb), due to this historic industrial activity (plant discharges and ingrowth of (210)Po from (210)Pb), have been observed in fish and shellfish samples collected from this area over the last 20 years. The results of this monitoring, and assessments of the dose from these radionuclides, to high-rate aquatic food consumers are published annually in the Radioactivity in Food and the Environment (RIFE) report series. The RIFE assessment uses a simple approach to determine whether and by how much activity is enhanced above the normal background. As a potential tool to improve the assessment of enhanced concentrations of (210)Po in routine dose assessments, a formal statistical test, where the null hypothesis is that the Whitehaven area is contaminated with (210)Po, was applied to sample data. This statistical, modified "green", test has been used in assessments of chemicals by the OSPAR commission. It involves comparison of the reported environmental concentrations of (210)Po in a given aquatic species against its corresponding Background Assessment Concentration (BAC), which is based upon environmental samples collected from regions assumed to be not enhanced by industrial sources of (210)Po, over the period for which regular monitoring data are available (1990-2010). Unlike RIFE, these BAC values take account of the variability of the natural background level. As an example, for 2010 data, crab, lobster, mussels and winkles passed the modified "green" test (i.e. the null hypothesis is rejected) and as such are deemed not to be enhanced. Since the cessation of phosphoric acid production in 1992, the modified "green" test pass rate for crustaceans is ∼53% and ∼64% for molluscs. Results of dose calculations are made (i) using the RIFE approach and (ii) with the application of the modified "green" test, where samples passing the modified "green" test are assumed to have background levels and hence zero enhancement of (210)Po. Applying the modified "green" test reduces the dose on average by 44% over the period of this study (1990-2010). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Significance Testing in Confirmatory Factor Analytic Models.
ERIC Educational Resources Information Center
Khattab, Ali-Maher; Hocevar, Dennis
Traditionally, confirmatory factor analytic models are tested against a null model of total independence. Using randomly generated factors in a matrix of 46 aptitude tests, this approach is shown to be unlikely to reject even random factors. An alternative null model, based on a single general factor, is suggested. In addition, an index of model…
Ethanol wet-bonding technique sensitivity assessed by AFM.
Osorio, E; Toledano, M; Aguilera, F S; Tay, F R; Osorio, R
2010-11-01
In ethanol wet bonding, water is replaced by ethanol to maintain dehydrated collagen matrices in an extended state to facilitate resin infiltration. Since short ethanol dehydration protocols may be ineffective, this study tested the null hypothesis that there are no differences in ethanol dehydration protocols for maintaining the surface roughness, fibril diameter, and interfibrillar spaces of acid-etched dentin. Polished human dentin surfaces were etched with phosphoric acid and water-rinsed. Tested protocols were: (1) water-rinse (control); (2) 100% ethanol-rinse (1-min); (3) 100% ethanol-rinse (5-min); and (4) progressive ethanol replacement (50-100%). Surface roughness, fibril diameter, and interfibrillar spaces were determined with atomic force microscopy and analyzed by one-way analysis of variance and the Student-Newman-Keuls test (α = 0.05). Dentin roughness and fibril diameter significantly decreased when 100% ethanol (1-5 min) was used for rinsing (p < 0.001). Absolute ethanol produced collapse and shrinkage of collagen fibrils. Ascending ethanol concentrations did not collapse the matrix and shrank the fibrils less than absolute ethanol-rinses.
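The group comparison reported above (one-way ANOVA at α = 0.05) can be sketched with hypothetical AFM fibril-diameter values; the Student-Newman-Keuls post-hoc step is only noted, not implemented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical AFM fibril-diameter measurements (nm) for the four rinsing protocols.
water     = rng.normal(95, 8, 15)
etoh_1min = rng.normal(78, 8, 15)
etoh_5min = rng.normal(75, 8, 15)
ascending = rng.normal(88, 8, 15)

f_stat, p_value = stats.f_oneway(water, etoh_1min, etoh_5min, ascending)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
# p < 0.05 would reject the null of equal mean diameters; pairwise post-hoc
# comparisons (the study used Student-Newman-Keuls) would then identify
# which protocols differ.
```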
Handedness, Earnings, Ability and Personality. Evidence from the Lab
2016-01-01
Evidence showing that on average left-handed individuals (L), who make up about 10% of the population, tend to earn less than others is based solely on survey data. This paper is the first to test the relationship between handedness and earnings experimentally and to assess whether the mechanism underlying it is predominantly cognitive or psychological. Data on 432 undergraduate students show that L do not obtain significantly different payoffs, a proxy for earnings, in a stylised labour market with multiple principals and agents. Similarly, scores on the Cognitive Reflection Test are not significantly different. Data on personality, measured using the Big Five test, show instead that L are significantly more agreeable and L females more extroverted. In addition, earnings vary significantly with personality only for L, increasing with extraversion and decreasing with neuroticism. Overall, our results fail to reject the null hypothesis that earnings do not differ by handedness and suggest differences in personality as a novel mechanism to rationalise L's behaviour. PMID:27788156
Unification of field theory and maximum entropy methods for learning probability densities
NASA Astrophysics Data System (ADS)
Kinney, Justin B.
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
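As a toy illustration of the maximum entropy side of this comparison, the sketch below estimates a maximum entropy density constrained to match the sample's first two moments by minimizing the convex dual on a grid. It is not the author's field-theory software; the grid, constraints, and optimizer are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sample data and a grid on which to represent the density.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=500)
grid = np.linspace(-5, 5, 201)
dx = grid[1] - grid[0]

# Moment constraints: match the sample mean and second moment (features x, x^2).
features = np.vstack([grid, grid**2])              # shape (2, n_grid)
targets = np.array([data.mean(), (data**2).mean()])

def dual(lam):
    # Convex dual of the maxent problem: log Z(lam) - lam . targets
    logits = lam @ features
    m = logits.max()                               # numerical stability
    z = np.sum(np.exp(logits - m)) * dx
    return np.log(z) + m - lam @ targets

res = minimize(dual, x0=np.zeros(2), method="BFGS")
lam = res.x
p = np.exp(lam @ features)
p /= p.sum() * dx                                  # normalized maxent density on the grid
print("lambda =", lam, " integral =", p.sum() * dx)
# With mean and second-moment constraints the maxent solution is Gaussian,
# so p should closely track a normal density fitted to the sample.
```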
Hart, Gina O
2005-11-01
There have been several anthropological studies on trauma analysis in recent literature, but few studies have focused on the differences between the three mechanisms of trauma (sharp force trauma, blunt force trauma and ballistics trauma). The hypothesis of this study is that blunt force and ballistics fracture patterns in the skull can be differentiated using concentric fractures. Two-hundred and eleven injuries from skulls exhibiting concentric fractures were examined to assess whether the mechanism of trauma could be determined from beveling direction. Fractures occurring in buttressed and non-buttressed regions were examined separately. Contingency tables and Pearson's Chi-Square were used to evaluate the relationship between the two variables (the mechanism of trauma and the direction of beveling), while Pearson's r correlation was used to determine the strength of the relationship. Contingency tables and Chi-square tests among the entire sample, the buttressed areas, and the non-buttressed areas led to the null hypothesis (no relationship) being rejected. Pearson's r correlation indicated that the relationship between the variables studied is greater than chance allocation.
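A minimal sketch of the contingency-table analysis described above, using an invented 2x2 table of trauma mechanism versus beveling direction; the phi coefficient is shown as a simple stand-in for the correlation-based strength measure.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: mechanism of trauma (rows) versus
# beveling direction of concentric fractures (columns).
#                 internal bevel   external bevel
table = np.array([[70, 15],     # ballistic trauma
                  [20, 60]])    # blunt force trauma

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4g}")

# Phi coefficient as a simple measure of association strength for a 2x2 table.
phi = np.sqrt(chi2 / table.sum())
print(f"phi = {phi:.2f}")
```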
Bruxism and Dental Implants: A Meta-Analysis.
Chrcanovic, Bruno Ramos; Albrektsson, Tomas; Wennerberg, Ann
2015-10-01
To test the null hypothesis of no difference in the implant failure rates, postoperative infection, and marginal bone loss after the insertion of dental implants in bruxers compared with the insertion in non-bruxers against the alternative hypothesis of a difference. An electronic search was undertaken in June 2014. Eligibility criteria included clinical studies, either randomized or not. Ten publications were included with a total of 760 implants inserted in bruxers (49 failures; 6.45%) and 2989 in non-bruxers (109 failures; 3.65%). Due to lack of information, meta-analyses for the outcomes "postoperative infection" and "marginal bone loss" were not possible. A risk ratio of 2.93 was found (95% confidence interval, 1.48-5.81; P = 0.002). These results cannot be taken to show that the insertion of dental implants in bruxers affects the implant failure rates, owing to the limited number of published studies, all characterized by a low level of specificity, and most of them dealing with a limited number of cases without a control group. Therefore, the real effect of bruxing habits on the osseointegration and survival of endosteal dental implants is still not well established.
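For illustration, a crude (unstratified) risk ratio and Katz-type 95% confidence interval can be computed from the pooled counts quoted above; this will not reproduce the study-level meta-analytic estimate of 2.93, which weights individual studies.

```python
import numpy as np
from scipy import stats

# Pooled 2x2 counts from the review: 49/760 implant failures in bruxers,
# 109/2989 in non-bruxers.
a, n1 = 49, 760      # failures / implants in bruxers
c, n2 = 109, 2989    # failures / implants in non-bruxers

rr = (a / n1) / (c / n2)
se_log_rr = np.sqrt(1/a - 1/n1 + 1/c - 1/n2)   # Katz log-interval standard error
z = stats.norm.ppf(0.975)
lo = np.exp(np.log(rr) - z * se_log_rr)
hi = np.exp(np.log(rr) + z * se_log_rr)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# A proper meta-analysis pools log risk ratios study by study (e.g., with
# inverse-variance weights), which is why the published estimate differs.
```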
Comparison of body weight and gene expression in amelogenin null and wild-type mice.
Li, Yong; Yuan, Zhi-An; Aragon, Melissa A; Kulkarni, Ashok B; Gibson, Carolyn W
2006-05-01
Amelogenin (AmelX) null mice develop hypomineralized enamel lacking normal prism structure, but are healthy and fertile. Because these mice are smaller than wild-type mice prior to weaning, we undertook a detailed analysis of the weight of mice and analyzed AmelX expression in non-dental tissues. Wild-type mice had a greater average weight each day within the 3-wk period. Using reverse transcription-polymerase chain reaction (RT-PCR), products of approximately 200 bp in size were generated from wild-type teeth, brain, eye, and calvariae. DNA sequence analysis of RT-PCR products from calvariae indicated that the small amelogenin leucine-rich amelogenin peptide (LRAP), both with and without exon 4, was expressed. No products were obtained from any of the samples from the AmelX null mice. We also isolated mRNAs that included AmelX exons 8 and 9, and identified a duplication within the murine AmelX gene with 91% homology. Our results add additional support to the hypothesis that amelogenins are multifunctional proteins, with potential roles in non-ameloblasts and in non-mineralizing tissues during development. The smaller size of AmelX null mice could potentially be explained by the lack of LRAP expression in some of these tissues, leading to a delay in development.
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
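The magnitude-based calculation alluded to above can be sketched as probabilities that the true effect exceeds a smallest worthwhile change, using a normal approximation to the sampling distribution; all numbers below are hypothetical, and in practice a t distribution is often used instead.

```python
from scipy import stats

# Hypothetical example: observed change in sprint time between training
# groups of -0.9% with a standard error of 0.6%, and a smallest worthwhile
# change (SWC) of +/- 0.5% (negative = faster = beneficial).
effect, se, swc = -0.9, 0.6, 0.5

# Probabilities that the true effect is beneficial (< -SWC), trivial, or
# harmful (> +SWC), under a normal approximation centred on the estimate.
p_beneficial = stats.norm.cdf(-swc, loc=effect, scale=se)
p_harmful    = stats.norm.sf(+swc, loc=effect, scale=se)
p_trivial    = 1.0 - p_beneficial - p_harmful
print(f"beneficial: {p_beneficial:.1%}, trivial: {p_trivial:.1%}, harmful: {p_harmful:.1%}")
```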
Type-II generalized family-wise error rate formulas with application to sample size determination.
Delorme, Phillipe; de Micheaux, Pierre Lafaye; Liquet, Benoit; Riou, Jérémie
2016-07-20
Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize available on the CRAN, making them directly available to end users. Complexities of the formulas are presented to gain insight into computation time issues. Comparison with Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd.
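The r-power idea can be illustrated by simulation: generate one test statistic per endpoint, apply a simple single-step generalized-Bonferroni rule, and estimate the probability of rejecting at least r false nulls. This is a hedged sketch under assumed effects and independence, not the paper's formulas or the rPowerSampleSize implementation.

```python
import numpy as np
from scipy import stats

def r_power_mc(deltas, alpha=0.05, q=1, r=2, n_sim=20_000, seed=0):
    """Monte Carlo estimate of the r-power: P(reject at least r false nulls).

    deltas : true standardized effects (effect/SE) of the m endpoints; 0 means a true null.
    A single-step 'generalized Bonferroni' rule rejects p-values below q*alpha/m,
    which controls the q-generalized FWER; the paper's procedures differ in detail.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, float)
    m = deltas.size
    threshold = q * alpha / m
    z = rng.standard_normal((n_sim, m)) + deltas          # one z-statistic per endpoint
    pvals = 2 * stats.norm.sf(np.abs(z))
    rejected_false = (pvals < threshold) & (deltas != 0)
    return np.mean(rejected_false.sum(axis=1) >= r)

# Example: 4 endpoints, 3 with a true effect; require at least r = 2 rejections.
print(r_power_mc(deltas=[0.0, 2.8, 3.0, 3.2], q=1, r=2))
```

Sample size determination then amounts to increasing the per-endpoint sample size (which scales the deltas) until the estimated r-power reaches the desired level.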
Peeters, Harry Huiz; Gutknecht, Norbert
2014-08-01
The purpose of the study was to test the hypothesis that air entrapment occurs in the apical third of a root canal during irrigation. A second objective was to test the null hypothesis that there is no difference between laser-driven irrigation (an erbium, chromium:yttrium-scandium-gallium-garnet laser) and passive ultrasonic irrigation in removing an airlock from the apical third. One hundred twenty extracted human teeth with single narrow root canals were randomised into two experimental groups (n = 40) and two control groups (n = 20). The specimens were shaped using hand instruments up to a size 30/0.02 file. The teeth were irrigated with a mixture of saline, radiopaque contrast and ink in solution. In the passive ultrasonic irrigation group, the irrigant was activated with an ultrasonic device for 60 s. In the laser group, the irrigant was activated with a laser for 60 s. It was concluded that if the insertion depth of the irrigation needle is shorter than the working length, air entrapment may develop in the apical third, but the use of laser-driven irrigation is completely effective in removing it. © 2013 The Authors. Australian Endodontic Journal © 2013 Australian Society of Endodontology.
The missing link? Testing a schema account of unitization.
Tibon, Roni; Greve, Andrea; Henson, Richard
2018-05-09
Unitization refers to the creation of a new unit from previously distinct items. The concept of unitization has been used to explain how novel pairings between items can be remembered without requiring recollection, by virtue of new, item-like representations that enable familiarity-based retrieval. We tested an alternative account of unitization - a schema account - which suggests that associations between items can be rapidly assimilated into a schema. We used a common operationalization of "unitization" as the difference between two unrelated words being linked by a definition, relative to two words being linked by a sentence, during an initial study phase. During the following relearning phase, a studied word was re-paired with a new word, either related or unrelated to the original associate from study. In a final test phase, memory for the relearned associations was tested. We hypothesized that, if unitized representations act like schemas, then we would observe some generalization to related words, such that memory would be better in the definition than sentence condition for related words, but not for unrelated words. Contrary to the schema hypothesis, evidence favored the null hypothesis of no difference between definition and sentence conditions for related words (Experiment 1), even when each cue was associated with multiple associates, indicating that the associations can be generalized (Experiment 2), or when the schematic information was explicitly re-activated during Relearning (Experiment 3). These results suggest that unitized associations do not generalize to accommodate new information, and therefore provide evidence against the schema account.
The Skillings-Mack test (Friedman test when there are missing data).
Chatfield, Mark; Mander, Adrian
2009-04-01
The Skillings-Mack statistic (Skillings and Mack, 1981, Technometrics 23: 171-177) is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure. The missing data can be either missing by design, for example, an incomplete block design, or missing completely at random. The Skillings-Mack test is equivalent to the Friedman test when there are no missing data in a balanced complete block design, and the Skillings-Mack test is equivalent to the test suggested in Durbin (1951, British Journal of Psychology, Statistical Section 4: 85-90) for a balanced incomplete block design. The Friedman test was implemented in Stata by Goldstein (1991, Stata Technical Bulletin 3: 26-27) and further developed in Goldstein (2005, Stata Journal 5: 285). This article introduces the skilmack command, which performs the Skillings-Mack test. The skilmack command is also useful when there are many ties or equal ranks (N.B. the Friedman statistic compared with the chi-squared distribution will give a conservative result), as well as for small samples; appropriate results can be obtained by simulating the distribution of the test statistic under the null hypothesis.
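The closing remark about simulating the null distribution can be sketched for a complete block design: compute a Friedman-type statistic from average ranks (so ties are handled) and compare it with the distribution obtained by permuting treatments within blocks. This is a generic illustration, not the Skillings-Mack statistic itself, which additionally weights blocks with missing cells.

```python
import numpy as np
from scipy.stats import rankdata

def friedman_stat(data):
    """Friedman chi-square statistic from a blocks x treatments array,
    using average ranks within each block (so ties are allowed)."""
    n, k = data.shape
    ranks = np.apply_along_axis(rankdata, 1, data)
    col_sums = ranks.sum(axis=0)
    return 12.0 / (n * k * (k + 1)) * np.sum(col_sums**2) - 3.0 * n * (k + 1)

rng = np.random.default_rng(0)
data = rng.normal(size=(8, 4))            # hypothetical 8 blocks x 4 treatments
observed = friedman_stat(data)

# Simulated null distribution: permute treatments independently within each block.
sims = []
for _ in range(5000):
    perm = np.apply_along_axis(rng.permutation, 1, data)
    sims.append(friedman_stat(perm))
p_value = np.mean(np.array(sims) >= observed)
print(f"Friedman statistic = {observed:.2f}, permutation p = {p_value:.3f}")
```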
Measurement Via Optical Near-Nulling and Subaperture Stitching
NASA Technical Reports Server (NTRS)
Forbes, Greg; De Vries, Gary; Murphy, Paul; Brophy, Chris
2012-01-01
A subaperture stitching interferometer system provides near-nulling of a subaperture wavefront reflected from an object of interest over a portion of a surface of the object. A variable optical element located in the radiation path adjustably provides near-nulling to facilitate stitching of subaperture interferograms, creating an interferogram representative of the entire surface of interest. This enables testing of aspheric surfaces without null optics customized for each surface prescription. The surface shapes of objects such as lenses and other precision components are often measured with interferometry. However, interferometers have a limited capture range, and thus the test wavefront cannot be too different from the reference or the interference cannot be analyzed. Furthermore, the performance of the interferometer is usually best when the test and reference wavefronts are nearly identical (referred to as a null condition). Thus, it is necessary when performing such measurements to correct for known variations in shape to ensure that unintended variations are within the capture range of the interferometer and accurately measured. This invention is a system for near-nulling within a subaperture stitching interferometer, although in principle, the concept can be employed by wavefront measuring gauges other than interferometers. The system employs a light source for providing coherent radiation of a subaperture extent. An object of interest is placed to modify the radiation (e.g., to reflect or pass the radiation), and a variable optical element is located to interact with, and nearly null, the affected radiation. A detector or imaging device is situated to obtain interference patterns in the modified radiation. Multiple subaperture interferograms are taken and are stitched, or joined, to provide an interferogram representative of the entire surface of the object of interest. The primary aspect of the invention is the use of adjustable corrective optics in the context of subaperture stitching near-nulling interferometry, wherein a complex surface is analyzed via multiple, separate, overlapping interferograms. For complex surfaces, the problem of managing the identification and placement of corrective optics becomes even more pronounced, to the extent that in most cases the null corrector optics are specific to the particular asphere prescription and no others (i.e. another asphere requires completely different null correction optics). In principle, the near-nulling technique does not require subaperture stitching at all. Building a near-null system that is practically useful relies on two key features: simplicity and universality. If the system is too complex, it will be difficult to calibrate and model its manufacturing errors, rendering it useless as a precision metrology tool and/or prohibitively expensive. If the system is not applicable to a wide range of test parts, then it does not provide significant value over conventional null-correction technology. Subaperture stitching enables simpler and more universal near-null systems to be effective, because a fraction of a surface is necessarily less complex than the whole surface (excepting the extreme case of a fractal surface description). The technique of near-nulling can significantly enhance aspheric subaperture stitching capability by allowing the interferometer to capture a wider range of aspheres.
Moreover, subaperture stitching is essential to a truly effective near-nulling system, since looking at a fraction of the surface keeps the wavefront complexity within the capability of a relatively simple near-null apparatus. Furthermore, by reducing the subaperture size, the complexity of the measured wavefront can be reduced until it is within the capability of the near-null design.
A network perspective on the topological importance of enzymes and their phylogenetic conservation
Liu, Wei-chung; Lin, Wen-hsien; Davis, Andrew J; Jordán, Ferenc; Yang, Hsih-te; Hwang, Ming-jing
2007-01-01
Background: A metabolic network is the sum of all chemical transformations or reactions in the cell, with the metabolites being interconnected by enzyme-catalyzed reactions. Many enzymes exist in numerous species while others occur only in a few. We ask if there are relationships between the phylogenetic profile of an enzyme, or the number of different bacterial species that contain it, and its topological importance in the metabolic network. Our null hypothesis is that phylogenetic profile is independent of topological importance. To test our null hypothesis we constructed an enzyme network from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. We calculated three network indices of topological importance: the degree or the number of connections of a network node; closeness centrality, which measures how close a node is to others; and betweenness centrality measuring how frequently a node appears on all shortest paths between two other nodes. Results: Enzyme phylogenetic profile correlates best with betweenness centrality and also quite closely with degree, but poorly with closeness centrality. Both betweenness and closeness centralities are non-local measures of topological importance and it is intriguing that they have contrasting power of predicting phylogenetic profile in bacterial species. We speculate that redundancy in an enzyme network may be reflected by betweenness centrality but not by closeness centrality. We also discuss factors influencing the correlation between phylogenetic profile and topological importance. Conclusion: Our analysis falsifies the hypothesis that phylogenetic profile of enzymes is independent of enzyme network importance. Our results show that phylogenetic profile correlates better with degree and betweenness centrality, but less so with closeness centrality. Enzymes that occur in many bacterial species tend to be those that have high network importance. We speculate that this phenomenon originates in mechanisms driving network evolution. Closeness centrality reflects phylogenetic profile poorly. This is because metabolic networks often consist of distinct functional modules and some are not in the centre of the network. Enzymes in these peripheral parts of a network might be important for cell survival and should therefore occur in many bacterial species. They are, however, distant from other enzymes in the same network. PMID:17425808
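The three centrality indices and their correlation with a phylogenetic profile can be sketched with networkx on a toy graph; both the graph and the profile below are fabricated for illustration, not KEGG data.

```python
import networkx as nx
import numpy as np
from scipy.stats import spearmanr

# Toy undirected "enzyme network": nodes stand in for enzymes, edges for
# shared metabolites; a scale-free generator is used purely for illustration.
rng = np.random.default_rng(1)
G = nx.barabasi_albert_graph(n=60, m=2, seed=1)

degree      = np.array([d for _, d in G.degree()])
closeness   = np.array(list(nx.closeness_centrality(G).values()))
betweenness = np.array(list(nx.betweenness_centrality(G).values()))

# Hypothetical phylogenetic profile: number of bacterial species containing
# each enzyme, loosely tied to degree plus noise.
profile = degree * 10 + rng.poisson(20, size=G.number_of_nodes())

for name, index in [("degree", degree), ("closeness", closeness), ("betweenness", betweenness)]:
    rho, p = spearmanr(profile, index)
    print(f"{name:>12}: Spearman rho = {rho:.2f} (p = {p:.3g})")
```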
Hindocha, C; Freeman, T P; Grabski, M; Crudgington, H; Davies, A C; Stroud, J B; Das, R K; Lawn, W; Morgan, C J A; Curran, H V
2018-05-15
Acute nicotine abstinence in cigarette smokers results in deficits in performance on specific cognitive processes, including working memory and impulsivity, which are important in relapse. Cannabidiol (CBD), the non-intoxicating cannabinoid found in cannabis, has shown pro-cognitive effects and preliminary evidence has indicated it can reduce the number of cigarettes smoked in dependent smokers. However, the effects of CBD on cognition have never been tested during acute nicotine withdrawal. The present study therefore aimed to investigate if CBD can improve memory and reduce impulsivity during acute tobacco abstinence. Thirty non-treatment-seeking, dependent cigarette smokers attended two laboratory-based sessions after overnight abstinence, in which they received either 800 mg oral CBD or placebo (PBO), in a randomised order. Abstinence was biologically verified. Participants were assessed on go/no-go, delay discounting, prose recall and N-back (0-back, 1-back, 2-back) tasks. The effects of CBD on delay discounting, prose recall and the N-back (correct responses, maintenance or manipulation) were null; this was confirmed by a Bayesian analysis, which found evidence for the null hypothesis. Contrary to our predictions, CBD increased commission errors on the go/no-go task. In conclusion, a single 800 mg dose of CBD does not improve verbal or spatial working memory, or impulsivity during tobacco abstinence.
Subramaniyam, Narayan Puthanmadam; Hyttinen, Jari
2015-02-01
Recently Andrezejak et al. combined the randomness and nonlinear independence test with iterative amplitude adjusted Fourier transform (iAAFT) surrogates to distinguish between the dynamics of seizure-free intracranial electroencephalographic (EEG) signals recorded from epileptogenic (focal) and nonepileptogenic (nonfocal) brain areas of epileptic patients. However, stationarity is a part of the null hypothesis for iAAFT surrogates and thus nonstationarity can violate the null hypothesis. In this work we first propose the application of the randomness and nonlinear independence test based on recurrence network measures to distinguish between the dynamics of focal and nonfocal EEG signals. Furthermore, we combine these tests with both iAAFT and truncated Fourier transform (TFT) surrogate methods, which also preserves the nonstationarity of the original data in the surrogates along with its linear structure. Our results indicate that focal EEG signals exhibit an increased degree of structural complexity and interdependency compared to nonfocal EEG signals. In general, we find higher rejections for randomness and nonlinear independence tests for focal EEG signals compared to nonfocal EEG signals. In particular, the univariate recurrence network measures, the average clustering coefficient C and assortativity R, and the bivariate recurrence network measure, the average cross-clustering coefficient C(cross), can successfully distinguish between the focal and nonfocal EEG signals, even when the analysis is restricted to nonstationary signals, irrespective of the type of surrogates used. On the other hand, we find that the univariate recurrence network measures, the average path length L, and the average betweenness centrality BC fail to distinguish between the focal and nonfocal EEG signals when iAAFT surrogates are used. However, these two measures can distinguish between focal and nonfocal EEG signals when TFT surrogates are used for nonstationary signals. We also report an improvement in the performance of nonlinear prediction error N and nonlinear interdependence measure L used by Andrezejak et al., when TFT surrogates are used for nonstationary EEG signals. We also find that the outcome of the nonlinear independence test based on the average cross-clustering coefficient C(cross) is independent of the outcome of the randomness test based on the average clustering coefficient C. Thus, the univariate and bivariate recurrence network measures provide independent information regarding the dynamics of the focal and nonfocal EEG signals. In conclusion, recurrence network analysis combined with nonstationary surrogates can be applied to derive reliable biomarkers to distinguish between epileptogenic and nonepileptogenic brain areas using EEG signals.
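The iAAFT surrogate construction referred to above can be sketched in a few lines: iteratively impose the original amplitude spectrum and then rank-remap to the original value distribution. This is a generic textbook implementation under the standard algorithm, not the authors' code, and it does not cover the truncated Fourier transform (TFT) variant.

```python
import numpy as np

def iaaft_surrogate(x, n_iter=100, seed=0):
    """Iterative amplitude-adjusted Fourier transform (iAAFT) surrogate.

    Preserves the amplitude spectrum (linear correlations) and the amplitude
    distribution of x while randomizing any nonlinear structure.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)                  # target amplitude distribution
    target_amp = np.abs(np.fft.rfft(x))    # target Fourier amplitudes

    s = rng.permutation(x)                 # start from a random shuffle
    for _ in range(n_iter):
        # Impose the target power spectrum while keeping the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # Impose the target amplitude distribution by rank remapping.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s

# Example: one surrogate of a toy nonstationary-looking signal.
t = np.linspace(0, 10, 1000)
signal = np.sin(2 * np.pi * t) + 0.3 * t + 0.2 * np.random.default_rng(2).standard_normal(t.size)
surrogate = iaaft_surrogate(signal)
```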
Too good to be true: publication bias in two prominent studies from experimental psychology.
Francis, Gregory
2012-04-01
Empirical replication has long been considered the final arbiter of phenomena in science, but replication is undermined when there is evidence for publication bias. Evidence for publication bias in a set of experiments can be found when the observed number of rejections of the null hypothesis exceeds the expected number of rejections. Application of this test reveals evidence of publication bias in two prominent investigations from experimental psychology that have purported to reveal evidence of extrasensory perception and to indicate severe limitations of the scientific method. The presence of publication bias suggests that those investigations cannot be taken as proper scientific studies of such phenomena, because critical data are not available to the field. Publication bias could partly be avoided if experimental psychologists started using Bayesian data analysis techniques.
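The excess-significance logic described above can be sketched as follows: estimate each experiment's power at a pooled effect size, compare the expected number of rejections with the observed number, and compute the probability that every study would have been significant. The sample sizes and effect size below are invented, and the two-sample t-test setting is an assumption.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical set of k "successful" two-group experiments: per-group sample
# sizes and a pooled standardized effect size estimated from the whole set.
nobs1 = np.array([20, 25, 18, 22, 30, 24, 19, 21, 26, 23])
pooled_d = 0.45
alpha = 0.05

analysis = TTestIndPower()
power = np.array([analysis.power(effect_size=pooled_d, nobs1=int(n), alpha=alpha,
                                 ratio=1.0, alternative='two-sided')
                  for n in nobs1])

expected_rejections = power.sum()
observed_rejections = len(nobs1)            # all k experiments reported significance
p_all_significant = np.prod(power)          # chance that every study rejects H0

print(f"expected rejections = {expected_rejections:.1f} of {observed_rejections}")
print(f"P(all {observed_rejections} studies significant) = {p_all_significant:.3f}")
# A small probability (a threshold around 0.1 is commonly used) signals an
# excess of significant findings relative to the studies' power, i.e.,
# evidence of publication bias.
```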