Sample records for statistical hypothesis tests

  1. Explorations in statistics: hypothesis tests and P values.

    PubMed

    Curran-Everett, Douglas

    2009-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
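
    The question the P value answers can be made concrete with a short simulation. Below is a minimal sketch (not from the article) that estimates a two-sided permutation P value for a difference in group means; Python with NumPy is assumed, and the data are hypothetical.

      import numpy as np

      rng = np.random.default_rng(1)
      a = np.array([4.1, 5.0, 6.2, 5.5, 4.8])   # hypothetical treatment sample
      b = np.array([3.2, 4.0, 3.8, 4.5, 3.6])   # hypothetical control sample

      observed = a.mean() - b.mean()             # test statistic
      pooled = np.concatenate([a, b])

      n_perm = 10_000
      hits = 0
      for _ in range(n_perm):
          perm = rng.permutation(pooled)         # reshuffle group labels under H0
          stat = perm[:a.size].mean() - perm[a.size:].mean()
          if abs(stat) >= abs(observed):         # "at least as extreme" (two-sided)
              hits += 1

      print(f"observed difference = {observed:.2f}, permutation P = {hits / n_perm:.4f}")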

  2. Explorations in Statistics: Hypothesis Tests and P Values

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2009-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of "Explorations in Statistics" delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what…

  3. Improving the Crossing-SIBTEST Statistic for Detecting Non-uniform DIF.

    PubMed

    Chalmers, R Philip

    2018-06-01

    This paper demonstrates that, after applying a simple modification to Li and Stout's (Psychometrika 61(4):647-677, 1996) CSIBTEST statistic, an improved variant of the statistic could be realized. It is shown that this modified version of CSIBTEST has a more direct association with the SIBTEST statistic presented by Shealy and Stout (Psychometrika 58(2):159-194, 1993). In particular, the asymptotic sampling distributions and general interpretation of the effect size estimates are the same for SIBTEST and the new CSIBTEST. Given the more natural connection to SIBTEST, it is shown that Li and Stout's hypothesis testing approach is insufficient for CSIBTEST; thus, an improved hypothesis testing procedure is required. Based on the presented arguments, a new chi-squared-based hypothesis testing approach is proposed for the modified CSIBTEST statistic. Positive results from a modest Monte Carlo simulation study strongly suggest the original CSIBTEST procedure and randomization hypothesis testing approach should be replaced by the modified statistic and hypothesis testing method.

  4. A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.

    ERIC Educational Resources Information Center

    Liu, Tung; Stone, Courtenay C.

    1999-01-01

    Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…

  5. A statistical test to show negligible trend

    Treesearch

    Philip M. Dixon; Joseph H.K. Pechmann

    2005-01-01

    The usual statistical tests of trend are inappropriate for demonstrating the absence of trend, because failure to reject the null hypothesis of no trend does not prove that null hypothesis. The appropriate statistical method is based on an equivalence test. The null hypothesis is that the trend is nonzero, i.e., that it lies outside an a priori specified equivalence region...
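
    The equivalence-test logic can be illustrated with a two one-sided tests (TOST) procedure applied to a regression slope. This is a simplified sketch rather than the authors' exact method; Python with NumPy/SciPy is assumed, and the data and the equivalence margin delta are hypothetical.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      years = np.arange(20.0)
      counts = 50 + rng.normal(0, 2, size=years.size)  # hypothetical series, no real trend

      delta = 0.5                    # hypothetical a priori equivalence margin for the slope
      fit = stats.linregress(years, counts)
      df = years.size - 2

      # Two one-sided tests: reject "slope <= -delta" and "slope >= +delta".
      p_lower = stats.t.sf((fit.slope + delta) / fit.stderr, df)
      p_upper = stats.t.cdf((fit.slope - delta) / fit.stderr, df)
      p_tost = max(p_lower, p_upper)  # small value -> trend is negligibly small
      print(f"slope = {fit.slope:.3f}, TOST P = {p_tost:.4g}")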

  6. Perspectives on the Use of Null Hypothesis Statistical Testing. Part III: the Various Nuts and Bolts of Statistical and Hypothesis Testing

    ERIC Educational Resources Information Center

    Marmolejo-Ramos, Fernando; Cousineau, Denis

    2017-01-01

    The number of articles showing dissatisfaction with the null hypothesis statistical testing (NHST) framework has been progressively increasing over the years. Alternatives to NHST have been proposed and the Bayesian approach seems to have achieved the highest amount of visibility. In this last part of the special issue, a few alternative…

  7. The Importance of Teaching Power in Statistical Hypothesis Testing

    ERIC Educational Resources Information Center

    Olinsky, Alan; Schumacher, Phyllis; Quinn, John

    2012-01-01

    In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…

  8. [Dilemma of null hypothesis in ecological hypothesis's experiment test].

    PubMed

    Li, Ji

    2016-06-01

    Experimental testing is one of the major methods for testing ecological hypotheses, though there are many arguments over the null hypothesis. Quinn and Dunham (1983) analyzed the hypothesis deduction model from Platt (1964) and thus stated that there is no null hypothesis in ecology that can be strictly tested by experiments. Fisher's falsificationism and Neyman-Pearson (N-P)'s non-decisivity prevent a statistical null hypothesis from being strictly tested. Moreover, since the null hypothesis H0 (α=1, β=0) and alternative hypothesis H1' (α'=1, β'=0) in ecological processes differ from those of classic physics, the ecological null hypothesis cannot be strictly tested experimentally either. These dilemmas of the null hypothesis can be relieved via the reduction of the P value, careful selection of the null hypothesis, non-centralization of the non-null hypothesis, and two-tailed tests. However, statistical null hypothesis significance testing (NHST) should not be equated with the logical test of causality in ecological hypotheses. Hence, findings and conclusions about methodological studies and experimental tests based on NHST are not always logically reliable.

  9. Null but not void: considerations for hypothesis testing.

    PubMed

    Shaw, Pamela A; Proschan, Michael A

    2013-01-30

    Standard statistical theory teaches us that once the null and alternative hypotheses have been defined for a parameter, the choice of the statistical test is clear. Standard theory does not teach us how to choose the null or alternative hypothesis appropriate to the scientific question of interest. Neither does it tell us that in some cases, depending on which alternatives are realistic, we may want to define our null hypothesis differently. Problems in statistical practice are frequently not as pristinely summarized as the classic theory in our textbooks. In this article, we present examples in statistical hypothesis testing in which seemingly simple choices are in fact rich with nuance that, when given full consideration, make the choice of the right hypothesis test much less straightforward. Published 2012. This article is a US Government work and is in the public domain in the USA.

  10. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
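
    A minimal sketch of the minimum p-value approach with a permutation-based critical value, assuming Python with NumPy/SciPy, two prespecified tests (t and Wilcoxon-Mann-Whitney), and hypothetical trial data; the paper's own examples and statistics are not reproduced here.

      import numpy as np
      from scipy import stats

      def min_pvalue(x, y):
          """Smallest p-value across two prespecified tests of no treatment effect."""
          p_t = stats.ttest_ind(x, y).pvalue
          p_w = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
          return min(p_t, p_w)

      rng = np.random.default_rng(3)
      x = rng.normal(0.8, 1, 30)   # hypothetical treatment arm
      y = rng.normal(0.0, 1, 30)   # hypothetical control arm

      observed = min_pvalue(x, y)
      pooled = np.concatenate([x, y])
      perm = np.array([
          min_pvalue(*np.split(rng.permutation(pooled), 2)) for _ in range(2000)
      ])
      # Smaller min-p is more extreme; its p-value comes from the permutation law.
      print(f"min p = {observed:.4f}, permutation P = {np.mean(perm <= observed):.4f}")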

  11. Testing of Hypothesis in Equivalence and Non Inferiority Trials-A Concept.

    PubMed

    Juneja, Atul; Aggarwal, Abha R; Adhikari, Tulsi; Pandey, Arvind

    2016-04-01

    Establishing the appropriate hypothesis is one of the important steps in carrying out statistical tests and analyses, and understanding it is important for interpreting the results of statistical analysis. The current communication attempts to explain the concept of hypothesis testing in non-inferiority and equivalence trials, where the null hypothesis is just the reverse of what is set up for conventional superiority trials. As in superiority trials, rejection of the null hypothesis is sought in order to establish what the researcher intends to prove. It is important to mention that equivalence or non-inferiority cannot be proved by accepting the null hypothesis of no difference. Hence, establishing the appropriate statistical hypothesis is extremely important to arrive at a meaningful conclusion for the set objectives in research.

  12. Knowledge dimensions in hypothesis test problems

    NASA Astrophysics Data System (ADS)

    Krishnan, Saras; Idris, Noraini

    2012-05-01

    The reform in statistics education over the past two decades has predominantly shifted the focus of statistical teaching and learning from procedural understanding to conceptual understanding. The emphasis of procedural understanding is on formulas and calculation procedures. Meanwhile, conceptual understanding emphasizes students knowing why they are using a particular formula or executing a specific procedure. In addition, the Revised Bloom's Taxonomy offers a two-dimensional framework to describe learning objectives, comprising the six revised cognition levels of the original Bloom's taxonomy and four knowledge dimensions. Depending on the level of complexity, the four knowledge dimensions essentially distinguish basic understanding from the more connected understanding. This study identifies the factual, procedural, and conceptual knowledge dimensions in hypothesis test problems. Hypothesis testing, an important tool in making inferences about a population from sample information, is taught in many introductory statistics courses. However, researchers find that students in these courses still have difficulty in understanding the underlying concepts of hypothesis tests. Past studies also show that even though students can perform the hypothesis testing procedure, they may not understand the rationale of executing these steps or know how to apply them in novel contexts. Besides knowing the procedural steps in conducting a hypothesis test, students must have fundamental statistical knowledge and a deep understanding of the underlying inferential concepts, such as sampling distributions and the central limit theorem. By identifying the knowledge dimensions of hypothesis test problems in this study, suitable instructional and assessment strategies can be developed in future to enhance students' learning of hypothesis testing as a valuable inferential tool.

  13. Hypothesis Testing Using Spatially Dependent Heavy Tailed Multisensor Data

    DTIC Science & Technology

    2014-12-01

    …consistent with the null hypothesis of linearity and can be used to estimate the distribution of a test statistic that can discriminate between the null… [Figure: test for nonlinearity; the histogram is generated using the surrogate data, and the statistic of the original time series is represented by the solid line.]

  14. An Inferential Confidence Interval Method of Establishing Statistical Equivalence that Corrects Tryon's (2001) Reduction Factor

    ERIC Educational Resources Information Center

    Tryon, Warren W.; Lewis, Charles

    2008-01-01

    Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis and failing to reject H0 is not evidence…

  15. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective.

    PubMed

    Kruschke, John K; Liddell, Torrin M

    2018-02-01

    In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

  16. Biostatistics Series Module 2: Overview of Hypothesis Testing.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Hypothesis testing (or statistical inference) is one of the major applications of biostatistics. Much of medical research begins with a research question that can be framed as a hypothesis. Inferential statistics begins with a null hypothesis that reflects the conservative position of no change or no difference in comparison to baseline or between groups. Usually, the researcher has reason to believe that there is some effect or some difference which is the alternative hypothesis. The researcher therefore proceeds to study samples and measure outcomes in the hope of generating evidence strong enough for the statistician to be able to reject the null hypothesis. The concept of the P value is almost universally used in hypothesis testing. It denotes the probability of obtaining by chance a result at least as extreme as that observed, even when the null hypothesis is true and no real difference exists. Usually, if P is < 0.05 the null hypothesis is rejected and sample results are deemed statistically significant. With the increasing availability of computers and access to specialized statistical software, the drudgery involved in statistical calculations is now a thing of the past, once the learning curve of the software has been traversed. The life sciences researcher is therefore free to devote oneself to optimally designing the study, carefully selecting the hypothesis tests to be applied, and taking care in conducting the study well. Unfortunately, selecting the right test seems difficult initially. Thinking of the research hypothesis as addressing one of five generic research questions helps in selection of the right hypothesis test. In addition, it is important to be clear about the nature of the variables (e.g., numerical vs. categorical; parametric vs. nonparametric) and the number of groups or data sets being compared (e.g., two or more than two) at a time. The same research question may be explored by more than one type of hypothesis test. While this may be of utility in highlighting different aspects of the problem, merely reapplying different tests to the same issue in the hope of finding a P < 0.05 is a wrong use of statistics. Finally, it is becoming the norm that an estimate of the size of any effect, expressed with its 95% confidence interval, is required for meaningful interpretation of results. A large study is likely to have a small (and therefore "statistically significant") P value, but a "real" estimate of the effect would be provided by the 95% confidence interval. If the intervals overlap between two interventions, then the difference between them is not so clear-cut even if P < 0.05. The two approaches are now considered complementary to one another.
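
    As the module notes, a P value and a 95% confidence interval are complementary. A minimal sketch of that pairing for a two-group comparison, assuming Python with NumPy/SciPy and hypothetical blood-pressure data:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(11)
      drug = rng.normal(122, 10, 40)      # hypothetical systolic BP, treated group
      placebo = rng.normal(128, 10, 40)   # hypothetical systolic BP, control group

      result = stats.ttest_ind(drug, placebo)          # unpaired t test
      diff = drug.mean() - placebo.mean()
      se = np.sqrt(drug.var(ddof=1) / 40 + placebo.var(ddof=1) / 40)
      half_width = stats.t.ppf(0.975, df=78) * se      # 78 = 40 + 40 - 2
      print(f"P = {result.pvalue:.4f}; difference = {diff:.1f} mmHg "
            f"(95% CI {diff - half_width:.1f} to {diff + half_width:.1f})")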

  17. Biostatistics Series Module 2: Overview of Hypothesis Testing

    PubMed Central

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Hypothesis testing (or statistical inference) is one of the major applications of biostatistics. Much of medical research begins with a research question that can be framed as a hypothesis. Inferential statistics begins with a null hypothesis that reflects the conservative position of no change or no difference in comparison to baseline or between groups. Usually, the researcher has reason to believe that there is some effect or some difference which is the alternative hypothesis. The researcher therefore proceeds to study samples and measure outcomes in the hope of generating evidence strong enough for the statistician to be able to reject the null hypothesis. The concept of the P value is almost universally used in hypothesis testing. It denotes the probability of obtaining by chance a result at least as extreme as that observed, even when the null hypothesis is true and no real difference exists. Usually, if P is < 0.05 the null hypothesis is rejected and sample results are deemed statistically significant. With the increasing availability of computers and access to specialized statistical software, the drudgery involved in statistical calculations is now a thing of the past, once the learning curve of the software has been traversed. The life sciences researcher is therefore free to devote oneself to optimally designing the study, carefully selecting the hypothesis tests to be applied, and taking care in conducting the study well. Unfortunately, selecting the right test seems difficult initially. Thinking of the research hypothesis as addressing one of five generic research questions helps in selection of the right hypothesis test. In addition, it is important to be clear about the nature of the variables (e.g., numerical vs. categorical; parametric vs. nonparametric) and the number of groups or data sets being compared (e.g., two or more than two) at a time. The same research question may be explored by more than one type of hypothesis test. While this may be of utility in highlighting different aspects of the problem, merely reapplying different tests to the same issue in the hope of finding a P < 0.05 is a wrong use of statistics. Finally, it is becoming the norm that an estimate of the size of any effect, expressed with its 95% confidence interval, is required for meaningful interpretation of results. A large study is likely to have a small (and therefore “statistically significant”) P value, but a “real” estimate of the effect would be provided by the 95% confidence interval. If the intervals overlap between two interventions, then the difference between them is not so clear-cut even if P < 0.05. The two approaches are now considered complementary to one another. PMID:27057011

  18. Semantically enabled and statistically supported biological hypothesis testing with tissue microarray databases

    PubMed Central

    2011-01-01

    Background Although many biological databases are applying semantic web technologies, meaningful biological hypothesis testing cannot be easily achieved. Database-driven high throughput genomic hypothesis testing requires both the capability of obtaining semantically relevant experimental data and that of performing relevant statistical testing on the retrieved data. Tissue Microarray (TMA) data are semantically rich and contain many biologically important hypotheses waiting for high throughput conclusions. Methods An application-specific ontology was developed for managing TMA and DNA microarray databases with semantic web technologies. Data were represented as Resource Description Framework (RDF) according to the framework of the ontology. Applications for hypothesis testing (Xperanto-RDF) for TMA data were designed and implemented by (1) formulating the syntactic and semantic structures of the hypotheses derived from TMA experiments, (2) formulating SPARQL queries to reflect the semantic structures of the hypotheses, and (3) performing statistical tests with the result sets returned by the SPARQL queries. Results When a user designs a hypothesis in Xperanto-RDF and submits it, the hypothesis can be tested against TMA experimental data stored in Xperanto-RDF. When we evaluated four previously validated hypotheses as an illustration, all the hypotheses were supported by Xperanto-RDF. Conclusions We demonstrated the utility of high throughput biological hypothesis testing. We believe that preliminary investigation before performing a highly controlled experiment can benefit from this approach. PMID:21342584

  19. A General Class of Test Statistics for Van Valen’s Red Queen Hypothesis

    PubMed Central

    Wiltshire, Jelani; Huffer, Fred W.; Parker, William C.

    2014-01-01

    Van Valen’s Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen’s work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material. PMID:24910489

  20. A General Class of Test Statistics for Van Valen's Red Queen Hypothesis.

    PubMed

    Wiltshire, Jelani; Huffer, Fred W; Parker, William C

    2014-09-01

    Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon age and the rate of extinction. We propose a general class of test statistics that can be used to test for the effect of age on the rate of extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead we control for covariate effects by pairing or grouping together similar species. Simulations are used to compare the power of the statistics. We apply the test statistics to data on Foram extinctions and find that age has a positive effect on the rate of extinction. A derivation of the null distribution of one of the test statistics is provided in the supplementary material.

  21. Statistical Validation of Surrogate Endpoints: Another Look at the Prentice Criterion and Other Criteria.

    PubMed

    Saraf, Sanatan; Mathew, Thomas; Roy, Anindya

    2015-01-01

    For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small sample asymptotics, and their performances are numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.

  22. A shift from significance test to hypothesis test through power analysis in medical research.

    PubMed

    Singh, G

    2006-01-01

    Until recently, the medical research literature exhibited substantial dominance of Fisher's significance-test approach to statistical inference, which concentrates on the probability of type I error, over the Neyman-Pearson hypothesis test, which considers the probabilities of both type I and type II error. Fisher's approach dichotomises results into significant or non-significant with a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Based on the same theory, these two approaches address the same objective and conclude in their own ways. The advancement in computing techniques and the availability of statistical software have resulted in the increasing application of power calculations in medical research, and thereby in reporting the results of significance tests in the light of the power of the test as well. The significance-test approach, when it incorporates power analysis, contains the essence of the hypothesis-test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis test procedure.
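
    The power calculations the author describes are easy to script. Below is a minimal sketch of the usual normal-approximation power formula for a two-sided, two-sample comparison; Python with NumPy/SciPy is assumed, and the effect size, standard deviation, and sample sizes are hypothetical.

      import numpy as np
      from scipy import stats

      def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
          """Normal-approximation power of a two-sided, two-sample z test."""
          se = sigma * np.sqrt(2.0 / n_per_group)
          z_crit = stats.norm.ppf(1 - alpha / 2)
          # Probability of exceeding the critical value when the true mean
          # difference is delta; the negligible opposite tail is ignored.
          return stats.norm.sf(z_crit - delta / se)

      for n in (20, 50, 100):                      # hypothetical design options
          print(n, round(power_two_sample(delta=5, sigma=10, n_per_group=n), 3))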

  23. An Argument Framework for the Application of Null Hypothesis Statistical Testing in Support of Research

    ERIC Educational Resources Information Center

    LeMire, Steven D.

    2010-01-01

    This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…

  24. A critique of statistical hypothesis testing in clinical research

    PubMed Central

    Raha, Somik

    2011-01-01

    Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs for all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from a practical and philosophical perspective. In the philosophical critique, the two main worldviews of probability are the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview, requiring the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning with different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability to an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing, utilizing data from a past clinical trial that studied the effect of Aspirin on heart attacks in a sample population of doctors. Because a major reason for the prevalence of RCTs in academia is legislation requiring them, the ethics of legislating the use of statistical methods for clinical research is also examined. PMID:22022152

  25. Bayesian inference for psychology. Part II: Example applications with JASP.

    PubMed

    Wagenmakers, Eric-Jan; Love, Jonathon; Marsman, Maarten; Jamil, Tahira; Ly, Alexander; Verhagen, Josine; Selker, Ravi; Gronau, Quentin F; Dropmann, Damian; Boutin, Bruno; Meerhoff, Frans; Knight, Patrick; Raj, Akash; van Kesteren, Erik-Jan; van Doorn, Johnny; Šmíra, Martin; Epskamp, Sacha; Etz, Alexander; Matzke, Dora; de Jong, Tim; van den Bergh, Don; Sarafoglou, Alexandra; Steingroever, Helen; Derks, Koen; Rouder, Jeffrey N; Morey, Richard D

    2018-02-01

    Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder's BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.

  26. Towards a web-based decision support tool for selecting appropriate statistical test in medical and biological sciences.

    PubMed

    Suner, Aslı; Karakülah, Gökhan; Dicle, Oğuz

    2014-01-01

    Statistical hypothesis testing is an essential component of biological and medical studies for making inferences and estimations from the collected data; however, misuse of statistical tests is widespread. To prevent errors in selecting a suitable statistical test, it is currently possible to consult available test-selection algorithms developed for various purposes. However, the lack of an algorithm presenting the most common statistical tests used in biomedical research in a single flowchart causes several problems, such as shifting users among algorithms, poor decision support in test selection, and lack of satisfaction among potential users. Herein, we demonstrate a unified flowchart that covers the most commonly used statistical tests in the biomedical domain, to provide decision aid to non-statistician users in choosing the appropriate statistical test for their hypothesis. We also discuss some of our findings from integrating the flowcharts into a single but more comprehensive decision algorithm.
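
    The idea of a test-selection flowchart can be expressed as a small decision function. The rules below are a simplified sketch echoing common selection criteria, not the authors' published algorithm.

      def suggest_test(outcome, groups, paired, normal=True):
          """Toy decision rules in the spirit of a test-selection flowchart."""
          if outcome == "categorical":
              return "McNemar test" if paired else "chi-square test"
          if outcome == "continuous":
              if groups == 2:
                  if paired:
                      return "paired t test" if normal else "Wilcoxon signed-rank test"
                  return "unpaired t test" if normal else "Wilcoxon-Mann-Whitney test"
              if groups > 2:
                  return "one-way ANOVA" if normal else "Kruskal-Wallis test"
          return "consult a statistician"

      print(suggest_test("continuous", groups=2, paired=False, normal=False))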

  27. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

  28. ON THE SUBJECT OF HYPOTHESIS TESTING

    PubMed Central

    Ugoni, Antony

    1993-01-01

    In this paper, the definition of a statistical hypothesis is discussed, along with the considerations that need to be addressed when testing a hypothesis. In particular, the p-value, significance level, and power of a test are reviewed. Finally, the often quoted confidence interval is given a brief introduction. PMID:17989768

  29. New methods of testing nonlinear hypothesis using iterative NLLS estimator

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper discusses a method of testing nonlinear hypotheses using the iterative Nonlinear Least Squares (NLLS) estimator, as explained by Takeshi Amemiya [1]. In the present paper, a modified Wald test statistic due to Engle, Robert [6] is proposed to test a nonlinear hypothesis using the iterative NLLS estimator. An alternative method for testing a nonlinear hypothesis using the iterative NLLS estimator based on nonlinear studentized residuals is also proposed. In addition, an innovative method of testing a nonlinear hypothesis using the iterative restricted NLLS estimator is derived. Pesaran and Deaton [10] explained methods of testing nonlinear hypotheses. This paper uses the asymptotic properties of the nonlinear least squares estimator proposed by Jenrich [8]. The main purpose of this paper is to provide innovative methods of testing nonlinear hypotheses using the iterative NLLS estimator, the iterative NLLS estimator based on nonlinear studentized residuals, and the iterative restricted NLLS estimator. Eakambaram et al. [12] discussed least absolute deviation estimation versus nonlinear regression models with heteroscedastic errors and also studied the problem of heteroscedasticity with reference to nonlinear regression models with suitable illustration. William Greene [13] examined the interaction effect in nonlinear models discussed by Ai and Norton [14] and suggested ways to examine the effects that do not involve statistical testing. Peter [15] provided guidelines for identifying composite hypotheses and addressing the probability of false rejection for multiple hypotheses.

  30. Explorations in Statistics: Power

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2010-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fifth installment of "Explorations in Statistics" revisits power, a concept fundamental to the test of a null hypothesis. Power is the probability that we reject the null hypothesis when it is false. Four…

  31. Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example.

    PubMed

    Gaus, Wilhelm

    2014-09-02

    The US National Toxicology Program (NTP) is assessed by a statistician. In the NTP program, groups of rodents are fed for a certain period of time with different doses of the substance under investigation. The animals are then sacrificed and all organs are examined pathologically. Such an investigation facilitates many statistical tests. Technical Report TR 578 on Ginkgo biloba is used as an example. More than 4800 statistical tests are possible with the investigations performed. A simple thought experiment leads us to expect >240 falsely significant tests; in actuality, 209 significant pathological findings were reported. The readers of Toxicology Letters should carefully distinguish between confirmative and explorative statistics. A confirmative interpretation of a significant test rejects the null hypothesis and delivers "statistical proof". It is only allowed if (i) a precise hypothesis was established independently of the data used for the test and (ii) the computed p-values are adjusted for multiple testing if more than one test was performed. Otherwise, an explorative interpretation generates a hypothesis. We conclude that NTP reports - including TR 578 on Ginkgo biloba - deliver explorative statistics, i.e., they generate hypotheses but do not prove them. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
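
    The ">240" expectation is just the multiple-testing arithmetic: roughly 4800 tests at alpha = 0.05 yield about 4800 x 0.05 = 240 false rejections if every null hypothesis is true. Below is a minimal sketch of that arithmetic together with one common multiplicity adjustment (Holm's step-down method; the raw p-values are hypothetical), in Python with NumPy.

      import numpy as np

      # Thought-experiment arithmetic: if all ~4800 null hypotheses were true,
      # testing each at alpha = 0.05 would yield about 240 false rejections.
      print(4800 * 0.05)                            # 240.0

      def holm_adjust(pvals):
          """Holm step-down adjustment; controls the family-wise error rate."""
          p = np.asarray(pvals, dtype=float)
          m = p.size
          adjusted = np.empty(m)
          running_max = 0.0
          for rank, idx in enumerate(np.argsort(p)):
              running_max = max(running_max, (m - rank) * p[idx])
              adjusted[idx] = min(1.0, running_max)
          return adjusted

      print(holm_adjust([0.001, 0.02, 0.04, 0.30]))  # hypothetical raw p-values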

  32. Making Knowledge Delivery Failsafe: Adding Step Zero in Hypothesis Testing

    ERIC Educational Resources Information Center

    Pan, Xia; Zhou, Qiang

    2010-01-01

    Knowledge of statistical analysis is increasingly important for professionals in modern business. For example, hypothesis testing is one of the critical topics for quality managers and team workers in Six Sigma training programs. Delivering the knowledge of hypothesis testing effectively can be an important step for the incapable learners or…

  33. The potential for increased power from combining P-values testing the same hypothesis.

    PubMed

    Ganju, Jitendra; Julie Ma, Guoguang

    2017-02-01

    The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect, but we do not know which one is the most powerful. Rather than relying on a single p-value, one can combine the p-values from multiple prespecified test statistics for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and with Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations where the model includes a treatment-by-covariate interaction.
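
    A minimal sketch of the two combining functions named above, assuming Python with NumPy/SciPy and hypothetical p-values. In the paper the null distribution of the minimum p-value comes from randomization; the closed form below holds only for independent tests.

      import numpy as np
      from scipy import stats

      pvals = [0.04, 0.20, 0.07]        # hypothetical p-values from prespecified tests

      # Fisher's combination: -2 * sum(log p) ~ chi-square with 2k df under H0.
      stat, p_fisher = stats.combine_pvalues(pvals, method="fisher")
      print(f"Fisher: statistic = {stat:.2f}, combined P = {p_fisher:.4f}")

      # Minimum p-value: for k independent tests, P(min p <= x) = 1 - (1 - x)**k.
      # With correlated statistics, as in the paper, use randomization instead.
      p_min, k = min(pvals), len(pvals)
      print(f"min p = {p_min}, independence-based combined P = {1 - (1 - p_min) ** k:.4f}")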

  34. An Exercise for Illustrating the Logic of Hypothesis Testing

    ERIC Educational Resources Information Center

    Lawton, Leigh

    2009-01-01

    Hypothesis testing is one of the more difficult concepts for students to master in a basic, undergraduate statistics course. Students often are puzzled as to why statisticians simply don't calculate the probability that a hypothesis is true. This article presents an exercise that forces students to lay out on their own a procedure for testing a…

  15. Hypothesis Testing, "p" Values, Confidence Intervals, Measures of Effect Size, and Bayesian Methods in Light of Modern Robust Techniques

    ERIC Educational Resources Information Center

    Wilcox, Rand R.; Serang, Sarfaraz

    2017-01-01

    The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…

  36. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment.

    PubMed

    Szucs, Denes; Ioannidis, John P A

    2017-01-01

    Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology, and biomedical science in general. We review these shortcomings and suggest that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak we should not rely on all or nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published. The current statistics lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out.

  37. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment

    PubMed Central

    Szucs, Denes; Ioannidis, John P. A.

    2017-01-01

    Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology, and biomedical science in general. We review these shortcomings and suggest that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak we should not rely on all or nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published. The current statistics lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out. PMID:28824397

  38. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using the t-test and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  39. Why Current Statistics of Complementary Alternative Medicine Clinical Trials is Invalid.

    PubMed

    Pandolfi, Maurizio; Carreras, Giulia

    2018-06-07

    It is not sufficiently known that frequentist statistics cannot provide direct information on the probability that the research hypothesis tested is correct. The error resulting from this misunderstanding is compounded when the hypotheses under scrutiny have precarious scientific bases, as those of complementary alternative medicine (CAM) generally do. In such cases, it is mandatory to use inferential statistics that consider the prior probability that the hypothesis tested is true, such as Bayesian statistics. The authors show that, under such circumstances, no real statistical significance can be achieved in CAM clinical trials. In this respect, CAM trials involving human material are also hardly defensible from an ethical viewpoint.

  40. Detection of Undocumented Changepoints Using Multiple Test Statistics and Composite Reference Series.

    NASA Astrophysics Data System (ADS)

    Menne, Matthew J.; Williams, Claude N., Jr.

    2005-10-01

    An evaluation of three hypothesis test statistics that are commonly used in the detection of undocumented changepoints is described. The goal of the evaluation was to determine whether the use of multiple tests could improve undocumented, artificial changepoint detection skill in climate series. The use of successive hypothesis testing is compared to optimal approaches, both of which are designed for situations in which multiple undocumented changepoints may be present. In addition, the importance of the form of the composite climate reference series is evaluated, particularly with regard to the impact of undocumented changepoints in the various component series that are used to calculate the composite.

    In a comparison of single-test changepoint detection skill, the composite reference series formulation is shown to be less important than the choice of the hypothesis test statistic, provided that the composite is calculated from serially complete and homogeneous component series. However, the evaluated composite series are not equally susceptible to the presence of changepoints in their components, which may be erroneously attributed to the target series. Moreover, a reference formulation that is based on the averaging of the first-difference component series is susceptible to random walks when the composition of the component series changes through time (e.g., values are missing), and its use is, therefore, not recommended. When more than one test is required to reject the null hypothesis of no changepoint, the number of detected changepoints is reduced proportionately less than the number of false alarms in a wide variety of Monte Carlo simulations. Consequently, a consensus of hypothesis tests appears to improve undocumented changepoint detection skill, especially when reference series homogeneity is violated. A consensus of successive hypothesis tests using a semihierarchic splitting algorithm also compares favorably to optimal solutions, even when changepoints are not hierarchic.
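
    For intuition, an undocumented changepoint test can be reduced to scanning a series for the largest two-sample t statistic and calibrating it by permutation. This is a simplified sketch under a white-noise null, not the paper's test statistics or reference-series machinery; Python with NumPy is assumed, and the series is simulated.

      import numpy as np

      rng = np.random.default_rng(5)
      series = np.concatenate([rng.normal(0, 1, 50),
                               rng.normal(1, 1, 50)])   # simulated mean shift at t = 50

      def max_t(x, margin=5):
          """Largest absolute two-sample t statistic over candidate changepoints."""
          best = 0.0
          for c in range(margin, x.size - margin):
              a, b = x[:c], x[c:]
              se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
              best = max(best, abs(a.mean() - b.mean()) / se)
          return best

      observed = max_t(series)
      null = [max_t(rng.permutation(series)) for _ in range(500)]
      print(f"max |t| = {observed:.2f}, "
            f"permutation 95% critical value = {np.quantile(null, 0.95):.2f}")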

  41. Assessing Threat Detection Scenarios through Hypothesis Generation and Testing

    DTIC Science & Technology

    2015-12-01

    …therefore, subsequent F statistics are reported using the Huynh-Feldt correction (Greenhouse-Geisser epsilon > .775). Experienced and inexperienced… change in hypothesis using experience and initial confidence as predictors. In the Dog Day scenario, the regression was not statistically…

  42. Bayesian Methods for Determining the Importance of Effects

    USDA-ARS?s Scientific Manuscript database

    Criticisms have plagued the frequentist null-hypothesis significance testing (NHST) procedure since the day it was created from the Fisher Significance Test and Hypothesis Test of Jerzy Neyman and Egon Pearson. Alternatives to NHST exist in frequentist statistics, but competing methods are also avai...

  43. Hypothesis testing for band size detection of high-dimensional banded precision matrices.

    PubMed

    An, Baiguo; Guo, Jianhua; Liu, Yufeng

    2014-06-01

    Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation is not only computationally intensive but can also be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.

  44. Robust hypothesis tests for detecting statistical evidence of two-dimensional and three-dimensional interactions in single-molecule measurements

    NASA Astrophysics Data System (ADS)

    Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.

    2014-05-01

    Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).

  45. P value and the theory of hypothesis testing: an explanation for new researchers.

    PubMed

    Biau, David Jean; Jolles, Brigitte M; Porcher, Raphaël

    2010-03-01

    In the 1920s, Ronald Fisher developed the theory behind the p value, and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed, presuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators will select a threshold p value below which they will reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine some critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different theories that often are misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the difference observed, or one that is more extreme, considering the null is true. Another concern is the risk that an important proportion of statistically significant results are falsely significant. Researchers should have a minimum understanding of these two theories so that they are better able to plan, conduct, interpret, and report scientific experiments.
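
    The misconception flagged above can be demonstrated by simulation: when the null hypothesis is true, P values are uniformly distributed, so a small P value is not the probability that the null is true. A minimal sketch, assuming Python with NumPy/SciPy and simulated data:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      pvals = np.array([
          stats.ttest_ind(rng.normal(0, 1, 20), rng.normal(0, 1, 20)).pvalue
          for _ in range(10_000)        # both groups share one distribution: H0 true
      ])
      # Under a true null, P is uniform on (0, 1), so about 5% of tests fall
      # below 0.05 (the Type I error rate). P is not P(H0 is true | data).
      print(f"fraction with P < 0.05: {np.mean(pvals < 0.05):.3f}")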

  46. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  47. An Extension of RSS-based Model Comparison Tests for Weighted Least Squares

    DTIC Science & Technology

    2012-08-22

    …use the model comparison test statistic to analyze the null hypothesis. Under the null hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS^H) = 10.3040 × 10^6. Under the alternative hypothesis, the weighted least squares cost functional is J_WLS(q̂_WLS) = 8.8394 × 10^6. Thus the model…

  48. Unadjusted Bivariate Two-Group Comparisons: When Simpler is Better.

    PubMed

    Vetter, Thomas R; Mascha, Edward J

    2018-01-01

    Hypothesis testing involves posing both a null hypothesis and an alternative hypothesis. This basic statistical tutorial discusses the appropriate use, including their so-called assumptions, of the common unadjusted bivariate tests for hypothesis testing and thus comparing study sample data for a difference or association. The appropriate choice of a statistical test is predicated on the type of data being analyzed and compared. The unpaired or independent-samples t test is used to test the null hypothesis that the 2 population means are equal, thereby accepting the alternative hypothesis that the 2 population means are not equal. The unpaired t test is intended for comparing independent continuous (interval or ratio) data from 2 study groups. A common mistake is to apply several unpaired t tests when comparing data from 3 or more study groups. In this situation, an analysis of variance with post hoc (posttest) intragroup comparisons should instead be applied. Another common mistake is to apply a series of unpaired t tests when comparing sequentially collected data from 2 study groups. In this situation, a repeated-measures analysis of variance, with tests for group-by-time interaction, and post hoc comparisons, as appropriate, should instead be applied in analyzing data from sequential collection points. The paired t test is used to assess the difference in the means of 2 study groups when the sample observations have been obtained in pairs, often before and after an intervention in each study subject. The Pearson chi-square test is widely used to test the null hypothesis that 2 unpaired categorical variables, each with 2 or more nominal levels (values), are independent of each other. When the null hypothesis is rejected, one concludes that there is a probable association between the 2 unpaired categorical variables. When comparing 2 groups on an ordinal or nonnormally distributed continuous outcome variable, the 2-sample t test is usually not appropriate; the Wilcoxon-Mann-Whitney test is instead preferred. When making paired comparisons on data that are ordinal, or continuous but nonnormally distributed, the Wilcoxon signed-rank test can be used. In analyzing their data, researchers should consider the continued merits of these simple yet equally valid unadjusted bivariate statistical tests. However, the appropriate use of an unadjusted bivariate test still requires a solid understanding of its utility, assumptions (requirements), and limitations. This understanding will mitigate the risk of misleading findings, interpretations, and conclusions.
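
    A minimal sketch of the main tests discussed in the tutorial, applied to hypothetical data in Python with NumPy/SciPy; the choice among them follows the rules above.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(9)

      pre = rng.normal(140, 12, 25)                 # hypothetical paired measurements
      post = pre - rng.normal(6, 5, 25)
      print(stats.ttest_rel(pre, post).pvalue)      # paired t test

      a = rng.normal(0.0, 1.0, 30)                  # hypothetical independent groups
      b = rng.normal(0.6, 1.0, 30)
      print(stats.ttest_ind(a, b).pvalue)           # unpaired t test
      print(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)  # nonnormal/ordinal

      table = np.array([[18, 12],                   # hypothetical 2 x 2 counts
                        [9, 21]])
      chi2, p, dof, expected = stats.chi2_contingency(table)
      print(p)                                      # Pearson chi-square test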

  9. Congruence analysis of geodetic networks - hypothesis tests versus model selection by information criteria

    NASA Astrophysics Data System (ADS)

    Lehmann, Rüdiger; Lösler, Michael

    2017-12-01

    Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred. It is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is to use a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution for the test problem, we propose the p-value approach. Another approach arises from information theory. Here, the Akaike information criterion (AIC) or some alternative is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that they work but behave differently, sometimes even producing different results. Hypothesis tests are well established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria like the AIC.

  10. Hypothesis Testing Using the Films of the Three Stooges

    ERIC Educational Resources Information Center

    Gardner, Robert; Davidson, Robert

    2010-01-01

    The use of The Three Stooges' films as a source of data in an introductory statistics class is described. The Stooges' films are separated into three populations. Using these populations, students may conduct hypothesis tests with data they collect.

  11. Hypothesis-Testing Demands Trustworthy Data—A Simulation Approach to Inferential Statistics Advocating the Research Program Strategy

    PubMed Central

    Krefeld-Schwalb, Antonia; Witte, Erich H.; Zenker, Frank

    2018-01-01

    In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis. PMID:29740363

  12. Hypothesis-Testing Demands Trustworthy Data-A Simulation Approach to Inferential Statistics Advocating the Research Program Strategy.

    PubMed

    Krefeld-Schwalb, Antonia; Witte, Erich H; Zenker, Frank

    2018-01-01

    In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H0-hypothesis to a statistical H1-verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a "pure" Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.

  13. Using the Coefficient of Confidence to Make the Philosophical Switch from a Posteriori to a Priori Inferential Statistics

    ERIC Educational Resources Information Center

    Trafimow, David

    2017-01-01

    There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a…

  14. Hypothesis testing of a change point during cognitive decline among Alzheimer's disease patients.

    PubMed

    Ji, Ming; Xiong, Chengjie; Grundman, Michael

    2003-10-01

    In this paper, we present a statistical hypothesis test for detecting a change point over the course of cognitive decline among Alzheimer's disease patients. The model under the null hypothesis assumes a constant rate of cognitive decline over time, and the model under the alternative hypothesis is a general bilinear model with an unknown change point. When the change point is unknown, however, the null distribution of the test statistic is not analytically tractable and has to be simulated by parametric bootstrap. When the alternative hypothesis that a change point exists is accepted, we propose an estimate of its location based on Akaike's Information Criterion. We applied our method to a data set from the Neuropsychological Database Initiative, implementing our hypothesis testing method on Mini-Mental State Exam (MMSE) scores based on a random-slope and random-intercept model with a bilinear fixed effect. Our result shows that, despite a large amount of missing data, accelerated decline in MMSE did occur among AD patients. Our finding supports the clinical belief that a change point exists during cognitive decline among AD patients and suggests the use of change point models for the longitudinal modeling of cognitive decline in AD research.
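
    A fixed-effects caricature of the procedure (the paper's random-slope, random-intercept model is richer, so this sketch is only an assumption-laden illustration): fit a straight line and a bilinear model with an unknown change point, then simulate the null distribution of the likelihood-ratio-type statistic by parametric bootstrap:

      import numpy as np

      def rss_linear(t, y):
          X = np.column_stack([np.ones_like(t), t])
          beta = np.linalg.lstsq(X, y, rcond=None)[0]
          return float(np.sum((y - X @ beta) ** 2))

      def rss_bilinear(t, y):
          best = np.inf
          for tau in t[2:-2]:                                  # candidate change points
              X = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
              beta = np.linalg.lstsq(X, y, rcond=None)[0]
              best = min(best, float(np.sum((y - X @ beta) ** 2)))
          return best

      def lrt_stat(t, y):
          return t.size * np.log(rss_linear(t, y) / rss_bilinear(t, y))

      rng = np.random.default_rng(0)
      t = np.linspace(0.0, 5.0, 40)
      y = 28.0 - 1.5 * t - 2.0 * np.maximum(t - 3.0, 0.0) + rng.normal(0.0, 1.0, t.size)

      obs = lrt_stat(t, y)
      X0 = np.column_stack([np.ones_like(t), t])               # refit the null model
      beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
      sigma0 = np.sqrt(rss_linear(t, y) / (t.size - 2))
      null = [lrt_stat(t, X0 @ beta0 + rng.normal(0.0, sigma0, t.size))
              for _ in range(500)]                             # simulate under H0
      print(f"LRT = {obs:.2f}, bootstrap p = {np.mean(np.array(null) >= obs):.3f}")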

  15. LIKELIHOOD RATIO TESTS OF HYPOTHESES ON MULTIVARIATE POPULATIONS, VOLUME II, TEST OF HYPOTHESIS--STATISTICAL MODELS FOR THE EVALUATION AND INTERPRETATION OF EDUCATIONAL CRITERIA. PART 4.

    ERIC Educational Resources Information Center

    SAW, J.G.

    THIS PAPER DEALS WITH SOME TESTS OF HYPOTHESIS FREQUENTLY ENCOUNTERED IN THE ANALYSIS OF MULTIVARIATE DATA. THE TYPE OF HYPOTHESIS CONSIDERED IS THAT WHICH THE STATISTICIAN CAN ANSWER IN THE NEGATIVE OR AFFIRMATIVE. THE DOOLITTLE METHOD MAKES IT POSSIBLE TO EVALUATE THE DETERMINANT OF A MATRIX OF HIGH ORDER, TO SOLVE A MATRIX EQUATION, OR TO…

  16. Modeling Cross-Situational Word–Referent Learning: Prior Questions

    PubMed Central

    Yu, Chen; Smith, Linda B.

    2013-01-01

    Both adults and young children possess powerful statistical computation capabilities—they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of associative learning. This article describes a series of simulation studies and analyses designed to understand the different learning mechanisms posited by the 2 classes of models and their relation to each other. Variants of a hypothesis-testing model and a simple or dumb associative mechanism were examined under different specifications of information selection, computation, and decision. Critically, these 3 components of the models interact in complex ways. The models illustrate a fundamental tradeoff between amount of data input and powerful computations: With the selection of more information, dumb associative models can mimic the powerful learning that is accomplished by hypothesis-testing models with fewer data. However, because of the interactions among the component parts of the models, the associative model can mimic various hypothesis-testing models, producing the same learning patterns but through different internal components. The simulations argue for the importance of a compositional approach to human statistical learning: the experimental decomposition of the processes that contribute to statistical learning in human learners and models with the internal components that can be evaluated independently and together. PMID:22229490

  17. Test of association: which one is the most appropriate for my study?

    PubMed

    Gonzalez-Chica, David Alejandro; Bastos, João Luiz; Duquia, Rodrigo Pereira; Bonamigo, Renan Rangel; Martínez-Mesa, Jeovany

    2015-01-01

    Hypothesis tests are statistical tools widely used for assessing whether or not there is an association between two or more variables. These tests provide a probability of the type 1 error (p-value), which is used to accept or reject the null study hypothesis. The aim of this article is to provide a practical guide to help researchers carefully select the most appropriate procedure to answer their research question. We discuss the logic of hypothesis testing and present the prerequisites of each procedure based on practical examples.

  18. Confidence intervals for single-case effect size measures based on randomization test inversion.

    PubMed

    Michiels, Bart; Heyvaert, Mieke; Meulders, Ann; Onghena, Patrick

    2017-02-01

    In the current paper, we present a method to construct nonparametric confidence intervals (CIs) for single-case effect size measures in the context of various single-case designs. We use the relationship between a two-sided statistical hypothesis test at significance level α and a 100(1 − α)% two-sided CI to construct CIs for any effect size measure θ that contain all point null hypothesis θ values that cannot be rejected by the hypothesis test at significance level α. This method of hypothesis test inversion (HTI) can be employed using a randomization test as the statistical hypothesis test in order to construct a nonparametric CI for θ. We will refer to this procedure as randomization test inversion (RTI). We illustrate RTI in a situation in which θ is the unstandardized and the standardized difference in means between two treatments in a completely randomized single-case design. Additionally, we demonstrate how RTI can be extended to other types of single-case designs. Finally, we discuss a few challenges for RTI as well as possibilities when using the method with other effect size measures, such as rank-based nonoverlap indices. Supplementary to this paper, we provide easy-to-use R code, which allows the user to construct nonparametric CIs according to the proposed method.
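
    A minimal Python sketch of RTI for the unstandardized mean difference in a two-treatment completely randomized design (illustrative data; the authors' supplementary R code, not reproduced here, is the reference implementation): a grid value θ0 joins the CI whenever the randomization test of H0: θ = θ0 is not rejected at level α:

      import numpy as np
      from itertools import combinations

      def randomization_p(a, b):
          """Two-sided randomization p-value for H0: no mean difference."""
          pooled = np.concatenate([a, b])
          n, k = pooled.size, a.size
          obs = abs(a.mean() - b.mean())
          hits = total = 0
          for idx in combinations(range(n), k):        # full randomization distribution
              mask = np.zeros(n, dtype=bool)
              mask[list(idx)] = True
              hits += abs(pooled[mask].mean() - pooled[~mask].mean()) >= obs - 1e-12
              total += 1
          return hits / total

      def rti_ci(a, b, alpha=0.05, half_width=4.0, n_grid=81):
          """Collect every theta0 whose shifted randomization test is not rejected."""
          center = a.mean() - b.mean()
          grid = np.linspace(center - half_width, center + half_width, n_grid)
          keep = [th for th in grid if randomization_p(a - th, b) > alpha]
          return min(keep), max(keep)

      a = np.array([7.1, 8.4, 6.9, 7.8, 8.0])          # scores under treatment A
      b = np.array([5.2, 6.1, 5.8, 6.5, 5.5])          # scores under treatment B
      print(rti_ci(a, b))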

  19. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for a binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation with independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.

  20. Chi-Square Statistics, Tests of Hypothesis and Technology.

    ERIC Educational Resources Information Center

    Rochowicz, John A.

    The use of technology such as computers and programmable calculators enables students to find p-values and conduct tests of hypotheses in many different ways. Comprehension and interpretation of a research problem become the focus for statistical analysis. This paper describes how to calculate chi-square statistics and p-values for statistical…

  1. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…

  2. Experimental comparisons of hypothesis test and moving average based combustion phase controllers.

    PubMed

    Gao, Jinwu; Wu, Yuhu; Shen, Tielong

    2016-11-01

    For engine control, combustion phase is the most effective and direct parameter to improve fuel efficiency. In this paper, the statistical control strategy based on hypothesis test criterion is discussed. Taking location of peak pressure (LPP) as combustion phase indicator, the statistical model of LPP is first proposed, and then the controller design method is discussed on the basis of both Z and T tests. For comparison, moving average based control strategy is also presented and implemented in this study. The experiments on a spark ignition gasoline engine at various operating conditions show that the hypothesis test based controller is able to regulate LPP close to set point while maintaining the rapid transient response, and the variance of LPP is also well constrained.
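
    An illustrative sketch of the hypothesis-test control idea (not the authors' implementation; the gain, window length, and noise level are assumptions): a Z test on a window of LPP samples decides whether the controller should correct at all, so it does not chase cycle-to-cycle combustion noise:

      import numpy as np
      from scipy.stats import norm

      def lpp_correction(lpp_window, set_point, sigma, alpha=0.05, gain=0.5):
          """Return a timing correction only if H0: mean LPP = set point is rejected."""
          n = len(lpp_window)
          mean_lpp = np.mean(lpp_window)
          z = (mean_lpp - set_point) / (sigma / np.sqrt(n))
          if abs(z) > norm.ppf(1.0 - alpha / 2.0):    # two-sided Z test
              return -gain * (mean_lpp - set_point)   # proportional correction
          return 0.0                                  # statistically on target: hold

      rng = np.random.default_rng(3)
      window = rng.normal(9.5, 2.0, 40)               # noisy LPP samples (deg ATDC)
      print(lpp_correction(window, set_point=8.0, sigma=2.0))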

  3. Near-exact distributions for the block equicorrelation and equivariance likelihood ratio test statistic

    NASA Astrophysics Data System (ADS)

    Coelho, Carlos A.; Marques, Filipe J.

    2013-09-01

    In this paper the authors combine the equicorrelation and equivariance test introduced by Wilks [13] with the likelihood ratio test (l.r.t.) for independence of groups of variables to obtain the l.r.t. of block equicorrelation and equivariance. This test, or its single block version, may find applications in many areas, such as psychology, education, medicine and genetics, and they are important "in many tests of multivariate analysis, e.g. in MANOVA, Profile Analysis, Growth Curve analysis, etc" [12, 9]. By decomposing the overall hypothesis into the hypotheses of independence of groups of variables and the hypothesis of equicorrelation and equivariance we are able to obtain the expressions for the overall l.r.t. statistic and its moments. From these we obtain a suitable factorization of the characteristic function (c.f.) of the logarithm of the l.r.t. statistic, which enables us to develop highly manageable and precise near-exact distributions for the test statistic.

  4. Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

    ERIC Educational Resources Information Center

    Hertz, Norman R.; Chinn, Roberta N.

    This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…

  5. Directional and Non-directional Hypothesis Testing: A Survey of SIG Members, Journals, and Textbooks.

    ERIC Educational Resources Information Center

    McNeil, Keith

    The use of directional and nondirectional hypothesis testing was examined from the perspectives of textbooks, journal articles, and members of editorial boards. Three widely used statistical texts were reviewed in terms of how directional and nondirectional tests of significance were presented. Texts reviewed were written by: (1) D. E. Hinkle, W.…

  6. Statistical hypothesis testing and common misinterpretations: Should we abandon p-value in forensic science applications?

    PubMed

    Taroni, F; Biedermann, A; Bozza, S

    2016-02-01

    Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons about their advantages and drawbacks are widely available and continue to fuel large debates in the literature. More recently, controversial discussion was initiated by an editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on the so-called p-values, it is of interest to expose the discussion of this journal's decision within the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.

  7. Using VITA Service Learning Experiences to Teach Hypothesis Testing and P-Value Analysis

    ERIC Educational Resources Information Center

    Drougas, Anne; Harrington, Steve

    2011-01-01

    This paper describes a hypothesis testing project designed to capture student interest and stimulate classroom interaction and communication. Using an online survey instrument, the authors collected student demographic information and data regarding university service learning experiences. Introductory statistics students performed a series of…

  8. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  9. The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing.

    PubMed

    Lash, Timothy L

    2017-09-15

    In the last few years, stakeholders in the scientific community have raised alarms about a perceived lack of reproducibility of scientific results. In reaction, guidelines for journals have been promulgated and grant applicants have been asked to address the rigor and reproducibility of their proposed projects. Neither solution addresses a primary culprit, which is the culture of null hypothesis significance testing that dominates statistical analysis and inference. In an innovative research enterprise, selection of results for further evaluation based on null hypothesis significance testing is doomed to yield a low proportion of reproducible results and a high proportion of effects that are initially overestimated. In addition, the culture of null hypothesis significance testing discourages quantitative adjustments to account for systematic errors and quantitative incorporation of prior information. These strategies would otherwise improve reproducibility and have not been previously proposed in the widely cited literature on this topic. Without discarding the culture of null hypothesis significance testing and implementing these alternative methods for statistical analysis and inference, all other strategies for improving reproducibility will yield marginal gains at best.

  10. Testing for nonlinearity in time series: The method of surrogate data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, J.; Galdrikian, B.; Longtin, A.

    1991-01-01

    We describe a statistical approach for identifying nonlinearity in time series; in particular, we want to avoid claims of chaos when simpler models (such as linearly correlated noise) can explain the data. The method requires a careful statement of the null hypothesis which characterizes a candidate linear process, the generation of an ensemble of "surrogate" data sets which are similar to the original time series but consistent with the null hypothesis, and the computation of a discriminating statistic for the original and for each of the surrogate data sets. The idea is to test the original time series against the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets. We present algorithms for generating surrogate data under various null hypotheses, and we show the results of numerical experiments on artificial data using correlation dimension, Lyapunov exponent, and forecasting error as discriminating statistics. Finally, we consider a number of experimental time series -- including sunspots, electroencephalogram (EEG) signals, and fluid convection -- and evaluate the statistical significance of the evidence for nonlinear structure in each case.
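
    A minimal sketch of the procedure for the null hypothesis of linearly correlated Gaussian noise, using phase-randomized surrogates and time-reversal asymmetry as the discriminating statistic (a simpler statistic than the correlation dimension or Lyapunov exponent used in the paper); the chaotic test series is an assumption of the demo:

      import numpy as np

      def phase_randomized_surrogate(x, rng):
          """Same power spectrum as x, randomized Fourier phases (linear-Gaussian null)."""
          Xf = np.fft.rfft(x)
          phases = rng.uniform(0.0, 2.0 * np.pi, Xf.size)
          phases[0] = 0.0                              # keep the mean
          phases[-1] = 0.0                             # keep the Nyquist bin real (even n)
          return np.fft.irfft(np.abs(Xf) * np.exp(1j * phases), n=x.size)

      def time_reversal_asymmetry(x, lag=1):
          return np.mean((x[lag:] - x[:-lag]) ** 3)    # near zero for reversible series

      rng = np.random.default_rng(0)
      x = np.empty(2000)
      x[0] = 0.3
      for t in range(x.size - 1):
          x[t + 1] = 4.0 * x[t] * (1.0 - x[t])         # chaotic logistic map (nonlinear)

      stat = time_reversal_asymmetry(x)
      surr = [time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
              for _ in range(99)]
      n_exceed = int(np.sum(np.abs(surr) >= abs(stat)))
      print(f"original {stat:.4f}; {n_exceed}/99 surrogates as extreme")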

  11. Sources of Error and the Statistical Formulation of M_S:m_b Seismic Event Screening Analysis

    NASA Astrophysics Data System (ADS)

    Anderson, D. N.; Patton, H. J.; Taylor, S. R.; Bonner, J. L.; Selby, N. D.

    2014-03-01

    The Comprehensive Nuclear-Test-Ban Treaty (CTBT), a global ban on nuclear explosions, is currently in a ratification phase. Under the CTBT, an International Monitoring System (IMS) of seismic, hydroacoustic, infrasonic and radionuclide sensors is operational, and the data from the IMS are analysed by the International Data Centre (IDC). The IDC provides CTBT signatories basic seismic event parameters and a screening analysis indicating whether an event exhibits explosion characteristics (for example, shallow depth). An important component of the screening analysis is a statistical test of the null hypothesis H_0: explosion characteristics, using empirical measurements of seismic energy (magnitudes). The established magnitude used for event size is the body-wave magnitude (denoted m_b) computed from the initial segment of a seismic waveform. IDC screening analysis is applied to events with m_b greater than 3.5. The Rayleigh wave magnitude (denoted M_S) is a measure of later-arriving surface wave energy. Magnitudes are measurements of seismic energy that include adjustments (physical correction model) for path and distance effects between event and station. Relative to m_b, earthquakes generally have a larger M_S magnitude than explosions. This article proposes a hypothesis test (screening analysis) using M_S and m_b that expressly accounts for physical correction model inadequacy in the standard error of the test statistic. With this hypothesis test formulation, the 2009 Democratic People's Republic of Korea announced nuclear weapon test fails to reject the null hypothesis H_0: explosion characteristics.

  12. The Need for Nuance in the Null Hypothesis Significance Testing Debate

    ERIC Educational Resources Information Center

    Häggström, Olle

    2017-01-01

    Null hypothesis significance testing (NHST) provides an important statistical toolbox, but there are a number of ways in which it is often abused and misinterpreted, with bad consequences for the reliability and progress of science. Parts of the contemporary NHST debate, especially in the psychological sciences, are reviewed, and a suggestion is made…

  13. Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing

    ERIC Educational Resources Information Center

    García-Pérez, Miguel A.

    2017-01-01

    Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…

  14. Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data

    ERIC Educational Resources Information Center

    Warachan, Boonyasit

    2011-01-01

    The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…

  15. Determining the Statistical Power of the Kolmogorov-Smirnov and Anderson-Darling Goodness-of-Fit Tests via Monte Carlo Simulation

    DTIC Science & Technology

    2016-12-01

    KS and AD Statistical Power via Monte Carlo Simulation: statistical power is the probability of correctly rejecting the null hypothesis when the ... real-world data to test the accuracy of the simulation. Statistical comparison of these metrics can be necessary when making such a determination

  16. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2018-01-01

    Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical…

  17. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    The difference between “clinical significance” and “statistical significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many but is pointed out again here because the philosophy of “statistical hypothesis testing” is sometimes criticized unnecessarily, mainly owing to a failure to consider this distinction. Randomized controlled trials are wrongly criticized in a similar way. That a scientific method may not be applicable in some particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making, and the field of “decision analysis” is very much an integral part of the science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands the meaning of the word “confidence” as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations, like chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is it that the difference found in the random sample(s) is due to chance?”, and therefore the limitation of relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  18. Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

    PubMed Central

    Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

    2011-01-01

    Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
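
    The univariate side of such an analysis is easy to sketch (simulated data; the paper's multivariate signed-rank tests on four AUC summaries are more involved): one Wilcoxon signed-rank test per biomarker on a paired summary measure, with Benjamini-Hochberg control of the false discovery rate across the 31 biomarkers:

      import numpy as np
      from scipy.stats import wilcoxon
      from statsmodels.stats.multitest import multipletests

      rng = np.random.default_rng(7)
      n_subjects, n_biomarkers = 22, 31
      delta_auc = rng.normal(0.0, 1.0, (n_subjects, n_biomarkers))
      delta_auc[:, :5] += 1.0                       # five biomarkers truly change

      pvals = [wilcoxon(delta_auc[:, j]).pvalue for j in range(n_biomarkers)]
      reject, p_adj, *_ = multipletests(pvals, alpha=0.05, method="fdr_bh")
      print("biomarkers flagged:", np.flatnonzero(reject))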

  19. Using Guided Reinvention to Develop Teachers' Understanding of Hypothesis Testing Concepts

    ERIC Educational Resources Information Center

    Dolor, Jason; Noll, Jennifer

    2015-01-01

    Statistics education reform efforts emphasize the importance of informal inference in the learning of statistics. Research suggests statistics teachers experience similar difficulties understanding statistical inference concepts as students and how teacher knowledge can impact student learning. This study investigates how teachers reinvented an…

  20. Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage-Dickey Method

    ERIC Educational Resources Information Center

    Wagenmakers, Eric-Jan; Lodewyckx, Tom; Kuriyal, Himanshu; Grasman, Raoul

    2010-01-01

    In the field of cognitive psychology, the "p"-value hypothesis test has established a stranglehold on statistical reporting. This is unfortunate, as the "p"-value provides at best a rough estimate of the evidence that the data provide for the presence of an experimental effect. An alternative and arguably more appropriate measure of evidence is…

  1. The Genesis of Pedophilia: Testing the "Abuse-to-Abuser" Hypothesis.

    ERIC Educational Resources Information Center

    Fedoroff, J. Paul; Pinkus, Shari

    1996-01-01

    This study tested three versions of the "abuse-to-abuser" hypothesis by comparing men with personal histories of sexual abuse and men without sexual abuse histories. There was a statistically non-significant trend for assaulted offenders to be more likely as adults to commit genital assaults on children. Implications for the abuse-to-abuser…

  2. Bayesian hypothesis testing for human threat conditioning research: an introduction and the condir R package

    PubMed Central

    Krypotos, Angelos-Miltiadis; Klugkist, Irene; Engelhard, Iris M.

    2017-01-01

    Threat conditioning procedures have allowed the experimental investigation of the pathogenesis of Post-Traumatic Stress Disorder. The findings of these procedures have also provided stable foundations for the development of relevant intervention programs (e.g. exposure therapy). Statistical inference of threat conditioning procedures is commonly based on p-values and Null Hypothesis Significance Testing (NHST). Nowadays, however, there is a growing concern about this statistical approach, as many scientists point to the various limitations of p-values and NHST. As an alternative, the use of Bayes factors and Bayesian hypothesis testing has been suggested. In this article, we apply this statistical approach to threat conditioning data. In order to enable the easy computation of Bayes factors for threat conditioning data we present a new R package named condir, which can be used either via the R console or via a Shiny application. This article provides both a non-technical introduction to Bayesian analysis for researchers using the threat conditioning paradigm, and the necessary tools for computing Bayes factors easily. PMID:29038683

  3. [Prosthodontic research design from the standpoint of statistical analysis: learning and knowing the research design].

    PubMed

    Tanoue, Naomi

    2007-10-01

    For any kind of research, the research design is the most important element. The design structures the research and shows how all of the major parts of the project fit together. Every researcher should begin only after planning the research design: what the main theme is, what the background and references are, what kind of data are needed, and what kind of analysis is required. This may seem a roundabout route, but in fact it is a shortcut. The research methods must be appropriate to the objectives of the study. For hypothesis-testing research, the traditional style of research, a research design grounded in statistics is undoubtedly necessary, because such research essentially tests a hypothesis with data and statistical theory. For the clinical trial, the clinical version of hypothesis-testing research, the statistical method must be specified in the trial plan. This report describes the basis of research design for a prosthodontics study.

  4. Monitoring Items in Real Time to Enhance CAT Security

    ERIC Educational Resources Information Center

    Zhang, Jinming; Li, Jie

    2016-01-01

    An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…

  5. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
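
    The core computation is compact: under the joint null, with independent uniform p-values across K studies, the rth ordered p-value follows a Beta(r, K − r + 1) distribution, which yields the meta-analysis p-value directly (a sketch with simulated p-values; estimating r from data, as the paper proposes, is omitted):

      import numpy as np
      from scipy.stats import beta

      def rop_pvalues(p_matrix, r):
          """p_matrix: genes x K studies; one meta-analysis p-value per gene."""
          K = p_matrix.shape[1]
          rth = np.sort(p_matrix, axis=1)[:, r - 1]        # r-th smallest p-value
          return beta.cdf(rth, r, K - r + 1)               # null law of the order statistic

      rng = np.random.default_rng(11)
      K, genes = 5, 1000
      p = rng.uniform(size=(genes, K))
      p[:50, :4] = rng.uniform(0.0, 0.001, size=(50, 4))   # DE in 4 of the 5 studies
      meta = rop_pvalues(p, r=4)                           # target: "majority of studies"
      print("detected among DE genes:", np.mean(meta[:50] < 1e-4))
      print("false hits among nulls :", np.mean(meta[50:] < 1e-4))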

  6. The researcher and the consultant: from testing to probability statements.

    PubMed

    Hamra, Ghassan B; Stang, Andreas; Poole, Charles

    2015-09-01

    In the first instalment of this series, Stang and Poole provided an overview of Fisher significance testing (ST), Neyman-Pearson null hypothesis testing (NHT), and their unfortunate and unintended offspring, null hypothesis significance testing. In addition to elucidating the distinction between the first two and the evolution of the third, the authors alluded to alternative models of statistical inference; namely, Bayesian statistics. Bayesian inference has experienced a revival in recent decades, with many researchers advocating for its use as both a complement and an alternative to NHT and ST. This article will continue in the direction of the first instalment, providing practicing researchers with an introduction to Bayesian inference. Our work will draw on the examples and discussion of the previous dialogue.

  7. Power Enhancement in High Dimensional Cross-Sectional Tests

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Yao, Jiawei

    2016-01-01

    We propose a novel technique to boost the power of testing a high-dimensional vector H_0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846

  8. Hypothesis testing of scientific Monte Carlo calculations.

    PubMed

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.
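
    A toy version of the idea (the paper treats realistic Monte Carlo codes; the bias injected here merely stands in for a programming bug): a two-sided Z test compares a Monte Carlo estimate against a known exact value:

      import numpy as np
      from scipy.stats import norm

      def mc_check(samples, exact, alpha=0.01):
          """Two-sided Z test of H0: E[sample mean] equals the exact value."""
          n = samples.size
          z = (samples.mean() - exact) / (samples.std(ddof=1) / np.sqrt(n))
          p = 2.0 * norm.sf(abs(z))
          return z, p, p < alpha                      # True means the check fails

      rng = np.random.default_rng(5)
      good = rng.normal(size=100_000) ** 2            # E[X^2] = 1 exactly for X ~ N(0,1)
      bad = good + 0.02                               # a small deliberate bias
      print(mc_check(good, exact=1.0))
      print(mc_check(bad, exact=1.0))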

  9. Hypothesis testing of scientific Monte Carlo calculations

    NASA Astrophysics Data System (ADS)

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.

  10. To P or Not to P: Backing Bayesian Statistics.

    PubMed

    Buchinsky, Farrel J; Chadha, Neil K

    2017-12-01

    In biomedical research, it is imperative to differentiate chance variation from truth before we generalize what we see in a sample of subjects to the wider population. For decades, we have relied on null hypothesis significance testing, where we calculate P values for our data to decide whether to reject a null hypothesis. This methodology is subject to substantial misinterpretation and errant conclusions. Instead of working backward by calculating the probability of our data if the null hypothesis were true, Bayesian statistics allow us instead to work forward, calculating the probability of our hypothesis given the available data. This methodology gives us a mathematical means of incorporating our "prior probabilities" from previous study data (if any) to produce new "posterior probabilities." Bayesian statistics tell us how confidently we should believe what we believe. It is time to embrace and encourage their use in our otolaryngology research.

  11. An Empirical Comparison of Selected Two-Sample Hypothesis Testing Procedures Which Are Locally Most Powerful Under Certain Conditions.

    ERIC Educational Resources Information Center

    Hoover, H. D.; Plake, Barbara

    The relative power of the Mann-Whitney statistic, the t-statistic, the median test, a test based on exceedances (A,B), and two special cases of (A,B) the Tukey quick test and the revised Tukey quick test, was investigated via a Monte Carlo experiment. These procedures were compared across four population probability models: uniform, beta, normal,…

  12. Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC

    PubMed Central

    Templeton, Alan R.

    2009-01-01

    Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good-fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but the convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not in ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182

  13. The frequentist implications of optional stopping on Bayesian hypothesis tests.

    PubMed

    Sanborn, Adam N; Hills, Thomas T

    2014-04-01

    Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite (taking multiple parameter values), such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
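
    A small simulation in the spirit of the paper's demonstrations, using the BIC approximation to the one-sample-t Bayes factor, BF01 ~ sqrt(n) * (1 + t^2/(n-1))^(-n/2) (an approximation assumed here for convenience, not the paper's exact computation): under a true point null, checking after every 10 observations raises the chance of ever crossing BF10 > 3 relative to a single fixed look:

      import numpy as np
      from scipy.stats import ttest_1samp

      def bf10_bic(x):
          """BIC approximation: BF01 ~ sqrt(n) * (1 + t^2/(n-1))**(-n/2)."""
          n = x.size
          t = ttest_1samp(x, 0.0).statistic
          return 1.0 / (np.sqrt(n) * (1.0 + t**2 / (n - 1)) ** (-n / 2.0))

      rng = np.random.default_rng(2)
      hits_optional = hits_fixed = 0
      n_sims = 2000
      for _ in range(n_sims):
          x = rng.normal(0.0, 1.0, 200)                # H0 is exactly true
          looks = [bf10_bic(x[:n]) for n in range(10, 201, 10)]
          hits_optional += max(looks) > 3.0            # stop at the first look crossing 3
          hits_fixed += looks[-1] > 3.0                # single pre-planned look at n = 200
      print(hits_optional / n_sims, hits_fixed / n_sims)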

  14. Bayes factor and posterior probability: Complementary statistical evidence to p-value.

    PubMed

    Lin, Ruitao; Yin, Guosheng

    2015-09-01

    As a convention, a p-value is often computed in hypothesis testing and compared with the nominal level of 0.05 to determine whether to reject the null hypothesis. Although the smaller the p-value, the more significant the statistical test, it is difficult to perceive the p-value on a probability scale and quantify it as the strength of the data against the null hypothesis. In contrast, the Bayesian posterior probability of the null hypothesis has an explicit interpretation of how strongly the data support the null. We make a comparison of the p-value and the posterior probability by considering a recent clinical trial. The results show that even when we reject the null hypothesis, there is still a substantial probability (around 20%) that the null is true. Not only should we examine whether the data would have rarely occurred under the null hypothesis, but we also need to know whether the data would be rare under the alternative. As a result, the p-value only provides one side of the information, for which the Bayes factor and posterior probability may offer complementary evidence.
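
    The order of magnitude in the abstract can be checked with the well-known -e*p*ln(p) lower bound on the Bayes factor in favour of the null (an assumption standing in for the trial-specific computation), combined with 50:50 prior odds:

      import numpy as np

      def min_posterior_null(p, prior_null=0.5):
          """Lower bound on P(H0 | data) from the -e*p*ln(p) bound (valid for p < 1/e)."""
          bf01 = -np.e * p * np.log(p)                 # minimum Bayes factor for H0
          post_odds = bf01 * prior_null / (1.0 - prior_null)
          return post_odds / (1.0 + post_odds)

      for p in (0.05, 0.01, 0.001):
          print(p, round(min_posterior_null(p), 3))    # p = 0.05 leaves roughly 0.29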

  15. Optimizing Aircraft Availability: Where to Spend Your Next O&M Dollar

    DTIC Science & Technology

    2010-03-01

    patterns of variance are present. In addition, we use the Breusch-Pagan test to statistically determine whether homoscedasticity exists. For this ... Breusch-Pagan test, large p-values are preferred so that we may accept the null hypothesis of normality. Failure to meet the fourth assumption is ... Next, we show the residual by predicted plot and the Breusch-Pagan test for constant variance of the residuals. The null hypothesis is that the

  16. Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression

    DTIC Science & Technology

    2006-03-01

    Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not ... variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173 ... entire test sample.
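
    Both records above lean on the Breusch-Pagan test; a minimal statsmodels sketch on simulated heteroscedastic data (the null hypothesis is constant residual variance, so a small p-value signals heteroscedasticity):

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.diagnostic import het_breuschpagan

      rng = np.random.default_rng(4)
      x = rng.uniform(0.0, 10.0, 200)
      y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0 + 0.3 * x)   # error variance grows with x
      X = sm.add_constant(x)
      fit = sm.OLS(y, X).fit()

      lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
      print(f"LM = {lm_stat:.2f}, p = {lm_pvalue:.4f}")    # small p: constant variance rejected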

  17. Formulating appropriate statistical hypotheses for treatment comparison in clinical trial design and analysis.

    PubMed

    Huang, Peng; Ou, Ai-hua; Piantadosi, Steven; Tan, Ming

    2014-11-01

    We discuss the problem of properly defining treatment superiority through the specification of hypotheses in clinical trials. The need to precisely define the notion of superiority in a one-sided hypothesis test problem has been well recognized by many authors. Ideally, the designed null and alternative hypotheses should correspond to a partition of all possible scenarios of underlying true probability models P = {P(ω) : ω ∈ Ω}, such that the alternative hypothesis H_a = {P(ω) : ω ∈ Ω_a} can be inferred upon the rejection of the null hypothesis H_o = {P(ω) : ω ∈ Ω_o}. However, in many cases, tests are carried out and recommendations are made without a precise definition of superiority or a specification of the alternative hypothesis. Moreover, in some applications, the union of the probability models specified by the chosen null and alternative hypotheses does not constitute the complete model collection P (i.e., H_o ∪ H_a is smaller than P). This not only imposes a strong non-validated assumption on the underlying true models, but also leads to different superiority claims depending on which test is used, instead of on scientific plausibility. Different ways to partition P for testing treatment superiority often have different implications for sample size, power, and significance in both efficacy and comparative effectiveness trial design. Such differences are often overlooked. We provide a theoretical framework for evaluating the statistical properties of different specifications of superiority in typical hypothesis testing. This can help investigators select proper hypotheses for treatment comparison in clinical trial design.

  18. On Some Assumptions of the Null Hypothesis Statistical Testing

    ERIC Educational Resources Information Center

    Patriota, Alexandre Galvão

    2017-01-01

    Bayesian and classical statistical approaches are based on different types of logical principles. In order to avoid mistaken inferences and misguided interpretations, the practitioner must respect the inference rules embedded into each statistical method. Ignoring these principles leads to the paradoxical conclusions that the hypothesis…

  19. Unscaled Bayes factors for multiple hypothesis testing in microarray experiments.

    PubMed

    Bertolino, Francesco; Cabras, Stefano; Castellanos, Maria Eugenia; Racugno, Walter

    2015-12-01

    Multiple hypothesis testing collects a series of techniques usually based on p-values as a summary of the available evidence from many statistical tests. In hypothesis testing, under a Bayesian perspective, the evidence for a specified hypothesis against an alternative, conditionally on data, is given by the Bayes factor. In this study, we approach multiple hypothesis testing based on both Bayes factors and p-values, regarding multiple hypothesis testing as a multiple model selection problem. To obtain the Bayes factors we assume default priors that are typically improper. In this case, the Bayes factor is usually undetermined due to the ratio of prior pseudo-constants. We show that ignoring prior pseudo-constants leads to unscaled Bayes factors, which do not invalidate the inferential procedure in multiple hypothesis testing, because they are used within a comparative scheme. In fact, using partial information from the p-values, we are able to approximate the sampling null distribution of the unscaled Bayes factor and use it within Efron's multiple testing procedure. The simulation study suggests that under a normal sampling model and even with small sample sizes, our approach provides false positive and false negative proportions that are smaller than those of other common multiple hypothesis testing approaches based only on p-values. The proposed procedure is illustrated in two simulation studies, and the advantages of its use are shown in the analysis of two microarray experiments.

  20. Finite-sample and asymptotic sign-based tests for parameters of non-linear quantile regression with Markov noise

    NASA Astrophysics Data System (ADS)

    Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.

    2017-01-01

    One of the most noticeable features of sign-based statistical procedures is the opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we extend the sign-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypotheses not only about regression parameters but also about noise parameters.

  1. Assessment of statistical education in Indonesia: Preliminary results and initiation to simulation-based inference

    NASA Astrophysics Data System (ADS)

    Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.

    2018-01-01

    In this paper, we assess our traditional elementary statistics education and introduce elementary statistics with simulation-based inference. To assess our statistical class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test, which serves as an external measure of students' basic statistical literacy. This test is generally accepted as a measure of statistical literacy. We also introduce a new teaching method in the elementary statistics class. Unlike the traditional elementary statistics course, we introduce a simulation-based inference method for conducting hypothesis testing. The literature shows that this new teaching method works very well in increasing students' understanding of statistics.

  2. Conservativeness in Rejection of the Null Hypothesis when Using the Continuity Correction in the MH Chi-Square Test in DIF Applications

    ERIC Educational Resources Information Center

    Paek, Insu

    2010-01-01

    Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level α for the decision on an item with respect to DIF. The standard MH…

  3. Availability of Instructional Materials at the Basic Education Level in Enugu Educational Zone of Enugu State, Nigeria

    ERIC Educational Resources Information Center

    Chukwu, Leo C.; Eze, Thecla A. Y.; Agada, Fidelia Chinyelugo

    2016-01-01

    The study examined the availability of instructional materials at the basic education level in Enugu Education Zone of Enugu State, Nigeria. One research question and one hypothesis guided the study. The research question was answered using mean and grand mean ratings, while the hypothesis was tested using t-test statistics at 0.05 level of…

  4. Statistical considerations for agroforestry studies

    Treesearch

    James A. Baldwin

    1993-01-01

    Statistical topics related to agroforestry studies are discussed. These include study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and very much improved histogram display is described.

  5. Introductory Statistics in the Garden

    ERIC Educational Resources Information Center

    Wagaman, John C.

    2017-01-01

    This article describes four semesters of introductory statistics courses that incorporate service learning and gardening into the curriculum with applications of the binomial distribution, least squares regression and hypothesis testing. The activities span multiple semesters and are iterative in nature.

  6. Revised standards for statistical evidence.

    PubMed

    Johnson, Valen E

    2013-11-26

    Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25-50:1, and to 100-200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
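
    For a one-sided z-test, the correspondence described here takes a closed form: rejection at z > z_alpha matches a Bayes factor evidence threshold gamma = exp(z_alpha^2 / 2), which can be checked numerically (a sketch of the correspondence, not the paper's full derivation):

      import numpy as np
      from scipy.stats import norm

      for alpha in (0.05, 0.005, 0.001):
          z_alpha = norm.isf(alpha)                    # one-sided critical value
          gamma = np.exp(z_alpha**2 / 2.0)             # matching evidence threshold
          print(f"alpha = {alpha:<6} z = {z_alpha:.3f}  threshold ~ {gamma:,.0f}:1")
      # alpha = 0.005 gives roughly 28:1 and alpha = 0.001 roughly 119:1, in line
      # with the 25-50:1 and 100-200:1 standards advocated above.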

  7. An Assessment of the Impact of Hafting on Paleoindian Point Variability

    PubMed Central

    Buchanan, Briggs; O'Brien, Michael J.; Kilby, J. David; Huckell, Bruce B.; Collard, Mark

    2012-01-01

    It has long been argued that the form of North American Paleoindian points was affected by hafting. According to this hypothesis, hafting constrained point bases such that they are less variable than point blades. The results of several studies have been claimed to be consistent with this hypothesis. However, there are reasons to be skeptical of these results. None of the studies employed statistical tests, and all of them focused on points recovered from kill and camp sites, which makes it difficult to be certain that the differences in variability are the result of hafting rather than a consequence of resharpening. Here, we report a study in which we tested the predictions of the hafting hypothesis by statistically comparing the variability of different parts of Clovis points. We controlled for the potentially confounding effects of resharpening by analyzing largely unused points from caches as well as points from kill and camp sites. The results of our analyses were not consistent with the predictions of the hypothesis. We found that several blade characters and point thickness were no more variable than the base characters. Our results indicate that the hafting hypothesis does not hold for Clovis points and suggest that there is a need to test its applicability to post-Clovis Paleoindian points. PMID:22666320
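
    The abstract does not state which variability test the authors used; one generic way to compare the variability of two point characters measured on different scales is a bootstrap interval for the difference in coefficients of variation (hypothetical measurements):

```python
import numpy as np

rng = np.random.default_rng(7)

def cv(x):
    """Coefficient of variation: a scale-free measure of variability."""
    return x.std(ddof=1) / x.mean()

# Hypothetical measurements (mm); the paper's data and test may differ.
base_width = rng.normal(25, 1.5, size=60)
blade_length = rng.normal(70, 9.0, size=60)

observed = cv(blade_length) - cv(base_width)

# Bootstrap the difference in coefficients of variation.
boot = np.array([
    cv(rng.choice(blade_length, 60)) - cv(rng.choice(base_width, 60))
    for _ in range(5000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"CV difference = {observed:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```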

  8. Suggestions for presenting the results of data analyses

    USGS Publications Warehouse

    Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.

    2001-01-01

    We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on nontesting aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.

  9. Does RAIM with Correct Exclusion Produce Unbiased Positions?

    PubMed Central

    Teunissen, Peter J. G.; Imparato, Davide; Tiberius, Christian C. J. M.

    2017-01-01

    As the navigation solution of exclusion-based RAIM follows from a combination of least-squares estimation and a statistically based exclusion-process, the computation of the integrity of the navigation solution has to take the propagated uncertainty of the combined estimation-testing procedure into account. In this contribution, we analyse, theoretically as well as empirically, the effect that this combination has on the first statistical moment, i.e., the mean, of the computed navigation solution. It will be shown that, although statistical testing is intended to remove biases from the data, biases will always remain under the alternative hypothesis, even when the correct alternative hypothesis is properly identified. The a posteriori exclusion of a biased satellite range from the position solution will therefore never remove the bias in the position solution completely. PMID:28672862
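
    A toy Monte Carlo, far simpler than the paper's RAIM model, illustrates the phenomenon: even when the faulty observation is correctly identified and excluded, the estimate computed after the exclusion step remains conditionally biased:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sketch (not the paper's full RAIM model): estimate a mean from five
# observations, one of which carries a bias. A global chi-square test flags
# the data set; the observation with the largest absolute residual is
# excluded. We average only the cases of *correct* exclusion.
n, sigma, bias, n_rep = 5, 1.0, 2.0, 50_000
cleaned = []
for _ in range(n_rep):
    y = rng.normal(0.0, sigma, n)
    y[0] += bias                             # the faulty observation
    resid = y - y.mean()
    if np.sum(resid**2) > 9.49:              # ~chi2(4) critical value, alpha=.05
        if np.argmax(np.abs(resid)) == 0:    # correct identification
            cleaned.append(y[1:].mean())     # solution after exclusion

# The true value is 0; the conditional mean after correct exclusion is not,
# because identification is a random event correlated with the noise.
print(f"mean of estimates after correct exclusion: {np.mean(cleaned):.4f}")
```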

  10. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.

  11. Explorations in Statistics: Permutation Methods

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2012-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eighth installment of "Explorations in Statistics" explores permutation methods, empiric procedures we can use to assess an experimental result--to test a null hypothesis--when we are reluctant to trust statistical…
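
    A minimal two-sample permutation test of the kind this installment explores, on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Permutation test of H0: no treatment effect, using the difference in
# group means as the test statistic (hypothetical data).
control = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])
treatment = np.array([5.8, 6.3, 5.2, 6.9, 5.9, 6.1])
observed = treatment.mean() - control.mean()

pooled = np.concatenate([control, treatment])
n_t = len(treatment)
diffs = np.empty(20_000)
for i in range(diffs.size):
    perm = rng.permutation(pooled)           # reshuffle group labels
    diffs[i] = perm[:n_t].mean() - perm[n_t:].mean()

p = np.mean(np.abs(diffs) >= abs(observed))  # two-sided permutation p-value
print(f"observed difference = {observed:.2f}, permutation p = {p:.4f}")
```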

  12. Phi Index: A New Metric to Test the Flush Early and Avoid the Rush Hypothesis

    PubMed Central

    Samia, Diogo S. M.; Blumstein, Daniel T.

    2014-01-01

    Optimal escape theory states that animals should counterbalance the costs and benefits of flight when escaping from a potential predator. However, in apparent contradiction with this well-established optimality model, birds and mammals generally initiate escape soon after beginning to monitor an approaching threat, a phenomenon codified as the “Flush Early and Avoid the Rush” (FEAR) hypothesis. Typically, the FEAR hypothesis is tested using correlational statistics and is supported when there is a strong relationship between the distance at which an individual first responds behaviorally to an approaching predator (alert distance, AD), and its flight initiation distance (the distance at which it flees the approaching predator, FID). However, such correlational statistics are both inadequate to analyze relationships constrained by an envelope (such as that in the AD-FID relationship) and are sensitive to outliers with high leverage, which can lead one to erroneous conclusions. To overcome these statistical concerns we develop the phi index (Φ), a distribution-free metric to evaluate the goodness of fit of a 1:1 relationship in a constraint envelope (the prediction of the FEAR hypothesis). Using both simulation and empirical data, we conclude that Φ is superior to traditional correlational analyses because it explicitly tests the FEAR prediction, is robust to outliers, and it controls for the disproportionate influence of observations from large predictor values (caused by the constrained envelope in the AD-FID relationship). Importantly, by analyzing the empirical data we corroborate the strong effect that alertness has on flight as stated by the FEAR hypothesis. PMID:25405872

  14. Statistical Analysis Techniques for Small Sample Sizes

    NASA Technical Reports Server (NTRS)

    Navard, S. E.

    1984-01-01

    The problem of small sample sizes encountered in the analysis of space-flight data is examined. Because of the small amount of data available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined, and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on considerations needed to choose the most appropriate test for a given type of analysis.
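
    A short sketch of the workflow the report describes, checking a small sample's normality assumption before choosing between a t-test and a distribution-free alternative (hypothetical measurements):

```python
import numpy as np
from scipy import stats

# With n = 8 flight measurements (hypothetical values), check the normality
# assumption before choosing between a t-test and a nonparametric test.
x = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.4, 11.9])

w, p_norm = stats.shapiro(x)  # normality check suited to small samples
print(f"Shapiro-Wilk p = {p_norm:.3f}")

if p_norm > 0.05:
    t, p = stats.ttest_1samp(x, popmean=12.5)  # one-sample t-test
    print(f"t = {t:.2f}, p = {p:.4f}")
else:
    s, p = stats.wilcoxon(x - 12.5)            # distribution-free fallback
    print(f"Wilcoxon statistic = {s}, p = {p:.4f}")
```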

  15. UNIFORMLY MOST POWERFUL BAYESIAN TESTS

    PubMed Central

    Johnson, Valen E.

    2014-01-01

    Uniformly most powerful tests are statistical hypothesis tests that provide the greatest power against a fixed null hypothesis among all tests of a given size. In this article, the notion of uniformly most powerful tests is extended to the Bayesian setting by defining uniformly most powerful Bayesian tests to be tests that maximize the probability that the Bayes factor, in favor of the alternative hypothesis, exceeds a specified threshold. Like their classical counterparts, uniformly most powerful Bayesian tests are most easily defined in one-parameter exponential family models, although extensions outside of this class are possible. The connection between uniformly most powerful tests and uniformly most powerful Bayesian tests can be used to provide an approximate calibration between p-values and Bayes factors. Finally, issues regarding the strong dependence of resulting Bayes factors and p-values on sample size are discussed. PMID:24659829

  16. Addendum to the article: Misuse of null hypothesis significance testing: Would estimation of positive and negative predictive values improve certainty of chemical risk assessment?

    PubMed

    Bundschuh, Mirco; Newman, Michael C; Zubrod, Jochen P; Seitz, Frank; Rosenfeldt, Ricki R; Schulz, Ralf

    2015-03-01

    We argued recently that the positive predictive value (PPV) and the negative predictive value (NPV) are valuable metrics to include during null hypothesis significance testing: They inform the researcher about the probability of statistically significant and non-significant test outcomes actually being true. Although commonly misunderstood, a reported p value estimates only the probability of obtaining the results, or more extreme results, if the null hypothesis of no effect were true. Calculations of the more informative PPV and NPV require an a priori estimate of the probability (R). The present document discusses challenges of estimating R.
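
    The PPV and NPV of a significance test follow from alpha, power, and the prior probability that the tested effect is real; a sketch using the standard Bayes-rule form (the authors' notation for R may differ):

```python
def ppv_npv(alpha, power, prior):
    """Predictive values of a significance test, treating the prior
    probability that the tested effect is real as known."""
    beta = 1 - power
    ppv = power * prior / (power * prior + alpha * (1 - prior))
    npv = (1 - alpha) * (1 - prior) / ((1 - alpha) * (1 - prior) + beta * prior)
    return ppv, npv

# A "significant" result is far less reassuring when true effects are rare.
for prior in (0.5, 0.1, 0.01):
    ppv, npv = ppv_npv(alpha=0.05, power=0.8, prior=prior)
    print(f"prior = {prior:4.2f}:  PPV = {ppv:.3f},  NPV = {npv:.3f}")
```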

  17. A more powerful test based on ratio distribution for retention noninferiority hypothesis.

    PubMed

    Deng, Ling; Chen, Gang

    2013-03-11

    Rothmann et al. ( 2003 ) proposed a method for the statistical inference of fraction retention noninferiority (NI) hypotheses. A fraction retention hypothesis is defined as a ratio of the new treatment effect versus the control effect in the context of a time-to-event endpoint. One of the major concerns using this method in the design of an NI trial is that, with a limited sample size, the power of the study is usually very low. This can make an NI trial infeasible, particularly with a time-to-event endpoint. To improve power, Wang et al. ( 2006 ) proposed a ratio test based on asymptotic normality theory. Under a strong assumption (equal variance of the NI test statistic under null and alternative hypotheses), the sample size using Wang's test was much smaller than that using Rothmann's test. However, in practice, the assumption of equal variance is generally questionable for an NI trial design. This assumption is removed in the ratio test proposed in this article, which is derived directly from a Cauchy-like ratio distribution. In addition, using this method, the fundamental assumption used in Rothmann's test, that the observed control effect is always positive, that is, the observed hazard ratio for placebo over the control is greater than 1, is no longer necessary. Without assuming equal variance under null and alternative hypotheses, the sample size required for an NI trial can be significantly reduced if using the proposed ratio test for a fraction retention NI hypothesis.

  18. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2016-01-01

    In education research and in many other fields, researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple…
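
    A sketch of standard multiplicity adjustments applied to hypothetical p-values from one intervention tested on six outcomes (the guide itself may recommend other procedures):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from testing one intervention on six outcomes.
pvals = [0.003, 0.012, 0.031, 0.044, 0.160, 0.490]

# Compare a family-wise error rate correction (Bonferroni, Holm) with a
# false discovery rate procedure (Benjamini-Hochberg).
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in adj], reject.tolist())
```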

  19. A basic introduction to statistics for the orthopaedic surgeon.

    PubMed

    Bertrand, Catherine; Van Riet, Roger; Verstreken, Frederik; Michielsen, Jef

    2012-02-01

    Orthopaedic surgeons should review the orthopaedic literature in order to keep pace with the latest insights and practices. A good understanding of basic statistical principles is of crucial importance to the ability to read articles critically, to interpret results and to arrive at correct conclusions. This paper explains some of the key concepts in statistics, including hypothesis testing, Type I and Type II errors, testing of normality, sample size and p values.
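
    For example, the sample-size side of these concepts can be sketched with a standard power calculation (the numbers are purely illustrative):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group to detect a medium effect (Cohen's d = 0.5)
# with 80% power at a two-sided alpha of 0.05.
analysis = TTestIndPower()
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required n per group: {n:.1f}")  # roughly 64 per group
```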

  20. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

    PubMed

    Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

    2010-07-19

    A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log odds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.

  2. Wilcoxon's signed-rank statistic: what null hypothesis and why it matters.

    PubMed

    Li, Heng; Johnson, Terri

    2014-01-01

    In statistical literature, the term 'signed-rank test' (or 'Wilcoxon signed-rank test') has been used to refer to two distinct tests: a test for symmetry of distribution and a test for the median of a symmetric distribution, sharing a common test statistic. To avoid potential ambiguity, we propose to refer to those two tests by different names, as 'test for symmetry based on signed-rank statistic' and 'test for median based on signed-rank statistic', respectively. The utility of such terminological differentiation should become evident through our discussion of how those tests connect and contrast with sign test and one-sample t-test.
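
    A sketch of why the distinction matters: the same statistic can be read as a symmetry test or, only under a symmetry assumption, as a median test, and with skewed differences the two readings diverge (simulated data):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(5)

# Skewed differences: the distribution is asymmetric with median below 0
# but mean 0. The signed-rank test will tend to reject here, and that
# rejection cannot be read as evidence about the median unless symmetry
# is assumed.
diff = rng.exponential(1.0, 200) - 1.0
stat, p = wilcoxon(diff)
print(f"signed-rank statistic = {stat:.0f}, p = {p:.4f}")
```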

  3. Decision Support Systems: Applications in Statistics and Hypothesis Testing.

    ERIC Educational Resources Information Center

    Olsen, Christopher R.; Bozeman, William C.

    1988-01-01

    Discussion of the selection of appropriate statistical procedures by educators highlights a study conducted to investigate the effectiveness of decision aids in facilitating the use of appropriate statistics. Experimental groups and a control group using a printed flow chart, a computer-based decision aid, and a standard text are described. (11…

  4. Efficiency Analysis: Enhancing the Statistical and Evaluative Power of the Regression-Discontinuity Design.

    ERIC Educational Resources Information Center

    Madhere, Serge

    An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…

  5. Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling

    ERIC Educational Resources Information Center

    Banjanovic, Erin S.; Osborne, Jason W.

    2016-01-01

    Confidence intervals for effect sizes (CIES) provide readers with an estimate of the strength of a reported statistic as well as the relative precision of the point estimate. These statistics offer more information and context than null hypothesis significance testing. Although confidence intervals have been recommended by scholars for many years,…
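
    A minimal percentile-bootstrap confidence interval for Cohen's d, one common effect size, on simulated group scores (the article's worked examples may differ):

```python
import numpy as np

rng = np.random.default_rng(11)

def cohens_d(a, b):
    """Cohen's d with the pooled standard deviation."""
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

# Hypothetical scores for two groups.
g1 = rng.normal(102, 15, 40)
g2 = rng.normal(95, 15, 40)

# Percentile bootstrap: resample each group with replacement.
boot = np.array([
    cohens_d(rng.choice(g1, len(g1)), rng.choice(g2, len(g2)))
    for _ in range(5000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(g1, g2):.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```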

  6. On the insignificance of Herschel's sunspot correlation

    NASA Astrophysics Data System (ADS)

    Love, Jeffrey J.

    2013-08-01

    We examine William Herschel's hypothesis that solar-cycle variation of the Sun's irradiance has a modulating effect on the Earth's climate and that this is, specifically, manifested as an anticorrelation between sunspot number and the market price of wheat. Since Herschel first proposed his hypothesis in 1801, it has been regarded with both interest and skepticism. Recently, reports have been published that either support Herschel's hypothesis or rely on its validity. As a test of Herschel's hypothesis, we seek to reject a null hypothesis of a statistically random correlation between historical sunspot numbers, wheat prices in London and the United States, and wheat farm yields in the United States. We employ binary-correlation, Pearson-correlation, and frequency-domain methods. We test our methods using a historical geomagnetic activity index, well known to be causally correlated with sunspot number. As expected, the measured correlation between sunspot number and geomagnetic activity would be an unlikely realization of random data; the correlation is "statistically significant." On the other hand, measured correlations between sunspot number and wheat price and wheat yield data would be very likely realizations of random data; these correlations are "insignificant." Therefore, Herschel's hypothesis must be regarded with skepticism. We compare and contrast our results with those of other researchers. We discuss procedures for evaluating hypotheses that are formulated from historical data.

  7. An omnibus test for the global null hypothesis.

    PubMed

    Futschik, Andreas; Taus, Thomas; Zehetmayer, Sonja

    2018-01-01

    Global hypothesis tests are a useful tool in the context of clinical trials, genetic studies, or meta-analyses, when researchers are not interested in testing individual hypotheses, but in testing whether none of the hypotheses is false. There are several possibilities for testing the global null hypothesis when the individual null hypotheses are independent. If it is assumed that many of the individual null hypotheses are false, combination tests have been recommended to maximize power. If, however, it is assumed that only one or a few null hypotheses are false, global tests based on individual test statistics are more powerful (e.g. Bonferroni or Simes test). However, usually there is no a priori knowledge on the number of false individual null hypotheses. We therefore propose an omnibus test based on cumulative sums of the transformed p-values. We show that this test yields an impressive overall performance. The proposed method is implemented in an R-package called omnibus.
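
    The two regimes the abstract contrasts are easy to sketch: Fisher's combination test (strong when many nulls are false) versus Simes' global test (strong when only a few are). The omnibus statistic itself is not reproduced here:

```python
import numpy as np
from scipy.stats import chi2

def fisher_combination(pvals):
    """Fisher's method: powerful when many individual nulls are false."""
    stat = -2 * np.sum(np.log(pvals))
    return chi2.sf(stat, df=2 * len(pvals))

def simes(pvals):
    """Simes' global test: powerful when only one or a few nulls are false."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    return np.min(m * p / np.arange(1, m + 1))

# One strong signal among ten hypotheses favors Simes; many weak
# signals would favor Fisher.
pvals = [0.0004, 0.35, 0.52, 0.61, 0.22, 0.47, 0.82, 0.15, 0.73, 0.44]
print(f"Fisher p = {fisher_combination(pvals):.4f}, Simes p = {simes(pvals):.4f}")
```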

  8. Significance tests for functional data with complex dependence structure.

    PubMed

    Staicu, Ana-Maria; Lahiri, Soumen N; Carroll, Raymond J

    2015-01-01

    We propose an L 2 -norm based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing, when data have such complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.

  9. Pathogens and politics: further evidence that parasite prevalence predicts authoritarianism.

    PubMed

    Murray, Damian R; Schaller, Mark; Suedfeld, Peter

    2013-01-01

    According to a "parasite stress" hypothesis, authoritarian governments are more likely to emerge in regions characterized by a high prevalence of disease-causing pathogens. Recent cross-national evidence is consistent with this hypothesis, but there are inferential limitations associated with that evidence. We report two studies that address some of these limitations, and provide further tests of the hypothesis. Study 1 revealed that parasite prevalence strongly predicted cross-national differences on measures assessing individuals' authoritarian personalities, and this effect statistically mediated the relationship between parasite prevalence and authoritarian governance. The mediation result is inconsistent with an alternative explanation for previous findings. To address further limitations associated with cross-national comparisons, Study 2 tested the parasite stress hypothesis on a sample of traditional small-scale societies (the Standard Cross-Cultural Sample). Results revealed that parasite prevalence predicted measures of authoritarian governance, and did so even when statistically controlling for other threats to human welfare. (One additional threat, famine, also uniquely predicted authoritarianism.) Together, these results further substantiate the parasite stress hypothesis of authoritarianism, and suggest that societal differences in authoritarian governance result, in part, from cultural differences in individuals' authoritarian personalities.

  11. Experimental design, power and sample size for animal reproduction experiments.

    PubMed

    Chapman, Phillip L; Seidel, George E

    2008-01-01

    The present paper concerns statistical issues in the design of animal reproduction experiments, with emphasis on the problems of sample size determination and power calculations. We include examples and non-technical discussions aimed at helping researchers avoid serious errors that may invalidate or seriously impair the validity of conclusions from experiments. Screen shots from interactive power calculation programs and basic SAS power calculation programs are presented to aid in understanding statistical power and computing power in some common experimental situations. Practical issues that are common to most statistical design problems are briefly discussed. These include one-sided hypothesis tests, power level criteria, equality of within-group variances, transformations of response variables to achieve variance equality, optimal specification of treatment group sizes, 'post hoc' power analysis and arguments for the increased use of confidence intervals in place of hypothesis tests.

  12. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  13. Is Conscious Stimulus Identification Dependent on Knowledge of the Perceptual Modality? Testing the “Source Misidentification Hypothesis”

    PubMed Central

    Overgaard, Morten; Lindeløv, Jonas; Svejstrup, Stinna; Døssing, Marianne; Hvid, Tanja; Kauffmann, Oliver; Mouridsen, Kim

    2013-01-01

    This paper reports an experiment intended to test a particular hypothesis derived from blindsight research, which we name the “source misidentification hypothesis.” According to this hypothesis, a subject may be correct about a stimulus without being correct about how she had access to this knowledge (whether the stimulus was visual, auditory, or something else). We test this hypothesis in healthy subjects, asking them to report whether a masked stimulus was presented auditorily or visually, what the stimulus was, and how clearly they experienced the stimulus using the Perceptual Awareness Scale (PAS). We suggest that knowledge about perceptual modality may be a necessary precondition in order to issue correct reports of which stimulus was presented. Furthermore, we find that PAS ratings correlate with correctness, and that subjects are at chance level when reporting no conscious experience of the stimulus. To demonstrate that particular levels of reporting accuracy are obtained, we employ a statistical strategy, which operationally tests the hypothesis of non-equality, such that the usual rejection of the null-hypothesis admits the conclusion of equivalence. PMID:23508677
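
    The closing sentence describes establishing equivalence by rejecting non-equality; one standard operationalization of that idea is the two-one-sided-tests (TOST) procedure, sketched here with hypothetical accuracy scores and an assumed equivalence margin (neither is from the paper):

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(4)

# Hypothetical accuracy scores for two report conditions; the margin of
# +/- 0.05 is an assumption for illustration, not from the paper.
cond_a = rng.normal(0.75, 0.08, 40)
cond_b = rng.normal(0.75, 0.08, 40)

p, lower_test, upper_test = ttost_ind(cond_a, cond_b, low=-0.05, upp=0.05)
print(f"TOST p = {p:.4f}  (small p supports equivalence within the margin)")
```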

  14. Invited Commentary: Can Issues With Reproducibility in Science Be Blamed on Hypothesis Testing?

    PubMed Central

    Weinberg, Clarice R.

    2017-01-01

    In the accompanying article (Am J Epidemiol. 2017;186(6):646–647), Dr. Timothy Lash makes a forceful case that the problems with reproducibility in science stem from our “culture” of null hypothesis significance testing. He notes that when attention is selectively given to statistically significant findings, the estimated effects will be systematically biased away from the null. Here I revisit the recent history of genetic epidemiology and argue for retaining statistical testing as an important part of the tool kit. Particularly when many factors are considered in an agnostic way, in what Lash calls “innovative” research, investigators need a selection strategy to identify which findings are most likely to be genuine, and hence worthy of further study. PMID:28938713

  15. How-To-Do-It: Snails, Pill Bugs, Mealworms, and Chi-Square? Using Invertebrate Behavior to Illustrate Hypothesis Testing with Chi-Square.

    ERIC Educational Resources Information Center

    Biermann, Carol

    1988-01-01

    Described is a study designed to introduce students to the behavior of common invertebrate animals, and to use of the chi-square statistical technique. Discusses activities with snails, pill bugs, and mealworms. Provides an abbreviated chi-square table and instructions for performing the experiments and statistical tests. (CW)
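
    A classroom-style chi-square goodness-of-fit test in the spirit of the activity, with made-up counts of 40 pill bugs choosing among four shelter quadrants:

```python
from scipy.stats import chisquare

# 40 pill bugs choosing among four shelter quadrants; H0 is no preference
# (hypothetical counts in the spirit of the classroom exercise).
observed = [16, 9, 8, 7]
expected = [10, 10, 10, 10]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, df = 3, p = {p:.4f}")
```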

  16. Earthquake likelihood model testing

    USGS Publications Warehouse

    Schorlemmer, D.; Gerstenberger, M.C.; Wiemer, S.; Jackson, D.D.; Rhoades, D.A.

    2007-01-01

    The Regional Earthquake Likelihood Models (RELM) project aims to produce and evaluate alternate models of earthquake potential (probability per unit volume, magnitude, and time) for California. Based on differing assumptions, these models are produced to test the validity of their assumptions and to explore which models should be incorporated in seismic hazard and risk evaluation. Tests based on physical and geological criteria are useful, but we focus on statistical methods using future earthquake catalog data only. We envision two evaluations: a test of consistency with observed data and a comparison of all pairs of models for relative consistency. Both tests are based on the likelihood method, and both are fully prospective (i.e., the models are not adjusted to fit the test data). To be tested, each model must assign a probability to any possible event within a specified region of space, time, and magnitude. For our tests the models must use a common format: earthquake rates in specified “bins” with location, magnitude, time, and focal mechanism limits.

    Seismology cannot yet deterministically predict individual earthquakes; however, it should seek the best possible models for forecasting earthquake occurrence. This paper describes the statistical rules of an experiment to examine and test earthquake forecasts. The primary purposes of the tests described below are to evaluate physical models for earthquakes, assure that source models used in seismic hazard and risk studies are consistent with earthquake data, and provide quantitative measures by which models can be assigned weights in a consensus model or be judged as suitable for particular regions.

    In this paper we develop a statistical method for testing earthquake likelihood models. A companion paper (Schorlemmer and Gerstenberger 2007, this issue) discusses the actual implementation of these tests in the framework of the RELM initiative.

    Statistical testing of hypotheses is a common task and a wide range of possible testing procedures exist. Jolliffe and Stephenson (2003) present different forecast verifications from atmospheric science, among them likelihood testing of probability forecasts and testing the occurrence of binary events. Testing binary events requires that for each forecasted event the spatial, temporal, and magnitude limits be given. Although major earthquakes can be considered binary events, the models within the RELM project express their forecasts on a spatial grid and in 0.1 magnitude units; thus the results are a distribution of rates over space and magnitude. These forecasts can be tested with likelihood tests.

    In general, likelihood tests assume a valid null hypothesis against which a given hypothesis is tested. The outcome is either a rejection of the null hypothesis in favor of the test hypothesis or a nonrejection, meaning the test hypothesis cannot outperform the null hypothesis at a given significance level. Within RELM, there is no accepted null hypothesis, and thus the likelihood test needs to be expanded to allow comparable testing of equipollent hypotheses.

    To test models against one another, we require that forecasts are expressed in a standard format: the average rate of earthquake occurrence within pre-specified limits of hypocentral latitude, longitude, depth, magnitude, time period, and focal mechanisms. Focal mechanisms should either be described as the inclination of the P-axis, declination of the P-axis, and inclination of the T-axis, or as strike, dip, and rake angles. Schorlemmer and Gerstenberger (2007, this issue) designed classes of these parameters such that similar models will be tested against each other. These classes make the forecasts comparable between models. Additionally, we are limited to testing only what is precisely defined and consistently reported in earthquake catalogs. Therefore it is currently not possible to test such information as fault rupture length or area, asperity location, etc. Also, to account for data quality issues, we allow for location and magnitude uncertainties as well as the probability that an event is dependent on another event.

    As we mentioned above, only models with comparable forecasts can be tested against each other. Our current tests are designed to examine grid-based models. This requires that any fault-based model be adapted to a grid before testing is possible. While this is a limitation of the testing, it is an inherent difficulty in any such comparative testing. Please refer to appendix B for a statistical evaluation of the application of the Poisson hypothesis to fault-based models.

    The testing suite we present consists of three different tests: L-Test, N-Test, and R-Test. These tests are defined similarly to Kagan and Jackson (1995). The first two tests examine the consistency of the hypotheses with the observations, while the last test compares the spatial performances of the models.
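
    As a sketch of the flavor of these consistency checks (an assumed, simplified N-test-style count comparison, not the RELM implementation), the observed number of events can be compared with a forecast's total expected rate under a Poisson assumption:

```python
from scipy.stats import poisson

# Simplified count-consistency check: compare the observed number of
# earthquakes in the test period with the forecast's total expected rate,
# treating counts as Poisson. The numbers are hypothetical.
forecast_rate = 12.4    # total expected events summed over all bins
observed_count = 19     # events actually recorded

# Two one-sided Poisson tail probabilities; a very small value in either
# tail suggests the forecast under- or over-predicts the total rate.
p_too_many = poisson.sf(observed_count - 1, forecast_rate)  # P(N >= 19)
p_too_few = poisson.cdf(observed_count, forecast_rate)      # P(N <= 19)
print(f"P(N >= 19) = {p_too_many:.3f}, P(N <= 19) = {p_too_few:.3f}")
```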

  17. Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology

    NASA Astrophysics Data System (ADS)

    Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.

    2015-12-01

    Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.

  18. Bug Distribution and Statistical Pattern Classification.

    ERIC Educational Resources Information Center

    Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.

    1987-01-01

    The rule space model permits measurement of cognitive skill acquisition and error diagnosis. Further discussion introduces Bayesian hypothesis testing and bug distribution. An illustration involves an artificial intelligence approach to testing fractions and arithmetic. (Author/GDC)

  19. Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision

    PubMed Central

    Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson

    2014-01-01

    The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
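
    One simple member of the jitter family described here is interval jitter, which resamples each spike uniformly within its window and thereby preserves the coarse (window-level) spike counts; this sketch is not the paper's PSTH-preserving algorithm:

```python
import numpy as np

rng = np.random.default_rng(9)

def interval_jitter(spike_times, width=0.025):
    """Resample each spike uniformly within its own window of the given
    width, preserving the spike count in every window (coarse rate)."""
    bins = np.floor(spike_times / width)
    return np.sort(bins * width + rng.uniform(0, width, size=len(spike_times)))

spikes = np.sort(rng.uniform(0, 1.0, 50))  # hypothetical spike train (s)
surrogate = interval_jitter(spikes)

# Window-level counts are identical; only fine timing is destroyed.
edges = np.arange(0, 1.0 + 0.025, 0.025)
same = (np.histogram(spikes, edges)[0] == np.histogram(surrogate, edges)[0])
print(same.all())  # True
```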

  20. Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

    PubMed

    Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

    2015-01-01

    The problem of establishing noninferiority between a new treatment and a standard (control) treatment with ordinal categorical data is discussed. A measure of treatment effect is used and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.

  1. How to Distinguish Normal from Disordered Children with Poor Language or Motor Skills

    ERIC Educational Resources Information Center

    Dyck, Murray; Piek, Jan

    2010-01-01

    Background & Aims: We tested the Diagnostic and Statistical Manual of Mental Disorders (DSM) hypothesis that so-called specific developmental disorders are marked by a pattern of specific discrepant achievement, and an alternative hypothesis that children with these disorders show a pattern of relatively pervasive low achievement. Methods…

  2. Beyond P Values and Hypothesis Testing: Using the Minimum Bayes Factor to Teach Statistical Inference in Undergraduate Introductory Statistics Courses

    ERIC Educational Resources Information Center

    Page, Robert; Satake, Eiki

    2017-01-01

    While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
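
    The minimum Bayes factor the authors advocate can be computed directly from a test statistic or p-value; a sketch using two standard formulations (Goodman's z-based bound and the Sellke-Bayarri-Berger p-based bound):

```python
import numpy as np

def min_bayes_factor_z(z):
    """Goodman's minimum Bayes factor for a z-statistic: exp(-z^2 / 2)."""
    return np.exp(-z**2 / 2)

def min_bayes_factor_p(p):
    """Sellke-Bayarri-Berger bound: -e * p * ln(p), valid for p < 1/e."""
    return -np.e * p * np.log(p)

# A "significant" p near 0.05 gives only modest evidence against the null.
print(f"MBF from z = 1.96: {min_bayes_factor_z(1.96):.3f}")        # ~0.15
print(f"MBF bound from p = 0.05: {min_bayes_factor_p(0.05):.3f}")  # ~0.41
```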

  3. An Inferentialist Perspective on the Coordination of Actions and Reasons Involved in Making a Statistical Inference

    ERIC Educational Resources Information Center

    Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie

    2017-01-01

    To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical…

  4. Elaborating Selected Statistical Concepts with Common Experience.

    ERIC Educational Resources Information Center

    Weaver, Kenneth A.

    1992-01-01

    Presents ways of elaborating statistical concepts so as to make course material more meaningful for students. Describes examples using exclamations, circus and cartoon characters, and falling leaves to illustrate variability, null hypothesis testing, and confidence interval. Concludes that the exercises increase student comprehension of the text…

  5. Research Design and Statistics for Applied Linguistics.

    ERIC Educational Resources Information Center

    Hatch, Evelyn; Farhady, Hossein

    An introduction to the conventions of research design and statistical analysis is presented for graduate students of applied linguistics. The chapters cover such concepts as the definition of research, variables, research designs, research report formats, sorting and displaying data, probability and hypothesis testing, comparing means,…

  6. Tests of Independence for Ordinal Data Using Bootstrap.

    ERIC Educational Resources Information Center

    Chan, Wai; Yung, Yiu-Fai; Bentler, Peter M.; Tang, Man-Lai

    1998-01-01

    Two bootstrap tests are proposed to test the independence hypothesis in a two-way cross table. Monte Carlo studies are used to compare the traditional asymptotic test with these bootstrap methods, and the bootstrap methods are found superior in two ways: control of Type I error and statistical power. (SLD)

  7. Three common misuses of P values

    PubMed Central

    Kim, Jeehyoung; Bang, Heejung

    2016-01-01

    “Significance” has a specific meaning in science, especially in statistics. The p-value as a measure of statistical significance (evidence against a null hypothesis) has long been used in statistical inference and has served as a key player in science and research. Despite its clear mathematical definition and original purpose, and being just one of the many statistical measures/criteria, its role has been over-emphasized along with hypothesis testing. Observing and reflecting on this practice, some journals have attempted to ban reporting of p-values, and the American Statistical Association (for the first time in its 177-year history) released a statement on p-values in 2016. In this article, we intend to review the correct definition of the p-value as well as its common misuses, in the hope that our article is useful to clinicians and researchers. PMID:27695640

  8. Performing Inferential Statistics Prior to Data Collection

    ERIC Educational Resources Information Center

    Trafimow, David; MacDonald, Justin A.

    2017-01-01

    Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…

  9. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.

  11. New Approaches to Robust Confidence Intervals for Location: A Simulation Study.

    DTIC Science & Technology

    1984-06-01

    …obtain a denominator for the test statistic. Those statistics based on location estimates derived from Hampel's redescending influence function or v… defined an influence function for a test in terms of the behavior of its P-values when the data are sampled from a model distribution modified by point… proposal could be used for interval estimation as well as hypothesis testing, the extension is immediate. Once an influence function has been defined…

  12. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  14. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    ERIC Educational Resources Information Center

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
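
    For context, a hedged sketch of how R² alone determines the overall significance test in multiple regression. The article works with the beta sampling distribution of R² directly; the code below illustrates the standard equivalence between that route and the classical F statistic, with the function name being illustrative.

```python
from scipy.stats import beta, f

def r2_overall_test(r2, n, k):
    """Test H0: all k slope coefficients are zero, given only R^2, n, and k.

    Two equivalent routes: the classical F statistic, and (under normal errors)
    the fact that R^2 ~ Beta(k/2, (n-k-1)/2) when H0 is true."""
    F = (r2 / k) / ((1 - r2) / (n - k - 1))
    p_from_f = f.sf(F, k, n - k - 1)
    p_from_beta = beta.sf(r2, k / 2, (n - k - 1) / 2)  # same value as p_from_f
    return F, p_from_f, p_from_beta

# e.g. r2_overall_test(0.35, n=50, k=3)
```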

  15. Optimizing α for better statistical decisions: a case study involving the pace-of-life syndrome hypothesis: optimal α levels set to minimize Type I and II errors frequently result in different conclusions from those using α = 0.05.

    PubMed

    Mudge, Joseph F; Penny, Faith M; Houlahan, Jeff E

    2012-12-01

    Setting optimal significance levels that minimize Type I and Type II errors allows for more transparent and well-considered statistical decision making compared to the traditional α = 0.05 significance level. We use the optimal α approach to re-assess conclusions reached by three recently published tests of the pace-of-life syndrome hypothesis, which attempts to unify occurrences of different physiological, behavioral, and life history characteristics under one theory, over different scales of biological organization. While some of the conclusions reached using optimal α were consistent to those previously reported using the traditional α = 0.05 threshold, opposing conclusions were also frequently reached. The optimal α approach reduced probabilities of Type I and Type II errors, and ensured statistical significance was associated with biological relevance. Biologists should seriously consider their choice of α when conducting null hypothesis significance tests, as there are serious disadvantages with consistent reliance on the traditional but arbitrary α = 0.05 significance level. Copyright © 2012 WILEY Periodicals, Inc.
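
    A simplified sketch of the optimal-α idea, assuming a two-sample comparison approximated by a z test; the published method allows for more general error-cost weightings and test types, and the function name and parameters here are illustrative. The α returned is the one minimizing a weighted sum of the Type I error rate and the Type II error rate for a stated effect size and sample size.

```python
import numpy as np
from scipy.stats import norm

def optimal_alpha(effect_size, n_per_group, w=1.0):
    """Alpha minimizing w*alpha + beta for a two-sample z-style test.

    effect_size is Cohen's d; w weights Type I relative to Type II errors."""
    alphas = np.linspace(1e-4, 0.5, 2000)
    se = np.sqrt(2.0 / n_per_group)          # SE of the mean difference, in d units
    z_crit = norm.isf(alphas / 2)            # two-sided critical values
    power = norm.sf(z_crit - effect_size / se) + norm.cdf(-z_crit - effect_size / se)
    total_error = w * alphas + (1 - power)
    best = np.argmin(total_error)
    return alphas[best], 1 - power[best]     # (optimal alpha, beta at that alpha)

# e.g. optimal_alpha(effect_size=0.5, n_per_group=30)
```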

  16. Modeling Cross-Situational Word-Referent Learning: Prior Questions

    ERIC Educational Resources Information Center

    Yu, Chen; Smith, Linda B.

    2012-01-01

    Both adults and young children possess powerful statistical computation capabilities--they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of…

  17. Three Strategies for the Critical Use of Statistical Methods in Psychological Research

    ERIC Educational Resources Information Center

    Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando

    2017-01-01

    We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…

  18. Application of Transformations in Parametric Inference

    ERIC Educational Resources Information Center

    Brownstein, Naomi; Pensky, Marianna

    2008-01-01

    The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…

  19. The MAX Statistic is Less Powerful for Genome Wide Association Studies Under Most Alternative Hypotheses.

    PubMed

    Shifflett, Benjamin; Huang, Rong; Edland, Steven D

    2017-01-01

    Genotypic association studies are prone to inflated type I error rates if multiple hypothesis testing is performed, e.g., sequentially testing for recessive, multiplicative, and dominant risk. Alternatives to multiple hypothesis testing include the model-independent genotypic χ² test, the efficiency robust MAX statistic, which corrects for multiple comparisons but with some loss of power, or a single Armitage test for multiplicative trend, which has optimal power when the multiplicative model holds but with some loss of power when dominant or recessive models underlie the genetic association. We used Monte Carlo simulations to describe the relative performance of these three approaches under a range of scenarios. All three approaches maintained their nominal type I error rates. The genotypic χ² and MAX statistics were more powerful when testing a strictly recessive genetic effect or when testing a dominant effect when the allele frequency was high. The Armitage test for multiplicative trend was most powerful for the broad range of scenarios where heterozygote risk is intermediate between recessive and dominant risk. Moreover, all tests had limited power to detect recessive genetic risk unless the sample size was large, and conversely all tests were relatively well powered to detect dominant risk. Taken together, these results suggest the general utility of the multiplicative trend test when the underlying genetic model is unknown.
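
    For reference, a minimal implementation of the Cochran-Armitage test for trend mentioned above, written as the standard score test for a linear trend in proportions across genotype categories; the function name and default scores are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import chi2

def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; cases/controls are counts per genotype."""
    r = np.asarray(cases, dtype=float)
    s = np.asarray(controls, dtype=float)
    t = np.asarray(scores, dtype=float)
    n = r + s
    N, R = n.sum(), r.sum()
    p = R / N
    T = np.sum(t * (r - n * p))                       # score for linear trend
    var_T = p * (1 - p) * (np.sum(t**2 * n) - np.sum(t * n)**2 / N)
    z2 = T**2 / var_T
    return z2, chi2.sf(z2, df=1)                       # chi-square with 1 df

# e.g. armitage_trend_test(cases=[30, 70, 20], controls=[50, 60, 10])
```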

  20. Measuring and statistically testing the size of the effect of a chemical compound on a continuous in-vitro pharmacological response through a new statistical model of response detection limit

    PubMed Central

    Diaz, Francisco J.; McDonald, Peter R.; Pinter, Abraham; Chaguturu, Rathnam

    2018-01-01

    Biomolecular screening research frequently searches for the chemical compounds that are most likely to make a biochemical or cell-based assay system produce a strong continuous response. Several doses are tested with each compound and it is assumed that, if there is a dose-response relationship, the relationship follows a monotonic curve, usually a version of the median-effect equation. However, the null hypothesis of no relationship cannot be statistically tested using this equation. We used a linearized version of this equation to define a measure of pharmacological effect size, and use this measure to rank the investigated compounds in order of their overall capability to produce strong responses. The null hypothesis that none of the examined doses of a particular compound produced a strong response can be tested with this approach. The proposed approach is based on a new statistical model of the important concept of response detection limit, a concept that is usually neglected in the analysis of dose-response data with continuous responses. The methodology is illustrated with data from a study searching for compounds that neutralize the infection by a human immunodeficiency virus of brain glioblastoma cells. PMID:24905187
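
    The median-effect equation referred to above is commonly written (in Chou's form) as below, with D_m the dose producing the median effect and m the slope; taking logarithms gives the kind of linearized form the abstract mentions. The paper's specific parameterization and effect-size definition may differ, so this is shown only as the standard version of the equation.

```latex
\frac{f_a}{f_u} = \left(\frac{D}{D_m}\right)^{m}, \qquad f_u = 1 - f_a,
\qquad\Longrightarrow\qquad
\log\frac{f_a}{1 - f_a} \;=\; m\,\log D \;-\; m\,\log D_m .
```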

  1. Bayes Factor Approaches for Testing Interval Null Hypotheses

    ERIC Educational Resources Information Center

    Morey, Richard D.; Rouder, Jeffrey N.

    2011-01-01

    Psychological theories are statements of constraint. The role of hypothesis testing in psychology is to test whether specific theoretical constraints hold in data. Bayesian statistics is well suited to the task of finding supporting evidence for constraint, because it allows for comparing evidence for 2 hypotheses against one another. One issue…

  2. In silico model-based inference: a contemporary approach for hypothesis testing in network biology

    PubMed Central

    Klinke, David J.

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179

  3. In silico model-based inference: a contemporary approach for hypothesis testing in network biology.

    PubMed

    Klinke, David J

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.

  4. Statistical modeling, detection, and segmentation of stains in digitized fabric images

    NASA Astrophysics Data System (ADS)

    Gururajan, Arunkumar; Sari-Sarraf, Hamed; Hequet, Eric F.

    2007-02-01

    This paper will describe a novel and automated system based on a computer vision approach for objective evaluation of stain release on cotton fabrics. Digitized color images of the stained fabrics are obtained, and the pixel values in the color and intensity planes of these images are probabilistically modeled as a Gaussian Mixture Model (GMM). Stain detection is posed as a decision theoretic problem, where the null hypothesis corresponds to absence of a stain. The null hypothesis and the alternate hypothesis mathematically translate into a first order GMM and a second order GMM, respectively. The parameters of the GMM are estimated using a modified Expectation-Maximization (EM) algorithm. Minimum Description Length (MDL) is then used as the test statistic to decide the verity of the null hypothesis. The stain is then segmented by a decision rule based on the probability map generated by the EM algorithm. The proposed approach was tested on a dataset of 48 fabric images soiled with stains of ketchup, corn oil, mustard, ragu sauce, revlon makeup and grape juice. The decision theoretic part of the algorithm produced a correct detection rate (true positive) of 93% and a false alarm rate of 5% on this set of images.
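
    A minimal sketch of the one-versus-two-component GMM comparison described above, using scikit-learn and BIC as a stand-in for the paper's MDL criterion; the function name, inputs, and the BIC substitution are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def stain_evidence(pixels, seed=0):
    """Fit 1- and 2-component GMMs to pixel features and compare model fit.

    pixels: (n_pixels, n_channels) array of color/intensity values.
    A lower BIC for the 2-component model is taken as evidence of a stain."""
    gmm1 = GaussianMixture(n_components=1, random_state=seed).fit(pixels)
    gmm2 = GaussianMixture(n_components=2, random_state=seed).fit(pixels)
    stain_detected = gmm2.bic(pixels) < gmm1.bic(pixels)
    # Posterior probability of each component per pixel; usable as the
    # probability map for segmenting the stain when detection succeeds.
    posteriors = gmm2.predict_proba(pixels)
    return stain_detected, posteriors
```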

  5. Invited Commentary: Can Issues With Reproducibility in Science Be Blamed on Hypothesis Testing?

    PubMed

    Weinberg, Clarice R

    2017-09-15

    In the accompanying article (Am J Epidemiol. 2017;186(6):646-647), Dr. Timothy Lash makes a forceful case that the problems with reproducibility in science stem from our "culture" of null hypothesis significance testing. He notes that when attention is selectively given to statistically significant findings, the estimated effects will be systematically biased away from the null. Here I revisit the recent history of genetic epidemiology and argue for retaining statistical testing as an important part of the tool kit. Particularly when many factors are considered in an agnostic way, in what Lash calls "innovative" research, investigators need a selection strategy to identify which findings are most likely to be genuine, and hence worthy of further study. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  6. A Test by Any Other Name: P Values, Bayes Factors, and Statistical Inference.

    PubMed

    Stern, Hal S

    2016-01-01

    Procedures used for statistical inference are receiving increased scrutiny as the scientific community studies the factors associated with ensuring reproducible research. This note addresses recent negative attention directed at p values, the relationship of confidence intervals and tests, and the role of Bayesian inference and Bayes factors, with an eye toward better understanding these different strategies for statistical inference. We argue that researchers and data analysts too often resort to binary decisions (e.g., whether to reject or accept the null hypothesis) in settings where this may not be required.

  7. Implications for high speed research: The relationship between sonic boom signature distortion and atmospheric turbulence

    NASA Technical Reports Server (NTRS)

    Sparrow, Victor W.; Gionfriddo, Thomas A.

    1994-01-01

    In this study there were two primary tasks. The first was to develop an algorithm for quantifying the distortion in a sonic boom. Such an algorithm should be somewhat automatic, with minimal human intervention. Once the algorithm was developed, it was used to test the hypothesis that sonic boom distortion is caused by atmospheric turbulence. This hypothesis testing was the second task. Using readily available sonic boom data, we statistically tested whether there was a correlation between the sonic boom distortion and the distance a boom traveled through atmospheric turbulence.

  8. Conservative Tests under Satisficing Models of Publication Bias.

    PubMed

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

    Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%-rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs.

  9. Conservative Tests under Satisficing Models of Publication Bias

    PubMed Central

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

    Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834

  10. Cross-validation and hypothesis testing in neuroimaging: An irenic comment on the exchange between Friston and Lindquist et al.

    PubMed

    Reiss, Philip T

    2015-08-01

    The "ten ironic rules for statistical reviewers" presented by Friston (2012) prompted a rebuttal by Lindquist et al. (2013), which was followed by a rejoinder by Friston (2013). A key issue left unresolved in this discussion is the use of cross-validation to test the significance of predictive analyses. This note discusses the role that cross-validation-based and related hypothesis tests have come to play in modern data analyses, in neuroimaging and other fields. It is shown that such tests need not be suboptimal and can fill otherwise-unmet inferential needs. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Employer Learning and the Signaling Value of Education. National Longitudinal Surveys Discussion Paper.

    ERIC Educational Resources Information Center

    Altonji, Joseph G.; Pierret, Charles R.

    A statistical analysis was performed to test the hypothesis that, if profit-maximizing firms have limited information about the general productivity of new workers, they may choose to use easily observable characteristics such as years of education to discriminate statistically among workers. Information about employer learning was obtained by…

  12. Using the Descriptive Bootstrap to Evaluate Result Replicability (Because Statistical Significance Doesn't)

    ERIC Educational Resources Information Center

    Spinella, Sarah

    2011-01-01

    As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
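
    A minimal sketch of the descriptive bootstrap idea: resample the observed data with replacement and inspect how stable a statistic is across resamples. The function name and the percentile-interval summary are illustrative; the cited paper's heuristic example may summarize the resamples differently.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=10000, ci=95, seed=None):
    """Percentile-bootstrap interval for a sample statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boots = np.array([stat(rng.choice(data, size=data.size, replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return lo, hi

# e.g. bootstrap_ci([3.1, 2.7, 3.9, 2.2, 4.4, 3.0], stat=np.median)
```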

  13. Learning of Grammar-Like Visual Sequences by Adults with and without Language-Learning Disabilities

    ERIC Educational Resources Information Center

    Aguilar, Jessica M.; Plante, Elena

    2014-01-01

    Purpose: Two studies examined learning of grammar-like visual sequences to determine whether a general deficit in statistical learning characterizes this population. Furthermore, we tested the hypothesis that difficulty in sustaining attention during the learning task might account for differences in statistical learning. Method: In Study 1,…

  14. The Gumbel hypothesis test for left censored observations using regional earthquake records as an example

    NASA Astrophysics Data System (ADS)

    Thompson, E. M.; Hewlett, J. B.; Baise, L. G.; Vogel, R. M.

    2011-01-01

    Annual maximum (AM) time series are incomplete (i.e., censored) when no events are included above the assumed censoring threshold (i.e., magnitude of completeness). We introduce a distributional hypothesis test for left-censored Gumbel observations based on the probability plot correlation coefficient (PPCC). Critical values of the PPCC hypothesis test statistic are computed from Monte-Carlo simulations and are a function of sample size, censoring level, and significance level. When applied to a global catalog of earthquake observations, the left-censored Gumbel PPCC tests are unable to reject the Gumbel hypothesis for 45 of 46 seismic regions. We apply four different field significance tests for combining individual tests into a collective hypothesis test. None of the field significance tests are able to reject the global hypothesis that AM earthquake magnitudes arise from a Gumbel distribution. Because the field significance levels are not conclusive, we also compute the likelihood that these field significance tests are unable to reject the Gumbel model when the samples arise from a more complex distributional alternative. A power study documents that the censored Gumbel PPCC test is unable to reject some important and viable Generalized Extreme Value (GEV) alternatives. Thus, we cannot rule out the possibility that the global AM earthquake time series could arise from a GEV distribution with a finite upper bound, also known as a reverse Weibull distribution. Our power study also indicates that the binomial and uniform field significance tests are substantially more powerful than the more commonly used Bonferroni and false discovery rate multiple comparison procedures.
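
    A sketch of the uncensored version of the Gumbel PPCC test, assuming Gringorten plotting positions and Monte Carlo critical values; the censoring adjustments that are the paper's main contribution are omitted, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import gumbel_r, pearsonr

def gumbel_ppcc(sample):
    """Probability plot correlation coefficient (PPCC) for a Gumbel fit."""
    x = np.sort(np.asarray(sample))
    n = x.size
    pp = (np.arange(1, n + 1) - 0.44) / (n + 0.12)  # Gringorten plotting positions
    q = gumbel_r.ppf(pp)                            # standard Gumbel quantiles
    return pearsonr(x, q)[0]

def ppcc_critical_value(n, alpha=0.05, n_sim=10000, seed=None):
    """Monte Carlo critical value: reject the Gumbel hypothesis if the
    observed PPCC falls below this value."""
    rng = np.random.default_rng(seed)
    sims = [gumbel_ppcc(gumbel_r.rvs(size=n, random_state=rng))
            for _ in range(n_sim)]
    return np.quantile(sims, alpha)
```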

  15. A Ratio Test of Interrater Agreement with High Specificity

    ERIC Educational Resources Information Center

    Cousineau, Denis; Laurencelle, Louis

    2015-01-01

    Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…

  16. GeneTools--application for functional annotation and statistical hypothesis testing.

    PubMed

    Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid

    2006-10-24

    Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set (or cluster) testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users, and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with rapid extraction of highly relevant gene annotation data for, e.g., thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no.

  17. Recurrence network measures for hypothesis testing using surrogate data: Application to black hole light curves

    NASA Astrophysics Data System (ADS)

    Jacob, Rinku; Harikrishnan, K. P.; Misra, R.; Ambika, G.

    2018-01-01

    Recurrence networks and the associated statistical measures have become important tools in the analysis of time series data. In this work, we test how effective the recurrence network measures are in analyzing real world data involving two main types of noise, white noise and colored noise. We use two prominent network measures as discriminating statistics for hypothesis testing using surrogate data, under the specific null hypothesis that the data are derived from a linear stochastic process. We show that the characteristic path length is especially efficient as a discriminating measure, with the conclusions reasonably accurate even with a limited number of data points in the time series. We also highlight an additional advantage of the network approach in identifying the dimensionality of the system underlying the time series through a convergence measure derived from the probability distribution of the local clustering coefficients. As examples of real world data, we use the light curves from a prominent black hole system and show that a combined analysis using three primary network measures can provide vital information regarding the nature of temporal variability of light curves from different spectroscopic classes.

  18. Results of the Verification of the Statistical Distribution Model of Microseismicity Emission Characteristics

    NASA Astrophysics Data System (ADS)

    Cianciara, Aleksander

    2016-09-01

    The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely the energy of phenomena and the inter-event time. It is understood that the emission under consideration is induced by natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study was conducted by statistically verifying the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. Because the model describing the cumulative distribution function is given in analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Specifying the correct model of the statistical distribution matters because probabilistic interpretation methods do not use the measurement data directly but rather their statistical distributions, e.g., in the method based on hazard analysis or in the one that uses maximum value statistics.
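
    A minimal sketch of a Weibull fit checked with a Kolmogorov-Smirnov test, using SciPy; the function name and the choice to fix the location at zero are assumptions for illustration, and the paper's filtering and verification details are not reproduced here.

```python
from scipy.stats import weibull_min, kstest

def weibull_ks_check(values):
    """Fit a two-parameter Weibull and run a KS goodness-of-fit test.

    Note: because shape and scale are estimated from the same data, the nominal
    KS p-value is approximate; a parametric bootstrap would be more faithful."""
    c, loc, scale = weibull_min.fit(values, floc=0)  # fix location at 0
    return kstest(values, weibull_min(c, loc=0, scale=scale).cdf)
```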

  19. Better Statistics for Better Decisions: Rejecting Null Hypotheses Statistical Tests in Favor of Replication Statistics

    PubMed Central

    Sanabria, Federico; Killeen, Peter R.

    2008-01-01

    Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose prep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766

  20. Mourning dove hunting regulation strategy based on annual harvest statistics and banding data

    USGS Publications Warehouse

    Otis, D.L.

    2006-01-01

    Although managers should strive to base game bird harvest management strategies on mechanistic population models, monitoring programs required to build and continuously update these models may not be in place. Alternatively, if estimates of total harvest and harvest rates are available, then population estimates derived from these harvest data can serve as the basis for making hunting regulation decisions based on population growth rates derived from these estimates. I present a statistically rigorous approach for regulation decision-making using a hypothesis-testing framework and an assumed framework of 3 hunting regulation alternatives. I illustrate and evaluate the technique with historical data on the mid-continent mallard (Anas platyrhynchos) population. I evaluate the statistical properties of the hypothesis-testing framework using the best available data on mourning doves (Zenaida macroura). I use these results to discuss practical implementation of the technique as an interim harvest strategy for mourning doves until reliable mechanistic population models and associated monitoring programs are developed.

  1. Nature's style: Naturally trendy

    USGS Publications Warehouse

    Cohn, T.A.; Lins, H.F.

    2005-01-01

    Hydroclimatological time series often exhibit trends. While trend magnitude can be determined with little ambiguity, the corresponding statistical significance, sometimes cited to bolster scientific and political argument, is less certain because significance depends critically on the null hypothesis which in turn reflects subjective notions about what one expects to see. We consider statistical trend tests of hydroclimatological data in the presence of long-term persistence (LTP). Monte Carlo experiments employing FARIMA models indicate that trend tests which fail to consider LTP greatly overstate the statistical significance of observed trends when LTP is present. A new test is presented that avoids this problem. From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.

  2. Nature's style: Naturally trendy

    NASA Astrophysics Data System (ADS)

    Cohn, Timothy A.; Lins, Harry F.

    2005-12-01

    Hydroclimatological time series often exhibit trends. While trend magnitude can be determined with little ambiguity, the corresponding statistical significance, sometimes cited to bolster scientific and political argument, is less certain because significance depends critically on the null hypothesis which in turn reflects subjective notions about what one expects to see. We consider statistical trend tests of hydroclimatological data in the presence of long-term persistence (LTP). Monte Carlo experiments employing FARIMA models indicate that trend tests which fail to consider LTP greatly overstate the statistical significance of observed trends when LTP is present. A new test is presented that avoids this problem. From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.

  3. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  4. Techniques for recognizing identity of several response functions from the data of visual inspection

    NASA Astrophysics Data System (ADS)

    Nechval, Nicholas A.

    1996-08-01

    The purpose of this paper is to present some efficient techniques for recognizing from the observed data whether several response functions are identical to each other. For example, in an industrial setting the problem may be to determine whether the production coefficients established in a small-scale pilot study apply to each of several large- scale production facilities. The techniques proposed here combine sensor information from automated visual inspection of manufactured products which is carried out by means of pixel-by-pixel comparison of the sensed image of the product to be inspected with some reference pattern (or image). Let (a1, . . . , am) be p-dimensional parameters associated with m response models of the same type. This study is concerned with the simultaneous comparison of a1, . . . , am. A generalized maximum likelihood ratio (GMLR) test is derived for testing equality of these parameters, where each of the parameters represents a corresponding vector of regression coefficients. The GMLR test reduces to an equivalent test based on a statistic that has an F distribution. The main advantage of the test lies in its relative simplicity and the ease with which it can be applied. Another interesting test for the same problem is an application of Fisher's method of combining independent test statistics which can be considered as a parallel procedure to the GMLR test. The combination of independent test statistics does not appear to have been used very much in applied statistics. There does, however, seem to be potential data analytic value in techniques for combining distributional assessments in relation to statistically independent samples which are of joint experimental relevance. In addition, a new iterated test for the problem defined above is presented. A rejection of the null hypothesis by this test provides some reason why all the parameters are not equal. A numerical example is discussed in the context of the proposed procedures for hypothesis testing.
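
    Fisher's method of combining independent test statistics, mentioned above as a parallel procedure to the GMLR test, is available directly in SciPy; the sketch below shows it on illustrative p-values only and does not reproduce the paper's regression setting.

```python
from scipy.stats import combine_pvalues

# Combine p-values from statistically independent tests of the same null
# hypothesis; 'fisher' sums -2*log(p) and refers the total to a chi-square
# distribution with 2k degrees of freedom.
stat, p_combined = combine_pvalues([0.08, 0.21, 0.04], method='fisher')
print(stat, p_combined)
```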

  5. Do Statistical Segmentation Abilities Predict Lexical-Phonological and Lexical-Semantic Abilities in Children with and without SLI?

    ERIC Educational Resources Information Center

    Mainela-Arnold, Elina; Evans, Julia L.

    2014-01-01

    This study tested the predictions of the procedural deficit hypothesis by investigating the relationship between sequential statistical learning and two aspects of lexical ability, lexical-phonological and lexical-semantic, in children with and without specific language impairment (SLI). Participants included forty children (ages 8;5-12;3), twenty…

  6. Finding P-Values for F Tests of Hypothesis on a Spreadsheet.

    ERIC Educational Resources Information Center

    Rochowicz, John A., Jr.

    The calculation of the F statistic for a one-factor analysis of variance (ANOVA) and the construction of an ANOVA tables are easily implemented on a spreadsheet. This paper describes how to compute the p-value (observed significance level) for a particular F statistic on a spreadsheet. Decision making on a spreadsheet and applications to the…

  7. A comparator-hypothesis account of biased contingency detection.

    PubMed

    Vadillo, Miguel A; Barberia, Itxaso

    2018-02-12

    Our ability to detect statistical dependencies between different events in the environment is strongly biased by the number of coincidences between them. Even when there is no true covariation between a cue and an outcome, if the marginal probability of either of them is high, people tend to perceive some degree of statistical contingency between both events. The present paper explores the ability of the Comparator Hypothesis to explain the general pattern of results observed in this literature. Our simulations show that this model can account for the biasing effects of the marginal probabilities of cues and outcomes. Furthermore, the overall fit of the Comparator Hypothesis to a sample of experimental conditions from previous studies is comparable to that of the popular Rescorla-Wagner model. These results should encourage researchers to further explore and put to the test the predictions of the Comparator Hypothesis in the domain of biased contingency detection. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Use of Pearson's Chi-Square for Testing Equality of Percentile Profiles across Multiple Populations.

    PubMed

    Johnson, William D; Beyl, Robbie A; Burton, Jeffrey H; Johnson, Callie M; Romer, Jacob E; Zhang, Lei

    2015-08-01

    In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.

  9. On two-sample McNemar test.

    PubMed

    Xiang, Jim X

    2016-01-01

    A change in the existence of disease symptoms before and after a treatment is examined for statistical significance by means of the McNemar test. When comparing two treatments, Feuer and Kessler (1989) proposed a two-sample McNemar test. In this article, we show that this test usually inflates the type I error in the hypothesis testing, and propose a new two-sample McNemar test that is superior in terms of preserving type I error. We also make the connection between the two-sample McNemar test and the test statistic for the equal residual effects in a 2 × 2 crossover design. The limitations of the two-sample McNemar test are also discussed.

  10. Does mediator use contribute to the spacing effect for cued recall? Critical tests of the mediator hypothesis.

    PubMed

    Morehead, Kayla; Dunlosky, John; Rawson, Katherine A; Bishop, Melissa; Pyc, Mary A

    2018-04-01

    When study is spaced across sessions (versus massed within a single session), final performance is greater after spacing. This spacing effect may have multiple causes, and according to the mediator hypothesis, part of the effect can be explained by the use of mediator-based strategies. This hypothesis proposes that when study is spaced across sessions, rather than massed within a session, more mediators will be generated that are longer lasting and hence more mediators will be available to support criterion recall. In two experiments, participants were randomly assigned to study paired associates using either a spaced or massed schedule. They reported strategy use for each item during study trials and during the final test. Consistent with the mediator hypothesis, participants who had spaced (as compared to massed) practice reported using more mediators on the final test. This use of effective mediators also statistically accounted for some - but not all of - the spacing effect on final performance.

  11. Replication Unreliability in Psychology: Elusive Phenomena or “Elusive” Statistical Power?

    PubMed Central

    Tressoldi, Patrizio E.

    2012-01-01

    The focus of this paper is to analyze whether the unreliability of results related to certain controversial psychological phenomena may be a consequence of their low statistical power. Under Null Hypothesis Statistical Testing (NHST), still the most widely used statistical approach, unreliability derives from the failure to refute the null hypothesis, in particular when exact or quasi-exact replications of experiments are carried out. Taking as examples the results of meta-analyses related to four different controversial phenomena, subliminal semantic priming, incubation effect for problem solving, unconscious thought theory, and non-local perception, it was found that, except for semantic priming on categorization, the statistical power to detect the expected effect size (ES) of the typical study is low or very low. The low power in most studies undermines the use of NHST to study phenomena with moderate or low ESs. We conclude by providing some suggestions on how to increase the statistical power or use different statistical approaches to help discriminate whether the results obtained may or may not be used to support or to refute the reality of a phenomenon with small ES. PMID:22783215
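
    To illustrate the kind of power calculation the paper relies on, a minimal statsmodels sketch for a two-sample t test; the effect size and sample size shown are illustrative numbers, not values from the cited meta-analyses.

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sided, two-sample t test to detect a small-to-moderate effect
# (Cohen's d = 0.3) with 20 participants per group at alpha = 0.05.
power = TTestIndPower().power(effect_size=0.3, nobs1=20, ratio=1.0,
                              alpha=0.05, alternative='two-sided')
print(power)   # well below the conventional 0.80 target
```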

  12. The Intervening Galaxies Hypothesis of the Absorption Spectra of Quasi-Stellar Objects: Some Statistical Studies

    NASA Astrophysics Data System (ADS)

    Duari, Debiprosad; Narlikar, Jayant V.

    This paper examines, in the light of the available data, the hypothesis that the heavy element absorption line systems in the spectra of QSOs originate through en-route absorption by intervening galaxies, halos, etc. Several statistical tests are applied in two different ways to compare the predictions of the intervening galaxies hypothesis (IGH) with actual observations. The database is taken from a recent 1991 compilation of absorption line systems by Junkkarinen, Hewitt and Burbidge. Although, prima facie, a considerable gap is found between the predictions of the intervening galaxies hypothesis and the actual observations despite inclusion of any effects of clustering and some likely selection effects, the gap narrows after invoking evolution in the number density of absorbers and allowing for the incompleteness and inhomogeneity of samples examined. On the latter count the gap might be bridgeable by stretching the parameters of the theory. It is concluded that although the intervening galaxies hypothesis is a possible natural explanation to account for the absorption line systems and may in fact do so in several cases, it seems too simplistic to be able to account for all the available data. It is further stressed that the statistical techniques described here will be useful for future studies of complete and homogeneous samples with a view to deciding the extent of applicability of the IGH.

  13. Do Effect-Size Measures Measure up?: A Brief Assessment

    ERIC Educational Resources Information Center

    Onwuegbuzie, Anthony J.; Levin, Joel R.; Leech, Nancy L.

    2003-01-01

    Because of criticisms leveled at statistical hypothesis testing, some researchers have argued that measures of effect size should replace the significance-testing practice. We contend that although effect-size measures have logical appeal, they are also associated with a number of limitations that may result in problematic interpretations of them…

  14. Students' Understanding of Conditional Probability on Entering University

    ERIC Educational Resources Information Center

    Reaburn, Robyn

    2013-01-01

    An understanding of conditional probability is essential for students of inferential statistics as it is used in Null Hypothesis Tests. Conditional probability is also used in Bayes' theorem, in the interpretation of medical screening tests and in quality control procedures. This study examines the understanding of conditional probability of…

  15. RANDOMIZATION PROCEDURES FOR THE ANALYSIS OF EDUCATIONAL EXPERIMENTS.

    ERIC Educational Resources Information Center

    Collier, Raymond O.

    Certain specific aspects of hypothesis tests used for analysis of results in randomized experiments were studied--(1) the development of the theoretical factor, that of providing information on statistical tests for certain experimental designs, and (2) the development of the applied element, that of supplying the experimenter with machinery for…

  16. Sample Size Estimation: The Easy Way

    ERIC Educational Resources Information Center

    Weller, Susan C.

    2015-01-01

    This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…
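
    In the same spirit of quick estimation, a minimal normal-approximation formula for the per-group sample size of a two-sample comparison of means; the function name and defaults are illustrative and the result ignores the small t-distribution correction.

```python
import math
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size; effect_size is Cohen's d."""
    z_alpha = norm.isf(alpha / 2)   # two-sided critical value
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# e.g. n_per_group(0.5) -> about 63 per group
```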

  17. Correcting power and p-value calculations for bias in diffusion tensor imaging.

    PubMed

    Lauzon, Carolyn B; Landman, Bennett A

    2013-07-01

    Diffusion tensor imaging (DTI) provides quantitative parametric maps sensitive to tissue microarchitecture (e.g., fractional anisotropy, FA). These maps are estimated through computational processes and subject to random distortions including variance and bias. Traditional statistical procedures commonly used for study planning (including power analyses and p-value/alpha-rate thresholds) specifically model variability, but neglect potential impacts of bias. Herein, we quantitatively investigate the impacts of bias in DTI on hypothesis test properties (power and alpha-rate) using a two-sided hypothesis testing framework. We present theoretical evaluation of bias on hypothesis test properties, evaluate the bias estimation technique SIMEX for DTI hypothesis testing using simulated data, and evaluate the impacts of bias on spatially varying power and alpha rates in an empirical study of 21 subjects. Bias is shown to inflate alpha rates, distort the power curve, and cause significant power loss even in empirical settings where the expected difference in bias between groups is zero. These adverse effects can be attenuated by properly accounting for bias in the calculation of power and p-values. Copyright © 2013 Elsevier Inc. All rights reserved.
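
    A small worked illustration of how bias inflates the alpha rate in a two-sided test, using a z-test approximation rather than the paper's DTI-specific framework; the function name and the example bias value are illustrative.

```python
from scipy.stats import norm

def actual_alpha_with_bias(bias_in_se_units, nominal_alpha=0.05):
    """Realized two-sided rejection rate under H0 when the test statistic
    is shifted by a bias of `bias_in_se_units` standard errors."""
    z = norm.isf(nominal_alpha / 2)
    b = bias_in_se_units
    return norm.sf(z - b) + norm.cdf(-z - b)

# e.g. actual_alpha_with_bias(0.5) -> about 0.079 instead of the nominal 0.05
```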

  18. Investigations of interference between electromagnetic transponders and wireless MOSFET dosimeters: a phantom study.

    PubMed

    Su, Zhong; Zhang, Lisha; Ramakrishnan, V; Hagan, Michael; Anscher, Mitchell

    2011-05-01

    To evaluate both the Calypso Systems' (Calypso Medical Technologies, Inc., Seattle, WA) localization accuracy in the presence of wireless metal-oxide-semiconductor field-effect transistor (MOSFET) dosimeters of dose verification system (DVS, Sicel Technologies, Inc., Morrisville, NC) and the dosimeters' reading accuracy in the presence of wireless electromagnetic transponders inside a phantom. A custom-made, solid-water phantom was fabricated with space for transponders and dosimeters. Two inserts were machined with positioning grooves precisely matching the dimensions of the transponders and dosimeters and were arranged in orthogonal and parallel orientations, respectively. To test the transponder localization accuracy with/without presence of dosimeters (hypothesis 1), multivariate analyses were performed on transponder-derived localization data with and without dosimeters at each preset distance to detect statistically significant localization differences between the control and test sets. To test dosimeter dose-reading accuracy with/without presence of transponders (hypothesis 2), an approach of alternating the transponder presence in seven identical fraction dose (100 cGy) deliveries and measurements was implemented. Two-way analysis of variance was performed to examine statistically significant dose-reading differences between the two groups and the different fractions. A relative-dose analysis method was also used to evaluate transponder impact on dose-reading accuracy after dose-fading effect was removed by a second-order polynomial fit. Multivariate analysis indicated that hypothesis 1 was false; there was a statistically significant difference between the localization data from the control and test sets. However, the upper and lower bounds of the 95% confidence intervals of the localized positional differences between the control and test sets were less than 0.1 mm, which was significantly smaller than the minimum clinical localization resolution of 0.5 mm. For hypothesis 2, analysis of variance indicated that there was no statistically significant difference between the dosimeter readings with and without the presence of transponders. Both orthogonal and parallel configurations had difference of polynomial-fit dose to measured dose values within 1.75%. The phantom study indicated that the Calypso System's localization accuracy was not affected clinically due to the presence of DVS wireless MOSFET dosimeters and the dosimeter-measured doses were not affected by the presence of transponders. Thus, the same patients could be implanted with both transponders and dosimeters to benefit from improved accuracy of radiotherapy treatments offered by conjunctional use of the two systems.

  19. Does McNemar's test compare the sensitivities and specificities of two diagnostic tests?

    PubMed

    Kim, Soeun; Lee, Woojoo

    2017-02-01

    McNemar's test is often used in practice to compare the sensitivities and specificities for the evaluation of two diagnostic tests. For correct evaluation of accuracy, an intuitive recommendation is to test the diseased and the non-diseased groups separately so that the sensitivities can be compared among the diseased, and specificities can be compared among the healthy group of people. This paper provides a rigorous theoretical framework for this argument and studies the validity of McNemar's test regardless of the conditional independence assumption. We derive McNemar's test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence. We then perform power analyses to show how the result is affected by the amount of the conditional dependence under the alternative hypothesis.
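
    A minimal sketch of the recommendation above: run McNemar's test within the diseased group only, so the comparison addresses sensitivity. The table values are made-up illustrative counts, and the exact-test choice is an assumption.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired results of two diagnostic tests, tabulated within the diseased group
# (rows: test A positive/negative; columns: test B positive/negative).
diseased_table = np.array([[45, 12],
                           [ 5, 18]])
result = mcnemar(diseased_table, exact=True)   # uses the discordant cells 12 and 5
print(result.statistic, result.pvalue)
```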

  20. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
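
    A minimal sketch of the permutation approach described above for the global null: permute the sample labels (which preserves the correlation among the test statistics) and compare the observed maximum |t| to its permutation distribution. The conditioning refinement proposed in the paper is not reproduced; function names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

def max_stat_pvalue(X, y, n_perm=2000, seed=None):
    """Permutation p-value for the global null using the largest |t| across features.

    X: (n_samples, n_features) data matrix; y: array of 0/1 group labels."""
    rng = np.random.default_rng(seed)
    obs = np.max(np.abs(ttest_ind(X[y == 0], X[y == 1], axis=0).statistic))
    exceed = 0
    for _ in range(n_perm):
        yp = rng.permutation(y)
        t = np.abs(ttest_ind(X[yp == 0], X[yp == 1], axis=0).statistic)
        if t.max() >= obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```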

  1. [The problem of small "n" and big "P" in neuropsycho-pharmacology, or how to keep the rate of false discoveries under control].

    PubMed

    Petschner, Péter; Bagdy, György; Tóthfalusi, Laszló

    2015-03-01

    One of the characteristics of many methods used in neuropsychopharmacology is that a large number of parameters (P) are measured in relatively few subjects (n). Functional magnetic resonance imaging, electroencephalography (EEG) and genomic studies are typical examples. For example, one microarray chip can contain thousands of probes. Therefore, in studies using microarray chips, P may be several thousand-fold larger than n. Statistical analysis of such studies is a challenging task, and they are referred to in the statistical literature as small-"n", big-"P" problems. The problem has many facets, including the controversies associated with multiple hypothesis testing. A typical scenario in this context is when two or more groups are compared by the individual attributes. If the increased classification error due to the multiple testing is neglected, then several highly significant differences will be discovered. But in reality, some of these significant differences are coincidental, not reproducible findings. Several methods were proposed to solve this problem. In this review we discuss two of the proposed solutions: algorithms to compare sets, and statistical hypothesis tests controlling the false discovery rate.
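
    For reference, the standard Benjamini-Hochberg procedure for controlling the false discovery rate, as implemented in statsmodels; the p-values shown are illustrative only.

```python
from statsmodels.stats.multitest import multipletests

# Benjamini-Hochberg control of the false discovery rate across many probes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
print(reject)       # which hypotheses are rejected at FDR 5%
print(p_adjusted)   # BH-adjusted p-values
```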

  2. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    PubMed Central

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-01-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
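
    A schematic of the general screen-then-test idea only, not the published COMBI implementation: a linear SVM ranks SNPs, and association tests are run for the screened subset with a simple Bonferroni correction standing in for COMBI's calibrated threshold correction. All function names, parameters, and the correction choice are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC
from scipy.stats import chi2_contingency

def screen_then_test(X, y, k=100, alpha=0.05):
    """Two-step screen-then-test sketch for a genotype matrix X (0/1/2 coded)
    and binary phenotype y."""
    svm = LinearSVC(C=0.1, max_iter=5000).fit(X, y)
    scores = np.abs(svm.coef_).ravel()
    candidates = np.argsort(scores)[::-1][:k]       # top-k SNPs by |weight|
    hits = []
    for j in candidates:
        genotypes = np.unique(X[:, j])
        table = np.array([[np.sum((y == c) & (X[:, j] == g)) for g in genotypes]
                          for c in np.unique(y)])
        _, p, _, _ = chi2_contingency(table)
        if p < alpha / k:                           # crude correction over the k screened SNPs
            hits.append((j, p))
    return hits
```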

  3. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.

    PubMed

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-28

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  4. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    NASA Astrophysics Data System (ADS)

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  5. High Impact = High Statistical Standards? Not Necessarily So

    PubMed Central

    Tressoldi, Patrizio E.; Giofré, David; Sella, Francesco; Cumming, Geoff

    2013-01-01

    What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors. PMID:23418533

  6. High impact = high statistical standards? Not necessarily so.

    PubMed

    Tressoldi, Patrizio E; Giofré, David; Sella, Francesco; Cumming, Geoff

    2013-01-01

    What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors.

  7. A Computationally Efficient Hypothesis Testing Method for Epistasis Analysis using Multifactor Dimensionality Reduction

    PubMed Central

    Pattin, Kristine A.; White, Bill C.; Barney, Nate; Gui, Jiang; Nelson, Heather H.; Kelsey, Karl R.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.

    2008-01-01

    Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR constructed variables have been evaluated with a naïve Bayes classifier that is combined with 10-fold cross validation to obtain an estimate of predictive accuracy or generalizability of epistasis models. Traditionally, we have used permutation testing to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false-positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1000-fold permutation test. PMID:18671250
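
    The idea of replacing a large permutation test with a parametric fit to a small permutation sample can be sketched as follows; the classifier, toy data, and the choice of 20 permutations are illustrative assumptions rather than the authors' exact MDR/EVD procedure.

      # Sketch: fit an extreme value distribution to a small permutation null and use it
      # to assign a p-value to the observed statistic, instead of running 1000 permutations.
      import numpy as np
      from scipy.stats import genextreme
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 4))
      y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)   # toy signal

      def cv_accuracy(X, y):
          return cross_val_score(GaussianNB(), X, y, cv=10).mean()

      observed = cv_accuracy(X, y)

      # Small permutation sample (20 label shuffles) instead of a 1000-fold permutation test.
      perm = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(20)])

      # Fit a generalized extreme value distribution to the permutation statistics and
      # read off the upper-tail p-value of the observed cross-validated accuracy.
      shape, loc, scale = genextreme.fit(perm)
      p_value = genextreme.sf(observed, shape, loc=loc, scale=scale)
      print(observed, p_value)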

  8. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.

  9. Critical Thinking Skills of U.S. Air Force Senior and Intermediate Developmental Education Students

    DTIC Science & Technology

    2016-02-16

    SAASS), and Air War College (AWC). T-tests indicated no statistically significant difference in the CT skills of the sample of ACSC and AWC students...hypothesis that there was no statistically significant difference in the CT skills of IDE and SDE students. SAASS, as a more selective advanced studies...potential to develop CT skills, concluding, “students in the experimental group performed at a statistically significantly higher level than students in

  10. Accuracy Evaluation of the Unified P-Value from Combining Correlated P-Values

    PubMed Central

    Alves, Gelio; Yu, Yi-Kuo

    2014-01-01

    Meta-analysis methods that combine p-values into a single unified p-value are frequently employed to improve confidence in hypothesis testing. An assumption made by most meta-analysis methods is that the p-values to be combined are independent, which may not always be true. To investigate the accuracy of the unified p-value from combining correlated p-values, we have evaluated a family of statistical methods that combine: independent, weighted independent, correlated, and weighted correlated p-values. Statistical accuracy evaluation by combining simulated correlated p-values showed that correlation among p-values can have a significant effect on the accuracy of the combined p-value obtained. Among the statistical methods evaluated, those that weight p-values compute more accurate combined p-values than those that do not. Also, statistical methods that utilize the correlation information have the best performance, producing significantly more accurate combined p-values. In our study we have demonstrated that statistical methods that combine p-values based on the assumption of independence can produce inaccurate p-values when combining correlated p-values, even when the p-values are only weakly correlated. Therefore, to prevent drawing false conclusions during hypothesis testing, our study advises caution be used when interpreting the p-value obtained from combining p-values of unknown correlation. However, when the correlation information is available, the weighting-capable statistical method, first introduced by Brown and recently modified by Hou, seems to perform the best amongst the methods investigated. PMID:24663491
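
    The correlation-aware combination credited above to Brown (as modified by Hou) can be illustrated with a sketch of Brown's method that uses the Kost-McDermott polynomial approximation for the covariance of the -2 ln p terms; the example correlation matrix and p-values are assumptions, and the polynomial coefficients come from the general literature rather than from this paper.

      # Sketch of a correlation-adjusted Fisher combination (Brown's method with the
      # Kost-McDermott polynomial approximation for the covariance terms).
      import numpy as np
      from scipy.stats import chi2

      def brown_combine(p_values, corr):
          """Combine p-values whose underlying statistics have correlation matrix corr."""
          p = np.asarray(p_values, dtype=float)
          k = p.size
          fisher_x = -2.0 * np.log(p).sum()

          mean_x = 2.0 * k
          var_x = 4.0 * k
          for i in range(k):
              for j in range(i + 1, k):
                  r = corr[i, j]
                  # Polynomial approximation to cov(-2 ln p_i, -2 ln p_j).
                  var_x += 2.0 * (3.263 * r + 0.710 * r**2 + 0.027 * r**3)

          scale = var_x / (2.0 * mean_x)       # c
          dof = 2.0 * mean_x**2 / var_x        # f
          return chi2.sf(fisher_x / scale, dof)

      # Three positively correlated tests with moderately small p-values.
      corr = np.array([[1.0, 0.6, 0.6],
                       [0.6, 1.0, 0.6],
                       [0.6, 0.6, 1.0]])
      print(brown_combine([0.04, 0.03, 0.05], corr))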

  11. A web-portal for interactive data exploration, visualization, and hypothesis testing

    PubMed Central

    Bartsch, Hauke; Thompson, Wesley K.; Jernigan, Terry L.; Dale, Anders M.

    2014-01-01

    Clinical research studies generate data that need to be shared and statistically analyzed by their participating institutions. The distributed nature of research and the different domains involved present major challenges to data sharing, exploration, and visualization. The Data Portal infrastructure was developed to support ongoing research in the areas of neurocognition, imaging, and genetics. Researchers benefit from the integration of data sources across domains, the explicit representation of knowledge from domain experts, and user interfaces providing convenient access to project specific data resources and algorithms. The system provides an interactive approach to statistical analysis, data mining, and hypothesis testing over the lifetime of a study and fulfills a mandate of public sharing by integrating data sharing into a system built for active data exploration. The web-based platform removes barriers for research and supports the ongoing exploration of data. PMID:24723882

  12. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

    PubMed

    Yang, Xiaowei; Nie, Kun

    2008-03-15

    Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
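
    The three-step procedure described above can be sketched roughly as follows: transform each subject's trajectory, compare groups coefficient by coefficient, and combine the standardized differences with an adaptive Neyman statistic. The permutation-based p-value, the use of only the cosine (real) part of the FFT, and the simulated trajectories are simplifying assumptions, not the paper's implementation.

      # Sketch: Fourier-transform repeated measures, compare groups per coefficient,
      # and combine with an adaptive Neyman statistic whose null is obtained by permutation.
      import numpy as np

      rng = np.random.default_rng(11)
      n_per_group, n_time = 30, 64
      t = np.linspace(0, 1, n_time)

      control = rng.normal(size=(n_per_group, n_time))
      treated = 0.6 * np.sin(2 * np.pi * 2 * t) + rng.normal(size=(n_per_group, n_time))

      def adaptive_neyman(a, b):
          fa = np.fft.rfft(a, axis=1).real          # keep only cosine terms for brevity
          fb = np.fft.rfft(b, axis=1).real
          diff = fa.mean(axis=0) - fb.mean(axis=0)
          se = np.sqrt(fa.var(axis=0, ddof=1) / len(a) + fb.var(axis=0, ddof=1) / len(b))
          z = diff / se
          partial = np.cumsum(z**2 - 1)             # best partial sum of (z_j^2 - 1)
          m = np.arange(1, len(z) + 1)
          return np.max(partial / np.sqrt(2 * m))

      observed = adaptive_neyman(treated, control)

      pooled = np.vstack([treated, control])
      null = []
      for _ in range(1000):
          idx = rng.permutation(len(pooled))
          null.append(adaptive_neyman(pooled[idx[:n_per_group]], pooled[idx[n_per_group:]]))
      print(observed, (np.array(null) >= observed).mean())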

  13. The Consequences of Model Misidentification in the Interrupted Time-Series Experiment.

    ERIC Educational Resources Information Center

    Padia, William L.

    Campbell (l969) argued for the interrupted time-series experiment as a useful methodology for testing intervention effects in the social sciences. The validity of the statistical hypothesis testing of time-series, is, however, dependent upon the proper identification of the underlying stochastic nature of the data. Several types of model…

  14. [Effectiveness of the Military Mental Health Promotion Program].

    PubMed

    Woo, Chung Hee; Kim, Sun Ah

    2014-12-01

    This study was done to evaluate the Military Mental Health Promotion Program. The program was an email-based cognitive behavioral intervention. The research design was a quasi-experimental study with a non-equivalent control group pretest-posttest design. Participants were 32 soldiers who agreed to participate in the program. Data were collected at three different times from January 2012 to March 2012; pre-test, post-test, and a one-month follow-up test. The data were statistically analyzed using SPSS 18.0. The effectiveness of the program was tested by repeated measures ANOVA. The first hypothesis that the level of depression in the experimental group who participated in the program would decrease compared to the control group was not supported in that the difference in group-time interaction was not statistically significant (F=2.19, p=.121). The second and third hypotheses related to anxiety and self-esteem were supported in group-time interaction, respectively (F=7.41, p=.001, F=11.67, p<.001). Results indicate that the program is effective in improving soldiers' mental health status in areas of anxiety and self-esteem.

  15. Sequential Tests of Multiple Hypotheses Controlling Type I and II Familywise Error Rates

    PubMed Central

    Bartroff, Jay; Song, Jinlin

    2014-01-01

    This paper addresses the following general scenario: A scientist wishes to perform a battery of experiments, each generating a sequential stream of data, to investigate some phenomenon. The scientist would like to control the overall error rate in order to draw statistically-valid conclusions from each experiment, while being as efficient as possible. The between-stream data may differ in distribution and dimension but also may be highly correlated, even duplicated exactly in some cases. Treating each experiment as a hypothesis test and adopting the familywise error rate (FWER) metric, we give a procedure that sequentially tests each hypothesis while controlling both the type I and II FWERs regardless of the between-stream correlation, and only requires arbitrary sequential test statistics that control the error rates for a given stream in isolation. The proposed procedure, which we call the sequential Holm procedure because of its inspiration from Holm’s (1979) seminal fixed-sample procedure, shows simultaneous savings in expected sample size and less conservative error control relative to fixed sample, sequential Bonferroni, and other recently proposed sequential procedures in a simulation study. PMID:25092948
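
    Since the proposed sequential method is modeled on Holm's fixed-sample step-down procedure, a minimal sketch of the classical Holm procedure (not the sequential variant) may be helpful; the p-values are illustrative.

      # Sketch of Holm's (1979) step-down procedure for familywise error control.
      import numpy as np

      def holm(p_values, alpha=0.05):
          """Return a boolean rejection decision for each hypothesis."""
          p = np.asarray(p_values, dtype=float)
          m = p.size
          order = np.argsort(p)
          reject = np.zeros(m, dtype=bool)
          for rank, idx in enumerate(order):
              if p[idx] <= alpha / (m - rank):      # compare p_(i) with alpha / (m - i + 1)
                  reject[idx] = True
              else:
                  break                             # stop at the first non-rejection
          return reject

      print(holm([0.001, 0.010, 0.030, 0.040]))     # rejects the two smallest p-values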

  16. Proper Image Subtraction—Optimal Transient Detection, Photometry, and Hypothesis Testing

    NASA Astrophysics Data System (ADS)

    Zackay, Barak; Ofek, Eran O.; Gal-Yam, Avishay

    2016-10-01

    Transient detection and flux measurement via image subtraction stand at the base of time domain astronomy. Due to the varying seeing conditions, the image subtraction process is non-trivial, and existing solutions suffer from a variety of problems. Starting from basic statistical principles, we develop the optimal statistic for transient detection, flux measurement, and any image-difference hypothesis testing. We derive a closed-form statistic that: (1) is mathematically proven to be the optimal transient detection statistic in the limit of background-dominated noise, (2) is numerically stable, (3) for accurately registered, adequately sampled images, does not leave subtraction or deconvolution artifacts, (4) allows automatic transient detection to the theoretical sensitivity limit by providing credible detection significance, (5) has uncorrelated white noise, (6) is a sufficient statistic for any further statistical test on the difference image, and, in particular, allows us to distinguish particle hits and other image artifacts from real transients, (7) is symmetric to the exchange of the new and reference images, (8) is at least an order of magnitude faster to compute than some popular methods, and (9) is straightforward to implement. Furthermore, we present extensions of this method that make it resilient to registration errors, color-refraction errors, and any noise source that can be modeled. In addition, we show that the optimal way to prepare a reference image is the proper image coaddition presented in Zackay & Ofek. We demonstrate this method on simulated data and real observations from the PTF data release 2. We provide an implementation of this algorithm in MATLAB and Python.

  17. [Statistical validity of the Mexican Food Security Scale and the Latin American and Caribbean Food Security Scale].

    PubMed

    Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo

    2014-01-01

    This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, conformed by independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; sensitivity analysis through mean differences' hypothesis test. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack of access to food indicator, included in multidimensional poverty measurement in Mexico, is calculated with EMSA.
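
    One of the checks listed above, Cronbach's alpha, can be sketched directly from its definition; the simulated item-response matrix is an illustrative stand-in for actual food security scale data.

      # Sketch: Cronbach's alpha for a respondents-by-items matrix of scale responses.
      import numpy as np

      def cronbach_alpha(items):
          """items: 2-D array, rows = respondents, columns = scale items."""
          items = np.asarray(items, dtype=float)
          k = items.shape[1]
          item_vars = items.var(axis=0, ddof=1).sum()
          total_var = items.sum(axis=1).var(ddof=1)
          return (k / (k - 1)) * (1.0 - item_vars / total_var)

      rng = np.random.default_rng(2)
      severity = rng.normal(size=(100, 1))                       # common latent severity
      responses = (severity + rng.normal(scale=0.8, size=(100, 6)) > 0).astype(int)
      print(cronbach_alpha(responses))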

  18. Testing jumps via false discovery rate control.

    PubMed

    Yen, Yu-Min

    2013-01-01

    Many recently developed nonparametric jump tests can be viewed as multiple hypothesis testing problems. For such multiple hypothesis tests, it is well known that controlling type I error often makes a large proportion of erroneous rejections, and such situation becomes even worse when the jump occurrence is a rare event. To obtain more reliable results, we aim to control the false discovery rate (FDR), an efficient compound error measure for erroneous rejections in multiple testing problems. We perform the test via the Barndorff-Nielsen and Shephard (BNS) test statistic, and control the FDR with the Benjamini and Hochberg (BH) procedure. We provide asymptotic results for the FDR control. From simulations, we examine relevant theoretical results and demonstrate the advantages of controlling the FDR. The hybrid approach is then applied to empirical analysis on two benchmark stock indices with high frequency data.
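
    The Benjamini-Hochberg step-up procedure used above for FDR control can be sketched as follows; the p-values are simulated rather than computed from the BNS jump statistic.

      # Sketch of the Benjamini-Hochberg step-up procedure for FDR control.
      import numpy as np

      def benjamini_hochberg(p_values, q=0.05):
          """Return a boolean array marking the hypotheses rejected at FDR level q."""
          p = np.asarray(p_values, dtype=float)
          m = p.size
          order = np.argsort(p)
          below = p[order] <= q * np.arange(1, m + 1) / m
          reject = np.zeros(m, dtype=bool)
          if below.any():
              cutoff = np.nonzero(below)[0].max()   # largest i with p_(i) <= q * i / m
              reject[order[:cutoff + 1]] = True
          return reject

      # A few "jump" days (very small p-values) mixed with quiet days (uniform p-values).
      rng = np.random.default_rng(3)
      pvals = np.concatenate([rng.uniform(0, 0.002, 5), rng.uniform(0, 1, 95)])
      print(benjamini_hochberg(pvals).sum(), "rejections")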

  19. Underascertainment of Child Abuse Fatalities in France: Retrospective Analysis of Judicial Data to Assess Underreporting of Infant Homicides in Mortality Statistics

    ERIC Educational Resources Information Center

    Tursz, Anne; Crost, Monique; Gerbouin-Rerolle, Pascale; Cook, Jon M.

    2010-01-01

    Objectives: Test the hypothesis of an underestimation of infant homicides in mortality statistics in France; identify its causes; examine data from the judicial system and their contribution in correcting this underestimation. Methods: A retrospective, cross-sectional study was carried out in 26 courts in three regions of France of cases of infant…

  20. The Skillings-Mack test (Friedman test when there are missing data).

    PubMed

    Chatfield, Mark; Mander, Adrian

    2009-04-01

    The Skillings-Mack statistic (Skillings and Mack, 1981, Technometrics 23: 171-177) is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure. The missing data can be either missing by design, for example, an incomplete block design, or missing completely at random. The Skillings-Mack test is equivalent to the Friedman test when there are no missing data in a balanced complete block design, and the Skillings-Mack test is equivalent to the test suggested in Durbin (1951, British Journal of Psychology, Statistical Section 4: 85-90) for a balanced incomplete block design. The Friedman test was implemented in Stata by Goldstein (1991, Stata Technical Bulletin 3: 26-27) and further developed in Goldstein (2005, Stata Journal 5: 285). This article introduces the skilmack command, which performs the Skillings-Mack test. The skilmack command is also useful when there are many ties or equal ranks (N.B. the Friedman statistic compared with the chi-squared distribution will give a conservative result), as well as for small samples; appropriate results can be obtained by simulating the distribution of the test statistic under the null hypothesis.
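
    A rough sketch of the small-sample idea mentioned above, computing the Friedman statistic on a complete block design and simulating its null distribution by permuting treatments within blocks, is given below in Python rather than Stata; it does not reproduce the Skillings-Mack handling of missing data.

      # Sketch: Friedman test on a complete block design, with a within-block
      # permutation estimate of the null distribution for small samples.
      import numpy as np
      from scipy.stats import friedmanchisquare, rankdata

      rng = np.random.default_rng(4)
      blocks, treatments = 8, 3
      data = rng.normal(size=(blocks, treatments)) + np.array([0.0, 0.3, 0.8])

      stat, p_asymptotic = friedmanchisquare(*data.T)

      def friedman_stat(d):
          ranks = np.apply_along_axis(rankdata, 1, d)          # rank within each block
          r_bar = ranks.mean(axis=0)
          b, k = d.shape
          return 12 * b / (k * (k + 1)) * np.sum((r_bar - (k + 1) / 2) ** 2)

      # Simulate the null by independently permuting treatments within each block.
      null = np.array([friedman_stat(np.apply_along_axis(rng.permutation, 1, data))
                       for _ in range(2000)])
      p_simulated = (null >= friedman_stat(data)).mean()
      print(stat, p_asymptotic, p_simulated)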

  1. Profitability of HMOs: does non-profit status make a difference?

    PubMed

    Bryce, H J

    1994-06-01

    This study, based on 163 HMOs, tests the hypothesis that the rates of return on assets (ROA) are not significantly different between for-profit and non-profit HMOs. It finds no statistical support for rejecting the hypothesis. The marked similarity in profitability is fully explained by analyzing methods of cost control and accounting, operational incentives and constraints, and price determination. The paper concludes that profitability is not a defining distinction in the operation of managed care.

  2. Statistical power analysis in wildlife research

    USGS Publications Warehouse

    Steidl, R.J.; Hayes, J.P.

    1997-01-01

    Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be ≤0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true. We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to be reported, then the α-level, effect sizes, and sample sizes used in calculations must also be reported.
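
    A minimal sketch of a prospective (a priori) power analysis, solving for the per-group sample size needed to detect a minimum biologically significant effect and then the power achieved at a smaller sample size; the effect size, alpha, and power target are illustrative assumptions.

      # Sketch: prospective power analysis for a two-sample comparison.
      from statsmodels.stats.power import TTestIndPower

      analysis = TTestIndPower()

      # Per-group n needed to detect the smallest effect deemed biologically significant.
      n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                         alternative="two-sided")
      print(round(n_per_group))      # roughly 64 per group

      # Power actually achieved by a study limited to 30 individuals per group.
      print(analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05, power=None,
                                 alternative="two-sided"))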

  3. Methodological issues with adaptation of clinical trial design.

    PubMed

    Hung, H M James; Wang, Sue-Jane; O'Neill, Robert T

    2006-01-01

    Adaptation of clinical trial design generates many issues that have not been resolved for practical applications, though statistical methodology has advanced greatly. This paper focuses on some methodological issues. In one type of adaptation such as sample size re-estimation, only the postulated value of a parameter for planning the trial size may be altered. In another type, the originally intended hypothesis for testing may be modified using the internal data accumulated at an interim time of the trial, such as changing the primary endpoint and dropping a treatment arm. For sample size re-estimation, we make a contrast between an adaptive test weighting the two-stage test statistics with the statistical information given by the original design and the original sample mean test with a properly corrected critical value. We point out the difficulty in planning a confirmatory trial based on the crude information generated by exploratory trials. In regards to selecting a primary endpoint, we argue that the selection process that allows switching from one endpoint to the other with the internal data of the trial is not very likely to gain a power advantage over the simple process of selecting one from the two endpoints by testing them with an equal split of alpha (Bonferroni adjustment). For dropping a treatment arm, distributing the remaining sample size of the discontinued arm to other treatment arms can substantially improve the statistical power of identifying a superior treatment arm in the design. A common difficult methodological issue is that of how to select an adaptation rule in the trial planning stage. Pre-specification of the adaptation rule is important for the practicality consideration. Changing the originally intended hypothesis for testing with the internal data generates great concerns to clinical trial researchers.
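
    The weighted two-stage test contrasted above with the corrected fixed-sample test can be sketched as an inverse-normal combination of stage-wise z-statistics whose weights are fixed by the originally planned information fractions; the weights and simulated data are illustrative assumptions.

      # Sketch: inverse-normal combination of two stage-wise z-statistics with weights
      # fixed by the planned (not the re-estimated) stage sizes.
      import numpy as np
      from scipy.stats import norm

      def two_stage_z(stage1, stage2, planned_fraction=0.5):
          """One-sided test of H0: mean <= 0 using pre-specified stage weights."""
          w1, w2 = np.sqrt(planned_fraction), np.sqrt(1.0 - planned_fraction)
          z1 = stage1.mean() / (stage1.std(ddof=1) / np.sqrt(stage1.size))
          z2 = stage2.mean() / (stage2.std(ddof=1) / np.sqrt(stage2.size))
          z = w1 * z1 + w2 * z2          # weights stay fixed even if stage 2 is enlarged
          return z, norm.sf(z)

      rng = np.random.default_rng(5)
      stage1 = rng.normal(0.3, 1.0, size=50)
      stage2 = rng.normal(0.3, 1.0, size=120)    # second stage enlarged after re-estimation
      print(two_stage_z(stage1, stage2))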

  4. An Analysis of Competencies for Managing Science and Technology Programs

    DTIC Science & Technology

    2008-03-19

    competency modeling through a two-year task force commissioned by the Society for Industrial and Organizational Psychology (Shippmann and others, 2000:704...positions—specifically within Research and Development (R&D) programs. If so, the final investigative question tests whether those differences are...statistics are used to analyze the comparisons through hypothesis testing and t- tests relevant to the research investigative questions. These

  5. Exploring students’ perceived and actual ability in solving statistical problems based on Rasch measurement tools

    NASA Astrophysics Data System (ADS)

    Azila Che Musa, Nor; Mahmud, Zamalia; Baharun, Norhayati

    2017-09-01

    One of the important skills required from any student learning statistics is knowing how to solve statistical problems correctly using appropriate statistical methods. This enables them to arrive at a conclusion and to make a significant contribution and decision for society. In this study, a group of 22 students majoring in statistics at UiTM Shah Alam were given problems on hypothesis testing which required them to solve the problems using the confidence interval, traditional, and p-value approaches. Hypothesis testing is one of the techniques used in solving real problems and it is listed as one of the difficult concepts for students to grasp. The objectives of this study are to explore students’ perceived and actual ability in solving statistical problems and to determine which items in statistical problem solving students find difficult to grasp. Students’ perceived and actual ability were measured based on the instruments developed from the respective topics. Rasch measurement tools such as the Wright map and item measures for fit statistics were used to accomplish the objectives. Data were collected and analysed using Winsteps 3.90 software, which is developed based on the Rasch measurement model. The results showed that students perceived themselves as moderately competent in solving the statistical problems using the confidence interval and p-value approaches even though their actual performance showed otherwise. Item measures for fit statistics also showed that the maximum estimated measures were found on two problems. These measures indicate that none of the students attempted these problems correctly, due to reasons which include their lack of understanding of confidence intervals and probability values.

  6. Exploring High School Students Beginning Reasoning about Significance Tests with Technology

    ERIC Educational Resources Information Center

    García, Víctor N.; Sánchez, Ernesto

    2017-01-01

    In the present study we analyze how students reason about or make inferences given a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most students’ reasoning relies on…

  7. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for

  8. Automated Hypothesis Tests and Standard Errors for Nonstandard Problems with Description of Computer Package: A Draft.

    ERIC Educational Resources Information Center

    Lord, Frederic M.; Stocking, Martha

    A general computer program is described that will compute asymptotic standard errors and carry out significance tests for an endless variety of (standard and) nonstandard large-sample statistical problems, without requiring the statistician to derive asymptotic standard error formulas. The program assumes that the observations have a multinormal…

  9. Preparing for the first meeting with a statistician.

    PubMed

    De Muth, James E

    2008-12-15

    Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.

  10. A simple test of association for contingency tables with multiple column responses.

    PubMed

    Decady, Y J; Thomas, D R

    2000-09-01

    Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.

  11. Investigations of interference between electromagnetic transponders and wireless MOSFET dosimeters: A phantom study

    PubMed Central

    Su, Zhong; Zhang, Lisha; Ramakrishnan, V.; Hagan, Michael; Anscher, Mitchell

    2011-01-01

    Purpose: To evaluate both the Calypso Systems’ (Calypso Medical Technologies, Inc., Seattle, WA) localization accuracy in the presence of wireless metal–oxide–semiconductor field-effect transistor (MOSFET) dosimeters of dose verification system (DVS, Sicel Technologies, Inc., Morrisville, NC) and the dosimeters’ reading accuracy in the presence of wireless electromagnetic transponders inside a phantom. Methods: A custom-made, solid-water phantom was fabricated with space for transponders and dosimeters. Two inserts were machined with positioning grooves precisely matching the dimensions of the transponders and dosimeters and were arranged in orthogonal and parallel orientations, respectively. To test the transponder localization accuracy with/without presence of dosimeters (hypothesis 1), multivariate analyses were performed on transponder-derived localization data with and without dosimeters at each preset distance to detect statistically significant localization differences between the control and test sets. To test dosimeter dose-reading accuracy with/without presence of transponders (hypothesis 2), an approach of alternating the transponder presence in seven identical fraction dose (100 cGy) deliveries and measurements was implemented. Two-way analysis of variance was performed to examine statistically significant dose-reading differences between the two groups and the different fractions. A relative-dose analysis method was also used to evaluate transponder impact on dose-reading accuracy after dose-fading effect was removed by a second-order polynomial fit. Results: Multivariate analysis indicated that hypothesis 1 was false; there was a statistically significant difference between the localization data from the control and test sets. However, the upper and lower bounds of the 95% confidence intervals of the localized positional differences between the control and test sets were less than 0.1 mm, which was significantly smaller than the minimum clinical localization resolution of 0.5 mm. For hypothesis 2, analysis of variance indicated that there was no statistically significant difference between the dosimeter readings with and without the presence of transponders. Both orthogonal and parallel configurations had difference of polynomial-fit dose to measured dose values within 1.75%. Conclusions: The phantom study indicated that the Calypso System’s localization accuracy was not affected clinically due to the presence of DVS wireless MOSFET dosimeters and the dosimeter-measured doses were not affected by the presence of transponders. Thus, the same patients could be implanted with both transponders and dosimeters to benefit from improved accuracy of radiotherapy treatments offered by conjunctional use of the two systems. PMID:21776780

  12. An introduction to medical statistics for health care professionals: Hypothesis tests and estimation.

    PubMed

    Thomas, Elaine

    2005-01-01

    This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.

  13. Testing the Developmental Origins of Health and Disease Hypothesis for Psychopathology Using Family-Based Quasi-Experimental Designs

    PubMed Central

    D’Onofrio, Brian M.; Class, Quetzal A.; Lahey, Benjamin B.; Larsson, Henrik

    2014-01-01

    The Developmental Origin of Health and Disease (DOHaD) hypothesis is a broad theoretical framework that emphasizes how early risk factors have a causal influence on psychopathology. Researchers have raised concerns about the causal interpretation of statistical associations between early risk factors and later psychopathology because most existing studies have been unable to rule out the possibility of environmental and genetic confounding. In this paper we illustrate how family-based quasi-experimental designs can test the DOHaD hypothesis by ruling out alternative hypotheses. We review the logic underlying sibling-comparison, co-twin control, offspring of siblings/twins, adoption, and in vitro fertilization designs. We then present results from studies using these designs focused on broad indices of fetal development (low birth weight and gestational age) and a particular teratogen, smoking during pregnancy. The results provide mixed support for the DOHaD hypothesis for psychopathology, illustrating the critical need to use design features that rule out unmeasured confounding. PMID:25364377

  14. Tree-space statistics and approximations for large-scale analysis of anatomical trees.

    PubMed

    Feragen, Aasa; Owen, Megan; Petersen, Jens; Wille, Mathilde M W; Thomsen, Laura H; Dirksen, Asger; de Bruijne, Marleen

    2013-01-01

    Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others (like the mean) can be computed, but efficient alternatives are helpful in speeding up algorithms that use means iteratively, like hypothesis testing. In this paper, we take advantage of a very large dataset (N = 8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than healthy ones. Software is available from http://image.diku.dk/aasa/software.php.

  15. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.
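
    A minimal sketch of the autoregressive-feature idea: fit an AR model to baseline acceleration data, use the residual standard deviation on new data as the damage-sensitive feature, and set an alarm threshold from resampled baseline segments. The model order, simulated signals, and threshold percentile are illustrative assumptions, not the proposed algorithms themselves.

      # Sketch: AR-residual damage feature with a resampling-based alarm threshold.
      import numpy as np

      def fit_ar(signal, order):
          """Least-squares AR(order) fit; returns the coefficient vector."""
          rows = np.array([signal[i:i + order][::-1] for i in range(len(signal) - order)])
          return np.linalg.lstsq(rows, signal[order:], rcond=None)[0]

      def residual_std(signal, coef):
          order = len(coef)
          rows = np.array([signal[i:i + order][::-1] for i in range(len(signal) - order)])
          return np.std(signal[order:] - rows @ coef)

      rng = np.random.default_rng(6)
      t = np.arange(5000) / 100.0
      baseline = np.sin(2 * np.pi * 3.0 * t) + 0.1 * rng.normal(size=t.size)
      damaged = np.sin(2 * np.pi * 2.6 * t) + 0.1 * rng.normal(size=t.size)  # shifted frequency

      coef = fit_ar(baseline, order=10)

      # Alarm threshold: 99th percentile of the feature over resampled baseline segments.
      segments = [baseline[s:s + 1000] for s in rng.integers(0, 4000, size=200)]
      threshold = np.percentile([residual_std(seg, coef) for seg in segments], 99)

      feature = residual_std(damaged[:1000], coef)
      print(feature, threshold)      # a feature above the threshold signals damage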

  16. Genetics and recent human evolution.

    PubMed

    Templeton, Alan R

    2007-07-01

    Starting with "mitochondrial Eve" in 1987, genetics has played an increasingly important role in studies of the last two million years of human evolution. It initially appeared that genetic data resolved the basic models of recent human evolution in favor of the "out-of-Africa replacement" hypothesis in which anatomically modern humans evolved in Africa about 150,000 years ago, started to spread throughout the world about 100,000 years ago, and subsequently drove to complete genetic extinction (replacement) all other human populations in Eurasia. Unfortunately, many of the genetic studies on recent human evolution have suffered from scientific flaws, including misrepresenting the models of recent human evolution, focusing upon hypothesis compatibility rather than hypothesis testing, committing the ecological fallacy, and failing to consider a broader array of alternative hypotheses. Once these flaws are corrected, there is actually little genetic support for the out-of-Africa replacement hypothesis. Indeed, when genetic data are used in a hypothesis-testing framework, the out-of-Africa replacement hypothesis is strongly rejected. The model of recent human evolution that emerges from a statistical hypothesis-testing framework does not correspond to any of the traditional models of human evolution, but it is compatible with fossil and archaeological data. These studies also reveal that any one gene or DNA region captures only a small part of human evolutionary history, so multilocus studies are essential. As more and more loci became available, genetics will undoubtedly offer additional insights and resolutions of human evolution.

  17. Statistical aspects of genetic association testing in small samples, based on selective DNA pooling data in the arctic fox.

    PubMed

    Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna

    2008-01-01

    We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 different types regarding body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratio and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.

  18. STATISTICAL METHODOLOGY FOR ESTIMATING PARAMETERS IN PBPK/PD MODELS

    EPA Science Inventory

    PBPK/PD models are large dynamic models that predict tissue concentration and biological effects of a toxicant before PBPK/PD models can be used in risk assessments. In the arena of toxicological hypothesis testing, models allow the consequences of alternative mechanistic hypothes...

  19. Statistical inference for tumor growth inhibition T/C ratio.

    PubMed

    Wu, Jianrong

    2010-09-01

    The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
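
    A percentile-bootstrap confidence interval for the T/C ratio of mean tumor volumes can be sketched as follows; the data and the 0.45 activity cutoff are illustrative, and the paper's small-sample likelihood ratio statistic is not reproduced.

      # Sketch: nonparametric bootstrap CI for the tumor growth inhibition T/C ratio.
      import numpy as np

      rng = np.random.default_rng(7)
      treated = np.array([210.0, 180.0, 260.0, 150.0, 230.0, 190.0, 175.0, 205.0])
      control = np.array([540.0, 610.0, 480.0, 700.0, 590.0, 630.0, 520.0, 655.0])

      tc_observed = treated.mean() / control.mean()

      boot = np.array([
          rng.choice(treated, treated.size, replace=True).mean()
          / rng.choice(control, control.size, replace=True).mean()
          for _ in range(10000)
      ])
      lower, upper = np.percentile(boot, [2.5, 97.5])

      # A common (arbitrary) activity cutoff is T/C <= 0.45; the interval shows how much
      # confidence such a rating deserves for these data.
      print(f"T/C = {tc_observed:.2f}, 95% bootstrap CI ({lower:.2f}, {upper:.2f})")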

  20. After p Values: The New Statistics for Undergraduate Neuroscience Education.

    PubMed

    Calin-Jageman, Robert J

    2017-01-01

    Statistical inference is a methodological cornerstone for neuroscience education. For many years this has meant inculcating neuroscience majors into null hypothesis significance testing with p values. There is increasing concern, however, about the pervasive misuse of p values. It is time to start planning statistics curricula for neuroscience majors that replace or de-emphasize p values. One promising alternative approach is what Cumming has dubbed the "New Statistics", an approach that emphasizes effect sizes, confidence intervals, meta-analysis, and open science. I give an example of the New Statistics in action and describe some of the key benefits of adopting this approach in neuroscience education.

  1. Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

    NASA Technical Reports Server (NTRS)

    Abbey, Craig K.; Eckstein, Miguel P.

    2002-01-01

    We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
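
    One simple way to estimate a linear template from 2AFC trials, regressing the observer's choices on the difference of the two noise fields, can be sketched as follows; this is a generic estimator under a linear-observer model, with simulated trials, and not necessarily the estimation procedure used in the paper.

      # Sketch: estimating a linear observer template ("classification image") from
      # simulated 2AFC trials by regressing choices on the noise-field difference.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(10)
      n_trials, n_pix = 4000, 64
      signal = np.exp(-0.5 * ((np.arange(n_pix) - 32) / 6.0) ** 2)   # Gaussian bump profile
      template = signal / np.linalg.norm(signal)                     # observer's true filter

      noise1 = rng.normal(size=(n_trials, n_pix))                    # interval containing the signal
      noise2 = rng.normal(size=(n_trials, n_pix))                    # interval without the signal

      # Linear observer: chooses interval 1 when its filtered response is larger.
      resp1 = (signal + noise1) @ template + 0.5 * rng.normal(size=n_trials)
      resp2 = noise2 @ template + 0.5 * rng.normal(size=n_trials)
      choice = (resp1 > resp2).astype(int)

      # Classification image estimate: regression weights of choice on the noise difference.
      ci = LogisticRegression(max_iter=5000).fit(noise1 - noise2, choice).coef_.ravel()
      print(np.corrcoef(ci, template)[0, 1])    # recovered weights track the true template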

  2. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
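
    The proposed procedure can be sketched as follows: sum the individual histograms in each group, use the Euclidean distance between the normalized summary histograms as the test statistic, and bootstrap resample individual histograms from the pooled set to obtain a significance level; the synthetic histograms stand in for the cloud-object data.

      # Sketch: bootstrap significance test for the difference between two summary
      # histograms (sums of many individual histograms), using Euclidean distance.
      import numpy as np

      rng = np.random.default_rng(8)
      bins = 20

      def make_histograms(n, shift):
          """n individual histograms of 100 samples each, optionally shifted."""
          return np.array([np.histogram(rng.normal(shift, 1.0, 100),
                                        bins=bins, range=(-4, 4))[0] for _ in range(n)])

      group_a = make_histograms(60, 0.0)      # e.g. one cloud-object size category
      group_b = make_histograms(40, 0.3)      # e.g. another size category

      def distance(a, b):
          sa, sb = a.sum(axis=0), b.sum(axis=0)
          sa, sb = sa / sa.sum(), sb / sb.sum()          # normalize the summary histograms
          return np.sqrt(((sa - sb) ** 2).sum())         # Euclidean distance

      observed = distance(group_a, group_b)

      # Bootstrap the null: resample individual histograms from the pooled collection.
      pool = np.vstack([group_a, group_b])
      null = np.array([
          distance(pool[rng.integers(0, len(pool), len(group_a))],
                   pool[rng.integers(0, len(pool), len(group_b))])
          for _ in range(2000)
      ])
      print(observed, (null >= observed).mean())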

  3. The Heuristic Value of p in Inductive Statistical Inference

    PubMed Central

    Krueger, Joachim I.; Heck, Patrick R.

    2017-01-01

    Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206

  4. The Heuristic Value of p in Inductive Statistical Inference.

    PubMed

    Krueger, Joachim I; Heck, Patrick R

    2017-01-01

    Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.

  5. STAPP: Spatiotemporal analysis of plantar pressure measurements using statistical parametric mapping.

    PubMed

    Booth, Brian G; Keijsers, Noël L W; Sijbers, Jan; Huysmans, Toon

    2018-05-03

    Pedobarography produces large sets of plantar pressure samples that are routinely subsampled (e.g. using regions of interest) or aggregated (e.g. center of pressure trajectories, peak pressure images) in order to simplify statistical analysis and provide intuitive clinical measures. We hypothesize that these data reductions discard gait information that can be used to differentiate between groups or conditions. To test the hypothesis of null information loss, we created an implementation of statistical parametric mapping (SPM) for dynamic plantar pressure datasets (i.e. plantar pressure videos). Our SPM software framework brings all plantar pressure videos into anatomical and temporal correspondence, then performs statistical tests at each sampling location in space and time. Novelly, we introduce non-linear temporal registration into the framework in order to normalize for timing differences within the stance phase. We refer to our software framework as STAPP: spatiotemporal analysis of plantar pressure measurements. Using STAPP, we tested our hypothesis on plantar pressure videos from 33 healthy subjects walking at different speeds. As walking speed increased, STAPP was able to identify significant decreases in plantar pressure at mid-stance from the heel through the lateral forefoot. The extent of these plantar pressure decreases has not previously been observed using existing plantar pressure analysis techniques. We therefore conclude that the subsampling of plantar pressure videos - a task which led to the discarding of gait information in our study - can be avoided using STAPP. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. The role of control groups in mutagenicity studies: matching biological and statistical relevance.

    PubMed

    Hauschke, Dieter; Hothorn, Torsten; Schäfer, Juliane

    2003-06-01

    The statistical test of the conventional hypothesis of "no treatment effect" is commonly used in the evaluation of mutagenicity experiments. Failing to reject the hypothesis often leads to the conclusion in favour of safety. The major drawback of this indirect approach is that what is controlled by a prespecified level alpha is the probability of erroneously concluding hazard (producer risk). However, the primary concern of safety assessment is the control of the consumer risk, i.e. limiting the probability of erroneously concluding that a product is safe. In order to restrict this risk, safety has to be formulated as the alternative, and hazard, i.e. the opposite, has to be formulated as the hypothesis. The direct safety approach is examined for the case when the corresponding threshold value is expressed either as a fraction of the population mean for the negative control, or as a fraction of the difference between the positive and negative controls.

  7. The Coopersmith Self-Esteem Inventory Related to Academic Achievement and School Behavior. Interim Research Report # 23.

    ERIC Educational Resources Information Center

    Rubin, Rosalyn; And Others

    Scores on the Coopersmith Self-Esteem Inventory were related to scores on achievement and intelligence tests, to socioeconomic level, and to teachers' ratings of student behavior, in order to test the hypothesis that student self-esteem would have a positive effect on academic achievement. There was a small but statistically significant…

  8. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP. Contact: beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.

  9. An Establishment-Level Test of the Statistical Discrimination Hypothesis.

    ERIC Educational Resources Information Center

    Tomaskovic-Devey, Donald; Skaggs, Sheryl

    1999-01-01

    Analysis of a sample of 306 workers shows that neither the gender nor racial composition of the workplace is associated with productivity. An alternative explanation for lower wages of women and minorities is social closure--the monopolizing of desirable positions by advantaged workers. (SK)

  10. Statistical Smoothing Methods and Image Analysis

    DTIC Science & Technology

    1988-12-01

    83-111. Rosenfeld, A. and Kak, A.C. (1982). Digital Picture Processing. Academic Press, Orlando. Serra, J. (1982). Image Analysis and Mathematical ...hypothesis testing. IEEE Trans. Med. Imaging, MI-6, 313-319. Wicksell, S.D. (1925). The corpuscle problem. A mathematical study of a biometric problem

  11. In silico experiment system for testing hypothesis on gene functions using three condition specific biological networks.

    PubMed

    Lee, Chai-Jin; Kang, Dongwon; Lee, Sangseon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2018-05-25

    Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this experimental process if literature information and biological networks are adequately provided. In this paper, we present a web-based information system that can perform in silico experiments to computationally test hypotheses about the function of a gene. A hypothesis that is specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data onto template networks. An in silico experiment then tests how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes the paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three data sets: E2f1, Lrrk2, and Dicer1 knockout data sets. We were able to reproduce gene functions reported in the original research papers. In addition, we comprehensively tested all disease names in MalaCards as hypotheses to show the effectiveness of our system. Our in silico experiment system can be very useful in suggesting biological mechanisms that can be further tested in vivo or in vitro. http://biohealth.snu.ac.kr/software/insilico/. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. A risk-based approach to flood management decisions in a nonstationary world

    NASA Astrophysics Data System (ADS)

    Rosner, Ana; Vogel, Richard M.; Kirshen, Paul H.

    2014-03-01

    Traditional approaches to flood management in a nonstationary world begin with a null hypothesis test of "no trend" and its likelihood, with little or no attention given to the likelihood that we might ignore a trend if it really existed. Concluding a trend exists when it does not, and rejecting a trend when it exists, are known as type I and type II errors, respectively. Decision-makers are poorly served by statistical and/or decision methods that do not carefully consider both over-preparation and under-preparation errors. Similarly, little attention is given to how to integrate uncertainty in our ability to detect trends into a flood management decision context. We show how trend hypothesis test results can be combined with an adaptation's infrastructure costs and damages avoided to provide a rational decision approach in a nonstationary world. The criterion of expected regret is shown to be a useful metric that integrates the statistical, economic, and hydrological aspects of the flood management problem in a nonstationary world.
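    As a rough illustration of the expected-regret criterion described above, the sketch below weighs the cost of acting on a spurious trend (over-preparation) against the damages incurred by ignoring a real one (under-preparation). The helper name expected_regret and all numbers are hypothetical placeholders, not quantities from the paper.

```python
# A minimal sketch of an expected-regret comparison for a binary
# adaptation decision. Regret is measured against the best action in
# each state of the world; all inputs are hypothetical.

def expected_regret(p_trend, adaptation_cost, damages_avoided, act):
    """Expected regret of acting (act=True) or not acting on a trend."""
    if act:
        # Trend absent: the infrastructure was built for nothing.
        return (1.0 - p_trend) * adaptation_cost
    # Trend real: we forgo the net benefit of having adapted.
    return p_trend * (damages_avoided - adaptation_cost)

p = 0.3  # assumed probability that the trend is real
for act in (True, False):
    r = expected_regret(p, adaptation_cost=10.0, damages_avoided=50.0, act=act)
    print(f"act={act}: expected regret = {r:.1f}")
```

    Choosing the action with the smaller expected regret integrates the type I/type II trade-off with the economics, rather than conditioning the decision on a bare "trend/no trend" test result.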

  13. HDBStat!: a platform-independent software suite for statistical analysis of high dimensional biology data.

    PubMed

    Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B

    2005-04-06

    Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an HTML report summarizing the data analysis. HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.

  14. Concerns regarding a call for pluralism of information theory and hypothesis testing

    USGS Publications Warehouse

    Lukacs, P.M.; Thompson, W.L.; Kendall, W.L.; Gould, W.R.; Doherty, P.F.; Burnham, K.P.; Anderson, D.R.

    2007-01-01

    1. Stephens et al. (2005) argue for 'pluralism' in statistical analysis, combining null hypothesis testing and information-theoretic (I-T) methods. We show that I-T methods are more informative even in single variable problems and we provide an ecological example. 2. I-T methods allow inferences to be made from multiple models simultaneously. We believe multimodel inference is the future of data analysis, which cannot be achieved with null hypothesis-testing approaches. 3. We argue for a stronger emphasis on critical thinking in science in general and less reliance on exploratory data analysis and data dredging. Deriving alternative hypotheses is central to science; deriving a single interesting science hypothesis and then comparing it to a default null hypothesis (e.g. 'no difference') is not an efficient strategy for gaining knowledge. We think this single-hypothesis strategy has been relied upon too often in the past. 4. We clarify misconceptions presented by Stephens et al. (2005). 5. We think inference should be made about models, directly linked to scientific hypotheses, and their parameters conditioned on data, Prob(Hj | data). I-T methods provide a basis for this inference. Null hypothesis testing merely provides a probability statement about the data conditioned on a null model, Prob(data | H0). 6. Synthesis and applications. I-T methods provide a more informative approach to inference. I-T methods provide a direct measure of evidence for or against hypotheses and a means to consider simultaneously multiple hypotheses as a basis for rigorous inference. Progress in our science can be accelerated if modern methods can be used intelligently; this includes various I-T and Bayesian methods.
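    To make the contrast with null hypothesis testing concrete, here is a minimal sketch of the I-T workflow the authors advocate: competing models are compared through AIC differences and Akaike weights rather than through a single P value. The AIC values below are invented for illustration.

```python
import numpy as np

# Convert AIC values for competing hypotheses into Akaike weights,
# which quantify the relative evidence for each model. AIC values
# here are hypothetical.
aic = np.array([230.1, 228.4, 235.0])   # AICs for models H1, H2, H3
delta = aic - aic.min()                 # AIC differences from the best model
weights = np.exp(-0.5 * delta)
weights /= weights.sum()                # Akaike weights, sum to 1

for name, w in zip(["H1", "H2", "H3"], weights):
    print(f"{name}: weight = {w:.3f}")
```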

  15. Testing competing forms of the Milankovitch hypothesis: A multivariate approach

    NASA Astrophysics Data System (ADS)

    Kaufmann, Robert K.; Juselius, Katarina

    2016-02-01

    We test competing forms of the Milankovitch hypothesis by estimating the coefficients and diagnostic statistics for a cointegrated vector autoregressive model that includes 10 climate variables and four exogenous variables for solar insolation. The estimates are consistent with the physical mechanisms postulated to drive glacial cycles. They show that the climate variables are driven partly by solar insolation, determining the timing and magnitude of glaciations and terminations, and partly by internal feedback dynamics, pushing the climate variables away from equilibrium. We argue that the latter is consistent with a weak form of the Milankovitch hypothesis and that it should be restated as follows: internal climate dynamics impose perturbations on glacial cycles that are driven by solar insolation. Our results show that these perturbations are likely caused by slow adjustment between land ice volume and solar insolation. The estimated adjustment dynamics show that solar insolation affects an array of climate variables other than ice volume, each at a unique rate. This implies that previous efforts to test the strong form of the Milankovitch hypothesis by examining the relationship between solar insolation and a single climate variable are likely to suffer from omitted variable bias.

  16. Usefulness of Leukocyte Esterase Test Versus Rapid Strep Test for Diagnosis of Acute Strep Pharyngitis

    PubMed Central

    2015-01-01

    Objective: A study to compare throat swab testing for leukocyte esterase on a test strip (urine dipstick, multi-stick) with the rapid strep test for rapid diagnosis of Group A beta-hemolytic streptococci in cases of acute pharyngitis in children. Hypothesis: Testing a throat swab for leukocyte esterase on the test strip currently used for urine testing may detect throat infection and might be as useful as the rapid strep test. Methods: All patients presenting with a complaint of sore throat and fever were examined clinically for erythema of the pharynx and tonsils and for any exudates. Informed consent was obtained from the parents and assent from the subjects. Three swabs were taken from the pharyngo-tonsillar region for culture, rapid strep testing, and leukocyte esterase (LE) testing. Results: Of 100 subjects, 9 were culture-positive; the rapid strep test was negative in 84 and positive in 16; the LE test was negative in 80 and positive in 20. Statistics: From the data configuration, the rapid strep and LE results do not appear to be randomly (independently) assigned but are strongly aligned, and the two tests gave very similar results. The calculated chi-squared value exceeds the tabulated value at 1 degree of freedom (P < 0.0001), so the null hypothesis is rejected in favor of the alternative. Conclusions: Leukocyte esterase on a throat swab, using the test strip currently used for urine dipstick testing, is as useful as the rapid strep test for rapid diagnosis of strep pharyngitis in children. PMID:27335975
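    The chi-squared calculation reported above can be reproduced in outline as follows. The joint counts in the 2x2 table are hypothetical, chosen only to be consistent with the reported marginals (rapid strep: 16 positive / 84 negative; LE: 20 positive / 80 negative); the abstract does not give the full cross-tabulation.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 agreement table between the two tests:
# rows = rapid strep (+, -), columns = LE (+, -).
table = [[16, 0],
         [4, 80]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
```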

  17. The fighting hypothesis in combat: how well does the fighting hypothesis explain human left-handed minorities?

    PubMed

    Groothuis, Ton G G; McManus, I C; Schaafsma, Sara M; Geuze, Reint H

    2013-06-01

    The strong population bias in hand preference in favor of right-handedness seems to be a typical human trait. An elegant evolutionary hypothesis explaining this trait is the so-called fighting hypothesis that postulates that left-handedness is under frequency-dependent selection. The fighting hypothesis assumes that left-handers, being in the minority because of health issues, are still maintained in the population since they would have a greater chance of winning in fights than right-handers due to a surprise effect. This review critically evaluates the assumptions and evidence for this hypothesis and concludes that some evidence, although consistent with the fighting hypothesis, does not directly support it and may also be interpreted differently. Other supportive data are ambiguous or open for both statistical and theoretical criticism. We conclude that, presently, evidence for the fighting hypothesis is not particularly strong, but that there is little evidence to reject it either. The hypothesis thus remains an intuitively plausible explanation for the persistent left-hand preference in the population. We suggest alternative explanations and several ways forward for obtaining more crucial data for testing this frequently cited hypothesis. © 2013 New York Academy of Sciences.

  18. Exchanging the liquidity hypothesis: Delay discounting of money and self-relevant non-money rewards

    PubMed Central

    Stuppy-Sullivan, Allison M.; Tormohlen, Kayla N.; Yi, Richard

    2015-01-01

    Evidence that primary rewards (e.g., food and drugs of abuse) are discounted more than money is frequently attributed to money's high degree of liquidity, or exchangeability for many commodities. The present study provides some evidence against this liquidity hypothesis by contrasting delay discounting of monetary rewards (liquid) and non-monetary commodities (non-liquid) that are self-relevant and utility-matched. Ninety-seven (97) undergraduate students initially completed a conventional binary-choice delay discounting of money task. Participants returned one week later and completed a self-relevant commodity delay discounting task. Both conventional hypothesis testing and more-conservative tests of statistical equivalence revealed correspondence in rate of delay discounting of money and self-relevant commodities, and in one magnitude condition, less discounting for the latter. The present results indicate that liquidity of money cannot fully account for the lower rate of delay discounting compared to non-money rewards. PMID:26556504

  19. Invited Commentary: The Need for Cognitive Science in Methodology.

    PubMed

    Greenland, Sander

    2017-09-15

    There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecessary and misleading, as in inferences based on whether a P value is "statistically significant"; 2) nullism, the tendency to privilege the hypothesis of no difference or no effect when there is no scientific basis for doing so, as when testing only the null hypothesis; and 3) statistical reification, treating hypothetical data distributions and statistical models as if they reflect known physical laws rather than speculative assumptions for thought experiments. As commonly misused, null-hypothesis significance testing combines these cognitive problems to produce highly distorted interpretation and reporting of study results. Interval estimation has so far proven to be an inadequate solution because it involves dichotomization, an avenue for nullism. Sensitivity and bias analyses have been proposed to address reproducibility problems (Am J Epidemiol. 2017;186(6):646-647); these methods can indeed address reification, but they can also introduce new distortions via misleading specifications for bias parameters. P values can be reframed to lessen distortions by presenting them without reference to a cutoff, providing them for relevant alternatives to the null, and recognizing their dependence on all assumptions used in their computation; they nonetheless require rescaling for measuring evidence. I conclude that methodological development and training should go beyond coverage of mechanistic biases (e.g., confounding, selection bias, measurement error) to cover distortions of conclusions produced by statistical methods and psychosocial forces. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. WASP (Write a Scientific Paper) using Excel - 8: t-Tests.

    PubMed

    Grech, Victor

    2018-06-01

    t-Testing is a common component of inferential statistics when comparing two means. This paper explains the central limit theorem and the concept of the null hypothesis as well as types of errors. On the practical side, this paper outlines how different t-tests may be performed in Microsoft Excel, for different purposes, both statically as well as dynamically, with Excel's functions. Copyright © 2018 Elsevier B.V. All rights reserved.
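    The paper's worked examples are in Excel; as a rough cross-check, the same Welch two-sample t-test (Excel's "two-sample assuming unequal variances") can be run in a few lines of Python. The data below are simulated, not from the paper.

```python
import numpy as np
from scipy import stats

# Simulated measurements for two independent groups.
rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 2.0, size=30)
group_b = rng.normal(11.0, 2.0, size=30)

# equal_var=False requests the Welch test.
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```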

  1. Statistical analysis of particle trajectories in living cells

    NASA Astrophysics Data System (ADS)

    Briane, Vincent; Kervrann, Charles; Vimond, Myriam

    2018-06-01

    Recent advances in molecular biology and fluorescence microscopy imaging have made possible the inference of the dynamics of molecules in living cells. Such inference allows us to understand and determine the organization and function of the cell. The trajectories of particles (e.g., biomolecules) in living cells, computed with the help of object tracking methods, can be modeled with diffusion processes. Three types of diffusion are considered: (i) free diffusion, (ii) subdiffusion, and (iii) superdiffusion. The mean-square displacement (MSD) is generally used to discriminate the three types of particle dynamics. We propose here a nonparametric three-decision test as an alternative to the MSD method. The rejection of the null hypothesis, i.e., free diffusion, is accompanied by claims of the direction of the alternative (subdiffusion or superdiffusion). We study the asymptotic behavior of the test statistic under the null hypothesis and under parametric alternatives which are currently considered in the biophysics literature. In addition, we adapt the multiple-testing procedure of Benjamini and Hochberg to fit with the three-decision-test setting, in order to apply the test procedure to a collection of independent trajectories. The performance of our procedure is much better than the MSD method as confirmed by Monte Carlo experiments. The method is demonstrated on real data sets corresponding to protein dynamics observed in fluorescence microscopy.
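    For context, the MSD baseline that the proposed three-decision test is compared against can be sketched in a few lines: the exponent alpha in MSD(tau) ~ tau^alpha separates subdiffusion (alpha < 1), free diffusion (alpha near 1), and superdiffusion (alpha > 1). The trajectory below is simulated Brownian motion and the msd helper is a hypothetical implementation, not the authors' code.

```python
import numpy as np

def msd(traj, max_lag):
    """Time-averaged mean-square displacement of a (n_steps, dim) trajectory."""
    return np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                     for lag in range(1, max_lag + 1)])

# Simulated 2-D free (Brownian) diffusion.
rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(0, 1, size=(1000, 2)), axis=0)

# Estimate alpha as the log-log slope of MSD versus lag.
lags = np.arange(1, 51)
alpha = np.polyfit(np.log(lags), np.log(msd(traj, 50)), 1)[0]
print(f"estimated exponent alpha = {alpha:.2f}")  # ~1 for free diffusion
```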

  2. The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels.

    PubMed

    Gontscharuk, Veronika; Landwehr, Sandra; Finner, Helmut

    2015-01-01

    The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov-Smirnov statistic, has a long history, dating back to the mid-seventies. Originally, HC statistics were used in connection with goodness of fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high dimensional data. The continuing interest in HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov-Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence it is favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency between the asymptotic and finite-sample behavior of the HC statistic prompts us to provide a new HC test that has better finite-sample properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
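    A minimal sketch of one common form of the HC statistic for testing the global null that all p-values are Uniform(0, 1). The normalization and index range vary across the literature, so this is not necessarily the exact variant studied in the paper.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """One common form of the HC statistic (index range and scaling vary)."""
    p = np.sort(np.asarray(pvals))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))   # search only the moderate-tail range
    return hc[:k].max()

# Under the global null the statistic is typically small.
rng = np.random.default_rng(2)
print(higher_criticism(rng.uniform(size=1000)))
```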

  3. Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.

    PubMed

    Counsell, Alyssa; Harlow, Lisa L

    2017-05-01

    With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t-tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses, and fewer than a third of the articles reported data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.

  4. Building Intuitions about Statistical Inference Based on Resampling

    ERIC Educational Resources Information Center

    Watson, Jane; Chance, Beth

    2012-01-01

    Formal inference, which makes theoretical assumptions about distributions and applies hypothesis testing procedures with null and alternative hypotheses, is notoriously difficult for tertiary students to master. The debate about whether this content should appear in Years 11 and 12 of the "Australian Curriculum: Mathematics" has gone on…

  5. SMAR Sessions

    NASA Technical Reports Server (NTRS)

    Ploutz-Snyder, Robert

    2011-01-01

    This slide presentation comprises a series of educational presentations on the statistical method of analysis of variance (ANOVA). ANOVA examines variability between groups, relative to variability within groups, to determine whether there is evidence that the groups are not from the same population. One other presentation reviews hypothesis testing.

  6. Confirmatory and Competitive Evaluation of Alternative Gene-Environment Interaction Hypotheses

    ERIC Educational Resources Information Center

    Belsky, Jay; Pluess, Michael; Widaman, Keith F.

    2013-01-01

    Background: Most gene-environment interaction (GXE) research, though based on clear, vulnerability-oriented hypotheses, is carried out using exploratory rather than hypothesis-informed statistical tests, limiting power and making formal evaluation of competing GXE propositions difficult. Method: We present and illustrate a new regression technique…

  7. Statistical Literacy: Simulations with Dolphins

    ERIC Educational Resources Information Center

    Strayer, Jeremy; Matuszewski, Amber

    2016-01-01

    In this article, Strayer and Matuszewski present a six-phase strategy that teachers can use to help students develop a conceptual understanding of inferential hypothesis testing through simulation. As Strayer and Matuszewski discuss the strategy, they describe each phase in general, explain how they implemented the phase while teaching their…

  8. Testing Pairwise Association between Spatially Autocorrelated Variables: A New Approach Using Surrogate Lattice Data

    PubMed Central

    Deblauwe, Vincent; Kennel, Pol; Couteron, Pierre

    2012-01-01

    Background: Independence between observations is a standard prerequisite of traditional statistical tests of association. This condition is, however, violated when autocorrelation is present within the data. In the case of variables that are regularly sampled in space (i.e. lattice data or images), such as those provided by remote-sensing or geographical databases, this problem is particularly acute. Because analytic derivation of the null probability distribution of the test statistic (e.g. Pearson's r) is not always possible when autocorrelation is present, we propose instead the use of a Monte Carlo simulation with surrogate data. Methodology/Principal Findings: The null hypothesis that two observed mapped variables are the result of independent pattern generating processes is tested here by generating sets of random image data while preserving the autocorrelation function of the original images. Surrogates are generated by matching the dual-tree complex wavelet spectra (and hence the autocorrelation functions) of white noise images with the spectra of the original images. The generated images can then be used to build the probability distribution function of any statistic of association under the null hypothesis. We demonstrate the validity of a statistical test of association based on these surrogates with both actual and synthetic data and compare it with a corrected parametric test and three existing methods that generate surrogates (randomization, random rotations and shifts, and iterative amplitude adjusted Fourier transform). Type I error control was excellent, even with strong and long-range autocorrelation, which is not the case for alternative methods. Conclusions/Significance: The wavelet-based surrogates are particularly appropriate in cases where autocorrelation appears at all scales or is direction-dependent (anisotropy). We explore the potential of the method for association tests involving a lattice of binary data and discuss its potential for validation of species distribution models. An implementation of the method in Java for the generation of wavelet-based surrogates is available online as supporting material. PMID:23144961
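    The paper's surrogates match dual-tree complex wavelet spectra of images (with a Java implementation); as a simplified one-dimensional analogue of the same Monte Carlo logic, the sketch below builds a null distribution for Pearson's r from Fourier phase-randomized surrogates that approximately preserve each series' autocorrelation. This is an illustration of the general surrogate-test loop, not the authors' method.

```python
import numpy as np

def phase_randomize(x, rng):
    """Surrogate with (approximately) the same power spectrum as x."""
    f = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0, 2 * np.pi, size=f.shape)
    phases[0] = 0.0   # keep the zero-frequency component real
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))

# Two independent but strongly autocorrelated series (random walks).
rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=500))
y = np.cumsum(rng.normal(size=500))

r_obs = np.corrcoef(x, y)[0, 1]
r_null = [np.corrcoef(phase_randomize(x, rng), phase_randomize(y, rng))[0, 1]
          for _ in range(999)]
p = (1 + np.sum(np.abs(r_null) >= abs(r_obs))) / (1 + len(r_null))
print(f"r = {r_obs:.2f}, surrogate p = {p:.3f}")
```

    A naive parametric test of r would often declare these two independent random walks significantly associated; the autocorrelation-preserving null distribution corrects for this.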

  9. Analyzing thematic maps and mapping for accuracy

    USGS Publications Warehouse

    Rosenfield, G.H.

    1982-01-01

    Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors of commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated step of data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multivariate analysis of variance.
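    A minimal sketch of the basic accuracy estimates described above, computed from a hypothetical classification error matrix (rows = interpretation, columns = verification), with a normal-approximation confidence interval on overall accuracy.

```python
import numpy as np

# Hypothetical 3-class classification error matrix.
m = np.array([[48,  2,  0],
              [ 5, 40,  5],
              [ 1,  3, 46]])

n = m.sum()
overall = np.trace(m) / n                  # overall proportion correct
se = np.sqrt(overall * (1 - overall) / n)  # binomial standard error
print(f"overall accuracy = {overall:.3f} "
      f"(approx. 95% CI {overall - 1.96 * se:.3f} to {overall + 1.96 * se:.3f})")

users = np.diag(m) / m.sum(axis=1)         # complements of errors of commission
producers = np.diag(m) / m.sum(axis=0)     # complements of errors of omission
print("user's accuracy:", np.round(users, 3))
print("producer's accuracy:", np.round(producers, 3))
```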

  10. Is there an association between astrological data and personality?

    PubMed

    Hume, N

    1977-07-01

    A test was made of the hypothesis that personality characteristics can be predicted on the basis of various features of the individual's astrological chart. Astrological charts were prepared for 196 college-age Ss who also were administered the MMPI and the Leary Interpersonal Check List. Ss were divided into those who had extreme scores on any of the 13 personality variables studied and those who did not. For each personality variable, comparisons were made on a large number of astrological dimensions between distributions of Ss with and without extreme test scores. Six hundred thirty-two such comparisons were made and evaluated with chi-square tests. Because the obtained number of statistically significant chi-squares was less than would be expected on a chance basis, the hypothesis was rejected.

  11. Detecting anomalies in CMB maps: a new method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com

    2015-10-01

    Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we "hand-pick" the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_{ℓm}'s with random coefficients, and they test the null hypothesis that the a_{ℓm}'s are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the "coherence" of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are much more generic than those that have hitherto been considered in the literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.

  12. A Bayesian Approach to the Paleomagnetic Conglomerate Test

    NASA Astrophysics Data System (ADS)

    Heslop, David; Roberts, Andrew P.

    2018-02-01

    The conglomerate test has served the paleomagnetic community for over 60 years as a means to detect remagnetizations. The test states that if a suite of clasts within a bed have uniformly random paleomagnetic directions, then the conglomerate cannot have experienced a pervasive event that remagnetized the clasts in the same direction. The current form of the conglomerate test is based on null hypothesis testing, which results in a binary "pass" (uniformly random directions) or "fail" (nonrandom directions) outcome. We have recast the conglomerate test in a Bayesian framework with the aim of providing more information concerning the level of support a given data set provides for a hypothesis of uniformly random paleomagnetic directions. Using this approach, we place the conglomerate test in a fully probabilistic framework that allows for inconclusive results when insufficient information is available to draw firm conclusions concerning the randomness or nonrandomness of directions. With our method, sample sets larger than those typically employed in paleomagnetism may be required to achieve strong support for a hypothesis of random directions. Given the potentially detrimental effect of unrecognized remagnetizations on paleomagnetic reconstructions, it is important to provide a means to draw statistically robust data-driven inferences. Our Bayesian analysis provides a means to do this for the conglomerate test.

  13. Evaluation of the Air Void Analyzer

    DTIC Science & Technology

    2013-07-01

    lack of measurement would help explain the difference in values shown. Brief descriptions of other unpublished testing (Wang et al. 2008)  CTL Group...structure measurements taken from the controlled laboratory mixtures. A three-phase approach was used to evaluate the machine. First, a global ...method. Hypothesis testing using t-statistics was performed to increase understanding of the data collected globally in terms of the processes used for

  14. Surveillance system and method having an adaptive sequential probability fault detection test

    NASA Technical Reports Server (NTRS)

    Herzog, James P. (Inventor); Bickford, Randall L. (Inventor)

    2005-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.
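    For orientation, the classical sequential probability ratio test (SPRT) at the core of this family of surveillance methods (this record and the two related patent records that follow) can be sketched for a Gaussian mean-shift alternative. The patents' adaptive fitting of the residual probability density is not reproduced here, and all parameters and thresholds are illustrative.

```python
import numpy as np

def sprt(residuals, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean=mu0 vs H1: mean=mu1 on a Gaussian residual signal."""
    upper = np.log((1 - beta) / alpha)   # cross above: decide "fault" (H1)
    lower = np.log(beta / (1 - alpha))   # cross below: decide "normal" (H0)
    llr = 0.0
    for t, x in enumerate(residuals, 1):
        # Log-likelihood ratio increment for N(mu1, sigma^2) vs N(mu0, sigma^2).
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "fault", t
        if llr <= lower:
            return "normal", t
    return "undecided", len(residuals)

rng = np.random.default_rng(4)
print(sprt(rng.normal(0.0, 1.0, size=200)))  # healthy residual signal
print(sprt(rng.normal(1.0, 1.0, size=200)))  # mean-shifted (faulty) signal
```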

  15. Surveillance system and method having an adaptive sequential probability fault detection test

    NASA Technical Reports Server (NTRS)

    Bickford, Randall L. (Inventor); Herzog, James P. (Inventor)

    2006-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.

  16. Surveillance System and Method having an Adaptive Sequential Probability Fault Detection Test

    NASA Technical Reports Server (NTRS)

    Bickford, Randall L. (Inventor); Herzog, James P. (Inventor)

    2008-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.

  17. Distributed Immune Systems for Wireless Network Information Assurance

    DTIC Science & Technology

    2010-04-26

    ratio test (SPRT), where the goal is to optimize a hypothesis testing problem given a trade-off between the probability of errors and the...using cumulative sum (CUSUM) and Girshik-Rubin-Shiryaev (GRSh) statistics. In sequential versions of the problem the sequential probability ratio ...the more complicated problems, in particular those where no clear mean can be established. We developed algorithms based on the sequential probability

  18. Social Networks: Rational Learning and Information Aggregation

    DTIC Science & Technology

    2009-09-01

    most closely related to our work in this genre are Bala and Goyal (1998, 2001), DeMarzo, Vayanos and Zwiebel (2003) and Golub and Jackson (2007). These...observations. DeMarzo, Vayanos and Zwiebel and Golub and Jackson also study similar environments and derive consensus-type results, whereby individuals in the...Cover T., "Hypothesis Testing with Finite Statistics," The Annals of Mathematical Statistics, vol. 40, no. 3, pp. 828-835, 1969. [18] DeMarzo P.M

  19. Towards a Phylogenetic Approach to the Composition of Species Complexes in the North and Central American Triatoma, Vectors of Chagas Disease

    PubMed Central

    de la Rúa, Nicholas M.; Bustamante, Dulce M.; Menes, Marianela; Stevens, Lori; Monroy, Carlota; Kilpatrick, William; Rizzo, Donna; Klotz, Stephen A.; Schmidt, Justin; Axen, Heather J.; Dorn, Patricia L.

    2014-01-01

    Phylogenetic relationships of insect vectors of parasitic diseases are important for understanding the evolution of epidemiologically relevant traits, and may be useful in vector control. The subfamily Triatominae (Hemiptera:Reduviidae) includes ~140 extant species arranged in five tribes comprised of 15 genera. The genus Triatoma is the most species-rich and contains important vectors of Trypanosoma cruzi, the causative agent of Chagas disease. Triatoma species were grouped into complexes originally by morphology and more recently with the addition of information from molecular phylogenetics (the four-complex hypothesis); however, without a strict adherence to monophyly. To date, the validity of proposed species complexes has not been tested by statistical tests of topology. The goal of this study was to clarify the systematics of 19 Triatoma species from North and Central America. We inferred their evolutionary relatedness using two independent data sets: the complete nuclear Internal Transcribed Spacer-2 ribosomal DNA (ITS-2 rDNA) and head morphometrics. In addition, we used the Shimodaira-Hasegawa statistical test of topology to assess the fit of the data to a set of competing systematic hypotheses (topologies). An unconstrained topology inferred from the ITS-2 data was compared to topologies constrained based on the four-complex hypothesis or one inferred from our morphometry results. The unconstrained topology represents a statistically significant better fit of the molecular data than either the four-complex or the morphometric topology. We propose an update to the composition of species complexes in the North and Central American Triatoma, based on a phylogeny inferred from ITS-2 as a first step towards updating the phylogeny of the complexes based on monophyly and statistical tests of topologies. PMID:24681261

  20. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

    In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
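    Echoing the advice above, here is a small sketch of reporting an effect estimate with a confidence interval rather than a bare P-value. The data are simulated, and using pooled degrees of freedom with a Welch-style standard error is a simplification.

```python
import numpy as np
from scipy import stats

# Simulated outcomes for a treated and a control group (n = 40 each).
rng = np.random.default_rng(5)
treated = rng.normal(5.4, 1.0, size=40)
control = rng.normal(5.0, 1.0, size=40)

diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 40 + control.var(ddof=1) / 40)
dof = 2 * 40 - 2   # simple equal-n approximation
half = stats.t.ppf(0.975, dof) * se
print(f"difference = {diff:.2f}, 95% CI [{diff - half:.2f}, {diff + half:.2f}]")
```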

  1. Assessment of resampling methods for causality testing: A note on the US inflation behavior

    PubMed Central

    Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees

    2017-01-01

    Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms. PMID:28708870
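    A minimal sketch of the time-shifted-surrogate idea with a deliberately simple coupling statistic: a lagged cross-correlation stands in for the partial transfer entropy, which requires considerably more machinery. Circularly shifting the driver destroys the driver-response coupling while leaving the driver's autocorrelation intact, which is what makes these shifts usable as null-hypothesis surrogates.

```python
import numpy as np

def coupling_stat(x, y, lag=1):
    """Stand-in causality statistic: absolute lagged cross-correlation."""
    return abs(np.corrcoef(x[:-lag], y[lag:])[0, 1])

# Simulated system where x drives y at lag 1.
rng = np.random.default_rng(6)
x = rng.normal(size=1001)
y = 0.6 * x[:-1] + rng.normal(size=1000)
x = x[:-1]

s_obs = coupling_stat(x, y)
# Time-shifted surrogates: circular shifts of the driver only.
shifts = rng.integers(50, len(x) - 50, size=999)
s_null = np.array([coupling_stat(np.roll(x, k), y) for k in shifts])
p = (1 + np.sum(s_null >= s_obs)) / (1 + len(s_null))
print(f"statistic = {s_obs:.2f}, surrogate p = {p:.3f}")
```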

  2. Assessment of resampling methods for causality testing: A note on the US inflation behavior.

    PubMed

    Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees

    2017-01-01

    Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms.

  3. Comparison of Measures of Predictive Power.

    ERIC Educational Resources Information Center

    Tarling, Roger

    1982-01-01

    The Mean Cost Rating, P(A) from Signal Detection Theory, Kendall's rank correlation coefficient tau, and Goodman and Kruskal's gamma measures of predictive power are compared and shown to be different transformations of the statistic S. Gamma is generally preferred for hypothesis testing. Measures of association for ordered contingency tables are…

  4. Non-Bayesian Inference: Causal Structure Trumps Correlation

    ERIC Educational Resources Information Center

    Bes, Benedicte; Sloman, Steven; Lucas, Christopher G.; Raufaste, Eric

    2012-01-01

    The study tests the hypothesis that conditional probability judgments can be influenced by causal links between the target event and the evidence even when the statistical relations among variables are held constant. Three experiments varied the causal structure relating three variables and found that (a) the target event was perceived as more…

  5. Mind Your p's and Alphas.

    ERIC Educational Resources Information Center

    Stallings, William M.

    In the educational research literature alpha, the a priori level of significance, and p, the a posteriori probability of obtaining a test statistic of at least a certain value when the null hypothesis is true, are often confused. Explanations for this confusion are offered. Paradoxically, alpha retains a prominent place in textbook discussions of…

  6. Constructing the Exact Significance Level for a Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Liou, Michelle; Chang, Chih-Hsin

    1992-01-01

    An extension is proposed for the network algorithm introduced by C.R. Mehta and N.R. Patel to construct exact tail probabilities for testing the general hypothesis that item responses are distributed according to the Rasch model. A simulation study indicates the efficiency of the algorithm. (SLD)

  7. Unicorns do exist: a tutorial on "proving" the null hypothesis.

    PubMed

    Streiner, David L

    2003-12-01

    Introductory statistics classes teach us that we can never prove the null hypothesis; all we can do is reject or fail to reject it. However, there are times when it is necessary to try to prove the nonexistence of a difference between groups. This most often happens within the context of comparing a new treatment against an established one and showing that the new intervention is not inferior to the standard. This article first outlines the logic of "noninferiority" testing by differentiating between the null hypothesis (that which we are trying to nullify) and the "nil" hypothesis (there is no difference), reversing the role of the null and alternate hypotheses, and defining an interval within which groups are said to be equivalent. We then work through an example and show how to calculate sample sizes for noninferiority studies.
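    A minimal sketch of the noninferiority decision rule described above: declare the new treatment noninferior when the lower confidence bound for (new - standard) clears a prespecified margin -delta. The margin, data, and one-sided level below are illustrative only, not from the article.

```python
import numpy as np
from scipy import stats

delta = 0.5   # prespecified noninferiority margin (assumed)
rng = np.random.default_rng(7)
new = rng.normal(8.0, 1.5, size=60)       # simulated new-treatment outcomes
std = rng.normal(8.0, 1.5, size=60)       # simulated standard-treatment outcomes

diff = new.mean() - std.mean()
se = np.sqrt(new.var(ddof=1) / 60 + std.var(ddof=1) / 60)
lower = diff - stats.norm.ppf(0.95) * se  # one-sided 95% lower bound
print(f"diff = {diff:.2f}, lower bound = {lower:.2f}, "
      f"noninferior: {lower > -delta}")
```

    Note how the roles are reversed relative to ordinary testing: "new is inferior by at least delta" plays the null, and rejecting it supports noninferiority.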

  8. Report on the search for atmospheric holes using airs image data

    NASA Technical Reports Server (NTRS)

    Reinleitner, Lee A.

    1991-01-01

    Frank et al. (1986) presented a very controversial hypothesis which states that the Earth is being bombarded by water-vapor clouds resulting from the disruption and vaporization of small comets. This hypothesis was based on single-pixel intensity decreases in the images of the Earth's dayglow emissions at vacuum-ultraviolet (VUV) wavelengths using the DE-1 imager. These dark spots, or atmospheric holes, are hypothesized to be the result of VUV absorption by a water-vapor cloud between the imager and the dayglow-emitting region. Examined here is the VUV data set from the Auroral Ionospheric Remote Sensor (AIRS) instrument that was flown on the Polar BEAR satellite. AIRS was uniquely situated to test this hypothesis. Due to the altitude of the sensor, the holes should show multi-pixel intensity decreases in a scan line. A statistical estimate indicated that sufficient 130.4-nm data from AIRS existed to detect eight to nine such holes, but none was detected. The probability of this occurring is less than 1.0 x 10^-4. A statistical estimate indicated that sufficient 135.6-nm data from AIRS existed to detect approximately two holes, and two ambiguous cases are shown. In spite of the two ambiguous cases, the 135.6-nm data did not show clear support for the small-comet hypothesis. The 130.4-nm data clearly do not support the small-comet hypothesis.

  9. EFFICIENTLY ESTABLISHING CONCEPTS OF INFERENTIAL STATISTICS AND HYPOTHESIS DECISION MAKING THROUGH CONTEXTUALLY CONTROLLED EQUIVALENCE CLASSES

    PubMed Central

    Fienup, Daniel M; Critchfield, Thomas S

    2010-01-01

    Computerized lessons that reflect stimulus equivalence principles were used to teach college students concepts related to inferential statistics and hypothesis decision making. Lesson 1 taught participants concepts related to inferential statistics, and Lesson 2 taught them to base hypothesis decisions on a scientific hypothesis and the direction of an effect. Lesson 3 taught the conditional influence of inferential statistics over decisions regarding the scientific and null hypotheses. Participants entered the study with low scores on the targeted skills and left the study demonstrating a high level of accuracy on these skills, which involved mastering more relations than were taught formally. This study illustrates the efficiency of equivalence-based instruction in establishing academic skills in sophisticated learners. PMID:21358904

  10. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies

    PubMed Central

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-01-01

    For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a predefined set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effects (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651

  11. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association.

    PubMed

    el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J

    2007-09-24

    In this paper, we propose a one-degree-of-freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi-square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has a frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has a frequency above 0.2 and the number of variants is above five.

  12. A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”

    NASA Astrophysics Data System (ADS)

    Kravtsov, Sergey; Saunders, Rolando Olivas

    2011-02-01

    In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
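    The point is easy to reproduce: a chi-squared goodness-of-fit test applied to a long record simulated under a true null of uniform weekday counts rejects only at the nominal rate, no matter how large the sample. A minimal sketch with simulated counts (not the actual earthquake catalogue):

```python
import numpy as np
from scipy.stats import chisquare

# A large "record": 70,000 events spread uniformly over 7 weekdays.
rng = np.random.default_rng(8)
counts = rng.multinomial(70000, [1 / 7] * 7)

# H0: equal expected counts per weekday (chisquare's default).
stat, p = chisquare(counts)
print(f"chi2 = {stat:.1f}, p = {p:.3f}")
# A large n alone does not force rejection of a true null hypothesis.
```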

  13. In Response: Biological arguments for selecting effect sizes in ecotoxicological testing—A governmental perspective

    USGS Publications Warehouse

    Mebane, Christopher A.

    2015-01-01

    Criticisms of the uses of the no-observed-effect concentration (NOEC) and the lowest-observed-effect concentration (LOEC) and more generally the entire null hypothesis statistical testing scheme are hardly new or unique to the field of ecotoxicology [1-4]. Among the criticisms of NOECs and LOECs is that statistically similar LOECs (in terms of p value) can represent drastically different levels of effect. For instance, my colleagues and I found that a battery of chronic toxicity tests with different species and endpoints yielded LOECs with minimum detectable differences ranging from 3% to 48% reductions from controls [5].

  14. The use of analysis of variance procedures in biological studies

    USGS Publications Warehouse

    Williams, B.K.

    1987-01-01

    The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.
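
    The parametrization issue the paper clarifies is visible in standard software: for unbalanced data, different sum-of-squares conventions test different main-effect hypotheses. A hedged sketch with statsmodels and invented data:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(1)
    # An unbalanced two-factor layout: factor A has 8 "lo" and 4 "hi" cases.
    df = pd.DataFrame({"a": ["lo"] * 8 + ["hi"] * 4, "b": ["x", "y"] * 6})
    df["y"] = rng.normal(size=len(df)) + (df["a"] == "hi")

    model = ols("y ~ C(a) * C(b)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # Type II main-effect hypotheses
    print(sm.stats.anova_lm(model, typ=3))  # Type III; differs when unbalanced
    ```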

  15. Ranking nano-enabled hybrid media for simultaneous removal of contaminants with different chemistries: Pseudo-equilibrium sorption tests versus column tests.

    PubMed

    Custodio, Tomas; Garcia, Jose; Markovski, Jasmina; McKay Gifford, James; Hristovski, Kiril D; Olson, Larry W

    2017-12-15

    The underlying hypothesis of this study was that pseudo-equilibrium and column testing conditions would provide the same sorbent ranking trends, although the values of sorbents' performance descriptors (e.g., sorption capacity) may vary because of the different kinetics and competition effects induced by the two testing approaches. To address this hypothesis, nano-enabled hybrid media were fabricated and their removal performance was assessed for two model contaminants under multi-point batch pseudo-equilibrium and continuous-flow conditions. Calculation of simultaneous removal capacity (SRC) indices demonstrated that the more resource-demanding continuous-flow tests generate the same performance rankings as the ones obtained by conducting the simpler pseudo-equilibrium tests. Furthermore, continuous overlap between the 98% confidence boundaries for each SRC index trend not only validated the hypothesis that both testing conditions provide the same ranking trends, but also indicated that the SRC indices are statistically the same for each medium, regardless of the employed method. In scenarios where rapid screening of new media is required to obtain the best-performing synthesis formulation, use of pseudo-equilibrium tests proved to be reliable. Because kinetics-induced effects on sorption capacity must not be neglected, the more resource-demanding column tests could be conducted only with the top-performing media that exhibit the highest sorption capacity. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Calculating p-values and their significances with the Energy Test for large datasets

    NASA Astrophysics Data System (ADS)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
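
    The brute-force alternative that such scaling methods aim to avoid is resampling the null distribution directly. The sketch below shows that baseline permutation approach for a generic two-sample statistic; a simple mean difference stands in for the energy test's T-value, which it does not implement.

    ```python
    import numpy as np

    def perm_pvalue(x, y, stat, n_perm=2000, seed=0):
        """One-sided permutation p-value for a two-sample statistic."""
        rng = np.random.default_rng(seed)
        pooled = np.concatenate([x, y])
        observed = stat(x, y)
        exceed = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
                exceed += 1
        return (exceed + 1) / (n_perm + 1)

    x = np.random.default_rng(1).normal(0.0, 1.0, 200)
    y = np.random.default_rng(2).normal(0.2, 1.0, 200)
    print(perm_pvalue(x, y, lambda a, b: abs(a.mean() - b.mean())))
    ```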

  17. In the Beginning-There Is the Introduction-and Your Study Hypothesis.

    PubMed

    Vetter, Thomas R; Mascha, Edward J

    2017-05-01

    Writing a manuscript for a medical journal is much like writing a newspaper article, albeit a scholarly one. Like any journalist, you have a story to tell. You need to tell your story in a way that is easy to follow and makes a compelling case to the reader. Although recommended since the beginning of the 20th century, the conventional Introduction-Methods-Results-And-Discussion (IMRAD) scientific reporting structure has only been the standard since the 1980s. The Introduction should be focused and succinct in communicating the significance, background, rationale, study aims or objectives, and the primary (and secondary, if appropriate) study hypotheses. Hypothesis testing involves posing both a null and an alternative hypothesis. The null hypothesis proposes that no difference or association exists on the outcome variable of interest between the interventions or groups being compared. The alternative hypothesis is the opposite of the null hypothesis and thus typically proposes that a difference in the population does exist between the groups being compared on the parameter of interest. Most investigators seek to reject the null hypothesis because of their expectation that the studied intervention does result in a difference between the study groups or that the association of interest does exist. Therefore, in most clinical and basic science studies and manuscripts, the alternative hypothesis is stated, not the null hypothesis. Also, in the Introduction, the alternative hypothesis is typically stated in the direction of interest, or the expected direction. However, when assessing the association of interest, researchers typically look in both directions (ie, favoring 1 group or the other) by conducting a 2-tailed statistical test, because the true direction of the effect is typically not known and either direction would be important to report.
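
    A minimal worked example of the two-tailed test described above, using simulated data: the alternative hypothesis may be stated directionally in the Introduction, but the test itself looks in both directions.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(10.0, 2.0, size=40)
    treated = rng.normal(11.0, 2.0, size=40)

    # ttest_ind is two-sided by default: either direction counts as
    # evidence against the null hypothesis of no group difference.
    result = stats.ttest_ind(treated, control)
    print(result.statistic, result.pvalue)
    ```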

  18. A SIGNIFICANCE TEST FOR THE LASSO

    PubMed Central

    Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

    2014-01-01

    In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ₁² distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ₁² under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1). PMID:25574062
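
    The adaptivity point is easy to verify by simulation. The sketch below is not the covariance test itself; it simply shows that, under the global null, the RSS drop for the best of p candidate predictors is stochastically much larger than χ₁².

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, p, reps = 100, 50, 2000
    drops = np.empty(reps)
    for i in range(reps):
        X = rng.normal(size=(n, p))
        y = rng.normal(size=n)               # global null: no true signal
        y -= y.mean()
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        # For standardized predictors, the RSS drop from adding predictor j
        # alone is (x_j' y)^2 / n; adaptive selection takes the largest of p.
        drops[i] = ((X.T @ y) ** 2 / n).max()

    print("mean adaptive RSS drop:", drops.mean(), "(chi-squared(1) mean is 1)")
    print("rejection rate at the naive 5% cutoff:",
          (drops > stats.chi2.ppf(0.95, df=1)).mean())
    ```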

  19. FADTTSter: accelerating hypothesis testing with functional analysis of diffusion tensor tract statistics

    NASA Astrophysics Data System (ADS)

    Noel, Jean; Prieto, Juan C.; Styner, Martin

    2017-03-01

    Functional Analysis of Diffusion Tensor Tract Statistics (FADTTS) is a toolbox for analysis of white matter (WM) fiber tracts. It allows associating diffusion properties along major WM bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these WM tract properties. However, to use this toolbox, a user must have an intermediate knowledge of scripting languages (MATLAB). FADTTSter was created to overcome this issue and make the statistical analysis accessible to any non-technical researcher. FADTTSter is actively being used by researchers at the University of North Carolina. FADTTSter guides non-technical users through a series of steps, including quality control of subjects and fibers, in order to set up the necessary parameters to run FADTTS. Additionally, FADTTSter implements interactive charts for FADTTS' outputs. These interactive charts enhance the researcher's experience and facilitate the analysis of the results. FADTTSter's motivation is to improve usability and provide a new analysis tool to the community that complements FADTTS. Ultimately, by making FADTTS accessible to a broader audience, FADTTSter seeks to accelerate hypothesis testing in neuroimaging studies involving heterogeneous clinical data and diffusion tensor imaging. This work is submitted to the Biomedical Applications in Molecular, Structural, and Functional Imaging conference. The source code of this application is available in NITRC.

  20. OSO 8 observational limits to the acoustic coronal heating mechanism

    NASA Technical Reports Server (NTRS)

    Bruner, E. C., Jr.

    1981-01-01

    An improved analysis of time-resolved line profiles of the C IV resonance line at 1548 A has been used to test the acoustic wave hypothesis of solar coronal heating. It is shown that the observed motions and brightness fluctuations are consistent with the existence of acoustic waves. Specific account is taken of the effect of photon statistics on the observed velocities, and a test is devised to determine whether the motions represent propagating or evanescent waves. It is found that on average about as much energy is carried upward as downward, such that the net acoustic flux density is statistically consistent with zero. The statistical uncertainty in this null result is three orders of magnitude lower than the flux level needed to heat the corona.

  1. Use and effect of clinical photographic records on the motivation and oral hygiene practices of a group of mentally handicapped adults.

    PubMed

    Bickley, S R; Shaw, L; Shaw, M J

    1990-01-01

    A study was undertaken to test the hypothesis that clinical photographic records can be used to motivate oral hygiene performance in mentally handicapped adults. Plaque reduction over the period of the study was shown to be higher in the test group than the control group, but the differences between the test and control groups were not statistically significant. The small numbers involved (29) and the difficulties in matching subjects may have militated against demonstrating a statistically significant difference between the two groups. All participants demonstrated high levels of toothbrushing ability during the practical aspects of the study, but this was not maintained through daily oral hygiene practices in the majority of subjects.

  2. The Epistemology of Mathematical and Statistical Modeling: A Quiet Methodological Revolution

    ERIC Educational Resources Information Center

    Rodgers, Joseph Lee

    2010-01-01

    A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the…

  3. Effects of Instructional Design with Mental Model Analysis on Learning.

    ERIC Educational Resources Information Center

    Hong, Eunsook

    This paper presents a model for systematic instructional design that includes mental model analysis together with the procedures used in developing computer-based instructional materials in the area of statistical hypothesis testing. The instructional design model is based on the premise that the objective for learning is to achieve expert-like…

  4. Spatial autocorrelation in growth of undisturbed natural pine stands across Georgia

    Treesearch

    Raymond L. Czaplewski; Robin M. Reich; William A. Bechtold

    1994-01-01

    Moran's I statistic measures the spatial autocorrelation in a random variable measured at discrete locations in space. Permutation procedures test the null hypothesis that the observed Moran's I value is no greater than that expected by chance. The spatial autocorrelation of gross basal area increment is analyzed for undisturbed, naturally regenerated stands...
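
    A hedged sketch of the permutation logic summarized above: compute Moran's I from a spatial weights matrix, then compare the observed value with values obtained by randomly relabelling the locations. The weights and data here are invented, not the Georgia pine-stand measurements.

    ```python
    import numpy as np

    def morans_i(x, W):
        """Moran's I for values x with spatial weight matrix W."""
        z = x - x.mean()
        return len(x) / W.sum() * (z @ W @ z) / (z @ z)

    rng = np.random.default_rng(0)
    n = 30
    coords = rng.uniform(0, 10, size=(n, 2))
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((dist > 0) & (dist < 3)).astype(float)  # neighbours within distance 3
    x = coords[:, 0] + rng.normal(0, 1, size=n)  # variable with a spatial trend

    observed = morans_i(x, W)
    null = np.array([morans_i(rng.permutation(x), W) for _ in range(999)])
    p = (np.sum(null >= observed) + 1) / (999 + 1)  # one-sided permutation test
    print(observed, p)
    ```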

  5. Effect of an Interactive Component on Students' Conceptual Understanding of Hypothesis Testing

    ERIC Educational Resources Information Center

    Inkpen, Sarah Anne

    2016-01-01

    The Premier Technical College of Qatar (PTC-Q) has seen high failure rates among students taking a college statistics course. The students are English as a foreign language (EFL) learners in business studies and health sciences. Course delivery has involved conventional content/curriculum-centered instruction with minimal to no interactive…

  6. Intimate Homicide between Asians and Non-Asians: The Impact of Community Context

    ERIC Educational Resources Information Center

    Wu, Bohsiu

    2009-01-01

    This study tests two competing hypotheses regarding the social structural dynamics of intimate homicide: backlash versus collective efficacy. This study also examines the role of race in how social factors specified in each hypothesis affect intimate homicide. Data are from the California Vital Statistics and Homicide Data, 1990-1999. Results from…

  7. Diagnosis of Cognitive Errors by Statistical Pattern Recognition Methods.

    ERIC Educational Resources Information Center

    Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.

    The rule space model permits measurement of cognitive skill acquisition, diagnosis of cognitive errors, and detection of the strengths and weaknesses of knowledge possessed by individuals. Two ways to classify an individual into his or her most plausible latent state of knowledge include: (1) hypothesis testing--Bayes' decision rules for minimum…

  8. Dispositional and Explanatory Style Optimism as Potential Moderators of the Relationship between Hopelessness and Suicidal Ideation

    ERIC Educational Resources Information Center

    Hirsch, Jameson K.; Conner, Kenneth R.

    2006-01-01

    To test the hypothesis that higher levels of optimism reduce the association between hopelessness and suicidal ideation, 284 college students completed self-report measures of optimism and Beck scales for hopelessness, suicidal ideation, and depression. A statistically significant interaction between hopelessness and one measure of optimism was…

  9. Statistics for Radiology Research.

    PubMed

    Obuchowski, Nancy A; Subhas, Naveen; Polster, Joshua

    2017-02-01

    Biostatistics is an essential component in most original research studies in imaging. In this article we discuss five key statistical concepts for study design and analyses in modern imaging research: statistical hypothesis testing, particularly focusing on noninferiority studies; imaging outcomes especially when there is no reference standard; dealing with the multiplicity problem without spending all your study power; relevance of confidence intervals in reporting and interpreting study results; and finally tools for assessing quantitative imaging biomarkers. These concepts are presented first as examples of conversations between investigator and biostatistician, and then more detailed discussions of the statistical concepts follow. Three skeletal radiology examples are used to illustrate the concepts.

  10. Quantifying (dis)agreement between direct detection experiments in a halo-independent way

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feldstein, Brian; Kahlhoefer, Felix, E-mail: brian.feldstein@physics.ox.ac.uk, E-mail: felix.kahlhoefer@physics.ox.ac.uk

    We propose an improved method to study recent and near-future dark matter direct detection experiments with small numbers of observed events. Our method determines in a quantitative and halo-independent way whether the experiments point towards a consistent dark matter signal and identifies the best-fit dark matter parameters. To achieve true halo independence, we apply a recently developed method based on finding the velocity distribution that best describes a given set of data. For a quantitative global analysis we construct a likelihood function suitable for small numbers of events, which allows us to determine the best-fit particle physics properties of dark matter considering all experiments simultaneously. Based on this likelihood function we propose a new test statistic that quantifies how well the proposed model fits the data and how large the tension between different direct detection experiments is. We perform Monte Carlo simulations in order to determine the probability distribution function of this test statistic and to calculate the p-value for both the dark matter hypothesis and the background-only hypothesis.

  11. Long working hours and use of psychotropic medicine: a follow-up study with register linkage.

    PubMed

    Hannerz, Harald; Albertsen, Karen

    2016-03-01

    This study aimed to investigate the possibility of a prospective association between long working hours and use of psychotropic medicine. Survey data drawn from random samples of the general working population of Denmark in the time period 1995-2010 were linked to national registers covering all inhabitants. The participants were followed for first occurrence of redeemed prescriptions for psychotropic medicine. The primary analysis included 25,959 observations (19,259 persons) and yielded a total of 2914 new cases of psychotropic drug use in 99,018 person-years at risk. Poisson regression was used to model incidence rates of redeemed prescriptions for psychotropic medicine as a function of working hours (32-40, 41-48, >48 hours/week). The analysis was controlled for gender, age, sample, shift work, and socioeconomic status. A likelihood ratio test was used to test the null hypothesis, which stated that the incidence rates were independent of weekly working hours. The likelihood ratio test did not reject the null hypothesis (P=0.085). The rate ratio (RR) was 1.04 [95% confidence interval (95% CI) 0.94-1.15] for the contrast 41-48 versus 32-40 work hours/week and 1.15 (95% CI 1.02-1.30) for >48 versus 32-40 hours/week. None of the rate ratios that were estimated in the present study were statistically significant after adjustment for multiple testing. However, stratified analyses, in which 30 RR were estimated, generated the hypothesis that overtime work (>48 hours/week) might be associated with an increased risk among night or shift workers (RR=1.51, 95% CI 1.15-1.98). The present study did not find a statistically significant association between long working hours and incidence of psychotropic drug usage among Danish employees.
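
    A hedged sketch of the modelling setup described above: Poisson regression of case counts on a working-hours band, with log person-years as an offset so that exponentiated coefficients are incidence rate ratios. The data frame is simulated and the covariates used in the study are omitted.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "hours": pd.Categorical(["32-40", "41-48", ">48"] * 100),
        "cases": rng.poisson(3, size=300),           # redeemed prescriptions
        "pyrs": rng.uniform(50.0, 150.0, size=300),  # person-years at risk
    })

    # log(person-years) enters as an offset, so exp of the hours
    # coefficients are rate ratios relative to the 32-40 reference band.
    fit = smf.poisson("cases ~ hours", data=df, offset=np.log(df["pyrs"])).fit()
    print(np.exp(fit.params))
    ```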

  12. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

    PubMed Central

    Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

    2013-01-01

    We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classifying as significant those patterns that result from chance overlap of real assembly activity and background spiking. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows reliable detection of active assemblies in massively parallel spike trains. PMID:24167487

  13. On use of the multistage dose-response model for assessing laboratory animal carcinogenicity

    PubMed Central

    Nitcheva, Daniella; Piegorsch, Walter W.; West, R. Webster

    2007-01-01

    We explore how well a statistical multistage model describes dose-response patterns in laboratory animal carcinogenicity experiments from a large database of quantal response data. The data are collected from the U.S. EPA’s publicly available IRIS data warehouse and examined statistically to determine how often higher-order values in the multistage predictor yield significant improvements in explanatory power over lower-order values. Our results suggest that the addition of a second-order parameter to the model only improves the fit about 20% of the time, while adding even higher-order terms apparently does not contribute to the fit at all, at least with the study designs we captured in the IRIS database. Also included is an examination of statistical tests for assessing significance of higher-order terms in a multistage dose-response model. It is noted that bootstrap testing methodology appears to offer greater stability for performing the hypothesis tests than a more-common, but possibly unstable, “Wald” test. PMID:17490794

  14. An accurate test for homogeneity of odds ratios based on Cochran's Q-statistic.

    PubMed

    Kulinskaya, Elena; Dollinger, Michael B

    2015-06-10

    A frequently used statistic for testing homogeneity in a meta-analysis of K independent studies is Cochran's Q. For a standard test of homogeneity the Q statistic is referred to a chi-square distribution with K-1 degrees of freedom. For the situation in which the effects of the studies are logarithms of odds ratios, the chi-square distribution is much too conservative for moderate size studies, although it may be asymptotically correct as the individual studies become large. Using a mixture of theoretical results and simulations, we provide formulas to estimate the shape and scale parameters of a gamma distribution to fit the distribution of Q. Simulation studies show that the gamma distribution is a good approximation to the distribution for Q. Use of the gamma distribution instead of the chi-square distribution for Q should eliminate inaccurate inferences in assessing homogeneity in a meta-analysis. (A computer program for implementing this test is provided.) This hypothesis test is competitive with the Breslow-Day test both in accuracy of level and in power.
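
    For reference, the statistic in question is straightforward to compute; what the paper changes is the distribution it is referred to. A minimal sketch with invented log odds ratios:

    ```python
    import numpy as np
    from scipy import stats

    log_or = np.array([0.42, 0.10, 0.55, -0.08, 0.31])  # K = 5 study effects
    var = np.array([0.09, 0.12, 0.20, 0.15, 0.07])      # sampling variances

    w = 1.0 / var
    theta = np.sum(w * log_or) / np.sum(w)   # inverse-variance pooled effect
    Q = np.sum(w * (log_or - theta) ** 2)    # Cochran's Q

    # Standard (possibly too conservative) reference: chi-square, K - 1 df.
    print(Q, stats.chi2.sf(Q, df=len(log_or) - 1))
    ```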

  15. Exchanging the liquidity hypothesis: Delay discounting of money and self-relevant non-money rewards.

    PubMed

    Stuppy-Sullivan, Allison M; Tormohlen, Kayla N; Yi, Richard

    2016-01-01

    Evidence that primary rewards (e.g., food and drugs of abuse) are discounted more than money is frequently attributed to money's high degree of liquidity, or exchangeability for many commodities. The present study provides some evidence against this liquidity hypothesis by contrasting delay discounting of monetary rewards (liquid) and non-monetary commodities (non-liquid) that are self-relevant and utility-matched. Ninety-seven (97) undergraduate students initially completed a conventional binary-choice delay discounting of money task. Participants returned one week later and completed a self-relevant commodity delay discounting task. Both conventional hypothesis testing and more-conservative tests of statistical equivalence revealed correspondence in rate of delay discounting of money and self-relevant commodities, and in one magnitude condition, less discounting for the latter. The present results indicate that liquidity of money cannot fully account for the lower rate of delay discounting compared to non-money rewards. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Isotopic Resonance Hypothesis: Experimental Verification by Escherichia coli Growth Measurements

    NASA Astrophysics Data System (ADS)

    Xie, Xueshu; Zubarev, Roman A.

    2015-03-01

    Isotopic composition of reactants affects the rates of chemical and biochemical reactions. As a rule, enrichment of heavy stable isotopes leads to progressively slower reactions. But the recent isotopic resonance hypothesis suggests that the dependence of the reaction rate upon the enrichment degree is not monotonous. Instead, at some "resonance" isotopic compositions the kinetics accelerate, while at "off-resonance" compositions the same reactions progress more slowly. To test the predictions of this hypothesis for the elements C, H, N and O, we designed a precise (standard error ±0.05%) experiment that measures the parameters of bacterial growth in minimal media with varying isotopic composition. A number of predicted resonance conditions were tested, with significant enhancements in kinetics discovered at these conditions. The combined statistics extremely strongly supports the validity of the isotopic resonance phenomenon (p << 10⁻¹⁵). This phenomenon has numerous implications for origin-of-life studies and astrobiology, and possible applications in agriculture, biotechnology, medicine, chemistry and other areas.

  17. Bayesian evaluation of effect size after replicating an original study

    PubMed Central

    van Aert, Robbie C. M.; van Assen, Marcel A. L. M.

    2017-01-01

    The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and the Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. Both the original study and the replication were statistically significant in 36.1% of cases in RPP and 68.8% in EE-RP, suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the method's performance and demonstrate the necessity to control for the original study's significance to enable the accumulation of evidence for a true zero effect. Then we apply the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the studies included in EE-RP are generally larger than in RPP, but that the sample sizes, especially of the studies included in RPP, are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication, akin to power analysis in null hypothesis significance testing, and present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method. PMID:28388646

  18. Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research

    PubMed Central

    Bakker, Marjan; Wicherts, Jelte M.

    2014-01-01

    Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606

  19. Changes in Occupational Radiation Exposures after Incorporation of a Real-time Dosimetry System in the Interventional Radiology Suite.

    PubMed

    Poudel, Sashi; Weir, Lori; Dowling, Dawn; Medich, David C

    2016-08-01

    A statistical pilot study was retrospectively performed to analyze potential changes in occupational radiation exposures to Interventional Radiology (IR) staff at Lawrence General Hospital after implementation of the i2 Active Radiation Dosimetry System (Unfors RaySafe Inc, 6045 Cochran Road, Cleveland, OH 44139-3302). In this study, the monthly OSL dosimetry records obtained during the eight-month period prior to i2 implementation were normalized to the number of procedures performed during each month and statistically compared to the normalized dosimetry records obtained for the 8-mo period after i2 implementation. The resulting statistics included the mean and standard deviation of the dose equivalent per procedure, together with appropriate hypothesis tests to assess for statistically significant differences between the pre- and post-i2 study periods. Hypothesis testing was performed on three groups of staff present during an IR procedure: the first group included all members of the IR staff, the second group consisted of the IR radiologists, and the third group consisted of the IR technician staff. After implementing the i2 active dosimetry system, participating members of the Lawrence General IR staff had a reduction in the average dose equivalent per procedure of 43.1% ± 16.7% (p = 0.04). Similarly, Lawrence General IR radiologists had a 65.8% ± 33.6% (p = 0.01) reduction, while the technologists had a 45.0% ± 14.4% (p = 0.03) reduction.

  20. A single test for rejecting the null hypothesis in subgroups and in the overall sample.

    PubMed

    Lin, Yunzhi; Zhou, Kefei; Ganju, Jitendra

    2017-01-01

    In clinical trials, some patient subgroups are likely to demonstrate larger effect sizes than other subgroups. For example, the effect size, or informally the benefit with treatment, is often greater in patients with a moderate condition of a disease than in those with a mild condition. A limitation of the usual method of analysis is that it does not incorporate this ordering of effect size by patient subgroup. We propose a test statistic which supplements the conventional test by including this information and simultaneously tests the null hypothesis in pre-specified subgroups and in the overall sample. It results in more power than the conventional test when the differences in effect sizes across subgroups are at least moderately large; otherwise it loses power. The method involves combining p-values from models fit to pre-specified subgroups and the overall sample in a manner that assigns greater weight to subgroups in which a larger effect size is expected. Results are presented for randomized trials with two and three subgroups.
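
    One simple way to combine p-values with pre-specified weights is a weighted Stouffer combination, sketched below. The paper's statistic is constructed differently and, unlike this sketch, must account for the correlation between subgroup and overall tests, so treat this only as an illustration of weighting by expected effect size; all numbers are invented.

    ```python
    import numpy as np
    from scipy import stats

    p_values = np.array([0.03, 0.20, 0.08])  # moderate, mild, overall (one-sided)
    weights = np.array([0.5, 0.2, 0.3])      # heavier where larger effects expected

    z = stats.norm.isf(p_values)             # convert p-values to z-scores
    z_combined = np.sum(weights * z) / np.sqrt(np.sum(weights ** 2))
    print(stats.norm.sf(z_combined))         # combined one-sided p-value
    ```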

  1. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

    PubMed

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
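
    The core calculation, testing H0: ρ ≤ τ via Fisher's Z transformation, can be sketched in a few lines. The function name and inputs below are illustrative, and CorSig itself adds multiple-testing corrections on top of this.

    ```python
    import numpy as np
    from scipy import stats

    def pcc_cutoff_pvalue(r, n, tau):
        """One-sided p-value for H0: rho <= tau, via Fisher's Z transform."""
        z_r, z_tau = np.arctanh(r), np.arctanh(tau)
        se = 1.0 / np.sqrt(n - 3)            # approximate SE of Fisher's Z
        return stats.norm.sf((z_r - z_tau) / se)

    # Observed r = 0.62 from n = 40 samples, tested against a cutoff of 0.5.
    print(pcc_cutoff_pvalue(r=0.62, n=40, tau=0.5))
    ```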

  2. Hitting Is Contagious in Baseball: Evidence from Long Hitting Streaks

    PubMed Central

    Bock, Joel R.; Maewal, Akhilesh; Gough, David A.

    2012-01-01

    Data analysis is used to test the hypothesis that “hitting is contagious”. A statistical model is described to study the effect of a hot hitter upon his teammates’ batting during a consecutive game hitting streak. Box score data for entire seasons comprising streaks of length games, including a total observations were compiled. Treatment and control sample groups were constructed from core lineups of players on the streaking batter’s team. The percentile method bootstrap was used to calculate confidence intervals for statistics representing differences in the mean distributions of two batting statistics between groups. Batters in the treatment group (hot streak active) showed statistically significant improvements in hitting performance, as compared against the control. Mean for the treatment group was found to be to percentage points higher during hot streaks (mean difference increased points), while the batting heat index introduced here was observed to increase by points. For each performance statistic, the null hypothesis was rejected at the significance level. We conclude that the evidence suggests the potential existence of a “statistical contagion effect”. Psychological mechanisms essential to the empirical results are suggested, as several studies from the scientific literature lend credence to contagious phenomena in sports. Causal inference from these results is difficult, but we suggest and discuss several latent variables that may contribute to the observed results, and offer possible directions for future research. PMID:23251507

  3. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and the use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, and normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data, nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premises and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
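
    A small illustration of the advice on skewed data: log-transform and use a parametric test, or fall back to a rank-based nonparametric test. The data are simulated log-normal samples.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.lognormal(mean=0.0, sigma=0.5, size=30)   # skewed sample 1
    b = rng.lognormal(mean=0.3, sigma=0.5, size=30)   # skewed sample 2

    print(stats.ttest_ind(np.log(a), np.log(b)).pvalue)  # parametric, on logs
    print(stats.mannwhitneyu(a, b).pvalue)               # nonparametric, ranks
    ```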

  4. Statistical inference and Aristotle's Rhetoric.

    PubMed

    Macdonald, Ranald R

    2004-11-01

    Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes: incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way, in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference, only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference, and there is a good reason for calling this form of statistical inference classical.

  5. Statistical hypothesis tests of some micrometeorological observations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    SethuRaman, S.; Tichler, J.

    Chi-square goodness-of-fit is used to test the hypothesis that the medium scale of turbulence in the atmospheric surface layer is normally distributed. Coefficients of skewness and excess are computed from the data. If the data are not normal, these coefficients are used in Edgeworth's asymptotic expansion of the Gram-Charlier series to determine an alternate probability density function. The observed data are then compared with the modified probability densities and the new chi-square values computed. Seventy percent of the data analyzed was either normal or approximately normal. The coefficient of skewness g1 has a good correlation with the chi-square values. Events with |g1| < 0.21 were normal to begin with and those with 0.21...

  6. Confidence Intervals Make a Difference: Effects of Showing Confidence Intervals on Inferential Reasoning

    ERIC Educational Resources Information Center

    Hoekstra, Rink; Johnson, Addie; Kiers, Henk A. L.

    2012-01-01

    The use of confidence intervals (CIs) as an addition or as an alternative to null hypothesis significance testing (NHST) has been promoted as a means to make researchers more aware of the uncertainty that is inherent in statistical inference. Little is known, however, about whether presenting results via CIs affects how readers judge the…

  7. Direct and Indirect Effects of Birth Order on Personality and Identity: Support for the Null Hypothesis

    ERIC Educational Resources Information Center

    Dunkel, Curtis S.; Harbke, Colin R.; Papini, Dennis R.

    2009-01-01

    The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce…

  8. Assessment issues in the testing of children at school entry.

    PubMed

    Rock, Donald A; Stenner, A Jackson

    2005-01-01

    The authors introduce readers to the research documenting racial and ethnic gaps in school readiness. They describe the key tests, including the Peabody Picture Vocabulary Test (PPVT), the Early Childhood Longitudinal Study (ECLS), and several intelligence tests, and describe how they have been administered to several important national samples of children. Next, the authors review the different estimates of the gaps and discuss how to interpret these differences. In interpreting test results, researchers use the statistical term "standard deviation" to compare scores across the tests. On average, the tests find a gap of about 1 standard deviation. The ECLS-K estimate is the lowest, about half a standard deviation. The PPVT estimate is the highest, sometimes more than 1 standard deviation. When researchers adjust those gaps statistically to take into account different outside factors that might affect children's test scores, such as family income or home environment, the gap narrows but does not disappear. Why such different estimates of the gap? The authors consider explanations such as differences in the samples, racial or ethnic bias in the tests, and whether the tests reflect different aspects of school "readiness," and conclude that none is likely to explain the varying estimates. Another possible explanation is the Spearman Hypothesis: that all tests are imperfect measures of a general ability construct, g, and the more highly a given test correlates with g, the larger the gap will be. But the Spearman Hypothesis, too, leaves questions to be investigated. A gap of 1 standard deviation may not seem large, but the authors show clearly how it results in striking disparities in the performance of black and white students and why it should be of serious concern to policymakers.

  9. Critical analysis of adsorption data statistically

    NASA Astrophysics Data System (ADS)

    Kaushal, Achla; Singh, S. K.

    2017-10-01

    Experimental data can be presented, computed, and critically analysed in different ways using statistics. A variety of statistical tests are used to make decisions about the significance and validity of experimental data. In the present study, adsorption was carried out to remove zinc ions from a contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and chi-square test, to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results to be in favour of the data collected from the experiment, and this has been shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for the Langmuir and Freundlich adsorption isotherms were 0.99 and 0.95, respectively, which show a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.

  10. A close examination of double filtering with fold change and t test in microarray analysis

    PubMed Central

    2009-01-01

    Background Many researchers use the double filtering procedure with fold change and t test to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Due to its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods. Results This paper, for the first time to our knowledge, provides theoretical insight into the drawback of the double filtering procedure. We show that fold change assumes all genes to have a common variance, while the t statistic assumes gene-specific variances. The two statistics are based on contradicting assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that the posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure. Conclusion We demonstrate through hypothesis testing theory, simulation studies and real data examples that well-constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure. PMID:19995439
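
    For concreteness, the procedure being critiqued looks roughly like the sketch below: a gene is flagged only if it passes both a fold-change cutoff and a t-test cutoff. Data and thresholds are invented.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    genes, n = 1000, 6
    grp1 = rng.normal(8, 1, size=(genes, n))       # log2 expression, condition 1
    grp2 = rng.normal(8, 1, size=(genes, n))       # log2 expression, condition 2

    log_fc = grp2.mean(axis=1) - grp1.mean(axis=1)  # per-gene log2 fold change
    t, p = stats.ttest_ind(grp2, grp1, axis=1)      # per-gene t test
    flagged = (np.abs(log_fc) > 1.0) & (p < 0.05)   # the double filter
    print(flagged.sum(), "genes pass both filters")
    ```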

  11. The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing.

    PubMed

    Cao, Hongyuan; Sun, Wenguang; Kosorok, Michael R

    2013-01-01

    In single hypothesis testing, power is a non-decreasing function of type I error rate; hence it is desirable to test at the nominal level exactly to achieve optimal power. The puzzle lies in the fact that for multiple testing, under the false discovery rate paradigm, such a monotonic relationship may not hold. In particular, exact false discovery rate control may lead to a less powerful testing procedure if a test statistic fails to fulfil the monotone likelihood ratio condition. In this article, we identify different scenarios wherein the condition fails and give caveats for conducting multiple testing in practical settings.

  12. Hypothesis testing clarifies the systematics of the main Central American Chagas disease vector, Triatoma dimidiata (Latreille, 1811), across its geographic range.

    PubMed

    Dorn, Patricia L; de la Rúa, Nicholas M; Axen, Heather; Smith, Nicholas; Richards, Bethany R; Charabati, Jirias; Suarez, Julianne; Woods, Adrienne; Pessoa, Rafaela; Monroy, Carlota; Kilpatrick, C William; Stevens, Lori

    2016-10-01

    The widespread and diverse Triatoma dimidiata is the kissing bug species most important for Chagas disease transmission in Central America and a secondary vector in Mexico and northern South America. Its diversity may contribute to different Chagas disease prevalence in different localities and has led to conflicting systematic hypotheses describing various populations as subspecies or cryptic species. To resolve these conflicting hypotheses, we sequenced a nuclear (internal transcribed spacer 2, ITS-2) and mitochondrial gene (cytochrome b) from an extensive sampling of T. dimidiata across its geographic range. We evaluated the congruence of ITS-2 and cyt b phylogenies and tested the support for the previously proposed subspecies (inferred from ITS-2) by: (1) overlaying the ITS-2 subspecies assignments on a cyt b tree and, (2) assessing the statistical support for a cyt b topology constrained by the subspecies hypothesis. Unconstrained phylogenies inferred from ITS-2 and cyt b are congruent and reveal three clades including two putative cryptic species in addition to T. dimidiata sensu stricto. Neither the cyt b phylogeny nor hypothesis testing support the proposed subspecies inferred from ITS-2. Additionally, the two cryptic species are supported by phylogenies inferred from mitochondrially-encoded genes cytochrome c oxidase I and NADH dehydrogenase 4. In summary, our results reveal two cryptic species. Phylogenetic relationships indicate T. dimidiata sensu stricto is not subdivided into monophyletic clades consistent with subspecies. Based on increased support by hypothesis testing, we propose an updated systematic hypothesis for T. dimidiata based on extensive taxon sampling and analysis of both mitochondrial and nuclear genes. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2018-02-28

    Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.

  14. Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies

    PubMed Central

    Sayyari, Erfan

    2018-01-01

    Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest. PMID:29495636

  15. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of the data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis", where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
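
    The usual way to apply K-M machinery (built for right-censored survival times) to left-censored concentrations is to flip the data about a constant, as sketched below with the lifelines package. The data, flipping constant, and detection flags are invented, and this shows only the basic trick, not the S-language software described in the paper.

    ```python
    import numpy as np
    from lifelines import KaplanMeierFitter

    conc = np.array([0.5, 1.2, 0.8, 2.0, 0.3, 1.5])      # reported concentrations
    detected = np.array([1, 1, 0, 1, 0, 1], dtype=bool)  # False: below detection

    M = conc.max() + 1.0          # flipping constant larger than all values
    kmf = KaplanMeierFitter()
    # Left-censored values ("< limit") become right-censored on the flipped
    # scale ("survival beyond M - limit"), which K-M handles natively.
    kmf.fit(M - conc, event_observed=detected)
    print(kmf.survival_function_)
    ```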

  16. Measuring continuous baseline covariate imbalances in clinical trial data

    PubMed Central

    Ciolino, Jody D.; Martin, Renee’ H.; Zhao, Wenle; Hill, Michael D.; Jauch, Edward C.; Palesch, Yuko Y.

    2014-01-01

This paper presents and compares several methods of measuring continuous baseline covariate imbalance in clinical trial data. Simulations illustrate that though the t-test is an inappropriate method of assessing continuous baseline covariate imbalance, the test statistic itself is a robust measure for capturing imbalance in continuous covariate distributions. Guidelines to assess the effects of imbalance on bias, type I error rate, and power of the hypothesis test for treatment effect on continuous outcomes are presented, and the benefit of covariate-adjusted analysis (ANCOVA) is also illustrated. PMID:21865270
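
    A small sketch of the point being made, with hypothetical data: compute the two-sample t statistic for a baseline covariate and use its magnitude as a continuous measure of imbalance, rather than treating the p-value as a test of the randomization.

        import numpy as np
        from scipy.stats import ttest_ind

        rng = np.random.default_rng(1)
        age_treatment = rng.normal(62, 10, 120)   # hypothetical baseline ages
        age_control = rng.normal(64, 10, 118)

        t, p = ttest_ind(age_treatment, age_control)
        # |t| serves as the imbalance measure; the p-value is not a meaningful
        # hypothesis test here, because chance imbalance under randomization
        # is not a hypothesis in need of testing.
        print(f"imbalance |t| = {abs(t):.2f}")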

  17. The cost of large numbers of hypothesis tests on power, effect size and sample size.

    PubMed

    Lazzeroni, L C; Ray, A

    2012-01-01

    Advances in high-throughput biology and computer science are driving an exponential increase in the number of hypothesis tests in genomics and other scientific disciplines. Studies using current genotyping platforms frequently include a million or more tests. In addition to the monetary cost, this increase imposes a statistical cost owing to the multiple testing corrections needed to avoid large numbers of false-positive results. To safeguard against the resulting loss of power, some have suggested sample sizes on the order of tens of thousands that can be impractical for many diseases or may lower the quality of phenotypic measurements. This study examines the relationship between the number of tests on the one hand and power, detectable effect size or required sample size on the other. We show that once the number of tests is large, power can be maintained at a constant level, with comparatively small increases in the effect size or sample size. For example at the 0.05 significance level, a 13% increase in sample size is needed to maintain 80% power for ten million tests compared with one million tests, whereas a 70% increase in sample size is needed for 10 tests compared with a single test. Relative costs are less when measured by increases in the detectable effect size. We provide an interactive Excel calculator to compute power, effect size or sample size when comparing study designs or genome platforms involving different numbers of hypothesis tests. The results are reassuring in an era of extreme multiple testing.
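
    The quoted sample-size figures can be reproduced, approximately, from a normal-approximation power calculation at a Bonferroni-adjusted significance level; the sketch below (SciPy, with the assumptions stated in its comments) recovers the ~13% and ~70% increases.

        from scipy.stats import norm

        def n_ratio(m1, m2, alpha=0.05, power=0.80):
            """Relative sample size needed to keep `power` fixed when the
            Bonferroni-adjusted two-sided level moves from alpha/m1 to alpha/m2,
            assuming a normal test statistic and a fixed effect size."""
            z_beta = norm.ppf(power)
            factor = lambda m: (norm.isf(alpha / m / 2) + z_beta) ** 2
            return factor(m2) / factor(m1)

        print(f"{n_ratio(1e6, 1e7):.2f}")  # ~1.13: ten million vs one million tests
        print(f"{n_ratio(1, 10):.2f}")     # ~1.70: ten tests vs a single test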

  18. Spread of cattle led to the loss of matrilineal descent in Africa: a coevolutionary analysis.

    PubMed Central

    Holden, Clare Janaki; Mace, Ruth

    2003-01-01

    Matrilineal descent is rare in human societies that keep large livestock. However, this negative correlation does not provide reliable evidence that livestock and descent rules are functionally related, because human cultures are not statistically independent owing to their historical relationships (Galton's problem). We tested the hypothesis that when matrilineal cultures acquire cattle they become patrilineal using a sample of 68 Bantu- and Bantoid-speaking populations from sub-Saharan Africa. We used a phylogenetic comparative method to control for Galton's problem, and a maximum-parsimony Bantu language tree as a model of population history. We tested for coevolution between cattle and descent. We also tested the direction of cultural evolution--were cattle acquired before matriliny was lost? The results support the hypothesis that acquiring cattle led formerly matrilineal Bantu-speaking cultures to change to patrilineal or mixed descent. We discuss possible reasons for matriliny's association with horticulture and its rarity in pastoralist societies. We outline the daughter-biased parental investment hypothesis for matriliny, which is supported by data on sex, wealth and reproductive success from two African societies, the matrilineal Chewa in Malawi and the patrilineal Gabbra in Kenya. PMID:14667331

  19. Employing hypothesis testing and data from multiple genomic compartments to resolve recalcitrant backbone nodes in Goodenia s.l. (Goodeniaceae).

    PubMed

    Jabaily, Rachel S; Shepherd, Kelly A; Michener, Pryce S; Bush, Caroline J; Rivero, Rodrigo; Gardner, Andrew G; Sessa, Emily B

    2018-05-15

    Goodeniaceae is a primarily Australian flowering plant family with a complex taxonomy and evolutionary history. Previous phylogenetic analyses have successfully resolved the backbone topology of the largest clade in the family, Goodenia s.l., but have failed to clarify relationships within the species-rich and enigmatic Goodenia clade C, a prerequisite for taxonomic revision of the group. We used genome skimming to retrieve sequences for chloroplast, mitochondrial, and nuclear markers for 24 taxa representing Goodenia s.l., with a particular focus on Goodenia clade C. We performed extensive hypothesis tests to explore incongruence in clade C and evaluate statistical support for clades within this group, using datasets from all three genomic compartments. The mitochondrial dataset is comparable to the chloroplast dataset in providing resolution within Goodenia clade C, though backbone support values within this clade remain low. The hypothesis tests provided an additional, complementary means of evaluating support for clades. We propose that the major subclades of Goodenia clade C (C1-C3 + Verreauxia) are the result of a rapid radiation, and each represents a distinct lineage. Copyright © 2018. Published by Elsevier Inc.

  20. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    PubMed

    Hagell, Peter; Westergren, Albert

Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model for 25-item dichotomous scales, with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N ≤ 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  1. Testing second order cyclostationarity in the squared envelope spectrum of non-white vibration signals

    NASA Astrophysics Data System (ADS)

    Borghesani, P.; Pennacchi, P.; Ricci, R.; Chatterton, S.

    2013-10-01

Cyclostationary models for the diagnostic signals measured on faulty rotating machineries have proved to be successful in many laboratory tests and industrial applications. The squared envelope spectrum has been pointed out as the most efficient indicator for the assessment of second order cyclostationary symptoms of damages, which are typical, for instance, of rolling element bearing faults. In an attempt to foster the spread of rotating machinery diagnostics, the current trend in the field is to reach higher levels of automation of the condition monitoring systems. For this purpose, statistical tests for the presence of cyclostationarity have been proposed in recent years. The statistical thresholds proposed in the past for the identification of cyclostationary components have been obtained under the hypothesis that the signal is white noise when the component is healthy. This requirement, coupled with the non-white nature of real signals, implies the necessity of pre-whitening the signal or filtering it in optimal narrow bands, increasing the complexity of the algorithm and the risk of losing diagnostic information or introducing biases in the result. In this paper, the authors introduce an original analytical derivation of the statistical tests for cyclostationarity in the squared envelope spectrum, dropping the hypothesis of white noise from the beginning. The effect of first order and second order cyclostationary components on the distribution of the squared envelope spectrum will be quantified and the effectiveness of the newly proposed threshold verified, providing a sound theoretical basis and a practical starting point for efficient automated diagnostics of machine components such as rolling element bearings. The analytical results will be verified by means of numerical simulations and by using experimental vibration data of rolling element bearings.
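
    For readers unfamiliar with the quantity under test, a minimal sketch of computing a squared envelope spectrum (Hilbert transform, then the spectrum of the squared envelope) is given below; the simulated 97 Hz fault frequency is hypothetical, and this is only the indicator computation, not the paper's threshold derivation.

        import numpy as np
        from scipy.signal import hilbert

        def squared_envelope_spectrum(x, fs):
            """Magnitude spectrum of the squared envelope of x."""
            env2 = np.abs(hilbert(x)) ** 2                  # squared envelope
            ses = np.abs(np.fft.rfft(env2 - env2.mean()))   # drop the DC component
            return np.fft.rfftfreq(len(x), d=1.0 / fs), ses

        # Hypothetical bearing-like signal: noise amplitude-modulated at 97 Hz.
        fs, n = 20_000, 2 ** 16
        t = np.arange(n) / fs
        rng = np.random.default_rng(0)
        x = rng.standard_normal(n) * (1 + 0.8 * np.cos(2 * np.pi * 97 * t))
        freqs, ses = squared_envelope_spectrum(x, fs)
        print(f"dominant envelope line near {freqs[ses.argmax()]:.1f} Hz")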

  2. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification.

    PubMed

    Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino

    2014-01-30

The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.

  3. Dispositional and explanatory style optimism as potential moderators of the relationship between hopelessness and suicidal ideation.

    PubMed

    Hirsch, Jameson K; Conner, Kenneth R

    2006-12-01

    To test the hypothesis that higher levels of optimism reduce the association between hopelessness and suicidal ideation, 284 college students completed self-report measures of optimism and Beck scales for hopelessness, suicidal ideation, and depression. A statistically significant interaction between hopelessness and one measure of optimism was obtained, consistent with the hypothesis that optimism moderates the relationship between hopelessness and suicidal ideation. Hopelessness is not inevitably associated with suicidal ideation. Optimism may be an important moderator of the association. The development of treatments to enhance optimism may complement standard treatments to reduce suicidality that target depression and hopelessness.

  4. Statistical Analysis of Model Data for Operational Space Launch Weather Support at Kennedy Space Center and Cape Canaveral Air Force Station

    NASA Technical Reports Server (NTRS)

    Bauman, William H., III

    2010-01-01

The 12-km resolution North American Mesoscale (NAM) model (MesoNAM) is used by the 45th Weather Squadron (45 WS) Launch Weather Officers at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS) to support space launch weather operations. The 45 WS tasked the Applied Meteorology Unit (AMU) to conduct an objective statistics-based analysis of MesoNAM output compared to wind tower mesonet observations and then develop an operational tool to display the results. The National Centers for Environmental Prediction began running the current version of the MesoNAM in mid-August 2006. The period of record for the dataset was 1 September 2006 - 31 January 2010. The AMU evaluated MesoNAM hourly forecasts from 0 to 84 hours based on model initialization times of 00, 06, 12 and 18 UTC. The MesoNAM forecast winds, temperature and dew point were compared to the observed values of these parameters from the sensors in the KSC/CCAFS wind tower network. The data sets were stratified by model initialization time, month and onshore/offshore flow for each wind tower. Statistics computed included bias (mean difference), standard deviation of the bias, root mean square error (RMSE) and a hypothesis test for bias = 0. Twelve wind towers located in close proximity to key launch complexes were used for the statistical analysis, with the sensors on the towers positioned at varying heights to include 6 ft, 30 ft, 54 ft, 60 ft, 90 ft, 162 ft, 204 ft and 230 ft, depending on the launch vehicle and associated weather launch commit criteria being evaluated. These twelve wind towers support activities for the Space Shuttle (launch and landing), Delta IV, Atlas V and Falcon 9 launch vehicles. For all twelve towers, the results indicate a diurnal signal in the bias of temperature (T) and a weaker but discernible diurnal signal in the bias of dewpoint temperature (T(sub d)) in the MesoNAM forecasts. Also, the standard deviation of the bias and RMSE of T, T(sub d), wind speed and wind direction indicated that the model error increased with the forecast period for all four parameters. The hypothesis testing uses statistics to determine the probability that a given hypothesis is true. The goal of using the hypothesis test was to determine if the model bias of any of the parameters assessed throughout the model forecast period was statistically zero. For this dataset, if the test statistic for a data point fell between -1.96 and 1.96, the bias at that point was effectively zero and the model forecast at that point was considered to have no error. A graphical user interface (GUI) was developed so the 45 WS would have an operational tool at their disposal that would be easy to navigate among the multiple stratifications of information, including tower locations, month, model initialization times, sensor heights and onshore/offshore flow. The AMU developed the GUI using HyperText Markup Language (HTML) so the tool could be used in most popular web browsers on computers running different operating systems such as Microsoft Windows and Linux.
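
    A compact sketch of the statistics described (bias, its standard deviation, RMSE, and the z-type test statistic compared against ±1.96); the forecast/observation pairs are hypothetical.

        import numpy as np

        def bias_statistics(forecast, observed):
            d = np.asarray(forecast) - np.asarray(observed)
            bias = d.mean()
            sd = d.std(ddof=1)                  # standard deviation of the bias
            rmse = np.sqrt(np.mean(d ** 2))
            z = bias / (sd / np.sqrt(d.size))   # H0: bias = 0; |z| <= 1.96 -> no error
            return bias, sd, rmse, z

        fcst = np.array([71.2, 69.8, 73.1, 75.0, 70.4])   # hypothetical temperatures
        obs = np.array([70.5, 70.1, 72.2, 74.1, 69.9])
        print(bias_statistics(fcst, obs))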

  5. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and to enable studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The true probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed using a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
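
    A generic sketch of the permutation step (hypothetical scores and groups; the paper's actual statistic is defined on the network itself):

        import numpy as np

        rng = np.random.default_rng(42)

        def permutation_pvalue(score, labels, n_perm=10_000):
            """One-sided permutation p-value for a difference in mean activation
            scores between two groups (labels coded 0/1)."""
            stat = lambda lab: score[lab == 1].mean() - score[lab == 0].mean()
            observed = stat(labels)
            exceed = sum(stat(rng.permutation(labels)) >= observed
                         for _ in range(n_perm))
            return (exceed + 1) / (n_perm + 1)

        # Hypothetical per-sample activation scores for one regulatory network
        score = np.r_[rng.normal(0.0, 1, 20), rng.normal(0.8, 1, 20)]
        labels = np.r_[np.zeros(20, int), np.ones(20, int)]
        print(f"p = {permutation_pvalue(score, labels):.4f}")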

  6. An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research.

    PubMed

    Buu, Anne; Williams, L Keoki; Yang, James J

    2018-03-01

We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not otherwise be chosen using marginal tests.
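
    A small sketch of the Fisher's combination statistic itself (hypothetical p-values). The chi-square reference shown is valid only under independence, which is exactly why the paper calibrates the statistic against an empirically estimated null instead.

        import numpy as np
        from scipy.stats import chi2, combine_pvalues

        pvals = np.array([0.03, 0.20, 0.008])   # hypothetical per-phenotype p-values

        T = -2 * np.sum(np.log(pvals))          # Fisher's combination statistic
        print(T, chi2.sf(T, df=2 * len(pvals))) # chi-square null: independence only

        stat, p = combine_pvalues(pvals, method='fisher')   # SciPy equivalent
        print(stat, p)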

  7. Siblings, Birth Order, and Cooperative-Competitive Social Behavior: A Comparison of Anglo-American and Mexican-American Children.

    ERIC Educational Resources Information Center

    Knight, George P.; Kagan, Spencer

    1982-01-01

Tested the hypothesis that differences in cooperative-competitive social behavior between Anglo-Americans and Mexican Americans are a result of larger family size among the latter group. Found that, even after controlling for number of siblings and birth order, statistically significant differences in such behavior remained between the two groups.…

  8. Amazon forest carbon dynamics predicted by profiles of canopy leaf area and light environment

    Treesearch

    S. C. Stark; V. Leitold; J. L. Wu; M. O. Hunter; C. V. de Castilho; F. R. C. Costa; S. M. McMahon; G. G. Parker; M. Takako Shimabukuro; M. A. Lefsky; M. Keller; L. F. Alves; J. Schietti; Y. E. Shimabukuro; D. O. Brandao; T. K. Woodcock; N. Higuchi; P. B de Camargo; R. C. de Oliveira; S. R. Saleska

    2012-01-01

Tropical forest structural variation across heterogeneous landscapes may control above-ground carbon dynamics. We tested the hypothesis that canopy structure (leaf area and light availability) – remotely estimated from LiDAR – controls variation in above-ground coarse wood production (biomass growth). Using a statistical model, these factors predicted biomass growth...

  9. Fukunaga-Koontz feature transformation for statistical structural damage detection and hierarchical neuro-fuzzy damage localisation

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2017-07-01

    Considering jointly damage sensitive features (DSFs) of signals recorded by multiple sensors, applying advanced transformations to these DSFs and assessing systematically their contribution to damage detectability and localisation can significantly enhance the performance of structural health monitoring systems. This philosophy is explored here for partial autocorrelation coefficients (PACCs) of acceleration responses. They are interrogated with the help of the linear discriminant analysis based on the Fukunaga-Koontz transformation using datasets of the healthy and selected reference damage states. Then, a simple but efficient fast forward selection procedure is applied to rank the DSF components with respect to statistical distance measures specialised for either damage detection or localisation. For the damage detection task, the optimal feature subsets are identified based on the statistical hypothesis testing. For damage localisation, a hierarchical neuro-fuzzy tool is developed that uses the DSF ranking to establish its own optimal architecture. The proposed approaches are evaluated experimentally on data from non-destructively simulated damage in a laboratory scale wind turbine blade. The results support our claim of being able to enhance damage detectability and localisation performance by transforming and optimally selecting DSFs. It is demonstrated that the optimally selected PACCs from multiple sensors or their Fukunaga-Koontz transformed versions can not only improve the detectability of damage via statistical hypothesis testing but also increase the accuracy of damage localisation when used as inputs into a hierarchical neuro-fuzzy network. Furthermore, the computational effort of employing these advanced soft computing models for damage localisation can be significantly reduced by using transformed DSFs.
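
    As a small illustration of the damage-sensitive features used (not the Fukunaga-Koontz stage), partial autocorrelation coefficients of a simulated acceleration-like response can be computed with statsmodels; the AR(2) process below is hypothetical.

        import numpy as np
        from statsmodels.tsa.stattools import pacf

        rng = np.random.default_rng(7)
        x = np.zeros(4096)                       # hypothetical AR(2)-like response
        for i in range(2, x.size):
            x[i] = 1.2 * x[i - 1] - 0.5 * x[i - 2] + rng.standard_normal()

        paccs = pacf(x, nlags=10)[1:]            # drop lag 0, which is always 1
        print(np.round(paccs, 3))                # candidate damage-sensitive features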

  10. Multiple object tracking with non-unique data-to-object association via generalized hypothesis testing. [tracking several aircraft near each other or ships at sea

    NASA Technical Reports Server (NTRS)

    Porter, D. W.; Lefler, R. M.

    1979-01-01

    A generalized hypothesis testing approach is applied to the problem of tracking several objects where several different associations of data with objects are possible. Such problems occur, for instance, when attempting to distinctly track several aircraft maneuvering near each other or when tracking ships at sea. Conceptually, the problem is solved by first, associating data with objects in a statistically reasonable fashion and then, tracking with a bank of Kalman filters. The objects are assumed to have motion characterized by a fixed but unknown deterministic portion plus a random process portion modeled by a shaping filter. For example, the object might be assumed to have a mean straight line path about which it maneuvers in a random manner. Several hypothesized associations of data with objects are possible because of ambiguity as to which object the data comes from, false alarm/detection errors, and possible uncertainty in the number of objects being tracked. The statistical likelihood function is computed for each possible hypothesized association of data with objects. Then the generalized likelihood is computed by maximizing the likelihood over parameters that define the deterministic motion of the object.

  11. Demodulation of messages received with low signal to noise ratio

    NASA Astrophysics Data System (ADS)

    Marguinaud, A.; Quignon, T.; Romann, B.

The implementation of this all-digital demodulator is derived from maximum likelihood considerations applied to an analytical representation of the received signal. Traditional matched filters and phase lock loops are replaced by minimum variance estimators and hypothesis tests. These statistical tests become very simple when working on the phase signal. These methods, combined with rigorous control of the data representation, allow significant computation savings compared with conventional realizations. Nominal operation has been verified down to an energy signal-to-noise ratio of -3 dB on a QPSK demodulator.

  12. Assessment of statistical significance and clinical relevance.

    PubMed

    Kieser, Meinhard; Friede, Tim; Gondan, Matthias

    2013-05-10

In drug development, it is well accepted that a successful study will demonstrate not only a statistically significant result but also a clinically relevant effect size. Whereas standard hypothesis tests are used to demonstrate the former, it is less clear how the latter should be established. In the first part of this paper, we consider the responder analysis approach and study the performance of locally optimal rank tests when the outcome distribution is a mixture of responder and non-responder distributions. We find that these tests are quite sensitive to their planning assumptions and therefore have no real advantage over standard tests such as the t-test and the Wilcoxon-Mann-Whitney test, which perform well overall and can be recommended for applications. In the second part, we present a new approach to the assessment of clinical relevance based on the so-called relative effect (or probabilistic index) and derive appropriate sample size formulae for the design of studies aiming at demonstrating both a statistically significant and clinically relevant effect. Referring to recent studies in multiple sclerosis, we discuss potential issues in the application of this approach. Copyright © 2012 John Wiley & Sons, Ltd.

  13. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    PubMed Central

    Hallgren, Kevin A.

    2012-01-01

    Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
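
    The paper's examples are in SPSS and R; an equivalent sketch of Cohen's kappa in Python (with hypothetical ratings) is:

        from sklearn.metrics import cohen_kappa_score

        coder_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]   # hypothetical categorical ratings
        coder_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
        print(f"Cohen's kappa = {cohen_kappa_score(coder_a, coder_b):.3f}")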

  14. Detection of statistical asymmetries in non-stationary sign time series: Analysis of foreign exchange data

    PubMed Central

    Takayasu, Hideki; Takayasu, Misako

    2017-01-01

We extend the concept of statistical symmetry as the invariance of a probability distribution under transformation to analyze binary sign time series data of price differences from the foreign exchange market. We model segments of the sign time series as Markov sequences and apply a local hypothesis test to evaluate the symmetries of independence and time reversion in different periods of the market. For the test, we derive the probability of a binary Markov process generating a given set of numbers of symbol pairs. Using such analysis, we could not only segment the time series according to the different behaviors but also characterize the segments in terms of statistical symmetries. As a particular result, we find that the foreign exchange market is essentially time reversible but that this symmetry is broken when there is a strong external influence. PMID:28542208
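
    One flavor of such a symmetry test can be sketched as follows (an illustration, not the authors' exact procedure): under time reversibility a word and its reversal are equally probable, so counts of, say, "++-" versus "-++" can be compared with a binomial test. Because overlapping windows are not independent, the p-value is only approximate.

        import numpy as np
        from scipy.stats import binomtest

        rng = np.random.default_rng(3)
        signs = np.sign(rng.standard_normal(5000))   # hypothetical +/-1 sign series

        def word_count(s, word):
            w = np.asarray(word)
            windows = np.lib.stride_tricks.sliding_window_view(s, len(w))
            return int((windows == w).all(axis=1).sum())

        k1 = word_count(signs, [1, 1, -1])    # occurrences of "++-"
        k2 = word_count(signs, [-1, 1, 1])    # occurrences of its reversal "-++"
        print(k1, k2, binomtest(k1, k1 + k2, 0.5).pvalue)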

  15. PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides.

    PubMed

    Jacob, Laurent; Combes, Florence; Burger, Thomas

    2018-06-18

We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous sequence homologies, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time in the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedure outperforms state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.

  16. Linearised and non-linearised isotherm models optimization analysis by error functions and statistical means

    PubMed Central

    2014-01-01

In adsorption studies, describing the sorption process and identifying the best-fitting isotherm model are key analyses for investigating the theoretical hypothesis. Hence, numerous statistical analyses have been extensively used to compare the experimental equilibrium adsorption values with the predicted equilibrium values. Several statistical error analyses were carried out. In the present study, the following statistical analyses were carried out to evaluate adsorption isotherm model fitness: the Pearson correlation, the coefficient of determination and the Chi-square test. The ANOVA test was carried out to evaluate the significance of the various error functions, and the coefficient of dispersion was also evaluated for linearised and non-linearised models. The adsorption of phenol onto natural soil (local name: Kalathur soil) was carried out in batch mode at 30 ± 2 °C. To estimate the isotherm parameters and obtain a holistic view of the analysis, linear and non-linear isotherm models were compared. The results revealed which of the above-mentioned error functions and statistical functions determined the best-fitting isotherm. PMID:25018878
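
    A minimal sketch of the workflow (hypothetical data; Langmuir chosen only as an example isotherm): fit the non-linear model, then score it with the coefficient of determination and the Chi-square error function.

        import numpy as np
        from scipy.optimize import curve_fit

        def langmuir(c, qmax, k):
            return qmax * k * c / (1 + k * c)

        c = np.array([1.0, 2.5, 5.0, 10.0, 20.0, 40.0])   # concentration, mg/L
        q = np.array([0.8, 1.6, 2.5, 3.4, 4.1, 4.6])      # amount adsorbed, mg/g

        (qmax, k), _ = curve_fit(langmuir, c, q, p0=(5.0, 0.1))
        q_pred = langmuir(c, qmax, k)

        r2 = 1 - np.sum((q - q_pred) ** 2) / np.sum((q - q.mean()) ** 2)
        chi_sq = np.sum((q - q_pred) ** 2 / q_pred)        # Chi-square error function
        print(f"qmax={qmax:.2f}, K={k:.3f}, R2={r2:.4f}, chi2={chi_sq:.4f}")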

  17. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.

  18. Basic biostatistics for post-graduate students

    PubMed Central

    Dakhale, Ganesh N.; Hiware, Sachin K.; Shinde, Abhijit T.; Mahatme, Mohini S.

    2012-01-01

Statistical methods are important to draw valid conclusions from the obtained data. This article provides background information related to fundamental methods and techniques in biostatistics for the use of postgraduate students. The main focus is given to types of data, measurement of central variations and basic tests, which are useful for analysis of different types of observations. A few parameters like normal distribution, calculation of sample size, level of significance, null hypothesis, indices of variability, and different tests are explained in detail by giving suitable examples. Using these guidelines, we are confident that postgraduate students will be able to classify the distribution of data along with the application of the proper test. Information is also given regarding various free software programs and websites useful for calculations of statistics. Thus, postgraduate students will benefit in both ways, whether they opt for academia or for industry. PMID:23087501

  19. A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures.

    PubMed

    Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha

    2017-01-01

The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Statistics for X-chromosome associations.

    PubMed

    Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor

    2018-06-13

    In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.

  1. Improving the space surveillance telescope's performance using multi-hypothesis testing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chris Zingarelli, J.; Cain, Stephen; Pearce, Eric

    2014-05-01

The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space like near Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, the multi-hypothesis testing (MHT) determines if an unresolvable point source is in either the center, a corner, or a side of a pixel in contrast to BHT, which only tests whether an object is in the pixel or not. The images recorded by SST are undersampled such as to cause aliasing, which degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N), which is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite are used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection by as much as 50% over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy.

  2. Statistical analysis of experimental data for mathematical modeling of physical processes in the atmosphere

    NASA Astrophysics Data System (ADS)

    Karpushin, P. A.; Popov, Yu B.; Popova, A. I.; Popova, K. Yu; Krasnenko, N. P.; Lavrinenko, A. V.

    2017-11-01

    In this paper, the probabilities of faultless operation of aerologic stations are analyzed, the hypothesis of normality of the empirical data required for using the Kalman filter algorithms is tested, and the spatial correlation functions of distributions of meteorological parameters are determined. The results of a statistical analysis of two-term (0, 12 GMT) radiosonde observations of the temperature and wind velocity components at some preset altitude ranges in the troposphere in 2001-2016 are presented. These data can be used in mathematical modeling of physical processes in the atmosphere.

  3. In Defense of P Values

    PubMed Central

    Verhulst, Brad

    2016-01-01

    P values have become the scapegoat for a wide variety of problems in science. P values are generally over-emphasized, often incorrectly applied, and in some cases even abused. However, alternative methods of hypothesis testing will likely fall victim to the same criticisms currently leveled at P values if more fundamental changes are not made in the research process. Increasing the general level of statistical literacy and enhancing training in statistical methods provide a potential avenue for identifying, correcting, and preventing erroneous conclusions from entering the academic literature and for improving the general quality of patient care. PMID:28366961

  4. Estimating equivalence with quantile regression

    USGS Publications Warehouse

    Cade, B.S.

    2011-01-01

Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. © 2011 by the Ecological Society of America.
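
    For readers new to equivalence testing, a mean-based two one-sided tests (TOST) sketch is given below, with hypothetical data and equivalence bounds; note that the abstract's approach extends this logic to quantile regression estimates, which this sketch does not attempt.

        import numpy as np
        from scipy.stats import t as t_dist

        def tost_ind(x1, x2, low, upp):
            """TOST p-value for mean equivalence of two independent samples
            within [low, upp] (pooled-variance t tests)."""
            n1, n2 = len(x1), len(x2)
            diff = np.mean(x1) - np.mean(x2)
            sp2 = ((n1 - 1) * np.var(x1, ddof=1) +
                   (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
            se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
            df = n1 + n2 - 2
            p_lower = t_dist.sf((diff - low) / se, df)    # H0: diff <= low
            p_upper = t_dist.cdf((diff - upp) / se, df)   # H0: diff >= upp
            return max(p_lower, p_upper)                  # reject both -> equivalent

        rng = np.random.default_rng(11)
        site, reference = rng.normal(10, 2, 30), rng.normal(10.3, 2, 30)
        print(f"TOST p = {tost_ind(site, reference, -1.0, 1.0):.4f}")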

  5. Improving the Space Surveillance Telescope's Performance Using Multi-Hypothesis Testing

    NASA Astrophysics Data System (ADS)

    Zingarelli, J. Chris; Pearce, Eric; Lambour, Richard; Blake, Travis; Peterson, Curtis J. R.; Cain, Stephen

    2014-05-01

    The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space like near Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, the multi-hypothesis testing (MHT) determines if an unresolvable point source is in either the center, a corner, or a side of a pixel in contrast to BHT, which only tests whether an object is in the pixel or not. The images recorded by SST are undersampled such as to cause aliasing, which degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N), which is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite are used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection by as much as 50% over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
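
    The S/N computation described can be sketched as follows (hypothetical image and box size; the sigma-clipping loop stands in for the paper's outlier-rejecting noise estimate):

        import numpy as np

        def local_snr(image, row, col, box=11, clip=3.0, n_iter=5):
            """(pixel - local background) / noise std, with the noise estimated
            by iterative sigma clipping to reject outliers in the background box."""
            half = box // 2
            patch = image[row - half:row + half + 1,
                          col - half:col + half + 1].ravel()
            bg = np.delete(patch, patch.size // 2)        # exclude the tested pixel
            for _ in range(n_iter):
                mu, sigma = bg.mean(), bg.std()
                bg = bg[np.abs(bg - mu) < clip * sigma]   # clip outliers
            return (image[row, col] - bg.mean()) / bg.std()

        rng = np.random.default_rng(5)
        img = rng.normal(100, 5, (64, 64))                # hypothetical background
        img[32, 32] += 40                                 # faint unresolved source
        print(f"S/N = {local_snr(img, 32, 32):.1f}")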

  6. Demonstration of risk based, goal driven framework for hydrological field campaigns and inverse modeling with case studies

    NASA Astrophysics Data System (ADS)

    Harken, B.; Geiges, A.; Rubin, Y.

    2013-12-01

There are several stages in any hydrological modeling campaign, including: formulation and analysis of a priori information, data acquisition through field campaigns, inverse modeling, and forward modeling and prediction of some environmental performance metric (EPM). The EPM being predicted could be, for example, contaminant concentration, plume travel time, or aquifer recharge rate. These predictions often have significant bearing on some decision that must be made. Examples include: how to allocate limited remediation resources between multiple contaminated groundwater sites, where to place a waste repository site, and what extraction rates can be considered sustainable in an aquifer. Providing an answer to these questions depends on predictions of EPMs using forward models as well as levels of uncertainty related to these predictions. Uncertainty in model parameters, such as hydraulic conductivity, leads to uncertainty in EPM predictions. Often, field campaigns and inverse modeling efforts are planned and undertaken with reduction of parametric uncertainty as the objective. The tool of hypothesis testing allows this to be taken one step further by considering uncertainty reduction in the ultimate prediction of the EPM as the objective and gives a rational basis for weighing costs and benefits at each stage. When using the tool of statistical hypothesis testing, the EPM is cast into a binary outcome. This is formulated as null and alternative hypotheses, which can be accepted and rejected with statistical formality. When accounting for all sources of uncertainty at each stage, the level of significance of this test provides a rational basis for planning, optimization, and evaluation of the entire campaign. Case-specific information, such as the consequences of prediction error and site-specific costs, can be used in establishing selection criteria based on what level of risk is deemed acceptable. This framework is demonstrated and discussed using various synthetic case studies. The case studies involve contaminated aquifers where a decision must be made based on prediction of when a contaminant will arrive at a given location. The EPM, in this case contaminant travel time, is cast into the hypothesis testing framework. The null hypothesis states that the contaminant plume will arrive at the specified location before a critical value of time passes, and the alternative hypothesis states that the plume will arrive after the critical time passes. Different field campaigns are analyzed based on effectiveness in reducing the probability of selecting the wrong hypothesis, which in this case corresponds to reducing uncertainty in the prediction of plume arrival time. To examine the role of inverse modeling in this framework, case studies involving both Maximum Likelihood parameter estimation and Bayesian inversion are used.

  7. Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.

    PubMed

    Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng

    2015-01-01

    Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.
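
    A bare-bones version of the procedure under discussion, with hypothetical data: a percentile-bootstrap confidence interval for the indirect effect a*b. This illustrates the method whose small-sample behavior the paper criticizes; it is not a recommendation.

        import numpy as np

        rng = np.random.default_rng(2024)
        n = 50                                            # small sample, as discussed
        x = rng.standard_normal(n)
        m = 0.4 * x + rng.standard_normal(n)              # mediator
        y = 0.5 * m + rng.standard_normal(n)              # outcome

        def indirect_effect(x, m, y):
            a = np.polyfit(x, m, 1)[0]                    # path a: M on X
            X = np.column_stack([np.ones(len(x)), m, x])  # Y on M, controlling X
            b = np.linalg.lstsq(X, y, rcond=None)[0][1]   # path b
            return a * b

        boot = np.array([indirect_effect(x[i], m[i], y[i])
                         for i in (rng.integers(0, n, n) for _ in range(5000))])
        lo, hi = np.percentile(boot, [2.5, 97.5])
        print(f"ab = {indirect_effect(x, m, y):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")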

  8. Physiological time-series analysis: what does regularity quantify?

    NASA Technical Reports Server (NTRS)

    Pincus, S. M.; Goldberger, A. L.

    1994-01-01

    Approximate entropy (ApEn) is a recently developed statistic quantifying regularity and complexity that appears to have potential application to a wide variety of physiological and clinical time-series data. The focus here is to provide a better understanding of ApEn to facilitate its proper utilization, application, and interpretation. After giving the formal mathematical description of ApEn, we provide a multistep description of the algorithm as applied to two contrasting clinical heart rate data sets. We discuss algorithm implementation and interpretation and introduce a general mathematical hypothesis of the dynamics of a wide class of diseases, indicating the utility of ApEn to test this hypothesis. We indicate the relationship of ApEn to variability measures, the Fourier spectrum, and algorithms motivated by study of chaotic dynamics. We discuss further mathematical properties of ApEn, including the choice of input parameters, statistical issues, and modeling considerations, and we conclude with a section on caveats to ensure correct ApEn utilization.
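
    A direct implementation of the statistic, following the description above (length-m template matches within tolerance r, self-matches included); the test series are hypothetical.

        import numpy as np

        def approximate_entropy(x, m=2, r=None):
            """ApEn(m, r) = Phi(m) - Phi(m+1), where Phi(m) is the mean log
            proportion of length-m template matches within tolerance r."""
            x = np.asarray(x, dtype=float)
            if r is None:
                r = 0.2 * x.std()                  # a common default choice
            def phi(mm):
                n = len(x) - mm + 1
                tpl = np.array([x[i:i + mm] for i in range(n)])
                # Chebyshev distance between all template pairs
                dist = np.max(np.abs(tpl[:, None, :] - tpl[None, :, :]), axis=2)
                return np.log((dist <= r).mean(axis=1)).mean()
            return phi(m) - phi(m + 1)

        rng = np.random.default_rng(9)
        regular = np.sin(np.linspace(0, 40 * np.pi, 1000))
        irregular = rng.standard_normal(1000)
        print(approximate_entropy(regular), approximate_entropy(irregular))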

  9. Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis-driven selection rules.

    PubMed

    Robertson, David S; Prevost, A Toby; Bowden, Jack

    2016-09-30

Seamless phase II/III clinical trials offer an efficient way to select an experimental treatment and perform confirmatory analysis within a single trial. However, combining the data from both stages in the final analysis can induce bias into the estimates of treatment effects. Methods for bias adjustment developed thus far have made restrictive assumptions about the design and selection rules followed. In order to address these shortcomings, we apply recent methodological advances to derive the uniformly minimum variance conditionally unbiased estimator for two-stage seamless phase II/III trials. Our framework allows for the precision of the treatment arm estimates to take arbitrary values, can be utilised for all treatments that are taken forward to phase III and is applicable when the decision to select or drop treatment arms is driven by a multiplicity-adjusted hypothesis testing procedure. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  10. Posttraumatic cognitions and posttraumatic stress symptoms among war-affected children: a cross-lagged analysis.

    PubMed

    Palosaari, Esa; Punamäki, Raija-Leena; Diab, Marwan; Qouta, Samir

    2013-08-01

    In a longitudinal study of war-affected children, we tested, first, whether posttraumatic cognitions (PTCs) mediated the relationship between initial and later posttraumatic stress symptoms (PTSSs). Second, we analyzed the relative strength of influences that PTCs and PTSSs have on each other in cross-lagged models of levels and latent change scores. The participants were 240 Palestinian children 10-12 years of age, reporting PTSSs and PTCs measures at 3, 5, and 11 months after a major war. Results show that PTCs did not mediate between initial and later PTSSs. The levels and changes in PTCs statistically significantly predicted later levels and changes in PTSSs, but PTSSs did not statistically significantly predict later PTCs. The results are consistent with the hypothesis that PTCs have a central role in the development and maintenance of PTSSs over time, but they do not support the hypothesis that initial PTSSs develop to chronic PTSSs through negative PTCs. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  11. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.

  12. The Effect of Substituting p for alpha on the Unconditional and Conditional Powers of a Null Hypothesis Test.

    ERIC Educational Resources Information Center

    Martuza, Victor R.; Engel, John D.

    Results from classical power analysis (Brewer, 1972) suggest that a researcher should not set a=p (when p is less than a) in a posteriori fashion when a study yields statistically significant results because of a resulting decrease in power. The purpose of the present report is to use Bayesian theory in examining the validity of this…

  13. The Infection Dynamics of a Hypothetical Virus in a High School: Use of an Ultraviolet Detectable Powder

    ERIC Educational Resources Information Center

    Baltezore, Joan M.; Newbrey, Michael G.

    2007-01-01

    The purpose of this paper is to provide background information about the spread of viruses in a population, to introduce an adaptable procedure to further the understanding of epidemiology in the high school setting, and to show how hypothesis testing and statistics can be incorporated into a high school lab exercise. It describes a project which…

  14. The Clinical Utility of the Proposed DSM-5 Callous-Unemotional Subtype of Conduct Disorder in Young Girls

    ERIC Educational Resources Information Center

    Pardini, Dustin; Stepp, Stephanie; Hipwell, Alison; Stouthamer-Loeber, Magda; Loeber, Rolf

    2012-01-01

Objective: A callous-unemotional (CU) subtype of conduct disorder (CD) has been proposed as an addition to the fifth edition of the "Diagnostic and Statistical Manual of Mental Disorders (DSM-5)." This study tested the hypothesis that young girls with the CU subtype of CD would exhibit more severe antisocial behavior and less severe internalizing…

  15. Researchers' Perceptions of Statistical Significance Contribute to Bias in Health and Exercise Science

    ERIC Educational Resources Information Center

    Buchanan, Taylor L.; Lohse, Keith R.

    2016-01-01

    We surveyed researchers in the health and exercise sciences to explore different areas and magnitudes of bias in researchers' decision making. Participants were presented with scenarios (testing a central hypothesis with p = 0.06 or p = 0.04) in a random order and surveyed about what they would do in each scenario. Participants showed significant…

  16. The Use of Probability Theory as a Basis for Planning and Controlling Overhead Costs in Education and Industry. Final Report.

    ERIC Educational Resources Information Center

    Vinson, R. B.

    In this report, the author suggests changes in the treatment of overhead costs by hypothesizing that "the effectiveness of standard costing in planning and controlling overhead costs can be increased through the use of probability theory and associated statistical techniques." To test the hypothesis, the author (1) presents an overview of the…

  17. Is humanity sustainable?

    PubMed Central

    Fowler, Charles W; Hobbs, Larry

    2003-01-01

    The principles and tenets of management require action to avoid sustained abnormal/pathological conditions. For the sustainability of interactive systems, each system should fall within its normal range of natural variation. This applies to individuals (as for fevers and hypertension, in medicine), populations (e.g. outbreaks of crop pests in agriculture), species (e.g. the rarity of endangerment in conservation) and ecosystems (e.g. abnormally low productivity or diversity in 'ecosystem-based management'). In this paper, we report tests of the hypothesis that the human species is ecologically normal. We reject the hypothesis for almost all of the cases we tested. Our species rarely falls within statistical confidence limits that envelop the central tendencies in variation among other species. For example, our population size, CO2 production, energy use, biomass consumption and geographical range size differ from those of other species by orders of magnitude. We argue that other measures should be tested in a similar fashion to assess the prevalence of such differences and their practical implications. PMID:14728780

  18. Accuracy of maxillary positioning after standard and inverted orthognathic sequencing.

    PubMed

    Ritto, Fabio G; Ritto, Thiago G; Ribeiro, Danilo Passeado; Medeiros, Paulo José; de Moraes, Márcio

    2014-05-01

    This study aimed to compare the accuracy of maxillary positioning after bimaxillary orthognathic surgery, using 2 sequences. A total of 80 cephalograms (40 preoperative and 40 postoperative) from 40 patients were analyzed. Group 1 included radiographs of patients submitted to the conventional sequence, whereas group 2 patients were submitted to the inverted sequence. The final position of the maxillary central incisor was obtained after vertical and horizontal measurements of the tracings, and it was compared with what had been planned. The null hypothesis, which stated that there would be no difference between the groups, was tested. After applying the Welch t test for comparison of mean differences between the desired and achieved maxillary positions, considering a statistical significance of 5% and a 2-tailed test, the null hypothesis was not rejected (P > .05). Thus, there was no difference in the accuracy of maxillary positioning between groups. Conventional and inverted sequencing proved to be reliable in positioning the maxilla after LeFort I osteotomy in bimaxillary orthognathic surgeries. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Step-stress analysis for predicting dental ceramic reliability

    PubMed Central

    Borba, Márcia; Cesar, Paulo F.; Griggs, Jason A.; Bona, Álvaro Della

    2013-01-01

    Objective To test the hypothesis that step-stress analysis is effective to predict the reliability of an alumina-based dental ceramic (VITA In-Ceram AL blocks) subjected to a mechanical aging test. Methods Bar-shaped ceramic specimens were fabricated, polished to a 1 µm finish and divided into 3 groups (n=10): (1) step-stress accelerating test; (2) flexural strength - control; (3) flexural strength - mechanical aging. Specimens from group 1 were tested in an electromagnetic actuator (MTS Evolution) using a three-point flexure fixture (frequency: 2 Hz; R=0.1) in a 37°C water bath. Each specimen was subjected to an individual stress profile, and the number of cycles to failure was recorded. A cumulative damage model with an inverse power law lifetime-stress relation and Weibull lifetime distribution were used to fit the fatigue data. The data were used to predict the stress level and number of cycles for mechanical aging (group 3). Groups 2 and 3 were tested for three-point flexural strength (σ) in a universal testing machine with a 1.0 MPa/s stress rate, in 37°C water. Data were statistically analyzed using the Mann-Whitney Rank Sum test. Results Step-stress data analysis showed that the profile most likely to weaken the specimens without causing fracture during aging (95% CI: 0–14% failures) was: 80 MPa stress amplitude and 10^5 cycles. The median σ values (MPa) for groups 2 (493±54) and 3 (423±103) were statistically different (p=0.009). Significance The aging profile determined by step-stress analysis was effective in reducing alumina ceramic strength as predicted by the reliability estimate, confirming the study hypothesis. PMID:23827018

  20. Step-stress analysis for predicting dental ceramic reliability.

    PubMed

    Borba, Márcia; Cesar, Paulo F; Griggs, Jason A; Della Bona, Alvaro

    2013-08-01

    To test the hypothesis that step-stress analysis is effective to predict the reliability of an alumina-based dental ceramic (VITA In-Ceram AL blocks) subjected to a mechanical aging test. Bar-shaped ceramic specimens were fabricated, polished to a 1 μm finish and divided into 3 groups (n=10): (1) step-stress accelerating test; (2) flexural strength - control; (3) flexural strength - mechanical aging. Specimens from group 1 were tested in an electromagnetic actuator (MTS Evolution) using a three-point flexure fixture (frequency: 2 Hz; R=0.1) in a 37°C water bath. Each specimen was subjected to an individual stress profile, and the number of cycles to failure was recorded. A cumulative damage model with an inverse power law lifetime-stress relation and Weibull lifetime distribution were used to fit the fatigue data. The data were used to predict the stress level and number of cycles for mechanical aging (group 3). Groups 2 and 3 were tested for three-point flexural strength (σ) in a universal testing machine with a 1.0 MPa/s stress rate, in 37°C water. Data were statistically analyzed using the Mann-Whitney Rank Sum test. Step-stress data analysis showed that the profile most likely to weaken the specimens without causing fracture during aging (95% CI: 0-14% failures) was: 80 MPa stress amplitude and 10^5 cycles. The median σ values (MPa) for groups 2 (493±54) and 3 (423±103) were statistically different (p=0.009). The aging profile determined by step-stress analysis was effective in reducing alumina ceramic strength as predicted by the reliability estimate, confirming the study hypothesis. Copyright © 2013 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  1. Reaction times to weak test lights. [psychophysics biological model

    NASA Technical Reports Server (NTRS)

    Wandell, B. A.; Ahumada, P.; Welsh, D.

    1984-01-01

    Maloney and Wandell (1984) describe a model of the response of a single visual channel to weak test lights. The initial channel response is a linearly filtered version of the stimulus. The filter output is randomly sampled over time. Each time a sample occurs, there is some probability (increasing with the magnitude of the sampled response) that a discrete detection event is generated. Maloney and Wandell derive the statistics of the detection events. In this paper a test is conducted of the hypothesis that the reaction time responses to the presence of a weak test light are initiated at the first detection event. This makes it possible to extend the application of the model to lights that are slightly above threshold, but still within the linear operating range of the visual system. A parameter-free prediction of the model proposed by Maloney and Wandell for lights detected by this statistic is tested. The data are in agreement with the prediction.

  2. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  3. Journal news

    USGS Publications Warehouse

    Conroy, M.J.; Samuel, M.D.; White, Joanne C.

    1995-01-01

    Statistical power (and conversely, Type II error) is often ignored by biologists. Power is important to consider in the design of studies, to ensure that sufficient resources are allocated to address a hypothesis under examination. Determining appropriate sample size when designing experiments or calculating power for a statistical test requires an investigator to consider the importance of making incorrect conclusions about the experimental hypothesis and the biological importance of the alternative hypothesis (or the biological effect size researchers are attempting to measure). Poorly designed studies frequently provide results that are at best equivocal, and do little to advance science or assist in decision making. Completed studies that fail to reject H0 should consider power and the related probability of a Type II error in the interpretation of results, particularly when implicit or explicit acceptance of H0 is used to support a biological hypothesis or management decision. Investigators must consider the biological question they wish to answer (Tacha et al. 1982) and assess power on the basis of biologically significant differences (Taylor and Gerrodette 1993). Power calculations are somewhat subjective, because the author must specify either f or the minimum difference that is biologically important. Biologists may have different ideas about what values are appropriate. While determining biological significance is of central importance in power analysis, it is also an issue of importance in wildlife science. Procedures, references, and computer software to compute power are accessible; therefore, authors should consider power. We welcome comments or suggestions on this subject.
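
    As a concrete illustration of the prospective power calculations the note advocates, the sketch below uses statsmodels to solve for sample size and for achieved power in a two-sample t-test; the effect size, alpha, and group size are arbitrary illustrative choices, not values from the editorial.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# What sample size per group is needed to detect a standardized effect of
# d = 0.5 with alpha = 0.05 and 80% power?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"required n per group: {n_per_group:.1f}")

# Conversely: with n = 20 per group, what power do we have to detect d = 0.5?
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05,
                                      alternative='two-sided')
print(f"power with n = 20 per group: {achieved_power:.2f}")
```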

  4. Hiring a Gay Man, Taking a Risk?: A Lab Experiment on Employment Discrimination and Risk Aversion.

    PubMed

    Baert, Stijn

    2018-01-01

    We investigate risk aversion as a driver of labor market discrimination against homosexual men. We show that more hiring discrimination by more risk-averse employers is consistent with taste-based and statistical discrimination. To test this hypothesis we conduct a scenario experiment in which experimental employers take a fictitious hiring decision concerning a heterosexual or homosexual male job candidate. In addition, participants are surveyed on their risk aversion and other characteristics that might correlate with this risk aversion. Analysis of the (post-)experimental data confirms our hypothesis. The likelihood of a beneficial hiring decision for homosexual male candidates decreases by 31.7% when employers are a standard deviation more risk-averse.

  5. Statistical evidence for common ancestry: Application to primates.

    PubMed

    Baum, David A; Ané, Cécile; Larget, Bret; Solís-Lemus, Claudia; Ho, Lam Si Tung; Boone, Peggy; Drummond, Chloe P; Bontrager, Martin; Hunter, Steven J; Saucier, William

    2016-06-01

    Since Darwin, biologists have come to recognize that the theory of descent from common ancestry (CA) is very well supported by diverse lines of evidence. However, while the qualitative evidence is overwhelming, we also need formal methods for quantifying the evidential support for CA over the alternative hypothesis of separate ancestry (SA). In this article, we explore a diversity of statistical methods using data from the primates. We focus on two alternatives to CA, species SA (the separate origin of each named species) and family SA (the separate origin of each family). We implemented statistical tests based on morphological, molecular, and biogeographic data and developed two new methods: one that tests for phylogenetic autocorrelation while correcting for variation due to confounding ecological traits and a method for examining whether fossil taxa have fewer derived differences than living taxa. We overwhelmingly rejected both species and family SA with infinitesimal P values. We compare these results with those from two companion papers, which also found tremendously strong support for the CA of all primates, and discuss future directions and general philosophical issues that pertain to statistical testing of historical hypotheses such as CA. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  6. EVOLUTION OF THE MAGNETIC FIELD LINE DIFFUSION COEFFICIENT AND NON-GAUSSIAN STATISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snodin, A. P.; Ruffolo, D.; Matthaeus, W. H.

    The magnetic field line random walk (FLRW) plays an important role in the transport of energy and particles in turbulent plasmas. For magnetic fluctuations that are transverse or almost transverse to a large-scale mean magnetic field, theories describing the FLRW usually predict asymptotic diffusion of magnetic field lines perpendicular to the mean field. Such theories often depend on the assumption that one can relate the Lagrangian and Eulerian statistics of the magnetic field via Corrsin’s hypothesis, and additionally take the distribution of magnetic field line displacements to be Gaussian. Here we take an ordinary differential equation (ODE) model with these underlying assumptions and test how well it describes the evolution of the magnetic field line diffusion coefficient in 2D+slab magnetic turbulence, by comparisons to computer simulations that do not involve such assumptions. In addition, we directly test the accuracy of the Corrsin approximation to the Lagrangian correlation. Over much of the studied parameter space we find that the ODE model is in fairly good agreement with computer simulations, in terms of both the evolution and asymptotic values of the diffusion coefficient. When there is poor agreement, we show that this can be largely attributed to the failure of Corrsin’s hypothesis rather than the assumption of Gaussian statistics of field line displacements. The degree of non-Gaussianity, which we measure in terms of the kurtosis, appears to be an indicator of how well Corrsin’s approximation works.

  7. ABrox-A user-friendly Python module for approximate Bayesian computation with a focus on model comparison.

    PubMed

    Mertens, Ulf Kai; Voss, Andreas; Radev, Stefan

    2018-01-01

    We give an overview of the basic principles of approximate Bayesian computation (ABC), a class of stochastic methods that enable flexible and likelihood-free model comparison and parameter estimation. Our new open-source software called ABrox is used to illustrate ABC for model comparison on two prominent statistical tests, the two-sample t-test and the Levene-Test. We further highlight the flexibility of ABC compared to classical Bayesian hypothesis testing by computing an approximate Bayes factor for two multinomial processing tree models. Last but not least, throughout the paper, we introduce ABrox using the accompanied graphical user interface.

  8. Sense of rhythm does not differentiate professional hurdlers from non-athletes.

    PubMed

    Skowronek, Tomasz; Słomka, Kajetan; Juras, Grzegorz; Szade, Bartlomiej

    2013-08-01

    The importance of rhythm and specific endurance capabilities was examined in the technical skill and performance of hurdle runners. Additionally, interaction effects among rhythm, anaerobic fitness, and body constitution were analyzed. Seven 18-year-old members of the Polish Junior National Team in the 110 m hurdles and 8 age-matched controls who were non-athletes participated. Movement coordination tests (rhythm and differentiation tests) and an anaerobic fitness test were performed. There were no statistically significant differences between the athletes and the control group on the coordination or rhythm test variables. No support was found for the hypothesis that a hurdler's timing ability influences performance.

  9. Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis

    PubMed Central

    McDermott, Josh H.; Simoncelli, Eero P.

    2014-01-01

    Rainstorms, insect swarms, and galloping horses produce “sound textures” – the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures. However, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation. PMID:21903084

  10. Understanding the Link between Poverty and Food Insecurity among Children: Does the Definition of Poverty Matter?

    PubMed

    Wight, Vanessa; Kaushal, Neeraj; Waldfogel, Jane; Garfinkel, Irv

    2014-01-02

    This paper examines the association between poverty and food insecurity among children, using two different definitions of poverty-the official poverty measure (OPM) and the new supplemental poverty measure (SPM) of the Census Bureau, which is based on a more inclusive definition of family resources and needs. Our analysis is based on data from the 2001-11 Current Population Survey and shows that food insecurity and very low food security among children decline as income-to-needs ratio increases. The point estimates show that the associations are stronger as measured by the new supplemental measure of income-to-needs ratio than when estimated through the official measure. Statistical tests reject the hypothesis that poor households' odds of experiencing low food security are the same whether the SPM or OPM measure is used; but the tests do not reject the hypothesis when very low food security is the outcome.

  11. Acculturation, enculturation, and Asian American college students' mental health and attitudes toward seeking professional psychological help.

    PubMed

    Miller, Matthew J; Yang, Minji; Hui, Kayi; Choi, Na-Yeun; Lim, Robert H

    2011-07-01

    In the present study, we tested a theoretically and empirically derived partially indirect effects acculturation and enculturation model of Asian American college students' mental health and attitudes toward seeking professional psychological help. Latent variable path analysis with 296 self-identified Asian American college students supported the partially indirect effects model and demonstrated the ways in which behavioral acculturation, behavioral enculturation, values acculturation, values enculturation, and acculturation gap family conflict related to mental health and attitudes toward seeking professional psychological help directly and indirectly through acculturative stress. We also tested a generational status moderator hypothesis to determine whether differences in model-implied relationships emerged across U.S.- (n = 185) and foreign-born (n = 107) participants. Consistent with this hypothesis, statistically significant differences in structural coefficients emerged across generational status. Limitations, future directions for research, and counseling implications are discussed.

  12. One-way ANOVA based on interval information

    NASA Astrophysics Data System (ADS)

    Hesamian, Gholamreza

    2016-08-01

    This paper deals with extending the one-way analysis of variance (ANOVA) to the case where the observed data are represented by closed intervals rather than real numbers. In this approach, a notion of interval random variable is first introduced. In particular, a normal distribution with interval parameters is introduced to investigate hypotheses about the equality of interval means or to test the assumption of homogeneity of interval variances. Moreover, the least significant difference (LSD) method for multiple comparisons of interval means is developed for use when the null hypothesis of equal means is rejected. Then, at a given interval significance level, an index is applied to compare the interval test statistic with the related interval critical value, as a criterion for accepting or rejecting the null interval hypothesis of interest. Finally, the decision-making method yields degrees to which the interval hypotheses are accepted or rejected. An applied example is used to show the performance of this method.

  13. Understanding the Link between Poverty and Food Insecurity among Children: Does the Definition of Poverty Matter?

    PubMed Central

    Wight, Vanessa; Kaushal, Neeraj; Waldfogel, Jane; Garfinkel, Irv

    2014-01-01

    This paper examines the association between poverty and food insecurity among children, using two different definitions of poverty—the official poverty measure (OPM) and the new supplemental poverty measure (SPM) of the Census Bureau, which is based on a more inclusive definition of family resources and needs. Our analysis is based on data from the 2001–11 Current Population Survey and shows that food insecurity and very low food security among children decline as income-to-needs ratio increases. The point estimates show that the associations are stronger as measured by the new supplemental measure of income-to-needs ratio than when estimated through the official measure. Statistical tests reject the hypothesis that poor households’ odds of experiencing low food security are the same whether the SPM or OPM measure is used; but the tests do not reject the hypothesis when very low food security is the outcome. PMID:25045244

  14. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
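
    The geometric-distribution check on generators can be illustrated with a simple chi-square goodness-of-fit test. The counts below are invented for demonstration and are not the basins' data; the geometric parameter is estimated from the sample mean and one degree of freedom is deducted accordingly.

```python
import numpy as np
from scipy import stats

# Hypothetical generator counts (number of side tributaries per link), k = 0..5+
counts = np.array([210, 96, 41, 22, 9, 5])
k = np.arange(len(counts))
n = counts.sum()

# MLE for the geometric parameter from the sample mean (treating the lumped tail as k = 5)
mean_k = (k * counts).sum() / n
p_hat = 1.0 / (1.0 + mean_k)

# Expected frequencies under Geometric(p_hat) on support 0, 1, 2, ...; lump the tail into the last cell
expected = n * p_hat * (1 - p_hat) ** k
expected[-1] = n - expected[:-1].sum()

chi2, p_value = stats.chisquare(counts, expected, ddof=1)   # ddof=1: one parameter estimated
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```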

  15. Introducing SONS, a tool for operational taxonomic unit-based comparisons of microbial community memberships and structures.

    PubMed

    Schloss, Patrick D; Handelsman, Jo

    2006-10-01

    The recent advent of tools enabling statistical inferences to be drawn from comparisons of microbial communities has enabled the focus of microbial ecology to move from characterizing biodiversity to describing the distribution of that biodiversity. Although statistical tools have been developed to compare community structures across a phylogenetic tree, we lack tools to compare the memberships and structures of two communities at a particular operational taxonomic unit (OTU) definition. Furthermore, current tests of community structure do not indicate the similarity of the communities but only report the probability of a statistical hypothesis. Here we present a computer program, SONS, which implements nonparametric estimators for the fraction and richness of OTUs shared between two communities.

  16. Statistical Model to Analyze Quantitative Proteomics Data Obtained by 18O/16O Labeling and Linear Ion Trap Mass Spectrometry

    PubMed Central

    Jorge, Inmaculada; Navarro, Pedro; Martínez-Acedo, Pablo; Núñez, Estefanía; Serrano, Horacio; Alfranca, Arántzazu; Redondo, Juan Miguel; Vázquez, Jesús

    2009-01-01

    Statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, particularly for data obtained by 16O/18O labeling. Besides, large-scale test experiments to validate the null hypothesis are lacking. Although the study of mechanisms underlying biological actions promoted by vascular endothelial growth factor (VEGF) on endothelial cells is of considerable interest, quantitative proteomics studies on this subject are scarce and have been performed after exposing cells to the factor for long periods of time. In this work we present the largest quantitative proteomics study to date on the short term effects of VEGF on human umbilical vein endothelial cells by 18O/16O labeling. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in a large scale test experiment performed on these cells, producing false expression changes. A random effects model was developed including four different sources of variance at the spectrum-fitting, scan, peptide, and protein levels. With the new model the number of outliers at scan and peptide levels was negligible in three large scale experiments, and only one false protein expression change was observed in the test experiment among more than 1000 proteins. The new model allowed the detection of significant protein expression changes upon VEGF stimulation for 4 and 8 h. The consistency of the changes observed at 4 h was confirmed by a replica at a smaller scale and further validated by Western blot analysis of some proteins. Most of the observed changes have not been described previously and are consistent with a pattern of protein expression that dynamically changes over time following the evolution of the angiogenic response. With this statistical model the 18O labeling approach emerges as a very promising and robust alternative to perform quantitative proteomics studies at a depth of several thousand proteins. PMID:19181660

  17. Hybrid statistical testing for nuclear material accounting data and/or process monitoring data in nuclear safeguards

    DOE PAGES

    Burr, Tom; Hamada, Michael S.; Ticknor, Larry; ...

    2015-01-01

    The aim of nuclear safeguards is to ensure that special nuclear material is used for peaceful purposes. Historically, nuclear material accounting (NMA) has provided the quantitative basis for monitoring for nuclear material loss or diversion, and process monitoring (PM) data is collected by the operator to monitor the process. PM data typically support NMA in various ways, often by providing a basis to estimate some of the in-process nuclear material inventory. We develop options for combining PM residuals and NMA residuals (residual = measurement - prediction), using a hybrid of period-driven and data-driven hypothesis testing. The modified statistical tests can be used on time series of NMA residuals (the NMA residual is the familiar material balance), or on a combination of PM and NMA residuals. The PM residuals can be generated on a fixed time schedule or as events occur.
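
    A toy sketch of the general idea - testing a series of standardized residuals both period-by-period and cumulatively - is given below. It assumes independent, normally distributed residuals with known measurement standard deviations, and it is not the hybrid procedure developed in the paper.

```python
import numpy as np
from scipy import stats

def combined_residual_test(residuals, sigmas, alpha=0.05):
    """Toy illustration: test a time series of residuals (e.g. material balances)
    per period and cumulatively, assuming independent Gaussian errors."""
    residuals = np.asarray(residuals, float)
    sigmas = np.asarray(sigmas, float)
    z = residuals / sigmas
    # period-driven test: flag any single standardized residual beyond the two-sided threshold
    per_period_alarm = np.abs(z) > stats.norm.ppf(1 - alpha / 2)
    # data-driven cumulative test on the summed residual
    cumulative = residuals.sum()
    cumulative_sd = np.sqrt(np.sum(sigmas ** 2))
    p_cumulative = 2 * stats.norm.sf(abs(cumulative) / cumulative_sd)
    return per_period_alarm, p_cumulative

alarms, p = combined_residual_test([0.4, -0.2, 1.1, 0.3], [0.5, 0.5, 0.5, 0.5])
print(alarms, round(p, 3))
```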

  18. A comment on measuring the Hurst exponent of financial time series

    NASA Astrophysics Data System (ADS)

    Couillard, Michel; Davison, Matt

    2005-03-01

    A fundamental hypothesis of quantitative finance is that stock price variations are independent and can be modeled using Brownian motion. In recent years, it was proposed to use rescaled range analysis and its characteristic value, the Hurst exponent, to test for independence in financial time series. Theoretically, independent time series should be characterized by a Hurst exponent of 1/2. However, finite Brownian motion data sets will always give a value of the Hurst exponent larger than 1/2 and without an appropriate statistical test such a value can mistakenly be interpreted as evidence of long term memory. We obtain a more precise statistical significance test for the Hurst exponent and apply it to real financial data sets. Our empirical analysis shows no long-term memory in some financial returns, suggesting that Brownian motion cannot be rejected as a model for price dynamics.
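
    A rescaled range estimate of the Hurst exponent can be computed along the following lines; this is a generic R/S implementation, not the authors' corrected significance test, and for finite iid samples it illustrates the upward bias noted in the abstract.

```python
import numpy as np

def hurst_rs(x, window_sizes=None):
    """Estimate the Hurst exponent of a series by rescaled range (R/S) analysis:
    compute the mean R/S statistic over non-overlapping windows of several sizes
    and return the slope of log(R/S) versus log(window size)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if window_sizes is None:
        window_sizes = np.unique(np.logspace(np.log10(10), np.log10(n // 2), 10).astype(int))
    log_rs, log_n = [], []
    for w in window_sizes:
        rs_vals = []
        for start in range(0, n - w + 1, w):
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())       # cumulative deviations from the window mean
            r = dev.max() - dev.min()               # range of the cumulative deviations
            s = seg.std(ddof=1)                     # sample standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_rs.append(np.log(np.mean(rs_vals)))
            log_n.append(np.log(w))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(0)
print(round(hurst_rs(rng.standard_normal(4096)), 3))   # near, but typically above, 0.5 for iid noise
```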

  19. Sampling and Data Analysis for Environmental Microbiology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murray, Christopher J.

    2001-06-01

    A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques is available for analysis of the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.

  20. Estimating the Probability of Traditional Copying, Conditional on Answer-Copying Statistics.

    PubMed

    Allen, Jeff; Ghattas, Andrew

    2016-06-01

    Statistics for detecting copying on multiple-choice tests produce p values measuring the probability of a value at least as large as that observed, under the null hypothesis of no copying. The posterior probability of copying is arguably more relevant than the p value, but cannot be derived from Bayes' theorem unless the population probability of copying and probability distribution of the answer-copying statistic under copying are known. In this article, the authors develop an estimator for the posterior probability of copying that is based on estimable quantities and can be used with any answer-copying statistic. The performance of the estimator is evaluated via simulation, and the authors demonstrate how to apply the formula using actual data. Potential uses, generalizability to other types of cheating, and limitations of the approach are discussed.
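
    The quantity being estimated can be illustrated directly with Bayes' theorem. In the sketch below, the prior probability of copying and the detection probability under copying are invented inputs; the article instead develops estimators of these quantities from data.

```python
def posterior_copy_probability(p_value, prior_copy, power_at_threshold):
    """Bayes' theorem illustration of the posterior probability of copying.

    p_value            : P(statistic at least this extreme | no copying)
    prior_copy         : assumed population probability of copying
    power_at_threshold : P(statistic at least this extreme | copying)
    """
    numerator = prior_copy * power_at_threshold
    denominator = numerator + (1 - prior_copy) * p_value
    return numerator / denominator

# Example: p = 0.001, 1% of examinee pairs copy, and copying yields such a value 60% of the time
print(round(posterior_copy_probability(0.001, 0.01, 0.60), 3))
```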

  1. Long memory and multifractality: A joint test

    NASA Astrophysics Data System (ADS)

    Goddard, John; Onali, Enrico

    2016-06-01

    The properties of statistical tests for hypotheses concerning the parameters of the multifractal model of asset returns (MMAR) are investigated, using Monte Carlo techniques. We show that, in the presence of multifractality, conventional tests of long memory tend to over-reject the null hypothesis of no long memory. Our test addresses this issue by jointly estimating long memory and multifractality. The estimation and test procedures are applied to exchange rate data for 12 currencies. Among the nested model specifications that are investigated, in 11 out of 12 cases, daily returns are most appropriately characterized by a variant of the MMAR that applies a multifractal time-deformation process to NIID returns. There is no evidence of long memory.

  2. An experiment with spectral analysis of emotional speech affected by orthodontic appliances

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela

    2012-11-01

    The contribution describes the effect of the fixed and removable orthodontic appliances on spectral properties of emotional speech. Spectral changes were analyzed and evaluated by spectrograms and mean Welch’s periodograms. This alternative approach to the standard listening test makes it possible to obtain an objective comparison based on statistical analysis by ANOVA and hypothesis tests. The results of the analysis, performed on short sentences of a female speaker in four emotional states (joyous, sad, angry, and neutral), show that it is primarily the removable orthodontic appliance that affects the spectrograms of the produced speech.

  3. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically accounted for the non-normal distribution of, and the dependence between, data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R² of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R² coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
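
    For reference, the two headline monthly criteria can be computed with a few lines of NumPy; the observed and predicted values below are placeholders, not the SWAT study data.

```python
import numpy as np

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means no better than the observed mean."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return 1.0 - np.sum((observed - predicted) ** 2) / np.sum((observed - observed.mean()) ** 2)

def r_squared(observed, predicted):
    """Coefficient of determination based on the correlation between observed and predicted."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2

obs = np.array([1.2, 3.4, 2.8, 5.1, 4.0, 2.2])    # placeholder monthly observations
pred = np.array([1.0, 3.9, 2.5, 4.6, 4.4, 2.0])   # placeholder model predictions
print(f"NSE = {nash_sutcliffe(obs, pred):.2f}, R^2 = {r_squared(obs, pred):.2f}")
```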

  4. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network

    PubMed Central

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-01

    A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain, evaluating the similarity between a reference signal (noise signal) and the original signal and removing the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs with high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment with rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006

  5. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network.

    PubMed

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-08

    A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain, evaluating the similarity between a reference signal (noise signal) and the original signal and removing the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs with high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment with rolling element bearings demonstrates the effectiveness of the proposed method.

  6. Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales

    PubMed Central

    Williams, L. Keoki; Buu, Anne

    2017-01-01

    We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
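
    A minimal sketch of the Fisher combination idea with an empirically estimated null follows. Here the null p-values are drawn independently as a stand-in; in the method described above they would be generated so as to preserve the correlation among the phenotypes.

```python
import numpy as np

def fisher_combination(p_values):
    """Fisher's combination statistic: -2 * sum(log p)."""
    return -2.0 * np.sum(np.log(p_values))

def empirical_null_pvalue(observed_stat, null_p_matrix):
    """Compare the observed combined statistic with combined statistics built from
    p-values simulated (or permuted) under the null, instead of assuming the
    nominal chi-square reference distribution."""
    null_stats = -2.0 * np.sum(np.log(null_p_matrix), axis=1)
    return np.mean(null_stats >= observed_stat)

rng = np.random.default_rng(1)
observed = fisher_combination([0.03, 0.20, 0.01])          # p-values for three phenotypes
null_ps = rng.uniform(size=(10000, 3))                      # placeholder for correlated null p-values
print(round(empirical_null_pvalue(observed, null_ps), 4))
```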

  7. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, guidelines for the visual assessment of statistical uncertainty are lacking. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area, and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.

  8. Flowable composites for bonding orthodontic retainers.

    PubMed

    Tabrizi, Sama; Salemis, Elio; Usumez, Serdar

    2010-01-01

    To test the null hypothesis that there are no statistically significant differences between flowables and an orthodontic adhesive tested in terms of shear bond strength (SBS) and pullout resistance. To test SBS, Light Bond, FlowTain, Filtek Supreme, and Tetric Flow were applied to the enamel surfaces of 15 teeth. Using matrices for application, each composite material was cured for 40 seconds and subjected to SBS testing. To test pullout resistance, 15 samples were prepared for each composite in which a wire was embedded; then the composite was cured for 40 seconds. Later, the ends of the wire were drawn up and tensile stress was applied until the resin failed. Findings were analyzed using an ANOVA and a Tukey HSD test. The SBS values for Light Bond, FlowTain, Filtek Supreme, and Tetric Flow were 19.0 +/- 10.9, 14.7 +/- 9.3, 22.4 +/- 16.3, and 16.8 +/- 11.8 MPa, respectively, and mean pullout values were 42.2 +/- 13.0, 24.0 +/- 6.9, 26.3 +/- 9.4, and 33.8 +/- 18.0 N, respectively. No statistically significant differences were found among the groups in terms of SBS (P > .05). On the other hand, Light Bond yielded significantly higher pullout values compared with the flowables Filtek Supreme and FlowTain (P < .01). However, there were no significant differences among the pullout values of flowables, nor between Light Bond and Tetric Flow (P > .05). The hypothesis is rejected. Light Bond yielded significantly higher pullout values compared with the flowables Filtek Supreme and FlowTain. However, flowable composites provided satisfactory SBS and wire pullout values, comparable to a standard orthodontic resin, and therefore can be used as an alternative for direct bonding of lingual retainers.
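
    The ANOVA-plus-Tukey workflow reported above can be reproduced in outline with SciPy and statsmodels; the numbers below are fabricated illustrative values, not the study's measurements.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical shear-bond-strength values (MPa) for four adhesives; not the study's raw data
groups = {
    "LightBond":  np.array([19.5, 22.1, 15.8, 24.3, 17.0]),
    "FlowTain":   np.array([14.2, 16.9, 12.8, 15.5, 13.9]),
    "Filtek":     np.array([22.0, 25.4, 18.7, 23.9, 21.6]),
    "TetricFlow": np.array([16.1, 18.4, 15.2, 17.8, 16.9]),
}

# One-way ANOVA across the four groups
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_anova:.3f}")

# Tukey HSD post hoc pairwise comparisons
values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```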

  9. Ergodic theorem, ergodic theory, and statistical mechanics

    PubMed Central

    Moore, Calvin C.

    2015-01-01

    This perspective highlights the mean ergodic theorem established by John von Neumann and the pointwise ergodic theorem established by George Birkhoff, proofs of which were published nearly simultaneously in PNAS in 1931 and 1932. These theorems were of great significance both in mathematics and in statistical mechanics. In statistical mechanics they provided a key insight into a 60-y-old fundamental problem of the subject—namely, the rationale for the hypothesis that time averages can be set equal to phase averages. The evolution of this problem is traced from the origins of statistical mechanics and Boltzmann's ergodic hypothesis to the Ehrenfests' quasi-ergodic hypothesis, and then to the ergodic theorems. We discuss communications between von Neumann and Birkhoff in the Fall of 1931 leading up to the publication of these papers and related issues of priority. These ergodic theorems initiated a new field of mathematical research called ergodic theory that has thrived ever since, and we discuss some of the recent developments in ergodic theory that are relevant for statistical mechanics. PMID:25691697

  10. Dynamics and regulation of the southern brook trout (Salvelinus fontinalis) population in an Appalachian stream

    Treesearch

    Gary D. Grossman; Robert E. Ratajczak; C. Michael Wagner; J. Todd Petty

    2010-01-01

    1. We used information theoretic statistics [Akaike’s Information Criterion (AIC)] and regression analysis in a multiple hypothesis testing approach to assess the processes capable of explaining long-term demographic variation in a lightly exploited brook trout population in Ball Creek, NC. We sampled a 100-m-long second-order site during both spring and autumn 1991–...

  11. The impact of Lean bundles on hospital performance: does size matter?

    PubMed

    Al-Hyari, Khalil; Abu Hammour, Sewar; Abu Zaid, Mohammad Khair Saleem; Haffar, Mohamed

    2016-10-10

    Purpose The purpose of this paper is to study the effect of the implementation of Lean bundles on hospital performance in private hospitals in Jordan and to evaluate how much the size of the organization affects the relationship between Lean bundles implementation and hospital performance. Design/methodology/approach The research uses a quantitative method (descriptive and hypothesis testing). Three statistical techniques were adopted to analyse the data. Structural equation modeling techniques and multi-group analysis were used to examine the research hypotheses and to perform the required statistical analysis of the data from the survey. Reliability analysis and confirmatory factor analysis were used to test construct validity, reliability, and measurement loadings. Findings Lean bundles have been identified as an effective approach that can dramatically improve the organizational performance of private hospitals in Jordan. The main Lean bundles - just in time, human resource management, and total quality management - are applicable to large, small and medium hospitals without significant differences in advantages that depend on size. Originality/value According to the researchers' best knowledge, this is the first research that studies the impact of Lean bundles implementation in the healthcare sector in Jordan. This research also makes a significant contribution by increasing decision makers' awareness of Lean bundles in healthcare.

  12. Experiential avoidance as an emotion regulatory function: an empirical analysis of experiential avoidance in relation to behavioral avoidance, cognitive reappraisal, and response suppression.

    PubMed

    Wolgast, Martin; Lundh, Lars-Gunnar; Viborg, Gardar

    2013-01-01

    The purpose of the present study was to empirically test the suggestion that experiential avoidance in an emotion regulation context is best understood as an emotion regulatory function of topographically distinct strategies. To do this we examined whether a measure of experiential avoidance could statistically account for the effects of emotion regulation strategies intervening at different points of the emotion-generative process as conceptualized by Gross' (1998) process model of emotion regulation. The strategies under examination were behavioral avoidance, cognitive reappraisal, and response suppression. The specific hypotheses to be tested were (1) that behavioral avoidance, cognitive reappraisal, and response suppression would statistically mediate the differences in measures of psychological well-being between a clinical and nonclinical sample, but that (2) these indirect effects would be reduced to nonsignificant levels when controlling for differences in experiential avoidance. The results provide clear support for the first hypothesis with regard to all the studied strategies. In contrast to the second hypothesis, the results showed the predicted outcome pattern only for the response-focused strategy "response suppression" and not for cognitive reappraisal or behavioral avoidance. The results are interpreted and discussed in relation to theories on experiential avoidance and emotion regulation.

  13. Hypothesis Testing as an Act of Rationality

    NASA Astrophysics Data System (ADS)

    Nearing, Grey

    2017-04-01

    Statistical hypothesis testing is ad hoc in two ways. First, setting probabilistic rejection criteria is, as Neyman (1957) put it, an act of will rather than an act of rationality. Second, physical theories like conservation laws do not inherently admit probabilistic predictions, and so we must use what are called epistemic bridge principles to connect model predictions with the actual methods of hypothesis testing. In practice, these bridge principles are likelihood functions, error functions, or performance metrics. I propose that the reason we are faced with these problems is that we have historically failed to account for a fundamental component of basic logic - namely the portion of logic that explains how epistemic states evolve in the presence of empirical data. This component of Cox's (1946) calculitic logic is called information theory (Knuth, 2005), and adding information theory to our hypothetico-deductive account of science yields straightforward solutions to both of the above problems. This also yields a straightforward method for dealing with Popper's (1963) problem of verisimilitude by facilitating a quantitative approach to measuring process isomorphism. In practice, this involves data assimilation. Finally, information theory allows us to reliably bound measures of epistemic uncertainty, thereby avoiding the problem of Bayesian incoherency under misspecified priors (Grünwald, 2006). I therefore propose solutions to four of the fundamental problems inherent in both hypothetico-deductive and/or Bayesian hypothesis testing. - Neyman (1957) Inductive Behavior as a Basic Concept of Philosophy of Science. - Cox (1946) Probability, Frequency and Reasonable Expectation. - Knuth (2005) Lattice Duality: The Origin of Probability and Entropy. - Grünwald (2006) Bayesian Inconsistency under Misspecification. - Popper (1963) Conjectures and Refutations: The Growth of Scientific Knowledge.

  14. Simulation of spatially evolving turbulence and the applicability of Taylor's hypothesis in compressible flow

    NASA Technical Reports Server (NTRS)

    Lee, Sangsan; Lele, Sanjiva K.; Moin, Parviz

    1992-01-01

    For the numerical simulation of inhomogeneous turbulent flows, a method is developed for generating stochastic inflow boundary conditions with a prescribed power spectrum. Turbulence statistics from spatial simulations using this method with a low fluctuation Mach number are in excellent agreement with the experimental data, which validates the procedure. Turbulence statistics from spatial simulations are also compared to those from temporal simulations using Taylor's hypothesis. Statistics such as turbulence intensity, vorticity, and velocity derivative skewness compare favorably with the temporal simulation. However, the statistics of dilatation show a significant departure from those obtained in the temporal simulation. To directly check the applicability of Taylor's hypothesis, space-time correlations of fluctuations in velocity, vorticity, and dilatation are investigated. Convection velocities based on vorticity and velocity fluctuations are computed as functions of the spatial and temporal separations. The profile of the space-time correlation of dilatation fluctuations is explained via a wave propagation model.

  15. A maximally selected test of symmetry about zero.

    PubMed

    Laska, Eugene; Meisner, Morris; Wanderling, Joseph

    2012-11-20

    The problem of testing symmetry about zero has a long and rich history in the statistical literature. We introduce a new test that sequentially discards observations whose absolute value is below increasing thresholds defined by the data. McNemar's statistic is obtained at each threshold and the largest is used as the test statistic. We obtain the exact distribution of this maximally selected McNemar and provide tables of critical values and a program for computing p-values. Power is compared with the t-test, the Wilcoxon Signed Rank Test and the Sign Test. The new test, MM, is slightly less powerful than the t-test and Wilcoxon Signed Rank Test for symmetric normal distributions with nonzero medians and substantially more powerful than all three tests for asymmetric mixtures of normal random variables with or without zero medians. The motivation for this test derives from the need to appraise the safety profile of new medications. If pre and post safety measures are obtained, then under the null hypothesis, the variables are exchangeable and the distribution of their difference is symmetric about a zero median. Large pre-post differences are the major concern of a safety assessment. The discarded small observations are not particularly relevant to safety and can reduce power to detect important asymmetry. The new test was utilized on data from an on-road driving study performed to determine if a hypnotic, a drug used to promote sleep, has next day residual effects. Copyright © 2012 John Wiley & Sons, Ltd.
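
    The idea of the maximally selected McNemar statistic can be sketched as follows. The paper derives the exact null distribution and provides critical-value tables; the sketch instead uses a Monte Carlo sign-flip approximation, which is valid under the null hypothesis of exchangeable pre/post measures.

```python
import numpy as np

def max_mcnemar(d, thresholds=None):
    """Maximally selected McNemar statistic for symmetry of the differences d about zero."""
    d = np.asarray(d, float)
    if thresholds is None:
        thresholds = np.unique(np.abs(d[d != 0]))[:-1]     # discard up to, but never all, observations
        thresholds = np.concatenate(([0.0], thresholds))
    best = 0.0
    for t in thresholds:
        kept = d[np.abs(d) > t]                            # discard small |d| below the threshold
        n_pos, n_neg = np.sum(kept > 0), np.sum(kept < 0)
        if n_pos + n_neg > 0:
            best = max(best, (n_pos - n_neg) ** 2 / (n_pos + n_neg))
    return best

def max_mcnemar_pvalue(d, n_sim=5000, seed=0):
    """Monte Carlo p-value by random sign flips of the paired differences."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, float)
    observed = max_mcnemar(d)
    sims = [max_mcnemar(d * rng.choice([-1, 1], size=len(d))) for _ in range(n_sim)]
    return np.mean(np.asarray(sims) >= observed)

diffs = np.array([0.1, -0.2, 0.3, 2.1, 1.8, -0.1, 2.5, 0.2, 1.9, -0.3])  # toy pre/post differences
print(round(max_mcnemar_pvalue(diffs), 3))
```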

  16. Do statistical segmentation abilities predict lexical-phonological and lexical-semantic abilities in children with and without SLI?

    PubMed Central

    Mainela-Arnold, Elina; Evans, Julia L.

    2014-01-01

    This study tested the predictions of the procedural deficit hypothesis by investigating the relationship between sequential statistical learning and two aspects of lexical ability, lexical-phonological and lexical-semantic, in children with and without specific language impairment (SLI). Participants included 40 children (ages 8;5–12;3), 20 children with SLI and 20 with typical development. Children completed Saffran’s statistical word segmentation task, a lexical-phonological access task (gating task), and a word definition task. Poor statistical learners were also poor at managing lexical-phonological competition during the gating task. However, statistical learning was not a significant predictor of semantic richness in word definitions. The ability to track statistical sequential regularities may be important for learning the inherently sequential structure of lexical-phonology, but not as important for learning lexical-semantic knowledge. Consistent with the procedural/declarative memory distinction, the brain networks associated with the two types of lexical learning are likely to have different learning properties. PMID:23425593

  17. A Bayesian bird's eye view of ‘Replications of important results in social psychology’

    PubMed Central

    Schönbrodt, Felix D.; Yao, Yuling; Gelman, Andrew; Wagenmakers, Eric-Jan

    2017-01-01

    We applied three Bayesian methods to reanalyse the preregistered contributions to the Social Psychology special issue ‘Replications of Important Results in Social Psychology’ (Nosek & Lakens. 2014 Registered reports: a method to increase the credibility of published results. Soc. Psychol. 45, 137–141. (doi:10.1027/1864-9335/a000192)). First, individual-experiment Bayesian parameter estimation revealed that for directed effect size measures, only three out of 44 central 95% credible intervals did not overlap with zero and fell in the expected direction. For undirected effect size measures, only four out of 59 credible intervals contained values greater than 0.10 (10% of variance explained) and only 19 intervals contained values larger than 0.05. Second, a Bayesian random-effects meta-analysis for all 38 t-tests showed that only one out of the 38 hierarchically estimated credible intervals did not overlap with zero and fell in the expected direction. Third, a Bayes factor hypothesis test was used to quantify the evidence for the null hypothesis against a default one-sided alternative. Only seven out of 60 Bayes factors indicated non-anecdotal support in favour of the alternative hypothesis (BF10>3), whereas 51 Bayes factors indicated at least some support for the null hypothesis. We hope that future analyses of replication success will embrace a more inclusive statistical approach by adopting a wider range of complementary techniques. PMID:28280547

  18. Patterns of host specificity among the helminth parasite fauna of freshwater siluriforms: testing the biogeographical core parasite fauna hypothesis.

    PubMed

    Rosas-Valdez, Rogelio; de León, Gerardo Pérez-Ponce

    2011-04-01

    Host specificity plays an essential role in shaping the evolutionary history of host-parasite associations. In this study, a recently proposed index of host specificity was used to test, quantitatively, the hypothesis that some groups of parasites are characteristic of particular host fish families across their distribution range. A database with all published records on the helminth parasites of freshwater siluriforms of Mexico was used. The host specificity index was chosen for its ability to measure the taxonomic heterogeneity of host assemblages and its appropriateness for unequal sampling effort. The helminth parasite fauna of freshwater siluriforms in Mexico seems to be specific to different host taxonomic categories. However, a relatively high number of species (47% of the total helminth fauna) are specific to their respective host families. This result provides further corroboration for the biogeographic hypothesis of the core helminth fauna proposed previously. The statistical values for host specificity obtained herein seem to be independent of host range. However, the accurate taxonomic identification of the parasites is fundamental for the evaluation of host specificity and the accurate evolutionary interpretation of this phenomenon.

  19. Testing for variation in taxonomic extinction probabilities: a suggested methodology and some results

    USGS Publications Warehouse

    Conroy, M.J.; Nichols, J.D.

    1984-01-01

    Several important questions in evolutionary biology and paleobiology involve sources of variation in extinction rates. In all cases of which we are aware, extinction rates have been estimated from data in which the probability that an observation (e.g., a fossil taxon) will occur is related both to extinction rates and to what we term encounter probabilities. Any statistical method for analyzing fossil data should at a minimum permit separate inferences on these two components. We develop a method for estimating taxonomic extinction rates from stratigraphic range data and for testing hypotheses about variability in these rates. We use this method to estimate extinction rates and to test the hypothesis of constant extinction rates for several sets of stratigraphic range data. The results of our tests support the hypothesis that extinction rates varied over the geologic time periods examined. We also present a test that can be used to identify periods of high or low extinction probabilities and provide an example using Phanerozoic invertebrate data. Extinction rates should be analyzed using stochastic models, in which it is recognized that stratigraphic samples are random variates and that sampling is imperfect.

  20. cit: hypothesis testing software for mediation analysis in genomic applications.

    PubMed

    Millstein, Joshua; Chen, Gary K; Breton, Carrie V

    2016-08-01

    The challenges of successfully applying causal inference methods include: (i) satisfying underlying assumptions, (ii) limitations in data/models accommodated by the software and (iii) low power of common multiple testing approaches. The causal inference test (CIT) is based on hypothesis testing rather than estimation, allowing the testable assumptions to be evaluated in the determination of statistical significance. A user-friendly software package provides P-values and optionally permutation-based FDR estimates (q-values) for potential mediators. It can handle single and multiple binary and continuous instrumental variables, binary or continuous outcome variables and adjustment covariates. Also, the permutation-based FDR option provides a non-parametric implementation. Simulation studies demonstrate the validity of the cit package and show a substantial advantage of permutation-based FDR over other common multiple testing strategies. The cit open-source R package is freely available from the CRAN website (https://cran.r-project.org/web/packages/cit/index.html) with embedded C ++ code that utilizes the GNU Scientific Library, also freely available (http://www.gnu.org/software/gsl/). joshua.millstein@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Short-term earthquake forecasting based on an epidemic clustering model

    NASA Astrophysics Data System (ADS)

    Console, Rodolfo; Murru, Maura; Falcone, Giuseppe

    2016-04-01

    The application of rigorous statistical tools, with the aim of verifying any prediction method, requires a univocal definition of the hypothesis, or the model, characterizing the concerned anomaly or precursor, so that it can be objectively recognized in any circumstance and by any observer. This is necessary to move beyond the old-fashioned approach consisting only of retrospective, anecdotal study of past cases. A rigorous definition of an earthquake forecasting hypothesis should lead to the objective identification of particular sub-volumes (usually named alarm volumes) of the total time-space volume within which the probability of occurrence of strong earthquakes is higher than usual. Testing such a hypothesis requires the observation of a sufficient number of past cases upon which a statistical analysis is possible. This analysis should aim to determine the rate at which the precursor has been followed (success rate) or not followed (false alarm rate) by the target seismic event, or the rate at which a target event has been preceded (alarm rate) or not preceded (failure rate) by the precursor. The binary table obtained from this kind of analysis leads to the definition of the parameters of the model that achieve the maximum number of successes and the minimum number of false alarms for a specific class of precursors. The mathematical tools suitable for this purpose may include the definition of Probability Gain or the R-Score, as well as the application of popular plots such as the Molchan error-diagram and the ROC diagram. Another tool for evaluating the validity of a forecasting method is the concept of the likelihood ratio (also named performance factor) of occurrence and non-occurrence of seismic events under different hypotheses. Whatever method is chosen for building a new hypothesis, usually based on retrospective data, the final assessment of its validity should be carried out by a test on a new and independent set of observations. The implementation of this step can be problematic for seismicity characterized by long-term recurrence. However, splitting the data base collected in the past into two separate sections (one on which the best fit of the parameters is carried out, and the other on which the hypothesis is tested) can be a viable solution, known as retrospective-forward testing. In this study we show examples of applying the above-mentioned concepts to the analysis of the Italian catalog of instrumental seismicity, making use of an epidemic algorithm developed to model short-term clustering features. This model, for which a precursory anomaly is simply the occurrence of seismic activity, does not require the retrospective categorization of earthquakes in terms of foreshocks, mainshocks and aftershocks. It was introduced more than 15 years ago and has been tested in a number of real cases. It is now being run by several seismological centers around the world in forward real-time mode for testing purposes.
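
    The binary alarm/event bookkeeping described in this abstract can be made concrete with a small sketch. The code below is a generic illustration (not the authors' software): it takes the four cells of the alarm-versus-event table and returns the rates named above, together with one common definition of the probability gain and of the R-score. The counts used in the example are invented.

```python
def forecast_skill(a, b, c, d):
    """Skill measures for a binary alarm/event contingency table.

    a : alarms followed by a target event (hits)
    b : alarms not followed by a target event (false alarms)
    c : target events not preceded by an alarm (misses)
    d : quiet intervals with neither alarm nor event (correct negatives)
    """
    n = a + b + c + d
    success_rate = a / (a + b)          # P(event | alarm)
    false_alarm_rate = b / (a + b)      # P(no event | alarm)
    alarm_rate = a / (a + c)            # P(alarm | event): fraction of events forecast
    failure_rate = c / (a + c)          # P(no alarm | event)
    base_rate = (a + c) / n             # unconditional event probability
    return {
        "success_rate": success_rate,
        "false_alarm_rate": false_alarm_rate,
        "alarm_rate": alarm_rate,
        "failure_rate": failure_rate,
        "probability_gain": success_rate / base_rate,
        "r_score": a / (a + c) - b / (b + d),   # one common definition
    }

print(forecast_skill(a=12, b=30, c=8, d=950))
```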

  2. The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.

    PubMed

    Perneger, Thomas V; Combescure, Christophe

    2017-07-01

    Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) high proportion of P-values <0.001, especially in observational studies, (2) excess of P-values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.
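
    The comparison of an observed collection of P-values with the uniform distribution expected under the null hypothesis is easy to sketch. The code below is only a descriptive illustration (it does not reproduce the article's estimation of the null/alternative mixture): it tallies the conventional significance categories and runs a chi-square goodness-of-fit test against uniformity in one-percent bins, using simulated P-values.

```python
import numpy as np
from scipy import stats

def summarize_p_values(p):
    """Categorize P-values and test departure from uniformity.

    Under a global null hypothesis, P-values are uniform on (0, 1); an
    excess of small values suggests true effects and/or selective reporting.
    """
    p = np.asarray(p, dtype=float)
    categories = {
        "P < 0.001": np.mean(p < 0.001),
        "0.001 <= P < 0.01": np.mean((p >= 0.001) & (p < 0.01)),
        "0.01 <= P < 0.05": np.mean((p >= 0.01) & (p < 0.05)),
        "P >= 0.05": np.mean(p >= 0.05),
    }
    counts, _ = np.histogram(p, bins=np.linspace(0, 1, 101))  # one-percent intervals
    chi2, p_uniform = stats.chisquare(counts)
    return categories, chi2, p_uniform

rng = np.random.default_rng(0)
simulated = np.concatenate([rng.uniform(size=500),         # null hypotheses
                            rng.beta(0.3, 3.0, size=500)])  # alternative-like, mostly small
print(summarize_p_values(simulated))
```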

  3. Visualizing the Bayesian 2-test case: The effect of tree diagrams on medical decision making.

    PubMed

    Binder, Karin; Krauss, Stefan; Bruckmaier, Georg; Marienhagen, Jörg

    2018-01-01

    In medicine, diagnoses based on medical test results are probabilistic by nature. Unfortunately, cognitive illusions regarding the statistical meaning of test results are well documented among patients, medical students, and even physicians. There are two effective strategies that can foster insight into what is known as Bayesian reasoning situations: (1) translating the statistical information on the prevalence of a disease and the sensitivity and the false-alarm rate of a specific test for that disease from probabilities into natural frequencies, and (2) illustrating the statistical information with tree diagrams, for instance, or with other pictorial representation. So far, such strategies have only been empirically tested in combination for "1-test cases", where one binary hypothesis ("disease" vs. "no disease") has to be diagnosed based on one binary test result ("positive" vs. "negative"). However, in reality, often more than one medical test is conducted to derive a diagnosis. In two studies, we examined a total of 388 medical students from the University of Regensburg (Germany) with medical "2-test scenarios". Each student had to work on two problems: diagnosing breast cancer with mammography and sonography test results, and diagnosing HIV infection with the ELISA and Western Blot tests. In Study 1 (N = 190 participants), we systematically varied the presentation of statistical information ("only textual information" vs. "only tree diagram" vs. "text and tree diagram in combination"), whereas in Study 2 (N = 198 participants), we varied the kinds of tree diagrams ("complete tree" vs. "highlighted tree" vs. "pruned tree"). All versions were implemented in probability format (including probability trees) and in natural frequency format (including frequency trees). We found that natural frequency trees, especially when the question-related branches were highlighted, improved performance, but that none of the corresponding probabilistic visualizations did.
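
    The natural-frequency reasoning for a 2-test scenario can be written out directly. The sketch below assumes the two tests are conditionally independent given disease status, and all parameter values in the example are invented rather than taken from the study; it simply propagates counts through a hypothetical population to obtain the probability of disease given two positive results.

```python
def two_test_ppv(prevalence, sens1, fpr1, sens2, fpr2, population=10_000):
    """Natural-frequency calculation for a Bayesian 2-test scenario.

    Returns P(disease | both tests positive), assuming the tests are
    conditionally independent given disease status.
    """
    diseased = population * prevalence
    healthy = population - diseased

    true_pos_1 = diseased * sens1          # diseased people with a positive first test
    false_pos_1 = healthy * fpr1           # healthy people with a positive first test

    true_pos_2 = true_pos_1 * sens2        # of those, positive again on the second test
    false_pos_2 = false_pos_1 * fpr2

    return true_pos_2 / (true_pos_2 + false_pos_2)

# made-up parameters for a mammography-then-sonography style scenario
print(round(two_test_ppv(prevalence=0.01, sens1=0.80, fpr1=0.10,
                         sens2=0.90, fpr2=0.07), 3))
```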

  4. The individual tolerance concept is not the sole explanation for the probit dose-effect model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newman, M.C.; McCloskey, J.T.

    2000-02-01

    Predominant methods for analyzing dose- or concentration-effect data (i.e., probit analysis) are based on the concept of individual tolerance or individual effective dose (IED, the smallest characteristic dose needed to kill an individual). An alternative explanation (stochasticity hypothesis) is that individuals do not have unique tolerances: death results from stochastic processes occurring similarly in all individuals. These opposing hypotheses were tested with two types of experiments. First, time to stupefaction (TTS) was measured for zebra fish (Brachydanio rerio) exposed to benzocaine. The same 40 fish were exposed during five trials to test if the same order for TTS was maintained among trials. The IED hypothesis was supported with a minor stochastic component being present. Second, eastern mosquitofish (Gambusia holbrooki) were exposed to sublethal or lethal NaCl concentrations until a large portion of the lethally exposed fish died. After sufficient time for recovery, fish sublethally exposed and fish surviving lethal exposure were exposed simultaneously to lethal NaCl concentrations. No statistically significant effect was found of previous exposure on survival time but a large stochastic component to the survival dynamics was obvious. Repetition of this second type of test with pentachlorophenol also provided no support for the IED hypothesis. The authors conclude that neither hypothesis alone was the sole or dominant explanation for the lognormal (probit) model. Determination of the correct explanation (IED or stochastic) or the relative contributions of each is crucial to predicting consequences to populations after repeated or chronic exposures to any particular toxicant.

  5. What is too much variation? The null hypothesis in small-area analysis.

    PubMed Central

    Diehr, P; Cain, K; Connell, F; Volinn, E

    1990-01-01

    A small-area analysis (SAA) in health services research often calculates surgery rates for several small areas, compares the largest rate to the smallest, notes that the difference is large, and attempts to explain this discrepancy as a function of service availability, physician practice styles, or other factors. SAAs are often difficult to interpret because there is little theoretical basis for determining how much variation would be expected under the null hypothesis that all of the small areas have similar underlying surgery rates and that the observed variation is due to chance. We developed a computer program to simulate the distribution of several commonly used descriptive statistics under the null hypothesis, and used it to examine the variability in rates among the counties of the state of Washington. The expected variability when the null hypothesis is true is surprisingly large, and becomes worse for procedures with low incidence, for smaller populations, when there is variability among the populations of the counties, and when readmissions are possible. The characteristics of four descriptive statistics were studied and compared. None was uniformly good, but the chi-square statistic had better performance than the others. When we reanalyzed five journal articles that presented sufficient data, the results were usually statistically significant. Since SAA research today is tending to deal with low-incidence events, smaller populations, and measures where readmissions are possible, more research is needed on the distribution of small-area statistics under the null hypothesis. New standards are proposed for the presentation of SAA results. PMID:2312306
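
    The simulation idea behind this abstract (how much apparent variation arises when every area truly shares one underlying rate) can be sketched briefly. The code below is not the authors' program, which also handles readmissions and other complications; it draws counts for areas of unequal population from a common binomial rate and reports the null distribution of the largest-to-smallest rate ratio. The populations and rate are invented.

```python
import numpy as np

def null_variation(populations, rate, n_sim=5000, seed=0):
    """Null distribution of small-area statistics under a shared rate.

    Simulates counts for each area as Binomial(population, rate) and
    returns the simulated extremal rate ratios and chi-square statistics.
    """
    rng = np.random.default_rng(seed)
    populations = np.asarray(populations)
    ratios, chi2s = [], []
    for _ in range(n_sim):
        counts = rng.binomial(populations, rate)
        rates = counts / populations
        if rates.min() > 0:
            ratios.append(rates.max() / rates.min())
        expected = populations * rate
        chi2s.append(np.sum((counts - expected) ** 2 / expected))
    return np.array(ratios), np.array(chi2s)

pops = np.array([5_000, 12_000, 30_000, 80_000, 250_000])
ratios, chi2s = null_variation(pops, rate=0.003)
print("95th percentile of max/min rate ratio under the null:",
      round(np.percentile(ratios, 95), 2))
```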

  6. What is too much variation? The null hypothesis in small-area analysis.

    PubMed

    Diehr, P; Cain, K; Connell, F; Volinn, E

    1990-02-01

    A small-area analysis (SAA) in health services research often calculates surgery rates for several small areas, compares the largest rate to the smallest, notes that the difference is large, and attempts to explain this discrepancy as a function of service availability, physician practice styles, or other factors. SAAs are often difficult to interpret because there is little theoretical basis for determining how much variation would be expected under the null hypothesis that all of the small areas have similar underlying surgery rates and that the observed variation is due to chance. We developed a computer program to simulate the distribution of several commonly used descriptive statistics under the null hypothesis, and used it to examine the variability in rates among the counties of the state of Washington. The expected variability when the null hypothesis is true is surprisingly large, and becomes worse for procedures with low incidence, for smaller populations, when there is variability among the populations of the counties, and when readmissions are possible. The characteristics of four descriptive statistics were studied and compared. None was uniformly good, but the chi-square statistic had better performance than the others. When we reanalyzed five journal articles that presented sufficient data, the results were usually statistically significant. Since SAA research today is tending to deal with low-incidence events, smaller populations, and measures where readmissions are possible, more research is needed on the distribution of small-area statistics under the null hypothesis. New standards are proposed for the presentation of SAA results.

  7. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

  8. On the use of attractor dimension as a feature in structural health monitoring

    USGS Publications Warehouse

    Nichols, J.M.; Virgin, L.N.; Todd, M.D.; Nichols, J.D.

    2003-01-01

    Recent works in the vibration-based structural health monitoring community have emphasised the use of correlation dimension as a discriminating statistic in separating a damaged from an undamaged response. This paper explores the utility of attractor dimension as a 'feature' and offers some comparisons between different metrics reflecting dimension. The focus is on evaluating the performance of two different measures of dimension as damage indicators in a structural health monitoring context. Results indicate that the correlation dimension is probably a poor choice of statistic for the purpose of signal discrimination. Other measures of dimension may be used for the same purposes with a higher degree of statistical reliability. The question of competing methodologies is placed in a hypothesis testing framework and answered with experimental data taken from a cantilevered beam.
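
    For readers unfamiliar with the feature in question, a rough correlation-dimension estimate in the spirit of Grassberger and Procaccia can be computed as below. This is a minimal sketch under assumed embedding parameters, not the estimator or the hypothesis-testing framework used in the paper, and it glosses over issues such as choosing the scaling region and decorrelating nearby points in time.

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(x, emb_dim=3, delay=2, n_r=10):
    """Rough Grassberger-Procaccia correlation-dimension estimate.

    Delay-embeds the scalar series x, computes the correlation sum
    C(r) = fraction of point pairs closer than r, and returns the slope
    of log C(r) versus log r over an ad hoc range of radii.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (emb_dim - 1) * delay
    embedded = np.column_stack([x[i * delay: i * delay + n] for i in range(emb_dim)])
    pairs = pdist(embedded)                       # all pairwise Euclidean distances
    radii = np.logspace(np.log10(np.percentile(pairs, 5)),
                        np.log10(np.percentile(pairs, 50)), n_r)
    corr_sum = np.array([(pairs < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(corr_sum), 1)
    return slope

# a nearly periodic signal (a closed curve in the embedding) should give a value near 1
t = np.linspace(0, 40 * np.pi, 1500)
signal = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)
print(round(correlation_dimension(signal), 2))
```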

  9. Applications of Bayesian Statistics to Problems in Gamma-Ray Bursts

    NASA Technical Reports Server (NTRS)

    Meegan, Charles A.

    1997-01-01

    This presentation will describe two applications of Bayesian statistics to Gamma Ray Bursts (GRBs). The first attempts to quantify the evidence for a cosmological versus galactic origin of GRBs using only the observations of the dipole and quadrupole moments of the angular distribution of bursts. The cosmological hypothesis predicts isotropy, while the galactic hypothesis is assumed to produce a uniform probability distribution over positive values for these moments. The observed isotropic distribution indicates that the Bayes factor for the cosmological hypothesis over the galactic hypothesis is about 300. Another application of Bayesian statistics is in the estimation of chance associations of optical counterparts with galaxies. The Bayesian approach is preferred to frequentist techniques here because the Bayesian approach easily accounts for galaxy mass distributions and because one can incorporate three disjoint hypotheses: (1) bursts come from galactic centers, (2) bursts come from galaxies in proportion to luminosity, and (3) bursts do not come from external galaxies. This technique was used in the analysis of the optical counterpart to GRB970228.

  10. Improved tests reveal that the accelerating moment release hypothesis is statistically insignificant

    USGS Publications Warehouse

    Hardebeck, J.L.; Felzer, K.R.; Michael, A.J.

    2008-01-01

    We test the hypothesis that accelerating moment release (AMR) is a precursor to large earthquakes, using data from California, Nevada, and Sumatra. Spurious cases of AMR can arise from data fitting because the time period, area, and sometimes magnitude range analyzed before each main shock are often optimized to produce the strongest AMR signal. Optimizing the search criteria can identify apparent AMR even if no robust signal exists. For both 1950-2006 California-Nevada M ≥ 6.5 earthquakes and the 2004 M9.3 Sumatra earthquake, we can find two contradictory patterns in the pre-main shock earthquakes by data fitting: AMR and decelerating moment release. We compare the apparent AMR found in the real data to the apparent AMR found in four types of synthetic catalogs with no inherent AMR. When spatiotemporal clustering is included in the simulations, similar AMR signals are found by data fitting in both the real and synthetic data sets even though the synthetic data sets contain no real AMR. These tests demonstrate that apparent AMR may arise from a combination of data fitting and normal foreshock and aftershock activity. In principle, data-fitting artifacts could be avoided if the free parameters were determined from scaling relationships between the duration and spatial extent of the AMR pattern and the magnitude of the earthquake that follows it. However, we demonstrate that previously proposed scaling relationships are unstable, statistical artifacts caused by the use of a minimum magnitude for the earthquake catalog that scales with the main shock magnitude. Some recent AMR studies have used spatial regions based on hypothetical stress loading patterns, rather than circles, to select the data. We show that previous tests were biased and that unbiased tests do not find this change to the method to be an improvement. The use of declustered catalogs has also been proposed to eliminate the effect of clustering but we demonstrate that this does not increase the statistical significance of AMR. Given the ease with which data fitting can find desired patterns in seismicity, future studies of AMR-like observations must include complete tests against synthetic catalogs that include spatiotemporal clustering.

  11. An asymptotic analysis of the logrank test.

    PubMed

    Strawderman, R L

    1997-01-01

    Asymptotic expansions for the null distribution of the logrank statistic and its distribution under local proportional hazards alternatives are developed in the case of iid observations. The results, which are derived from the work of Gu (1992) and Taniguchi (1992), are easy to interpret, and provide some theoretical justification for many behavioral characteristics of the logrank test that have been previously observed in simulation studies. We focus primarily upon (i) the inadequacy of the usual normal approximation under treatment group imbalance; and, (ii) the effects of treatment group imbalance on power and sample size calculations. A simple transformation of the logrank statistic is also derived based on results in Konishi (1991) and is found to substantially improve the standard normal approximation to its distribution under the null hypothesis of no survival difference when there is treatment group imbalance.
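
    The standard two-sample logrank statistic that this abstract analyzes can be computed directly from first principles. The sketch below implements only the usual statistic and its normal approximation on made-up data; it does not include the higher-order expansions or the improving transformation derived in the paper.

```python
import numpy as np

def logrank_test(time, event, group):
    """Two-sample logrank test with the usual normal approximation.

    time  : follow-up times
    event : 1 if the event was observed, 0 if censored
    group : 0/1 group labels
    Returns (Z, observed minus expected events in group 1, variance).
    """
    time, event, group = map(np.asarray, (time, event, group))
    o_minus_e, variance = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        deaths = ((time == t) & (event == 1)).sum()
        deaths1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += deaths1 - deaths * n1 / n
        if n > 1:
            variance += deaths * (n1 / n) * (1 - n1 / n) * (n - deaths) / (n - 1)
    z = o_minus_e / np.sqrt(variance)
    return z, o_minus_e, variance

# small imbalanced example (6 vs 4 subjects)
t = [5, 6, 6, 8, 10, 12, 3, 4, 7, 9]
e = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
g = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(logrank_test(t, e, g))
```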

  12. A bootstrap based Neyman-Pearson test for identifying variable importance.

    PubMed

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-04-01

    Selection of the most informative features, those that lead to a small loss on future data, is arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
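
    The wrapper idea, asking whether a candidate feature measurably improves out-of-sample performance, can be illustrated with a generic bootstrap comparison. This is not the Neyman-Pearson test proposed in the paper; it is a hedged sketch that refits least-squares models with and without one feature on bootstrap resamples and scores both on the out-of-bootstrap cases.

```python
import numpy as np

def bootstrap_feature_check(X, y, feature, n_boot=500, seed=0):
    """Generic bootstrap check of whether one feature reduces squared error.

    Returns the proportion of bootstrap resamples in which dropping the
    feature did NOT hurt out-of-bootstrap prediction (small values
    suggest the feature is relevant).
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    keep = [j for j in range(X.shape[1]) if j != feature + 1]   # +1 skips the intercept column
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        oob = np.setdiff1d(np.arange(len(y)), idx)
        if oob.size == 0:
            continue
        beta_full, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        beta_red, *_ = np.linalg.lstsq(X[idx][:, keep], y[idx], rcond=None)
        err_full = np.mean((y[oob] - X[oob] @ beta_full) ** 2)
        err_red = np.mean((y[oob] - X[oob][:, keep] @ beta_red) ** 2)
        diffs.append(err_red - err_full)             # > 0 means the feature helped
    return float(np.mean(np.array(diffs) <= 0))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(size=200)                # only feature 0 matters
print(bootstrap_feature_check(X, y, 0), bootstrap_feature_check(X, y, 2))
```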

  13. A Primer on Bayesian Analysis for Experimental Psychopathologists

    PubMed Central

    Krypotos, Angelos-Miltiadis; Blanken, Tessa F.; Arnaudova, Inna; Matzke, Dora; Beckers, Tom

    2016-01-01

    The principal goals of experimental psychopathology (EPP) research are to offer insights into the pathogenic mechanisms of mental disorders and to provide a stable ground for the development of clinical interventions. The main message of the present article is that those goals are better served by the adoption of Bayesian statistics than by the continued use of null-hypothesis significance testing (NHST). In the first part of the article we list the main disadvantages of NHST and explain why those disadvantages limit the conclusions that can be drawn from EPP research. Next, we highlight the advantages of Bayesian statistics. To illustrate, we then pit NHST and Bayesian analysis against each other using an experimental data set from our lab. Finally, we discuss some challenges when adopting Bayesian statistics. We hope that the present article will encourage experimental psychopathologists to embrace Bayesian statistics, which could strengthen the conclusions drawn from EPP research. PMID:28748068

  14. From Statistics to Meaning: Infants’ Acquisition of Lexical Categories

    PubMed Central

    Lany, Jill; Saffran, Jenny R.

    2013-01-01

    Infants are highly sensitive to statistical patterns in their auditory language input that mark word categories (e.g., noun and verb). However, it is unknown whether experience with these cues facilitates the acquisition of semantic properties of word categories. In a study testing this hypothesis, infants first listened to an artificial language in which word categories were reliably distinguished by statistical cues (experimental group) or in which these properties did not cue category membership (control group). Both groups were then trained on identical pairings between the words and pictures from two categories (animals and vehicles). Only infants in the experimental group learned the trained associations between specific words and pictures. Moreover, these infants generalized the pattern to include novel pairings. These results suggest that experience with statistical cues marking lexical categories sets the stage for learning the meanings of individual words and for generalizing meanings to new category members. PMID:20424058

  15. Flexural strength and failure modes of layered ceramic structures.

    PubMed

    Borba, Márcia; de Araújo, Maico D; de Lima, Erick; Yoshimura, Humberto N; Cesar, Paulo F; Griggs, Jason A; Della Bona, Alvaro

    2011-12-01

    To evaluate the effect of the specimen design on the flexural strength (σ(f)) and failure mode of ceramic structures, testing the hypothesis that the ceramic material under tension controls the mechanical performance of the structure. Three ceramics used as framework materials for fixed partial dentures (YZ--Vita In-Ceram YZ; IZ--Vita In-Ceram Zirconia; AL--Vita In-Ceram AL) and two veneering porcelains (VM7 and VM9) were studied. Bar-shaped specimens were produced in three different designs (n=10): monolithic, two layers (porcelain-framework) and three layers (TRI) (porcelain-framework-porcelain). Specimens were tested for three-point flexural strength at 1MPa/s in 37°C artificial saliva. For bi-layered design, the specimens were tested in both conditions: with porcelain (PT) or framework ceramic (FT) layer under tension. Fracture surfaces were analyzed using stereomicroscope and scanning electron microscopy (SEM). Young's modulus (E) and Poisson's ratio (ν) were determined using ultrasonic pulse-echo method. Results were statistically analyzed by Kruskal-Wallis and Student-Newman-Keuls tests. Except for VM7 and VM9, significant differences were observed for E values among the materials. YZ showed the highest ν value followed by IZ and AL. YZ presented the highest σ(f). There was no statistical difference in the σ(f) value between IZ and IZ-FT and between AL and AL-FT. σ(f) values for YZ-PT, IZ-PT, IZ-TRI, AL-PT, AL-TRI were similar to the results obtained for VM7 and VM9. Two types of fracture mode were identified: total and partial failure. The mechanical performance of the specimens was determined by the material under tension during testing, confirming the study hypothesis. Copyright © 2011 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  16. Access to health care and community social capital.

    PubMed

    Hendryx, Michael S; Ahern, Melissa M; Lovrich, Nicholas P; McCurdy, Arthur H

    2002-02-01

    To test the hypothesis that variation in reported access to health care is positively related to the level of social capital present in a community. The 1996 Household Survey of the Community Tracking Study, drawn from 22 metropolitan statistical areas across the United States (n = 19,672). Additional data for the 22 communities are from a 1996 multicity broadcast media marketing database, including key social capital indicators, the 1997 National Profile of Local Health Departments survey, and Interstudy, American Hospital Association, and American Medical Association sources. The design is cross-sectional. Self-reported access to care problems is the dependent variable. Independent variables include individual sociodemographic variables, community-level health sector variables, and social capital variables. Data are merged from the various sources and weighted to be population representative and are analyzed using hierarchical categorical modeling. Persons who live in metropolitan statistical areas featuring higher levels of social capital report fewer problems accessing health care. A higher HMO penetration rate in a metropolitan statistical area was also associated with fewer access problems. Other health sector variables were not related to health care access. The results observed for 22 major U.S. cities are consistent with the hypothesis that community social capital enables better access to care, perhaps through improving community accountability mechanisms.

  17. The Mantel-Haenszel procedure revisited: models and generalizations.

    PubMed

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.
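
    For readers who want the classical starting point, the discrete-stratum Mantel-Haenszel pooled odds ratio that this article generalizes is simple to compute. The sketch below implements only the classical estimator on invented strata; the article's extension, which replaces the observed cell counts with model-based classification probabilities for (X,Y), is not shown here.

```python
import numpy as np

def mantel_haenszel_or(tables):
    """Classical Mantel-Haenszel pooled odds ratio.

    tables : sequence of 2x2 count tables [[a, b], [c, d]], one per
             stratum of the confounder Z (rows indexed by X, columns by Y).
    """
    numerator, denominator = 0.0, 0.0
    for t in tables:
        (a, b), (c, d) = np.asarray(t, dtype=float)
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

strata = [[[12, 5], [6, 10]],
          [[20, 15], [9, 22]]]
print(round(mantel_haenszel_or(strata), 2))
```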

  18. The Mantel-Haenszel Procedure Revisited: Models and Generalizations

    PubMed Central

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463

  19. Bayesian Nonparametric Prediction and Statistical Inference

    DTIC Science & Technology

    1989-09-07

    Kadane, J. (1980), "Bayesian decision theory and the simplification of models," in Evaluation of Econometric Models, J. Kmenta and J. Ramsey, eds. ... the random model and weighted least squares regression," in Evaluation of Econometric Models, ed. by J. Kmenta and J. Ramsey, Academic Press, 197-217 ... likelihood function. On the other hand, H. Jeffreys's theory of hypothesis testing covers the most important situations in which the prior is not diffuse. See

  20. Wavelet analysis in ecology and epidemiology: impact of statistical tests

    PubMed Central

    Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

    2014-01-01

    Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
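
    One of the null models discussed here, resampling against red noise, can be sketched briefly. The code below is only an illustration of that single scheme applied to the largest periodogram peak (not to wavelet quantities, and not the beta-surrogate or hidden-Markov-model methods the authors recommend): it fits an AR(1) process to the series and compares the observed peak with peaks from AR(1) surrogates.

```python
import numpy as np

def ar1_surrogate(x, rng):
    """Red-noise (AR(1)) surrogate matching the lag-1 autocorrelation and variance of x."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    phi = np.corrcoef(xc[:-1], xc[1:])[0, 1]       # lag-1 autocorrelation
    sigma = np.sqrt(xc.var() * (1 - phi ** 2))     # innovation standard deviation
    s = np.empty_like(xc)
    s[0] = rng.normal(0, xc.std())
    for i in range(1, len(xc)):
        s[i] = phi * s[i - 1] + rng.normal(0, sigma)
    return s + x.mean()

def peak_power_pvalue(x, n_surr=500, seed=0):
    """P-value of the largest periodogram peak against red-noise surrogates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = np.abs(np.fft.rfft(x - x.mean()))[1:].max()
    exceed = 0
    for _ in range(n_surr):
        s = ar1_surrogate(x, rng)
        exceed += np.abs(np.fft.rfft(s - s.mean()))[1:].max() >= observed
    return (exceed + 1) / (n_surr + 1)

# a 12-step seasonal cycle plus noise should yield a small p-value
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12) + np.random.default_rng(1).normal(0, 0.5, t.size)
print(peak_power_pvalue(series))
```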

  1. Wavelet analysis in ecology and epidemiology: impact of statistical tests.

    PubMed

    Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

    2014-02-06

    Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the 'beta-surrogate' method.

  2. A generalized Kruskal-Wallis test incorporating group uncertainty with application to genetic association studies.

    PubMed

    Acar, Elif F; Sun, Lei

    2013-06-01

    Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.
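
    The classical special case that this article generalizes, the ordinary Kruskal-Wallis statistic with known group membership, is easy to write out. The sketch below computes it from rank sums and refers the result to a chi-square distribution with k - 1 degrees of freedom; it omits the tie correction and, in particular, the probability-weighted rank sums that handle group uncertainty in the proposed test.

```python
import numpy as np
from scipy import stats

def kruskal_wallis(samples):
    """Standard Kruskal-Wallis statistic and chi-square p-value.

    samples : list of 1-D arrays, one per group (no tie correction).
    """
    data = np.concatenate(samples)
    n = data.size
    ranks = stats.rankdata(data)
    h, start = 0.0, 0
    for s in samples:
        n_j = len(s)
        r_j = ranks[start:start + n_j].sum()
        h += r_j ** 2 / n_j
        start += n_j
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    df = len(samples) - 1
    return h, stats.chi2.sf(h, df)

rng = np.random.default_rng(0)
groups = [rng.normal(0.0, 1, 25), rng.normal(0.5, 1, 25), rng.normal(1.0, 1, 25)]
print(kruskal_wallis(groups))
```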

  3. The use and misuse of statistical methodologies in pharmacology research.

    PubMed

    Marino, Michael J

    2014-01-01

    Descriptive, exploratory, and inferential statistics are necessary components of hypothesis-driven biomedical research. Despite the ubiquitous need for these tools, the emphasis on statistical methods in pharmacology has become dominated by inferential methods often chosen more by the availability of user-friendly software than by any understanding of the data set or the critical assumptions of the statistical tests. Such frank misuse of statistical methodology and the quest to reach the mystical α<0.05 criterion has hampered research via the publication of incorrect analyses driven by rudimentary statistical training. Perhaps more critically, a poor understanding of statistical tools limits the conclusions that may be drawn from a study by divorcing the investigator from their own data. The net result is a decrease in quality and confidence in research findings, fueling recent controversies over the reproducibility of high-profile findings and effects that appear to diminish over time. The recent development of "omics" approaches leading to the production of massive higher dimensional data sets has amplified these issues, making it clear that new approaches are needed to appropriately and effectively mine this type of data. Unfortunately, statistical education in the field has not kept pace. This commentary provides a foundation for an intuitive understanding of statistics that fosters an exploratory approach and an appreciation for the assumptions of various statistical tests that hopefully will increase the correct use of statistics, the application of exploratory data analysis, and the use of statistical study design, with the goal of increasing reproducibility and confidence in the literature. Copyright © 2013. Published by Elsevier Inc.

  4. Enriching plausible new hypothesis generation in PubMed.

    PubMed

    Baek, Seung Han; Lee, Dahee; Kim, Minjoo; Lee, Jong Ho; Song, Min

    2017-01-01

    Most earlier studies in the field of literature-based discovery have adopted Swanson's ABC model, which links pieces of knowledge entailed in disjoint literatures. However, the issue of their practicability remains unresolved, since most did not deal with the context surrounding the discovered associations and were usually not accompanied by clinical confirmation. In this study, we propose a method that expands and elaborates an existing hypothesis using advanced text mining techniques for capturing context. We extend the ABC model to allow for multiple B terms of various biological types. Using the proposed method, we were able to concretize a specific, metabolite-related hypothesis with abundant contextual information. Starting from explaining the relationship between lactosylceramide and arterial stiffness, the hypothesis was extended to suggest a potential pathway consisting of lactosylceramide, nitric oxide, malondialdehyde, and arterial stiffness. Evaluation by domain experts showed that it is clinically valid. The proposed method is designed to provide plausible candidates for the concretized hypothesis, based on extracted heterogeneous entities and detailed relation information, along with a reliable ranking criterion. Unlike previous studies, statistical tests conducted collaboratively with biomedical experts support the validity and practical usefulness of the method. Applied to other cases, the method would help biologists support an existing hypothesis and readily follow the logical process within it.

  5. Evidence for plant-derived xenomiRs based on a large-scale analysis of public small RNA sequencing data from human samples.

    PubMed

    Zhao, Qi; Liu, Yuanning; Zhang, Ning; Hu, Menghan; Zhang, Hao; Joshi, Trupti; Xu, Dong

    2018-01-01

    In recent years, an increasing number of studies have reported the presence of plant miRNAs in human samples, which resulted in a hypothesis asserting the existence of plant-derived exogenous microRNA (xenomiR). However, this hypothesis is not widely accepted in the scientific community due to possible sample contamination and the small sample size with lack of rigorous statistical analysis. This study provides a systematic statistical test that can validate (or invalidate) the plant-derived xenomiR hypothesis by analyzing 388 small RNA sequencing data from human samples in 11 types of body fluids/tissues. A total of 166 types of plant miRNAs were found in at least one human sample, of which 14 plant miRNAs represented more than 80% of the total plant miRNAs abundance in human samples. Plant miRNA profiles were characterized to be tissue-specific in different human samples. Meanwhile, the plant miRNAs identified from microbiome have an insignificant abundance compared to those from humans, while plant miRNA profiles in human samples were significantly different from those in plants, suggesting that sample contamination is an unlikely reason for all the plant miRNAs detected in human samples. This study also provides a set of testable synthetic miRNAs with isotopes that can be detected in situ after being fed to animals.

  6. Meta-Analysis of Rare Binary Adverse Event Data

    PubMed Central

    Bhaumik, Dulal K.; Amatya, Anup; Normand, Sharon-Lise; Greenhouse, Joel; Kaizar, Eloise; Neelon, Brian; Gibbons, Robert D.

    2013-01-01

    We examine the use of fixed-effects and random-effects moment-based meta-analytic methods for analysis of binary adverse event data. Special attention is paid to the case of rare adverse events, which are commonly encountered in routine practice. We study estimation of model parameters and between-study heterogeneity. In addition, we examine traditional approaches to hypothesis testing of the average treatment effect and detection of the heterogeneity of treatment effect across studies. We derive three new methods: a simple (unweighted) average treatment effect estimator, a new heterogeneity estimator, and a parametric bootstrap test for heterogeneity. We then study the statistical properties of both the traditional and new methods via simulation. We find that in general, moment-based estimators of combined treatment effects and heterogeneity are biased and the degree of bias is proportional to the rarity of the event under study. The new methods eliminate much, but not all, of this bias. The various estimators and hypothesis testing methods are then compared and contrasted using an example dataset on treatment of stable coronary artery disease. PMID:23734068
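
    The flavor of an unweighted average treatment-effect estimator for sparse 2x2 tables can be illustrated as follows. This sketch is an assumption-laden illustration, not the estimators, heterogeneity statistic, or parametric bootstrap test derived in the paper: it adds a 0.5 continuity correction to every cell, averages the study-level log odds ratios without weights, and forms a normal-approximation confidence interval.

```python
import numpy as np
from scipy import stats

def unweighted_log_or(tables):
    """Unweighted average log odds ratio across studies with rare events.

    tables : list of (a, b, c, d) counts per study, where a/b are events
    and non-events in the treatment arm and c/d in the control arm.
    """
    log_ors, variances = [], []
    for a, b, c, d in tables:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))   # continuity correction
        log_ors.append(np.log(a * d / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)
    k = len(tables)
    estimate = np.mean(log_ors)
    se = np.sqrt(np.sum(variances)) / k                # s.e. of an unweighted mean
    z = stats.norm.ppf(0.975)
    return estimate, (estimate - z * se, estimate + z * se)

studies = [(1, 199, 0, 200), (2, 498, 1, 499), (0, 150, 2, 148), (3, 997, 1, 999)]
print(unweighted_log_or(studies))
```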

  7. Detecting trends in raptor counts: power and type I error rates of various statistical tests

    USGS Publications Warehouse

    Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.

    1996-01-01

    We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
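
    The core of such a simulation (generate count data with a known trend and sampling error, apply a trend test, and tally rejections) is compact enough to sketch. The code below illustrates only one of the seven tests, the linear-regression t-test on the log scale, with independent lognormal error of 40% coefficient of variation; it ignores autocorrelation and the other tests compared in the paper, and all parameter values are illustrative.

```python
import numpy as np
from scipy import stats

def trend_test_power(n_years=10, trend=0.05, cv=0.4, alpha=0.05,
                     n_sim=2000, seed=0):
    """Power of the upper-tailed log-scale regression t-test for a trend.

    Counts follow an exponential trend (`trend` = proportional change per
    year) with independent lognormal sampling error of coefficient of
    variation `cv`.  Setting trend=0 gives the type I error rate.
    """
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    sigma = np.sqrt(np.log(1 + cv ** 2))        # lognormal sigma for the given CV
    rejections = 0
    for _ in range(n_sim):
        expected = 100 * (1 + trend) ** years
        counts = expected * rng.lognormal(-sigma ** 2 / 2, sigma, n_years)
        slope, _, _, p_two_sided, _ = stats.linregress(years, np.log(counts))
        p_upper = p_two_sided / 2 if slope > 0 else 1 - p_two_sided / 2
        rejections += p_upper < alpha
    return rejections / n_sim

print("type I error at no trend:", trend_test_power(trend=0.0))
print("power at +5% per year:", trend_test_power(trend=0.05))
```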

  8. Understanding the Role of P Values and Hypothesis Tests in Clinical Research.

    PubMed

    Mark, Daniel B; Lee, Kerry L; Harrell, Frank E

    2016-12-01

    P values and hypothesis testing methods are frequently misused in clinical research. Much of this misuse appears to be owing to the widespread, mistaken belief that they provide simple, reliable, and objective triage tools for separating the true and important from the untrue or unimportant. The primary focus in interpreting therapeutic clinical research data should be on the treatment ("oomph") effect, a metaphorical force that moves patients given an effective treatment to a different clinical state relative to their control counterparts. This effect is assessed using 2 complementary types of statistical measures calculated from the data, namely, effect magnitude or size and precision of the effect size. In a randomized trial, effect size is often summarized using constructs, such as odds ratios, hazard ratios, relative risks, or adverse event rate differences. How large a treatment effect has to be to be consequential is a matter for clinical judgment. The precision of the effect size (conceptually related to the amount of spread in the data) is usually addressed with confidence intervals. P values (significance tests) were first proposed as an informal heuristic to help assess how "unexpected" the observed effect size was if the true state of nature was no effect or no difference. Hypothesis testing was a modification of the significance test approach that envisioned controlling the false-positive rate of study results over many (hypothetical) repetitions of the experiment of interest. Both can be helpful but, by themselves, provide only a tunnel vision perspective on study results that ignores the clinical effects the study was conducted to measure.

  9. Oxygen hypothesis of polar gigantism not supported by performance of Antarctic pycnogonids in hypoxia.

    PubMed

    Woods, H Arthur; Moran, Amy L; Arango, Claudia P; Mullen, Lindy; Shields, Chris

    2009-03-22

    Compared to temperate and tropical relatives, some high-latitude marine species are large-bodied, a phenomenon known as polar gigantism. A leading hypothesis on the physiological basis of gigantism posits that, in polar water, high oxygen availability coupled to low metabolic rates relieves constraints on oxygen transport and allows the evolution of large body size. Here, we test the oxygen hypothesis using Antarctic pycnogonids, which have been evolving in very cold conditions (-1.8 to 0 degrees C) for several million years and contain spectacular examples of gigantism. Pycnogonids from 12 species, spanning three orders of magnitude in body mass, were collected from McMurdo Sound, Antarctica. Individual sea spiders were forced into activity and their performance was measured at different experimental levels of dissolved oxygen (DO). The oxygen hypothesis predicts that, all else being equal, large pycnogonids should perform disproportionately poorly in hypoxia, an outcome that would appear as a statistically significant interaction between body size and oxygen level. In fact, although we found large effects of DO on performance, and substantial interspecific variability in oxygen sensitivity, there was no evidence for size×DO interactions. These data do not support the oxygen hypothesis of Antarctic pycnogonid gigantism and suggest that explanations must be sought in other ecological or evolutionary processes.

  10. Oxygen hypothesis of polar gigantism not supported by performance of Antarctic pycnogonids in hypoxia

    PubMed Central

    Woods, H. Arthur; Moran, Amy L.; Arango, Claudia P.; Mullen, Lindy; Shields, Chris

    2008-01-01

    Compared to temperate and tropical relatives, some high-latitude marine species are large-bodied, a phenomenon known as polar gigantism. A leading hypothesis on the physiological basis of gigantism posits that, in polar water, high oxygen availability coupled to low metabolic rates relieves constraints on oxygen transport and allows the evolution of large body size. Here, we test the oxygen hypothesis using Antarctic pycnogonids, which have been evolving in very cold conditions (−1.8–0°C) for several million years and contain spectacular examples of gigantism. Pycnogonids from 12 species, spanning three orders of magnitude in body mass, were collected from McMurdo Sound, Antarctica. Individual sea spiders were forced into activity and their performance was measured at different experimental levels of dissolved oxygen (DO). The oxygen hypothesis predicts that, all else being equal, large pycnogonids should perform disproportionately poorly in hypoxia, an outcome that would appear as a statistically significant interaction between body size and oxygen level. In fact, although we found large effects of DO on performance, and substantial interspecific variability in oxygen sensitivity, there was no evidence for size×DO interactions. These data do not support the oxygen hypothesis of Antarctic pycnogonid gigantism and suggest that explanations must be sought in other ecological or evolutionary processes. PMID:19129117

  11. Mortality of veteran participants in the crossroads nuclear test

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, J.C.; Thaul, S.; Page, W.F.

    1997-07-01

    Operation CROSSROADS, conducted at Bikini Atoll in 1946, was the first post World War II test of nuclear weapons. Mortality experience of 40,000 military veteran participants in CROSSROADS was compared to that of a similar cohort of nonparticipating veterans. All-cause mortality of the participants was slightly increased over nonparticipants by 5% (p < .001). Smaller increases in participant mortality for all malignancies (1.4%, p = 0.26) or leukemia (2.0%, p = 0.9) were not statistically significant. These results do not support a hypothesis that radiation had increased participant cancer mortality over that of nonparticipants. 8 refs.

  12. Statistical reporting inconsistencies in experimental philosophy

    PubMed Central

    Colombo, Matteo; Duev, Georgi; Nuijten, Michèle B.; Sprenger, Jan

    2018-01-01

    Experimental philosophy (x-phi) is a young field of research in the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science. PMID:29649220
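
    statcheck itself is an R package; purely as an illustration of the kind of consistency check it automates, the Python sketch below (function name and rounding tolerance are ours, not statcheck's API) recomputes the p value implied by a reported t statistic and its degrees of freedom and flags disagreement with the reported p.

    ```python
    from scipy import stats

    def check_t_report(t_value, df, reported_p, two_tailed=True, tol=0.005):
        """Recompute the p value implied by t(df) and flag an inconsistency
        if it disagrees with the reported p beyond a rounding tolerance."""
        p = stats.t.sf(abs(t_value), df)
        if two_tailed:
            p *= 2
        return p, abs(p - reported_p) <= tol

    # A report of "t(28) = 2.20, p = .04" recomputes to p ~= .036, which is
    # consistent with the reported value at two decimal places.
    recomputed, ok = check_t_report(2.20, 28, 0.04)
    print(f"recomputed p = {recomputed:.4f}, consistent: {ok}")
    ```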

  13. Advances in Statistical Methods for Substance Abuse Prevention Research

    PubMed Central

    MacKinnon, David P.; Lockwood, Chondra M.

    2010-01-01

    The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467

  14. Ionospheric scintillation studies

    NASA Technical Reports Server (NTRS)

    Rino, C. L.; Freemouw, E. J.

    1973-01-01

    The diffracted field of a monochromatic plane wave was characterized by two complex correlation functions. For a Gaussian complex field, these quantities suffice to completely define the statistics of the field. Thus, one can in principle calculate the statistics of any measurable quantity in terms of the model parameters. The best data fits were achieved for intensity statistics derived under the Gaussian statistics hypothesis. The signal structure that achieved the best fit was nearly invariant with scintillation level and irregularity source (ionosphere or solar wind). It was characterized by the fact that more than 80% of the scattered signal power is in phase quadrature with the undeviated or coherent signal component. Thus, the Gaussian-statistics hypothesis is both convenient and accurate for channel modeling work.

  15. Sunflower therapy for children with specific learning difficulties (dyslexia): a randomised, controlled trial.

    PubMed

    Bull, Leona

    2007-02-01

    The aim of the study was to determine the clinical and perceived effectiveness of the Sunflower therapy in the treatment of childhood dyslexia. The Sunflower therapy includes applied kinesiology, physical manipulation, massage, homeopathy, herbal remedies and neuro-linguistic programming. A multi-centred, randomised controlled trial was undertaken with 70 dyslexic children aged 6-13 years. The research study aimed to test the research hypothesis that dyslexic children 'feel better' and 'perform better' as a result of treatment by the Sunflower therapy. Children in the treatment group and the control group were assessed using a battery of standardised cognitive, literacy and self-esteem tests before and after the intervention. Parents of children in the treatment group gave feedback on their experience of the Sunflower therapy. Test scores were compared using the Mann-Whitney and Wilcoxon statistical tests. While both groups of children improved in some of their test scores over time, there were no statistically significant improvements in cognitive or literacy test performance associated with the treatment. However, there were statistically significant improvements in academic self-esteem and reading self-esteem for the treatment group. The majority of parents (57.13%) felt that the Sunflower therapy was effective in the treatment of learning difficulties. Further research is required to verify these findings, and should include a control group receiving a dummy treatment to exclude placebo effects.

  16. A Statistical Analysis of the Relationship between Harmonic Surprise and Preference in Popular Music.

    PubMed

    Miles, Scott A; Rosen, David S; Grzywacz, Norberto M

    2017-01-01

    Studies have shown that some musical pieces may preferentially activate reward centers in the brain. Less is known, however, about the structural aspects of music that are associated with this activation. Based on the music cognition literature, we propose two hypotheses for why some musical pieces are preferred over others. The first, the Absolute-Surprise Hypothesis, states that unexpected events in music directly lead to pleasure. The second, the Contrastive-Surprise Hypothesis, proposes that the juxtaposition of unexpected events and subsequent expected events leads to an overall rewarding response. We tested these hypotheses within the framework of information theory, using the measure of "surprise." This information-theoretic variable mathematically describes how improbable an event is given a known distribution. We performed a statistical investigation of surprise in the harmonic structure of songs within a representative corpus of Western popular music, namely, the McGill Billboard Project corpus. We found that chords of songs in the top quartile of the Billboard chart showed greater average surprise than those in the bottom quartile. We also found that the different sections within top-quartile songs varied more in their average surprise than the sections within bottom-quartile songs. The results of this study are consistent with both the Absolute- and Contrastive-Surprise Hypotheses. Although these hypotheses seem contradictory to one another, we cannot yet discard the possibility that both absolute and contrastive types of surprise play roles in the enjoyment of popular music. We call this possibility the Hybrid-Surprise Hypothesis. The results of this statistical investigation have implications for both music cognition and the human neural mechanisms of esthetic judgments.
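
    The information-theoretic quantity at the heart of both hypotheses is simple: the surprise of an event with probability p is -log2 p, so improbable chords carry more surprise. A minimal Python sketch (the chord distribution below is a toy stand-in, not the McGill Billboard corpus):

    ```python
    import numpy as np

    def average_surprise(chords, probs):
        """Mean information-theoretic surprise, -log2 p, of a chord sequence
        under a known chord distribution `probs`."""
        return float(np.mean([-np.log2(probs[c]) for c in chords]))

    # Toy distribution over four chord symbols (illustrative, not corpus-derived)
    probs = {"I": 0.4, "IV": 0.3, "V": 0.2, "vi": 0.1}
    print(average_surprise(["I", "IV", "V", "I"], probs))    # common chords: low surprise
    print(average_surprise(["vi", "V", "vi", "IV"], probs))  # rare chords: high surprise
    ```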

  17. A Statistical Analysis of the Relationship between Harmonic Surprise and Preference in Popular Music

    PubMed Central

    Miles, Scott A.; Rosen, David S.; Grzywacz, Norberto M.

    2017-01-01

    Studies have shown that some musical pieces may preferentially activate reward centers in the brain. Less is known, however, about the structural aspects of music that are associated with this activation. Based on the music cognition literature, we propose two hypotheses for why some musical pieces are preferred over others. The first, the Absolute-Surprise Hypothesis, states that unexpected events in music directly lead to pleasure. The second, the Contrastive-Surprise Hypothesis, proposes that the juxtaposition of unexpected events and subsequent expected events leads to an overall rewarding response. We tested these hypotheses within the framework of information theory, using the measure of “surprise.” This information-theoretic variable mathematically describes how improbable an event is given a known distribution. We performed a statistical investigation of surprise in the harmonic structure of songs within a representative corpus of Western popular music, namely, the McGill Billboard Project corpus. We found that chords of songs in the top quartile of the Billboard chart showed greater average surprise than those in the bottom quartile. We also found that the different sections within top-quartile songs varied more in their average surprise than the sections within bottom-quartile songs. The results of this study are consistent with both the Absolute- and Contrastive-Surprise Hypotheses. Although these hypotheses seem contradictory to one another, we cannot yet discard the possibility that both absolute and contrastive types of surprise play roles in the enjoyment of popular music. We call this possibility the Hybrid-Surprise Hypothesis. The results of this statistical investigation have implications for both music cognition and the human neural mechanisms of esthetic judgments. PMID:28572763

  18. A Statistical Test of Correlations and Periodicities in the Geological Records

    NASA Astrophysics Data System (ADS)

    Yabushita, S.

    1997-09-01

    Matsumoto & Kubotani argued that there is a positive and statistically significant correlation between cratering and mass extinction. This argument is critically examined by adopting the method of Ertel used by Matsumoto & Kubotani, but applying it more directly to the extinction and cratering records. It is shown that, on the null hypothesis of a random distribution of crater ages, the observed correlation has a probability of occurrence of 13%. However, when large craters are excluded whose ages agree with the times of peaks of the extinction rate of marine fauna, one obtains a negative correlation. This result strongly indicates that mass extinctions are not due to an accumulation of impacts but to isolated gigantic impacts.
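
    The null-hypothesis probability quoted above can be illustrated with a small Monte Carlo: scatter crater ages uniformly at random, recompute the cratering-extinction correlation each time, and report the fraction of simulations at least as extreme as the observed value. The sketch below runs on synthetic series, not the actual geological records.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def mc_corr_pvalue(n_craters, extinction_series, time_grid, corr_obs, n_sim=10_000):
        """Probability, under a null of uniformly random crater ages, of a
        cratering-extinction correlation at least as large as observed."""
        count = 0
        for _ in range(n_sim):
            ages = rng.uniform(time_grid[0], time_grid[-1], size=n_craters)
            rate, _ = np.histogram(ages, bins=time_grid)  # cratering-rate series
            count += np.corrcoef(rate, extinction_series)[0, 1] >= corr_obs
        return count / n_sim

    # Synthetic records: 25 ten-Myr bins and 40 crater ages (illustrative only)
    time_grid = np.linspace(0.0, 250.0, 26)
    extinction = rng.poisson(3, size=25).astype(float)
    obs_rate, _ = np.histogram(rng.uniform(0, 250, 40), bins=time_grid)
    r_obs = np.corrcoef(obs_rate, extinction)[0, 1]
    print(f"Monte Carlo p = {mc_corr_pvalue(40, extinction, time_grid, r_obs):.3f}")
    ```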

  19. Being and feeling unique: statistical deviance and psychological marginality.

    PubMed

    Frable, D E

    1993-03-01

    Two studies tested the hypothesis that people with culturally stigmatized and concealable conditions (e.g., gays, epileptics, juvenile delinquents, and incest victims) would be more likely to feel unique than people with culturally valued or conspicuous conditions (e.g., the physically attractive, the intellectually gifted, the obese, and the facially scarred). In Study 1, culturally stigmatized individuals with concealable conditions were least likely to perceive consensus between their personal preferences and those of others. In Study 2, they were most likely to describe themselves as unique and to make these self-relevant decisions quickly. Marginality is a psychological reality, not just a statistical one, for those with stigmatized and concealable "master status" conditions.

  20. Autosomal STRs provide genetic evidence for the hypothesis that Tai people originate from southern China.

    PubMed

    Sun, Hao; Zhou, Chi; Huang, Xiaoqin; Lin, Keqin; Shi, Lei; Yu, Liang; Liu, Shuyuan; Chu, Jiayou; Yang, Zhaoqing

    2013-01-01

    Tai people are widely distributed in Thailand, Laos and southwestern China and are a large population of Southeast Asia. Although most anthropologists and historians agree that modern Tai people are from southwestern China and northern Thailand, the place from which they historically migrated remains controversial. Three popular hypotheses have been proposed: northern origin, southern origin or an indigenous origin. We compared the genetic relationships between the Tai in China and their "siblings" to test the different hypotheses by analyzing 10 autosomal microsatellites. The genetic data of 916 samples from 19 populations were analyzed in this survey. The autosomal STR data from 15 of the 19 populations came from our previous study (Lin et al., 2010). 194 samples from four additional populations were genotyped in this study: Han (Yunnan), Dai (Dehong), Dai (Yuxi) and Mongolian. The results of genetic distance comparisons, genetic structure analyses and admixture analyses all indicate that the populations proposed by the northern origin hypothesis have large genetic distances from, and are clearly differentiated from, the Tai. The simulation-based ABC analysis also indicates this: the posterior probability of the northern origin hypothesis is just 0.04 [95%CI: (0.01-0.06)]. Conversely, genetic relationships were very close between the Tai and the populations proposed by the southern origin and indigenous origin hypotheses. Simulation-based ABC analyses were also used to distinguish the southern origin hypothesis from the indigenous origin hypothesis. The results indicate that the posterior probability of the southern origin hypothesis [0.640, 95%CI: (0.524-0.757)] is greater than that of the indigenous origin hypothesis [0.324, 95%CI: (0.211-0.438)]. Therefore, we propose that the genetic evidence does not support the hypothesis of northern origin. Our genetic data indicate that the southern origin hypothesis is statistically more probable than the other two hypotheses, suggesting that the Tai people most likely originated from southern China.
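
    For readers unfamiliar with simulation-based ABC model choice, the rejection form of the idea is short: draw candidate models with equal prior probability, simulate a summary statistic under each draw, keep draws whose simulated summary lands near the observed one, and read off each hypothesis's posterior probability as its share of the kept draws. A toy Python sketch (the "origin" models and distance values are invented for illustration, not taken from the study):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def abc_model_posterior(observed_summary, simulators, n_draws=20_000, eps=0.05):
        """ABC rejection sketch: keep model draws whose simulated summary lies
        within eps of the observed summary; posterior model probabilities are
        the shares of kept draws."""
        kept = []
        for _ in range(n_draws):
            m = rng.integers(len(simulators))
            if abs(simulators[m]() - observed_summary) < eps:
                kept.append(m)
        kept = np.asarray(kept)
        if kept.size == 0:
            return [float("nan")] * len(simulators)
        return [float(np.mean(kept == m)) for m in range(len(simulators))]

    # Toy "origin" models that differ only in the genetic distance they predict
    models = [lambda: rng.normal(0.30, 0.05),  # northern origin: large distance
              lambda: rng.normal(0.10, 0.05),  # southern origin: small distance
              lambda: rng.normal(0.12, 0.05)]  # indigenous origin: small distance
    print(abc_model_posterior(observed_summary=0.11, simulators=models))
    ```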

  1. Religion and Happiness: A Study Among University Students in Turkey.

    PubMed

    Francis, Leslie J; Ok, Üzeyir; Robbins, Mandy

    2017-08-01

    This study tests the hypothesis that higher levels of positive religious affect are associated with higher levels of personal happiness among a sample of 348 students studying at a state university in Turkey who completed the Ok Religious Attitude Scale (Islam), the Oxford Happiness Inventory, and the short-form Eysenck Personality Questionnaire Revised. The data revealed a small but statistically significant association between religiosity and happiness after taking sex and individual differences in personality into account.

  2. On-line Flagging of Anomalies and Adaptive Sequential Hypothesis Testing for Fine-feature Characterization of Geosynchronous Satellites

    DTIC Science & Technology

    2015-10-18

    [Fragmentary record] The cited references include "statistical assessment of geosynchronous satellite status based on non-resolved photometry data," AMOS Technical Conference, 2014, and Wald, A., on sequential analysis. The recoverable abstract text indicates that the work enables an ongoing, automated assessment of satellite behavior through its life cycle using photometry data, accounting for seasonal changes, and builds on work on the decision to move the time slider [1], which is required in order to update the baseline signature (brightness) data for a satellite.

  3. Historical and Future Projected Hydrologic Extremes over the Midwest and Great Lakes Region

    NASA Astrophysics Data System (ADS)

    Byun, K.; Hamlet, A. F.; Chiu, C. M.

    2016-12-01

    There is an increasing body of evidence from observed data that climate variability combined with regional climate change has had a significant impact on hydrologic cycles, including both seasonal patterns of runoff and altered hydrologic extremes (e.g. floods and extreme stormwater events). To better understand changing patterns of extreme high flows in the Midwest and Great Lakes region, we analyzed long-term historical observations of peak streamflow at different gaging stations. We also conducted hydrologic model experiments using the Variable Infiltration Capacity (VIC) model at 1/16-degree resolution in order to explore the sensitivity of annual peak streamflow, both historically and under temperature and precipitation changes for several future periods. For future projections, the Hybrid Delta statistical downscaling approach applied to the Coupled Model Intercomparison Project, Phase 5 (CMIP5) Global Climate Model (GCM) scenarios was used to produce driving data for the VIC hydrologic model. Preliminary results for several test basins in the Midwest support the hypothesis that there are consistent and statistically significant changes in the mean annual flood before and after about 1975. Future projections using hydrologic model simulations support the hypothesis of higher peak flows due to warming and increasing precipitation projected for the 21st century. We will extend this preliminary analysis using observed data and simulations from 40 river basins in the Midwest to further test these hypotheses.
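
    A minimal version of the before/after-1975 comparison is a two-sample test on annual peak flows split at the candidate change year. The sketch below uses a synthetic record with a built-in shift; the study itself works from observed gaging-station records and hydrologic simulations.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Toy annual-peak-flow record with a 20% upward shift imposed after 1975
    years = np.arange(1940, 2011)
    peaks = rng.gamma(5, 100, size=years.size)
    peaks[years >= 1975] *= 1.2

    # Nonparametric two-sample test of a shift in the mean annual flood
    pre, post = peaks[years < 1975], peaks[years >= 1975]
    stat, p = stats.mannwhitneyu(pre, post, alternative="two-sided")
    print(f"Mann-Whitney U = {stat:.0f}, p = {p:.4f}")
    ```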

  4. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

    Over the past two decades, much attention has been given to data-driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).

  5. A rat model system to study complex disease risks, fitness, aging, and longevity.

    PubMed

    Koch, Lauren Gerard; Britton, Steven L; Wisløff, Ulrik

    2012-02-01

    The association between low exercise capacity and all-cause morbidity and mortality is statistically strong yet mechanistically unresolved. By connecting clinical observation with a theoretical base, we developed a working hypothesis that variation in capacity for oxygen metabolism is the central mechanistic determinant between disease and health (aerobic hypothesis). As an unbiased test, we show that two-way artificial selective breeding of rats for low and high intrinsic endurance exercise capacity also produces rats that differ for numerous disease risks, including the metabolic syndrome, cardiovascular complications, premature aging, and reduced longevity. This contrasting animal model system may prove to be translationally superior relative to more widely used simplistic models for understanding geriatric biology and medicine. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Random variability explains apparent global clustering of large earthquakes

    USGS Publications Warehouse

    Michael, A.J.

    2011-01-01

    The occurrence of 5 Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well-described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.
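
    One of the simplest tests in this spirit compares the dispersion (variance-to-mean ratio) of annual event counts against its distribution under a constant-rate Poisson null. The sketch below runs on synthetic counts and omits the aftershock declustering that the actual analysis requires.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def dispersion_pvalue(annual_counts, n_sim=10_000):
        """Compare the observed variance-to-mean ratio of annual counts with
        its simulated distribution under a constant-rate Poisson null."""
        obs = annual_counts.var(ddof=1) / annual_counts.mean()
        sims = rng.poisson(annual_counts.mean(), size=(n_sim, annual_counts.size))
        sim_disp = sims.var(axis=1, ddof=1) / sims.mean(axis=1)
        return obs, float(np.mean(sim_disp >= obs))

    # Toy catalog of yearly M >= 7 counts (illustrative, not the real catalog)
    counts = rng.poisson(15, size=111).astype(float)
    d, p = dispersion_pvalue(counts)
    print(f"dispersion = {d:.2f}, p = {p:.3f}")  # large p: no evidence of clustering
    ```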

  7. Case-based statistical learning applied to SPECT image classification

    NASA Astrophysics Data System (ADS)

    Górriz, Juan M.; Ramírez, Javier; Illán, I. A.; Martínez-Murcia, Francisco J.; Segovia, Fermín.; Salas-Gonzalez, Diego; Ortiz, A.

    2017-03-01

    Statistical learning and decision theory play a key role in many areas of science and engineering. Some examples include time series regression and prediction, optical character recognition, signal detection in communications, or biomedical applications for diagnosis and prognosis. This paper deals with the topic of learning from biomedical image data in the classification problem. In a typical scenario we have a training set that is employed to fit a prediction model or learner and a testing set to which the learner is applied in order to predict the outcome for new unseen patterns. Both processes are usually kept completely separate to avoid over-fitting and because, in practice, the new unseen objects (testing set) have unknown outcomes. However, the outcome takes one of a discrete set of values, as in the binary diagnosis problem. Thus, assumptions on these outcome values could be established to obtain the most likely prediction model at the training stage, which could improve the overall classification accuracy on the testing set, or keep its performance at least at the level of the selected statistical classifier. In this sense, a novel case-based learning (c-learning) procedure is proposed which combines hypothesis testing from a discrete set of expected outcomes and a cross-validated classification stage.

  8. Not Just a Sum? Identifying Different Types of Interplay between Constituents in Combined Interventions

    PubMed Central

    Van Deun, Katrijn; Thorrez, Lieven; van den Berg, Robert A.; Smilde, Age K.; Van Mechelen, Iven

    2015-01-01

    Motivation: Experiments in which the effect of combined manipulations is compared with the effects of their pure constituents have received a great deal of attention. Examples include the study of combination therapies and the comparison of double and single knockout model organisms. Often the effect of the combined manipulation is not a mere addition of the effects of its constituents, with quite different forms of interplay between the constituents being possible. Yet, a well-formalized taxonomy of possible forms of interplay is lacking, let alone a statistical methodology to test for their presence in empirical data. Results: Starting from a taxonomy of a broad range of forms of interplay between constituents of a combined manipulation, we propose a sound statistical hypothesis testing framework to test for the presence of each particular form of interplay. We illustrate the framework with analyses of public gene expression data on the combined treatment of dendritic cells with curdlan and GM-CSF and show that these lead to valuable insights into the mode of action of the constituent treatments and their combination. Availability and Implementation: R code implementing the statistical testing procedure for microarray gene expression data is available as supplementary material. The data are available from the Gene Expression Omnibus with accession number GSE32986. PMID:25965065

  9. Not Just a Sum? Identifying Different Types of Interplay between Constituents in Combined Interventions.

    PubMed

    Van Deun, Katrijn; Thorrez, Lieven; van den Berg, Robert A; Smilde, Age K; Van Mechelen, Iven

    2015-01-01

    Experiments in which the effect of combined manipulations is compared with the effects of their pure constituents have received a great deal of attention. Examples include the study of combination therapies and the comparison of double and single knockout model organisms. Often the effect of the combined manipulation is not a mere addition of the effects of its constituents, with quite different forms of interplay between the constituents being possible. Yet, a well-formalized taxonomy of possible forms of interplay is lacking, let alone a statistical methodology to test for their presence in empirical data. Starting from a taxonomy of a broad range of forms of interplay between constituents of a combined manipulation, we propose a sound statistical hypothesis testing framework to test for the presence of each particular form of interplay. We illustrate the framework with analyses of public gene expression data on the combined treatment of dendritic cells with curdlan and GM-CSF and show that these lead to valuable insights into the mode of action of the constituent treatments and their combination. R code implementing the statistical testing procedure for microarray gene expression data is available as supplementary material. The data are available from the Gene Expression Omnibus with accession number GSE32986.
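
    The simplest member of such a taxonomy is pure additivity, which can be tested per gene with the linear contrast (AB - control) - (A - control) - (B - control). The Python sketch below applies this one contrast to synthetic expression data; the paper's framework distinguishes many more forms of interplay than this single test.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)

    # Toy expression matrices: rows = genes, columns = replicates per condition
    n_genes, n_rep = 500, 4
    control = rng.normal(0.0, 1, (n_genes, n_rep))
    a = rng.normal(0.5, 1, (n_genes, n_rep))   # constituent A alone
    b = rng.normal(0.3, 1, (n_genes, n_rep))   # constituent B alone
    ab = rng.normal(0.8, 1, (n_genes, n_rep))  # combined manipulation

    # Additivity contrast per gene: zero when the combined effect is the sum
    contrast = ab.mean(1) - a.mean(1) - b.mean(1) + control.mean(1)
    # Pooled variance across the four conditions (equal-variance assumption)
    pooled_var = np.stack([control, a, b, ab]).var(axis=2, ddof=1).mean(axis=0)
    se = np.sqrt(4 * pooled_var / n_rep)
    t = contrast / se
    p = 2 * stats.t.sf(np.abs(t), df=4 * (n_rep - 1))
    print(f"genes with nominal evidence of non-additive interplay: {(p < 0.05).sum()}")
    ```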

  10. Statistical analysis of secondary particle distributions in relativistic nucleus-nucleus collisions

    NASA Technical Reports Server (NTRS)

    Mcguire, Stephen C.

    1987-01-01

    The use is described of several statistical techniques to characterize structure in the angular distributions of secondary particles from nucleus-nucleus collisions in the energy range 24 to 61 GeV/nucleon. The objective of this work was to determine whether there are correlations between emitted particle intensity and angle that may be used to support the existence of the quark-gluon plasma. The techniques include chi-square null hypothesis tests, the method of discrete Fourier transform analysis, and fluctuation analysis. We have also used the method of composite unit vectors to test for azimuthal asymmetry in a data set of 63 JACEE-3 events. Each method is presented in a manner that provides the reader with some practical detail regarding its application. Of those events with relatively high statistics, an Fe event at 55 GeV/nucleon was found to possess an azimuthal distribution with a highly non-random structure. No evidence of non-statistical fluctuations was found in the pseudo-rapidity distributions of the events studied. It is seen that the most effective application of these methods relies upon the availability of many events or single events that possess very high multiplicities.
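
    The composite-unit-vector approach to azimuthal asymmetry is closely related to the classical Rayleigh test: sum the unit vectors (cos φ, sin φ) of the emission angles and ask whether the resultant is larger than uniform scatter would produce. A sketch using the standard first-order p-value approximation (not necessarily the authors' exact procedure):

    ```python
    import numpy as np

    def rayleigh_test(phi):
        """Rayleigh test for azimuthal asymmetry: a large mean resultant length
        of the unit vectors (cos phi, sin phi) signals a preferred direction."""
        n = len(phi)
        r_bar = np.hypot(np.cos(phi).sum(), np.sin(phi).sum()) / n
        z = n * r_bar**2
        return r_bar, np.exp(-z)  # first-order p approximation, fine for moderate n

    rng = np.random.default_rng(5)
    uniform = rng.uniform(0, 2 * np.pi, 100)
    clumped = rng.normal(1.0, 0.5, 100) % (2 * np.pi)
    print(rayleigh_test(uniform))  # large p: consistent with uniformity
    print(rayleigh_test(clumped))  # small p: non-random azimuthal structure
    ```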

  11. A test to evaluate the earthquake prediction algorithm, M8

    USGS Publications Warehouse

    Healy, John H.; Kossobokov, Vladimir G.; Dewey, James W.

    1992-01-01

    A test of the algorithm M8 is described. The test is constructed to meet four rules, which we propose to be applicable to the test of any method for earthquake prediction:  1. An earthquake prediction technique should be presented as a well documented, logical algorithm that can be used by  investigators without restrictions. 2. The algorithm should be coded in a common programming language and implementable on widely available computer systems. 3. A test of the earthquake prediction technique should involve future predictions with a black box version of the algorithm in which potentially adjustable parameters are fixed in advance. The source of the input data must be defined and ambiguities in these data must be resolved automatically by the algorithm. 4. At least one reasonable null hypothesis should be stated in advance of testing the earthquake prediction method, and it should be stated how this null hypothesis will be used to estimate the statistical significance of the earthquake predictions. The M8 algorithm has successfully predicted several destructive earthquakes, in the sense that the earthquakes occurred inside regions with linear dimensions from 384 to 854 km that the algorithm had identified as being in times of increased probability for strong earthquakes. In addition, M8 has successfully "post predicted" high percentages of strong earthquakes in regions to which it has been applied in retroactive studies. The statistical significance of previous predictions has not been established, however, and post-prediction studies in general are notoriously subject to success-enhancement through hindsight. Nor has it been determined how much more precise an M8 prediction might be than forecasts and probability-of-occurrence estimates made by other techniques. We view our test of M8 both as a means to better determine the effectiveness of M8 and as an experimental structure within which to make observations that might lead to improvements in the algorithm or conceivably lead to a radically different approach to earthquake prediction.

  12. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. The meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatially-temporal fluctuations of the image background, which makes the localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of a meteoroid after ignition.
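
    The outlier-detection step described above can be sketched directly: fit a multivariate normal to the differentiated, intensity-transformed background features and flag pixels whose squared Mahalanobis distance exceeds a chi-square cutoff. The feature construction below is illustrative only.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)

    def outlier_mask(features, alpha=1e-4):
        """Fit a multivariate normal to background-fluctuation features and flag
        points whose squared Mahalanobis distance exceeds the chi-square cutoff."""
        mu = features.mean(axis=0)
        inv = np.linalg.inv(np.cov(features, rowvar=False))
        centered = features - mu
        d2 = np.einsum("ij,jk,ik->i", centered, inv, centered)
        return d2 > stats.chi2.ppf(1 - alpha, df=features.shape[1])

    # Toy features: log-intensity differences along x, y and t for each pixel
    background = rng.multivariate_normal([0, 0, 0], np.eye(3), size=10_000)
    meteor = rng.multivariate_normal([4, 4, 4], np.eye(3), size=20)
    mask = outlier_mask(np.vstack([background, meteor]))
    print(mask.sum(), "outlier pixels flagged")  # roughly the 20 meteor pixels
    ```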

  13. Sex, Sport, IGF-1 and the Community Effect in Height Hypothesis

    PubMed Central

    Bogin, Barry; Hermanussen, Michael; Blum, Werner F.; Aßmann, Christian

    2015-01-01

    We test the hypothesis that differences in social status between groups of people within a population may induce variation in insulin-like growth factor-1 (IGF-1) levels and, by extension, growth in height. This is called the community effect in height hypothesis. The relationship between IGF-1, assessed via finger-prick dried blood spot, and elite-level sport competition outcomes was analysed for a sample of 116 undergraduate men and women. There was a statistically significant difference between winners and losers of a competition. Winners, as a group, had higher average pre-game and post-game IGF-1 levels than losers. We proposed this type of difference as a proxy for social dominance. We found no evidence that winners increased in IGF-1 levels over losers or that members of the same team were more similar in IGF-1 levels than they were to players from other teams. These findings provide limited support for the community effect in height hypothesis. The findings are discussed in relation to the action of the growth hormone/IGF-1 axis as a transducer of multiple bio-social influences into a coherent signal which allows the growing human to adjust and adapt to local ecological conditions. PMID:25946190

  14. Sex, Sport, IGF-1 and the Community Effect in Height Hypothesis.

    PubMed

    Bogin, Barry; Hermanussen, Michael; Blum, Werner F; Aßmann, Christian

    2015-05-04

    We test the hypothesis that differences in social status between groups of people within a population may induce variation in insulin-like growth factor-1 (IGF-1) levels and, by extension, growth in height. This is called the community effect in height hypothesis. The relationship between IGF-1, assessed via finger-prick dried blood spot, and elite-level sport competition outcomes was analysed for a sample of 116 undergraduate men and women. There was a statistically significant difference between winners and losers of a competition. Winners, as a group, had higher average pre-game and post-game IGF-1 levels than losers. We proposed this type of difference as a proxy for social dominance. We found no evidence that winners increased in IGF-1 levels over losers or that members of the same team were more similar in IGF-1 levels than they were to players from other teams. These findings provide limited support for the community effect in height hypothesis. The findings are discussed in relation to the action of the growth hormone/IGF-1 axis as a transducer of multiple bio-social influences into a coherent signal which allows the growing human to adjust and adapt to local ecological conditions.

  15. Singular Spectrum Analysis for Astronomical Time Series: Constructing a Parsimonious Hypothesis Test

    NASA Astrophysics Data System (ADS)

    Greco, G.; Kondrashov, D.; Kobayashi, S.; Ghil, M.; Branchesi, M.; Guidorzi, C.; Stratta, G.; Ciszak, M.; Marino, F.; Ortolan, A.

    We present a data-adaptive spectral method - Monte Carlo Singular Spectrum Analysis (MC-SSA) - and its modification to tackle astrophysical problems. Through numerical simulations we show the ability of MC-SSA to deal with 1/f^β power-law noise affected by photon counting statistics. Such a noise process is simulated by a first-order autoregressive AR(1) process corrupted by intrinsic Poisson noise. In doing so, we statistically estimate a basic stochastic variation of the source and the corresponding fluctuations due to the quantum nature of light. In addition, the MC-SSA test retains its effectiveness even when a significant percentage of the signal falls below a certain level of detection, e.g., caused by the instrument sensitivity. The parsimonious approach presented here may be broadly applied, from the search for extrasolar planets to the extraction of low-intensity coherent phenomena probably hidden in high-energy transients.
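
    The null model named here, an AR(1) process corrupted by intrinsic Poisson noise, lends itself to a surrogate-data test: simulate many realizations of the null and flag spectral components of the observed series that exceed a high surrogate percentile. The sketch below uses a plain periodogram for brevity; MC-SSA proper works in the basis of data-adaptive empirical orthogonal functions.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def ar1_surrogate_test(x, n_surr=1000, poisson_rate=5.0):
        """Flag periodogram frequencies of x whose power exceeds the 97.5th
        percentile of AR(1)-plus-Poisson-noise surrogates fitted to x."""
        x = x - x.mean()
        phi = np.corrcoef(x[:-1], x[1:])[0, 1]  # lag-1 autocorrelation
        sig = np.sqrt(x.var() * (1 - phi**2))   # innovation std dev
        n = x.size
        power_obs = np.abs(np.fft.rfft(x))**2 / n

        surr_power = np.empty((n_surr, power_obs.size))
        for i in range(n_surr):
            s = np.empty(n)
            s[0] = rng.normal(0, x.std())
            for t in range(1, n):
                s[t] = phi * s[t - 1] + rng.normal(0, sig)
            s += rng.poisson(poisson_rate, n) - poisson_rate  # counting noise
            s -= s.mean()
            surr_power[i] = np.abs(np.fft.rfft(s))**2 / n
        return power_obs > np.percentile(surr_power, 97.5, axis=0)

    t = np.arange(512)
    series = np.sin(2 * np.pi * t / 32) + rng.normal(0, 1, t.size)
    print(np.nonzero(ar1_surrogate_test(series))[0])  # expect indices near 512/32
    ```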

  16. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    PubMed

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories. Copyright © 2015 Elsevier Ltd. All rights reserved.
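
    The 0D-versus-1D distinction can be made concrete in a few lines: a 1D confidence band for a mean trajectory is calibrated with the maximum deviation over the whole time axis, rather than pointwise, so the stated coverage holds for the entire curve. Synthetic trajectories stand in for real force or kinematic data below.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    def simultaneous_bootstrap_band(trajectories, n_boot=2000, alpha=0.05):
        """Bootstrap-t confidence band for a mean 1D trajectory, calibrated
        with the max-over-time deviation for simultaneous coverage."""
        n = trajectories.shape[0]
        mean = trajectories.mean(axis=0)
        max_dev = np.empty(n_boot)
        for b in range(n_boot):
            sample = trajectories[rng.integers(n, size=n)]
            se_b = sample.std(axis=0, ddof=1) / np.sqrt(n)
            max_dev[b] = np.max(np.abs(sample.mean(axis=0) - mean) / se_b)
        crit = np.quantile(max_dev, 1 - alpha)
        se = trajectories.std(axis=0, ddof=1) / np.sqrt(n)
        return mean - crit * se, mean + crit * se

    # Toy dataset: 20 subjects x 101 time nodes
    t = np.linspace(0, 1, 101)
    data = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (20, t.size))
    lo, hi = simultaneous_bootstrap_band(data)
    print(f"band half-width at node 50: {(hi - lo)[50] / 2:.3f}")
    ```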

  17. Rank score and permutation testing alternatives for regression quantile estimates

    USGS Publications Warehouse

    Cade, B.S.; Richards, J.D.; Mielke, P.W.

    2006-01-01

    Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) was evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and heterogeneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic, distributed as a χ² random variable with q degrees of freedom (where q parameters are constrained by H0), and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvement in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles, but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors, but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles. Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application relating trout densities to stream channel width:depth.
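
    As a generic illustration of permutation inference for a quantile regression slope (not the authors' rank score T or permutation F procedures, and ignoring the weighting they show is needed under strong heteroscedasticity), one can shuffle the predictor and refit:

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    rng = np.random.default_rng(9)

    def perm_test_slope(x, y, q=0.90, n_perm=500):
        """Permutation test of H0: slope = 0 in a linear quantile regression,
        estimating the null distribution by refitting after shuffling x."""
        slope_obs = QuantReg(y, sm.add_constant(x)).fit(q=q).params[1]
        null = np.empty(n_perm)
        for i in range(n_perm):
            xp = sm.add_constant(rng.permutation(x))
            null[i] = QuantReg(y, xp).fit(q=q).params[1]
        return float(np.mean(np.abs(null) >= abs(slope_obs)))

    x = rng.uniform(0, 10, 80)                   # e.g., channel width:depth
    y = 2 + 0.5 * x + rng.gamma(2, 1 + 0.3 * x)  # heterogeneous errors
    print(f"permutation p for the 0.90-quantile slope: {perm_test_slope(x, y):.3f}")
    ```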

  18. Mitigating active shooter impact: Analysis for policy options based on agent/computer-based modeling.

    PubMed

    Anklam, Charles; Kirby, Adam; Sharevski, Filipo; Dietz, J Eric

    2015-01-01

    Active shooting violence at confined settings, such as educational institutions, poses serious security concerns to public safety. In studying the effects of active shooter scenarios, the common denominator associated with all events, regardless of the shooter's motive or the type of weapons used, was the location chosen and the time elapsed between the beginning of the event and its culmination. This in turn directly correlates with the number of casualties incurred in any given event: the longer the event protracts, the more casualties are incurred until law enforcement or another barrier can end the situation. Using AnyLogic technology, modeling scenarios were devised to test multiple hypotheses with free-agent modeling simulation to determine the best method to reduce casualties associated with active shooter scenarios. Four possible scenarios of responding to an active shooter in a public school setting were tested using agent-based computer modeling techniques. Scenario 1: a basic scenario in which no access control or any other type of security is used within the school; scenario 2: concealed-carry individuals (5-10 percent of the workforce) are present in the school; scenario 3: the school has an assigned resource officer; scenario 4: the school has an assigned resource officer and concealed-carry individuals (5-10 percent) present. Statistical data from the modeling scenarios indicate which tested hypothesis resulted in fewer casualties and quicker culmination of the event. The use of AnyLogic confirmed the initial hypothesis that a decrease in response time to an active shooter scenario directly reduces victim casualties. Modeling tests show statistically significantly fewer casualties in scenarios where on-scene armed responders, such as resource officers and concealed-carry personnel, were present.

  19. A novel examination of atypical major depressive disorder based on attachment theory.

    PubMed

    Levitan, Robert D; Atkinson, Leslie; Pedersen, Rebecca; Buis, Tom; Kennedy, Sidney H; Chopra, Kevin; Leung, Eman M; Segal, Zindel V

    2009-06-01

    While a large body of descriptive work has thoroughly investigated the clinical correlates of atypical depression, little is known about its fundamental origins. This study examined atypical depression from an attachment theory framework. Our hypothesis was that, compared to adults with melancholic depression, those with atypical depression would report more anxious-ambivalent attachment and less secure attachment. As gender has been an important consideration in prior work on atypical depression, this same hypothesis was further tested in female subjects only. One hundred ninety-nine consecutive adults presenting to a tertiary mood disorders clinic with major depressive disorder with either atypical or melancholic features according to the Structured Clinical Interview for DSM-IV Axis-I Disorders were administered a self-report adult attachment questionnaire to assess the core dimensions of secure, anxious-ambivalent, and avoidant attachment. Attachment scores were compared across the 2 depressed groups defined by atypical and melancholic features using multivariate analysis of variance. The study was conducted between 1999 and 2004. When men and women were considered together, the multivariate test comparing attachment scores by depressive group was statistically significant at p < .05. Between-subjects testing indicated that atypical depression was associated with significantly lower secure attachment scores, with a trend toward higher anxious-ambivalent attachment scores, than was melancholia. When women were analyzed separately, the multivariate test was statistically significant at p < .01, with both secure and anxious-ambivalent attachment scores differing significantly across depressive groups. These preliminary findings suggest that attachment theory, and insecure and anxious-ambivalent attachment in particular, may be a useful framework from which to study the origins, clinical correlates, and treatment of atypical depression. Gender may be an important consideration when considering atypical depression from an attachment perspective. Copyright 2009 Physicians Postgraduate Press, Inc.

  20. Segment-Wise Genome-Wide Association Analysis Identifies a Candidate Region Associated with Schizophrenia in Three Independent Samples

    PubMed Central

    Rietschel, Marcella; Mattheisen, Manuel; Breuer, René; Schulze, Thomas G.; Nöthen, Markus M.; Levinson, Douglas; Shi, Jianxin; Gejman, Pablo V.; Cichon, Sven; Ophoff, Roel A.

    2012-01-01

    Recent studies suggest that variation in complex disorders (e.g., schizophrenia) is explained by a large number of genetic variants with small effect size (odds ratio ∼1.05–1.1). The statistical power to detect these genetic variants in Genome Wide Association (GWA) studies with large numbers of cases and controls (∼15,000) is still low. As it will be difficult to further increase sample size, we decided to explore an alternative method for analyzing GWA data in a study of schizophrenia, dramatically reducing the number of statistical tests. The underlying hypothesis was that at least some of the genetic variants related to a common outcome are collocated in segments of chromosomes at a wider scale than single genes. Our approach was therefore to study the association between relatively large segments of DNA and disease status. An association test was performed for each SNP and the number of nominally significant tests in a segment was counted. We then performed a permutation-based binomial test to determine whether this region contained significantly more nominally significant SNPs than expected under the null hypothesis of no association, taking linkage into account. Genome Wide Association data of three independent schizophrenia case/control cohorts with European ancestry (Dutch, German, and US) were analyzed using segments of DNA of variable length (2 to 32 Mbp). Using this approach we identified a region at chromosome 5q23.3-q31.3 (128–160 Mbp) that was significantly enriched with nominally associated SNPs in three independent case-control samples. We conclude that considering relatively wide segments of chromosomes may reveal reliable relationships between the genome and schizophrenia, suggesting novel methodological possibilities as well as raising theoretical questions. PMID:22723893
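
    Stripped of the permutation calibration that accounts for linkage, the segment-counting idea reduces to a binomial question: does a segment contain more nominally significant SNPs than chance predicts? The deliberately simplified sketch below assumes independent SNPs, which is exactly the assumption the study's permutation test exists to avoid.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(10)

    def segment_enrichment_p(p_values, in_segment, alpha=0.05):
        """Binomial test: is the count of nominally significant SNPs in a
        segment larger than expected under independence?"""
        k = int((p_values[in_segment] < alpha).sum())
        n = int(in_segment.sum())
        return stats.binomtest(k, n, alpha, alternative="greater").pvalue

    # Toy GWAS: 100,000 SNP p values with one mildly enriched 2,000-SNP segment
    p = rng.uniform(size=100_000)
    seg = np.zeros(p.size, dtype=bool)
    seg[50_000:52_000] = True
    p[seg] = rng.beta(0.8, 1.0, seg.sum())  # mild excess of small p values
    print(f"segment enrichment p = {segment_enrichment_p(p, seg):.2e}")
    ```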

  1. Effect of adhesive resin flexibility on enamel fracture during metal bracket debonding: an ex vivo study.

    PubMed

    Kim, Young Kyung; Park, Hyo-Sang; Kim, Kyo-Han; Kwon, Tae-Yub

    2015-10-01

    To test the null hypothesis that neither the flexural properties of orthodontic adhesive resins nor the enamel pre-treatment methods would affect metal bracket debonding behaviours, including enamel fracture. A dimethacrylate-based resin (Transbond XT, TX) and two methyl methacrylate (MMA)-based resins (Super-Bond C&B, SB; an experimental light-cured resin, EXP) were tested. Flexural strength and flexural modulus for each resin were measured by a three-point bending test. Metal brackets were bonded to human enamel pretreated with total-etch (TE) or self-etch adhesive using one of the three resins (a total of six groups, n = 15). After 24 hours of storage in water at 37°C, a shear bond strength (SBS) test was performed using the wire loop method. After debonding, remaining resin on the enamel surfaces and occurrence of enamel fracture were assessed. Statistical significance was set at P < 0.05. The two MMA resins exhibited substantially lower flexural strength and modulus values than the TX resin. The mean SBS values of all groups (10.15-11.09 MPa) were statistically equivalent to one another (P > 0.05), except for the TE-TX group (13.51 MPa, P < 0.05). The two EXP groups showed less resin remnant. Only in the two TX groups were enamel fractures observed (three cases for each group). The results were drawn only from ex vivo experiments. The hypothesis is rejected. This study suggests that a more flexible MMA resin is favourable for avoiding enamel fracture during metal bracket debonding. © The Author 2014. Published by Oxford University Press on behalf of the European Orthodontic Society. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  2. Investigating the Cosmic Web with Topological Data Analysis

    NASA Astrophysics Data System (ADS)

    Cisewski-Kehe, Jessi; Wu, Mike; Fasy, Brittany; Hellwing, Wojciech; Lovell, Mark; Rinaldo, Alessandro; Wasserman, Larry

    2018-01-01

    Data exhibiting complicated spatial structures are common in many areas of science (e.g. cosmology, biology), but can be difficult to analyze. Persistent homology is a popular approach within the area of Topological Data Analysis that offers a new way to represent, visualize, and interpret complex data by extracting topological features, which can be used to infer properties of the underlying structures. In particular, TDA may be useful for analyzing the large-scale structure (LSS) of the Universe, which is an intricate and spatially complex web of matter. In order to understand the physics of the Universe, theoretical and computational cosmologists develop large-scale simulations that allow for visualizing and analyzing the LSS under varying physical assumptions. Each point in the 3D data set represents a galaxy or a cluster of galaxies, and topological summaries ("persistence diagrams") can be obtained summarizing the different ordered holes in the data (e.g. connected components, loops, voids). The topological summaries are interesting and informative descriptors of the Universe on their own, but hypothesis tests using the topological summaries would provide a way to make more rigorous comparisons of LSS under different theoretical models. For example, the received cosmological model has cold dark matter (CDM); however, while the case is strong for CDM, there are some observational inconsistencies with this theory. Another possibility is warm dark matter (WDM). It is of interest to see if a CDM Universe and a WDM Universe produce LSS that is topologically distinct. We present several possible test statistics for two-sample hypothesis tests using the topological summaries, carry out a simulation study to investigate the suitability of the proposed test statistics using simulated data from a variation of the Voronoi foam model, and finally apply the proposed inference framework to WDM vs. CDM cosmological simulation data.
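
    One concrete form such a two-sample test can take: reduce each simulated universe's persistence diagram to a scalar summary (total persistence, say) and compare the two groups by permutation. The diagrams below are synthetic; in practice they would come from a persistent-homology library such as ripser applied to the simulation point clouds.

    ```python
    import numpy as np

    rng = np.random.default_rng(11)

    def total_persistence(diagram):
        """Sum of feature lifetimes (death - birth) in a persistence diagram."""
        return float(np.sum(diagram[:, 1] - diagram[:, 0]))

    def two_sample_perm_test(diagrams_a, diagrams_b, n_perm=10_000):
        """Permutation test for a difference in mean total persistence between
        two sets of simulated universes (e.g., CDM runs vs WDM runs)."""
        summaries = np.array([total_persistence(d) for d in diagrams_a + diagrams_b])
        n_a = len(diagrams_a)
        obs = summaries[:n_a].mean() - summaries[n_a:].mean()
        count = 0
        for _ in range(n_perm):
            perm = rng.permutation(summaries)
            count += abs(perm[:n_a].mean() - perm[n_a:].mean()) >= abs(obs)
        return count / n_perm

    def toy_diagrams(scale, n_runs=15, n_features=30):
        """Synthetic (birth, death) pairs standing in for real diagrams."""
        out = []
        for _ in range(n_runs):
            birth = rng.uniform(0, 1, n_features)
            death = birth + rng.exponential(scale, n_features)
            out.append(np.column_stack([birth, death]))
        return out

    cdm, wdm = toy_diagrams(0.20), toy_diagrams(0.15)
    print(f"permutation p = {two_sample_perm_test(cdm, wdm):.4f}")
    ```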

  3. Novel statistical framework to identify differentially expressed genes allowing transcriptomic background differences.

    PubMed

    Ling, Zhi-Qiang; Wang, Yi; Mukaisho, Kenichi; Hattori, Takanori; Tatsuta, Takeshi; Ge, Ming-Hua; Jin, Li; Mao, Wei-Min; Sugihara, Hiroyuki

    2010-06-01

    Tests of differentially expressed genes (DEGs) from microarray experiments are based on the null hypothesis that genes that are irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis is not always true, as there can be several transcriptomic background differences between target and control samples, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to increased false positives, which have little biological/medical significance. In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data allowing transcriptomic background differences between these samples by introducing a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method using our own triplicate microarray experiment, followed by validations with reverse transcription-polymerase chain reaction (RT-PCR) and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) results in fewer false positive and false negative results, as measured by the degree of agreement with RT-PCR of the same samples, (ii) can be applied to different microarray platforms and results in better reproducibility as measured by the degree of DEG identification concordance both intra- and inter-platform and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEGs, but also reduces the power-cost tradeoff problem in the microarray field. Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.
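
    The modified null hypothesis lends itself to a one-dimensional sketch: model the background expression difference as normal with unknown, possibly nonzero, mean; estimate it robustly by iterative trimming; and call the points left outside the fitted null candidate DEGs. The actual method works with replicated arrays; the numbers below are synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(12)

    def robust_deg_outliers(log_ratios, z_cut=3.0, n_iter=20):
        """Iteratively fit a normal null to the central mass of log ratios,
        trimming outliers each round, and return the final outlier flags."""
        core = np.ones(log_ratios.size, dtype=bool)
        for _ in range(n_iter):
            mu, sd = log_ratios[core].mean(), log_ratios[core].std(ddof=1)
            new_core = np.abs(log_ratios - mu) < z_cut * sd
            if np.array_equal(new_core, core):
                break
            core = new_core
        return ~core, mu, sd  # candidate DEGs plus the fitted null parameters

    # Toy data: a shifted background (mean != 0) plus 200 genuinely changed genes
    background = rng.normal(0.3, 0.4, 10_000)
    changed = np.concatenate([rng.normal(2.5, 0.3, 100), rng.normal(-2.0, 0.3, 100)])
    flags, mu, sd = robust_deg_outliers(np.concatenate([background, changed]))
    print(f"fitted null: N({mu:.2f}, {sd:.2f}^2); candidate DEGs: {flags.sum()}")
    ```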

  4. Intact action segmentation in Parkinson's disease: Hypothesis testing using a novel computational approach.

    PubMed

    Schiffer, Anne-Marike; Nevado-Holgado, Alejo J; Johnen, Andreas; Schönberger, Anna R; Fink, Gereon R; Schubotz, Ricarda I

    2015-11-01

    Action observation is known to trigger predictions of the ongoing course of action and is thus considered a hallmark example of predictive perception. A related task, which explicitly taps into the ability to predict actions based on their internal representations, is action segmentation; the task requires participants to demarcate where one action step is completed and another one begins. It thus benefits from a temporally precise prediction of the current action. Formation and exploitation of these temporal predictions of external events is now closely associated with a network including the basal ganglia and prefrontal cortex. Because decline of dopaminergic innervation leads to impaired function of the basal ganglia and prefrontal cortex in Parkinson's disease (PD), we hypothesised that PD patients would show increased temporal variability in the action segmentation task, especially under medication withdrawal (hypothesis 1). Another crucial aspect of action segmentation is its reliance on a semantic representation of actions. There is no evidence to suggest that action representations are substantially altered, or cannot be accessed, in non-demented PD patients. We therefore expected action segmentation judgments to follow the same overall patterns in PD patients and healthy controls (hypothesis 2), resulting in comparable segmentation profiles. Both hypotheses were tested with a novel classification approach. We present evidence for both hypotheses in the present study: classifier performance was slightly decreased when it was tested for its ability to predict the identity of movies segmented by PD patients, and a measure of normativity of response behaviour was decreased when patients segmented movies under medication withdrawal without access to an episodic memory of the sequence. This pattern of results is consistent with hypothesis 1. However, the classifier analysis also revealed that responses given by patients and controls create very similar action-specific patterns, thus delivering evidence in favour of hypothesis 2. In terms of methodology, the use of classifiers in the present study allowed us to establish similarity of behaviour across groups (hypothesis 2). The approach opens up a new avenue that standard statistical methods often fail to provide and is discussed in terms of its merits for measuring hypothesised similarities across study populations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. A Meta-Analysis of Motivational Interviewing Process: Technical, Relational, and Conditional Process Models of Change

    PubMed Central

    Magill, Molly; Apodaca, Timothy R.; Borsari, Brian; Gaume, Jacques; Hoadley, Ariel; Gordon, Rebecca E.F.; Tonigan, J. Scott; Moyers, Theresa

    2018-01-01

    Objective In the present meta-analysis, we test the technical and relational hypotheses of Motivational Interviewing (MI) efficacy. We also propose an a priori conditional process model where heterogeneity of technical path effect sizes should be explained by interpersonal/relational (i.e., empathy, MI Spirit) and intrapersonal (i.e., client treatment seeking status) moderators. Method A systematic review identified k = 58 reports, describing 36 primary studies and 40 effect sizes (N = 3025 participants). Statistical methods calculated the inverse variance-weighted pooled correlation coefficient for the therapist to client and the client to outcome paths across multiple target behaviors (i.e., alcohol use, other drug use, other behavior change). Results Therapist MI-consistent skills were correlated with more client change talk (r = .55, p < .001) as well as more sustain talk (r = .40, p < .001). MI-inconsistent skills were correlated with more sustain talk (r = .16, p < .001), but not change talk. When these indicators were combined into proportions, as recommended in the Motivational Interviewing Skill Code, the overall technical hypothesis was supported. Specifically, proportion MI consistency was related to higher proportion change talk (r = .11, p = .004) and higher proportion change talk was related to reductions in risk behavior at follow up (r = −.16, p < .001). When tested as two independent effects, client change talk was not significant, but sustain talk was positively associated with worse outcome (r = .19, p < .001). Finally, the relational hypothesis was not supported, but heterogeneity in technical hypothesis path effect sizes was partially explained by inter- and intra-personal moderators. Conclusions This meta-analysis provides additional support for the technical hypothesis of MI efficacy; future research on the relational hypothesis should occur in the field rather than in the context of clinical trials. PMID:29265832

  6. A meta-analysis of motivational interviewing process: Technical, relational, and conditional process models of change.

    PubMed

    Magill, Molly; Apodaca, Timothy R; Borsari, Brian; Gaume, Jacques; Hoadley, Ariel; Gordon, Rebecca E F; Tonigan, J Scott; Moyers, Theresa

    2018-02-01

    In the present meta-analysis, we test the technical and relational hypotheses of Motivational Interviewing (MI) efficacy. We also propose an a priori conditional process model where heterogeneity of technical path effect sizes should be explained by interpersonal/relational (i.e., empathy, MI Spirit) and intrapersonal (i.e., client treatment seeking status) moderators. A systematic review identified k = 58 reports, describing 36 primary studies and 40 effect sizes (N = 3,025 participants). Statistical methods calculated the inverse variance-weighted pooled correlation coefficient for the therapist to client and the client to outcome paths across multiple target behaviors (i.e., alcohol use, other drug use, other behavior change). Therapist MI-consistent skills were correlated with more client change talk (r = .55, p < .001) as well as more sustain talk (r = .40, p < .001). MI-inconsistent skills were correlated with more sustain talk (r = .16, p < .001), but not change talk. When these indicators were combined into proportions, as recommended in the Motivational Interviewing Skill Code, the overall technical hypothesis was supported. Specifically, proportion MI consistency was related to higher proportion change talk (r = .11, p = .004) and higher proportion change talk was related to reductions in risk behavior at follow up (r = -.16, p < .001). When tested as two independent effects, client change talk was not significant, but sustain talk was positively associated with worse outcome (r = .19, p < .001). Finally, the relational hypothesis was not supported, but heterogeneity in technical hypothesis path effect sizes was partially explained by inter- and intrapersonal moderators. This meta-analysis provides additional support for the technical hypothesis of MI efficacy; future research on the relational hypothesis should occur in the field rather than in the context of clinical trials. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
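
    Pooled correlations of the kind reported above are conventionally computed on Fisher's z scale with inverse-variance weights. A minimal sketch of that standard calculation follows; the study-level r and n values are illustrative, not the meta-analysis data.

    ```python
    # Inverse variance-weighted pooled correlation via Fisher's z transform.
    # On the z scale a study's sampling variance is 1/(n - 3), so the
    # inverse-variance weight is n - 3. Inputs below are illustrative.
    import numpy as np

    def pooled_correlation(r, n):
        r, n = np.asarray(r, float), np.asarray(n, float)
        z = np.arctanh(r)            # Fisher z transform of each study's r
        w = n - 3.0                  # inverse-variance weights
        return np.tanh(np.sum(w * z) / np.sum(w))  # back to the r scale

    print(pooled_correlation(r=[0.55, 0.40, 0.16], n=[120, 85, 200]))
    ```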

  7. The Role Model Effect on Gender Equity: How are Female College Students Influenced by Female Teaching Assistants in Science?

    NASA Astrophysics Data System (ADS)

    Ebert, Darilyn

    The gender gap of women in science is an important and unresolved issue in higher education and occupational opportunities. The present study was motivated by the fact that there are typically fewer females than males advancing in science, and therefore fewer female science instructor role models. This observation inspired the questions: Are female college students influenced in a positive way by female science teaching assistants (TAs), and if so how can their influence be measured? The study tested the hypothesis that female TAs act as role models for female students and thereby encourage interest and increase overall performance. To test this "role model" hypothesis, the reasoning ability and self-efficacy of a sample of 724 introductory college biology students were assessed at the beginning and end of the Spring 2010 semester. Achievement was measured by exams and course work. Performance of four randomly formed groups was compared: 1) female students with female TAs, 2) male students with female TAs, 3) female students with male TAs, and 4) male students with male TAs. Based on the role model hypothesis, female students with female TAs were predicted to perform better than female students with male TAs. However, group comparisons revealed similar performances across all four groups in achievement, reasoning ability and self-efficacy. The slight differences found between the four groups in student exam and coursework scores were not statistically significant. Therefore, the results did not support the role model hypothesis. Given that both lecture professors in the present study were males, and given that professors typically have more teaching experience, finer skills and knowledge of subject matter than do TAs, a future study that includes both female science professors and female TAs, may be more likely to find support for the hypothesis.

  8. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jha, Sumit Kumar; Pullum, Laura L; Ramanathan, Arvind

    Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing to generate test cases that expose such subtle errors. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.
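
    The statistical half of such a methodology can be as simple as a one-sided binomial test on how often imperceptibly small perturbations flip a detector's decision. In the sketch below, `detects_human` is a hypothetical placeholder detector, not the OpenCV implementation, and the frames are simulated.

    ```python
    # One-sided binomial test of detector robustness under tiny perturbations.
    # H0: the decision-flip rate is at most 1%; a small p-value indicates the
    # detector is fragile. `detects_human` is a placeholder stand-in.
    import numpy as np
    from scipy.stats import binomtest

    def detects_human(frame):            # placeholder detector
        return frame.mean() > 0.5

    rng = np.random.default_rng(1)
    frames = rng.random((500, 64, 64))
    perturbed = frames + rng.normal(scale=1e-3, size=frames.shape)

    failures = sum(detects_human(f) != detects_human(p)
                   for f, p in zip(frames, perturbed))
    result = binomtest(failures, n=len(frames), p=0.01, alternative="greater")
    print(failures, result.pvalue)
    ```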

  9. Migraine patients consistently show abnormal vestibular bedside tests.

    PubMed

    Maranhão, Eliana Teixeira; Maranhão-Filho, Péricles; Luiz, Ronir Raggio; Vincent, Maurice Borges

    2016-01-01

    Migraine and vertigo are common disorders, with lifetime prevalences of 16% and 7% respectively, and co-morbidity of around 3.2%. Vestibular syndromes and dizziness occur more frequently in migraine patients. We investigated bedside clinical signs indicative of vestibular dysfunction in migraineurs, testing the hypothesis that vestibulo-ocular reflex, vestibulo-spinal reflex, and fall risk (FR) responses, as measured by 14 bedside tests, are abnormal in migraineurs without vertigo compared with controls. This cross-sectional study included 60 individuals: 30 migraineurs (25 women, aged 19-60 years) and 30 age- and gender-matched healthy controls. Migraineurs tended to perform worse on almost all tests, although only the tandem Romberg test differed significantly from controls. A combination of four abnormal tests better discriminated the two groups (93.3% specificity). Migraine patients thus consistently showed abnormal vestibular bedside tests when compared with controls.
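
    For a combination rule of this kind, specificity follows directly from the control counts: the reported 93.3% corresponds to 28 of the 30 controls being correctly left unflagged, as the arithmetic below shows (the false-positive count is inferred from the reported figure, not stated in the record).

    ```python
    # Specificity of a combined rule (e.g. ">= 4 abnormal bedside tests"):
    # the fraction of controls correctly left unflagged. With 30 controls,
    # 2 false positives reproduce the reported 93.3% specificity.
    controls_total = 30
    false_positives = 2
    specificity = (controls_total - false_positives) / controls_total
    print(f"specificity = {specificity:.1%}")   # -> 93.3%
    ```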

  10. A comparison of dental ultrasonic technologies on subgingival calculus removal: a pilot study.

    PubMed

    Silva, Lidia Brión; Hodges, Kathleen O; Calley, Kristin Hamman; Seikel, John A

    2012-01-01

    This pilot study compared the clinical endpoints of magnetostrictive and piezoelectric ultrasonic instruments on calculus removal. The null hypothesis stated that there is no statistically significant difference in calculus removal between the 2 instruments. A quasi-experimental pre- and post-test design was used with 18 participants. The magnetostrictive and piezoelectric ultrasonic instruments were used in 2 assigned contra-lateral quadrants on each participant. A data collector, blind to treatment assignment, assessed the calculus on 6 predetermined tooth sites before and after ultrasonic instrumentation. Calculus size was evaluated on a 4-point ordinal scale (0, 1, 2, 3); subjects were required to have a deposit of size 2 or 3 on each of the 6 predetermined sites. One clinician instrumented the pre-assigned quadrants, with a maximum of 20 minutes of instrumentation allowed for each technology. Immediately after instrumentation, the data collector conducted the post-test calculus evaluation. A repeated-measures analysis of variance (ANOVA) was used to analyze the pre- and post-test calculus data (significance level p ≤ 0.05). The null hypothesis was retained, indicating no statistically significant difference in calculus removal between the two technologies; under similar conditions, both removed the same amount of calculus. This research design could serve as a foundation for continued research in this field. Future studies could implement this design with a larger sample size and/or multiple clinicians as data collectors, and could explore deposit removal in periodontal maintenance patients.
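
    A repeated-measures ANOVA for this kind of pre/post, two-device design can be run with statsmodels. The sketch below is a minimal illustration with simulated scores, not the study's data: one mean calculus score per participant, time point, and instrument.

    ```python
    # Repeated-measures ANOVA sketch: 18 participants (as in the study),
    # two within-subject factors (time: pre/post; device). Scores simulated.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(2)
    rows = []
    for pid in range(18):
        for device in ("magnetostrictive", "piezoelectric"):
            pre = rng.uniform(2.0, 3.0)          # baseline deposit (0-3 scale)
            post = max(pre - rng.uniform(1.0, 2.0), 0.0)  # both remove calculus
            rows += [dict(pid=pid, device=device, time="pre", score=pre),
                     dict(pid=pid, device=device, time="post", score=post)]

    df = pd.DataFrame(rows)
    print(AnovaRM(df, depvar="score", subject="pid",
                  within=["time", "device"]).fit())
    ```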

  11. Increased occurrence of dental anomalies associated with infraocclusion of deciduous molars.

    PubMed

    Shalish, Miriam; Peck, Sheldon; Wasserstein, Atalia; Peck, Leena

    2010-05-01

    To test the null hypothesis that there is no relationship between infraocclusion and the occurrence of other dental anomalies in subjects selected for clear-cut infraocclusion of one or more deciduous molars. The experimental sample consisted of 99 orthodontic patients (43 from Boston, Mass, United States; 56 from Jerusalem, Israel) with at least one deciduous molar in infraocclusion greater than 1 mm vertical discrepancy, measured from the mesial marginal ridge of the first permanent molar. Panoramic radiographs and dental casts were used to determine the presence of other dental anomalies, including agenesis of permanent teeth, microdontia of maxillary lateral incisors, palatally displaced canines (PDC), and distal angulation of the mandibular second premolars (MnP2-DA). Comparative prevalence reference values were utilized and statistical testing was performed using the chi-square test (P < .05) and odds ratio. The studied dental anomalies showed two to seven times greater prevalence in the infraocclusion samples, compared with reported prevalence in reference samples. In most cases, the infraoccluded deciduous molar exfoliated eventually and the underlying premolar erupted spontaneously. In some severe phenotypes (10%), the infraoccluded deciduous molar was extracted and space was regained to allow uncomplicated eruption of the associated premolar. Statistically significant associations were observed between the presence of infraocclusion and the occurrence of tooth agenesis, microdontia of maxillary lateral incisors, PDC, and MnP2-DA. These associations support a hypothesis favoring shared causal genetic factors. Clinically, infraocclusion may be considered an early marker for the development of later appearing dental anomalies, such as tooth agenesis and PDC.
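
    The chi-square test and odds ratio used here are standard contingency-table calculations; a minimal sketch with illustrative counts (not the paper's data) follows.

    ```python
    # Chi-square test and odds ratio for anomaly prevalence in an
    # infraocclusion sample vs. a reference sample. Counts are illustrative;
    # the resulting OR falls in the paper's reported 2-7x range.
    import numpy as np
    from scipy.stats import chi2_contingency

    # rows: infraocclusion sample, reference; cols: anomaly present, absent
    table = np.array([[24, 75],
                      [ 6, 93]])
    chi2, p, dof, expected = chi2_contingency(table)
    odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}, OR = {odds_ratio:.1f}")
    ```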

  12. Classification of HCV and HIV-1 Sequences with the Branching Index

    PubMed Central

    Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas

    2009-01-01

    Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
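
    An equal-error threshold like the reported 0.71 (HCV) or 0.66 (HIV-1) can be found by scanning candidate cutoffs over BI values for known positive and negative pairs. The sketch below simulates those two distributions rather than using real BI output.

    ```python
    # Find the cutoff where false positive and false negative rates are equal,
    # given BI scores for known same-subtype (positive) and different-subtype
    # (negative) pairs. The beta distributions are simulated stand-ins.
    import numpy as np

    rng = np.random.default_rng(9)
    bi_pos = rng.beta(8, 2, size=10_000)   # BI for true subtype pairs
    bi_neg = rng.beta(2, 8, size=10_000)   # BI for non-subtype pairs

    thresholds = np.linspace(0, 1, 501)
    fnr = np.array([(bi_pos < t).mean() for t in thresholds])
    fpr = np.array([(bi_neg >= t).mean() for t in thresholds])
    t_eq = thresholds[np.argmin(np.abs(fpr - fnr))]
    print(f"equal-error threshold ~= {t_eq:.2f}")
    ```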

  13. Flaxseed oil supplementation manipulates correlations between serum individual mol % free fatty acid levels and insulin resistance in type 2 diabetics. Insulin resistance and percent remaining pancreatic β-cell function are unaffected.

    PubMed

    Barre, D E; Mizier-Barre, K A; Griscti, O; Hafez, K

    2016-10-01

    Elevated total serum free fatty acid (FFA) concentrations have been suggested, controversially, to enhance insulin resistance and decrease percent remaining β-cell function. However, concentrations of individual serum FFAs have never been published in terms of their relationship (correlation) to homeostatic model assessment-insulin resistance (HOMA-IR) and percent remaining β-cell function (HOMA-%β) in type 2 diabetics (T2Ds). Alpha-linolenic acid consumption is negatively correlated with insulin resistance, which in turn is negatively correlated with remaining β-cell function. The primary objective was to test the hypothesis that there would be different relationships (correlations) between serum individual free FFA mol % levels and HOMA-IR and/or HOMA-%β in T2Ds. The secondary objective was to test the hypothesis that flaxseed oil, previously shown to be ineffective in the glycemic control of T2Ds, may alter these correlations in a statistically significant manner, as well as HOMA-IR and/or HOMA-%β themselves. Patients were recruited via a newspaper advertisement, and two physicians were employed. All patients attended visit one and returned three months later for a second visit. At the second visit, the subjects were randomly assigned (double blind) to flaxseed or safflower oil treatment for three months, until the third visit. Statistically significant correlations, or trends toward them, were found between some serum individual free FFA mol % levels and HOMA-IR and HOMA-%β, both pre- and post-supplementation with flaxseed or safflower oil. However, flaxseed oil had no impact on HOMA-IR or HOMA-%β, despite statistically significant alterations in correlations with baseline HOMA-IR. These data indicate that high doses of flaxseed oil have no statistically significant effect on HOMA-IR or HOMA-%β in T2Ds, probably due to the additive effects of negative and positive correlations.
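
    HOMA-IR and HOMA-%β in studies like this one are conventionally computed with the Matthews et al. formulas. The sketch below pairs those formulas with the kind of FFA correlation the abstract describes; all values are simulated.

    ```python
    # Standard HOMA formulas (fasting glucose in mmol/L, fasting insulin in
    # microU/mL), plus an illustrative Pearson correlation between one
    # FFA's mol % and HOMA-IR. All data below are simulated.
    import numpy as np
    from scipy.stats import pearsonr

    def homa_ir(glucose, insulin):
        return glucose * insulin / 22.5

    def homa_beta(glucose, insulin):   # percent remaining beta-cell function
        return 20.0 * insulin / (glucose - 3.5)

    rng = np.random.default_rng(3)
    ffa_mol_pct = rng.uniform(0.5, 3.0, size=30)
    hi = homa_ir(rng.uniform(5, 10, 30), rng.uniform(5, 25, 30))
    r, p = pearsonr(ffa_mol_pct, hi)
    print(f"r = {r:.2f}, p = {p:.3f}")
    ```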

  14. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups on skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. A simulation study focusing on randomized parallel-group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 (CD4) data from an acquired immune deficiency syndrome (AIDS) clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.
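
    The core idea can be sketched in a few lines: estimate the Box-Cox λ by maximum likelihood, compare group means on the transformed scale, and back-transform those means to the original scale. Because the transform is monotone and the transformed data are approximately normal, the back-transformed mean estimates the original-scale median. This is a minimal sketch of that logic with simulated data, not the paper's package.

    ```python
    # Box-Cox-based estimate of the median difference between two groups.
    # boxcox() returns the MLE of lambda; inv_boxcox back-transforms.
    import numpy as np
    from scipy.stats import boxcox
    from scipy.special import inv_boxcox

    rng = np.random.default_rng(4)
    control = rng.lognormal(mean=5.5, sigma=0.8, size=100)  # skewed outcome
    treated = rng.lognormal(mean=5.9, sigma=0.8, size=100)

    _, lam = boxcox(np.concatenate([control, treated]))     # common lambda
    z_control = np.log(control) if lam == 0 else (control**lam - 1) / lam
    z_treated = np.log(treated) if lam == 0 else (treated**lam - 1) / lam

    median_diff = (inv_boxcox(z_treated.mean(), lam)
                   - inv_boxcox(z_control.mean(), lam))
    print(f"lambda = {lam:.2f}, estimated median difference = {median_diff:.1f}")
    ```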

  15. Accelerated aging effects on surface hardness and roughness of lingual retainer adhesives.

    PubMed

    Ramoglu, Sabri Ilhan; Usumez, Serdar; Buyukyilmaz, Tamer

    2008-01-01

    To test the null hypothesis that accelerated aging has no effect on the surface microhardness and roughness of two light-cured lingual retainer adhesives. Ten samples each of two light-cured materials, Transbond Lingual Retainer (3M Unitek) and Light Cure Retainer (Reliance), were cured with a halogen light for 40 seconds. Vickers hardness and surface roughness were measured before and after accelerated aging for 300 hours in a weathering tester. Differences between mean values were analyzed for statistical significance using a t-test, with the level of statistical significance set at P < .05. The mean Vickers hardness of Transbond Lingual Retainer was 62.8 ± 3.5 before and 79.6 ± 4.9 after aging; that of Light Cure Retainer was 40.3 ± 2.6 before and 58.3 ± 4.3 after aging. The differences in both groups were statistically significant (P < .001). Following aging, mean surface roughness changed from 0.039 μm to 0.121 μm for Transbond Lingual Retainer and from 0.021 μm to 0.031 μm for Light Cure Retainer. The roughening of Transbond Lingual Retainer with aging was statistically significant (P < .05), while the change in the surface roughness of Light Cure Retainer was not (P > .05). Accelerated aging thus significantly increased the surface microhardness of both light-cured retainer adhesives tested, and significantly increased the surface roughness of the Transbond Lingual Retainer.
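
    The before/after comparison reported here amounts to a paired t-test. The sketch below simulates hardness values matching the reported Transbond means and standard deviations rather than using the study's measurements.

    ```python
    # Paired t-test on simulated Vickers hardness values, matching the
    # reported Transbond figures (62.8 +/- 3.5 pre, 79.6 +/- 4.9 post),
    # with 10 samples per material as in the study.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(5)
    n = 10
    pre = rng.normal(62.8, 3.5, size=n)
    post = rng.normal(79.6, 4.9, size=n)

    t, p = ttest_rel(pre, post)   # a large pre/post shift yields a small p
    print(f"t = {t:.2f}, p = {p:.4f}")
    ```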

  16. Temperature, Not Fine Particulate Matter (PM2.5), is Causally Associated with Short-Term Acute Daily Mortality Rates: Results from One Hundred United States Cities

    PubMed Central

    Cox, Tony; Popken, Douglas; Ricci, Paolo F

    2013-01-01

    Exposures to fine particulate matter (PM2.5) in air (C) have been suspected of contributing causally to increased acute (e.g., same-day or next-day) human mortality rates (R). We tested this causal hypothesis in 100 United States cities using the publicly available NMMAPS database. Although a significant, approximately linear, statistical C-R association exists in simple statistical models, closer analysis suggests that it is not causal. Surprisingly, conditioning on other variables that have been extensively considered in previous analyses (usually using splines or other smoothers to approximate their effects), such as month of the year and mean daily temperature, suggests that they create strong, nonlinear confounding that explains the statistical association between PM2.5 and mortality rates in this data set. As this finding disagrees with conventional wisdom, we applied several different techniques to examine it: conditional independence tests for potential causation, non-parametric classification tree analysis, Bayesian Model Averaging (BMA), and Granger-Sims causality testing all show no evidence that PM2.5 concentrations have any causal impact on increasing mortality rates. This apparent absence of a causal C-R relation, despite the statistical association, has potentially important implications for managing and communicating the uncertain health risks associated with, but not necessarily caused by, PM2.5 exposures. PMID:23983662
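
    Granger-Sims causality testing of the kind used here is available in statsmodels. The sketch below builds seasonally confounded series (temperature driving both PM2.5 and mortality, echoing the paper's confounding argument) and tests whether lagged PM2.5 adds predictive power for mortality; all series are simulated, not NMMAPS data.

    ```python
    # Granger causality sketch: does lagged PM2.5 improve prediction of
    # mortality beyond mortality's own past? Series are simulated so that
    # temperature (seasonality) drives both, creating a spurious association.
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(6)
    days = np.arange(730)
    temperature = 15 + 10 * np.sin(2 * np.pi * days / 365)
    pm25 = 10 + 0.3 * temperature + rng.normal(0, 2, days.size)
    mortality = 50 - 0.8 * temperature + rng.normal(0, 3, days.size)

    # Column order is [effect, candidate cause]; tests lags 1..3
    res = grangercausalitytests(np.column_stack([mortality, pm25]), maxlag=3)
    print("lag-1 ssr F-test p-value:", res[1][0]["ssr_ftest"][1])
    ```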

  17. Seeking health information on the web: positive hypothesis testing.

    PubMed

    Kayhan, Varol Onur

    2013-04-01

    The goal of this study is to investigate positive hypothesis testing among consumers of health information when they search the Web. After demonstrating the extent of positive hypothesis testing in Experiment 1, we conducted Experiment 2 to test the effectiveness of two debiasing techniques. A total of 60 undergraduate students searched a tightly controlled online database developed by the authors to test the validity of a hypothesis. The database had four abstracts that confirmed the hypothesis and three abstracts that disconfirmed it. Findings of Experiment 1 showed that the majority of participants (85%) exhibited positive hypothesis testing. In Experiment 2, we found that the recommendation technique was not effective in reducing positive hypothesis testing, since none of the participants assigned to this server could retrieve disconfirming evidence. Experiment 2 also showed that the incorporation technique successfully reduced positive hypothesis testing, since 75% of the participants could retrieve disconfirming evidence. Positive hypothesis testing on the Web is an understudied topic. More studies are needed to validate the effectiveness of the debiasing techniques discussed in this study and to develop new techniques. Search engine developers should consider developing new options so that both confirming and disconfirming evidence can be presented in search results as users test hypotheses using search engines. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  18. Trait humor and longevity: do comics have the last laugh?

    PubMed

    Rotton, J

    1992-01-01

    Four sets of biographical data were analyzed in order to test the hypothesis that the ability to generate humor is associated with longevity. Although steps were taken to ensure that tests had high levels of statistical power, analyses provided very little support for the idea that individuals with a well-developed sense of humor live longer than serious writers and other entertainers. In addition, a subsidiary analysis revealed that those in the business of entertaining others died at an earlier age than those in other lines of endeavor. These findings suggest that researchers should turn their attention from trait humor to the effects of humorous material.

  19. Two Methods of Automatic Evaluation of Speech Signal Enhancement Recorded in the Open-Air MRI Environment

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Frollo, Ivan

    2017-12-01

    The paper focuses on two methods for evaluating the success of speech signal enhancement for recordings made in an open-air magnetic resonance imager during phonation for 3D human vocal tract modeling. The first approach enables a comparison based on statistical analysis using ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The performed experiments confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of speech quality are functional and produce results fully comparable with the standard evaluation based on the listening test method.
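
    The GMM classification method can be sketched with scikit-learn: fit one mixture per condition and assign held-out recordings to the model with the higher log-likelihood. The features below are simulated stand-ins for spectral features such as MFCCs, not the paper's data.

    ```python
    # GMM-based evaluation sketch: one mixture per class ("clean" vs.
    # "noise-corrupted" speech features); classify test recordings by
    # which model assigns the higher log-likelihood. Data are simulated.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    clean = rng.normal(0.0, 1.0, size=(300, 13))
    noisy = rng.normal(0.8, 1.2, size=(300, 13))

    gmm_clean = GaussianMixture(n_components=4, random_state=0).fit(clean)
    gmm_noisy = GaussianMixture(n_components=4, random_state=0).fit(noisy)

    test = rng.normal(0.0, 1.0, size=(50, 13))      # new "clean" recordings
    scores = np.column_stack([gmm_clean.score_samples(test),
                              gmm_noisy.score_samples(test)])
    frac_clean = np.mean(scores.argmax(axis=1) == 0)
    print(f"fraction classified as clean: {frac_clean:.2f}")
    ```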

  20. Computer-aided auditing of prescription drug claims.

    PubMed

    Iyengar, Vijay S; Hermiz, Keith B; Natarajan, Ramesh

    2014-09-01

    We describe a methodology for identifying and ranking candidate audit targets from a database of prescription drug claims. The relevant audit targets may include various entities, such as prescribers, patients, and pharmacies, that exhibit statistical behavior indicative of potential fraud and abuse in prescription claims during a specified period of interest. Our overall approach is consistent with related work on statistical methods for the detection of fraud and abuse, but places relative emphasis on three specific aspects: first, based on the assessment of domain experts, certain focus areas are selected and the data elements pertinent to the audit analysis in each focus area are identified; second, specialized statistical models are developed to characterize the normalized baseline behavior in each focus area; and third, statistical hypothesis testing is used to identify entities that diverge significantly from their expected behavior under the relevant baseline model. The application of this overall methodology to a prescription claims database from a large health plan is considered in detail.
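
    The flag-and-rank step of such a pipeline can be sketched as a per-entity hypothesis test against a baseline model: entities whose observed counts are improbably high under the baseline are ranked as audit candidates. All rates and counts below are illustrative, not drawn from any claims database.

    ```python
    # Per-prescriber Poisson test against a baseline model: rank prescribers
    # by the one-sided tail probability of their observed event count
    # (e.g. scripts in a chosen focus area). Rates and counts are simulated.
    import numpy as np
    from scipy.stats import poisson

    rng = np.random.default_rng(8)
    n_prescribers = 1000
    volume = rng.integers(50, 500, size=n_prescribers)  # total claims written
    baseline_rate = 0.02                                # expected event fraction
    expected = volume * baseline_rate
    observed = rng.poisson(expected)
    observed[:3] = (expected[:3] * 6).astype(int)       # plant a few outliers

    # p-value of a count this large or larger under the baseline model
    pvals = poisson.sf(observed - 1, expected)
    ranked = np.argsort(pvals)[:5]
    print("top audit candidates:", ranked, pvals[ranked])
    ```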
