Sample records for "albeit statistically significant"

  1. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754

  2. Statistical Significance Testing from Three Perspectives and Interpreting Statistical Significance and Nonsignificance and the Role of Statistics in Research.

    ERIC Educational Resources Information Center

    Levin, Joel R.; And Others

    1993-01-01

    Journal editors respond to criticisms of reliance on statistical significance in research reporting. Joel R. Levin ("Journal of Educational Psychology") defends its use, whereas William D. Schafer ("Measurement and Evaluation in Counseling and Development") emphasizes the distinction between statistically significant and important. William Asher…

  3. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster is determined by the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate
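
    A hedged sketch of the area-for-significance idea in Python (the function name and the Poisson-count null are my assumptions, not the paper's formulae): for a fixed observed rate, find the smallest at-risk population (proportional to area on a population cartogram) at which the excess over a null rate becomes significant.

    ```python
    from scipy.stats import poisson

    def min_population_for_significance(rate, null_rate, alpha=0.05, step=100):
        """Smallest at-risk population (proportional to cartogram area) at
        which an observed rate exceeds the null rate with one-sided
        P < alpha, assuming case counts are Poisson distributed."""
        pop = step
        while pop < 10_000_000:
            cases = int(rate * pop)
            # P(X >= cases) under the null expectation null_rate * pop
            if poisson.sf(cases - 1, null_rate * pop) < alpha:
                return pop
            pop += step
        return None

    # Example: a district with twice the background leukemia rate
    print(min_population_for_significance(rate=2e-4, null_rate=1e-4))
    ```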

  4. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
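
    The permutation step described above is easy to sketch. Below is a minimal Python illustration of the max-statistic permutation estimate of overall significance; the conditioning on histogram spread that the authors propose is only noted in a comment, and the function name `max_stat_pvalue` is mine.

    ```python
    import numpy as np

    def max_stat_pvalue(x, group, n_perm=2000, seed=0):
        """Permutation P-value for the global null, based on the largest
        two-sample |z| across all features. x: (n_samples, n_features)
        array; group: boolean array of length n_samples."""
        rng = np.random.default_rng(seed)

        def max_abs_z(g):
            a, b = x[g], x[~g]
            se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
            return np.abs((a.mean(0) - b.mean(0)) / se).max()

        observed = max_abs_z(group)
        null = np.array([max_abs_z(rng.permutation(group)) for _ in range(n_perm)])
        # Shi, Levinson and Whittemore propose additionally conditioning
        # this estimate on the spread of the observed statistic histogram
        # when the tests are correlated; that refinement is omitted here.
        return (1 + np.sum(null >= observed)) / (1 + n_perm)
    ```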

  5. Statistical Significance vs. Practical Significance: An Exploration through Health Education

    ERIC Educational Resources Information Center

    Rosen, Brittany L.; DeMaria, Andrea L.

    2012-01-01

    The purpose of this paper is to examine the differences between statistical and practical significance, including strengths and criticisms of both methods, as well as provide information surrounding the application of various effect sizes and confidence intervals within health education research. Provided are recommendations, explanations and…

  6. Health significance and statistical uncertainty. The value of P-value.

    PubMed

    Consonni, Dario; Bertazzi, Pier Alberto

    2017-10-27

    The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" (defined as "statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one. We aim to show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors. We provide examples of distorted use of the P-value and of the negative consequences for science and public health of such a black-and-white vision. The rigid interpretation of the P-value as a dichotomy favors confusion between health relevance and statistical significance, discourages thoughtful thinking, and diverts attention from what really matters, the health significance. A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios or risk differences) and their confidence intervals (CI), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not usually consider the whole interval of the CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistical significance or not). In reporting statistical results of scientific research, present effect estimates with their confidence intervals and do not qualify the P-value as "significant" or "not significant".
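
    To illustrate the recommended reporting style, here is a minimal Python sketch (function name and the standard Wald-interval choice are mine) that returns a risk ratio with its 95% confidence interval instead of a significant/not-significant verdict.

    ```python
    import numpy as np
    from scipy.stats import norm

    def risk_ratio_ci(a, n1, b, n0, level=0.95):
        """Risk ratio of exposed (a/n1) vs unexposed (b/n0) with a Wald
        confidence interval computed on the log scale."""
        rr = (a / n1) / (b / n0)
        se = np.sqrt(1/a - 1/n1 + 1/b - 1/n0)   # SE of log(RR)
        z = norm.ppf(0.5 + level / 2)
        lo, hi = np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)
        return rr, (lo, hi)

    # 30/1000 exposed cases vs 18/1000 unexposed cases
    rr, (lo, hi) = risk_ratio_ci(30, 1000, 18, 1000)
    print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
    ```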

  7. The questioned p value: clinical, practical and statistical significance.

    PubMed

    Jiménez-Paneque, Rosa

    2016-09-09

    The use of p-value and statistical significance have been questioned since the early 80s in the last century until today. Much has been discussed about it in the field of statistics and its applications, especially in Epidemiology and Public Health. As a matter of fact, the p-value and its equivalent, statistical significance, are difficult concepts to grasp for the many health professionals some way involved in research applied to their work areas. However, its meaning should be clear in intuitive terms although it is based on theoretical concepts of the field of Statistics. This paper attempts to present the p-value as a concept that applies to everyday life and therefore intuitively simple but whose proper use cannot be separated from theoretical and methodological elements of inherent complexity. The reasons behind the criticism received by the p-value and its isolated use are intuitively explained, mainly the need to demarcate statistical significance from clinical significance and some of the recommended remedies for these problems are approached as well. It finally refers to the current trend to vindicate the p-value appealing to the convenience of its use in certain situations and the recent statement of the American Statistical Association in this regard.

  8. Tests of Statistical Significance Made Sound

    ERIC Educational Resources Information Center

    Haig, Brian D.

    2017-01-01

    This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis…

  9. Statistically significant relational data mining :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  10. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of

  11. The Use of Meta-Analytic Statistical Significance Testing

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  12. Common pitfalls in statistical analysis: “P” values, statistical significance and confidence intervals

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the ‘P’ value, explain the importance of ‘confidence intervals’ and clarify the importance of including both values in a paper. PMID:25878958

  13. Finding Statistically Significant Communities in Networks

    PubMed Central

    Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo

    2011-01-01

    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480
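
    OSLOM's actual score rests on extreme and order statistics; the toy Python sketch below (hypothetical function, with a hypergeometric null of my choosing) only conveys the underlying idea of scoring a node's attachment to a community against random fluctuations.

    ```python
    from scipy.stats import hypergeom

    def attachment_pvalue(k_in, degree, community_size, n_nodes):
        """Toy significance of a node's attachment to a community: the
        probability that, wiring `degree` links at random, at least k_in
        of them land on the community's (community_size - 1) other members
        out of n_nodes - 1 possible endpoints. OSLOM's real score is more
        refined; this only conveys the flavor."""
        return hypergeom.sf(k_in - 1, n_nodes - 1, community_size - 1, degree)

    # A node with 10 links, 6 of which fall inside a 20-node community
    # of a 1000-node network:
    print(attachment_pvalue(k_in=6, degree=10, community_size=20, n_nodes=1000))
    ```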

  14. Statistical significance of combinatorial regulations

    PubMed Central

    Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji; Sese, Jun

    2013-01-01

    More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit, whereas the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may help uncover pathways regulated in a coordinated fashion and reveal hidden associations in heterogeneous data. PMID:23882073

  15. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to the reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  16. Testing statistical isotropy in cosmic microwave background polarization maps

    NASA Astrophysics Data System (ADS)

    Rath, Pranati K.; Samal, Pramoda Kumar; Panda, Srikanta; Mishra, Debesh D.; Aluri, Pavan K.

    2018-04-01

    We apply our symmetry-based power tensor technique to test the conformity of PLANCK polarization maps with statistical isotropy. On a wide range of angular scales (l = 40 - 150), our preliminary analysis detects many statistically anisotropic multipoles in the foreground-cleaned full-sky PLANCK polarization maps, viz., COMMANDER and NILC. We also study the effect of residual foregrounds that may still be present in the Galactic plane, using both the common UPB77 polarization mask and the individual component-separation-method-specific polarization masks. However, some of the statistically anisotropic modes still persist, albeit significantly, in the NILC map. We further probed the data for any coherent alignments across multipoles in several bins from the chosen multipole range.

  17. Determining the Statistical Significance of Relative Weights

    ERIC Educational Resources Information Center

    Tonidandel, Scott; LeBreton, James M.; Johnson, Jeff W.

    2009-01-01

    Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson…

  18. How many spectral lines are statistically significant?

    NASA Astrophysics Data System (ADS)

    Freund, J.

    When experimental line spectra are fitted with least squares techniques one frequently does not know whether n or n + 1 lines may be fitted safely. This paper shows how an F-test can be applied in order to determine the statistical significance of including an extra line into the fitting routine.
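
    The nested-model F-test the paper applies can be sketched as follows (Python; the function name and example numbers are illustrative), comparing the residual sums of squares of the n-line and (n+1)-line fits.

    ```python
    from scipy.stats import f

    def extra_line_ftest(rss_n, p_n, rss_n1, p_n1, n_points):
        """F-test for whether adding one more spectral line (p_n1 > p_n
        parameters, residual sum of squares rss_n1 < rss_n) significantly
        improves a least-squares fit to n_points data points."""
        df1 = p_n1 - p_n
        df2 = n_points - p_n1
        F = ((rss_n - rss_n1) / df1) / (rss_n1 / df2)
        return F, f.sf(F, df1, df2)

    # n-line fit: RSS 4.20 with 9 parameters; (n+1)-line fit: RSS 3.10
    # with 12 parameters; 200 data points
    F, p = extra_line_ftest(4.20, 9, 3.10, 12, 200)
    print(f"F = {F:.2f}, p = {p:.4f}")
    ```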

  19. Statistical significance test for transition matrices of atmospheric Markov chains

    NASA Technical Reports Server (NTRS)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
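
    A minimal Python sketch of the Monte Carlo idea (function name mine; the null shuffles the regime sequence, preserving state frequencies while destroying temporal order):

    ```python
    import numpy as np

    def transition_significance(seq, n_states, n_sim=5000, seed=0):
        """Monte Carlo P-values for each transition count of a regime
        sequence, under the null of no serial dependence (random ordering
        of the same states). Returns the observed counts plus, per cell,
        the probability of a count at least as high (likely transitions)
        and at least as low (unlikely transitions)."""
        rng = np.random.default_rng(seed)

        def counts(s):
            c = np.zeros((n_states, n_states), dtype=int)
            np.add.at(c, (s[:-1], s[1:]), 1)
            return c

        obs = counts(seq)
        null = np.array([counts(rng.permutation(seq)) for _ in range(n_sim)])
        return obs, (null >= obs).mean(axis=0), (null <= obs).mean(axis=0)

    seq = np.array([0, 0, 1, 2, 2, 2, 1, 0, 0, 1, 2, 2, 0, 0, 1] * 20)
    obs, p_high, p_low = transition_significance(seq, n_states=3)
    ```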

  20. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…
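
    One higher-power alternative commonly recommended in this simulation literature is a bootstrap confidence interval for the indirect effect a*b. The Python sketch below (simulated data, hypothetical function names) shows the mechanics, without claiming to reproduce the article's own procedure.

    ```python
    import numpy as np

    def indirect_effect(x, m, y):
        """a*b indirect effect from two least-squares fits: m ~ x and y ~ x + m."""
        a = np.polyfit(x, m, 1)[0]
        X = np.column_stack([np.ones_like(x), x, m])
        b = np.linalg.lstsq(X, y, rcond=None)[0][2]
        return a * b

    def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
        """Percentile bootstrap CI for the indirect effect."""
        rng = np.random.default_rng(seed)
        n = len(x)
        est = [indirect_effect(x[i], m[i], y[i])
               for i in rng.integers(0, n, (n_boot, n))]
        return np.percentile(est, [2.5, 97.5])

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    m = 0.5 * x + rng.normal(size=200)
    y = 0.4 * m + rng.normal(size=200)
    print(bootstrap_ci(x, m, y))  # an interval excluding 0 suggests mediation
    ```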

  1. Assessment of statistical significance and clinical relevance.

    PubMed

    Kieser, Meinhard; Friede, Tim; Gondan, Matthias

    2013-05-10

    In drug development, it is well accepted that a successful study will demonstrate not only a statistically significant result but also a clinically relevant effect size. Whereas standard hypothesis tests are used to demonstrate the former, it is less clear how the latter should be established. In the first part of this paper, we consider the responder analysis approach and study the performance of locally optimal rank tests when the outcome distribution is a mixture of responder and non-responder distributions. We find that these tests are quite sensitive to their planning assumptions and therefore offer no real advantage over standard tests such as the t-test and the Wilcoxon-Mann-Whitney test, which perform well overall and can be recommended for applications. In the second part, we present a new approach to the assessment of clinical relevance based on the so-called relative effect (or probabilistic index) and derive appropriate sample size formulae for the design of studies aiming to demonstrate both a statistically significant and clinically relevant effect. Referring to recent studies in multiple sclerosis, we discuss potential issues in the application of this approach. Copyright © 2012 John Wiley & Sons, Ltd.
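
    The relative effect (probabilistic index) itself is straightforward to estimate from two samples via the Mann-Whitney U statistic, as in this Python sketch (function name mine; assumes a modern SciPy where `mannwhitneyu` returns U for its first argument):

    ```python
    import numpy as np
    from scipy.stats import mannwhitneyu

    def relative_effect(control, treatment):
        """Estimate P(X < Y) + 0.5 * P(X = Y): the probability that a
        randomly chosen treated subject scores higher than a randomly
        chosen control subject. 0.5 means no effect."""
        u = mannwhitneyu(treatment, control, alternative="two-sided").statistic
        return u / (len(control) * len(treatment))

    control = np.array([3.1, 2.4, 2.8, 3.5, 2.9])
    treatment = np.array([3.6, 3.3, 4.0, 2.7, 3.8])
    print(relative_effect(control, treatment))
    ```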

  2. Increasing the statistical significance of entanglement detection in experiments.

    PubMed

    Jungnitsch, Bastian; Niekamp, Sönke; Kleinmann, Matthias; Gühne, Otfried; Lu, He; Gao, Wei-Bo; Chen, Yu-Ao; Chen, Zeng-Bing; Pan, Jian-Wei

    2010-05-28

    Entanglement is often verified by a violation of an inequality like a Bell inequality or an entanglement witness. Considerable effort has been devoted to the optimization of such inequalities in order to obtain a high violation. We demonstrate theoretically and experimentally that such an optimization does not necessarily lead to a better entanglement test, if the statistical error is taken into account. Theoretically, we show for different error models that reducing the violation of an inequality can improve the significance. Experimentally, we observe this phenomenon in a four-photon experiment, testing the Mermin and Ardehali inequality for different levels of noise. Furthermore, we provide a way to develop entanglement tests with high statistical significance.

  3. Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

    ERIC Educational Resources Information Center

    Gwet, Kilem L.

    2016-01-01

    This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…

  4. Statistical significance of trace evidence matches using independent physicochemical measurements

    NASA Astrophysics Data System (ADS)

    Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George

    1997-02-01

    A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to' etc. By performing physical and chemical measurements with are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when that object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created and the data in question can be evaluated given these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals: Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons also is discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach and ways of overcoming these obstacles are presented.

  5. Estimation of the geochemical threshold and its statistical significance

    USGS Publications Warehouse

    Miesch, A.T.

    1981-01-01

    A statistic is proposed for estimating the geochemical threshold and its statistical significance, or it may be used to identify a group of extreme values that can be tested for significance by other means. The statistic is the maximum gap between adjacent values in an ordered array after each gap has been adjusted for the expected frequency. The values in the ordered array are geochemical values transformed by either ln(?? - ??) or ln(?? - ??) and then standardized so that the mean is zero and the variance is unity. The expected frequency is taken from a fitted normal curve with unit area. The midpoint of an adjusted gap that exceeds the corresponding critical value may be taken as an estimate of the geochemical threshold, and the associated probability indicates the likelihood that the threshold separates two geochemical populations. The adjusted gap test may fail to identify threshold values if the variation tends to be continuous from background values to the higher values that reflect mineralized ground. However, the test will serve to identify other anomalies that may be too subtle to have been noted by other means. © 1981.
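
    The following Python sketch is one plausible reading of the adjusted-gap statistic, and explicitly an assumption rather than the paper's exact recipe: weight each gap between adjacent standardized values by the expected frequency under a fitted unit normal, so that wide gaps in the sparse tails are not overweighted.

    ```python
    import numpy as np
    from scipy.stats import norm

    def largest_adjusted_gap(values):
        """One reading (an assumption, not Miesch's exact recipe) of the
        adjusted-gap statistic: standardize the already log-transformed
        values, then weight each gap between adjacent ordered values by
        the expected frequency under a fitted unit normal."""
        z = np.sort((values - values.mean()) / values.std(ddof=1))
        gaps = np.diff(z)
        mid = (z[:-1] + z[1:]) / 2
        adj = gaps * norm.pdf(mid) * len(z)   # gap x expected frequency
        i = np.argmax(adj)
        return adj[i], mid[i]   # largest adjusted gap, candidate threshold

    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 0.3, 95)    # ln of background concentrations
    mineralized = rng.normal(1.5, 0.2, 5)    # ln of anomalous concentrations
    print(largest_adjusted_gap(np.concatenate([background, mineralized])))
    ```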

  6. Decadal power in land air temperatures: Is it statistically significant?

    NASA Astrophysics Data System (ADS)

    Thejll, Peter A.

    2001-12-01

    The geographical distribution and properties of the well-known 10-11 year signal in terrestrial temperature records are investigated. By analyzing the Global Historical Climate Network data for surface air temperatures we verify that the signal is strongest in North America and is similar in nature to that reported earlier by R. G. Currie. The decadal signal is statistically significant for individual stations, but it is not possible to show that the signal is statistically significant globally, using strict tests. In North America, during the twentieth century, the decadal variability in the solar activity cycle is associated with the decadal part of the North Atlantic Oscillation index series in such a way that both of these signals correspond to the same spatial pattern of cooling and warming. A method for testing statistical results with Monte Carlo trials on data fields with specified temporal structure and specific spatial correlation retained is presented.
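
    A single-station version of the Monte Carlo test can be sketched in Python (function name mine; an AR(1) red-noise null fitted to the lag-1 autocorrelation; the paper's field-level test additionally preserves spatial correlation):

    ```python
    import numpy as np

    def decadal_peak_pvalue(series, n_sim=1000, seed=0):
        """Monte Carlo P-value for power in the 9-12 yr band of an annual
        series, against an AR(1) red-noise null fitted to the series."""
        rng = np.random.default_rng(seed)
        x = series - series.mean()
        n = len(x)
        freqs = np.fft.rfftfreq(n, d=1.0)            # cycles per year
        band = (freqs > 1 / 12) & (freqs < 1 / 9)
        obs = (np.abs(np.fft.rfft(x)) ** 2)[band].sum()

        r = np.corrcoef(x[:-1], x[1:])[0, 1]         # lag-1 autocorrelation
        sigma = x.std(ddof=1) * np.sqrt(1 - r ** 2)  # innovation SD
        count = 0
        for _ in range(n_sim):
            e = rng.normal(0, sigma, n)
            y = np.empty(n)
            y[0] = e[0]
            for t in range(1, n):
                y[t] = r * y[t - 1] + e[t]
            count += (np.abs(np.fft.rfft(y - y.mean())) ** 2)[band].sum() >= obs
        return (count + 1) / (n_sim + 1)

    t = np.arange(120)
    series = 0.8 * np.sin(2 * np.pi * t / 10.5) \
             + np.random.default_rng(1).normal(0, 1, 120)
    print(decadal_peak_pvalue(series))
    ```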

  7. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
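
    A toy version of one step of such a test, in Python (function names mine; a Gaussian null and a 2-means cluster index, in the spirit of the Monte Carlo approach but much simpler than the paper's sequential procedure):

    ```python
    import numpy as np
    from scipy.cluster.vq import kmeans2

    def cluster_index(x, seed=0):
        """Within-cluster sum of squares of a 2-means split, divided by
        the total sum of squares; smaller means stronger 2-cluster
        structure."""
        centroids, labels = kmeans2(x, 2, minit="++", seed=seed)
        wss = sum(((x[labels == k] - centroids[k]) ** 2).sum() for k in (0, 1))
        return wss / ((x - x.mean(0)) ** 2).sum()

    def two_cluster_pvalue(x, n_sim=200, seed=0):
        """Monte Carlo test of 'one Gaussian cluster' vs 'two clusters'
        at a single split of the dendrogram."""
        rng = np.random.default_rng(seed)
        obs = cluster_index(x)
        mu, cov = x.mean(0), np.cov(x.T)
        null = [cluster_index(rng.multivariate_normal(mu, cov, len(x)))
                for _ in range(n_sim)]
        return (1 + sum(ci <= obs for ci in null)) / (1 + n_sim)

    rng = np.random.default_rng(2)
    x = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
    print(two_cluster_pvalue(x))
    ```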

  8. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2016-04-26

    Final report (15 Oct 2014 to 14 Jan 2015). This work extends Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex networks. Contributions include: (1) motivating, a priori, the need for triangle motif-based clustering; (2) an algorithm for clustering undirected networks, where the triangle configuration was...

  9. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance.

    PubMed

    Kramer, Karen L; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children's growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children's monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children's growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children's growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children's growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance.

  10. Your Chi-Square Test Is Statistically Significant: Now What?

    ERIC Educational Resources Information Center

    Sharpe, Donald

    2015-01-01

    Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
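
    Of the four approaches, calculating residuals is the simplest to demonstrate. The Python sketch below (illustrative data) computes adjusted standardized residuals, which behave roughly like z-scores under the null, so cells with |residual| > 2 point to the source of a significant chi-square.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[25, 50, 25],
                      [40, 30, 10]])
    chi2, p, dof, expected = chi2_contingency(table)

    n = table.sum()
    row = table.sum(1, keepdims=True) / n
    col = table.sum(0, keepdims=True) / n
    # Adjusted standardized residuals: approximately N(0,1) under the
    # null, so |residual| > 2 flags the cells driving the omnibus result.
    adj_res = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
    print(np.round(adj_res, 2))
    ```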

  11. Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform

    ERIC Educational Resources Information Center

    Norris, John M.

    2015-01-01

    Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…

  12. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance

    PubMed Central

    Kramer, Karen L.; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children’s growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children’s monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children’s growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children’s growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children’s growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance. PMID:26938742

  13. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T2 test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
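
    The quantities the paper advocates are straightforward to compute. Here is a Python sketch (function name mine) of the Mahalanobis distance between group centroids and the two-sample Hotelling's T2 converted to an F-test.

    ```python
    import numpy as np
    from scipy.stats import f

    def scores_separation(g1, g2):
        """Mahalanobis distance between group centroids of scores plots,
        with the two-sample Hotelling's T2 converted to an F-test.
        g1, g2: (n_i, p) arrays of PCA or PLS-DA scores."""
        n1, n2, p = len(g1), len(g2), g1.shape[1]
        diff = g1.mean(0) - g2.mean(0)
        pooled = ((n1 - 1) * np.cov(g1.T) + (n2 - 1) * np.cov(g2.T)) \
                 / (n1 + n2 - 2)
        d2 = diff @ np.linalg.solve(pooled, diff)  # squared Mahalanobis distance
        t2 = (n1 * n2) / (n1 + n2) * d2
        F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
        return np.sqrt(d2), F, f.sf(F, p, n1 + n2 - p - 1)

    rng = np.random.default_rng(0)
    g1 = rng.normal(0, 1, (20, 2))
    g2 = rng.normal(1, 1, (20, 2))   # shifted group: separated centroids
    print(scores_separation(g1, g2))
    ```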

  14. Social significance of community structure: Statistical view

    NASA Astrophysics Data System (ADS)

    Li, Hui-Jia; Daniels, Jasmine J.

    2015-01-01

    Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure of statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.

  15. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

    In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  16. Statistical Significance and Effect Size: Two Sides of a Coin.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…

  17. Publication of statistically significant research findings in prosthodontics & implant dentistry in the context of other dental specialties.

    PubMed

    Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos

    2015-10-01

    To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified geographical area and involvement of a statistician as predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise, the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Significant Statistics: Viewed with a Contextual Lens

    ERIC Educational Resources Information Center

    Tait-McCutcheon, Sandi

    2010-01-01

    This paper examines the pedagogical and organisational changes three lead teachers made to their statistics teaching and learning programs. The lead teachers posed the research question: What would the effect of contextually integrating statistical investigations and literacies into other curriculum areas be on student achievement? By finding the…

  19. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  20. Statistical significance of the rich-club phenomenon in complex networks

    NASA Astrophysics Data System (ADS)

    Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2008-04-01

    We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
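
    In the spirit of the proposed null-model assessment, the Python sketch below (function name mine; requires networkx) compares the unnormalized rich-club coefficient against degree-preserving rewired networks.

    ```python
    import networkx as nx

    def rich_club_pvalues(G, n_null=100, seed=0):
        """Compare the unnormalized rich-club coefficient phi(k) of G
        against degree-preserving rewired null networks, in the
        bootstrapping spirit the paper proposes. Returns, per degree k,
        the fraction of null networks with phi(k) >= the observed one."""
        observed = nx.rich_club_coefficient(G, normalized=False)
        exceed = {k: 0 for k in observed}
        for i in range(n_null):
            R = G.copy()
            nx.double_edge_swap(R, nswap=4 * R.number_of_edges(),
                                max_tries=100 * R.number_of_edges(),
                                seed=seed + i)
            phi = nx.rich_club_coefficient(R, normalized=False)
            for k in exceed:
                exceed[k] += phi.get(k, 0) >= observed[k]
        return {k: v / n_null for k, v in exceed.items()}

    G = nx.barabasi_albert_graph(300, 3, seed=1)
    pvals = rich_club_pvalues(G)
    print(min(pvals.values()))
    ```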

  1. Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures

    PubMed Central

    2013-01-01

    Background: Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods: Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results: The statistical significance of the RV increased as the magnitude of the denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions: The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
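
    A Python sketch of the bootstrap CI for the RV (the function name and the within-group resampling scheme are my assumptions):

    ```python
    import numpy as np
    from scipy.stats import f_oneway

    def rv_bootstrap_ci(groups_ref, groups_comp, n_boot=500, seed=0):
        """95% bootstrap CI for relative validity: the ANOVA F of a
        comparator measure divided by the F of the reference measure over
        the same clinically defined groups. groups_*: lists of per-group
        1-D arrays, aligned so element i of both lists holds the same
        patients."""
        rng = np.random.default_rng(seed)

        def rv(ref, comp):
            return f_oneway(*comp).statistic / f_oneway(*ref).statistic

        est = []
        for _ in range(n_boot):
            idx = [rng.integers(0, len(g), len(g)) for g in groups_ref]
            est.append(rv([g[i] for g, i in zip(groups_ref, idx)],
                          [g[i] for g, i in zip(groups_comp, idx)]))
        return np.percentile(est, [2.5, 97.5])

    rng = np.random.default_rng(0)
    groups_ref = [rng.normal(mu, 1.0, 150) for mu in (0.0, 0.6, 1.2)]
    groups_comp = [g + rng.normal(0, 1.0, len(g)) for g in groups_ref]
    print(rv_bootstrap_ci(groups_ref, groups_comp))
    ```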

  2. Fostering Students' Statistical Literacy through Significant Learning Experience

    ERIC Educational Resources Information Center

    Krishnan, Saras

    2015-01-01

    A major objective of statistics education is to develop students' statistical literacy that enables them to be educated users of data in context. Teaching statistics in today's educational settings is not an easy feat because teachers have a huge task in keeping up with the demands of the new generation of learners. The present day students have…

  3. Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

    PubMed

    Wilkinson, Michael

    2014-03-01

    Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
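
    The magnitude-based calculation reduces, under a normal approximation, to asking how much probability lies beyond the smallest practically important effect; a hedged Python sketch (function name mine):

    ```python
    from scipy.stats import norm

    def chances(estimate, se, smallest_important):
        """Magnitude-based summary (a sketch under a normal approximation):
        probabilities that the true effect is beneficial, trivial, or
        harmful relative to the smallest practically important magnitude."""
        p_beneficial = 1 - norm.cdf((smallest_important - estimate) / se)
        p_harmful = norm.cdf((-smallest_important - estimate) / se)
        return p_beneficial, 1 - p_beneficial - p_harmful, p_harmful

    # Effect of 0.30 (SE 0.20), smallest important magnitude 0.20:
    print(chances(0.30, 0.20, 0.20))
    ```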

  4. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

    This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability ( P ) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
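
    The Cohen κ-statistic mentioned above is simple to compute from the two readers' agreement table, as in this Python sketch (function name mine):

    ```python
    import numpy as np

    def cohen_kappa(confusion):
        """Cohen's kappa for two readers from their agreement (confusion)
        matrix: observed agreement corrected for chance agreement."""
        confusion = np.asarray(confusion, dtype=float)
        n = confusion.sum()
        p_observed = np.trace(confusion) / n
        p_chance = (confusion.sum(0) * confusion.sum(1)).sum() / n ** 2
        return (p_observed - p_chance) / (1 - p_chance)

    # Two readers rating 100 scans positive/negative:
    print(cohen_kappa([[40, 10],
                       [5, 45]]))   # ~0.70: substantial agreement
    ```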

  5. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
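
    A Python sketch of the proposed procedure for the Euclidean distance (function name mine; pooling the two cloud-object ensembles serves as the resampling null):

    ```python
    import numpy as np

    def summary_histogram_test(h1, h2, n_boot=2000, seed=0):
        """Bootstrap significance test for the Euclidean distance between
        two summary histograms. h1, h2: (n_objects, n_bins) arrays of
        individual cloud-object histograms; the summary histogram is their
        normalized sum."""
        rng = np.random.default_rng(seed)

        def distance(a, b):
            sa, sb = a.sum(0), b.sum(0)
            return np.linalg.norm(sa / sa.sum() - sb / sb.sum())

        obs = distance(h1, h2)
        pooled = np.vstack([h1, h2])
        count = 0
        for _ in range(n_boot):
            a = pooled[rng.integers(0, len(pooled), len(h1))]
            b = pooled[rng.integers(0, len(pooled), len(h2))]
            count += distance(a, b) >= obs
        return (count + 1) / (n_boot + 1)

    rng = np.random.default_rng(0)
    h1 = rng.poisson(5, (40, 12))   # 40 cloud objects, 12 histogram bins
    h2 = rng.poisson(5, (25, 12))
    print(summary_histogram_test(h1, h2))
    ```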

  6. Mass spectrometry-based protein identification with accurate statistical significance assignment.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2015-03-01

    Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  7. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
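
    A Python sketch of an LR-based DIF screen for a single item (function names and simulated data are mine; uses statsmodels), reporting both a likelihood-ratio P-value and the Wald statistic whose suitability the study examines:

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def dif_logistic(item, total, group):
        """Uniform DIF screen for one item via logistic regression:
        compare a model with ability (total score) only against one
        adding group membership, via the likelihood-ratio test, and
        report the Wald z for the group term."""
        base = sm.Logit(item, sm.add_constant(np.column_stack([total]))).fit(disp=0)
        full = sm.Logit(item, sm.add_constant(np.column_stack([total, group]))).fit(disp=0)
        lr = 2 * (full.llf - base.llf)
        wald_z = full.params[-1] / full.bse[-1]
        return chi2.sf(lr, df=1), wald_z

    rng = np.random.default_rng(0)
    total = rng.normal(0, 1, 500)
    group = rng.integers(0, 2, 500)
    logit_p = 1 / (1 + np.exp(-(1.2 * total + 0.5 * group)))  # item favors group 1
    item = rng.binomial(1, logit_p)
    print(dif_logistic(item, total, group))
    ```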

  8. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example.

    PubMed

    Armijo-Olivo, Susan; Warren, Sharon; Fuentes, Jorge; Magee, David J

    2011-12-01

    Statistical significance has been used extensively to evaluate the results of research studies. Nevertheless, it offers only limited information to clinicians. The assessment of clinical relevance can facilitate the translation of research results into clinical practice. The objective of this study was to explore different methods of evaluating the clinical relevance of results, using a cross-sectional study as an example, comparing different neck outcomes between subjects with temporomandibular disorders and healthy controls. Subjects were compared for head and cervical posture, maximal cervical muscle strength, endurance of the cervical flexor and extensor muscles, and electromyographic activity of the cervical flexor muscles during the CranioCervical Flexion Test (CCFT). The evaluation of the clinical relevance of the results was performed based on the effect size (ES), minimal important difference (MID), and clinical judgement. The results of this study show that it is possible to have statistical significance without clinical relevance, to have both statistical significance and clinical relevance, to have clinical relevance without statistical significance, or to have neither. The evaluation of clinical relevance in clinical research is crucial to simplifying the transfer of knowledge from research into practice. Clinical researchers should present the clinical relevance of their results. Copyright © 2011 Elsevier Ltd. All rights reserved.
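
    A small numerical illustration of the statistical-versus-clinical distinction, using Cohen's d and a minimal important difference; the scores, sample sizes, and MID value are hypothetical.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def cohens_d(a, b):
        """Cohen's d with a pooled standard deviation."""
        pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                     / (len(a) + len(b) - 2)
        return (a.mean() - b.mean()) / np.sqrt(pooled_var)

    # Large samples, tiny true difference: statistically significant, clinically trivial
    controls = rng.normal(50.0, 10.0, 5000)   # hypothetical endurance scores
    patients = rng.normal(50.8, 10.0, 5000)
    t, p = stats.ttest_ind(controls, patients)
    print(f"p = {p:.2g}, d = {cohens_d(controls, patients):.2f}")

    mid = 5.0                                  # assumed minimal important difference
    print("clinically relevant?", abs(controls.mean() - patients.mean()) >= mid)
    ```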

  9. Assessing the significance of pedobarographic signals using random field theory.

    PubMed

    Pataky, Todd C

    2008-08-07

    Traditional pedobarographic statistical analyses are conducted over discrete regions. Recent studies have demonstrated that regionalization can corrupt pedobarographic field data through conflation when arbitrary dividing lines inappropriately delineate smooth field processes. An alternative is to register images such that homologous structures optimally overlap and then conduct statistical tests at each pixel to generate statistical parametric maps (SPMs). The significance of SPM processes may be assessed within the framework of random field theory (RFT). RFT is ideally suited to pedobarographic image analysis because its fundamental data unit is a lattice sampling of a smooth and continuous spatial field. To correct for the vast number of multiple comparisons inherent in such data, recent pedobarographic studies have employed a Bonferroni correction to retain a constant family-wise error rate. This approach unfortunately neglects the spatial correlation of neighbouring pixels, so provides an overly conservative (albeit valid) statistical threshold. RFT generally relaxes the threshold depending on field smoothness and on the geometry of the search area, but it also provides a framework for assigning p values to suprathreshold clusters based on their spatial extent. The current paper provides an overview of basic RFT concepts and uses simulated and experimental data to validate both RFT-relevant field smoothness estimations and RFT predictions regarding the topological characteristics of random pedobarographic fields. Finally, previously published experimental data are re-analysed using RFT inference procedures to demonstrate how RFT yields easily understandable statistical results that may be incorporated into routine clinical and laboratory analyses.
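
    For intuition, the sketch below contrasts a Bonferroni per-pixel threshold with the high-threshold RFT approximation for a 2D Gaussian field (expected Euler characteristic using only the 2D EC density); the pixel count and smoothness (FWHM) are assumed values, not taken from the paper.

    ```python
    import numpy as np
    from scipy import optimize, stats

    n_pixels = 100 * 100          # pixels in the search area (assumed)
    fwhm = 10.0                   # assumed field smoothness, in pixels
    resels = n_pixels / fwhm**2   # resolution elements for a 2D field

    # Bonferroni: per-pixel threshold controlling family-wise error at 0.05
    z_bonferroni = stats.norm.isf(0.05 / n_pixels)

    # RFT: solve E[Euler characteristic] = 0.05 with the 2D Gaussian EC density,
    # E[EC](z) ~ resels * (4 ln 2) / (2 pi)^(3/2) * z * exp(-z^2 / 2)
    def expected_ec(z):
        return resels * (4 * np.log(2)) / (2 * np.pi) ** 1.5 * z * np.exp(-z**2 / 2)

    z_rft = optimize.brentq(lambda z: expected_ec(z) - 0.05, 2.0, 10.0)
    print(f"Bonferroni z = {z_bonferroni:.2f}, RFT z = {z_rft:.2f}")  # RFT is lower
    ```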

  10. The fragility of statistically significant findings from randomized trials in head and neck surgery.

    PubMed

    Noel, Christopher W; McMullen, Caitlin; Yao, Christopher; Monteiro, Eric; Goldstein, David P; Eskander, Antoine; de Almeida, John R

    2018-04-23

    The Fragility Index (FI) is a novel tool for evaluating the robustness of statistically significant findings in a randomized controlled trial (RCT). It measures the number of events upon which statistical significance depends. We sought to calculate the FI scores for RCTs in the head and neck cancer literature where surgery was a primary intervention. Potential articles were identified in PubMed (MEDLINE), Embase, and Cochrane without publication date restrictions. Two reviewers independently screened eligible RCTs reporting at least one dichotomous and statistically significant outcome. The data from each trial were extracted and the FI scores were calculated. Associations between trial characteristics and FI were determined. In total, 27 articles were identified. The median sample size was 67.5 (interquartile range [IQR] = 42-143) and the median number of events per trial was 8 (IQR = 2.25-18.25). The median FI score was 1 (IQR = 0-2.5), meaning that changing one patient from a nonevent to an event in the treatment arm would render the result statistically nonsignificant (P > .05). The FI score was less than the number of patients lost to follow-up in 71% of cases. The FI score was found to be moderately correlated with the P value (ρ = -0.52, P = .007) and with journal impact factor (ρ = 0.49, P = .009) on univariable analysis. On multivariable analysis, only the P value was found to be a predictor of FI score (P = .001). Randomized trials in the head and neck cancer literature where surgery is a primary modality are relatively nonrobust statistically, with low FI scores. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.
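
    A minimal sketch of an FI calculation in the spirit described above: treatment-arm non-events are converted to events one at a time until a two-sided Fisher's exact test crosses P ≥ .05; the trial counts are hypothetical.

    ```python
    from scipy.stats import fisher_exact

    def fragility_index(events_t, n_t, events_c, n_c, alpha=0.05):
        """Number of treatment-arm non-events that must be converted to events
        before a two-sided Fisher's exact test loses significance. Assumes the
        original result is significant; returns 0 otherwise."""
        flips = 0
        e = events_t
        while e <= n_t:
            _, p = fisher_exact([[e, n_t - e], [events_c, n_c - events_c]])
            if p >= alpha:
                return flips
            e += 1        # convert one non-event to an event
            flips += 1
        return flips

    # Hypothetical trial: 2/30 events under treatment vs 10/30 under control
    print(fragility_index(2, 30, 10, 30))
    ```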

  11. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  12. Reporting of statistically significant results at ClinicalTrials.gov for completed superiority randomized controlled trials.

    PubMed

    Dechartres, Agnes; Bond, Elizabeth G; Scheer, Jordan; Riveros, Carolina; Atal, Ignacio; Ravaud, Philippe

    2016-11-30

    Publication bias and other reporting bias have been well documented for journal articles, but no study has evaluated the nature of results posted at ClinicalTrials.gov. We aimed to assess how many randomized controlled trials (RCTs) with results posted at ClinicalTrials.gov report statistically significant results and whether the proportion of trials with significant results differs when no treatment effect estimate or p-value is posted. We searched ClinicalTrials.gov in June 2015 for all studies with results posted. We included completed RCTs with a superiority hypothesis and considered results for the first primary outcome with results posted. For each trial, we assessed whether a treatment effect estimate and/or p-value was reported at ClinicalTrials.gov and if yes, whether results were statistically significant. If no treatment effect estimate or p-value was reported, we calculated the treatment effect and corresponding p-value using results per arm posted at ClinicalTrials.gov when sufficient data were reported. From the 17,536 studies with results posted at ClinicalTrials.gov, we identified 2823 completed phase 3 or 4 randomized trials with a superiority hypothesis. Of these, 1400 (50%) reported a treatment effect estimate and/or p-value. Results were statistically significant for 844 trials (60%), with a median p-value of 0.01 (Q1-Q3: 0.001-0.26). For the 1423 trials with no treatment effect estimate or p-value posted, we could calculate the treatment effect and corresponding p-value using results reported per arm for 929 (65%). For 494 trials (35%), p-values could not be calculated mainly because of insufficient reporting, censored data, or repeated measurements over time. For the 929 trials for which we could calculate p-values, we found statistically significant results for 342 (37%), with a median p-value of 0.19 (Q1-Q3: 0.005-0.59). Half of the trials with results posted at ClinicalTrials.gov reported a treatment effect estimate and/or p-value, with significant…

  13. Statistical significance of task related deep brain EEG dynamic changes in the time-frequency domain.

    PubMed

    Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P

    2013-01-01

    We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas.
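
    A minimal sketch of a relative signal power change of this kind, expressed as the conventional ERD/ERS percentage change from a pre-stimulus baseline; the band powers are synthetic and the exact normalisation used by the authors may differ.

    ```python
    import numpy as np

    def erd_ers_percent(power, baseline):
        """Relative signal power change per frequency band:
        ERD/ERS% = (A - R) / R * 100; negative = desynchronization (ERD)."""
        return (np.asarray(power) - baseline) / baseline * 100.0

    baseline = np.array([2.1, 1.8, 1.2, 0.9, 0.6])        # pre-stimulus band power
    post = baseline * np.array([0.7, 0.9, 1.0, 1.4, 2.0])  # post-stimulus band power
    print(erd_ers_percent(post, baseline))                 # [-30, -10, 0, +40, +100]
    ```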

  14. Intensive inpatient treatment for bulimia nervosa: Statistical and clinical significance of symptom changes.

    PubMed

    Diedrich, Alice; Schlegl, Sandra; Greetfeld, Martin; Fumi, Markus; Voderholzer, Ulrich

    2018-03-01

    This study examines the statistical and clinical significance of symptom changes during an intensive inpatient treatment program with a strong psychotherapeutic focus for individuals with severe bulimia nervosa. 295 consecutively admitted bulimic patients were administered the Structured Interview for Anorexic and Bulimic Syndromes-Self-Rating (SIAB-S), the Eating Disorder Inventory-2 (EDI-2), the Brief Symptom Inventory (BSI), and the Beck Depression Inventory-II (BDI-II) at treatment intake and discharge. Results indicated statistically significant symptom reductions with large effect sizes regarding severity of binge eating and compensatory behavior (SIAB-S), overall eating disorder symptom severity (EDI-2), overall psychopathology (BSI), and depressive symptom severity (BDI-II) even when controlling for antidepressant medication. The majority of patients showed either reliable (EDI-2: 33.7%, BSI: 34.8%, BDI-II: 18.1%) or even clinically significant symptom changes (EDI-2: 43.2%, BSI: 33.9%, BDI-II: 56.9%). Patients with clinically significant improvement were less distressed at intake and less likely to suffer from a comorbid borderline personality disorder when compared with those who did not improve to a clinically significant extent. Findings indicate that intensive psychotherapeutic inpatient treatment may be effective in about 75% of severely affected bulimic patients. For the remaining non-responding patients, inpatient treatment might be improved through an even stronger focus on the reduction of comorbid borderline personality traits.

  15. Using the Descriptive Bootstrap to Evaluate Result Replicability (Because Statistical Significance Doesn't)

    ERIC Educational Resources Information Center

    Spinella, Sarah

    2011-01-01

    As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…

  16. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: managing large-scale, complex data sets; identifying and indexing MS peaks; and high-dimensional differential peak analysis with false discovery rate (FDR) control across the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports large-scale online uploading and analysis of MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
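
    The abstract does not name the FDR procedure; as a hedged stand-in for FDR control over many concurrent peak-level tests, a Benjamini-Hochberg step-up is a common choice (the p-values below are invented).

    ```python
    import numpy as np

    def benjamini_hochberg(pvals, q=0.05):
        """Boolean mask of tests declared significant at FDR level q (BH step-up)."""
        p = np.asarray(pvals)
        m = len(p)
        order = np.argsort(p)
        passing = p[order] <= q * np.arange(1, m + 1) / m
        reject = np.zeros(m, dtype=bool)
        if passing.any():
            reject[order[: np.max(np.nonzero(passing)[0]) + 1]] = True
        return reject

    # per-peak p-values from, e.g., two-group tests across sample categories
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))
    ```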

  17. Statistical Significance of Optical Map Alignments

    PubMed Central

    Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.

    2012-01-01

    Abstract The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568

  18. Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

    PubMed

    Gaskin, Cadeyrn J; Happell, Brenda

    2014-05-01

    … improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  19. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high-resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  20. How to get statistically significant effects in any ERP experiment (and why you shouldn't).

    PubMed

    Luck, Steven J; Gaspelin, Nicholas

    2017-01-01

    ERP experiments generate massive datasets, often containing thousands of values for each participant, even after averaging. The richness of these datasets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand-averaged data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multifactor statistical analyses. Reanalyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant but bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. © 2016 Society for Psychophysiological Research.

  1. How to Get Statistically Significant Effects in Any ERP Experiment (and Why You Shouldn’t)

    PubMed Central

    Luck, Steven J.; Gaspelin, Nicholas

    2016-01-01

    Event-related potential (ERP) experiments generate massive data sets, often containing thousands of values for each participant, even after averaging. The richness of these data sets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant-but-bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand average data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multi-factor statistical analyses. Re-analyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant-but-bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. PMID:28000253
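
    A toy simulation of the paper's central point: with many time windows and electrode sites and no true effect anywhere, the chance of at least one "significant" test far exceeds 5%. The design sizes are arbitrary, not taken from the paper.

    ```python
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)

    # Pure-noise "experiments": 2 groups of 20 subjects, amplitudes quantified in
    # 10 time windows at 8 electrode sites, with no true effect anywhere.
    n_sims, hits = 1000, 0
    for _ in range(n_sims):
        group_a = rng.standard_normal((20, 10, 8))
        group_b = rng.standard_normal((20, 10, 8))
        _, p = ttest_ind(group_a, group_b, axis=0)   # one t-test per window x site
        hits += (p < 0.05).any()
    print(f"experiments with at least one bogus effect: {hits / n_sims:.0%}")
    ```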

  2. Significant Association of Urinary Toxic Metals and Autism-Related Symptoms—A Nonlinear Statistical Analysis with Cross Validation

    PubMed Central

    Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen

    2017-01-01

    Introduction: A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods: In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion: Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate…

  3. Statistical significance estimation of a signal within the GooFit framework on GPUs

    NASA Astrophysics Data System (ADS)

    Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis

    2017-03-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process using multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations, in which Wilks' theorem may or may not apply depending on whether its regularity conditions are satisfied.

  4. The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant.

    PubMed

    Monsarrat, Paul; Vergnes, Jean-Noel

    2018-01-01

    In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratio, odds ratio, and hazard ratio, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results become more often statistically significant. Medians of ES values have decreased over time for both "risk" and "protective" values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990-1995, vs 85% in 2010-2015. Whereas decreasing ESs could reflect an intrinsic evolution of biomedical research, the concomitant increase in statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the "publish or perish" context of scientific research, with a possible growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources. © The Authors 2017. Published by Oxford University Press.

  5. Tipping points in the arctic: eyeballing or statistical significance?

    PubMed

    Carstensen, Jacob; Weydmann, Agata

    2012-02-01

    Arctic ecosystems have experienced and are projected to experience continued large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent, and suggest that the loss rate of sea ice accelerated by a factor of ~5 in 1996, as reported in other studies, but that increases in random fluctuations, an early warning signal, were already observed in 1990. We recommend employing the proposed methods more systematically in analyses of tipping points, to document effects of climate change in the Arctic.

  6. A Note on Comparing the Power of Test Statistics at Low Significance Levels.

    PubMed

    Morris, Nathan; Elston, Robert

    2011-01-01

    It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
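
    A short numerical check of this point: the power of 1-df and 2-df chi-squared tests sharing the same (arbitrarily chosen) noncentrality, evaluated at a conventional and a genome-wide alpha. The gap between the two tests changes markedly across levels.

    ```python
    from scipy.stats import chi2, ncx2

    # Power of 1-df vs 2-df chi-squared tests with the same noncentrality,
    # at a conventional and a genome-wide significance level.
    for alpha in (0.05, 5e-8):
        for df in (1, 2):
            critical = chi2.isf(alpha, df)
            power = ncx2.sf(critical, df, nc=10.0)   # noncentrality is arbitrary
            print(f"alpha = {alpha:g}, df = {df}, power = {power:.3f}")
    ```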

  7. Use of Tests of Statistical Significance and Other Analytic Choices in a School Psychology Journal: Review of Practices and Suggested Alternatives.

    ERIC Educational Resources Information Center

    Snyder, Patricia A.; Thompson, Bruce

    The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…

  8. Albeit nocturnal, rats subjected to traumatic brain injury do not differ in neurobehavioral performance whether tested during the day or night.

    PubMed

    Niesman, Peter J; Wei, Jiahui; LaPorte, Megan J; Carlson, Lauren J; Nassau, Kileigh L; Bao, Gina C; Cheng, Jeffrey P; de la Tremblaye, Patricia; Lajud, Naima; Bondi, Corina O; Kline, Anthony E

    2018-02-05

    Behavioral assessments in rats are overwhelmingly conducted during the day, albeit that is when they are least active. This incongruity may preclude optimal performance. Hence, the goal of this study was to determine if differences in neurobehavior exist in traumatic brain injured (TBI) rats when assessed during the day vs. night. The hypothesis was that the night group would perform better than the day group on all behavioral tasks. Anesthetized adult male rats received either a cortical impact or sham injury and then were randomly assigned to either Day (1:00-3:00 p.m.) or Night (7:30-9:30 p.m.) testing. Motor function (beam-balance/walk) was conducted on post-operative days 1-5 and cognitive performance (spatial learning) was assessed on days 14-18. Corticosterone (CORT) levels were quantified at 24 h and 21 days after TBI. No significant differences were revealed between the TBI rats tested during the Day vs. Night for motor or cognition (p's > 0.05). CORT levels were higher in the Night-tested TBI and sham groups at 24 h (p < 0.05), but returned to baseline and were no longer different by day 21 (p > 0.05), suggesting an initial, but transient, stress response that did not affect neurobehavioral outcome. These data suggest that the time rats are tested has no noticeable impact on their performance, which does not support the hypothesis. The finding validates the interpretations from numerous studies conducted when rats were tested during the day vs. their natural active period. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs

    PubMed Central

    Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner

    2017-01-01

    Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules, even for moderately sized datasets, making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain-independent interestingness measures with which one can rank the rules and potentially glean useful rules from the top-ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets, owing to the relatively large sizes of the datasets often encountered in healthcare and limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in the data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771
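
    For readers unfamiliar with the underlying quantities, a sketch of three classical interestingness measures for a rule A → B computed from co-occurrence counts; the visit counts are hypothetical, and the paper evaluates many measures beyond these.

    ```python
    def rule_metrics(n_total, n_a, n_b, n_ab):
        """Support, confidence, and lift for an association rule A -> B."""
        support = n_ab / n_total
        confidence = n_ab / n_a
        lift = confidence / (n_b / n_total)
        return support, confidence, lift

    # Hypothetical counts: 3M visits, diagnosis A in 40k, a depressive-disorder
    # code B in 150k, and both together in 9k visits.
    print(rule_metrics(3_000_000, 40_000, 150_000, 9_000))  # lift = 4.5
    ```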

  10. A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn.

    PubMed

    Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/.

  11. A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn

    PubMed Central

    Fisher, Aaron; Anderson, G. Brooke; Peng, Roger

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%–49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%–76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457
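
    A small simulation in the spirit of the stimuli used here: a weak true correlation in a large sample is typically statistically significant yet nearly invisible in a scatterplot, while a strong correlation in a small sample is obvious; the sample and effect sizes are arbitrary.

    ```python
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(3)

    def correlated_sample(n, r):
        """Draw n (x, y) pairs whose true correlation is r."""
        x = rng.standard_normal(n)
        y = r * x + np.sqrt(1.0 - r**2) * rng.standard_normal(n)
        return x, y

    for n, r in ((50, 0.5), (1000, 0.08)):
        x, y = correlated_sample(n, r)
        rho, p = pearsonr(x, y)
        print(f"n = {n:4d}, true r = {r:.2f}, observed r = {rho:+.2f}, p = {p:.3g}")
    ```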

  12. A study on the use of Gumbel approximation with the Bernoulli spatial scan statistic.

    PubMed

    Read, S; Bath, P A; Willett, P; Maheswaran, R

    2013-08-30

    The Bernoulli version of the spatial scan statistic is a well established method of detecting localised spatial clusters in binary labelled point data, a typical application being the epidemiological case-control study. A recent study suggests the inferential accuracy of several versions of the spatial scan statistic (principally the Poisson version) can be improved, at little computational cost, by using the Gumbel distribution, a method now available in SaTScan(TM) (www.satscan.org). We study in detail the effect of this technique when applied to the Bernoulli version and demonstrate that it is highly effective, albeit with some increase in false alarm rates at certain significance thresholds. We explain how this increase is due to the discrete nature of the Bernoulli spatial scan statistic and demonstrate that it can affect even small p-values. Despite this, we argue that the Gumbel method is actually preferable for very small p-values. Furthermore, we extend previous research by running benchmark trials on 12 000 synthetic datasets, thus demonstrating that the overall detection capability of the Bernoulli version (i.e. ratio of power to false alarm rate) is not noticeably affected by the use of the Gumbel method. We also provide an example application of the Gumbel method using data on hospital admissions for chronic obstructive pulmonary disease. Copyright © 2013 John Wiley & Sons, Ltd.
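
    A generic sketch of the Gumbel device discussed above: fit a Gumbel distribution to Monte Carlo replicates of a maximum-type statistic and read p-values from the fitted tail, which can resolve values below the 1/(R+1) floor of the rank-based estimate. The max-of-normals statistic is a stand-in, not the Bernoulli scan statistic itself.

    ```python
    import numpy as np
    from scipy.stats import gumbel_r

    rng = np.random.default_rng(4)

    # Stand-in maximum-type statistic (NOT the Bernoulli scan statistic itself):
    # the largest of 200 standard-normal scores per Monte Carlo replicate.
    replicates = rng.standard_normal((999, 200)).max(axis=1)
    observed = 4.0   # value of the statistic in the (hypothetical) real data

    # Conventional rank-based Monte Carlo p-value: floored at 1/(R+1) = 0.001
    p_rank = (1 + np.sum(replicates >= observed)) / (len(replicates) + 1)

    # Gumbel approximation: fit the replicates, read p from the fitted tail
    loc, scale = gumbel_r.fit(replicates)
    p_gumbel = gumbel_r.sf(observed, loc=loc, scale=scale)

    print(f"rank-based p = {p_rank:.4f}, Gumbel-based p = {p_gumbel:.2g}")
    ```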

  13. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  14. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  15. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168

  16. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
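
    A compact sketch of the Getis-Ord Gi* z-score in its standard formulation (fixed distance band, the point itself included); the coordinates, concentrations, and band radius are synthetic stand-ins for calibrated PXRF data.

    ```python
    import numpy as np

    def getis_ord_gi_star(coords, values, radius):
        """Gi* z-score at each point using fixed distance-band weights
        (w_ij = 1 within `radius`, including the point itself)."""
        x = np.asarray(values, dtype=float)
        n = len(x)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        w = (d <= radius).astype(float)                  # w_ii = 1 since d_ii = 0
        xbar, s = x.mean(), np.sqrt((x**2).mean() - x.mean() ** 2)
        wsum = w.sum(axis=1)
        numerator = w @ x - xbar * wsum
        denominator = s * np.sqrt((n * (w**2).sum(axis=1) - wsum**2) / (n - 1))
        return numerator / denominator

    rng = np.random.default_rng(5)
    coords = rng.uniform(0.0, 100.0, size=(80, 2))       # sampling locations
    coords[:10] = rng.uniform(0.0, 15.0, size=(10, 2))   # a spatially tight subset
    conc = rng.lognormal(1.0, 0.5, size=80)              # stand-in PXRF readings
    conc[:10] *= 4.0                                     # elevated content there
    z = getis_ord_gi_star(coords, conc, radius=20.0)
    print("points with z > 1.96:", int(np.sum(z > 1.96)))
    ```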

  17. The role of baryons in creating statistically significant planes of satellites around Milky Way-mass galaxies

    NASA Astrophysics Data System (ADS)

    Ahmed, Sheehan H.; Brooks, Alyson M.; Christensen, Charlotte R.

    2017-04-01

    We investigate whether the inclusion of baryonic physics influences the formation of thin, coherently rotating planes of satellites such as those seen around the Milky Way and Andromeda. For four Milky Way-mass simulations, each run both as dark matter-only and with baryons included, we are able to identify a planar configuration that significantly maximizes the number of plane satellite members. The maximum plane member satellites are consistently different between the dark matter-only and baryonic versions of the same run due to the fact that satellites are both more likely to be destroyed and to infall later in the baryonic runs. Hence, studying satellite planes in dark matter-only simulations is misleading, because they will be composed of different satellite members than those that would exist if baryons were included. Additionally, the destruction of satellites in the baryonic runs leads to less radially concentrated satellite distributions, a result that is critical to making planes that are statistically significant compared to a random distribution. Since all planes pass through the centre of the galaxy, it is much harder to create a plane of a given height from a random distribution if the satellites have a low radial concentration. We identify Andromeda's low radial satellite concentration as a key reason why the plane in Andromeda is highly significant. Despite this, when corotation is considered, none of the satellite planes identified for the simulated galaxies are as statistically significant as the observed planes around the Milky Way and Andromeda, even in the baryonic runs.

  18. "Clinical" Significance: "Clinical" Significance and "Practical" Significance are NOT the Same Things

    ERIC Educational Resources Information Center

    Peterson, Lisa S.

    2008-01-01

    Clinical significance is an important concept in research, particularly in education and the social sciences. The present article first compares clinical significance to other measures of "significance" in statistics. The major methods used to determine clinical significance are explained and the strengths and weaknesses of clinical significance…

  19. The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.

    PubMed

    Perneger, Thomas V; Combescure, Christophe

    2017-07-01

    Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) a high proportion of P-values < 0.001, especially in observational studies; (2) an excess of P-values equal to 1; and (3) about twice as many P-values less than 0.05 as more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, except for heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. The thresholds for statistical and clinical significance – a five-step procedure for evaluation of intervention effects in randomised clinical trials

    PubMed Central

    2014-01-01

    Background: Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods: Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results: For a more valid assessment of results from a randomised clinical trial, we propose the following five steps: (1) report the confidence intervals and the exact P-values; (2) report the Bayes factor for the primary outcome, being the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to the number of outcome comparisons; and (5) assess the clinical significance of the trial results. Conclusions: If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900
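
    A sketch of step 2 under a normal approximation: the Bayes factor is the likelihood of the observed estimate under the null divided by its likelihood under the effect hypothesised in the sample size calculation; the estimate, standard error, and hypothesised effect below are hypothetical.

    ```python
    from scipy.stats import norm

    def bayes_factor(estimate, se, hypothesised_effect):
        """Likelihood of the observed estimate under the null divided by its
        likelihood under the hypothesised effect (normal approximation)."""
        return norm.pdf(estimate, 0.0, se) / norm.pdf(estimate, hypothesised_effect, se)

    # Hypothetical trial: observed risk difference -0.06 (SE 0.02); the sample
    # size was calculated for an anticipated risk difference of -0.08.
    bf = bayes_factor(-0.06, 0.02, -0.08)
    print(f"Bayes factor = {bf:.3f}")   # well below 1 favours the hypothesised effect
    ```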

  1. A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    PubMed Central

    Eddy, Sean R.

    2008-01-01

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. PMID:18516236
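
    Under these conjectures, significance estimation reduces to closed form; a sketch, treating the Gumbel location parameter mu as an assumed, model-specific offset:

    ```python
    import numpy as np

    LAMBDA = np.log(2.0)   # the conjectured constant for bit scores

    def viterbi_pvalue(bit_score, mu):
        """P(S >= b) for Gumbel-distributed Viterbi bit scores; mu is an assumed,
        model-specific location parameter."""
        return 1.0 - np.exp(-np.exp(-LAMBDA * (bit_score - mu)))

    def evalue(bit_score, mu, n_targets):
        """Expected number of hits this good in a database of n_targets sequences."""
        return n_targets * viterbi_pvalue(bit_score, mu)

    print(evalue(30.0, -5.0, 10_000))   # a strong hit against 10,000 sequences
    ```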

  2. Conducting tests for statistically significant differences using forest inventory data

    Treesearch

    James A. Westfall; Scott A. Pugh; John W. Coulston

    2013-01-01

    Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...

  3. Statistical significance of seasonal warming/cooling trends

    NASA Astrophysics Data System (ADS)

    Ludescher, Josef; Bunde, Armin; Schellnhuber, Hans Joachim

    2017-04-01

    The question whether a seasonal climate trend (e.g., the increase of summer temperatures in Antarctica in the last decades) is of anthropogenic or natural origin is of great importance for mitigation and adaptation measures alike. The conventional significance analysis assumes that (i) the seasonal climate trends can be quantified by linear regression, (ii) the different seasonal records can be treated as independent records, and (iii) the persistence in each of these seasonal records can be characterized by short-term memory described by an autoregressive process of first order. Here we show that assumption ii is not valid, due to strong intra-annual correlations between the different seasons. We also show that, even in the absence of correlations, for Gaussian white noise, the conventional analysis leads to a strong overestimation of the significance of the seasonal trends, because multiple testing has not been taken into account. In addition, when the data exhibit long-term memory (which is the case in most climate records), assumption iii leads to a further overestimation of the trend significance. Combining Monte Carlo simulations with the Holm-Bonferroni method, we demonstrate how to obtain reliable estimates of the significance of the seasonal climate trends in long-term correlated records. For an illustration, we apply our method to representative temperature records from West Antarctica, which is one of the fastest-warming places on Earth and belongs to the crucial tipping elements in the Earth system.
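
    The Holm-Bonferroni step-down used here is short to state in code; a minimal sketch over invented per-season trend p-values:

    ```python
    import numpy as np

    def holm_bonferroni(pvals, alpha=0.05):
        """Boolean rejection mask under the Holm-Bonferroni step-down procedure."""
        p = np.asarray(pvals)
        m = len(p)
        reject = np.zeros(m, dtype=bool)
        for rank, idx in enumerate(np.argsort(p)):
            if p[idx] <= alpha / (m - rank):
                reject[idx] = True
            else:
                break   # every larger p-value is retained as well
        return reject

    # Invented per-season trend p-values: winter, spring, summer, autumn
    print(holm_bonferroni([0.012, 0.030, 0.004, 0.200]))  # [ True False  True False]
    ```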

  4. A Network-Based Method to Assess the Statistical Significance of Mild Co-Regulation Effects

    PubMed Central

    Horvát, Emőke-Ágnes; Zhang, Jitao David; Uhlmann, Stefan; Sahin, Özgür; Zweig, Katharina Anna

    2013-01-01

    Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis. PMID:24039936
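
    The abstract does not spell out SICORE's null model, so the sketch below shows a simpler, hypothetical baseline rather than the method itself: scoring the co-occurrence of two same-type nodes by the hypergeometric tail probability of their shared-neighbour count, assuming random wiring given the two degrees.

```python
# Not SICORE itself: a simpler, hypothetical baseline for co-occurrence
# significance. If node u has k1 neighbours and node v has k2 neighbours
# among n_possible common-neighbour candidates, and wiring were random
# given the degrees, the shared-neighbour count is hypergeometric; we
# report the upper-tail probability P(X >= observed). Counts are invented.
from scipy.stats import hypergeom

def cooccurrence_pvalue(n_possible, k1, k2, observed_shared):
    return hypergeom.sf(observed_shared - 1, n_possible, k1, k2)

print(f"p = {cooccurrence_pvalue(500, 40, 60, 12):.3g}")
```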

  5. Synthetic velocity gradient tensors and the identification of statistically significant aspects of the structure of turbulence

    NASA Astrophysics Data System (ADS)

    Keylock, Christopher J.

    2017-08-01

    A method is presented for deriving random velocity gradient tensors given a source tensor. These synthetic tensors are constrained to lie within mathematical bounds of the non-normality of the source tensor, but we do not impose direct constraints upon scalar quantities typically derived from the velocity gradient tensor and studied in fluid mechanics. Hence, it becomes possible to test hypotheses about the statistical significance of these scalar quantities at a point in the data. Having presented our method and the associated mathematical concepts, we apply it to homogeneous, isotropic turbulence to test the utility of the approach for a case where the behavior of the tensor is well understood. We show that, as well as the concentration of data along the Vieillefosse tail, actual turbulence is also preferentially located in the quadrant where there is both excess enstrophy (Q > 0) and excess enstrophy production (R < 0). We also examine the topology implied by the strain eigenvalues and find that for the statistically significant results there is a particularly strong relative preference for the formation of disklike structures in the (Q < 0, R < 0) quadrant. With the method shown to be useful for a turbulence that is already well understood, it should be of even greater utility for studying complex flows seen in industry and the environment.

  6. Statistical Significance of Periodicity and Log-Periodicity with Heavy-Tailed Correlated Noise

    NASA Astrophysics Data System (ADS)

    Zhou, Wei-Xing; Sornette, Didier

    We estimate the probability that random noise, of several plausible standard distributions, creates a false alarm that a periodicity (or log-periodicity) is found in a time series. The solution of this problem is already known for independent Gaussian distributed noise. We investigate more general situations with non-Gaussian correlated noises and present synthetic tests on the detectability and statistical significance of periodic components. A periodic component of a time series is usually detected by some sort of Fourier analysis. Here, we use the Lomb periodogram analysis, which is suitable and outperforms Fourier transforms for unevenly sampled time series. We examine the false-alarm probability of the largest spectral peak of the Lomb periodogram in the presence of power-law distributed noises, of short-range and of long-range fractional-Gaussian noises. Increasing heavy-tailedness (respectively correlations describing persistence) tends to decrease (respectively increase) the false-alarm probability of finding a large spurious Lomb peak. Increasing anti-persistence tends to decrease the false-alarm probability. We also study the interplay between heavy-tailedness and long-range correlations. In order to fully determine if a Lomb peak signals a genuine rather than a spurious periodicity, one should in principle characterize the Lomb peak height, its width and its relations to other peaks in the complete spectrum. As a step towards this full characterization, we construct the joint distribution of the frequency position (relative to other peaks) and of the height of the highest peak of the power spectrum. We also provide the distributions of the ratio of the highest Lomb peak to the second highest one. Using the insight obtained by the present statistical study, we re-examine previously reported claims of “log-periodicity” and find that the credibility for log-periodicity in 2D-freely decaying turbulence is weakened while it is strengthened for fracture, for the
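
    For orientation, the sketch below computes the baseline the paper starts from: a Lomb periodogram of an unevenly sampled synthetic series and the classical white-noise false-alarm probability for its highest peak; the number of independent frequencies M is a rough assumption, and none of the numbers come from the paper.

```python
# A sketch of the classical white-noise baseline, not the paper's analysis.
# The abstract's point is that this formula miscalibrates under heavy-tailed
# or long-range correlated noise, where Monte Carlo calibration is needed.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))               # uneven sampling
y = np.sin(2 * np.pi * 0.1 * t) + rng.normal(0.0, 1.0, t.size)
y = (y - y.mean()) / y.std()                            # unit variance

omegas = 2 * np.pi * np.linspace(0.01, 0.5, 500)        # angular frequencies
z = lombscargle(t, y, omegas)   # with var(y) = 1, approximately the
                                # classical normalized Lomb power

M = 200   # assumed number of independent frequencies (a rough guess)
fap = 1.0 - (1.0 - np.exp(-z.max())) ** M
print(f"peak z = {z.max():.1f}, white-noise FAP = {fap:.2e}")
```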

  7. Statistical downscaling rainfall using artificial neural network: significantly wetter Bangkok?

    NASA Astrophysics Data System (ADS)

    Vu, Minh Tue; Aribarg, Thannob; Supratid, Siriporn; Raghavan, Srivatsan V.; Liong, Shie-Yui

    2016-11-01

    The artificial neural network (ANN) is an established technique with a flexible mathematical structure that is capable of identifying complex nonlinear relationships between input and output data. The present study utilizes ANN as a method of statistically downscaling global climate models (GCMs) during the rainy season at meteorological site locations in Bangkok, Thailand. The study illustrates the application of feed-forward back-propagation using large-scale predictor variables derived from both ERA-Interim reanalysis data and present-day/future GCM data. The predictors are first selected over different grid boxes surrounding the Bangkok region and then screened using principal component analysis (PCA) to retain the best-correlated predictors for ANN training. The downscaled reanalysis results for the present-day climate show good agreement with station precipitation, with a correlation coefficient of 0.8 and a Nash-Sutcliffe efficiency of 0.65. The final downscaled results for four GCMs show an increasing trend of precipitation for the rainy season over Bangkok by the end of the twenty-first century. The extreme values of precipitation determined using statistical indices show strong increases in wetness. These findings will be useful for policy makers in weighing flood adaptation measures, such as whether the current drainage network is sufficient to cope with the changing climate, and in planning a range of related adaptation/mitigation measures.
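
    The pipeline shape described above (PCA screening feeding a small feed-forward network) can be sketched generically; the code below assumes scikit-learn and random placeholder arrays, and is not the authors' model or data.

```python
# A sketch of the PCA + feed-forward ANN pipeline shape, assuming
# scikit-learn; the arrays are random stand-ins, not ERA-Interim or GCM
# fields, and the architecture is invented for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 40))                      # flattened predictors
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 500)  # stand-in rainfall

model = make_pipeline(
    PCA(n_components=10),                           # screen predictors
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X[:400], y[:400])
print(f"R^2 on held-out data: {model.score(X[400:], y[400:]):.2f}")
```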

  8. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2010-06-01

    The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea state heights; (ii) the existence of a turning point in the late 1980s in the annual-averaged trend of sea state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a timescale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and more generally in describing the significant wave field. The r-largest annual maxima method provides more reliable predictions of the extreme values especially for small return periods (<100 years). Finally, the study statistically proves the existence of decadal negative trends in the significant wave heights and by this it conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
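
    A minimal sketch of the annual-maxima route to such return values follows, assuming a GEV fit to synthetic maxima rather than the WAM/ERA-40 series of the study.

```python
# A sketch of the annual-maxima approach: fit a GEV distribution and read
# off a 100-year return level. The 42 "annual maxima" are synthetic.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
annual_maxima = genextreme.rvs(c=0.1, loc=4.0, scale=0.8, size=42,
                               random_state=rng)      # 42 years, 1958-1999

c, loc, scale = genextreme.fit(annual_maxima)
h100 = genextreme.ppf(1.0 - 1.0 / 100.0, c, loc=loc, scale=scale)
print(f"estimated 100-year significant wave height: {h100:.2f} m")
```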

  9. Does the Aging Process Significantly Modify the Mean Heart Rate?

    PubMed Central

    Santos, Marcos Antonio Almeida; Sousa, Antonio Carlos Sobral; Reis, Francisco Prado; Santos, Thayná Ramos; Lima, Sonia Oliveira; Barreto-Filho, José Augusto

    2013-01-01

    Background The Mean Heart Rate (MHR) tends to decrease with age. When adjusted for gender and diseases, the magnitude of this effect is unclear. Objective To analyze the MHR in a stratified sample of active and functionally independent individuals. Methods A total of 1,172 patients aged ≥ 40 years underwent Holter monitoring and were stratified by age group: 1 = 40-49, 2 = 50-59, 3 = 60-69, 4 = 70-79, 5 = ≥ 80 years. The MHR was evaluated according to age and gender, adjusted for Hypertension (SAH), dyslipidemia and non-insulin dependent diabetes mellitus (NIDDM). Several models of ANOVA, correlation and linear regression were employed. A two-tailed p value < 0.05 was considered significant (95% CI). Results The MHR tended to decrease with the age range: 1 = 77.20 ± 7.10; 2 = 76.66 ± 7.07; 3 = 74.02 ± 7.46; 4 = 72.93 ± 7.35; 5 = 73.41 ± 7.98 (p < 0.001). Women showed a correlation with higher MHR (p < 0.001). In the ANOVA and regression models, age and gender were predictors (p < 0.001). However, R² and η² values < 0.10, as well as modest standardized beta coefficients, indicated a small effect. Dyslipidemia, hypertension and DM did not influence the findings. Conclusion The MHR decreased with age. Women had higher values of MHR, regardless of the age group. Correlations between MHR and age or gender, albeit significant, showed the effect magnitude had little statistical relevance. The prevalence of SAH, dyslipidemia and diabetes mellitus did not influence the results. PMID:24029962

  10. On the statistical significance of excess events: Remarks of caution and the need for a standard method of calculation

    NASA Technical Reports Server (NTRS)

    Staubert, R.

    1985-01-01

    Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.

  11. The Use of Intraoperative Sensors Significantly Increases the Patient-Reported Rate of Improvement in Primary Total Knee Arthroplasty.

    PubMed

    Chow, James C; Breslauer, Leigh

    2017-07-01

    Albeit multifactorial, patient satisfaction is predominantly driven by postoperative pain and function. Unfortunately, approximately 20% of total knee arthroplasty (TKA) recipients are dissatisfied with the outcome of their surgery. Objective balancing of the soft tissue envelope may contribute to a significant decrease in pain and an increase in function when compared with traditional subjective methods. In an effort to confirm this, a cohort of manual TKA patient outcomes was compared with sensor-assisted TKA outcomes. One hundred fourteen patients (57 manual, 57 sensor-assisted) received primary TKA. Both cohorts were matched for confounding variables. The dependent variables in this study were 6-month patient-reported outcome measures, including Knee Society Score and Oxford Knee Score. The range of motion and incidence of arthrofibrosis were also captured for both cohorts. The rate of improvement of all patient-reported outcome scores and subscores and range of motion was significantly higher in the sensor-assisted cohort. The rate of arthrofibrosis was lower in the sensor-assisted cohort, but the difference was not statistically significant. The authors rejected the null hypothesis and concluded that the rate of improvement in objective, patient-reported outcome measures was higher in the sensor-assisted cohort than the manual cohort from preoperatively to 6 months postoperatively. [Orthopedics. 2017; 40(4):e648-e651.]. Copyright 2017, SLACK Incorporated.

  12. Sigsearch: a new term for post hoc unplanned search for statistically significant relationships with the intent to create publishable findings.

    PubMed

    Hashim, Muhammad Jawad

    2010-09-01

    Post-hoc secondary data analysis with no prespecified hypotheses has been discouraged by textbook authors and journal editors alike. Unfortunately no single term describes this phenomenon succinctly. I would like to coin the term "sigsearch" to define this practice and bring it within the teaching lexicon of statistics courses. Sigsearch would include any unplanned, post-hoc search for statistical significance using multiple comparisons of subgroups. It would also include data analysis with outcomes other than the prespecified primary outcome measure of a study as well as secondary data analyses of earlier research.

  13. Correlated randomness: Some examples of exotic statistical physics

    NASA Astrophysics Data System (ADS)

    Stanley, H. Eugene

    2005-05-01

    One challenge of biology, medicine, and economics is that the systems treated by these sciences have no perfect metronome in time and no perfect spatial architecture -- crystalline or otherwise. Nonetheless, as if by magic, out of nothing but randomness one finds remarkably fine-tuned processes in time and remarkably fine-tuned structures in space. To understand this `miracle', one might consider placing aside the human tendency to see the universe as a machine. Instead, one might address the challenge of uncovering how, through randomness (albeit, as we shall see, strongly correlated randomness), one can arrive at many spatial and temporal patterns in biology, medicine, and economics. Inspired by principles developed by statistical physics over the past 50 years -- scale invariance and universality -- we review some recent applications of correlated randomness to fields that might startle Boltzmann if he were alive today.

  14. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity

    PubMed Central

    Zhang, Pan; Moore, Cristopher

    2014-01-01

    Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory “communities” in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096
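
    To make the optimized quantity concrete, the sketch below computes the modularity Q of a toy partition with networkx; it is not the belief-propagation algorithm of the paper.

```python
# Not the paper's belief-propagation method: just the modularity score Q
# that it treats as a Hamiltonian, computed for the classic karate-club
# graph and its known two-way split.
import networkx as nx
from networkx.algorithms.community import modularity

G = nx.karate_club_graph()
split = [
    {n for n, d in G.nodes(data=True) if d["club"] == "Mr. Hi"},
    {n for n, d in G.nodes(data=True) if d["club"] != "Mr. Hi"},
]
print(f"Q = {modularity(G, split):.3f}")   # about 0.36 for this partition
```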

  15. Rock Statistics at the Mars Pathfinder Landing Site, Roughness and Roving on Mars

    NASA Technical Reports Server (NTRS)

    Haldemann, A. F. C.; Bridges, N. T.; Anderson, R. C.; Golombek, M. P.

    1999-01-01

    Several rock counts have been carried out at the Mars Pathfinder landing site producing consistent statistics of rock coverage and size-frequency distributions. These rock statistics provide a primary element of "ground truth" for anchoring remote sensing information used to pick the Pathfinder, and future, landing sites. The observed rock population statistics should also be consistent with the emplacement and alteration processes postulated to govern the landing site landscape. The rock population databases can however be used in ways that go beyond the calculation of cumulative number and cumulative area distributions versus rock diameter and height. Since the spatial parameters measured to characterize each rock are determined with stereo image pairs, the rock database serves as a subset of the full landing site digital terrain model (DTM). Insofar as a rock count can be carried out in a speedier, albeit coarser, manner than the full DTM analysis, rock counting offers several operational and scientific products in the near term. Quantitative rock mapping adds further information to the geomorphic study of the landing site, and can also be used for rover traverse planning. Statistical analysis of the surface roughness using the rock count proxy DTM is sufficiently accurate when compared to the full DTM to compare with radar remote sensing roughness measures, and with rover traverse profiles.

  16. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

    PubMed

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for
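
    A generic sketch of the test idea follows, assuming invented r, n and τ and using only Fisher's Z transformation; it is not the CorSig implementation itself.

```python
# A generic version of the abstract's test idea: H0: rho <= tau against
# H1: rho > tau via Fisher's Z transformation of the observed PCC r,
# whose standard error is 1/sqrt(n - 3). The r, n, tau values are invented.
import math
from scipy.stats import norm

def corsig_like_pvalue(r, n, tau):
    z = (math.atanh(r) - math.atanh(tau)) * math.sqrt(n - 3)
    return norm.sf(z)                      # one-sided upper-tail p-value

print(f"p = {corsig_like_pvalue(r=0.72, n=60, tau=0.5):.4f}")
```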

  17. Searching for statistically significant regulatory modules.

    PubMed

    Bailey, Timothy L; Noble, William Stafford

    2003-10-01

    The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules. We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to only accept motif occurrences with significance below a user-specified threshold, while still assigning better scores to motif occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module, but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruitfly and human. http://meme.sdsc.edu/MCAST/paper

  18. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2009-09-01

    The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea state heights; (ii) the existence of a turning point in the late 1970s in the annual-averaged trend of sea state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a timescale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and more generally in describing the significant wave field. The study shows the existence of decadal negative trends in the significant wave heights and by this it conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.

  19. SU-F-BRD-05: Dosimetric Comparison of Protocol-Based SBRT Lung Treatment Modalities: Statistically Significant VMAT Advantages Over Fixed- Beam IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Best, R; Harrell, A; Geesey, C

    2014-06-15

    Purpose: The purpose of this study is to compare flattened-field fixed-beam (FB) IMRT with flattening-filter-free (FFF) volumetric modulated arc therapy (VMAT) for stereotactic body radiation therapy (SBRT) and to identify statistically significant differences. Methods: SBRT plans using FB IMRT and FFF VMAT were generated for fifteen SBRT lung patients using 6 MV beams. For each patient, both IMRT and VMAT plans were created for comparison. Plans were generated utilizing RTOG 0915 (peripheral, 10 patients) and RTOG 0813 (medial, 5 patients) lung protocols. Target dose, critical structure dose, and treatment time were compared and tested for statistical significance. Parameters of interest included prescription isodose surface coverage, target dose heterogeneity, high dose spillage (location and volume), low dose spillage (location and volume), lung dose spillage, and critical structure maximum- and volumetric-dose limits. Results: For all criteria, we found equivalent or higher conformality with VMAT plans as well as reduced critical structure doses. Several differences passed a Student's t-test of significance: VMAT reduced the high dose spillage, evaluated with conformality index (CI), by an average of 9.4%±15.1% (p=0.030) compared to IMRT. VMAT plans reduced the lung volume receiving 20 Gy by 16.2%±15.0% (p=0.016) compared with IMRT. For the RTOG 0915 peripheral lesions, the volumes of lung receiving 12.4 Gy and 11.6 Gy were reduced by 27.0%±13.8% and 27.5%±12.6% (for both, p<0.001) in VMAT plans. Of the 26 protocol pass/fail criteria, VMAT plans were able to achieve an average of 0.2±0.7 (p=0.026) more constraints than the IMRT plans. Conclusions: FFF VMAT has dosimetric advantages over fixed beam IMRT for lung SBRT. Significant advantages included increased dose conformity and reduced organs-at-risk doses. The overall improvements in terms of protocol pass/fail criteria were more modest and will require more patient data to establish

  20. Funding source and primary outcome changes in clinical trials registered on ClinicalTrials.gov are associated with the reporting of a statistically significant primary outcome: a cross-sectional study.

    PubMed

    Ramagopalan, Sreeram V; Skingsley, Andrew P; Handunnetthi, Lahiru; Magnus, Daniel; Klingel, Michelle; Pakpoor, Julia; Goldacre, Ben

    2015-01-01

    We and others have shown that a significant proportion of interventional trials registered on ClinicalTrials.gov have their primary outcomes altered after the listed study start and completion dates. The objectives of this study were to investigate whether changes made to primary outcomes are associated with the likelihood of reporting a statistically significant primary outcome on ClinicalTrials.gov. A cross-sectional analysis of all interventional clinical trials registered on ClinicalTrials.gov as of 20 November 2014 was performed. The main outcome was any change made to the initially listed primary outcome and the time of the change in relation to the trial start and end date. 13,238 completed interventional trials were registered with ClinicalTrials.gov that also had study results posted on the website. 2555 (19.3%) had one or more statistically significant primary outcomes. Statistical analysis showed that registration year, funding source and primary outcome change after trial completion were associated with reporting a statistically significant primary outcome. Funding source and primary outcome change after trial completion are associated with a statistically significant primary outcome report on ClinicalTrials.gov.

  1. Significant lexical relationships

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedersen, T.; Kayaalp, M.; Bruce, R.

    Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
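
    The abstract names an exact conditional test without spelling it out; Fisher's exact test on a 2×2 contingency table is the best-known member of that family and is freely available, so the sketch below uses it on invented bigram counts purely for illustration.

```python
# An illustration with Fisher's exact test, a well-known exact conditional
# test; this is not necessarily the specific test of the paper, and the
# bigram counts below are invented.
from scipy.stats import fisher_exact

#                word2 present   word2 absent
table = [[12, 188],     # word1 present
         [240, 9560]]   # word1 absent
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, exact p = {p:.3g}")
```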

  2. Implementing statistical equating for MRCP(UK) Parts 1 and 2.

    PubMed

    McManus, I C; Chis, Liliana; Fox, Ray; Waller, Derek; Tang, Peter

    2014-09-26

    The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change. Analysis of data from the MRCP(UK) Part 1 exam from 2003 to 2013 and the Part 2 exam from 2005 to 2013. Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation probably due to stronger candidates taking the examination earlier. Pass rates seemed to have increased in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating. Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well-founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. Statistical equating provides a robust standard-setting method, with a better

  3. Time to first treatment: The significance of early treatment of exudative age-related macular degeneration.

    PubMed

    Rauch, Renate; Weingessel, Birgit; Maca, Saskia M; Vecsei-Marlovits, Pia V

    2012-07-01

    To determine whether the time span between initial symptoms and treatment with ranibizumab in patients with neovascular age-related macular degeneration has an effect on visual outcome. In this retrospective study, 45 patients with exudative age-related macular degeneration were split into 3 groups depending on the duration of visual symptoms--Group I: <1 month, Group II: 1 month to 6 months, and Group III: >6 months. Best-corrected visual acuity, clinical ophthalmologic examination, and central retinal thickness as measured by optical coherence tomography were recorded at baseline and 2 months later. Fluorescein angiography was performed at baseline. Treatment consisted of 2 intravitreal injections of 1.25 mg of ranibizumab at baseline and after 4 weeks. The mean time span between initial symptoms and treatment was 59 ± 62 days. In all groups, a reduction of retinal thickness was observed. Shorter disease duration, as estimated by persistence of visual symptoms, was correlated with a better visual outcome after treatment. Patients in Group I demonstrated a significant increase in best-corrected visual acuity (P = 0.007). Patients of Group II (P = 0.095) and Group III (P = 0.271) still achieved a visual improvement in best-corrected visual acuity, albeit not significant. The mean change in best-corrected visual acuity was 0.08 ± 0.1 in all patients and was not statistically significant between groups (P = 0.87). Duration of visual symptoms <1 month before treatment is associated with a better visual outcome. Treatment of new-onset wet age-related macular degeneration should be initiated as soon as possible.

  4. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409

  5. A systematic review and meta-analysis of acute stroke unit care: What’s beyond the statistical significance?

    PubMed Central

    2013-01-01

    Background The benefits of stroke unit care in terms of reducing death, dependency and institutional care were demonstrated in a 2009 Cochrane review carried out by the Stroke Unit Trialists' Collaboration. Methods As requested by the Belgian health authorities, a systematic review and meta-analysis of the effect of acute stroke units was performed. Clinical trials mentioned in the original Cochrane review were included. In addition, an electronic database search on Medline, Embase, the Cochrane Central Register of Controlled Trials, and Physiotherapy Evidence Database (PEDro) was conducted to identify trials published since 2006. Trials investigating acute stroke units compared to alternative care were eligible for inclusion. Study quality was appraised according to the criteria recommended by Scottish Intercollegiate Guidelines Network (SIGN) and the GRADE system. In the meta-analysis, dichotomous outcomes were estimated by calculating odds ratios (OR) and continuous outcomes were estimated by calculating standardized mean differences. The weight of a study was calculated based on inverse variance. Results Evidence from eight trials comparing acute stroke unit and conventional care (general medical ward) was retained for the main synthesis and analysis. The findings from this study were broadly in line with the original Cochrane review: acute stroke units can improve survival and independency, as well as reduce the chance of hospitalization and the length of inpatient stay. The effect of stroke unit care on mortality was less conclusive and only reached a borderline level of significance (OR 0.84, 95% CI 0.70 to 1.00, P = 0.05). This improvement became statistically non-significant (OR 0.87, 95% CI 0.74 to 1.03, P = 0.12) when data from two unpublished trials (Goteborg-Ostra and Svendborg) were added to the analysis. After also adding two further trials (Beijing, Stockholm) with very short observation periods (until discharge), the
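
    The inverse-variance weighting mentioned above is compact enough to sketch; the odds ratios and confidence intervals below are invented, not the stroke-unit trial results.

```python
# A sketch of fixed-effect inverse-variance pooling of log odds ratios,
# illustrating the weighting described in the abstract. The (OR, lower CI,
# upper CI) triples are made up for demonstration.
import math

def pooled_or(trials):
    num = den = 0.0
    for or_, lo, hi in trials:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from 95% CI
        w = 1.0 / se ** 2                                # inverse variance
        num += w * math.log(or_)
        den += w
    return math.exp(num / den)

trials = [(0.80, 0.60, 1.07), (0.90, 0.70, 1.16), (0.75, 0.50, 1.12)]
print(f"pooled OR = {pooled_or(trials):.2f}")
```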

  6. Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains.

    PubMed

    Singer, Meromit; Engström, Alexander; Schönhuth, Alexander; Pachter, Lior

    2011-09-23

    Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account. We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.

  7. Time series, periodograms, and significance

    NASA Astrophysics Data System (ADS)

    Hernandez, G.

    1999-05-01

    The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements, present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly-spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical

  8. Statistical Modeling of Occupational Exposure to Polycyclic Aromatic Hydrocarbons Using OSHA Data.

    PubMed

    Lee, Derrick G; Lavoué, Jérôme; Spinelli, John J; Burstyn, Igor

    2015-01-01

    Polycyclic aromatic hydrocarbons (PAHs) are a group of pollutants with multiple variants classified as carcinogenic. The Occupational Safety and Health Administration (OSHA) provided access to two PAH exposure databanks of United States workplace compliance testing data collected between 1979 and 2010. Mixed-effects logistic models were used to predict the exceedance fraction (EF), i.e., the probability of exceeding OSHA's Permissible Exposure Limit (PEL = 0.2 mg/m³) for PAHs based on industry and occupation. Measurements of coal tar pitch volatiles were used as a surrogate for PAHs. Time, databank, occupation, and industry were included as fixed effects while an identifier for the compliance inspection number was included as a random effect. Analyses involved 2,509 full-shift personal measurements. Results showed that the majority of industries had an estimated EF < 0.5, although several industries, including Standard Industrial Classification (SIC) codes 1623 (Water, Sewer, Pipeline, and Communication and Powerline Construction), 1711 (Plumbing, Heating, and Air-Conditioning), 2824 (Manmade Organic Fibres), 3496 (Misc. Fabricated Wire products), and 5812 (Eating Places), and Major Groups 13 (Oil and Gas Extraction) and 30 (Rubber and Miscellaneous Plastic Products), were estimated to have more than an 80% likelihood of exceeding the PEL. There was an inverse temporal trend of exceeding the PEL, with lower risk in most recent years, albeit not statistically significant. Similar results were shown when incorporating occupation, but varied depending on the occupation as the majority of industries predicted at the administrative level, e.g., managers, had an estimated EF < 0.5 while at the minimally skilled/laborer level there was a substantial increase in the estimated EF. These statistical models allow the prediction of PAH exposure risk through individual occupational histories and will be used to create a job-exposure matrix for use in a population-based case
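
    For intuition about what an exceedance fraction is, the sketch below evaluates the closed form under an assumed lognormal exposure distribution; the study itself estimated EF with mixed-effects logistic regression, and the GM/GSD values here are invented.

```python
# A closed-form illustration, not the study's model: assuming lognormal
# exposures, EF = P(X > PEL) = 1 - Phi((ln PEL - ln GM) / ln GSD).
# The geometric mean (GM) and geometric standard deviation (GSD) are
# hypothetical; PEL is OSHA's 0.2 mg/m^3 limit quoted in the abstract.
import math
from scipy.stats import norm

def exceedance_fraction(gm, gsd, pel=0.2):
    return norm.sf((math.log(pel) - math.log(gm)) / math.log(gsd))

print(f"EF = {exceedance_fraction(gm=0.08, gsd=2.5):.2f}")   # -> 0.16
```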

  9. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its application to high-throughput time series data analysis, e.g., data from next-generation sequencing-based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

  10. Consomic mouse strain selection based on effect size measurement, statistical significance testing and integrated behavioral z-scoring: focus on anxiety-related behavior and locomotion.

    PubMed

    Labots, M; Laarakker, M C; Ohl, F; van Lith, H A

    2016-06-29

    Selecting chromosome substitution strains (CSSs, also called consomic strains/lines) used in the search for quantitative trait loci (QTLs) consistently requires the identification of the respective phenotypic trait of interest and is simply based on a significant difference between a consomic and host strain. However, statistical significance as represented by P values does not necessarily predicate practical importance. We therefore propose a method that pays attention to both the statistical significance and the actual size of the observed effect. The present paper extends on this approach and describes in more detail the use of effect size measures (Cohen's d, partial eta squared ηp²) together with the P value as statistical selection parameters for the chromosomal assignment of QTLs influencing anxiety-related behavior and locomotion in laboratory mice. The effect size measures were based on integrated behavioral z-scoring and were calculated in three experiments: (A) a complete consomic male mouse panel with A/J as the donor strain and C57BL/6J as the host strain. This panel, including host and donor strains, was analyzed in the modified Hole Board (mHB). The consomic line with chromosome 19 from A/J (CSS-19A) was selected since it showed increased anxiety-related behavior, but similar locomotion compared to its host. (B) Following experiment A, female CSS-19A mice were compared with their C57BL/6J counterparts; however, no significant differences were found, and effect sizes were close to zero. (C) A different consomic mouse strain (CSS-19PWD), with chromosome 19 from PWD/PhJ transferred on the genetic background of C57BL/6J, was compared with its host strain. Here, in contrast with CSS-19A, overall anxiety was decreased in CSS-19PWD males compared to C57BL/6J, but locomotion was not. This new method shows an improved way to identify CSSs for QTL analysis for anxiety-related behavior using a combination of statistical significance testing and effect
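
    A minimal sketch of the two selection parameters named above follows, computed on invented z-scores with hypothetical ANOVA sums of squares.

```python
# A sketch of the two effect size measures from the abstract: Cohen's d for
# a consomic-versus-host comparison and partial eta squared from ANOVA sums
# of squares. The z-scores and SS values are invented.
import numpy as np

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled = ((na - 1) * np.var(a, ddof=1) +
              (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled)

def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

host = np.array([0.1, -0.3, 0.2, 0.0, -0.1, 0.4])
consomic = np.array([0.9, 0.6, 1.1, 0.7, 0.8, 1.2])
print(f"Cohen's d = {cohens_d(consomic, host):.2f}")
print(f"partial eta^2 = {partial_eta_squared(4.2, 6.3):.2f}")
```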

  11. Antecedents of students' achievement in statistics

    NASA Astrophysics Data System (ADS)

    Awaludin, Izyan Syazana; Razak, Ruzanna Ab; Harris, Hezlin; Selamat, Zarehan

    2015-02-01

    The applications of statistics in most fields have been vast. Many degree programmes at local universities require students to enroll in at least one statistics course. The standard of these courses varies across different degree programmes. This is because of students' diverse academic backgrounds, some of which are far from the field of statistics. The high failure rate in statistics courses among non-science-stream students has been a concern every year. The purpose of this research is to investigate the antecedents of students' achievement in statistics. A total of 272 students participated in the survey. Multiple linear regression was applied to examine the relationship between the factors and achievement. We found that statistics anxiety was a significant predictor of students' achievement. We also found that students' age has a significant effect on achievement. Older students are more likely to achieve lower scores in statistics. Students' level of study also has a significant impact on their achievement in statistics.

  12. Worry, Intolerance of Uncertainty, and Statistics Anxiety

    ERIC Educational Resources Information Center

    Williams, Amanda S.

    2013-01-01

    Statistics anxiety is a problem for most graduate students. This study investigates the relationship between intolerance of uncertainty, worry, and statistics anxiety. Intolerance of uncertainty was significantly related to worry, and worry was significantly related to three types of statistics anxiety. Six types of statistics anxiety were…

  13. Reducing statistics anxiety and enhancing statistics learning achievement: effectiveness of a one-minute strategy.

    PubMed

    Chiou, Chei-Chang; Wang, Yu-Min; Lee, Li-Tze

    2014-08-01

    Statistical knowledge is widely used in academia; however, statistics teachers struggle with the issue of how to reduce students' statistics anxiety and enhance students' statistics learning. This study assesses the effectiveness of a "one-minute paper strategy" in reducing students' statistics-related anxiety and in improving students' statistics-related achievement. Participants were 77 undergraduates from two classes enrolled in applied statistics courses. An experiment was implemented according to a pretest/posttest comparison group design. The quasi-experimental design showed that the one-minute paper strategy significantly reduced students' statistics anxiety and improved students' statistics learning achievement. The strategy was a better instructional tool than the textbook exercise for reducing students' statistics anxiety and improving students' statistics achievement.

  14. Influences on Labor Market Outcomes of African American College Graduates: A National Study

    ERIC Educational Resources Information Center

    Strayhorn, Terrell L.

    2008-01-01

    Using an expanded econometric model, this study sought to estimate more precisely the net effect of independent variables (i.e., attending an HBCU) on three measures of labor market outcomes for African American college graduates. Findings reveal a statistically significant, albeit moderate, relationship between measures of background, human and…

  15. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    USGS Publications Warehouse

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events has doubled and all of them become exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  16. A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

    PubMed

    Liu, Wei; Ding, Jinhui

    2018-04-01

    Application of the intention-to-treat (ITT) principle to the analysis of clinical trials is challenged by missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about the missing-data mechanism for all clinical trials because drug responses are mediated by complex biological networks, so data may be missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.

  17. Null-space and statistical significance of first-arrival traveltime inversion

    NASA Astrophysics Data System (ADS)

    Morozov, Igor B.

    2004-03-01

    The strong uncertainty inherent in the traveltime inversion of first arrivals from surface sources is usually removed by using a priori constraints or regularization. This leads to the null-space (data-independent model variability) being inadequately sampled, and consequently, model uncertainties may be underestimated in traditional (such as checkerboard) resolution tests. To measure the full null-space model uncertainties, we use unconstrained Monte Carlo inversion and examine the statistics of the resulting model ensembles. In an application to 1-D first-arrival traveltime inversion, the τ-p method is used to build a set of models that are equivalent to the IASP91 model within small, ~0.02 per cent, time deviations. The resulting velocity variances are much larger, ~2-3 per cent within the regions above the mantle discontinuities, and are interpreted as being due to the null-space. Depth-variant depth averaging is required for constraining the velocities within meaningful bounds, and the averaging scalelength could also be used as a measure of depth resolution. Velocity variances show structure-dependent, negative correlation with the depth-averaging scalelength. Neither the smoothest (Herglotz-Wiechert) nor the mean velocity-depth functions reproduce the discontinuities in the IASP91 model; however, the discontinuities can be identified by the increased null-space velocity (co-)variances. Although derived for a 1-D case, the above conclusions also relate to higher dimensions.

  18. Twenty Five Years of Cognitive Care Education Research: Time for a Revolutionary Change

    ERIC Educational Resources Information Center

    Porter, Russell; Berry, Jeremy; Cude, Kellie; Anderson, Stephen; Britt, Sanfrena

    2018-01-01

    This is the third study of Cognitive Care Education in New York State nursing homes using cross-sectional methods over a 25 year period. The data indicate that the Cognitive Care Education increased at statistically significant levels, albeit by evolutionary means. It is now time for "A Revolutionary Change," for Cognitive Care…

  19. Analysis of variance to assess statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes.

    PubMed

    Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G

    2017-07-01

    Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.
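
    A sketch of such a full factorial ANOVA follows, run with statsmodels on a fabricated data frame (not the study's finite element output) to show the shape of the analysis.

```python
# A sketch of a full factorial ANOVA of the kind described: a Laplacian
# error response against inter-ring distances (categorical) and two
# numerical factors. All data below are fabricated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
n = 80
df = pd.DataFrame({
    "distances": rng.choice(["constant", "increasing"], n),
    "diameter": rng.choice([1.0, 2.0, 3.0, 4.0], n),
    "rings": rng.choice([2, 3, 4, 5], n),
})
df["relative_error"] = (0.50
                        - 0.10 * (df["distances"] == "increasing")
                        - 0.02 * df["rings"]
                        + rng.normal(0.0, 0.05, n))

model = smf.ols("relative_error ~ C(distances) * diameter * rings", df).fit()
print(anova_lm(model, typ=2))   # F and p for main effects and interactions
```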

  20. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

    PubMed

    Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

    2015-03-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.
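    An analogous phase-comparison analysis for a single-case AB design can be sketched in Python rather than SPSS; the data, phase lengths, and the trend-plus-phase regression below are illustrative stand-ins, not the authors' exact procedure:

    ```python
    # Simulated AB data; phase lengths, scores, and the regression model are
    # illustrative stand-ins for the article's SPSS steps.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    scores = np.concatenate([rng.normal(20, 2, 8),    # 8 baseline sessions
                             rng.normal(14, 2, 12)])  # 12 intervention sessions
    df = pd.DataFrame({"score": scores,
                       "time": np.arange(scores.size),
                       "phase": [0] * 8 + [1] * 12})

    # Is the intervention phase effect significant after allowing a time trend?
    fit = smf.ols("score ~ time + phase", data=df).fit()
    print(fit.params, fit.pvalues, sep="\n")
    ```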

  1. Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

    NASA Astrophysics Data System (ADS)

    Di Florio, Adriano

    2017-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which Wilks' theorem may or may not apply because its regularity conditions are not satisfied.
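    A scaled-down version of the toy Monte Carlo idea, assuming a simple nested Gaussian-mean hypothesis where Wilks' theorem holds; the CMS analysis itself involves far more complex likelihoods:

    ```python
    # Gaussian-mean toy: the null (mean = 0) is nested in the alternative
    # (free mean), so -2*ln(LR) should follow a chi-squared with 1 dof.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_toys, n_events = 5000, 200
    q = np.empty(n_toys)
    for i in range(n_toys):
        x = rng.normal(0.0, 1.0, n_events)    # generate under the null
        ll_alt = stats.norm.logpdf(x, loc=x.mean(), scale=1.0).sum()
        ll_null = stats.norm.logpdf(x, loc=0.0, scale=1.0).sum()
        q[i] = 2.0 * (ll_alt - ll_null)

    # Fraction of toys above the chi2 95% quantile should be close to 0.05.
    print(np.mean(q > stats.chi2.ppf(0.95, df=1)))
    ```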

  2. Massage induces an immediate, albeit short-term, reduction in muscle stiffness.

    PubMed

    Eriksson Crommert, M; Lacourpaille, L; Heales, L J; Tucker, K; Hug, F

    2015-10-01

    Using ultrasound shear wave elastography, the aims of this study were: (a) to evaluate the effect of massage on stiffness of the medial gastrocnemius (MG) muscle and (b) to determine whether this effect (if any) persists over a short period of rest. A 7-min massage protocol was performed unilaterally on MG in 18 healthy volunteers. Measurements of muscle shear elastic modulus (stiffness) were performed bilaterally (control and massaged leg) in a moderately stretched position at three time points: before massage (baseline), directly after massage (follow-up 1), and following 3 min of rest (follow-up 2). Directly after massage, participants rated pain experienced during the massage. MG shear elastic modulus of the massaged leg decreased significantly at follow-up 1 (-5.2 ± 8.8%, P = 0.019, d = -0.66). There was no difference between follow-up 2 and baseline for the massaged leg (P = 0.83) indicating that muscle stiffness returned to baseline values. Shear elastic modulus was not different between time points in the control leg. There was no association between perceived pain during the massage and stiffness reduction (r = 0.035; P = 0.89). This is the first study to provide evidence that massage reduces muscle stiffness. However, this effect is short lived and returns to baseline values quickly after cessation of the massage. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  3. Statistical Significance and Baseline Monitoring.

    DTIC Science & Technology

    1984-07-01

    Observed versus nominal α levels for multivariate tests of data sets (50 runs of 4 groups each)… The cumulative proportion of the observations was found for each nominal level. The results of the comparisons of the observed versus nominal α levels show that observed α values are always higher than the nominal levels. Virtually all nominal α levels are below 0.20. In other words, the discriminant analysis models…

  4. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  5. Teaching statistics in biology: using inquiry-based learning to strengthen understanding of statistical analysis in biology laboratory courses.

    PubMed

    Metz, Anneke M

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.

  6. When is Chemical Similarity Significant? The Statistical Distribution of Chemical Similarity Scores and Its Extreme Values

    PubMed Central

    Baldi, Pierre

    2010-01-01

    As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in large databases. The framework can be applied to different chemical representations and similarity measures but is demonstrated here using the most common binary fingerprints with the Tanimoto similarity measure. After introducing several probabilistic models of fingerprints, including the Conditional Gaussian Uniform model, we show that the distribution of Tanimoto scores can be approximated by the distribution of the ratio of two correlated Normal random variables associated with the corresponding unions and intersections. This remains true also when the distribution of similarity scores is conditioned on the size of the query molecules in order to derive more fine-grained results and improve chemical retrieval. The corresponding extreme value distributions for the maximum scores are approximated by Weibull distributions. From these various distributions and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework also allows one to predict the value of standard chemical retrieval metrics, such as Sensitivity and Specificity at fixed thresholds, or ROC (Receiver Operating Characteristic) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments carried out in part with large sets of molecules from the ChemDB show remarkable agreement between theory and empirical results. PMID:20540577
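    A minimal sketch of the basic quantities being modeled, with random sparse binary fingerprints standing in for real ones and an empirical z-score in place of the paper's analytical derivations:

    ```python
    # Random sparse fingerprints stand in for real ones; the z-score is
    # empirical rather than derived from the paper's analytical models.
    import numpy as np

    rng = np.random.default_rng(7)
    n_bits, n_db = 1024, 10000
    db = rng.random((n_db, n_bits)) < 0.1     # database of binary fingerprints
    query = rng.random(n_bits) < 0.1          # query fingerprint

    inter = (db & query).sum(axis=1)                  # |A intersect B|
    union = db.sum(axis=1) + query.sum() - inter      # |A union B|
    tanimoto = inter / union

    best = tanimoto.max()
    z = (best - tanimoto.mean()) / tanimoto.std()
    print(f"best Tanimoto = {best:.3f}, empirical z-score = {z:.1f}")
    ```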

  7. Meteor trail footprint statistics

    NASA Astrophysics Data System (ADS)

    Mui, S. Y.; Ellicott, R. C.

    Footprint statistics derived from field-test data are presented. The statistics are the probability that two receivers will lie in the same footprint. The dependence of the footprint statistics on the transmitter range, link orientation, and antenna polarization are examined. Empirical expressions for the footprint statistics are presented. The need to distinguish the instantaneous footprint, which is the area illuminated at a particular instant, from the composite footprint, which is the total area illuminated during the lifetime of the meteor trail, is explained. The statistics for the instantaneous and composite footprints have been found to be similar. The only significant difference lies in the parameter that represents the probability of two colocated receivers being in the same footprint. The composite footprint statistics can be used to calculate the space diversity gain of a multiple-receiver system. The instantaneous footprint statistics are useful in the evaluation of the interference probability in a network of meteor burst communication nodes.

  8. Statistical lamb wave localization based on extreme value theory

    NASA Astrophysics Data System (ADS)

    Harley, Joel B.

    2018-04-01

    Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represents an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant false alarm rate statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increases. With seventeen sensors, both methods successfully detect damage with statistical significance.
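    A simplified delay-and-sum localization sketch, assuming idealized Gaussian echo envelopes and a known wave speed; the array geometry and all parameters are illustrative:

    ```python
    # Idealized setup: four sensors on a 0.5 m plate, one scatterer, Gaussian
    # echo envelopes, known wave speed. All parameters are illustrative.
    import numpy as np

    c, fs = 5000.0, 1e6                      # wave speed (m/s), sample rate (Hz)
    sensors = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
    target = np.array([0.3, 0.2])
    pairs = [(i, j) for i in range(4) for j in range(4) if i != j]

    t = np.arange(2048) / fs
    signals = {}
    for i, j in pairs:                       # synthetic scattered signals
        tof = (np.linalg.norm(sensors[i] - target)
               + np.linalg.norm(target - sensors[j])) / c
        signals[i, j] = np.exp(-((t - tof) * 2e4) ** 2)

    xs = ys = np.linspace(0, 0.5, 51)        # delay-and-sum image grid
    image = np.zeros((ys.size, xs.size))
    for iy, y in enumerate(ys):
        for ix, x in enumerate(xs):
            p = np.array([x, y])
            for i, j in pairs:
                tof = (np.linalg.norm(sensors[i] - p)
                       + np.linalg.norm(p - sensors[j])) / c
                image[iy, ix] += signals[i, j][int(tof * fs)]

    iy, ix = np.unravel_index(image.argmax(), image.shape)
    print("estimated damage location:", xs[ix], ys[iy])
    ```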

  9. A Technology-Based Statistical Reasoning Assessment Tool in Descriptive Statistics for Secondary School Students

    ERIC Educational Resources Information Center

    Chan, Shiau Wei; Ismail, Zaleha

    2014-01-01

    The focus of assessment in statistics has gradually shifted from traditional assessment towards alternative assessment where more attention has been paid to the core statistical concepts such as center, variability, and distribution. In spite of this, there are comparatively few assessments that combine the significant three types of statistical…

  10. How to read a paper. Statistics for the non-statistician. II: "Significant" relations and their pitfalls.

    PubMed

    Greenhalgh, T

    1997-08-16

    It is possible to be seriously misled by taking the statistical competence (and/or the intellectual honesty) of authors for granted. Some common errors committed (deliberately or inadvertently) by the authors of papers are given in the final box.

  11. How to read a paper. Statistics for the non-statistician. II: "Significant" relations and their pitfalls.

    PubMed Central

    Greenhalgh, T.

    1997-01-01

    It is possible to be seriously misled by taking the statistical competence (and/or the intellectual honesty) of authors for granted. Some common errors committed (deliberately or inadvertently) by the authors of papers are given in the final box. PMID:9277611

  12. Statistics Anxiety among Postgraduate Students

    ERIC Educational Resources Information Center

    Koh, Denise; Zawi, Mohd Khairi

    2014-01-01

    Most postgraduate programmes, that have research components, require students to take at least one course of research statistics. Not all postgraduate programmes are science based, there are a significant number of postgraduate students who are from the social sciences that will be taking statistics courses, as they try to complete their…

  13. Statistics Using Just One Formula

    ERIC Educational Resources Information Center

    Rosenthal, Jeffrey S.

    2018-01-01

    This article advocates that introductory statistics be taught by basing all calculations on a single simple margin-of-error formula and deriving all of the standard introductory statistical concepts (confidence intervals, significance tests, comparisons of means and proportions, etc.) from that one formula. It is argued that this approach will…

  14. Will health fund rationalisation lead to significant premium reductions?

    PubMed

    Hanning, Brian

    2003-01-01

    It has been suggested that rationalisation of health funds will generate significant, albeit unquantified, cost savings and thus hold or reduce health fund premiums. 2001-02 Private Health Industry Administration Council (PHIAC) data have been used to analyse these suggestions. Payments by funds for clinical services will not vary after fund rationalisation; the savings would arise from reductions in management expenses, which form 10.9% of total fund expenditure. A number of rationalisation scenarios are considered. The highest theoretical industry-wide saving found in any plausible scenario is 2.5%, and it is uncertain whether this level of saving could be achieved in practice. Even if a one-off saving of this order were achieved, it would have no medium- or long-term impact on fund premium increases, given that funds face cost increases of 4% to 5% per annum due to demographic changes and age-standardised utilisation increases. It is suggested that discussions on fund amalgamation divert attention from the major factors increasing fund costs, which are substantially beyond fund control.

  15. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

    PubMed

    Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
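    The core of such an automated consistency check can be sketched in a few lines; the version below handles only a two-tailed t test and uses an illustrative rounding tolerance (the authors used a dedicated procedure):

    ```python
    # Only the two-tailed t test case; the rounding tolerance is illustrative.
    from scipy import stats

    def check_t_report(t_value, df, reported_p, tol=0.01):
        recomputed = 2 * stats.t.sf(abs(t_value), df)
        return recomputed, abs(recomputed - reported_p) <= tol

    # "t(28) = 2.20, p = .01" is inconsistent; the correct p is about .036.
    p, ok = check_t_report(2.20, 28, 0.01)
    print(f"recomputed p = {p:.3f}, consistent = {ok}")
    ```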

  16. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.
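    A worked example of several of the reviewed diagnostic-test quantities, computed from an invented 2x2 table of counts:

    ```python
    # Counts are invented: 100 diseased and 200 healthy subjects.
    tp, fn = 90, 10       # diseased: test positive / test negative
    fp, tn = 30, 170      # healthy:  test positive / test negative

    sensitivity = tp / (tp + fn)               # 0.90
    specificity = tn / (tn + fp)               # 0.85
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio, 6.0
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio, ~0.12
    print(sensitivity, specificity, lr_pos, lr_neg)
    ```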

  17. Forest statistics of western Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1950-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for the western region of Kentucky. Similar reports for the remainder of the state will be published as soon as statistical tabulations are completed. Later, an analytical report for the state will be published which will interpret forest area, timber volume,...

  18. Forest statistics of southern Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1951-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for each of the three regions of southern Indiana. A similar report will be published for the two northern Indiana regions. Later, an analytical report for the state will be published which will interpret statistics on forest area, timber- volume, growth, and...

  19. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomic studies.

    PubMed

    Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos

    2005-01-01

    Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
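    A sketch of the PACE idea under simplifying assumptions: simulated profile data, and a nearest-centroid classifier with leave-one-out error standing in for whatever model is being validated:

    ```python
    # Simulated profiles and a deliberately simple classifier; PACE itself
    # wraps whatever classification model is under study.
    import numpy as np

    rng = np.random.default_rng(3)

    def loo_error(X, y):
        """Leave-one-out error of a nearest-centroid classifier."""
        errs = 0
        for i in range(len(y)):
            mask = np.arange(len(y)) != i
            m0 = X[mask][y[mask] == 0].mean(axis=0)
            m1 = X[mask][y[mask] == 1].mean(axis=0)
            pred = int(np.linalg.norm(X[i] - m1) < np.linalg.norm(X[i] - m0))
            errs += pred != y[i]
        return errs / len(y)

    X = rng.normal(size=(40, 50))            # 40 samples, 50 features
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 0.8                     # weak class signal

    ace = loo_error(X, y)                    # achieved classification error
    perm = [loo_error(X, rng.permutation(y)) for _ in range(200)]
    print(f"ACE = {ace:.2f}, permutation p = {np.mean(np.array(perm) <= ace):.3f}")
    ```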

  20. Subharmonic Imaging and Pressure Estimation for Monitoring Neoadjuvant Chemotherapy

    DTIC Science & Technology

    2015-11-01

    Ultrasound contrast agents were used to improve the monitoring of breast cancer treatment response to neoadjuvant therapies in women diagnosed with LABC by imaging… estimation (SHAPE). Software for analyzing RF data from a Logiq 9 ultrasound scanner (GE Healthcare, Milwaukee, WI) to produce 3D SHAPE pressure… responders, albeit not statistically significant (p > 0.19). Subject terms: breast cancer, ultrasound imaging, ultrasound contrast agent, pressure.

  1. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the quality of statistics and operational research groups and the quantity of researchers in them. As in other academic disciplines, we find evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  2. BETTER STATISTICS FOR BETTER DECISIONS: REJECTING NULL HYPOTHESES STATISTICAL TESTS IN FAVOR OF REPLICATION STATISTICS

    PubMed Central

    SANABRIA, FEDERICO; KILLEEN, PETER R.

    2008-01-01

    Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose prep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766
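    One commonly cited approximation converts a two-tailed p-value to prep via the normal distribution; a sketch, assuming this approximation from the literature rather than the only estimator:

    ```python
    # Killeen's approximation from a two-tailed p-value; one approximation
    # from the literature, not the only estimator of p_rep.
    from scipy import stats

    def p_rep(p_two_tailed):
        z = stats.norm.isf(p_two_tailed / 2)   # z-score of the observed effect
        return stats.norm.cdf(z / 2 ** 0.5)    # prob. of a same-sign replication

    print(f"{p_rep(0.05):.3f}")   # about 0.917 for p = .05
    ```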

  3. Ergodic Theory, Interpretations of Probability and the Foundations of Statistical Mechanics

    NASA Astrophysics Data System (ADS)

    van Lith, Janneke

    The traditional use of ergodic theory in the foundations of equilibrium statistical mechanics is that it provides a link between thermodynamic observables and microcanonical probabilities. First of all, the ergodic theorem demonstrates the equality of microcanonical phase averages and infinite time averages (albeit for a special class of systems, and up to a measure zero set of exceptions). Secondly, one argues that actual measurements of thermodynamic quantities yield time averaged quantities, since measurements take a long time. The combination of these two points is held to be an explanation why calculating microcanonical phase averages is a successful algorithm for predicting the values of thermodynamic observables. It is also well known that this account is problematic. This survey intends to show that ergodic theory nevertheless may have important roles to play, and it explores three other uses of ergodic theory. Particular attention is paid, firstly, to the relevance of specific interpretations of probability, and secondly, to the way in which the concern with systems in thermal equilibrium is translated into probabilistic language. With respect to the latter point, it is argued that equilibrium should not be represented as a stationary probability distribution as is standardly done; instead, a weaker definition is presented.

  4. Lessons from Inferentialism for Statistics Education

    ERIC Educational Resources Information Center

    Bakker, Arthur; Derry, Jan

    2011-01-01

    This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…

  5. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science

    PubMed Central

    Veldkamp, Coosje L. S.; Nuijten, Michèle B.; Dominguez-Alvarez, Linda; van Assen, Marcel A. L. M.; Wicherts, Jelte M.

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors. PMID:25493918

  6. Lupus nephritis remission, albeit with positive anti-doping test.

    PubMed

    Jakez-Ocampo, Juan; Richaud-Patin, Yvonne; Llorente, Luis

    2007-01-01

    A 39-year-old woman developed systemic lupus erythematosus with nephropathy after a holiday in Jamaica. She was prescribed prednisone, azathioprine, and aspirin. As she was obsessed with aesthetic procedures, she decided not to take the prescription. Instead, she followed her bodybuilding trainer's advice to take one intramuscular injection of stanozolol for 10 weeks in order to increase her gluteus area. One week after finishing the latter regimen, there was no disease activity. Whether lupus remission in this patient was spontaneous or a consequence of stanozolol administration will remain a riddle, despite the fortunate outcome.

  7. Pisces did not have increased heart failure: data-driven comparisons of binary proportions between levels of a categorical variable can result in incorrect statistical significance levels.

    PubMed

    Austin, Peter C; Goldwasser, Meredith A

    2008-03-01

    We examined the impact on statistical inference when a χ2 test is used to compare the proportion of successes in the level of a categorical variable that has the highest observed proportion of successes with the proportion of successes in all other levels of the categorical variable combined. Monte Carlo simulations and a case study examining the association between astrological sign and hospitalization for heart failure. A standard χ2 test results in an inflation of the type I error rate, with the type I error rate increasing as the number of levels of the categorical variable increases. Using a standard χ2 test, the hospitalization rate for Pisces was statistically significantly different from that of the other 11 astrological signs combined (P=0.026). After accounting for the fact that the selection of Pisces was based on it having the highest observed proportion of heart failure hospitalizations, subjects born under the sign of Pisces no longer had a significantly higher rate of heart failure hospitalization compared to the other residents of Ontario (P=0.152). Post hoc comparisons of the proportions of successes across different levels of a categorical variable can result in incorrect inferences.
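    A Monte Carlo sketch of the inflation described, assuming twelve categories sharing one true success rate and selecting the highest observed proportion post hoc (all parameters are illustrative):

    ```python
    # Twelve categories share one true rate; testing the post hoc selected
    # top category against the rest inflates the nominal 5% type I error.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    n_sims, k, n_per, rate = 2000, 12, 500, 0.1
    false_pos = 0
    for _ in range(n_sims):
        successes = rng.binomial(n_per, rate, size=k)
        top = successes.argmax()
        rest = successes.sum() - successes[top]
        table = [[successes[top], n_per - successes[top]],
                 [rest, (k - 1) * n_per - rest]]
        false_pos += stats.chi2_contingency(table)[1] < 0.05
    print(f"observed type I error: {false_pos / n_sims:.3f}")  # well above 0.05
    ```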

  8. The Statistical Power of Planned Comparisons.

    ERIC Educational Resources Information Center

    Benton, Roberta L.

    Basic principles underlying statistical power are examined; and issues pertaining to effect size, sample size, error variance, and significance level are highlighted via the use of specific hypothetical examples. Analysis of variance (ANOVA) and related methods remain popular, although other procedures sometimes have more statistical power against…

  9. Understanding Statistics - Cancer Statistics

    Cancer.gov

    Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.

  10. P-Value Club: Teaching Significance Level on the Dance Floor

    ERIC Educational Resources Information Center

    Gray, Jennifer

    2010-01-01

    Courses: Beginning research methods and statistics courses, as well as advanced communication courses that require reading research articles and completing research projects involving statistics. Objective: Students will understand the difference between significant and nonsignificant statistical results based on p-value.

  11. Confounding and Statistical Significance of Indirect Effects: Childhood Adversity, Education, Smoking, and Anxious and Depressive Symptomatology.

    PubMed

    Sheikh, Mashhood Ahmed

    2017-01-01

    association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant ( p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation.

  12. Confounding and Statistical Significance of Indirect Effects: Childhood Adversity, Education, Smoking, and Anxious and Depressive Symptomatology

    PubMed Central

    Sheikh, Mashhood Ahmed

    2017-01-01

    association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant (p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation. PMID:28824498

  13. [Big data in official statistics].

    PubMed

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  14. Simple Statistics: - Summarized!

    ERIC Educational Resources Information Center

    Blai, Boris, Jr.

    Statistics are an essential tool for making proper judgement decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…

  15. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
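    A minimal sketch of the quantitative step, assuming simple binomial sampling of the hit list: confirm a random subset, then bound the list-wide confirmation rate with a Clopper-Pearson interval (the numbers are illustrative):

    ```python
    # Suppose 18 of 20 randomly sampled hits are confirmed; bound the
    # list-wide confirmation rate. Numbers are illustrative.
    from scipy import stats

    n, x = 20, 18
    lower = stats.beta.ppf(0.025, x, n - x + 1)      # Clopper-Pearson bounds
    upper = stats.beta.ppf(0.975, x + 1, n - x)
    print(f"95% CI for the confirmation rate: ({lower:.2f}, {upper:.2f})")
    ```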

  16. Predicting Success in Psychological Statistics Courses.

    PubMed

    Lester, David

    2016-06-01

    Many students perform poorly in courses on psychological statistics, and it is useful to be able to predict which students will have difficulties. In a study of 93 undergraduates enrolled in Statistical Methods (18 men, 75 women; M age = 22.0 years, SD = 5.1), performance was significantly associated with sex (female students performed better) and proficiency in algebra in a linear regression analysis. Anxiety about statistics was not associated with course performance, indicating that basic mathematical skills are the best correlate of performance in statistics courses and can be used to stream students into classes by ability. © The Author(s) 2016.

  17. A Statistics Curriculum for the Undergraduate Chemistry Major

    ERIC Educational Resources Information Center

    Schlotter, Nicholas E.

    2013-01-01

    Our ability to statistically analyze data has grown significantly with the maturing of computer hardware and software. However, the evolution of our statistics capabilities has taken place without a corresponding evolution in the curriculum for the undergraduate chemistry major. Most faculty understand the need for a statistical educational…

  18. A Tablet-PC Software Application for Statistics Classes

    ERIC Educational Resources Information Center

    Probst, Alexandre C.

    2014-01-01

    A significant deficiency in the area of introductory statistics education exists: Student performance on standardized assessments after a full semester statistics course is poor and students report a very low desire to learn statistics. Research on the current generation of students indicates an affinity for technology and for multitasking.…

  19. Statistical Analyses for Probabilistic Assessments of the Reactor Pressure Vessel Structural Integrity: Building a Master Curve on an Extract of the 'Euro' Fracture Toughness Dataset, Controlling Statistical Uncertainty for Both Mono-Temperature and multi-temperature tests

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Josse, Florent; Lefebvre, Yannick; Todeschini, Patrick

    2006-07-01

    Assessing the structural integrity of a nuclear Reactor Pressure Vessel (RPV) subjected to pressurized-thermal-shock (PTS) transients is extremely important to safety. In addition to conventional deterministic calculations to confirm RPV integrity, Electricite de France (EDF) carries out probabilistic analyses. Probabilistic analyses are interesting because some key variables, albeit conventionally taken at conservative values, can be modeled more accurately through statistical variability. One variable which significantly affects RPV structural integrity assessment is cleavage fracture initiation toughness. The reference fracture toughness method currently in use at EDF is the RCC-M and ASME Code lower-bound K_IC based on the indexing parameter RT_NDT. However, in order to quantify the toughness scatter for probabilistic analyses, the master curve method is being analyzed at present. Furthermore, the master curve method is a direct means of evaluating fracture toughness based on K_JC data. In the framework of the master curve investigation undertaken by EDF, this article deals with the following two statistical items: building a master curve from an extract of a fracture toughness dataset (from the European project 'Unified Reference Fracture Toughness Design curves for RPV Steels') and controlling statistical uncertainty for both mono-temperature and multi-temperature tests. Concerning the first point, master curve temperature dependence is empirical in nature. To determine the 'original' master curve, Wallin postulated that a unified description of fracture toughness temperature dependence for ferritic steels is possible, and used a large number of data corresponding to nuclear-grade pressure vessel steels and welds. Our working hypothesis is that some ferritic steels may behave in slightly different ways. Therefore we focused exclusively on the basic French reactor vessel metals of types A508 Class 3 and A533 grade B Class 1, taking the…
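    For reference, the master curve's median toughness has a fixed one-parameter form (as standardized in ASTM E1921), with only the reference temperature T0 fitted to the K_JC data; a minimal sketch, with illustrative T0 values:

    ```python
    # Median master curve, T in deg C, K in MPa*sqrt(m); T0 is illustrative.
    # At T = T0 the median is 100 MPa*sqrt(m) by construction.
    import numpy as np

    def kjc_median(T, T0):
        return 30.0 + 70.0 * np.exp(0.019 * (T - T0))

    print(kjc_median(np.array([-100.0, -50.0, 0.0]), T0=-50.0))
    ```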

  20. An operational definition of a statistically meaningful trend.

    PubMed

    Bryhn, Andreas C; Dimberg, Peter H

    2011-04-28

    Linear trend analysis of time series is standard procedure in many scientific disciplines. If the number of data is large, a trend may be statistically significant even if data are scattered far from the trend line. This study introduces and tests a quality criterion for time trends referred to as statistical meaningfulness, which is a stricter quality criterion for trends than high statistical significance. The time series is divided into intervals and interval mean values are calculated. Thereafter, r² and p values are calculated from regressions concerning time and interval mean values. If r² ≥ 0.65 at p ≤ 0.05 in any of these regressions, then the trend is regarded as statistically meaningful. Out of ten investigated time series from different scientific disciplines, five displayed statistically meaningful trends. A Microsoft Excel application (add-in) was developed which can perform statistical meaningfulness tests and which may increase the operationality of the test. The presented method for distinguishing statistically meaningful trends should be reasonably uncomplicated for researchers with basic statistics skills and may thus be useful for determining which trends are worth analysing further, for instance with respect to causal factors. The method can also be used for determining which segments of a time trend may be particularly worthwhile to focus on.
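    The criterion translates directly into code; the sketch below uses a single interval count for brevity, whereas the paper tests several (data and interval count are illustrative):

    ```python
    # Interval-mean regression test for a "statistically meaningful" trend.
    import numpy as np
    from scipy import stats

    def is_meaningful(t, y, n_intervals=5):
        edges = np.linspace(t.min(), t.max(), n_intervals + 1)
        idx = np.clip(np.digitize(t, edges) - 1, 0, n_intervals - 1)
        tm = np.array([t[idx == i].mean() for i in range(n_intervals)])
        ym = np.array([y[idx == i].mean() for i in range(n_intervals)])
        res = stats.linregress(tm, ym)
        return res.rvalue ** 2 >= 0.65 and res.pvalue <= 0.05

    rng = np.random.default_rng(5)
    t = np.arange(200, dtype=float)
    y = 0.05 * t + rng.normal(0, 3, t.size)   # noisy but real trend
    print(is_meaningful(t, y))
    ```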

  1. Mutual interference between statistical summary perception and statistical learning.

    PubMed

    Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B

    2011-09-01

    The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.

  2. Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb-Scargle periodogram and Monte Carlo evaluation of statistical significance

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2012-12-01

    Many spectral analysis techniques have been designed assuming sequences taken with a constant sampling interval. However, there are empirical time series in the geosciences (sediment cores, fossil abundance data, isotope analysis, …) that do not follow regular sampling because of missing data, gapped data, random sampling or incomplete sequences, among other reasons. In general, interpolating an uneven series in order to obtain a succession with a constant sampling interval alters the spectral content of the series. In such cases it is preferable to follow an approach that works with the uneven data directly, avoiding the need for an explicit interpolation step. The Lomb-Scargle periodogram is a popular choice in such circumstances, as there are programs available in the public domain for its computation. One new computer program for spectral analysis improves the standard Lomb-Scargle periodogram approach in two ways: (1) It explicitly adjusts the statistical significance to any bias introduced by variance reduction smoothing, and (2) it uses a permutation test to evaluate confidence levels, which is better suited than parametric methods when neighbouring frequencies are highly correlated. Another novel program for cross-spectral analysis offers the advantage of estimating the Lomb-Scargle cross-periodogram of two uneven time series defined on the same interval, and it evaluates the confidence levels of the estimated cross-spectra by a non-parametric computer intensive permutation test. Thus, the cross-spectrum, the squared coherence spectrum, the phase spectrum, and the Monte Carlo statistical significance of the cross-spectrum and the squared-coherence spectrum can be obtained. Both of the programs are written in ANSI Fortran 77, in view of its simplicity and compatibility. The program code is of public domain, provided on the website of the journal (http://www.iamg.org/index.php/publisher/articleview/frmArticleID/112/). Different examples (with simulated and
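    A bare-bones version of the approach using scipy's Lomb-Scargle implementation and a permutation test on the periodogram maximum; the published Fortran 77 programs add smoothing, bias adjustment, and cross-spectral estimation not reproduced here:

    ```python
    # Lomb-Scargle periodogram of unevenly sampled data plus a permutation
    # test: shuffling values over the fixed times destroys any periodicity.
    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(9)
    t = np.sort(rng.uniform(0, 100, 120))            # uneven sampling times
    y = np.sin(2 * np.pi * t / 8.0) + rng.normal(0, 1, t.size)
    y -= y.mean()

    freqs = np.linspace(0.01, 1.0, 400) * 2 * np.pi  # angular frequencies
    pgram = lombscargle(t, y, freqs)

    max_null = np.array([lombscargle(t, rng.permutation(y), freqs).max()
                         for _ in range(200)])
    print(f"peak power = {pgram.max():.1f}, "
          f"95% confidence level = {np.quantile(max_null, 0.95):.1f}")
    ```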

  3. [Statistics for statistics?--Thoughts about psychological tools].

    PubMed

    Berger, Uwe; Stöbel-Richter, Yve

    2007-12-01

    Statistical methods take a prominent place in psychologists' educational programs. Known as difficult to understand and hard to learn, they are feared by students, and those who do not aspire to a research career at a university quickly forget the drilled content. Furthermore, because at first glance it does not seem to apply to work with patients and other target groups, the methodological education as a whole has often been questioned. For many psychological practitioners, statistical education makes sense only as a way of commanding respect from other professions, namely physicians. For their own practice, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics treats numbers, while psychotherapy treats subjects. So, is statistics an end in itself? With this article, we try to answer the question of whether, and how, statistical methods are represented within psychotherapeutic and psychological research. We therefore analyzed 46 original articles from a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analysis methods were applied, of which 89 per cent were directly based on statistics. Being able to write and critically read original articles, the backbone of research, presumes a high degree of statistical education. To ignore statistics means to ignore research and, ultimately, to expose one's own professional work to arbitrariness.

  4. Tools for Basic Statistical Analysis

    NASA Technical Reports Server (NTRS)

    Luz, Paul L.

    2005-01-01

    Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x_i (i = 1, 2, 3, …) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curve-fit data to the linear equation y = f(x) and will do an ANOVA to check its significance.
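    As an example, the "Normal Distribution from two Data Points" calculation amounts to solving two linear equations for the mean and standard deviation; a sketch in Python rather than Excel, with illustrative inputs:

    ```python
    # Solve for mu and sigma given two (value, cumulative probability) pairs.
    from scipy import stats

    def normal_from_two_points(x1, p1, x2, p2):
        z1, z2 = stats.norm.ppf(p1), stats.norm.ppf(p2)
        sigma = (x2 - x1) / (z2 - z1)
        return x1 - z1 * sigma, sigma            # (mu, sigma)

    # e.g. 10th percentile at 4.0, 90th percentile at 16.0 -> mu 10, sigma ~4.7
    print(normal_from_two_points(4.0, 0.10, 16.0, 0.90))
    ```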

  5. Review: Zinc’s functional significance in the vertebrate retina

    PubMed Central

    Chappell, Richard L.

    2014-01-01

    This review covers a broad range of topics related to the actions of zinc on the cells of the vertebrate retina. Much of this review relies on studies in which zinc was applied exogenously, and therefore the results, albeit highly suggestive, lack physiologic significance. This view stems from the fact that the concentrations of zinc used in these studies may not be encountered under the normal circumstances of life. This caveat is due to the lack of a zinc-specific probe with which to measure the concentrations of Zn2+ that may be released from neurons or act upon them. However, a great deal of relevant information has been garnered from studies in which Zn2+ was chelated, and the effects of its removal compared with findings obtained in its presence. For a more complete discussion of the consequences of depletion or excess in the body’s trace elements, the reader is referred to a recent review by Ugarte et al. in which they provide a detailed account of the interactions, toxicity, and metabolic activity of the essential trace elements iron, zinc, and copper in retinal physiology and disease. In addition, Smart et al. have published a splendid review on the modulation by zinc of inhibitory and excitatory amino acid receptor ion channels. PMID:25324679

  6. Why publishing everything is more effective than selective publishing of statistically significant results.

    PubMed

    van Assen, Marcel A L M; van Aert, Robbie C M; Nuijten, Michèle B; Wicherts, Jelte M

    2014-01-01

    De Winter and Happee examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that "selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective" (p.4). Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Publishing everything is more effective than only reporting significant outcomes.
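    The contrast can be reproduced with a bare-bones simulation, assuming two-sample t tests on studies of a true null effect and measuring how selective publication inflates the published effect magnitude (study sizes and counts are illustrative):

    ```python
    # Two-sample t tests on simulated null-effect studies; the published
    # effect magnitude is inflated under selective publication.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(13)
    n, n_studies = 20, 2000
    effects, pvals = np.empty(n_studies), np.empty(n_studies)
    for i in range(n_studies):
        a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)   # true effect = 0
        effects[i] = b.mean() - a.mean()                  # ~ Cohen's d (sd = 1)
        pvals[i] = stats.ttest_ind(b, a).pvalue

    print("publish everything, mean |effect|:", np.abs(effects).mean())
    print("significant only,   mean |effect|:",
          np.abs(effects[pvals < 0.05]).mean())
    ```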

  7. Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results

    PubMed Central

    van Assen, Marcel A. L. M.; van Aert, Robbie C. M.; Nuijten, Michèle B.; Wicherts, Jelte M.

    2014-01-01

    Background De Winter and Happee [1] examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p.4). Methods and Findings Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Conclusion Publishing everything is more effective than only reporting significant outcomes. PMID:24465448

  8. Tables of Significance Points for the Variance-Weighted Kolmogorov-Smirnov Statistics.

    DTIC Science & Technology

    1981-02-19


  9. Assay optimization: a statistical design of experiments approach.

    PubMed

    Altekar, Maneesha; Homon, Carol A; Kashem, Mohammed A; Mason, Steven W; Nelson, Richard M; Patnaude, Lori A; Yingling, Jeffrey; Taylor, Paul B

    2007-03-01

    With the transition from manual to robotic HTS in the last several years, assay optimization has become a significant bottleneck. Recent advances in robotic liquid handling have made it feasible to reduce assay optimization timelines with the application of statistically designed experiments. When implemented, they can efficiently optimize assays by rapidly identifying significant factors, complex interactions, and nonlinear responses. This article focuses on the use of statistically designed experiments in assay optimization.
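    A minimal two-level full factorial sketch: build the design, simulate the assay response, and read off main effects and an interaction from the contrasts (factor names and effect sizes are invented):

    ```python
    # Two-level full factorial with invented factor names and effect sizes.
    import itertools
    import numpy as np

    rng = np.random.default_rng(17)
    factors = ["enzyme_conc", "substrate_conc", "incubation_time"]
    design = np.array(list(itertools.product([-1, 1], repeat=3)))

    # Simulated assay signal: enzyme and substrate matter and interact.
    signal = (100 + 15 * design[:, 0] + 8 * design[:, 1]
              + 5 * design[:, 0] * design[:, 1] + rng.normal(0, 2, len(design)))

    for name, col in zip(factors, design.T):
        effect = signal[col == 1].mean() - signal[col == -1].mean()
        print(f"{name:16s} main effect = {effect:6.1f}")
    inter = design[:, 0] * design[:, 1]
    print(f"enzyme x substrate interaction = "
          f"{signal[inter == 1].mean() - signal[inter == -1].mean():.1f}")
    ```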

  10. Whither Statistics Education Research?

    ERIC Educational Resources Information Center

    Watson, Jane

    2016-01-01

    This year marks the 25th anniversary of the publication of a "National Statement on Mathematics for Australian Schools", which was the first curriculum statement this country had including "Chance and Data" as a significant component. It is hence an opportune time to survey the history of the related statistics education…

  11. Clinical significance in nursing research: A discussion and descriptive analysis.

    PubMed

    Polit, Denise F

    2017-08-01

    It is widely understood that statistical significance should not be equated with clinical significance, but the topic of clinical significance has not received much attention in the nursing literature. By contrast, interest in conceptualizing and operationalizing clinical significance has been a "hot topic" in other health care fields for several decades. The major purpose of this paper is to briefly describe recent advances in defining and quantifying clinical significance. The overview covers both group-level indicators of clinical significance (e.g., effect size indexes), and individual-level benchmarks (e.g., the minimal important change index). A secondary purpose is to describe the extent to which developments in clinical significance have penetrated the nursing literature. A descriptive analysis of a sample of primary research articles published in three high-impact nursing research journals in 2016 was undertaken. A total of 362 articles were electronically searched for terms relating to statistical and clinical significance. Of the 362 articles, 261 were reports of quantitative studies, the vast majority of which (93%) included a formal evaluation of the statistical significance of the results. By contrast, the term "clinical significance" or related surrogate terms were found in only 33 papers, and most often the term was used informally, without explicit definition or assessment. Raising consciousness about clinical significance should be an important priority among nurse researchers. Several recommendations are offered to improve the visibility and salience of clinical significance in nursing science. Copyright © 2017 Elsevier Ltd. All rights reserved.
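    The group-level and individual-level indicators mentioned can sit side by side in a few lines: Cohen's d for the group effect and the proportion of patients whose change exceeds a minimal important change (the data and MIC threshold below are invented):

    ```python
    # Simulated pre/post scores; the MIC threshold of 5 points is invented.
    import numpy as np

    rng = np.random.default_rng(21)
    pre = rng.normal(50, 10, 80)
    post = pre - rng.normal(6, 8, 80)       # mean improvement of 6 points
    change = pre - post

    pooled_sd = np.sqrt((pre.std(ddof=1) ** 2 + post.std(ddof=1) ** 2) / 2)
    cohens_d = (pre.mean() - post.mean()) / pooled_sd   # group-level indicator
    mic = 5.0                                           # minimal important change
    print(f"Cohen's d = {cohens_d:.2f}")
    print(f"patients exceeding the MIC: {np.mean(change >= mic):.0%}")
    ```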

  12. Impaired Statistical Learning in Developmental Dyslexia

    PubMed Central

    Thiessen, Erik D.; Holt, Lori L.

    2015-01-01

    Purpose Developmental dyslexia (DD) is commonly thought to arise from phonological impairments. However, an emerging perspective is that a more general procedural learning deficit, not specific to phonological processing, may underlie DD. The current study examined if individuals with DD are capable of extracting statistical regularities across sequences of passively experienced speech and nonspeech sounds. Such statistical learning is believed to be domain-general, to draw upon procedural learning systems, and to relate to language outcomes. Method DD and control groups were familiarized with a continuous stream of syllables or sine-wave tones, the ordering of which was defined by high or low transitional probabilities across adjacent stimulus pairs. Participants subsequently judged two 3-stimulus test items with either high or low statistical coherence as being the most similar to the sounds heard during familiarization. Results As with control participants, the DD group was sensitive to the transitional probability structure of the familiarization materials as evidenced by above-chance performance. However, the performance of participants with DD was significantly poorer than controls across linguistic and nonlinguistic stimuli. In addition, reading-related measures were significantly correlated with statistical learning performance of both speech and nonspeech material. Conclusion Results are discussed in light of procedural learning impairments among participants with DD. PMID:25860795
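
    The transitional-probability structure used in such familiarization streams can be made concrete with a short sketch; the syllable inventory and stream length below are hypothetical, but the computation (bigram count divided by unigram count) is the standard definition.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(9)

# Familiarization stream built from three "words" of syllables, as in
# statistical-learning paradigms: transitional probability (TP) is high
# within words and low across word boundaries (syllables are invented)
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
stream, last = [], None
while len(stream) < 900:
    w = words[rng.integers(3)]
    if w is last:            # no immediate word repetition, as is typical
        continue
    stream.extend(w)
    last = w

# TP(next | current) from bigram and unigram counts
bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])
tp = lambda a, b: bigrams[(a, b)] / unigrams[a]

print(f"within-word TP tu->pi: {tp('tu', 'pi'):.2f}")  # ~1.0
print(f"across-word TP ro->go: {tp('ro', 'go'):.2f}")  # ~0.5 (two words can follow)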

  13. Neutron Scattering in Chemistry: Experiments, Models and Statistical Description of Physical Phenomena

    NASA Astrophysics Data System (ADS)

    Ramirez Cuesta, Timmy

    Incoherent inelastic neutron scattering spectroscopy is a very powerful technique that requires the use of ab initio models to interpret the experimental data. Albeit not exact, the information obtained from the models gives very valuable insight into the dynamics of atoms in solids and molecules, which, in turn, provides unique access to the vibrational density of states. The technique is extremely sensitive to hydrogen, since the neutron cross section of hydrogen is the largest of all chemical elements; hydrogen, being the lightest element, also exhibits more pronounced quantum effects than the other elements. In the case of non-crystalline or disordered materials, the models provide partial information, and only a reduced sampling of possible configurations can be done at present. With very large computing power, such as exascale computing will provide, a new opportunity arises to study these systems and to introduce a description of statistical configurations, including energetic and dynamic characterization of configurational entropy. As part of the ICE-MAN project, we are developing the tools to manage the workflows and to visualize and analyze the results, using state-of-the-art computational methods and atomistic models for the interpretation of neutron scattering data. This work is supported by the Laboratory Directed Research and Development (LDRD 8237) program of UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy.

  14. Statistical significance of hair analysis of clenbuterol to discriminate therapeutic use from contamination.

    PubMed

    Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef

    2014-01-01

    Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years, clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related to the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers, who have a high risk of contamination, were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers on 5 subsequent days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistical difference between sub-therapeutic application and contamination. In contrast, discrimination from typical abuse of clenbuterol is apparently possible. Due to these findings, results of real doping control samples can be evaluated. Copyright © 2014 John Wiley & Sons, Ltd.

  15. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    The difference between "clinical significance" and "statistical significance" should be kept in mind while interpreting "statistical hypothesis testing" results in clinical research. This fact is already known to many, but it is pointed out here again because the philosophy of "statistical hypothesis testing" is sometimes unnecessarily criticized, mainly due to failure to consider this distinction. Randomized controlled trials are similarly wrongly criticized. That a scientific method may not be applicable in some peculiar or particular situation does not mean that the method is useless. Also remember that "statistical hypothesis testing" is not for decision making, and the field of "decision analysis" is very much an integral part of the science of statistics. It is not correct to say that "confidence intervals have nothing to do with confidence" unless one understands the meaning of the word "confidence" as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations, like chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question "How likely is it that the difference found in the random sample(s) is due to chance?"; the limitation of relying only on statistical significance in making clinical decisions should therefore be kept in mind. PMID:22707861

  16. Estimating order statistics of network degrees

    NASA Astrophysics Data System (ADS)

    Chu, J.; Nadarajah, S.

    2018-01-01

    We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three-parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.

  17. Statistical aspects of solar flares

    NASA Technical Reports Server (NTRS)

    Wilson, Robert M.

    1987-01-01

    A survey of the statistical properties of 850 H alpha solar flares during 1975 is presented. Comparison of the results found here with those reported elsewhere for different epochs is accomplished. Distributions of rise time, decay time, and duration are given, as are the mean, mode, median, and 90th percentile values. Proportions by selected groupings are also determined. For flares in general, mean values for rise time and duration are 5.2 ± 0.4 min and 18.1 ± 1.1 min, respectively. Subflares, accounting for nearly 90 percent of the flares, had mean values lower than those found for flares of H alpha importance greater than 1, and the differences are statistically significant. Likewise, flares of bright and normal relative brightness have mean values of decay time and duration that are significantly longer than those computed for faint flares, and mass-motion-related flares are significantly longer than non-mass-motion-related flares. Seventy-three percent of the mass-motion-related flares are categorized as being a two-ribbon flare and/or being accompanied by a high-speed dark filament. Slow-rise-time flares (rise time greater than 5 min) have a mean value for duration that is significantly longer than that computed for fast-rise-time flares, and long-lived flares (duration greater than 18 min) have a mean value for rise time that is significantly longer than that computed for short-lived flares, suggesting a positive linear relationship between rise time and duration. Monthly occurrence rates for flares in general and by group are found to be linearly related in a positive sense to monthly sunspot number. Statistical testing reveals the association between sunspot number and numbers of flares to be significant at the 95 percent level of confidence, and the t statistic for slope is significant at greater than the 99 percent level of confidence. Dependent upon the specific fit, between 58 percent and 94 percent of

  18. Childhood-compared to adolescent-onset bipolar disorder has more statistically significant clinical correlates.

    PubMed

    Holtzman, Jessica N; Miller, Shefali; Hooshmand, Farnaz; Wang, Po W; Chang, Kiki D; Hill, Shelley J; Rasgon, Natalie L; Ketter, Terence A

    2015-07-01

    The strengths and limitations of considering childhood- and adolescent-onset bipolar disorder (BD) separately versus together remain to be established. We assessed this issue. BD patients referred to the Stanford Bipolar Disorder Clinic during 2000-2011 were assessed with the Systematic Treatment Enhancement Program for BD Affective Disorders Evaluation. Patients with childhood- and adolescent-onset were compared to those with adult-onset for 7 unfavorable bipolar illness characteristics with replicated associations with early onset. Among 502 BD outpatients, those with childhood- (<13 years, N=110) and adolescent- (13-18 years, N=218) onset had significantly higher rates for 4/7 unfavorable illness characteristics, including lifetime comorbid anxiety disorder, at least ten lifetime mood episodes, lifetime alcohol use disorder, and prior suicide attempt, than those with adult-onset (>18 years, N=174). Childhood- but not adolescent-onset BD patients also had significantly higher rates of first-degree relative with mood disorder, lifetime substance use disorder, and rapid cycling in the prior year. Patients with pooled childhood/adolescent-onset compared to adult-onset had significantly higher rates for 5/7 of these unfavorable illness characteristics, while patients with childhood- compared to adolescent-onset had significantly higher rates for 4/7 of these unfavorable illness characteristics. The Caucasian, insured, suburban, low-substance-abuse, American specialty clinic-referred sample limits generalizability. Onset age is based on retrospective recall. Childhood- compared to adolescent-onset BD was more robustly related to unfavorable bipolar illness characteristics, so pooling these groups attenuated such relationships. Further study is warranted to determine the extent to which adolescent-onset BD represents an intermediate phenotype between childhood- and adult-onset BD. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. The Relationship between Statistics Self-Efficacy, Statistics Anxiety, and Performance in an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Schneider, William R.

    2011-01-01

    The purpose of this study was to determine the relationship between statistics self-efficacy, statistics anxiety, and performance in introductory graduate statistics courses. The study design compared two statistics self-efficacy measures developed by Finney and Schraw (2003), a statistics anxiety measure developed by Cruise and Wilkins (1980),…

  20. Effect of functionally significant deiodinase single nucleotide polymorphisms on drinking behavior in alcohol dependence: an exploratory investigation

    PubMed Central

    Lee, MR; Schwandt, ML; Bollinger, JW; Dias, AA; Oot, EN; Goldman, D; Hodgkinson, CA; Leggio, L

    2016-01-01

    Background Abnormalities of the hypothalamic-pituitary-thyroid (HPT) axis have been reported in alcoholism; however, there is no definitive agreement on the specific thyroid abnormalities and their underlying mechanisms in alcohol dependence (AD). The biological activity of thyroid hormones, or the availability of T3, is regulated by the three deiodinase enzymes D1, D2 and D3. In the context of alcohol use, functionally significant single nucleotide polymorphisms (SNPs) of these deiodinase genes may play a role in HPT dysfunction. Methods The present study explored the effect of three functionally significant SNPs (D1: rs2235544; D2: rs225014 and rs12885300) of deiodinase genes on drinking behavior and thyroid stimulating hormone (TSH) levels in alcohol-dependent (N=521) and control subjects (N=228). Results The SNP rs225014 was associated with significant differences in the amount of naturalistic alcohol drinking assessed by the Timeline Follow-Back (TLFB). Alcohol-dependent subjects had significantly higher TSH levels compared to controls; however, there was no effect of genotype on TSH levels for either group. Conclusions These findings extend previous studies on thyroid dysfunction in alcoholism and provide novel, albeit preliminary, information by linking functionally significant genetic polymorphisms of the deiodinase enzymes with alcohol drinking behavior. PMID:26207529

  1. Statistical Control Paradigm for Aerospace Structures Under Impulsive Disturbances

    DTIC Science & Technology

    2006-08-03

    Results indicate that the existing attitude control system with an innovative and robust statistical controller design shows significant promise for use in attitude hold mode operation...and three thrusters are for use in controlling the attitude of the satellite. Then the angular momentum of the satellite with three thrusters and a

  2. [Comment on] Statistical discrimination

    NASA Astrophysics Data System (ADS)

    Chinn, Douglas

    In the December 8, 1981, issue of Eos, a news item reported the conclusion of a National Research Council study that sexual discrimination against women with Ph.D.'s exists in the field of geophysics. Basically, the item reported that even when allowances are made for motherhood the percentage of female Ph.D.'s holding high university and corporate positions is significantly lower than the percentage of male Ph.D.'s holding the same types of positions. The sexual discrimination conclusion, based only on these statistics, assumes that there are no basic psychological differences between men and women that might cause different populations in the employment group studied. Therefore, the reasoning goes, after taking into account possible effects from differences related to anatomy, such as women stopping their careers in order to bear and raise children, the statistical distributions of positions held by male and female Ph.D.'s ought to be very similar to one another. Any significant differences between the distributions must be caused primarily by sexual discrimination.

  3. Relationship between Graduate Students' Statistics Self-Efficacy, Statistics Anxiety, Attitude toward Statistics, and Social Support

    ERIC Educational Resources Information Center

    Perepiczka, Michelle; Chandler, Nichelle; Becerra, Michael

    2011-01-01

    Statistics plays an integral role in graduate programs. However, numerous intra- and interpersonal factors may influence successful completion of needed coursework in this area. The authors examined the extent of the relationship between self-efficacy to learn statistics and statistics anxiety, attitude towards statistics, and social support of 166…

  4. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important for the correctness of the decision based on this inference when the p-value computed by the usual inference process is near the alpha significance level. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
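
    A minimal sketch of the style of Monte Carlo inference involved, using a 1-D toy geometry and the Poisson likelihood-ratio form of the scan statistic; the size-conditional p-value at the end follows the spirit of the proposed modification, but this is not the authors' implementation, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

def scan_llr(cases, pop, total_cases, total_pop):
    """Poisson log-likelihood ratio for a candidate cluster (Kulldorff form)."""
    expected = total_cases * pop / total_pop
    if cases <= expected:
        return 0.0
    outside, out_exp = total_cases - cases, total_cases - expected
    term_out = 0.0 if outside == 0 else outside * np.log(outside / out_exp)
    return cases * np.log(cases / expected) + term_out

def most_likely_cluster(case_counts, pops, max_size):
    """Scan all contiguous windows (toy 1-D geometry); return best LLR and size."""
    n, C, P = len(case_counts), case_counts.sum(), pops.sum()
    best, best_k = 0.0, 0
    for k in range(1, max_size + 1):
        for i in range(n - k + 1):
            llr = scan_llr(case_counts[i:i+k].sum(), pops[i:i+k].sum(), C, P)
            if llr > best:
                best, best_k = llr, k
    return best, best_k

# Observed toy map: 20 districts, elevated risk in districts 5-7
pops = rng.integers(500, 5000, size=20)
rates = np.full(20, 1e-3); rates[5:8] *= 3.0
cases = rng.poisson(pops * rates)
obs_llr, obs_k = most_likely_cluster(cases, pops, max_size=6)

# Null replications: total cases redistributed proportionally to population
reps = 999
null_llrs, null_ks = np.empty(reps), np.empty(reps, dtype=int)
for r in range(reps):
    sim = rng.multinomial(cases.sum(), pops / pops.sum())
    null_llrs[r], null_ks[r] = most_likely_cluster(sim, pops, max_size=6)

p_usual = (1 + np.sum(null_llrs >= obs_llr)) / (reps + 1)
# Size-conditional inference in the spirit of the paper: compare only against
# null replications whose most likely cluster has the same size k
same_k = null_llrs[null_ks == obs_k]
p_cond = (1 + np.sum(same_k >= obs_llr)) / (len(same_k) + 1)
print(f"usual p = {p_usual:.3f}, size-conditional p = {p_cond:.3f} (k = {obs_k})")
```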

  5. Forest statistics of Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1953-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Indiana the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area, timber volume, timber growth, and timber drain...

  6. Forest statistics of Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1952-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Kentucky the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area, timber volume, timber growth, and timber drain...

  7. A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

    PubMed Central

    Ioannidis, John P. A.

    2017-01-01

    A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. "Convincing" may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular with p < .05 being termed significant) is pervasive in clinical research. Therefore, in this paper, we calculate with simulations what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved, and they fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications. PMID:28273140
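
    A small simulation in the spirit of the paper, assuming a simple point-alternative Bayes factor computed from each trial's observed mean difference (the paper's exact prior specification may differ); the sample sizes and the effect size assumed under H1 are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_d, n = 0.2, 50          # true effect (SD units) and per-arm n; try 0.0 too
se = np.sqrt(2.0 / n)        # standard error of the two-arm mean difference
d1 = 0.2                     # effect size assumed under H1 for the Bayes factor

def bf10(d_obs):
    """Point-alternative Bayes factor for one trial's observed difference."""
    return stats.norm.pdf(d_obs, d1, se) / stats.norm.pdf(d_obs, 0.0, se)

# Simulate many pairs of trials; keep only pairs where both reach p < .05
z = rng.normal(true_d / se, 1.0, size=(200000, 2))
p = 2 * stats.norm.sf(np.abs(z))
both_sig = (p < 0.05).all(axis=1)
d_obs = z[both_sig] * se
combined_bf = bf10(d_obs[:, 0]) * bf10(d_obs[:, 1])

print(f"pairs with both p < .05: {both_sig.sum()}")
print(f"share with combined BF10 >= 20: {(combined_bf >= 20).mean():.2%}")
print(f"share with BF10 < 1 (evidence favors H0): {(combined_bf < 1).mean():.2%}")
```

    Even with a genuine small effect, a fair fraction of "two significant trials" pairs fall short of a combined Bayes factor of 20, which is the paper's point about wildly differing strengths of evidence behind the same p < .05 rule.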

  8. Testing for significance of phase synchronisation dynamics in the EEG.

    PubMed

    Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J

    2013-06-01

    A number of tests exist to check for statistical significance of phase synchronisation within the Electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed for identifying the statistical significance of phase synchronisation based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
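
    The paper's Markov-model test itself is not reproduced here, but the following sketch shows the kind of baseline it improves upon: the phase-locking value with circular-shift surrogates, a commonly used significance test that treats synchronisation as a constant state. All signals are synthetic.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(2)

def plv(x, y):
    """Phase-locking value between two signals via the Hilbert transform."""
    px, py = np.angle(hilbert(x)), np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (px - py))))

def plv_surrogate_test(x, y, n_surr=499):
    """Null distribution from circular time-shifts of one signal, which
    preserve its spectrum but misalign the shared phase wander."""
    observed = plv(x, y)
    null = np.empty(n_surr)
    for k in range(n_surr):
        null[k] = plv(x, np.roll(y, rng.integers(1, len(y) - 1)))
    p = (1 + np.sum(null >= observed)) / (n_surr + 1)
    return observed, p

# Toy "EEG": two noisy channels sharing a drifting ~10 Hz phase, so genuine
# locking comes from the common (random-walk) phase wander
fs, n = 250, 1000
common_phase = np.cumsum(2 * np.pi * (10 + 5 * rng.standard_normal(n)) / fs)
x = np.sin(common_phase) + 0.5 * rng.standard_normal(n)
y = np.sin(common_phase + 0.8) + 0.5 * rng.standard_normal(n)

obs, p = plv_surrogate_test(x, y)
print(f"PLV = {obs:.3f}, surrogate p = {p:.4f}")
```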

  9. The Effect Size Statistic: Overview of Various Choices.

    ERIC Educational Resources Information Center

    Mahadevan, Lakshmi

    Over the years, methodologists have been recommending that researchers use magnitude of effect estimates in result interpretation to highlight the distinction between statistical and practical significance (cf. R. Kirk, 1996). A magnitude of effect statistic (i.e., effect size) tells to what degree the dependent variable can be controlled,…

  10. Why significant variables aren't automatically good predictors.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2015-11-10

    Thus far, genome-wide association studies (GWAS) have been disappointing in the inability of investigators to use the results of identified, statistically significant variants in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity and illustrate through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear as highly significant, thus evading the researcher using significance-based methods. We point out that what makes variables good for prediction versus significance depends on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires efforts toward a new research agenda of searching for a novel criterion to retrieve highly predictive variables rather than highly significant variables. We offer an alternative approach that was not designed for significance, the partition retention method, which was very effective predicting on a long-studied breast cancer data set, by reducing the classification error rate from 30% to 8%.
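
    A minimal simulation of the paper's central point: with a large enough sample, a variable with a tiny effect is overwhelmingly significant yet nearly useless for prediction. The effect size and sample size below are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A tiny true effect measured on a huge sample
n = 100000
y = rng.integers(0, 2, n)                      # binary outcome, 50/50
x = 0.06 * y + rng.standard_normal(n)          # variable with a tiny mean shift

t, p = stats.ttest_ind(x[y == 1], x[y == 0])
print(f"t = {t:.1f}, p = {p:.1e}")             # overwhelmingly significant

# Predictive value: classify by thresholding x at its overall mean
pred = (x > x.mean()).astype(int)
print(f"classification accuracy: {(pred == y).mean():.3f}")  # barely above 0.5
```

    The reverse case the authors describe (highly predictive variables that do not look significant, e.g. through interactions) is not shown here, but the asymmetry runs in both directions.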

  11. Harmonic statistics

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2017-05-01

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, yet they fall short in their 'public relations' for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed the harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford's law, and 1/f noise.
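
    A brief sketch of the object described, assuming the harmonic intensity takes the form λ(x) = c/x (the form suggested by the term "harmonic"; the paper should be consulted for the precise setup):

```latex
% Sketch: harmonic Poisson process on the positive half-line,
% assuming intensity lambda(x) = c/x for some c > 0.
\[
  \lambda(x) = \frac{c}{x}, \qquad x > 0,
\]
% so the expected number of points in an interval (a, b] is
\[
  \mathbb{E}\, N(a,b] = \int_a^b \frac{c}{x}\,\mathrm{d}x = c \ln\frac{b}{a},
\]
% which depends only on the ratio b/a: the scale invariance that underlies
% the connections to Pareto statistics, Benford's law, and 1/f noise.
```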

  12. Statistical process control in nursing research.

    PubMed

    Polit, Denise F; Chaboyer, Wendy

    2012-02-01

    In intervention studies in which randomization to groups is not possible, researchers typically use quasi-experimental designs. Time series designs are strong quasi-experimental designs but are seldom used, perhaps because of technical and analytic hurdles. Statistical process control (SPC) is an alternative analytic approach to testing hypotheses about intervention effects using data collected over time. SPC, like traditional statistical methods, is a tool for understanding variation and involves the construction of control charts that distinguish between normal, random fluctuations (common cause variation), and statistically significant special cause variation that can result from an innovation. The purpose of this article is to provide an overview of SPC and to illustrate its use in a study of a nursing practice improvement intervention. Copyright © 2011 Wiley Periodicals, Inc.
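
    A minimal sketch of the kind of control chart described, an individuals (X) chart with 3-sigma limits estimated from the moving range of a baseline period; the data, baseline length, and shift are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Baseline (pre-intervention) observations and post-intervention data
baseline = rng.normal(50, 2, 30)
post = rng.normal(45, 2, 20)          # intervention shifts the mean down

# Individuals chart: sigma estimated from the average moving range,
# using the d2 constant for subgroups of size 2 (d2 = 1.128)
center = baseline.mean()
mr_bar = np.mean(np.abs(np.diff(baseline)))
sigma_hat = mr_bar / 1.128
ucl, lcl = center + 3 * sigma_hat, center - 3 * sigma_hat

print(f"center = {center:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
for i, obs in enumerate(post, start=len(baseline) + 1):
    if not lcl <= obs <= ucl:
        print(f"point {i}: {obs:.2f} -> special cause signal")
```

    Points inside the limits reflect common cause variation; points beyond them are the statistically significant special cause signals the abstract refers to.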

  13. Statistics Anxiety and Business Statistics: The International Student

    ERIC Educational Resources Information Center

    Bell, James A.

    2008-01-01

    Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…

  14. Changing world extreme temperature statistics

    NASA Astrophysics Data System (ADS)

    Finkel, J. M.; Katz, J. I.

    2018-04-01

    We use the Global Historical Climatology Network-daily database to calculate a nonparametric statistic that describes the rate at which all-time daily high and low temperature records have been set in nine geographic regions (continents or major portions of continents) during periods mostly from the mid-20th Century to the present. This statistic was defined in our earlier work on temperature records in the 48 contiguous United States. In contrast to this earlier work, we find that in every region except North America all-time high records were set at a rate significantly (at least 3σ) higher than in the null hypothesis of a stationary climate. Except in Antarctica, all-time low records were set at a rate significantly lower than in the null hypothesis. In Europe, North Africa and North Asia the rate of setting new all-time highs increased suddenly in the 1990's, suggesting a change in regional climate regime; in most other regions there was a steadier increase.
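
    The stationary-climate null invoked here has a classical closed form: for i.i.d. continuous observations, the probability that year n sets a new all-time record is 1/n, so expected record counts follow (shifted) harmonic numbers. A simulation sketch, with an invented linear trend standing in for regional warming:

```python
import numpy as np

rng = np.random.default_rng(5)

def count_records(series):
    """Number of new all-time maxima set after year 1."""
    running_max = np.maximum.accumulate(series)
    return int(np.sum(series[1:] > running_max[:-1]))

years = 70
# Null expectation: sum of 1/n for n = 2..N (year 1 is trivially a record)
expected_null = sum(1.0 / n for n in range(2, years + 1))

trend = 0.03  # toy warming trend, in noise-SD units per year
stationary = [count_records(rng.standard_normal(years)) for _ in range(5000)]
warming = [count_records(rng.standard_normal(years) + trend * np.arange(years))
           for _ in range(5000)]

print(f"null expectation : {expected_null:.2f}")
print(f"stationary mean  : {np.mean(stationary):.2f} (sd {np.std(stationary):.2f})")
print(f"with trend mean  : {np.mean(warming):.2f}")
```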

  15. A SIGNIFICANCE TEST FOR THE LASSO

    PubMed Central

    Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

    2014-01-01

    In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ₁² distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ₁² under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties—adaptivity and

  16. Software Used to Generate Cancer Statistics - SEER Cancer Statistics

    Cancer.gov

    Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.

  17. Understanding Statistics and Statistics Education: A Chinese Perspective

    ERIC Educational Resources Information Center

    Shi, Ning-Zhong; He, Xuming; Tao, Jian

    2009-01-01

    In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…

  18. Design of order statistics filters using feedforward neural networks

    NASA Astrophysics Data System (ADS)

    Maslennikova, Yu. S.; Bochkarev, V. V.

    2016-08-01

    In recent years significant progress has been made in the development of nonlinear data processing techniques. Such techniques are widely used in digital data filtering and image enhancement. Many of the most effective nonlinear filters are based on order statistics. The widely used median filter is the best known order statistic filter. A generalized form of these filters can be presented based on Lloyd's statistics. Filters based on order statistics have excellent robustness properties in the presence of impulsive noise. In this paper, we present a special approach for the synthesis of order statistics filters using artificial neural networks. Optimal Lloyd's statistics are used for selecting the initial weights of the neural network. The adaptive properties of neural networks provide opportunities to optimize order statistics filters for data with asymmetric distribution functions. Different examples demonstrate the properties and performance of the presented approach.
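
    Before any neural-network synthesis, the fixed filters in question are easy to state; here is a sketch of a sliding-window order statistic filter (with the median as the special case) applied to impulsive noise. The signal and noise model are illustrative, and this does not implement the paper's neural-network approach.

```python
import numpy as np

rng = np.random.default_rng(6)

def order_statistic_filter(x, window=5, rank=None):
    """Slide a window over x and keep the rank-th order statistic;
    rank = window // 2 gives the classic median filter."""
    if rank is None:
        rank = window // 2
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(xp, window)
    return np.sort(windows, axis=1)[:, rank]

# Clean sinusoid corrupted by impulsive (salt-and-pepper style) noise
t = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * t)
noisy = clean.copy()
hits = rng.choice(200, size=20, replace=False)
noisy[hits] = rng.choice([-3.0, 3.0], size=20)

filtered = order_statistic_filter(noisy, window=5)
mse = lambda a: np.mean((a - clean) ** 2)
print(f"MSE noisy   : {mse(noisy):.4f}")
print(f"MSE filtered: {mse(filtered):.4f}")   # impulses largely removed
```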

  19. Directory of Michigan Library Statistics. 1994 Edition. Reporting 1992 and 1993 Statistical Activities including: Public Library Statistics, Library Cooperative Statistics, Regional/Subregional Statistics.

    ERIC Educational Resources Information Center

    Leaf, Donald C., Comp.; Neely, Linda, Comp.

    This edition focuses on statistical data supplied by Michigan public libraries, public library cooperatives, and those public libraries which serve as regional or subregional outlets for blind and physically handicapped services. Since statistics in Michigan academic libraries are typically collected in odd-numbered years, they are not included…

  20. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, and we examine the limiting distribution of the GSMA statistics under the order statistic formulation, quantifying the relevance of the pairwise correlations of the GSMA statistics across different bins to this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930

  1. Statistical Symbolic Execution with Informed Sampling

    NASA Technical Reports Server (NTRS)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space, and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
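
    Not the Symbolic PathFinder implementation, but a sketch of the Bayesian ingredient described: estimating the probability of reaching a target event from sampled paths with a Beta posterior, and combining it with a partial exact analysis of pruned paths. All counts and probability masses are hypothetical.

```python
from scipy import stats

# Monte Carlo sampling of symbolic paths: the target event (say, an assert
# violation) was reached in `hits` of `samples` sampled paths.
samples, hits = 10000, 37

# Bayesian estimation with a uniform Beta(1, 1) prior
posterior = stats.beta(1 + hits, 1 + samples - hits)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"posterior mean {posterior.mean():.5f}, 95% interval [{lo:.5f}, {hi:.5f}]")

# Hypothesis testing: is the event probability below a tolerance of 0.5%?
tolerance = 0.005
print(f"P(p < {tolerance}) = {posterior.cdf(tolerance):.3f}")

# Partial exact analysis: suppose paths holding 60% of the probability mass
# were pruned after exact analysis and contribute 0.002 to the target
# probability; the sampled estimate then covers only the remaining 40%.
exact_mass, exact_contribution = 0.6, 0.002
combined = exact_contribution + (1 - exact_mass) * posterior.mean()
print(f"combined estimate: {combined:.5f}")
```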

  2. Further developments in cloud statistics for computer simulations

    NASA Technical Reports Server (NTRS)

    Chang, D. T.; Willand, J. H.

    1972-01-01

    This study is a part of NASA's continued program to provide global statistics of cloud parameters for computer simulation. The primary emphasis was on the development of the data bank of the global statistical distributions of cloud types and cloud layers and their applications in the simulation of the vertical distributions of in-cloud parameters such as liquid water content. These statistics were compiled from actual surface observations as recorded in Standard WBAN forms. Data for a total of 19 stations were obtained and reduced. These stations were selected to be representative of the 19 primary cloud climatological regions defined in previous studies of cloud statistics. Using the data compiled in this study, a limited study was conducted of the homogeneity of cloud regions, the latitudinal dependence of cloud-type distributions, the dependence of these statistics on sample size, and other factors in the statistics which are of significance to the problem of simulation. The application of the statistics in cloud simulation was investigated. In particular, the inclusion of the new statistics in an expanded multi-step Monte Carlo simulation scheme is suggested and briefly outlined.

  3. Teaching Biology through Statistics: Application of Statistical Methods in Genetics and Zoology Courses

    PubMed Central

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A.

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math–biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology. PMID:21885822

  4. Teaching biology through statistics: application of statistical methods in genetics and zoology courses.

    PubMed

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.

  5. Effects of quantum coherence on work statistics

    NASA Astrophysics Data System (ADS)

    Xu, Bao-Ming; Zou, Jian; Guo, Li-Sha; Kong, Xiang-Mu

    2018-05-01

    In the conventional two-point measurement scheme of quantum thermodynamics, quantum coherence is destroyed by the first measurement. However, coherence plays an important role in quantum thermodynamic processes, and how to describe the work statistics for a quantum coherent process is still an open question. In this paper, we use the full counting statistics method to investigate the effects of quantum coherence on work statistics. First, we give a general discussion and show that for a quantum coherent process, work statistics is very different from that of the two-point measurement scheme; specifically, the average work is increased or decreased and the work fluctuation can be decreased by quantum coherence, which strongly depends on the relative phase, the energy level structure, and the external protocol. Then, we concretely consider a quenched one-dimensional transverse Ising model and show that quantum coherence has a more significant influence on work statistics in the ferromagnetic regime compared with that in the paramagnetic regime, so that due to the presence of quantum coherence the work statistics can exhibit the critical phenomenon even at high temperature.

  6. Statistical learning and selective inference.

    PubMed

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  7. Environmental Health Practice: Statistically Based Performance Measurement

    PubMed Central

    Enander, Richard T.; Gagnon, Ronald N.; Hanumara, R. Choudary; Park, Eugene; Armstrong, Thomas; Gute, David M.

    2007-01-01

    Objectives. State environmental and health protection agencies have traditionally relied on a facility-by-facility inspection-enforcement paradigm to achieve compliance with government regulations. We evaluated the effectiveness of a new approach that uses a self-certification random sampling design. Methods. Comprehensive environmental and occupational health data from a 3-year statewide industry self-certification initiative were collected from representative automotive refinishing facilities located in Rhode Island. Statistical comparisons between baseline and postintervention data facilitated a quantitative evaluation of statewide performance. Results. The analysis of field data collected from 82 randomly selected automotive refinishing facilities showed statistically significant improvements (P<.05, Fisher exact test) in 4 major performance categories: occupational health and safety, air pollution control, hazardous waste management, and wastewater discharge. Statistical significance was also shown when a modified Bonferroni adjustment for multiple comparisons was performed. Conclusions. Our findings suggest that the new self-certification approach to environmental and worker protection is effective and can be used as an adjunct to further enhance state and federal enforcement programs. PMID:17267709
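
    A sketch of the comparison machinery described (Fisher exact tests with a Bonferroni-style adjustment across the four performance categories); the counts are invented, not the study's data, and the study's "modified Bonferroni" procedure may differ from the plain α/m division used here.

```python
from scipy import stats

# Hypothetical facilities in compliance vs not, at baseline and after the
# self-certification intervention, for each performance category
categories = {
    "occupational health & safety": ((48, 34), (70, 12)),
    "air pollution control":        ((40, 42), (62, 20)),
    "hazardous waste management":   ((55, 27), (71, 11)),
    "wastewater discharge":         ((50, 32), (66, 16)),
}

alpha = 0.05
bonferroni_alpha = alpha / len(categories)   # adjusted threshold, 4 comparisons

for name, (baseline, post) in categories.items():
    _, p = stats.fisher_exact([list(baseline), list(post)])
    flag = "significant" if p < bonferroni_alpha else "n.s. after adjustment"
    print(f"{name:30s} p = {p:.4f} ({flag})")
```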

  8. New scanning technique using Adaptive Statistical Iterative Reconstruction (ASIR) significantly reduced the radiation dose of cardiac CT.

    PubMed

    Tumur, Odgerel; Soon, Kean; Brown, Fraser; Mykytowycz, Marcus

    2013-06-01

    The aims of our study were to evaluate the effect of applying the Adaptive Statistical Iterative Reconstruction (ASIR) algorithm on the radiation dose and image quality of coronary computed tomography angiography (CCTA), and to evaluate the effects of various patient and CT scanning factors on the radiation dose of CCTA. This was a retrospective study that included 347 consecutive patients who underwent CCTA at a tertiary university teaching hospital between 1 July 2009 and 20 September 2011. Analysis was performed comparing patient demographics, scan characteristics, radiation dose, and image quality in two groups of patients in whom conventional Filtered Back Projection (FBP) or ASIR was used for image reconstruction. There were 238 patients in the FBP group and 109 patients in the ASIR group. There was no difference between the groups in the use of prospective gating, scan length or tube voltage. In the ASIR group, significantly lower tube current was used compared with the FBP group, 550 mA (450-600) vs. 650 mA (500-711.25) (median (interquartile range)), respectively, P < 0.001. There was a 27% effective radiation dose reduction in the ASIR group compared with the FBP group, 4.29 mSv (2.84-6.02) vs. 5.84 mSv (3.88-8.39) (median (interquartile range)), respectively, P < 0.001. Although ASIR was associated with increased image noise compared with FBP (39.93 ± 10.22 vs. 37.63 ± 18.79 (mean ± standard deviation), respectively, P < 0.001), it did not affect the signal intensity, signal-to-noise ratio, contrast-to-noise ratio or the diagnostic quality of CCTA. Application of ASIR reduces the radiation dose of CCTA without affecting the image quality. © 2013 The Authors. Journal of Medical Imaging and Radiation Oncology © 2013 The Royal Australian and New Zealand College of Radiologists.

  9. Method for Expressing Clinical and Statistical Significance of Ocular and Corneal Wavefront Error Aberrations

    PubMed Central

    Smolek, Michael K.

    2011-01-01

    Purpose The significance of ocular or corneal aberrations may be subject to misinterpretation whenever eyes with different pupil sizes or the application of different Zernike expansion orders are compared. A method is shown that uses simple mathematical interpolation techniques based on normal data to rapidly determine the clinical significance of aberrations, without concern for pupil and expansion order. Methods Corneal topography (Tomey, Inc.; Nagoya, Japan) from 30 normal corneas was collected and the corneal wavefront error analyzed by Zernike polynomial decomposition into specific aberration types for pupil diameters of 3, 5, 7, and 10 mm and Zernike expansion orders of 6, 8, 10 and 12. Using this 4×4 matrix of pupil sizes and fitting orders, best-fitting 3-dimensional functions were determined for the mean and standard deviation of the RMS error for specific aberrations. The functions were encoded into software to determine the significance of data acquired from non-normal cases. Results The best-fitting functions for 6 types of aberrations were determined: defocus, astigmatism, prism, coma, spherical aberration, and all higher-order aberrations. A clinical screening method of color-coding the significance of aberrations in normal, postoperative LASIK, and keratoconus cases having different pupil sizes and different expansion orders is demonstrated. Conclusions A method to calibrate wavefront aberrometry devices by using a standard sample of normal cases was devised. This method could be potentially useful in clinical studies involving patients with uncontrolled pupil sizes or in studies that compare data from aberrometers that use different Zernike fitting-order algorithms. PMID:22157570

  10. Statistical reporting inconsistencies in experimental philosophy

    PubMed Central

    Colombo, Matteo; Duev, Georgi; Nuijten, Michèle B.; Sprenger, Jan

    2018-01-01

    Experimental philosophy (x-phi) is a young field of research in the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science. PMID:29649220

  11. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  12. Analysis of statistical misconception in terms of statistical reasoning

    NASA Astrophysics Data System (ADS)

    Maryati, I.; Priatna, N.

    2018-05-01

    Reasoning skill is needed for everyone to face the globalization era, because every person has to be able to manage and use information from all over the world, which can be obtained easily. Statistical reasoning skill is the ability to collect, group, process, interpret, and draw conclusions from information. Developing this skill can be done through various levels of education. However, the skill is low because many people assume that statistics is just the ability to count and use formulas, and so do students. Students still have a negative attitude toward courses related to research. The purpose of this research is to analyze students' misconceptions in a descriptive statistics course in relation to statistical reasoning skill. The observation was done by analyzing the results of a misconception test and a statistical reasoning skill test, and by observing the effect of students' misconceptions on statistical reasoning skill. The sample of this research was 32 students of a mathematics education department who had taken the descriptive statistics course. The mean value of the misconception test was 49.7 with a standard deviation of 10.6, whereas the mean value of the statistical reasoning skill test was 51.8 with a standard deviation of 8.5. If 65 is taken as the minimum value indicating standard achievement of course competence, the students' mean values are lower than that standard. The results of the misconception assessment emphasize which subtopics should be considered: it was found that students' misconceptions occur in 1) writing mathematical sentences and symbols correctly, 2) understanding basic definitions, and 3) determining the concept to be used in solving a problem. In statistical reasoning skill, the assessment measured reasoning about: 1) data, 2) representation, 3) statistical format, 4) probability, 5) samples, and 6) association.

  13. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  14. What can we learn from noise? - Mesoscopic nonequilibrium statistical physics.

    PubMed

    Kobayashi, Kensuke

    2016-01-01

    Mesoscopic systems - small electric circuits working in the quantum regime - offer us a unique experimental stage to explore quantum transport in a tunable and precise way. The purpose of this Review is to show how they can contribute to statistical physics. We introduce the significance of fluctuation, or equivalently noise, as noise measurement enables us to address the fundamental aspects of a physical system. The significance of the fluctuation theorem (FT) in statistical physics is noted. We explain what information can be deduced from current noise measurements in mesoscopic systems. As an important application of noise measurement to statistical physics, we describe our experimental work on the current and current noise in an electron interferometer, which is the first experimental test of the FT in the quantum regime. Our attempt will shed new light on the research field of mesoscopic quantum statistical physics.

  15. Frog Statistics

    Science.gov Websites

    wwwstats output of web-access statistics for the Whole Frog Project and Virtual Frog Dissection pages. [Remainder of record garbled in source.]

  16. Descriptive and inferential statistical methods used in burns research.

    PubMed

    Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars

    2010-05-01

    Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/covariance (33%), χ² test (27%), Wilcoxon and Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals

  17. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  18. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow-variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant, with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on estimating low-flow frequency statistics at ungaged locations depend on the method used. The first method outlined for use in Missouri, power curve equations, was developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations
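
    The drainage-area ratio method described above is simple enough to capture in a few lines. The sketch below is illustrative only (the function name, variable names, and example numbers are ours, not the report's) and enforces the 40-to-150-percent applicability range stated in the abstract:

```python
def drainage_area_ratio_estimate(stat_gaged, da_gaged, da_ungaged):
    """Scale a low-flow statistic from a streamgage to an ungaged site on
    the same stream using the drainage-area ratio (illustrative sketch)."""
    ratio = da_ungaged / da_gaged
    # The report limits the method to ungaged sites whose drainage area is
    # within 40 to 150 percent of the streamgage drainage area.
    if not 0.40 <= ratio <= 1.50:
        raise ValueError(
            f"drainage-area ratio {ratio:.2f} is outside 40-150 percent; "
            "use the regional regression equations instead"
        )
    return stat_gaged * ratio

# Hypothetical example: a low-flow statistic of 1.2 ft3/s at a gage
# draining 150 mi2, transferred to an ungaged site draining 120 mi2.
print(drainage_area_ratio_estimate(1.2, 150.0, 120.0))  # 0.96
```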

  19. MQSA National Statistics

    MedlinePlus

    ... but should level off with time. Archived Scorecard Statistics: 2018, 2017, 2016 ...

  20. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
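
    As a rough illustration of the modeling approach described above (procedure-specific logistic regression scored by the C-index), the following sketch fits a logistic model to synthetic surveillance-style data. The risk factors and coefficients are invented for the example; for a binary outcome, the C-index reduces to the area under the ROC curve:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for surveillance records: three invented risk factors.
n = 5000
X = np.column_stack([
    rng.normal(2.0, 0.8, n),    # operation duration (hours)
    rng.integers(1, 5, n),      # wound class
    rng.integers(1, 5, n),      # ASA physical status score
])
logit = -5.5 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.4 * X[:, 2]
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))  # simulated SSI outcomes

model = LogisticRegression().fit(X, y)

# For a binary outcome, the C-index equals the area under the ROC curve.
c_index = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"C-index: {c_index:.3f}")
```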

  1. Forest statistics of eastern Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1952-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Kentucky the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for the Eastern Kentucky...

  2. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.

  3. Descriptive statistics.

    PubMed

    Nick, Todd G

    2007-01-01

    Statistics is defined by the Medical Subject Headings (MeSH) thesaurus as the science and art of collecting, summarizing, and analyzing data that are subject to random variation. The two broad categories of summarizing and analyzing data are referred to as descriptive and inferential statistics. This chapter considers the science and art of summarizing data where descriptive statistics and graphics are used to display data. In this chapter, we discuss the fundamentals of descriptive statistics, including describing qualitative and quantitative variables. For describing quantitative variables, measures of location and spread, for example the standard deviation, are presented along with graphical presentations. We also discuss distributions of statistics, for example the variance, as well as the use of transformations. The concepts in this chapter are useful for uncovering patterns within the data and for effectively presenting the results of a project.

  4. Inverse statistics and information content

    NASA Astrophysics Data System (ADS)

    Ebadi, H.; Bolgorian, Meysam; Jafari, G. R.

    2010-12-01

    Inverse statistics analysis studies the distribution of investment horizons required to achieve a predefined level of return. This distribution has a maximum, which determines the most likely horizon for gaining a specific return. There exists a significant difference between the inverse statistics of financial market data and those of a fractional Brownian motion (fBm) as an uncorrelated time series, which makes this a suitable criterion for measuring the information content in financial data. In this paper we perform this analysis for the DJIA and S&P500 as two developed markets and the Tehran price index (TEPIX) as an emerging market. We also compare these probability distributions with the fBm probability distribution, to detect when the behavior of the stocks is the same as fBm.
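
    A minimal sketch of the inverse-statistics computation, assuming the usual definition from this literature: for each starting day, record the waiting time until the cumulative log-return first reaches a target level, then take the mode of the resulting distribution as the most likely horizon. The prices here are a synthetic random walk, not DJIA, S&P500, or TEPIX data:

```python
import numpy as np

def inverse_statistics(prices, target_return):
    """Waiting times (in days) until the cumulative log-return first
    reaches `target_return`, one per starting day (illustrative)."""
    log_p = np.log(prices)
    horizons = []
    for t in range(len(log_p) - 1):
        future = log_p[t + 1:] - log_p[t]
        hit = np.nonzero(future >= target_return)[0]
        if hit.size:
            horizons.append(hit[0] + 1)
    return np.asarray(horizons)

# Synthetic geometric random walk standing in for an index series.
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 5000)))

tau = inverse_statistics(prices, target_return=0.05)
# The maximum (mode) of the horizon distribution is the most likely
# horizon for gaining the target return.
print("most likely horizon (days):", np.bincount(tau).argmax())
```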

  5. Statistical inference for template aging

    NASA Astrophysics Data System (ADS)

    Schuckers, Michael E.

    2006-04-01

    A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not on the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
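
    The two tests described in the abstract can be combined in one short sketch: fit a binomial GLM with and without a linear time term, then compare the fits with a likelihood ratio test. The data below are synthetic error counts (not the NIST Biometric Score Set), and the model structure is our guess at the simplest version of the approach:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic sessions: false-match errors out of 500 trials per month,
# with the underlying rate drifting slowly upward (illustrative only).
months = np.arange(24, dtype=float)
n_trials = np.full(months.size, 500)
errors = rng.binomial(n_trials, 0.02 + 0.0008 * months)
endog = np.column_stack([errors, n_trials - errors])

# Null model: constant error rate.  Alternative: linear time trend on
# the logit scale, both fitted as binomial GLMs.
fit0 = sm.GLM(endog, np.ones((months.size, 1)),
              family=sm.families.Binomial()).fit()
fit1 = sm.GLM(endog, sm.add_constant(months),
              family=sm.families.Binomial()).fit()

lr = 2 * (fit1.llf - fit0.llf)  # likelihood ratio statistic, 1 df
print(f"LR = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.4f}")
```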

  6. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.

  7. The American Arts Industry: Size and Significance.

    ERIC Educational Resources Information Center

    Chartrand, Harry Hillman

    In this study, the U.S. arts industry is conceptually defined and measured with respect to statistical size. The contribution and significance of the arts industry to the economy is then assessed within the context of national competitiveness and the emerging knowledge economy. Study findings indicate that the arts industry contributes between 5%…

  8. [Variation trend and significance of adult tonsil size and tongue position].

    PubMed

    Bin, X; Zhou, Y

    2016-08-05

    Objective: The aim of this study is to explore the changing trend and significance of adult tonsil size and tongue position by observing adults in different age groups. Method: The oropharyngeal cavities of 1 060 adults who were undergoing health examinations and had no history of tonsil surgery were observed. Friedman tongue position (FTP) and tonsil size (TS) were scored according to Friedman's criteria, and the results were statistically analyzed to evaluate their trends and significance. Result: Mean FTP scores increased significantly with age (P < 0.01); FTP scores in males were lower than in females (P < 0.01). TS scores decreased significantly with age (P < 0.05). Mean TS scores did not differ significantly between genders. Total scores showed an increasing, though not statistically significant, trend with age (P > 0.05); total scores differed between sexes (male 4.12 ± 0.67, female 4.23 ± 0.68, P < 0.05). BMI did not differ significantly with changes in FTP scores, TS scores, or total scores (P > 0.05), but it showed an increasing trend with age (P < 0.01). Conclusion: The width of the pharyngeal cavity in normal adults remains fairly stable, although it is narrower in obese people. TS and FTP scores, which show opposite trends with age, can be regarded as major factors in keeping a stable width of the oropharyngeal cavity. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.

  9. Efforts to improve international migration statistics: a historical perspective.

    PubMed

    Kraly, E P; Gnanasekaran, K S

    1987-01-01

    During the past decade, the international statistical community has made several efforts to develop standards for the definition, collection and publication of statistics on international migration. This article surveys the history of official initiatives to standardize international migration statistics by reviewing the recommendations of the International Statistical Institute, International Labor Organization, and the UN, and reports a recently proposed agenda for moving toward comparability among national statistical systems. Heightening awareness of the benefits of exchange and creating motivation to implement international standards requires a 3-pronged effort from the international statistical community. 1st, it is essential to continue discussion about the significance of improvement, specifically standardization, of international migration statistics. The move from theory to practice in this area requires ongoing focus by migration statisticians so that conformity to international standards itself becomes a criterion by which national statistical practices are examined and assessed. 2nd, the countries should be provided with technical documentation to support and facilitate the implementation of the recommended statistical systems. Documentation should be developed with an understanding that conformity to international standards for migration and travel statistics must be achieved within existing national statistical programs. 3rd, the call for statistical research in this area requires more efforts by the community of migration statisticians, beginning with the mobilization of bilateral and multilateral resources to undertake the preceding list of activities.

  10. Cancer Incidence of 2,4-D Production Workers

    PubMed Central

    Burns, Carol; Bodner, Kenneth; Swaen, Gerard; Collins, James; Beard, Kathy; Lee, Marcia

    2011-01-01

    Despite showing no evidence of carcinogenicity in laboratory animals, the herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) has been associated with non-Hodgkin lymphoma (NHL) in some human epidemiology studies, albeit inconsistently. We matched an existing cohort of 2,4-D manufacturing employees with cancer registries in three US states, resulting in 244 cancers compared to 276 expected cases. The Standardized Incidence Ratio (SIR) for the 14 NHL cases was 1.36 (95% Confidence Interval (CI) 0.74–2.29). Risk estimates were higher in the upper cumulative exposure and duration subgroups, yet not statistically significant. There were no clear patterns of NHL risk with period of hire and histology subtypes. Statistically significant results were observed for prostate cancer (SIR = 0.74, 95% CI 0.57–0.94) and “other respiratory” cancers (SIR = 3.79, 95% CI 1.22–8.84; 4 of 5 cases were mesotheliomas). Overall, we observed fewer cancer cases than expected, and a statistically nonsignificant increase in the number of NHL cases. PMID:22016704
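
    The SIR and its confidence interval quoted above can be reproduced with the standard exact Poisson limits. In the sketch below, the expected NHL count is back-calculated from the reported SIR (14/1.36 ≈ 10.3), which is an assumption for the example rather than a figure from the paper:

```python
from scipy.stats import chi2

def sir_with_ci(observed, expected, alpha=0.05):
    """Standardized incidence ratio with exact Poisson confidence limits
    (chi-square formulation of the Poisson interval)."""
    sir = observed / expected
    lower = chi2.ppf(alpha / 2, 2 * observed) / (2 * expected) if observed else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (observed + 1)) / (2 * expected)
    return sir, lower, upper

# 14 observed NHL cases; expected count back-calculated from SIR = 1.36.
print(sir_with_ci(14, 14 / 1.36))
```

    Running this returns approximately (1.36, 0.74, 2.29), matching the interval reported in the record.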

  11. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  12. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  13. Forest statistics of central Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1950-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for each of the four regions of Central Kentucky. A similar report has been published for the Western Kentucky region and a release for the eastern region will be issued as soon as field work and tabulations are completed. Later an analytical report for the...

  14. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  15. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
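
    The following toy sketch conveys the general shape of such permutation-based cluster methods; it is not the authors' implementation. Edgewise t-statistics are thresholded, suprathreshold edges are grouped into connected components, and the maximum cluster size is recomputed under sign-flipping permutations to obtain a family-wise-error-corrected p value. All sizes, thresholds, and data are illustrative:

```python
import numpy as np
from scipy import stats
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(3)

def max_cluster_size(tvals, nodes_i, nodes_j, n_nodes, thresh):
    """Edge count of the largest connected component formed by
    suprathreshold edges: a simplified cluster size statistic."""
    keep = np.abs(tvals) > thresh
    adj = csr_matrix((np.ones(int(keep.sum())), (nodes_i[keep], nodes_j[keep])),
                     shape=(n_nodes, n_nodes))
    _, labels = connected_components(adj, directed=False)
    sizes = np.zeros(n_nodes)
    for i in nodes_i[keep]:
        sizes[labels[i]] += 1  # one suprathreshold edge per kept pair
    return sizes.max()

# Toy paired design: 20 nodes, 12 subjects, two conditions.
n_nodes, n_sub = 20, 12
iu = np.triu_indices(n_nodes, k=1)
cond_a = rng.normal(0, 1, (n_sub, iu[0].size))
cond_b = rng.normal(0, 1, (n_sub, iu[0].size))
cond_b[:, :15] += 1.5  # inject a connected cluster of altered edges

diff = cond_a - cond_b
obs = max_cluster_size(stats.ttest_1samp(diff, 0).statistic,
                       iu[0], iu[1], n_nodes, thresh=2.2)

# Permutation null: randomly flip each subject's condition labels and
# record the maximum cluster size each time.
null = []
for _ in range(1000):
    signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
    t_perm = stats.ttest_1samp(diff * signs, 0).statistic
    null.append(max_cluster_size(t_perm, iu[0], iu[1], n_nodes, thresh=2.2))

p_fwe = (1 + sum(s >= obs for s in null)) / (1 + len(null))
print(f"max cluster size {obs:.0f}, FWE-corrected p = {p_fwe:.3f}")
```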

  16. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based and surface-based morphometry, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
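
    A stripped-down sketch of a wild bootstrap test for a single voxel-level regression, assuming the common Rademacher-weight variant: residuals from the null fit are resampled with random ±1 weights, which preserves heteroscedasticity, and the observed slope is compared with the resampled slopes. This is a generic illustration on made-up data, not the authors' heteroscedastic-model procedure:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy heteroscedastic regression: one morphometric measure vs. age.
n = 200
age = rng.uniform(20, 60, n)
y = 0.5 + 0.02 * age + rng.normal(0, 0.05 * age, n)  # variance grows with age

X = np.column_stack([np.ones(n), age])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Null fit (slope = 0), whose residuals are resampled with Rademacher
# weights; multiplying residuals by +/-1 preserves heteroscedasticity.
X0 = X[:, :1]
b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
resid0 = y - X0 @ b0

n_boot = 2000
slopes = np.empty(n_boot)
for b in range(n_boot):
    w = rng.choice([-1.0, 1.0], size=n)
    y_star = X0 @ b0 + resid0 * w
    slopes[b] = np.linalg.lstsq(X, y_star, rcond=None)[0][1]

p = np.mean(np.abs(slopes) >= abs(beta[1]))
print(f"slope = {beta[1]:.4f}, wild-bootstrap p = {p:.4f}")
```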

  17. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access). The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.

  18. Clinicopathological significance of c-MYC in esophageal squamous cell carcinoma.

    PubMed

    Lian, Yu; Niu, Xiangdong; Cai, Hui; Yang, Xiaojun; Ma, Haizhong; Ma, Shixun; Zhang, Yupeng; Chen, Yifeng

    2017-07-01

    Esophageal squamous cell carcinoma is one of the most common malignant tumors. The oncogene c-MYC is thought to be important in the initiation, promotion, and therapy resistance of cancer. In this study, we aimed to investigate the clinicopathologic roles of c-MYC in esophageal squamous cell carcinoma tissue by discovering and analyzing c-MYC expression in a series of human esophageal tissues. A total of 95 esophageal squamous cell carcinoma samples were analyzed by western blotting and immunohistochemistry. Correlation of c-MYC expression with clinicopathological features of esophageal squamous cell carcinoma patients was then statistically analyzed. In most esophageal squamous cell carcinoma cases, c-MYC expression was positive in tumor tissues. The positive rate of c-MYC expression in tumor tissues was 61.05%, markedly higher than in the adjacent normal tissues (8.42%, 8/95) and atypical hyperplasia tissues (19.75%, 16/95). There was a statistically significant difference among adjacent normal tissues, atypical hyperplasia tissues, and tumor tissues. Overexpression of c-MYC was detected in 61.05% (58/95) of esophageal squamous cell carcinomas, which was significantly correlated with the degree of differentiation (p = 0.004). The positive rate of c-MYC expression was 40.0% in well-differentiated esophageal tissues, a statistically significant difference (p = 0.004). The positive rate of c-MYC was 41.5% in T1 + T2 esophageal tissues and 74.1% in T3 + T4 esophageal tissues, a statistically significant difference (p = 0.001). The positive rate of c-MYC was 45.0% in I + II esophageal tissues and 72.2% in III + IV esophageal tissues, a statistically significant difference (p = 0.011). c-MYC expression correlated strongly with clinical staging (p = 0.011), differentiation degree (p = 0.004), lymph node metastasis (p = 0.003), and invasion depth (p = 0.001) in patients with esophageal squamous cell carcinoma. The c-MYC was

  19. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyle, Jennifer E.; Casey, Cameron P.; Stratton, Kelly G.

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples, such as the smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurs in the DBS samples and affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  20. Are studies reporting significant results more likely to be published?

    PubMed

    Koletsi, Despina; Karagianni, Anthi; Pandis, Nikolaos; Makou, Margarita; Polychronopoulou, Argy; Eliades, Theodore

    2009-11-01

    Our objective was to assess the hypothesis that the proportion of articles reporting a significant effect varies, with a higher percentage of such articles published in journals with impact factors. The contents of 5 orthodontic journals (American Journal of Orthodontics and Dentofacial Orthopedics, Angle Orthodontist, European Journal of Orthodontics, Journal of Orthodontics, and Orthodontics and Craniofacial Research), published between 2004 and 2008, were hand-searched. Articles with statistical analysis of data were included in the study and classified into 4 categories: behavior and psychology, biomaterials and biomechanics, diagnostic procedures and treatment, and craniofacial growth, morphology, and genetics. In total, 2622 articles were examined, with 1785 included in the analysis. Univariate and multivariate logistic regression analyses were applied with statistical significance as the dependent variable, and whether the journal had an impact factor, the subject, and the year as the independent predictors. A much higher percentage of articles reported significant results than nonsignificant ones (on average, 88% vs 12%) in these journals. Overall, these journals published significantly more studies with significant results, ranging from 75% to 90% (P = 0.02). Multivariate modeling showed that journals with impact factors had a 100% increased probability of publishing a statistically significant result compared with journals with no impact factor (odds ratio [OR], 1.99; 95% CI, 1.19-3.31). Compared with articles on biomaterials and biomechanics, all other subject categories showed lower probabilities of significant results. Nonsignificant findings in behavior and psychology and in diagnosis and treatment were 1.8 (OR, 1.75; 95% CI, 1.51-2.67) and 3.5 (OR, 3.50; 95% CI, 2.27-5.37) times more likely to be published, respectively. Journals seem to prefer reporting significant results; this might be because of authors

  1. Can Money Buy Happiness? A Statistical Analysis of Predictors for User Satisfaction

    ERIC Educational Resources Information Center

    Hunter, Ben; Perret, Robert

    2011-01-01

    2007 data from LibQUAL+[TM] and the ACRL Library Trends and Statistics database were analyzed to determine if there is a statistically significant correlation between library expenditures and usage statistics and library patron satisfaction across 73 universities. The results show that users of larger, better funded libraries have higher…

  2. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing the statistical power of identifying risk variants and improving the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

    NASA Technical Reports Server (NTRS)

    Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

    2016-01-01

    A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.

  4. CHOICE OF INDICATOR DETERMINES THE SIGNIFICANCE AND RISK OBTAINED FROM THE STATISTICAL ASSOCIATION BETWEEN FINE PARTICULATE MATTER MASS AND CARDIOVASCULAR MORTALITY

    EPA Science Inventory

    Minor changes in the indicator used to measure fine PM, which cause only modest changes in Mass concentrations, can lead to dramatic changes in the statistical relationship of fine PM mass with cardiovascular mortality. An epidemiologic study in Phoenix (Mar et al., 2000), augme...

  5. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, and correlation and regression. The works that made use of multiple correlation, multiple regression, or more complex methods of studying the relationship between two or more variables were grouped with the works whose statistical methods comprised correlation and regression as well as other methods, e.g. statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis conducted by the Jacobi-Hotelling method, taxonomic methods, and others. On the basis of the performed studies it was established that the frequency of employing statistical methods in the six selected national medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), generally similar to the frequency reported in English-language medical journals. On the whole, no significant differences were disclosed in the frequency of applied statistical methods (Tab. 4) or in the frequency of random tests (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The statistical methods most frequently used in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%), and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). For the purpose of increasing the frequency and reliability of the statistical methods used, the teaching of biostatistics should be broadened in medical studies and in postgraduate training designed for physicians and scientific-didactic workers.

  6. Neuroendocrine Tumor: Statistics

    MedlinePlus

    Approved by the Cancer.Net Editorial Board. It is important to remember that statistics on the survival rates for people with a neuroendocrine tumor are ...

  7. Satellite observation of pollutant emissions from gas flaring activities near the Arctic

    NASA Astrophysics Data System (ADS)

    Li, Can; Hsu, N. Christina; Sayer, Andrew M.; Krotkov, Nickolay A.; Fu, Joshua S.; Lamsal, Lok N.; Lee, Jaehwa; Tsay, Si-Chee

    2016-05-01

    Gas flaring is a common practice in the oil industry that can have significant environmental impacts, but has until recently been largely overlooked in terms of relevance to climate change. We utilize data from various satellite sensors to examine pollutant emissions from oil exploitation activities in four areas near the Arctic. Despite the remoteness of these sparsely populated areas, tropospheric NO2 retrieved from the Ozone Monitoring Instrument (OMI) is substantial at ~1 × 10^15 molecules cm^-2, suggesting sizeable emissions from these industrial activities. Statistically significant (at the 95% confidence level, corresponding uncertainties in parentheses) increasing trends of 0.017 (±0.01) × 10^15 and 0.015 (±0.006) × 10^15 molecules cm^-2 year^-1 over 2004-2015 were found for Bakken (USA) and Athabasca (Canada), two areas having recently experienced fast expansion in the oil industry. This rapid change has implications for emission inventories, which are updated less frequently. No significant trend was found for the North Sea (Europe), where oil production has been declining since the 1990s. For northern Russia, the trend was just under the 95% significance threshold at 0.0057 (±0.006) × 10^15 molecules cm^-2 year^-1. This raises an interesting inconsistency as prior studies have suggested that, in contrast to the continued, albeit slow, expansion of Russian oil/gas production, gas flaring in Russia has decreased in recent years. However, only a fraction of oil fields in Russia were covered in our analysis. Satellite aerosol optical depth (AOD) data revealed similar tendencies, albeit at a weaker level of statistical significance, due to the longer lifetime of aerosols and contributions from other sources. This study demonstrates that synergetic use of data from multiple satellite sensors can provide valuable information on pollutant emission sources that is otherwise difficult to acquire.
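
    The trend estimates quoted above are regression slopes with confidence intervals. The sketch below shows the generic calculation on synthetic annual means; the numbers are made up to match the study's scale, and the study's own trend methodology may differ:

```python
import numpy as np
from scipy import stats

# Synthetic annual-mean NO2 columns (units of 10^15 molecules cm^-2),
# built to resemble the Bakken trend in scale; not the OMI data.
years = np.arange(2004, 2016)
rng = np.random.default_rng(5)
no2 = 1.0 + 0.017 * (years - 2004) + rng.normal(0, 0.03, years.size)

res = stats.linregress(years, no2)
half_width = stats.t.ppf(0.975, years.size - 2) * res.stderr  # 95% CI
print(f"trend = {res.slope:.4f} (+/-{half_width:.4f}) "
      f"x 10^15 molecules cm^-2 per year, p = {res.pvalue:.3f}")
```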

  8. Significance testing as perverse probabilistic reasoning

    PubMed Central

    2011-01-01

    Truth claims in the medical literature rely heavily on statistical significance testing. Unfortunately, most physicians misunderstand the underlying probabilistic logic of significance tests and consequently often misinterpret their results. This near-universal misunderstanding is highlighted by means of a simple quiz which we administered to 246 physicians at two major academic hospitals, on which the proportion of incorrect responses exceeded 90%. A solid understanding of the fundamental concepts of probability theory is becoming essential to the rational interpretation of medical information. This essay provides a technically sound review of these concepts that is accessible to a medical audience. We also briefly review the debate in the cognitive sciences regarding physicians' aptitude for probabilistic inference. PMID:21356064

  9. The SACE Review Panel's Final Report: Significant Flaws in the Analysis of Statistical Data

    ERIC Educational Resources Information Center

    Gregory, Kelvin

    2006-01-01

    The South Australian Certificate of Education (SACE) is a credential and formal qualification within the Australian Qualifications Framework. A recent review of the SACE outlined a number of recommendations for significant changes to this certificate. These recommendations were the result of a process that began with the review panel…

  10. Ergodic theorem, ergodic theory, and statistical mechanics

    PubMed Central

    Moore, Calvin C.

    2015-01-01

    This perspective highlights the mean ergodic theorem established by John von Neumann and the pointwise ergodic theorem established by George Birkhoff, proofs of which were published nearly simultaneously in PNAS in 1931 and 1932. These theorems were of great significance both in mathematics and in statistical mechanics. In statistical mechanics they provided a key insight into a 60-y-old fundamental problem of the subject, namely, the rationale for the hypothesis that time averages can be set equal to phase averages. The evolution of this problem is traced from the origins of statistical mechanics and Boltzmann's ergodic hypothesis to the Ehrenfests' quasi-ergodic hypothesis, and then to the ergodic theorems. We discuss communications between von Neumann and Birkhoff in the fall of 1931 leading up to the publication of these papers and related issues of priority. These ergodic theorems initiated a new field of mathematical research called ergodic theory that has thrived ever since, and we discuss some recent developments in ergodic theory that are relevant for statistical mechanics. PMID:25691697

  11. The Necessity of the Hippocampus for Statistical Learning

    PubMed Central

    Covington, Natalie V.; Brown-Schmidt, Sarah; Duff, Melissa C.

    2018-01-01

    Converging evidence points to a role for the hippocampus in statistical learning, but open questions about its necessity remain. Evidence for necessity comes from Schapiro and colleagues who report that a single patient with damage to hippocampus and broader medial temporal lobe cortex was unable to discriminate new from old sequences in several statistical learning tasks. The aim of the current study was to replicate these methods in a larger group of patients who have either damage localized to hippocampus or a broader medial temporal lobe damage, to ascertain the necessity of the hippocampus in statistical learning. Patients with hippocampal damage consistently showed less learning overall compared with healthy comparison participants, consistent with an emerging consensus for hippocampal contributions to statistical learning. Interestingly, lesion size did not reliably predict performance. However, patients with hippocampal damage were not uniformly at chance and demonstrated above-chance performance in some task variants. These results suggest that hippocampus is necessary for statistical learning levels achieved by most healthy comparison participants but significant hippocampal pathology alone does not abolish such learning. PMID:29308986

  12. Statistical Literacy: Developing a Youth and Adult Education Statistical Project

    ERIC Educational Resources Information Center

    Conti, Keli Cristina; Lucchesi de Carvalho, Dione

    2014-01-01

    This article focuses on the notion of literacy--general and statistical--in the analysis of data from a fieldwork research project carried out as part of a master's degree that investigated the teaching and learning of statistics in adult education mathematics classes. We describe the statistical context of the project that involved the…

  13. Problematizing Statistical Literacy: An Intersection of Critical and Statistical Literacies

    ERIC Educational Resources Information Center

    Weiland, Travis

    2017-01-01

    In this paper, I problematize traditional notions of statistical literacy by juxtaposing it with critical literacy. At the school level statistical literacy is vitally important for students who are preparing to become citizens in modern societies that are increasingly shaped and driven by data based arguments. The teaching of statistics, which is…

  14. Usage Statistics

    MedlinePlus

    MedlinePlus usage statistics (https://medlineplus.gov/usestatistics.html): quarterly user statistics reporting page views and unique visitors by quarter, beginning Oct-Dec 1998 ...

  15. Second Language Experience Facilitates Statistical Learning of Novel Linguistic Materials.

    PubMed

    Potter, Christine E; Wang, Tianlin; Saffran, Jenny R

    2017-04-01

    Recent research has begun to explore individual differences in statistical learning, and how those differences may be related to other cognitive abilities, particularly their effects on language learning. In this research, we explored a different type of relationship between language learning and statistical learning: the possibility that learning a new language may also influence statistical learning by changing the regularities to which learners are sensitive. We tested two groups of participants, Mandarin Learners and Naïve Controls, at two time points, 6 months apart. At each time point, participants performed two different statistical learning tasks: an artificial tonal language statistical learning task and a visual statistical learning task. Only the Mandarin-learning group showed significant improvement on the linguistic task, whereas both groups improved equally on the visual task. These results support the view that there are multiple influences on statistical learning. Domain-relevant experiences may affect the regularities that learners can discover when presented with novel stimuli. Copyright © 2016 Cognitive Science Society, Inc.

  16. Second language experience facilitates statistical learning of novel linguistic materials

    PubMed Central

    Potter, Christine E.; Wang, Tianlin; Saffran, Jenny R.

    2016-01-01

    Recent research has begun to explore individual differences in statistical learning, and how those differences may be related to other cognitive abilities, particularly their effects on language learning. In the present research, we explored a different type of relationship between language learning and statistical learning: the possibility that learning a new language may also influence statistical learning by changing the regularities to which learners are sensitive. We tested two groups of participants, Mandarin Learners and Naïve Controls, at two time points, six months apart. At each time point, participants performed two different statistical learning tasks: an artificial tonal language statistical learning task and a visual statistical learning task. Only the Mandarin-learning group showed significant improvement on the linguistic task, while both groups improved equally on the visual task. These results support the view that there are multiple influences on statistical learning. Domain-relevant experiences may affect the regularities that learners can discover when presented with novel stimuli. PMID:27988939

  17. Statistics of contractive cracking patterns. [frozen soil-water rheology

    NASA Technical Reports Server (NTRS)

    Noever, David A.

    1991-01-01

    The statistics of convective soil patterns are analyzed using statistical crystallography. An underlying hierarchy of order is found to span four orders of magnitude in characteristic pattern length. Strict mathematical requirements determine the two-dimensional (2D) topology, such that random partitioning of space yields a predictable statistical geometry for polygons. For all lengths, Aboav's and Lewis's laws are verified; this result is consistent both with the need to fill 2D space and most significantly with energy carried not by the patterns' interior, but by the boundaries. Together, this suggests a common mechanism of formation for both micro- and macro-freezing patterns.

  18. Statistical yearbook

    DOT National Transportation Integrated Search

    2010-01-01

    The Statistical Yearbook is an annual compilation of a wide range of international economic, social and environmental statistics on over 200 countries and areas, compiled from sources including UN agencies and other international, national and specia...

  19. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns is a standard line of attack for recognizing DNA sequences in gene identification and similar problems, but peculiarly little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences of a complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and standard Fourier techniques are applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
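
    A minimal sketch of the Fourier approach described above, using the common binary-indicator mapping (one 0/1 sequence per base) and the well-known period-3 signature of coding DNA. The mapping choice and the random test sequences are ours, not the paper's:

```python
import numpy as np

def period3_signal(seq):
    """Ratio of total spectral power at frequency 1/3 to the average
    power, summed over the four base-indicator sequences (sketch)."""
    n = len(seq)
    power = np.zeros(n)
    for base in "ACGT":
        u = np.array([c == base for c in seq], dtype=float)
        power += np.abs(np.fft.fft(u)) ** 2
    k = n // 3  # index of frequency 1/3 for n divisible by 3
    return power[k] / power[1:n // 2].mean()

rng = np.random.default_rng(6)
random_seq = "".join(rng.choice(list("ACGT"), 300))
# Coding-like sequence with a strong codon-position bias (period 3).
coding_like = "".join(rng.choice(list("ACGT")) + "G" + rng.choice(list("ACGT"))
                      for _ in range(100))

print("random:", period3_signal(random_seq))        # near 1
print("coding-like:", period3_signal(coding_like))  # much larger than 1
```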

  20. Statistical analogues of thermodynamic extremum principles

    NASA Astrophysics Data System (ADS)

    Ramshaw, John D.

    2018-05-01

    As shown by Jaynes, the canonical and grand canonical probability distributions of equilibrium statistical mechanics can be simply derived from the principle of maximum entropy, in which the statistical entropy S = -k_B Σ_i p_i log p_i is maximised subject to constraints on the mean values of the energy E and/or number of particles N in a system of fixed volume V. The Lagrange multipliers associated with those constraints are then found to be simply related to the temperature T and chemical potential μ. Here we show that the constrained maximisation of S is equivalent to, and can therefore be replaced by, the essentially unconstrained minimisation of the obvious statistical analogues of the Helmholtz free energy F = E - TS and the grand potential J = F - μN. Those minimisations are more easily performed than the maximisation of S because they formally eliminate the constraints on the mean values of E and N and their associated Lagrange multipliers. This procedure significantly simplifies the derivation of the canonical and grand canonical probability distributions, and shows that the well known extremum principles for the various thermodynamic potentials possess natural statistical analogues which are equivalent to the constrained maximisation of S.
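
    The central equivalence can be verified directly in the canonical case: minimising the statistical analogue of F = E - TS, with E = Σ_i p_i E_i and S as defined above, subject only to normalisation, recovers the canonical distribution. A compact version of the stationarity argument:

```latex
\mathcal{L} = \sum_i p_i E_i + k_{\mathrm{B}} T \sum_i p_i \ln p_i
            + \lambda \Bigl(\sum_i p_i - 1\Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i}
  = E_i + k_{\mathrm{B}} T\,(\ln p_i + 1) + \lambda = 0
\;\Longrightarrow\;
p_i = \frac{e^{-E_i/k_{\mathrm{B}}T}}{Z},
\qquad Z = \sum_j e^{-E_j/k_{\mathrm{B}}T}.
```

    The multiplier λ here enforces only normalisation; the temperature enters as a fixed parameter of F rather than as a constraint multiplier, which is exactly the simplification the abstract describes.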

  1. [Completeness of mortality statistics in Navarra, Spain].

    PubMed

    Moreno-Iribas, Conchi; Guevara, Marcela; Díaz-González, Jorge; Álvarez-Arruti, Nerea; Casado, Itziar; Delfrade, Josu; Larumbe, Emilia; Aguirre, Jesús; Floristán, Yugo

    2013-01-01

    Women in the region of Navarra, Spain, have one of the highest life expectancies at birth in Europe. The aim of this study was to assess the completeness of the official mortality statistics of Navarra in 2009 and the impact of the under-registration of deaths on life expectancy estimates. We compared the number of deaths in Navarra according to the official statistics from the Instituto Nacional de Estadística (INE) with the number derived from multiple-source case-finding: the electronic health record, the Instituto Navarro de Medicina Legal, and the INE including data received late. In total, 5,249 deaths were identified, of which 103 were not included in the official mortality statistics. Taking into account only deaths that occurred in Spain, which are the only ones considered for the official statistics, the completeness was 98.4%. Estimated life expectancy at birth in 2009 decreased from 86.6 years to 86.4 in women and from 80.0 to 79.6 years in men after correcting for the undercount. The results of this study ruled out the existence of significant under-registration in the official mortality statistics, confirming the exceptional longevity of women in Navarra, who are in the top position in Europe with a life expectancy at birth of 86.4 years.

  2. Statistical label fusion with hierarchical performance models

    PubMed Central

    Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.

    2014-01-01

    Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809

  3. Statistics without Tears: Complex Statistics with Simple Arithmetic

    ERIC Educational Resources Information Center

    Smith, Brian

    2011-01-01

    One of the often overlooked aspects of modern statistics is the analysis of time series data. Modern introductory statistics courses tend to rush to probabilistic applications involving risk and confidence. Rarely does the first level course linger on such useful and fascinating topics as time series decomposition, with its practical applications…

  4. [Predictive significance of HRV and HRT for premature beats in patients with coal worker's pneumoconiosis].

    PubMed

    Bao, Ying; Wang, Dejun; Du, Zhenlan; Liu, Shuhen

    2014-07-01

    To determine the predictive significance of HRV and HRT for premature beats in patients with coal worker's pneumoconiosis (CWP), 100 CWP patients with premature beats (including 44 cases of occasional ventricular premature contraction and 56 cases of frequent ventricular premature contraction) were chosen as the CWP group, and 50 healthy coal workers were chosen as the control group. 24 h DCG was used to monitor and analyze the premature beats and to calculate the HRV indexes (SDNN, SDANN, HF, LF) and the HRT indexes (TO, TS); HRV was compared between the CWP group and the control group, as were the changes in HRT between occasional and frequent ventricular premature contraction. The incidence of frequent ventricular premature contraction at night (66.1%, 37 cases) was higher than during the daytime (33.9%, 19 cases), a statistically significant difference (P < 0.05). The HRV indexes (SDNN, SDANN, HF, LF) of the CWP group were lower than those of the control group (P < 0.05). The HRV indexes of the control group at night were higher than during the daytime (P < 0.05), whereas the day-night comparison within the CWP group was not statistically significant (P > 0.05). Compared with the control group, TO of the CWP group was higher while TS was lower (P < 0.05). Compared with occasional ventricular premature contraction patients in the CWP group, TO of frequent ventricular premature contraction patients was higher while TS was lower (P < 0.05). The frequent ventricular premature contraction group among CWP patients showed severely impaired autonomic nervous function, and abnormal HRV and HRT can be prognostic indicators of frequent ventricular premature contraction in patients with coal worker's pneumoconiosis.

  5. After p Values: The New Statistics for Undergraduate Neuroscience Education.

    PubMed

    Calin-Jageman, Robert J

    2017-01-01

    Statistical inference is a methodological cornerstone for neuroscience education. For many years this has meant inculcating neuroscience majors into null hypothesis significance testing with p values. There is increasing concern, however, about the pervasive misuse of p values. It is time to start planning statistics curricula for neuroscience majors that replaces or de-emphasizes p values. One promising alternative approach is what Cumming has dubbed the "New Statistics", an approach that emphasizes effect sizes, confidence intervals, meta-analysis, and open science. I give an example of the New Statistics in action and describe some of the key benefits of adopting this approach in neuroscience education.
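
    In the spirit of the "New Statistics" described above, a result is reported as an effect size with a confidence interval rather than a bare p value. A small illustration on made-up data, using the pooled-SD version of Cohen's d (a Welch-style interval would be a more careful choice for unequal variances):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(50, 10, 30)   # made-up scores
treated = rng.normal(56, 10, 30)

diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / treated.size
             + control.var(ddof=1) / control.size)
df = treated.size + control.size - 2
half_width = stats.t.ppf(0.975, df) * se  # 95% CI on the mean difference

pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
d = diff / pooled_sd  # Cohen's d

print(f"mean difference = {diff:.1f} "
      f"[95% CI {diff - half_width:.1f}, {diff + half_width:.1f}], "
      f"Cohen's d = {d:.2f}")
```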

  6. Principles of Statistics: What the Sports Medicine Professional Needs to Know.

    PubMed

    Riemann, Bryan L; Lininger, Monica R

    2018-07-01

    Understanding the results and statistics reported in original research remains a large challenge for many sports medicine practitioners and, in turn, may be one of the biggest barriers to integrating research into sports medicine practice. The purpose of this article is to provide the minimal essentials a sports medicine practitioner needs to know about interpreting statistics and research results to facilitate the incorporation of the latest evidence into practice. Topics covered include the difference between statistical significance and clinical meaningfulness; effect sizes and confidence intervals; reliability statistics, including the minimal detectable difference and minimal important difference; and statistical power. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. A two-component rain model for the prediction of attenuation statistics

    NASA Technical Reports Server (NTRS)

    Crane, R. K.

    1982-01-01

    A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.

  8. Deformation effect on spectral statistics of nuclei

    NASA Astrophysics Data System (ADS)

    Sabri, H.; Jalili Majarshin, A.

    2018-02-01

    In this study, we investigated the relations between the spectral statistics of atomic nuclei and their different degrees of deformation. To this aim, the empirical energy levels of 109 even-even nuclei in the 22 ≤ A ≤ 196 mass region were classified according to their experimental and calculated quadrupole, octupole, hexadecapole and hexacontatetrapole deformation values and analyzed with random matrix theory. Our results show a clear relation between the regularity of nuclei and strong quadrupole, hexadecapole and hexacontatetrapole deformations, whereas for nuclei whose octupole deformations are nonzero we observed GOE-like statistics.

  9. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025

  10. An application of statistics to comparative metagenomics.

    PubMed

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-03-20

    Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.

  11. Attitudes toward statistics in medical postgraduates: measuring, evaluating and monitoring.

    PubMed

    Zhang, Yuhai; Shang, Lei; Wang, Rui; Zhao, Qinbo; Li, Chanjuan; Xu, Yongyong; Su, Haixia

    2012-11-23

    In medical training, statistics is considered a very difficult course to learn and teach. Current studies have found that students' attitudes toward statistics can influence their learning process. Measuring, evaluating and monitoring changes in students' attitudes toward statistics are important. Few studies have focused on the attitudes of postgraduates, especially medical postgraduates. Our purpose was to understand current attitudes regarding statistics held by medical postgraduates and explore their effects on students' achievement. We also wanted to explore the influencing factors and the sources of these attitudes and monitor their changes after a systematic statistics course. A total of 539 medical postgraduates enrolled in a systematic statistics course completed the pre-form of the Survey of Attitudes Toward Statistics-28 scale, and 83 postgraduates were selected randomly from among them to complete the post-form scale after the course. Most medical postgraduates held positive attitudes toward statistics, but they thought statistics was a very difficult subject. The attitudes mainly came from experiences in a former statistical or mathematical class. Age, level of statistical education, research experience, specialty and mathematics background may influence postgraduate attitudes toward statistics. There were significant positive correlations between course achievement and attitudes toward statistics. In general, student attitudes showed negative changes after completing a statistics course. The importance of student attitudes toward statistics must be recognized in medical postgraduate training. To make sure all students have a positive learning environment, statistics teachers should measure their students' attitudes and monitor their changes during a course. Some necessary assistance should be offered to those students who develop negative attitudes.

  12. On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris

    NASA Technical Reports Server (NTRS)

    Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt

    2007-01-01

    A convenient and powerful method is used to determine if radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models, but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, during the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, no statistically significant deviations from the behavior expected under Poisson statistics were observed, whether the detections were analyzed independently of altitude and inclination or grouped by them. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.
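
    To illustrate the inter-arrival-time test described above (a sketch, not the authors' code): under a homogeneous Poisson process the waiting times between detections are exponentially distributed, which can be checked with a goodness-of-fit test. The event times below are simulated; strictly, estimating the rate from the same data calls for a Lilliefors-type correction, ignored here for brevity.

        # Simulated detection times standing in for radar events; Poisson statistics
        # imply exponential inter-arrival times.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        event_times = np.sort(rng.uniform(0.0, 3600.0, size=200))  # 200 events in one hour
        intervals = np.diff(event_times)

        # Estimate the rate, then test the intervals against an exponential distribution.
        scale = intervals.mean()                                   # 1 / estimated rate
        ks_stat, p_value = stats.kstest(intervals, "expon", args=(0.0, scale))
        print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")  # large p: consistent with Poisson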

  13. 40 CFR Appendix IV to Part 265 - Tests for Significance

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... changes in the concentration or value of an indicator parameter in periodic ground-water samples when... then be compared to the value of the t-statistic found in a table for t-test of significance at the specified level of significance. A calculated value of t which exceeds the value of t found in the table...

  14. 40 CFR Appendix IV to Part 265 - Tests for Significance

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... changes in the concentration or value of an indicator parameter in periodic ground-water samples when... then be compared to the value of the t-statistic found in a table for t-test of significance at the specified level of significance. A calculated value of t which exceeds the value of t found in the table...

  15. 40 CFR Appendix IV to Part 265 - Tests for Significance

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... changes in the concentration or value of an indicator parameter in periodic ground-water samples when... then be compared to the value of the t-statistic found in a table for t-test of significance at the specified level of significance. A calculated value of t which exceeds the value of t found in the table...

  16. 40 CFR Appendix IV to Part 265 - Tests for Significance

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... changes in the concentration or value of an indicator parameter in periodic ground-water samples when... then be compared to the value of the t-statistic found in a table for t-test of significance at the specified level of significance. A calculated value of t which exceeds the value of t found in the table...
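
    The four Appendix IV excerpts above describe the same procedure: compute a t statistic from periodic ground-water samples and compare it with the tabled critical value at the specified significance level. A minimal sketch with hypothetical concentrations follows; the data, the one-tailed form, and the 0.01 level are illustrative assumptions, not the regulation's exact prescription.

        import numpy as np
        from scipy import stats

        background_mean = 4.2                      # initial background value (hypothetical)
        samples = np.array([4.9, 5.1, 4.7, 5.3])   # later monitoring samples (hypothetical)

        t_calc = (samples.mean() - background_mean) / (samples.std(ddof=1) / np.sqrt(len(samples)))
        t_table = stats.t.ppf(0.99, df=len(samples) - 1)   # critical value at the chosen level

        print(f"t = {t_calc:.2f}, tabled t = {t_table:.2f}")
        if t_calc > t_table:
            print("Calculated t exceeds the tabled value: significant change indicated.")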

  17. Perspectives on statistics education: observations from statistical consulting in an academic nursing environment.

    PubMed

    Hayat, Matthew J; Schmiege, Sarah J; Cook, Paul F

    2014-04-01

    Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided amount to a call for data with which to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. Copyright 2014, SLACK Incorporated.

  18. Modeling Ka-band low elevation angle propagation statistics

    NASA Technical Reports Server (NTRS)

    Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.

    1995-01-01

    The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates are presented of combined effects statistics in the Washington D.C. area of 20 GHz and 5 deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.
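
    Where the paper combines effect distributions analytically, the same kind of combined statistics can be approximated by Monte Carlo, as in this hedged sketch: all distributions and parameters are invented, and the effects are treated as independent, which the abstract notes is only an approximation.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000
        gaseous = np.clip(rng.normal(1.0, 0.2, n), 0.0, None)   # gaseous absorption, dB
        cloud = rng.lognormal(-1.0, 0.8, n)                     # cloud absorption, dB
        scint = rng.normal(0.0, 0.3, n)                         # tropospheric scintillation, dB
        rain = np.where(rng.random(n) < 0.05,                   # rain present ~5% of the time
                        rng.lognormal(1.0, 1.0, n), 0.0)

        total = gaseous + cloud + scint + rain
        for pct in (90.0, 99.0, 99.9):
            print(f"attenuation not exceeded {pct}% of the time: {np.percentile(total, pct):.2f} dB")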

  19. Helping Students Develop Statistical Reasoning: Implementing a Statistical Reasoning Learning Environment

    ERIC Educational Resources Information Center

    Garfield, Joan; Ben-Zvi, Dani

    2009-01-01

    This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a "Statistical Reasoning Learning Environment" and is built on the constructivist theory of learning.

  20. Statistics usage in the American Journal of Obstetrics and Gynecology: has anything changed?

    PubMed

    Welch, Gerald E; Gabbe, Steven G

    2002-03-01

    Our purpose was to compare statistical listing and usage in articles published in the American Journal of Obstetrics and Gynecology in 1994 with those published in 1999. All papers included in the obstetrics, fetus-placenta-newborn, and gynecology sections and the transactions of societies sections of the January through June 1999 issues of the American Journal of Obstetrics and Gynecology (volume 180, numbers 1 to 6) were reviewed for statistical usage. Each paper was given a rating for the cataloging of applied statistics and a rating for the appropriateness of statistical usage, when possible. These results were compared with the data collected in a similar review of articles published in 1994. Of the 238 available articles, 195 contained statistics and were reviewed. In comparison to the articles published in 1994, there were significantly more articles that completely cataloged applied statistics (74.3% vs 47.4%) (P <.0001), and there was a significant improvement in appropriateness of statistical usage (56.4% vs 30.3%) (P <.0001). Changes in the Instructions to Authors regarding the description of applied statistics and probable changes in the behavior of researchers and Editors have led to an improvement in the quality of statistics in papers published in the American Journal of Obstetrics and Gynecology.
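
    The 1994-versus-1999 comparison above can be reproduced in outline with a two-proportion test. The 1999 sample contained 195 rated articles; the abstract does not give the 1994 count, so 195 is assumed below purely for illustration (statsmodels supplies the test).

        from statsmodels.stats.proportion import proportions_ztest

        count = [round(0.743 * 195), round(0.474 * 195)]   # articles completely cataloging statistics
        nobs = [195, 195]                                  # 1994 sample size assumed equal to 1999's
        z, p = proportions_ztest(count, nobs)
        print(f"z = {z:.2f}, p = {p:.2g}")                 # consistent with the reported P < .0001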

  1. Digest of Educational Statistics, 1962. Bulletin, 1963, No. 10. OE-10024

    ERIC Educational Resources Information Center

    Office of Education, US Department of Health, Education, and Welfare, 1962

    1962-01-01

    This digest is a compilation of the more significant statistical material available in the Office of Education on the American educational system. It contains information on a variety of subjects within the broad field of educational statistics, including schools, enrollments, teachers, graduates, educational attainment, finances, and Federal…

  2. The new statistics: why and how.

    PubMed

    Cumming, Geoff

    2014-01-01

    We need to make substantial changes to how we conduct research. First, in response to heightened concern that our published research literature is incomplete and untrustworthy, we need new requirements to ensure research integrity. These include prespecification of studies whenever possible, avoidance of selection and other inappropriate data-analytic practices, complete reporting, and encouragement of replication. Second, in response to renewed recognition of the severe flaws of null-hypothesis significance testing (NHST), we need to shift from reliance on NHST to estimation and other preferred techniques. The new statistics refers to recommended practices, including estimation based on effect sizes, confidence intervals, and meta-analysis. The techniques are not new, but adopting them widely would be new for many researchers, as well as highly beneficial. This article explains why the new statistics are important and offers guidance for their use. It describes an eight-step new-statistics strategy for research with integrity, which starts with formulation of research questions in estimation terms, has no place for NHST, and is aimed at building a cumulative quantitative discipline.

  3. Health Statistics

    MedlinePlus

    ... births, deaths, marriages, and divorces are sometimes called "vital statistics." Researchers use statistics to see patterns of diseases in groups of people. This can help in figuring out who is at risk for certain diseases, finding ways to control diseases and deciding which diseases ...

  4. Potential errors and misuse of statistics in studies on leakage in endodontics.

    PubMed

    Lucena, C; Lopez, J M; Pulgar, R; Abalos, C; Valderrama, M J

    2013-04-01

    To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods. The search strategy used the descriptors 'root filling', 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001-2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis. The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases the chi-square test or parametric methods were subsequently applied inappropriately. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically 'significant' or 'not significant' effect was reported as a main finding, whilst only 1% also presented an estimation of the magnitude of the effect. When the appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the cases. Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology. © 2012 International Endodontic Journal.
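
    One misuse that reviews like this flag is choosing the chi-square test when its assumptions fail. A small sketch (hypothetical leakage counts, not data from the review) of checking expected cell counts and falling back to Fisher's exact test:

        import numpy as np
        from scipy import stats

        table = np.array([[2, 10],    # leakage / no leakage, material A (hypothetical)
                          [7, 5]])    # leakage / no leakage, material B (hypothetical)

        chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
        print("expected counts:\n", expected)      # two cells below 5: chi-square is unreliable here
        odds, p_fisher = stats.fisher_exact(table)
        print(f"chi-square p = {p_chi2:.3f}, Fisher exact p = {p_fisher:.3f}")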

  5. Statistical properties of Fourier-based time-lag estimates

    NASA Astrophysics Data System (ADS)

    Epitropakis, A.; Papadakis, I. E.

    2016-06-01

    observed time series; b) smoothing of the cross-periodogram should be avoided, as this may introduce significant bias to the time-lag estimates, which can be taken into account by assuming a model cross-spectrum (and not just a model time-lag spectrum); c) time-lags should be estimated by dividing observed time series into a number, say m, of shorter data segments and averaging the resulting cross-periodograms; d) if the data segments have a duration ≳ 20 ks, the time-lag bias is ≲15% of its intrinsic value for the model cross-spectra and power-spectra considered in this work. This bias should be estimated in practise (by considering possible intrinsic cross-spectra that may be applicable to the time-lag spectra at hand) to assess the reliability of any time-lag analysis; e) the effects of experimental noise can be minimised by only estimating time-lags in the frequency range where the sample coherence is larger than 1.2/(1 + 0.2m). In this range, the amplitude of noise variations caused by measurement errors is smaller than the amplitude of the signal's intrinsic variations. As long as m ≳ 20, time-lags estimated by averaging over individual data segments have analytical error estimates that are within 95% of the true scatter around their mean, and their distribution is similar, albeit not identical, to a Gaussian.
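
    A sketch of points (c) and (e) above: divide two series into m segments, average the cross-periodograms, and keep only frequencies where the sample coherence exceeds 1.2/(1 + 0.2m). The light curves are simulated with a known 5-sample lag; none of this is the authors' code.

        import numpy as np

        rng = np.random.default_rng(1)
        m, n, lag_true = 20, 1024, 5                  # 20 segments, known lag of 5 samples
        cross = np.zeros(n // 2, dtype=complex)
        px = np.zeros(n // 2)
        py = np.zeros(n // 2)
        for _ in range(m):
            x = rng.normal(size=n)
            y = np.roll(x, lag_true) + 0.5 * rng.normal(size=n)       # y lags x, plus noise
            X, Y = np.fft.rfft(x)[1:n // 2 + 1], np.fft.rfft(y)[1:n // 2 + 1]
            cross += np.conj(X) * Y
            px += np.abs(X) ** 2
            py += np.abs(Y) ** 2

        freq = np.fft.rfftfreq(n)[1:n // 2 + 1]
        coherence = np.abs(cross) ** 2 / (px * py)
        usable = coherence > 1.2 / (1 + 0.2 * m)                      # criterion (e)
        lags = -np.angle(cross) / (2 * np.pi * freq)                  # positive = y lags x
        low_f = usable & (freq < 0.05)                                # avoid phase wrapping
        print(f"estimated lag = {lags[low_f].mean():.2f} samples")    # ~5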

  6. Looking Back over Their Shoulders: A Qualitative Analysis of Portuguese Teachers' Attitudes towards Statistics

    ERIC Educational Resources Information Center

    Martins, Jose Alexandre; Nascimento, Maria Manuel; Estrada, Assumpta

    2012-01-01

    Teachers' attitudes towards statistics can have a significant effect on their own statistical training, their teaching of statistics, and the future attitudes of their students. The influence of attitudes in teaching statistics in different contexts was previously studied in the work of Estrada et al. (2004, 2010a, 2010b) and Martins et al.…

  7. Ethics in Statistics

    ERIC Educational Resources Information Center

    Lenard, Christopher; McCarthy, Sally; Mills, Terence

    2014-01-01

    There are many different aspects of statistics. Statistics involves mathematics, computing, and applications to almost every field of endeavour. Each aspect provides an opportunity to spark someone's interest in the subject. In this paper we discuss some ethical aspects of statistics, and describe how an introduction to ethics has been…

  8. Larch Litter Removal Has No Significant Effect On Runoff

    Treesearch

    Richard S. Startz; David N. Tolsted

    1974-01-01

    Runoff was measured on paired litter-removed, litter-left plots in an 11-year-old European larch plantation. On five of the six pairs of plots, the plot with the litter left intact yielded more runoff. However, the differences were neither statistically nor hydrologically significant.

  9. Measuring Student Learning in Social Statistics: A Pretest-Posttest Study of Knowledge Gain

    ERIC Educational Resources Information Center

    Delucchi, Michael

    2014-01-01

    This study used a pretest-posttest design to measure student learning in undergraduate statistics. Data were derived from 185 students enrolled in six different sections of a social statistics course taught over a seven-year period by the same sociology instructor. The pretest-posttest instrument reveals statistically significant gains in…

  10. New heterogeneous test statistics for the unbalanced fixed-effect nested design.

    PubMed

    Guo, Jiin-Huarng; Billard, L; Luh, Wei-Ming

    2011-05-01

    When the underlying variances are unknown or/and unequal, using the conventional F test is problematic in the two-factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander-Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed-effect two-stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation. ©2010 The British Psychological Society.
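
    An Alexander-Govern test, one of the approximate procedures the authors build on, is available in SciPy (version 1.7 or later) for the one-way layout; a minimal sketch with hypothetical heteroscedastic groups:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        a = rng.normal(10.0, 1.0, size=12)   # equal means, strongly unequal variances
        b = rng.normal(10.0, 4.0, size=25)
        c = rng.normal(10.0, 9.0, size=8)

        res = stats.alexandergovern(a, b, c)
        print(f"A = {res.statistic:.2f}, p = {res.pvalue:.3f}")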

  11. State Transportation Statistics 2014

    DOT National Transportation Integrated Search

    2014-12-15

    The Bureau of Transportation Statistics (BTS) presents State Transportation Statistics 2014, a statistical profile of transportation in the 50 states and the District of Columbia. This is the 12th annual edition of State Transportation Statistics, a ...

  12. Attitudes toward statistics in medical postgraduates: measuring, evaluating and monitoring

    PubMed Central

    2012-01-01

    Background In medical training, statistics is considered a very difficult course to learn and teach. Current studies have found that students’ attitudes toward statistics can influence their learning process. Measuring, evaluating and monitoring changes in students’ attitudes toward statistics are important. Few studies have focused on the attitudes of postgraduates, especially medical postgraduates. Our purpose was to understand current attitudes regarding statistics held by medical postgraduates and explore their effects on students’ achievement. We also wanted to explore the influencing factors and the sources of these attitudes and monitor their changes after a systematic statistics course. Methods A total of 539 medical postgraduates enrolled in a systematic statistics course completed the pre-form of the Survey of Attitudes Toward Statistics-28 scale, and 83 postgraduates were selected randomly from among them to complete the post-form scale after the course. Results Most medical postgraduates held positive attitudes toward statistics, but they thought statistics was a very difficult subject. The attitudes mainly came from experiences in a former statistical or mathematical class. Age, level of statistical education, research experience, specialty and mathematics background may influence postgraduate attitudes toward statistics. There were significant positive correlations between course achievement and attitudes toward statistics. In general, student attitudes showed negative changes after completing a statistics course. Conclusions The importance of student attitudes toward statistics must be recognized in medical postgraduate training. To make sure all students have a positive learning environment, statistics teachers should measure their students’ attitudes and monitor their changes during a course. Some necessary assistance should be offered to those students who develop negative attitudes. PMID:23173770

  13. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  14. P values in display items are ubiquitous and almost invariably significant: A survey of top science journals

    PubMed Central

    Cristea, Ioana Alina

    2018-01-01

    P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often containing the main results, are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome. PMID:29763472

  15. P values in display items are ubiquitous and almost invariably significant: A survey of top science journals.

    PubMed

    Cristea, Ioana Alina; Ioannidis, John P A

    2018-01-01

    P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often containing the main results, are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.
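
    The interval style reported above (94%, 95% CI 92% to 96%) is a standard confidence interval for a proportion; a sketch with assumed counts, using the Wilson method from statsmodels:

        from statsmodels.stats.proportion import proportion_confint

        significant, total = 940, 1000    # hypothetical counts giving 94%
        low, high = proportion_confint(significant, total, alpha=0.05, method="wilson")
        print(f"{significant / total:.0%} significant, 95% CI {low:.1%} to {high:.1%}")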

  16. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  17. Statistics and Discoveries at the LHC (1/4)

    ScienceCinema

    Cowan, Glen

    2018-02-09

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  18. Statistics and Discoveries at the LHC (3/4)

    ScienceCinema

    Cowan, Glen

    2018-02-19

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  19. Statistics and Discoveries at the LHC (4/4)

    ScienceCinema

    Cowan, Glen

    2018-05-22

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  20. Statistics and Discoveries at the LHC (2/4)

    ScienceCinema

    Cowan, Glen

    2018-04-26

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  1. Performing Inferential Statistics Prior to Data Collection

    ERIC Educational Resources Information Center

    Trafimow, David; MacDonald, Justin A.

    2017-01-01

    Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…

  2. Network-based statistical comparison of citation topology of bibliographic databases

    PubMed Central

    Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko

    2014-01-01

    Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, only informal notions of their reliability exist. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies. PMID:25263231

  3. Low statistical power in biomedical science: a review of three human research domains.

    PubMed

    Dumas-Mallet, Estelle; Button, Katherine S; Boraud, Thomas; Gonon, Francois; Munafò, Marcus R

    2017-02-01

    Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0-10% or 11-20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation.

  4. Low statistical power in biomedical science: a review of three human research domains

    PubMed Central

    Dumas-Mallet, Estelle; Button, Katherine S.; Boraud, Thomas; Gonon, Francois

    2017-01-01

    Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0–10% or 11–20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation. PMID:28386409
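
    The power figures discussed above are easy to verify for a concrete case: a two-sample t-test of a small effect (d = 0.2) with 50 subjects per group lands in the 11-20% band, far below the conventional 80%. The numbers are illustrative, not taken from the reviewed meta-analyses; statsmodels does the computation.

        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()
        power = analysis.power(effect_size=0.2, nobs1=50, alpha=0.05)
        print(f"power = {power:.2f}")                         # ~0.17

        # Sample size per group needed to reach the conventional 80%:
        n_needed = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
        print(f"n per group for 80% power = {n_needed:.0f}")  # ~394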

  5. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2012-01-01

    The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the…

  6. What can we learn from noise? — Mesoscopic nonequilibrium statistical physics —

    PubMed Central

    KOBAYASHI, Kensuke

    2016-01-01

    Mesoscopic systems — small electric circuits working in the quantum regime — offer us a unique experimental stage to explore quantum transport in a tunable and precise way. The purpose of this Review is to show how they can contribute to statistical physics. We introduce the significance of fluctuation, or equivalently noise, as noise measurement enables us to address the fundamental aspects of a physical system. The significance of the fluctuation theorem (FT) in statistical physics is noted. We explain what information can be deduced from current noise measurements in mesoscopic systems. As an important application of noise measurement to statistical physics, we describe our experimental work on the current and current noise in an electron interferometer, which is the first experimental test of the FT in the quantum regime. Our attempt will shed new light on the research field of mesoscopic quantum statistical physics. PMID:27477456

  7. Factors related to student performance in statistics courses in Lebanon

    NASA Astrophysics Data System (ADS)

    Naccache, Hiba Salim

    The purpose of the present study was to identify factors that may contribute to business students in Lebanese universities having difficulty in introductory and advanced statistics courses. Two statistics courses are required for business majors at Lebanese universities, and students are not obliged to be enrolled in any math courses prior to taking them. Drawing on recent educational research, this dissertation attempted to identify the relationships between (1) students' scores on Lebanese university math admissions tests; (2) students' scores on a test of very basic mathematical concepts; (3) students' scores on the Survey of Attitudes Toward Statistics (SATS); (4) course performance as measured by students' final scores in the course; and (5) their scores on the final exam. Data were collected from 561 students enrolled in multiple sections of two courses: 307 students in the introductory statistics course and 260 in the advanced statistics course, across seven campuses in Lebanon over one semester. The multiple regression results revealed four significant relationships at the introductory level: between students' scores on the math quiz and both (1) their final exam scores and (2) their final averages, and between the Cognitive subscale of the SATS and both (3) their final exam scores and (4) their final averages. These four significant relationships were also found at the advanced level. In addition, two more significant relationships were found between students' final averages and the subscales of Effort (5) and Affect (6). No relationship was found between students' scores on the Lebanese admissions math tests and either their final exam scores or their final averages, in both the introductory and advanced courses. Although these results were consistent across course formats and instructors, they may encourage Lebanese universities

  8. Statistical methods of estimating mining costs

    USGS Publications Warehouse

    Long, K.R.

    2011-01-01

    Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.
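
    A "Taylor's Rule"-type relation between ore tonnage and operating rate is a power law, conventionally estimated as a straight line in log-log space. A sketch with fabricated deposits follows; only the fitting procedure, not the data or the resulting coefficients, reflects the work described above.

        import numpy as np

        tonnage = np.array([2e6, 8e6, 3e7, 1e8, 5e8])           # ore tonnage (hypothetical)
        rate = np.array([1.5e3, 4.0e3, 1.1e4, 2.6e4, 8.0e4])    # tonnes per day (hypothetical)

        # Fit log(rate) = intercept + slope * log(tonnage), i.e. rate = a * tonnage^slope.
        slope, intercept = np.polyfit(np.log(tonnage), np.log(rate), 1)
        print(f"rate ~ {np.exp(intercept):.3g} * tonnage^{slope:.2f}")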

  9. Statistical Analysis Experiment for Freshman Chemistry Lab.

    ERIC Educational Resources Information Center

    Salzsieder, John C.

    1995-01-01

    Describes a laboratory experiment dissolving zinc from galvanized nails in which data can be gathered very quickly for statistical analysis. The data have sufficient significant figures and the experiment yields a nice distribution of random errors. Freshman students can gain an appreciation of the relationships between random error, number of…

  10. Significant-Loophole-Free Test of Bell's Theorem with Entangled Photons.

    PubMed

    Giustina, Marissa; Versteegh, Marijn A M; Wengerowsky, Sören; Handsteiner, Johannes; Hochrainer, Armin; Phelan, Kevin; Steinlechner, Fabian; Kofler, Johannes; Larsson, Jan-Åke; Abellán, Carlos; Amaya, Waldimar; Pruneri, Valerio; Mitchell, Morgan W; Beyer, Jörn; Gerrits, Thomas; Lita, Adriana E; Shalm, Lynden K; Nam, Sae Woo; Scheidl, Thomas; Ursin, Rupert; Wittmann, Bernhard; Zeilinger, Anton

    2015-12-18

    Local realism is the worldview in which physical properties of objects exist independently of measurement and where physical influences cannot travel faster than the speed of light. Bell's theorem states that this worldview is incompatible with the predictions of quantum mechanics, as is expressed in Bell's inequalities. Previous experiments convincingly supported the quantum predictions. Yet, every experiment requires assumptions that provide loopholes for a local realist explanation. Here, we report a Bell test that closes the most significant of these loopholes simultaneously. Using a well-optimized source of entangled photons, rapid setting generation, and highly efficient superconducting detectors, we observe a violation of a Bell inequality with high statistical significance. The purely statistical probability that our results would occur under local realism does not exceed 3.74×10⁻³¹, corresponding to an 11.5 standard deviation effect.
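
    Translating the quoted probability into the quoted significance is a one-liner, assuming the usual one-sided normal-tail convention for a "standard deviation effect":

        from scipy.stats import norm

        sigma = norm.isf(3.74e-31)                 # upper-tail p-value -> z-score
        print(f"{sigma:.1f} standard deviations")  # ~11.5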

  11. Taking a statistical approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wild, M.; Rouhani, S.

    1995-02-01

    A typical site investigation entails extensive sampling and monitoring. In the past, sampling plans have been designed on purely ad hoc bases, leading to significant expenditures and, in some cases, collection of redundant information. In many instances, sampling costs exceed the true worth of the collected data. The US Environmental Protection Agency (EPA) therefore has advocated the use of geostatistics to provide a logical framework for sampling and analysis of environmental data. Geostatistical methodology uses statistical techniques for the spatial analysis of a variety of earth-related data. The use of geostatistics was developed by the mining industry to estimate ore concentrations. The same procedure is effective in quantifying environmental contaminants in soils for risk assessments. Unlike classical statistical techniques, geostatistics offers procedures to incorporate the underlying spatial structure of the investigated field. Sample points spaced close together tend to be more similar than samples spaced further apart. This can guide sampling strategies and determine complex contaminant distributions. Geostatistical techniques can be used to evaluate site conditions on the basis of regular, irregular, random and even spatially biased samples. In most environmental investigations, it is desirable to concentrate sampling in areas of known or suspected contamination. The rigorous mathematical procedures of geostatistics allow for accurate estimates at unsampled locations, potentially reducing sampling requirements. The use of geostatistics serves as a decision-aiding and planning tool and can significantly reduce short-term site assessment costs, long-term sampling and monitoring needs, as well as lead to more accurate and realistic remedial design criteria.
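
    The basic tool behind such geostatistical analyses is the empirical semivariogram: half the mean squared difference between sample values, binned by separation distance. A self-contained sketch on synthetic data (not any particular site's):

        import numpy as np

        rng = np.random.default_rng(3)
        xy = rng.uniform(0.0, 100.0, size=(150, 2))               # sample coordinates, m
        z = np.sin(xy[:, 0] / 20.0) + 0.3 * rng.normal(size=150)  # spatially structured values

        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # pairwise distances
        g = 0.5 * (z[:, None] - z[None, :]) ** 2                      # half squared differences

        for lo, hi in zip(range(0, 50, 10), range(10, 60, 10)):
            mask = (d > max(lo, 1e-9)) & (d <= hi)                    # exclude self-pairs
            print(f"lag {lo:>2d}-{hi:<2d} m: gamma = {g[mask].mean():.3f}")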

  12. Statistical analysis of effective singular values in matrix rank determination

    NASA Technical Reports Server (NTRS)

    Konstantinides, Konstantinos; Yao, Kung

    1988-01-01

    A major problem in using SVD (singular-value decomposition) as a tool for determining the effective rank of a perturbed matrix is that of distinguishing between significantly small and significantly large singular values. To this end, confidence regions are derived for the perturbed singular values of matrices with noisy observation data. The analysis is based on the theories of perturbations of singular values and statistical significance testing. Threshold bounds for perturbation due to finite-precision and i.i.d. random models are evaluated. In random models, the threshold bounds depend on the dimension of the matrix, the noise variance, and a predefined statistical level of significance. Results applied to the problem of determining the effective order of a linear autoregressive system from the approximate rank of a sample autocorrelation matrix are considered. Various numerical examples illustrating the usefulness of these bounds and comparisons to other previously known approaches are given.
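
    The thresholding idea can be sketched with a common rule-of-thumb bound (the noise level times sqrt(m) + sqrt(n), the expected spectral norm of i.i.d. Gaussian noise) rather than the specific confidence regions derived in the paper:

        import numpy as np

        rng = np.random.default_rng(5)
        m, n, true_rank, noise = 50, 40, 3, 0.01
        A = rng.normal(size=(m, true_rank)) @ rng.normal(size=(true_rank, n))  # rank-3 signal
        A_noisy = A + noise * rng.normal(size=(m, n))

        s = np.linalg.svd(A_noisy, compute_uv=False)
        threshold = 1.1 * noise * (np.sqrt(m) + np.sqrt(n))    # small safety margin
        print("leading singular values:", np.round(s[:6], 3))
        print("effective rank:", int(np.sum(s > threshold)))   # 3, matching the construction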

  13. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent.

  14. Statistical Policy Working Paper 24. Electronic Dissemination of Statistical Data

    DOT National Transportation Integrated Search

    1995-11-01

    The report, Statistical Policy Working Paper 24, Electronic Dissemination of Statistical Data, includes several topics, such as Options and Best Uses for Different Media Operation of Electronic Dissemination Service, Customer Service Programs, Cost a...

  15. Why Current Statistics of Complementary Alternative Medicine Clinical Trials is Invalid.

    PubMed

    Pandolfi, Maurizio; Carreras, Giulia

    2018-06-07

    It is not sufficiently known that frequentist statistics cannot provide direct information on the probability that the research hypothesis tested is correct. The error resulting from this misunderstanding is compounded when the hypotheses under scrutiny have precarious scientific bases, as is generally the case in complementary alternative medicine (CAM). In such cases, it is mandatory to use inferential statistics that consider the prior probability that the hypothesis tested is true, such as Bayesian statistics. The authors show that, under such circumstances, no real statistical significance can be achieved in CAM clinical trials. In this respect, CAM trials involving human material are also hardly defensible from an ethical viewpoint.
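
    The authors' point can be made numerically with Bayes' theorem: when the prior probability of the tested hypothesis is low, a "significant" result still leaves the hypothesis unlikely. All three inputs below are illustrative assumptions, not values from the paper.

        prior = 0.01    # prior probability that the CAM hypothesis is true (assumed low)
        alpha = 0.05    # trial's false-positive rate
        power = 0.80    # probability of detecting a true effect

        p_significant = power * prior + alpha * (1 - prior)      # P(significant result)
        posterior = power * prior / p_significant                # P(hypothesis | significant)
        print(f"P(hypothesis true | significant result) = {posterior:.2f}")  # ~0.14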

  16. Detection by voxel-wise statistical analysis of significant changes in regional cerebral glucose uptake in an APP/PS1 transgenic mouse model of Alzheimer's disease.

    PubMed

    Dubois, Albertine; Hérard, Anne-Sophie; Delatour, Benoît; Hantraye, Philippe; Bonvento, Gilles; Dhenain, Marc; Delzescaux, Thierry

    2010-06-01

    Biomarkers and technologies similar to those used in humans are essential for the follow-up of Alzheimer's disease (AD) animal models, particularly for the clarification of mechanisms and the screening and validation of new candidate treatments. In humans, changes in brain metabolism can be detected by 2-deoxy-2-[(18)F]fluoro-D-glucose PET (FDG-PET) and assessed in a user-independent manner with dedicated software, such as Statistical Parametric Mapping (SPM). FDG-PET can be carried out in small animals, but its resolution is low as compared to the size of rodent brain structures. In mouse models of AD, changes in cerebral glucose utilization are usually detected by [(14)C]-2-deoxyglucose (2DG) autoradiography, but this requires prior manual outlining of regions of interest (ROI) on selected sections. Here, we evaluate the feasibility of applying the SPM method to 3D autoradiographic data sets mapping brain metabolic activity in a transgenic mouse model of AD. We report the preliminary results obtained with 4 APP/PS1 (64+/-1 weeks) and 3 PS1 (65+/-2 weeks) mice. We also describe new procedures for the acquisition and use of "blockface" photographs and provide the first demonstration of their value for the 3D reconstruction and spatial normalization of post mortem mouse brain volumes. Despite this limited sample size, our results appear to be meaningful, consistent, and more comprehensive than findings from previously published studies based on conventional ROI-based methods. The establishment of statistical significance at the voxel level, rather than with a user-defined ROI, makes it possible to detect subtle differences more reliably in geometrically complex regions, such as the hippocampus. Our approach is generic and could be easily applied to other biomarkers and extended to other species and applications. Copyright 2010 Elsevier Inc. All rights reserved.

  17. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…

  18. Pattern statistics on Markov chains and sensitivity to parameter estimation.

    PubMed

    Nuel, Grégory

    2006-10-17

    In order to compute pattern statistics in computational biology, a Markov model is commonly used to take the sequence composition into account. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression for sigma, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.

  19. A statistical analysis of North East Atlantic (submicron) aerosol size distributions

    NASA Astrophysics Data System (ADS)

    Dall'Osto, M.; Monahan, C.; Greaney, R.; Beddows, D. C. S.; Harrison, R. M.; Ceburnis, D.; O'Dowd, C. D.

    2011-12-01

    The Global Atmospheric Watch research station at Mace Head (Ireland) offers the possibility to sample some of the cleanest air masses being imported into Europe as well as some of the most polluted being exported out of Europe. We present a statistical cluster analysis of the physical characteristics of aerosol size distributions in air ranging from the cleanest to the most polluted for the year 2008. Data coverage achieved was 75% throughout the year. By applying the Hartigan-Wong k-Means method, 12 clusters were identified as systematically occurring. These 12 clusters could be further combined into 4 categories with similar characteristics, namely: coastal nucleation category (occurring 21.3% of the time), open ocean nucleation category (occurring 32.6% of the time), background clean marine category (occurring 26.1% of the time) and anthropogenic category (occurring 20% of the time) aerosol size distributions. The coastal nucleation category is characterised by a clear and dominant nucleation mode at sizes less than 10 nm while the open ocean nucleation category is characterised by a dominant Aitken mode between 15 nm and 50 nm. The background clean marine aerosol exhibited a clear bimodality in the sub-micron size distribution, although it should be noted that either the Aitken mode or the accumulation mode may dominate the number concentration. However, peculiar background clean marine size distributions with coarser accumulation modes are also observed during winter months. By contrast, the continentally-influenced size distributions are generally more monomodal (accumulation), albeit with traces of bimodality. The open ocean category occurs more often during May, June and July, corresponding with the North East (NE) Atlantic high biological period. Combined with the relatively high percentage frequency of occurrence (32.6%), this suggests that the marine biota is an important source of new nano aerosol particles in NE Atlantic air.

  20. A statistical analysis of North East Atlantic (submicron) aerosol size distributions

    NASA Astrophysics Data System (ADS)

    Dall'Osto, M.; Monahan, C.; Greaney, R.; Beddows, D. C. S.; Harrison, R. M.; Ceburnis, D.; O'Dowd, C. D.

    2011-08-01

    The Global Atmospheric Watch research station at Mace Head (Ireland) offers the possibility to sample some of the cleanest air masses being imported into Europe as well as some of the most polluted being exported out of Europe. We present a statistical cluster analysis of the physical characteristics of aerosol size distributions in air ranging from the cleanest to the most polluted for the year 2008. Data coverage achieved was 75 % throughout the year. By applying the Hartigan-Wong k-Means method, 12 clusters were identified as systematically occurring and these 12 clusters could be further combined into 4 categories with similar characteristics, namely: coastal nucleation category (occurring 21.3 % of the time), open ocean nucleation category (occurring 32.6 % of the time), background clean marine category (occurring 26.1 % of the time) and anthropogenic category (occurring 20 % of the time) aerosol size distributions. The coastal nucleation category is characterised by a clear and dominant nucleation mode at sizes less than 10 nm while the open ocean nucleation category is characterised by a dominant Aitken mode between 15 nm and 50 nm. The background clean marine category is characterised by a clear bimodality in the size distribution, although it should be noted that either the Aitken mode or the accumulation mode may dominate the number concentration. By contrast, the continentally-influenced size distributions are generally more mono-modal, albeit with traces of bi-modality. The open ocean category occurs more often during May, June and July, corresponding with the N. E. Atlantic high biological period. Combined with the relatively high percentage frequency of occurrence (32.6 %), this suggests that the marine biota is an important source of new aerosol particles in N. E. Atlantic air.

  1. Statistical Modeling of Zr/Hf Extraction using TBP-D2EHPA Mixtures

    NASA Astrophysics Data System (ADS)

    Rezaeinejhad Jirandehi, Vahid; Haghshenas Fatmehsari, Davoud; Firoozi, Sadegh; Taghizadeh, Mohammad; Keshavarz Alamdari, Eskandar

    2012-12-01

    In the present work, response surface methodology was employed for the study and prediction of Zr/Hf extraction curves in a solvent extraction system using D2EHPA-TBP mixtures. The effect of changes in the levels of temperature, nitric acid concentration, and TBP/D2EHPA ratio (T/D) on the Zr/Hf extraction/separation was studied by the use of central composite design. The results showed a statistically significant effect of T/D, nitric acid concentration, and temperature on the extraction percentage of Zr and Hf. In the case of Zr, a statistically significant interaction was found between T/D and nitric acid, whereas for Hf, the interaction terms of temperature with both T/D and nitric acid were significant. Additionally, the extraction curves were successfully predicted by applying the developed statistical regression equations; this approach is faster and more economical than obtaining the curves experimentally.

  2. Neural Systems with Numerically Matched Input-Output Statistic: Isotonic Bivariate Statistical Modeling

    PubMed Central

    Fiori, Simone

    2007-01-01

    Bivariate statistical modeling from incomplete data is a useful statistical tool that makes it possible to discover the model underlying two data sets when the data in the two sets correspond neither in size nor in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641

  3. Statistical Reasoning Ability, Self-Efficacy, and Value Beliefs in a University Statistics Course

    ERIC Educational Resources Information Center

    Olani, A.; Hoekstra, R.; Harskamp, E.; van der Werf, G.

    2011-01-01

    Introduction: The study investigated the degree to which students' statistical reasoning abilities, statistics self-efficacy, and perceived value of statistics improved during a reform based introductory statistics course. The study also examined whether the changes in these learning outcomes differed with respect to the students' mathematical…

  4. Statistics in Japanese universities.

    PubMed Central

    Ito, P K

    1979-01-01

    The teaching of statistics in U.S. and Japanese universities is briefly reviewed. It is found that H. Hotelling's articles and subsequent relevant publications on the teaching of statistics have contributed to a considerable extent to the establishment of excellent departments of statistics in U.S. universities and colleges. Today the U.S. may be proud of many well-staffed and well-organized departments of theoretical and applied statistics with excellent undergraduate and graduate programs. In contrast, no Japanese university has an independent department of statistics at present, and the teaching of statistics has been spread among a heterogeneous group of departments of application. This was mainly due to the Japanese government regulation concerning the establishment of a university. However, it has recently been revised so that an independent department of statistics may be started in a Japanese university with undergraduate and graduate programs. It is hoped that discussions will be started among those concerned on the question of organization of the teaching of statistics in Japanese universities as soon as possible. PMID:396154

  5. The statistical significance of error probability as determined from decoding simulations for long codes

    NASA Technical Reports Server (NTRS)

    Massey, J. L.

    1976-01-01

    The very low error probability obtained with long error-correcting codes results in a very small number of observed errors in simulation studies of practical size and renders the usual confidence interval techniques inapplicable to the observed error probability. A natural extension of the notion of a 'confidence interval' is made and applied to such determinations of error probability by simulation. An example is included to show the surprisingly great significance of as few as two decoding errors in a very large number of decoding trials.
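
    With so few observed errors, normal-approximation intervals are useless, which motivates exact constructions. A standard one (our illustration, not the paper's extended notion of a confidence interval) is the Clopper-Pearson interval built from Beta quantiles:

```python
# Exact (Clopper-Pearson) confidence interval for an error probability when
# only k errors are observed in n decoding trials.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, conf: float = 0.95):
    alpha = 1.0 - conf
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# Two errors in a million trials:
print(clopper_pearson(2, 1_000_000))  # roughly (2.4e-7, 7.2e-6)
```

    Even two errors in a million trials already bound the true error rate within roughly a factor of thirty, which is the sense in which so few errors carry surprising significance.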

  6. Infants with Williams syndrome detect statistical regularities in continuous speech.

    PubMed

    Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B

    2016-09-01

    Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students' Statistical Reasoning and Quantitative Literacy Skills.

    PubMed

    Olimpo, Jeffrey T; Pevey, Ryan S; McCabe, Thomas M

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students' reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce.

  8. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood yields a likelihood inference that reduces the biases of estimators and also improves the accuracy of the empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
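
    For orientation, ordinary maximum-likelihood fitting of the beta-binomial model (not the authors' separated-likelihood refinement) takes a few lines with SciPy; the litter data below are invented:

```python
# Fit beta-binomial parameters (alpha, beta) to litter data by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

# Hypothetical litters: (number affected, litter size)
affected = np.array([0, 1, 2, 0, 3, 1])
sizes = np.array([8, 10, 9, 7, 12, 9])

def neg_log_lik(params):
    a, b = np.exp(params)  # optimize on the log scale to keep a, b > 0
    return -betabinom.logpmf(affected, sizes, a, b).sum()

fit = minimize(neg_log_lik, x0=np.log([1.0, 5.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print(f"alpha={a_hat:.2f}, beta={b_hat:.2f}")
```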

  9. Statistical tests for power-law cross-correlated processes

    NASA Astrophysics Data System (ADS)

    Podobnik, Boris; Jiang, Zhi-Qiang; Zhou, Wei-Xing; Stanley, H. Eugene

    2011-12-01

    For stationary time series, the cross-covariance and the cross-correlation as functions of time lag n serve to quantify the similarity of two time series. The latter measure is also used to assess whether the cross-correlations are statistically significant. For nonstationary time series, the analogous measures are detrended cross-correlation analysis (DCCA) and the recently proposed detrended cross-correlation coefficient, ρDCCA(T,n), where T is the total length of the time series and n the window size. For ρDCCA(T,n), we numerically calculated the Cauchy inequality -1≤ρDCCA(T,n)≤1. Here we derive -1≤ρDCCA(T,n)≤1 for a standard variance-covariance approach and for a detrending approach. For overlapping windows, we find the range of ρDCCA within which the cross-correlations become statistically significant. For overlapping windows we numerically determine—and for nonoverlapping windows we derive—that the standard deviation of ρDCCA(T,n) tends with increasing T to 1/T. Using ρDCCA(T,n) we show that the Chinese financial market's tendency to follow the U.S. market is extremely weak. We also propose an additional statistical test that can be used to quantify the existence of cross-correlations between two power-law correlated time series.
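
    The coefficient is simple to compute directly. A minimal sketch of the standard construction (ours; linear detrending over overlapping windows, synthetic series):

```python
# Detrended cross-correlation coefficient rho_DCCA(T, n) for window size n.
import numpy as np

def rho_dcca(x, y, n):
    """Ratio of detrended covariance to the detrended standard deviations."""
    X, Y = np.cumsum(x - x.mean()), np.cumsum(y - y.mean())
    t = np.arange(n + 1)
    f2x = f2y = f2xy = 0.0
    for i in range(len(X) - n):                 # overlapping windows of length n+1
        xs, ys = X[i:i + n + 1], Y[i:i + n + 1]
        rx = xs - np.polyval(np.polyfit(t, xs, 1), t)  # residuals of a linear fit
        ry = ys - np.polyval(np.polyfit(t, ys, 1), t)
        f2x += (rx * rx).mean()
        f2y += (ry * ry).mean()
        f2xy += (rx * ry).mean()
    return f2xy / np.sqrt(f2x * f2y)

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
x = z + rng.standard_normal(2000)   # two series sharing a common component
y = z + rng.standard_normal(2000)
print(rho_dcca(x, y, n=32))
```

    For two independent series the coefficient hovers near zero; the shared component above pushes it toward the underlying correlation of about 0.5.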

  10. Modeling Soot Oxidation and Gasification with Bayesian Statistics

    DOE PAGES

    Josephson, Alexander J.; Gaffin, Neal D.; Smith, Sean T.; ...

    2017-08-22

    This paper presents a statistical method for model calibration using data collected from the literature. The method is used to calibrate parameters for global models of soot consumption in combustion systems. This consumption is broken into two different submodels: first for oxidation, where soot particles are attacked by certain oxidizing agents; second for gasification, where soot particles are attacked by H2O or CO2 molecules. Rate data were collected from 19 studies in the literature and evaluated using Bayesian statistics to calibrate the model parameters. Bayesian statistics are valued for their ability to quantify uncertainty in modeling. The calibrated consumption model with quantified uncertainty is presented here along with a discussion of associated implications. The oxidation results are found to be consistent with previous studies. Significant variation is found in the CO2 gasification rates.

  11. Modeling Soot Oxidation and Gasification with Bayesian Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Josephson, Alexander J.; Gaffin, Neal D.; Smith, Sean T.

    This paper presents a statistical method for model calibration using data collected from the literature. The method is used to calibrate parameters for global models of soot consumption in combustion systems. This consumption is broken into two different submodels: first for oxidation, where soot particles are attacked by certain oxidizing agents; second for gasification, where soot particles are attacked by H2O or CO2 molecules. Rate data were collected from 19 studies in the literature and evaluated using Bayesian statistics to calibrate the model parameters. Bayesian statistics are valued for their ability to quantify uncertainty in modeling. The calibrated consumption model with quantified uncertainty is presented here along with a discussion of associated implications. The oxidation results are found to be consistent with previous studies. Significant variation is found in the CO2 gasification rates.
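
    To make the calibration idea concrete, here is a minimal random-walk Metropolis sketch. Everything in it is an assumption of ours: invented rate data, an Arrhenius-form model, a known noise level, and flat priors; the paper's actual submodels and priors differ.

```python
# Bayesian calibration of a single Arrhenius-type rate law from noisy,
# literature-style rate data, using a bare-bones Metropolis sampler.
import numpy as np

rng = np.random.default_rng(1)
T = np.linspace(900.0, 1600.0, 19)                # temperatures, K
E_true, A_true = 1.6e5, 1.0e5                     # "true" activation energy, prefactor
R = 8.314
log_k_obs = np.log(A_true) - E_true / (R * T) + rng.normal(0, 0.3, T.size)

def log_post(theta):
    logA, E = theta
    if not (0.0 < E < 1.0e6):                     # flat prior on a broad box
        return -np.inf
    resid = log_k_obs - (logA - E / (R * T))
    return -0.5 * np.sum(resid**2) / 0.3**2       # Gaussian likelihood, sigma assumed known

theta, lp = np.array([10.0, 1.0e5]), log_post(np.array([10.0, 1.0e5]))
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, [0.2, 5e3])      # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
samples = np.array(samples[5000:])                # drop burn-in
print("E mean +/- sd:", samples[:, 1].mean(), samples[:, 1].std())
```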

  12. Sb2Te3 and Its Superlattices: Optimization by Statistical Design.

    PubMed

    Behera, Jitendra K; Zhou, Xilin; Ranjan, Alok; Simpson, Robert E

    2018-05-02

    The objective of this work is to demonstrate the usefulness of fractional factorial design for optimizing the crystal quality of chalcogenide van der Waals (vdW) crystals. We statistically analyze the growth parameters of highly c-axis-oriented Sb2Te3 crystals and Sb2Te3-GeTe phase change vdW heterostructured superlattices. The statistical significance of the growth parameters of temperature, pressure, power, buffer materials, and buffer layer thickness was found by fractional factorial design and response surface analysis. Temperature, pressure, power, and their second-order interactions are the major factors that significantly influence the quality of the crystals. Additionally, using tungsten rather than molybdenum as a buffer layer significantly enhances the crystal quality. Fractional factorial design minimizes the number of experiments that are necessary to find the optimal growth conditions, resulting in an order of magnitude improvement in the crystal quality. We highlight that statistical design of experiments methods, which are more commonly used in product design, should be considered more broadly by those designing and optimizing materials.
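
    The five growth factors named above happen to match the textbook 2^(5-1) half-fraction. The sketch below (ours; the factor names echo the abstract, but the two-level coding and the generator are illustrative) builds such a design with the generator E = ABCD:

```python
# Construct a two-level 2^(5-1) fractional factorial design.
from itertools import product

factors = ["temperature", "pressure", "power", "buffer_material", "buffer_thickness"]

# Full factorial in the first four factors; the fifth is generated as E = ABCD,
# halving the 2^5 = 32 runs to 16.
runs = [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]

for run in runs:
    print(dict(zip(factors, run)))
print(f"{len(runs)} runs instead of 32")
```

    The price of halving the runs is aliasing: here the fifth main effect is confounded only with the four-factor interaction, which is usually negligible.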

  13. Statistical Significance of the Maximum Hardness Principle Applied to Some Selected Chemical Reactions.

    PubMed

    Saha, Ranajit; Pan, Sudip; Chattaraj, Pratim K

    2016-11-05

    The validity of the maximum hardness principle (MHP) is tested in the cases of 50 chemical reactions, most of which are organic in nature and exhibit anomeric effect. To explore the effect of the level of theory on the validity of MHP in an exothermic reaction, B3LYP/6-311++G(2df,3pd) and LC-BLYP/6-311++G(2df,3pd) (def2-QZVP for iodine and mercury) levels are employed. Different approximations like the geometric mean of hardness and combined hardness are considered in case there are multiple reactants and/or products. It is observed that, based on the geometric mean of hardness, while 82% of the studied reactions obey the MHP at the B3LYP level, 84% of the reactions follow this rule at the LC-BLYP level. Most of the reactions possess the hardest species on the product side. A 50% null hypothesis is rejected at a 1% level of significance.

  14. Statistical Irreversible Thermodynamics in the Framework of Zubarev's Nonequilibrium Statistical Operator Method

    NASA Astrophysics Data System (ADS)

    Luzzi, R.; Vasconcellos, A. R.; Ramos, J. G.; Rodrigues, C. G.

    2018-01-01

    We describe the formalism of statistical irreversible thermodynamics constructed based on Zubarev's nonequilibrium statistical operator (NSO) method, which is a powerful and universal tool for investigating the most varied physical phenomena. We present brief overviews of the statistical ensemble formalism and statistical irreversible thermodynamics. The first can be constructed either based on a heuristic approach or in the framework of information theory in the Jeffreys-Jaynes scheme of scientific inference; Zubarev and his school used both approaches in formulating the NSO method. We describe the main characteristics of statistical irreversible thermodynamics and discuss some particular considerations of several authors. We briefly describe how Rosenfeld, Bohr, and Prigogine proposed to derive a thermodynamic uncertainty principle.

  15. Online use statistics.

    PubMed

    Tannery, Nancy Hrinya; Silverman, Deborah L; Epstein, Barbara A

    2002-01-01

    Online use statistics can provide libraries with a tool to be used when developing an online collection of resources. Statistics can provide information on overall use of a collection, individual print and electronic journal use, and collection use by specific user populations. They can also be used to determine the number of user licenses to purchase. This paper focuses on the issue of use statistics made available for one collection of online resources.

  16. Statistical methodology for the analysis of dye-switch microarray experiments

    PubMed Central

    Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques

    2008-01-01

    Background In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results We present two original statistical procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965

  17. Scan statistics with local vote for target detection in distributed system

    NASA Astrophysics Data System (ADS)

    Luo, Junhai; Wu, Qi

    2017-12-01

    Target detection occupies a pivotal position in distributed systems. Scan statistics, one of the most efficient detection methods, have been applied to a variety of anomaly detection problems and significantly improve the probability of detection. However, scan statistics cannot achieve the expected performance when the noise intensity is strong or the signal emitted by the target is weak. The local vote algorithm can also achieve a higher target detection rate. After the local vote, the counting rule is always adopted for decision fusion; however, the counting rule does not use the information about the contiguity of sensors but simply pools all sensors' decisions, which degrades the result. In this paper, we propose a scan statistics with local vote (SSLV) method, which combines scan statistics with local vote decisions. Before the scan statistics step, each sensor makes a local vote decision based on its own data and that of its neighbors. By combining the advantages of both, our method obtains a higher detection rate in low signal-to-noise-ratio environments than scan statistics alone. After the local vote decision, the distribution of sensors that have detected the target becomes more concentrated. To make full use of the local vote decision, we introduce a variable step parameter for the SSLV, which significantly shortens the scan period, especially when the target is absent. Analysis and simulations are presented to demonstrate the performance of our method.
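
    A minimal one-dimensional sketch of the two-stage idea (ours, not the authors' implementation; the sensor count, detection probabilities, neighbourhood radius, and window length are all invented):

```python
# Stage 1: each sensor replaces its binary decision with a majority vote over
# its neighbourhood. Stage 2: a fixed-length scanning window counts votes.
import numpy as np

rng = np.random.default_rng(2)
n = 200
truth = np.zeros(n, bool); truth[90:110] = True            # target covers sensors 90-109
decisions = rng.random(n) < np.where(truth, 0.6, 0.1)      # noisy local detections

def local_vote(dec, radius=2):
    """Majority vote of each sensor and its neighbours within the given radius."""
    votes = np.array([dec[max(0, i - radius):i + radius + 1].mean()
                      for i in range(len(dec))])
    return votes > 0.5

def scan_max(dec, window=15):
    """Largest count of positive decisions in any window (the scan statistic)."""
    counts = np.convolve(dec.astype(int), np.ones(window, int), mode="valid")
    return counts.max(), counts.argmax()

fused = local_vote(decisions)
stat, loc = scan_max(fused)
print(f"max window count {stat} starting at sensor {loc}")
```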

  18. State Transportation Statistics 2013

    DOT National Transportation Integrated Search

    2014-09-19

    The Bureau of Transportation Statistics (BTS), a part of the U.S. Department of Transportation's (USDOT) Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2013, a statistical profile of transportatio...

  19. State Transportation Statistics 2012

    DOT National Transportation Integrated Search

    2013-08-15

    The Bureau of Transportation Statistics (BTS), a part of the U.S. Department of Transportation's (USDOT) Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2012, a statistical profile of transportation ...

  20. BTS statistical standards manual

    DOT National Transportation Integrated Search

    2005-10-01

    The Bureau of Transportation Statistics (BTS), like other federal statistical agencies, establishes professional standards to guide the methods and procedures for the collection, processing, storage, and presentation of statistical data. Standards an...

  1. Pattern statistics on Markov chains and sensitivity to parameter estimation

    PubMed Central

    Nuel, Grégory

    2006-01-01

    Background: In order to compute pattern statistics in computational biology, a Markov model is commonly used to take the sequence composition into account. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression for σ, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916

  2. State transportation statistics 2009

    DOT National Transportation Integrated Search

    2009-01-01

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2009, a statistical profile of transportation in the 50 states and the District ...

  3. No significant association between prenatal exposure to poliovirus epidemics and psychosis.

    PubMed

    Cahill, Matthew; Chant, David; Welham, Joy; McGrath, John

    2002-06-01

    To examine the association between prenatal exposure to poliovirus infection and later development of schizophrenia or affective psychosis in a Southern Hemisphere psychiatric register. We calculated rates of poliomyelitis cases per 10 000 background population and rates for schizophrenia (n = 6078) and affective psychosis (n = 3707) per 10 000 births for the period 1930-1964. Empirically weighted regression was used to measure the association between a given psychosis birth-rate and a poliomyelitis epidemic during gestation. There was no statistically significant association between exposure to a poliomyelitis epidemic during gestation and subsequent development of schizophrenia or affective psychosis. The lack of a consistent statistically significant association between poliovirus epidemics and schizophrenia suggests that either poliovirus may have a small effect which is only detectable with large data-sets and/or the effect may be modified by location. Further investigation of such inconsistencies may help elucidate candidate risk-modifying factors for schizophrenia.

  4. New Statistics for Testing Differential Expression of Pathways from Microarray Data

    NASA Astrophysics Data System (ADS)

    Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao

    Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: the linear combination test, the quadratic test and the de-correlation test, to identify differentially expressed pathways from gene expression profiles. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common between the two datasets. The pathways we found are meaningful for uncovering the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in the functional analysis of gene expression data.

  5. State transportation statistics 2008

    DOT National Transportation Integrated Search

    2008-01-01

    The Bureau of Transportation Statistics (BTS), a part of the U.S. Department of Transportation's Research and Innovative Technology Administration, presents State Transportation Statistics 2008, a statistical profile of transportation in the 5...

  6. Adrenal Gland Tumors: Statistics

    MedlinePlus

    A primary adrenal gland tumor is very uncommon, and exact statistics are not available for this type of tumor.

  7. Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices

    NASA Astrophysics Data System (ADS)

    Passemier, Damien; McKay, Matthew R.; Chen, Yang

    2015-07-01

    Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.

  8. State Transportation Statistics 2011

    DOT National Transportation Integrated Search

    2012-08-08

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2011, a statistical profile of transportation in the 50 states and the District of Col...

  9. State Transportation Statistics 2010

    DOT National Transportation Integrated Search

    2011-09-14

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2010, a statistical profile of transportation in the 50 states and the District of Col...

  10. State transportation statistics 2005

    DOT National Transportation Integrated Search

    2005-12-01

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2005, a statistical profile of transportation in the 50 states and the District o...

  11. State transportation statistics 2007

    DOT National Transportation Integrated Search

    2007-01-01

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2007, a statistical profile of transportation in the 50 states and the District o...

  12. State transportation statistics 2006

    DOT National Transportation Integrated Search

    2006-12-01

    The Bureau of Transportation Statistics (BTS), a part of DOT's Research and Innovative Technology Administration (RITA), presents State Transportation Statistics 2006, a statistical profile of transportation in the 50 states and the District o...

  13. [Morphometric anatomic study and clinical significance of lunate fossa].

    PubMed

    Aldemir, Cengiz; Önder, Merve; Doğan, Ali; Duygun, Fatih; Oğuz, Nurettin

    2015-01-01

    This study aims to investigate the depth and the transverse and sagittal diameters of the lunate fossa, a significant structure of the wrist, in order to reduce the risk that volar plate screws used in distal radius fractures penetrate into the joint. The depth and the transverse and sagittal diameters of the lunate fossa in 50 right and 50 left adult dried radius bones without distal tip damage were measured using a MicroScribe G2X from the MicroScribe G series. Mean lunate fossa depth: left 2.419886±0.51 mm / right 2.543052±0.78 mm; mean lunate fossa sagittal diameter: left 19.656±1.57 mm / right 18.796±1.53 mm; mean lunate fossa transverse diameter: left 11.382±0.65 mm / right 11.106±0.91 mm. There was no statistically significant difference between the right and left depth values of the lunate fossa (p=0.320), whereas there were statistically significant differences between the right and left transverse and sagittal diameters (p=0.006, p=0.048). Measurements of the depth of the lunate fossa may guide the development of new anatomic plates and decrease complications such as penetration of the screw into the joint during volar plate applications.

  14. Sunspot activity and influenza pandemics: a statistical assessment of the purported association.

    PubMed

    Towers, S

    2017-10-01

    Since 1978, a series of papers in the literature have claimed to find a significant association between sunspot activity and the timing of influenza pandemics. This paper examines these analyses and attempts to recreate the three most recent statistical analyses, by Ertel (1994), Tapping et al. (2001), and Yeung (2006), all of which purported to find a significant relationship between sunspot numbers and pandemic influenza. As will be discussed, each analysis had errors in the data. In addition, in each analysis arbitrary selections or assumptions were made, and the authors did not assess the robustness of their analyses to changes in those arbitrary assumptions. Varying the arbitrary assumptions to other, equally valid, assumptions negates the claims of significance. Indeed, an arbitrary selection made in one of the analyses appears to have resulted in almost maximal apparent significance; changing it only slightly yields a null result. This analysis applies statistically rigorous methodology to examine the purported sunspot/pandemic link, using more statistically powerful un-binned analysis methods rather than relying on arbitrarily binned data. The analyses are repeated using both the Wolf and Group sunspot numbers. In all cases, no statistically significant evidence of any association was found. However, while the focus of this particular analysis was on the purported relationship of influenza pandemics to sunspot activity, the faults found in the past analyses are common pitfalls; inattention to analysis reproducibility and robustness assessment are common problems in the sciences that are unfortunately not noted often enough in review.

  15. State transportation statistics 2004

    DOT National Transportation Integrated Search

    2004-12-01

    The Bureau of Transportation Statistics (BTS) presents State Transportation Statistics 2004, a statistical profile of transportation in the 50 states and the District of Columbia. This document is a successor to the BTS State Transportation Pro...

  16. Wildfire cluster detection using space-time scan statistics

    NASA Astrophysics Data System (ADS)

    Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

    2009-04-01

    The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics, statistical methods specifically designed to detect clusters and assess their significance. Basically, scan statistics work by comparing the set of events occurring inside a scanning window (or a space-time cylinder for spatio-temporal data) with those that lie outside. Windows of increasing size scan the zone across space and time; the likelihood ratio is calculated for each window (comparing the ratio of observed to expected cases inside and outside), and the window with the maximum value is taken to be the most probable cluster, and so on. Under the null hypothesis of spatial and temporal randomness, these events are distributed according to a known discrete-state random process (Poisson or Bernoulli), whose parameters can be estimated. Given this assumption, it is possible to test whether or not the null hypothesis holds in a specific area. To deal with fire data, the space-time permutation scan statistic has been applied, since it does not require the explicit specification of the population at risk in each cylinder. The case study is Florida daily fire detection using the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product during the period 2003-2006. As a result, statistically significant clusters have been identified. Performing the analyses over the entire study period, three out of the five most likely clusters were identified in the forest areas in the north of the country; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies in the Everglades. Furthermore, the analyses were performed separately for the four years to examine whether the wildfires recur each year during the same period. It emerges that clusters of forest fires are more frequent in hot seasons (spring and summer), while in the South areas they are widely
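
    In its simplest purely temporal form (our sketch on synthetic daily counts, using a plain Poisson model rather than the space-time permutation variant applied above), the scan amounts to maximizing a likelihood ratio over candidate windows; significance is then assessed by repeating the maximization on Monte Carlo replicates:

```python
# Slide windows over daily counts and compute the Poisson likelihood ratio for
# "rate inside the window > rate outside".
import numpy as np

rng = np.random.default_rng(3)
counts = rng.poisson(2.0, 365).astype(float)
counts[150:180] += rng.poisson(4.0, 30)           # an injected 30-day cluster

N = counts.sum()
best = (0.0, None)
for width in range(7, 61):                        # candidate window lengths
    for start in range(counts.size - width):
        c = counts[start:start + width].sum()     # cases inside the window
        e = N * width / counts.size               # expected inside under uniformity
        if e < c < N:
            # Kulldorff-style log likelihood ratio for the Poisson model
            llr = c * np.log(c / e) + (N - c) * np.log((N - c) / (N - e))
            if llr > best[0]:
                best = (llr, (start, start + width))
print("most likely cluster:", best)
```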

  17. Statistics, Uncertainty, and Transmitted Variation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne Roth

    2014-11-05

    The field of Statistics provides methods for modeling and understanding data and making decisions in the presence of uncertainty. When examining response functions, variation present in the input variables will be transmitted via the response function to the output variables. This phenomenon can potentially have significant impacts on the uncertainty associated with results from subsequent analysis. This presentation will examine the concept of transmitted variation, its impact on designed experiments, and a method for identifying and estimating sources of transmitted variation in certain settings.
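
    A first-order account of transmitted variation comes from the delta method: the output standard deviation is approximately |f'(mu)| times the input standard deviation. The sketch below (ours; the response function and input distribution are arbitrary stand-ins) checks this against Monte Carlo:

```python
# Delta-method estimate of variation transmitted through a response function.
import numpy as np

def f(x):                       # response function (arbitrary stand-in)
    return np.exp(0.5 * x) + x**2

mu, sigma = 1.0, 0.1            # input mean and standard deviation
h = 1e-6
fprime = (f(mu + h) - f(mu - h)) / (2 * h)       # numerical derivative at the mean
sd_delta = abs(fprime) * sigma                   # transmitted sd, first order

rng = np.random.default_rng(4)
sd_mc = f(rng.normal(mu, sigma, 200_000)).std()  # Monte Carlo check
print(f"delta method: {sd_delta:.4f}, Monte Carlo: {sd_mc:.4f}")
```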

  18. Forest statistics of central and northern Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1952-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Indiana the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for Central and Northern...

  19. Statistical Misconceptions and Rushton's Writings on Race.

    ERIC Educational Resources Information Center

    Cernovsky, Zack Z.

    The term "statistical significance" is often misunderstood or abused to imply a large effect size. A recent example is in the work of J. P. Rushton (1988, 1990) on differences between Negroids and Caucasoids. Rushton used brain size and cranial size as indicators of intelligence, using Pearson "r"s ranging from 0.03 to 0.35.…

  20. Prevalence of synchronous colorectal neoplasms in surgically treated gastric cancer patients and significance of screening colonoscopy.

    PubMed

    Suzuki, Akira; Koide, Naohiko; Takeuchi, Daisuke; Okumura, Motohiro; Ishizone, Satoshi; Suga, Tomoaki; Miyagawa, Shinichi

    2014-05-01

    The existence of other primary tumors during the treatment and management of gastric cancer (GC) is an important issue. The present study investigated the prevalence and management of synchronous colorectal neoplasms (CRN) in surgically treated GC patients. Of 381 surgically treated GC patients, 332 (87.1%) underwent colonoscopy to detect CRN before surgery or within a year after surgery. CRN were synchronously observed in 140 patients (42.2%). Adenoma was observed in 131 patients (39.4%). Endoscopic resection was done in 18 patients with adenoma. Colorectal cancer (CRC) was observed in 16 patients (4.8%), superficial CRC in 13 and advanced CRC in three patients. Endoscopic resection of superficial CRC was carried out in seven patients, whereas simultaneous surgical resection of CRC was done in nine patients. CRN were more frequently observed in men. CRC was more frequently observed in GC patients with distant metastasis, albeit without significance. The overall survival of GC patients with CRN or CRC was poorer than that of patients without CRN or CRC. Synchronous CRN were commonly associated with GC and screening colonoscopy should be offered to patients with GC. © 2013 The Authors. Digestive Endoscopy © 2013 Japan Gastroenterological Endoscopy Society.

  1. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students’ Statistical Reasoning and Quantitative Literacy Skills †

    PubMed Central

    Olimpo, Jeffrey T.; Pevey, Ryan S.; McCabe, Thomas M.

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students’ reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce. PMID:29904549

  2. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    PubMed

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to

  3. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set
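
    The permutation test behind the p-values above re-estimates the cross-validated error after destroying the gene-label association. A minimal sketch (ours, on synthetic data loosely shaped like the 22-normal/25-tumor design; not the paper's WVA/RLS/SVM pipeline):

```python
# Compare cross-validated accuracy on true labels with the distribution of
# accuracies obtained after shuffling the labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.standard_normal((47, 200))               # 47 specimens, 200 "genes"
y = np.r_[np.zeros(22, int), np.ones(25, int)]   # normal vs tumor
X[y == 1, :5] += 1.0                             # 5 weakly informative genes

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=5).mean()

null = [cross_val_score(clf, X, rng.permutation(y), cv=5).mean() for _ in range(200)]
p = (np.sum(np.array(null) >= acc) + 1) / (len(null) + 1)
print(f"accuracy={acc:.2f}, permutation p={p:.3f}")
```

    scikit-learn packages the same idea as sklearn.model_selection.permutation_test_score.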

  4. Application of pedagogy reflective in statistical methods course and practicum statistical methods

    NASA Astrophysics Data System (ADS)

    Julie, Hongki

    2017-08-01

    The subjects Elementary Statistics, Statistical Methods, and Statistical Methods Practicum aim to equip Mathematics Education students with descriptive and inferential statistics. Students' understanding of descriptive and inferential statistics is important in the Mathematics Education Department, especially for those whose final projects involve quantitative research. In quantitative research, students are required to present and describe quantitative data in an appropriate manner, to draw conclusions from their quantitative data, and to relate the independent and dependent variables defined in their research. In fact, when students carried out final projects involving quantitative research, it was not rare to find them making mistakes in drawing conclusions and choosing the hypothesis-testing procedure; as a result, they reached incorrect conclusions. This is a fatal mistake for those doing quantitative research. Several things emerged from the implementation of reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses: 1. Twenty-two students passed the course and one student did not. 2. The highest grade was A, achieved by 18 students. 3. According to all students, they could develop their critical stance and build care for each other through the learning process in this course. 4. All students agreed that through the learning process they underwent in the course, they could build care for each other.

  5. Methods and statistics for combining motif match scores.

    PubMed

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
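
    Method 3 is essentially Fisher's classic recipe: under the null hypothesis, -2 times the sum of the log p-values follows a chi-squared distribution with 2k degrees of freedom. A minimal sketch (ours; MAST's exact QFAST computation may differ in detail):

```python
# Combine k independent p-values via Fisher's product-of-p-values statistic.
import numpy as np
from scipy.stats import chi2

def combine_pvalues_fisher(pvals):
    pvals = np.asarray(pvals, float)
    stat = -2.0 * np.log(pvals).sum()          # Fisher's statistic
    return chi2.sf(stat, df=2 * pvals.size)    # combined p-value

print(combine_pvalues_fisher([0.04, 0.10, 0.30]))  # about 0.036
```

    SciPy exposes the same combination rule as scipy.stats.combine_pvalues(pvals, method='fisher').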

  6. Statistics

    Cancer.gov

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  7. Uterine Cancer Statistics

    MedlinePlus

    Uterine cancer is the most commonly diagnosed gynecologic cancer; related data are available through the U.S. Cancer Statistics Data Visualizations Tool.

  8. Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach.

    PubMed

    Chertkov, Michael; Chernyak, Vladimir

    2017-08-17

    Thermostatically controlled loads, e.g., air conditioners and heaters, are by far the most widespread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control - changing from on to off, and vice versa, depending on temperature. We considered aggregation of a large group of similar devices into a statistical ensemble, where the devices operate following the same dynamics, subject to stochastic perturbations and randomized, Poisson on/off switching policy. Using theoretical and computational tools of statistical physics, we analyzed how the ensemble relaxes to a stationary distribution and established a relationship between the relaxation and the statistics of the probability flux associated with devices' cycling in the mixed (discrete, switch on/off, and continuous temperature) phase space. This allowed us to derive the spectrum of the non-equilibrium (detailed balance broken) statistical system and uncover how switching policy affects oscillatory trends and the speed of the relaxation. Relaxation of the ensemble is of practical interest because it describes how the ensemble recovers from significant perturbations, e.g., forced temporary switching off aimed at utilizing the flexibility of the ensemble to provide "demand response" services to change consumption temporarily to balance a larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.
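
    The relaxation being described can be caricatured in a few lines (our sketch: invented deadband, rates, and noise, integrated with forward Euler; not the authors' analysis). The fraction of devices that are on drifts from a perturbed initial condition toward its stationary value:

```python
# Toy ensemble of thermostatically controlled loads with Poisson on/off switching.
import numpy as np

rng = np.random.default_rng(6)
N, dt, steps = 5000, 0.01, 3000
temp = rng.uniform(19.0, 21.0, N)          # device temperatures, deadband 19-21 C
on = rng.random(N) < 0.9                   # perturbed start: 90% of devices on
frac_on = []
for _ in range(steps):
    # cooling toward 15 C when on, heating toward 25 C when off
    temp += np.where(on, -(temp - 15.0), 25.0 - temp) * 0.1 * dt
    temp += 0.05 * np.sqrt(dt) * rng.standard_normal(N)   # stochastic perturbation
    p_switch = 1.0 - np.exp(-5.0 * dt)     # Poisson switching outside the deadband
    flip = (((~on) & (temp > 21.0)) | (on & (temp < 19.0))) & (rng.random(N) < p_switch)
    on ^= flip
    frac_on.append(on.mean())
print(f"fraction on: start {frac_on[0]:.2f} -> stationary ~{frac_on[-1]:.2f}")
```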

  9. Study/experimental/research design: much more than statistics.

    PubMed

    Knight, Kenneth L

    2010-01-01

    The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.

  10. Study/Experimental/Research Design: Much More Than Statistics

    PubMed Central

    Knight, Kenneth L.

    2010-01-01

    Abstract Context: The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes “Methods” sections hard to read and understand. Objective: To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. Description: The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Advantages: Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results. PMID:20064054

  11. Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chertkov, Michael; Chernyak, Vladimir

    Thermostatically Controlled Loads (TCL), e.g., air-conditioners and heaters, are by far the most widespread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control of temperature - changing from on to off, and vice versa, depending on temperature. Aggregation of a large group of similar devices into a statistical ensemble is considered, where the devices operate following the same dynamics, subject to stochastic perturbations and randomized, Poisson on/off switching policy. We analyze, using theoretical and computational tools of statistical physics, how the ensemble relaxes to a stationary distribution and establish a relation between the relaxation and the statistics of the probability flux associated with devices' cycling in the mixed (discrete, switch on/off, and continuous, temperature) phase space. This allowed us to derive and analyze the spectrum of the non-equilibrium (detailed balance broken) statistical system and uncover how switching policy affects oscillatory trends and the speed of the relaxation. Relaxation of the ensemble is of practical interest because it describes how the ensemble recovers from significant perturbations, e.g., forced temporary switching off aimed at utilizing the flexibility of the ensemble in providing "demand response" services, relieving consumption temporarily to balance a larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.

  12. Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach

    DOE PAGES

    Chertkov, Michael; Chernyak, Vladimir

    2017-01-17

    Thermostatically Controlled Loads (TCL), e.g., air-conditioners and heaters, are by far the most widespread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control of temperature - changing from on to off, and vice versa, depending on temperature. Aggregation of a large group of similar devices into a statistical ensemble is considered, where the devices operate following the same dynamics, subject to stochastic perturbations and randomized, Poisson on/off switching policy. We analyze, using theoretical and computational tools of statistical physics, how the ensemble relaxes to a stationary distribution and establish a relation between the relaxation and the statistics of the probability flux associated with devices' cycling in the mixed (discrete, switch on/off, and continuous, temperature) phase space. This allowed us to derive and analyze the spectrum of the non-equilibrium (detailed balance broken) statistical system and uncover how switching policy affects oscillatory trends and the speed of the relaxation. Relaxation of the ensemble is of practical interest because it describes how the ensemble recovers from significant perturbations, e.g., forced temporary switching off aimed at utilizing the flexibility of the ensemble in providing "demand response" services, relieving consumption temporarily to balance a larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.

  13. Statistical test for ΔρDCCA cross-correlation coefficient

    NASA Astrophysics Data System (ADS)

    Guedes, E. F.; Brito, A. A.; Oliveira Filho, F. M.; Fernandez, B. F.; de Castro, A. P. N.; da Silva Filho, A. M.; Zebende, G. F.

    2018-07-01

    In this paper we propose a new statistical test for ΔρDCCA, the Detrended Cross-Correlation Coefficient Difference, a tool to measure the contagion/interdependence effect in time series of size N at different time scales n. To support this proposition we analyzed simulated and real time series. The results showed that the statistical significance of ΔρDCCA depends on the size N and the time scale n, and we can define critical values for this dependency at the 90%, 95%, and 99% confidence levels, as will be shown in this paper.
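
    A minimal sketch of the underlying ρDCCA coefficient (non-overlapping boxes, linear detrending; the paper's ΔρDCCA is the difference between two such coefficients, and its critical values are obtained by simulation):

      import numpy as np

      def detrended_residuals(profile, n):
          """Residuals of a linear fit in each non-overlapping box of size n."""
          t = np.arange(n)
          res = []
          for b in range(len(profile) // n):
              seg = profile[b * n:(b + 1) * n]
              res.append(seg - np.polyval(np.polyfit(t, seg, 1), t))
          return np.concatenate(res)

      def rho_dcca(x, y, n):
          """DCCA cross-correlation coefficient of series x, y at time scale n."""
          px = np.cumsum(x - np.mean(x))        # integrated profiles
          py = np.cumsum(y - np.mean(y))
          rx = detrended_residuals(px, n)
          ry = detrended_residuals(py, n)
          f2_xy = np.mean(rx * ry)              # detrended covariance F^2_DCCA
          f_x = np.sqrt(np.mean(rx ** 2))       # DFA fluctuation functions
          f_y = np.sqrt(np.mean(ry ** 2))
          return f2_xy / (f_x * f_y)

      rng = np.random.default_rng(1)
      z = rng.normal(size=2000)
      x = z + rng.normal(size=2000)             # two series sharing a common component
      y = z + rng.normal(size=2000)
      print(rho_dcca(x, y, n=16))               # positive, well away from 0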

  14. Students' Emergent Articulations of Statistical Models and Modeling in Making Informal Statistical Inferences

    ERIC Educational Resources Information Center

    Braham, Hana Manor; Ben-Zvi, Dani

    2017-01-01

    A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…

  15. Selected low-flow frequency statistics for continuous-record streamgages in Georgia, 2013

    USGS Publications Warehouse

    Gotvald, Anthony J.

    2016-04-13

    This report presents the annual and monthly minimum 1- and 7-day average streamflows with the 10-year recurrence interval (1Q10 and 7Q10) for 197 continuous-record streamgages in Georgia. Streamgages used in the study included active and discontinued stations having a minimum of 10 complete climatic years of record as of September 30, 2013. The 1Q10 and 7Q10 flow statistics were computed for 85 streamgages on unregulated streams with minimal diversions upstream, 43 streamgages on regulated streams, and 69 streamgages known, or considered, to be affected by varying degrees of diversions upstream. Descriptive information for each of these streamgages, including the U.S. Geological Survey (USGS) station number, station name, latitude, longitude, county, drainage area, and period of record analyzed also is presented. Kendall’s tau nonparametric test was used to determine the statistical significance of trends in annual and monthly minimum 1-day and 7-day average flows for the 197 streamgages. Significant negative trends in the minimum annual 1-day and 7-day average streamflow were indicated for 77 of the 197 streamgages. Many of these significant negative trends are due to the period of record ending during one of the recent droughts in Georgia, particularly those streamgages with record through the 2013 water year. Long-term unregulated streamgages with 70 or more years of record indicate significant negative trends in the annual minimum 7-day average flow for central and southern Georgia. Watersheds for some of these streamgages have experienced minimal human impact, thus indicating that the significant negative trends observed in flows at the long-term streamgages may be influenced by changing climatological conditions. A Kendall-tau trend analysis of the annual air temperature and precipitation totals for Georgia indicated no significant trends. A comprehensive analysis of causes of the trends in annual and monthly minimum 1-day and 7-day average flows in central
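
    The trend test itself is a one-liner; a sketch with hypothetical annual minimum 7-day average flows (the USGS values are not reproduced here):

      import numpy as np
      from scipy.stats import kendalltau

      rng = np.random.default_rng(2)
      years = np.arange(1944, 2014)                       # 70 complete climatic years
      flows = 120 - 0.5 * (years - years[0]) + rng.normal(0, 8, years.size)

      tau, p = kendalltau(years, flows)
      print(f"Kendall's tau = {tau:.3f}, p = {p:.2g}")
      if p < 0.05 and tau < 0:
          print("significant negative trend in annual minimum 7-day average flow")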

  16. A novel measure and significance testing in data analysis of cell image segmentation.

    PubMed

    Wu, Jin Chu; Halter, Michael; Kacker, Raghu N; Elliott, John T; Plant, Anne L

    2017-03-14

    Cell image segmentation (CIS) is an essential part of quantitative imaging of biological cells. Designing a performance measure and conducting significance testing are critical for evaluating and comparing the CIS algorithms for image-based cell assays in cytometry. Many measures and methods have been proposed and implemented to evaluate segmentation methods. However, computing the standard errors (SE) of the measures and their correlation coefficient is not described, and thus the statistical significance of performance differences between CIS algorithms cannot be assessed. We propose the total error rate (TER), a novel performance measure for segmenting all cells in the supervised evaluation. The TER statistically aggregates all misclassification error rates (MER) by taking cell sizes as weights. The MERs are for segmenting each single cell in the population. The TER is fully supported by the pairwise comparisons of MERs using 106 manually segmented ground-truth cells with different sizes and seven CIS algorithms taken from ImageJ. Further, the SE and 95% confidence interval (CI) of TER are computed based on the SE of MER that is calculated using the bootstrap method. An algorithm for computing the correlation coefficient of TERs between two CIS algorithms is also provided. Hence, the 95% CI error bars can be used to classify CIS algorithms. The SEs of TERs and their correlation coefficient can be employed to conduct hypothesis testing, when the CIs overlap, to determine the statistical significance of the performance differences between CIS algorithms. A novel measure TER of CIS is proposed. The TER's SEs and correlation coefficient are computed. Thereafter, CIS algorithms can be evaluated and compared statistically by conducting the significance testing.
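
    A sketch of the aggregation and bootstrap steps, assuming TER is the cell-size-weighted mean of the per-cell misclassification error rates (the cell data here are simulated placeholders):

      import numpy as np

      rng = np.random.default_rng(3)
      sizes = rng.integers(50, 500, 106)        # pixels per ground-truth cell
      mer = rng.beta(2, 20, 106)                # per-cell misclassification error rates

      def ter(mer, sizes):
          # total error rate: MERs aggregated with cell sizes as weights
          return np.average(mer, weights=sizes)

      boot = np.empty(2000)
      for b in range(boot.size):
          idx = rng.integers(0, mer.size, mer.size)     # resample cells with replacement
          boot[b] = ter(mer[idx], sizes[idx])

      se = boot.std(ddof=1)
      lo, hi = np.percentile(boot, [2.5, 97.5])
      print(f"TER = {ter(mer, sizes):.4f}, SE = {se:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")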

  17. Use of statistical procedures in Brazilian and international dental journals.

    PubMed

    Ambrosano, Gláucia Maria Bovi; Reis, André Figueiredo; Giannini, Marcelo; Pereira, Antônio Carlos

    2004-01-01

    A descriptive survey was performed in order to assess the statistical content and quality of Brazilian and international dental journals, and compare their evolution throughout the last decades. The authors identified the reporting and accuracy of statistical techniques in 1000 papers published from 1970 to 2000 in seven dental journals: three Brazilian (Brazilian Dental Journal, Revista de Odontologia da Universidade de Sao Paulo and Revista de Odontologia da UNESP) and four international journals (Journal of the American Dental Association, Journal of Dental Research, Caries Research and Journal of Periodontology). Papers were divided into two time periods: from 1970 to 1989, and from 1990 to 2000. A slight increase in the number of articles that presented some form of statistical technique was noticed for Brazilian journals (from 61.0 to 66.7%), whereas for international journals, a significant increase was observed (65.8 to 92.6%). In addition, a decrease in the number of statistical errors was verified. The most commonly used statistical tests as well as the most frequent errors found in dental journals were assessed. Hopefully, this investigation will encourage dental educators to better plan the teaching of biostatistics, and to improve the statistical quality of submitted manuscripts.

  18. Highway Statistics 1970

    DOT National Transportation Integrated Search

    1971-10-01

    This publication was prepared by the Highway Statistics Division, Office of Highway Planning, Federal Highway Administration. The 26th of an annual series, it presents the 1970 statistical and analytical tables of general interest on motor fuel, moto...

  19. Critical analysis of adsorption data statistically

    NASA Astrophysics Data System (ADS)

    Kaushal, Achla; Singh, S. K.

    2017-10-01

    Experimental data can be presented, computed, and critically analysed in different ways using statistics. A variety of statistical tests are used to make decisions about the significance and validity of experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results in favour of the data collected from the experiment, and this has been shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for the Langmuir and Freundlich adsorption isotherms were 0.99 and 0.95 respectively, which show a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.

  20. Limitations of Poisson statistics in describing radioactive decay.

    PubMed

    Sitek, Arkadiusz; Celler, Anna M

    2015-12-01

    The assumption that nuclear decays are governed by Poisson statistics is an approximation. This approximation becomes unjustified when data acquisition times longer than or even comparable with the half-lives of the radioisotope in the sample are considered. In this work, the limits of the Poisson-statistics approximation are investigated. The formalism for the statistics of radioactive decay based on the binomial distribution is derived. The theoretical factor describing the deviation of the variance of the number of decays predicted by the Poisson distribution from the true variance is defined and investigated for several commonly used radiotracers such as (18)F, (15)O, (82)Rb, (13)N, (99m)Tc, (123)I, and (201)Tl. The variance of the number of decays estimated using the Poisson distribution is significantly different from the true variance for a 5-minute observation time of (11)C, (15)O, (13)N, and (82)Rb. Durations of nuclear medicine studies often are relatively long; they may be even a few times longer than the half-lives of some short-lived radiotracers. Our study shows that in such situations Poisson statistics are unsuitable and should not be applied to describe the statistics of the number of decays in radioactive samples. However, the above statement does not directly apply to counting statistics at the level of event detection. The low sensitivities of detectors used in imaging studies make the Poisson approximation nearly perfect. Copyright © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
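
    The size of the deviation is easy to quantify: for N0 nuclei observed for time t, the number of decays is binomial with p = 1 - exp(-λt), so the true variance N0·p·(1-p) differs from the Poisson variance N0·p by the factor (1 - p):

      import numpy as np

      half_life_min = {"C-11": 20.4, "N-13": 9.97, "O-15": 2.04,
                       "Rb-82": 1.27, "F-18": 109.8, "Tc-99m": 361.2}
      t = 5.0   # observation time [min]

      for iso, t_half in half_life_min.items():
          lam = np.log(2) / t_half
          p = 1 - np.exp(-lam * t)      # per-nucleus decay probability during t
          # variance ratio binomial/Poisson = (1 - p); values far below 1 mean the
          # Poisson approximation substantially overestimates the variance
          print(f"{iso:7s} 1 - p = {1 - p:.3f}")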

  1. Statistics at a glance.

    PubMed

    Ector, Hugo

    2010-12-01

    I still remember my first book on statistics: "Elementary statistics with applications in medicine and the biological sciences" by Frederick E. Croxton. For me, it was the start of my pursuit of understanding statistics in daily life and in medical practice. It was the first volume in a long line of books. In his introduction, Croxton asserts that "nearly everyone involved in any aspect of medicine needs to have some knowledge of statistics". The reality is that for many clinicians, statistics are limited to a "P < 0.05 = ok". I do not blame my colleagues who omit the paragraph on statistical methods. They have never had the opportunity to learn concise and clear descriptions of the key features. I have experienced how some authors can describe difficult methods in clearly understandable language. Others fail completely. As a teacher, I tell my students that life is impossible without a basic knowledge of statistics. This feeling has resulted in an annual seminar of 90 minutes. This tutorial is the summary of that seminar. It is a summary and a transcription of the best pages I have found.

  2. Pedagogical Utilization and Assessment of the Statistic Online Computational Resource in Introductory Probability and Statistics Courses.

    PubMed

    Dinov, Ivo D; Sanchez, Juana; Christou, Nicolas

    2008-01-01

    Technology-based instruction represents a recent pedagogical paradigm that is rooted in the realization that new generations are much more comfortable with, and excited about, new technologies. The rapid technological advancement over the past decade has fueled an enormous demand for the integration of modern networking, informational and computational tools with classical pedagogical instruments. Consequently, teaching with technology typically involves utilizing a variety of IT and multimedia resources for online learning, course management, electronic course materials, and novel tools for communication, engagement, experimentation, critical thinking and assessment. The NSF-funded Statistics Online Computational Resource (SOCR) provides a number of interactive tools for enhancing instruction in various undergraduate and graduate courses in probability and statistics. These resources include online instructional materials, statistical calculators, interactive graphical user interfaces, computational and simulation applets, tools for data analysis and visualization. The tools provided as part of SOCR include conceptual simulations and statistical computing interfaces, which are designed to bridge between the introductory and the more advanced computational and applied probability and statistics courses. In this manuscript, we describe our designs for utilizing SOCR technology in instruction in a recent study. In addition, we present results on the effectiveness of using SOCR tools at two different course intensity levels on three outcome measures: exam scores, student satisfaction and choice of technology to complete assignments. Learning styles assessment was completed at baseline. We have used three very different designs for three different undergraduate classes. Each course included a treatment group, using the SOCR resources, and a control group, using classical instruction techniques. Our findings include marginal effects of the SOCR treatment per individual

  3. Pedagogical Utilization and Assessment of the Statistic Online Computational Resource in Introductory Probability and Statistics Courses

    PubMed Central

    Dinov, Ivo D.; Sanchez, Juana; Christou, Nicolas

    2009-01-01

    Technology-based instruction represents a recent pedagogical paradigm that is rooted in the realization that new generations are much more comfortable with, and excited about, new technologies. The rapid technological advancement over the past decade has fueled an enormous demand for the integration of modern networking, informational and computational tools with classical pedagogical instruments. Consequently, teaching with technology typically involves utilizing a variety of IT and multimedia resources for online learning, course management, electronic course materials, and novel tools for communication, engagement, experimentation, critical thinking and assessment. The NSF-funded Statistics Online Computational Resource (SOCR) provides a number of interactive tools for enhancing instruction in various undergraduate and graduate courses in probability and statistics. These resources include online instructional materials, statistical calculators, interactive graphical user interfaces, computational and simulation applets, tools for data analysis and visualization. The tools provided as part of SOCR include conceptual simulations and statistical computing interfaces, which are designed to bridge between the introductory and the more advanced computational and applied probability and statistics courses. In this manuscript, we describe our designs for utilizing SOCR technology in instruction in a recent study. In addition, we present results on the effectiveness of using SOCR tools at two different course intensity levels on three outcome measures: exam scores, student satisfaction and choice of technology to complete assignments. Learning styles assessment was completed at baseline. We have used three very different designs for three different undergraduate classes. Each course included a treatment group, using the SOCR resources, and a control group, using classical instruction techniques. Our findings include marginal effects of the SOCR treatment per individual

  4. Applied Statistics.

    DTIC Science & Technology

    1980-03-18

    Association, 72, 1977, 881-885. A study of renewal processes with IMRL and DFR interarrival times, Mark Brown. Technical Report No. 6, 6/7/77. Annals... 11, 9/23/77. Scandinavian Actuarial Journal, 1978, 211-224. On the distribution of the greatest common divisor, Persi Diaconis & Paul Erdos. Technical Report... Peter Cooke. Technical Report No. 32, 4/3/79. Biometrika, 1980. Statistics as a mathematical discipline, D.V. Lindley. Technical Report No. 3, 4/25/79

  5. Statistical optics

    NASA Astrophysics Data System (ADS)

    Goodman, J. W.

    This book is based on the thesis that some training in the area of statistical optics should be included as a standard part of any advanced optics curriculum. Random variables are discussed, taking into account definitions of probability and random variables, distribution functions and density functions, an extension to two or more random variables, statistical averages, transformations of random variables, sums of real random variables, Gaussian random variables, complex-valued random variables, and random phasor sums. Other subjects examined are related to random processes, some first-order properties of light waves, the coherence of optical waves, some problems involving high-order coherence, effects of partial coherence on imaging systems, imaging in the presence of randomly inhomogeneous media, and fundamental limits in photoelectric detection of light. Attention is given to deterministic versus statistical phenomena and models, the Fourier transform, and the fourth-order moment of the spectrum of a detected speckle image.

  6. A Statistical Study on Neutron Star Masses

    NASA Astrophysics Data System (ADS)

    Cheng, Z.; Zhang, C. M.; Zhao, Y. H.; Wang, D. H.; Pan, Y. Y.; Lei, Y. J.

    2013-11-01

    We investigate the measurement of neutron star masses in different populations of binaries. Based on the collection of the orbital parameters of 40 systems (46 sources), we apply the bootstrap method together with the Monte Carlo method to reconstruct the likelihood curves for each source separately. The cumulative analysis of the simulation result shows that the neutron star masses in X-ray systems and radio systems obey different distributions, and no evidence for a bimodal distribution could be found. Employing Bayesian statistical techniques, we find that the most likely distributions for the high mass X-ray binaries (HMXBs), low mass X-ray binaries (LMXBs), double neutron star (DNS) systems, and neutron star-white dwarf (NS-WD) binary systems are (1.340±0.230) M_{⊙}, (1.505±0.125) M_{⊙}, (1.335±0.055) M_{⊙}, and (1.495±0.225) M_{⊙}, respectively. The statistical distribution has no significant deviation from the standard neutron star formation mechanism. It is noticed that the statistical results for the center masses of LMXBs and NS-WD systems are significantly higher than those of the other groups by about 0.16 M_{⊙}, which could be regarded as evidence of accretion episodes. And if we regard the HMXBs and LMXBs as the progenitors of DNS and NS-WD systems, then we can draw the conclusion that the accretion effect must be very weak during the evolution trajectory from HMXBs to DNS systems, and this could be the reason why the masses of DNS systems have such a narrow distribution.

  7. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    PubMed

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference means drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
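
    A sketch of that decision logic for the simplest case, two independent groups of continuous measurements (the data are hypothetical):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      a = rng.normal(5.0, 1.0, 30)     # group A
      b = rng.normal(5.6, 1.0, 30)     # group B

      # parametric tests require (approximately) normal data, so check that first
      normal = stats.shapiro(a).pvalue > 0.05 and stats.shapiro(b).pvalue > 0.05
      if normal:
          name, (stat, p) = "Student's t test", stats.ttest_ind(a, b)
      else:
          name, (stat, p) = "Mann-Whitney U", stats.mannwhitneyu(a, b)
      print(f"{name}: statistic = {stat:.3f}, p = {p:.4f}")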

  8. Business Statistics Education: Content and Software in Undergraduate Business Statistics Courses.

    ERIC Educational Resources Information Center

    Tabatabai, Manouchehr; Gamble, Ralph

    1997-01-01

    Survey responses from 204 of 500 business schools identified most often topics in business statistics I and II courses. The most popular software at both levels was Minitab. Most schools required both statistics I and II. (SK)

  9. A shift from significance test to hypothesis test through power analysis in medical research.

    PubMed

    Singh, G

    2006-01-01

    Until recently, the medical research literature exhibited substantial dominance of Fisher's significance-test approach to statistical inference, which concentrates on the probability of a type I error, over Neyman-Pearson's hypothesis test, which considers the probabilities of both type I and type II errors. Fisher's approach dichotomises results into significant or non-significant with a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Based on the same theory, the two approaches address the same objective and reach conclusions in their own ways. Advances in computing techniques and the availability of statistical software have resulted in the increasing application of power calculations in medical research, and thereby in reporting the results of significance tests in the light of the power of the test as well. The significance-test approach, when it incorporates power analysis, contains the essence of the hypothesis-test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to Neyman-Pearson's hypothesis test procedure.
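
    A short example of the power calculations the abstract refers to, using the statsmodels power module (the effect size and error rates are conventional illustrative choices):

      from statsmodels.stats.power import TTestIndPower

      analysis = TTestIndPower()

      # sample size per group to detect a medium effect (Cohen's d = 0.5)
      # with alpha = 0.05 and power = 0.80, two-sided two-sample t test
      n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
      print(f"required sample size per group: {n:.1f}")      # about 64

      # achieved power for an already-collected sample of 30 per group
      power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
      print(f"power with n = 30 per group: {power:.2f}")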

  10. Median statistics estimates of Hubble and Newton's constants

    NASA Astrophysics Data System (ADS)

    Bethapudi, Suryarao; Desai, Shantanu

    2017-02-01

    The robustness of any statistic depends upon the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness when the number of assumptions we can make about the data is limited. We then apply the median statistics technique to obtain estimates of two constants of nature, the Hubble constant (H0) and Newton's gravitational constant (G), both of which show significant differences between different measurements. For H0, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. We find that, after grouping the different results according to their primary type of measurement, the median estimates are given by H0 = 72.5^{+2.5}_{-8} km/sec/Mpc with errors corresponding to 95% c.l. (2σ) and G = 6.674702^{+0.0014}_{-0.0009} × 10^{-11} N m^{2} kg^{-2} with errors corresponding to 68% c.l. (1σ).
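
    A sketch of the key step: because each independent measurement falls above or below the true median with probability 1/2, a confidence interval for the median follows from the binomial distribution of the order statistics, with no assumptions about the individual errors (the input values below are placeholders, not the paper's 576 measurements):

      import numpy as np
      from scipy.stats import binom

      def median_and_ci(values, conf=0.95):
          x = np.sort(np.asarray(values, dtype=float))
          n = x.size
          # number of measurements below the true median ~ Binomial(n, 1/2);
          # the interval indexing below is an approximate order-statistic bound
          lo = int(binom.ppf((1 - conf) / 2, n, 0.5))
          hi = int(binom.ppf(1 - (1 - conf) / 2, n, 0.5))
          return np.median(x), x[max(lo - 1, 0)], x[min(hi, n - 1)]

      h0 = [67.4, 73.2, 69.8, 74.0, 70.0, 67.8, 76.0, 71.0, 72.1, 68.9]
      m, lo, hi = median_and_ci(h0)
      print(f"median H0 = {m:.1f} (+{hi - m:.1f}/-{m - lo:.1f}) km/s/Mpc")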

  11. Magnetic resonance imaging findings in Ménière's disease.

    PubMed

    Patel, V A; Oberman, B S; Zacharia, T T; Isildak, H

    2017-07-01

    To identify and evaluate cranial magnetic resonance imaging findings associated with Ménière's disease. Seventy-eight patients with a documented diagnosis of Ménière's disease and 35 controls underwent 1.5 T or 3 T magnetic resonance imaging of the brain. Patients also underwent otological, vestibular and audiometric examinations. Lack of visualisation of the left and right vestibular aqueducts was identified as statistically significant amongst Ménière's disease patients (left, p = 0.0001, odds ratio = 0.02; right, p = 0.0004, odds ratio = 0.03). Both vestibular aqueducts were of abnormal size in the Ménière's disease group, albeit with left-sided significance (left, p = 0.008, odds ratio = 10.91; right, p = 0.49, odds ratio = 2.47). Lack of vestibular aqueduct visualisation on magnetic resonance imaging was statistically significant in Ménière's disease patients compared to the general population. The study findings suggest that magnetic resonance imaging can be useful to rule out retrocochlear pathology and provide radiological data to support the clinical diagnosis of Ménière's disease.

  12. Infant Statistical Learning

    PubMed Central

    Saffran, Jenny R.; Kirkham, Natasha Z.

    2017-01-01

    Perception involves making sense of a dynamic, multimodal environment. In the absence of mechanisms capable of exploiting the statistical patterns in the natural world, infants would face an insurmountable computational problem. Infant statistical learning mechanisms facilitate the detection of structure. These abilities allow the infant to compute across elements in their environmental input, extracting patterns for further processing and subsequent learning. In this selective review, we summarize findings that show that statistical learning is both a broad and flexible mechanism (supporting learning from different modalities across many different content areas) and input specific (shifting computations depending on the type of input and goal of learning). We suggest that statistical learning not only provides a framework for studying language development and object knowledge in constrained laboratory settings, but also allows researchers to tackle real-world problems, such as multilingualism, the role of ever-changing learning environments, and differential developmental trajectories. PMID:28793812

  13. Performance of cancer cluster Q-statistics for case-control residential histories

    PubMed Central

    Sloan, Chantel D.; Jacquez, Geoffrey M.; Gallagher, Carolyn M.; Ward, Mary H.; Raaschou-Nielsen, Ole; Nordsborg, Rikke Baastrup; Meliker, Jaymie R.

    2012-01-01

    Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case-control studies, we created simulated clusters to examine Q-statistic performance. Results suggest the intersection of cases with significant clustering over their life course, Qi, with cases who are constituents of significant local clusters at given times, Qit, yielded the best performance, which improved with increasing cluster size. Upon comparison, a larger proportion of true positives were detected with Kulldorff’s spatial scan method if the time of clustering was provided. We recommend using Q-statistics to identify when and where clustering may have occurred, followed by the scan method to localize the candidate clusters. Future work should investigate the generalizability of these findings. PMID:23149326

  14. [Review of research design and statistical methods in Chinese Journal of Cardiology].

    PubMed

    Zhang, Li-jun; Yu, Jin-ming

    2009-07-01

    To evaluate the research design and the use of statistical methods in the Chinese Journal of Cardiology, we reviewed the research design and statistical methods in all of the original papers published in the Chinese Journal of Cardiology from December 2007 to November 2008. The most frequently used research designs were cross-sectional design (34%), prospective design (21%) and experimental design (25%). Of all the articles, 49 (25%) used wrong statistical methods, 29 (15%) lacked some sort of statistical analysis, and 23 (12%) had inconsistencies in the description of methods. There were significant differences between different statistical methods (P < 0.001). The rate of correct use of multifactor analysis was low, and repeated-measures data were not analyzed with repeated-measures methods. Many problems exist in the Chinese Journal of Cardiology. Better research design and correct use of statistical methods are still needed. Stricter review by statisticians and epidemiologists is also required to improve the quality of the literature.

  15. Musicians' edge: A comparison of auditory processing, cognitive abilities and statistical learning.

    PubMed

    Mandikal Vasuki, Pragati Rao; Sharma, Mridula; Demuth, Katherine; Arciuli, Joanne

    2016-12-01

    It has been hypothesized that musical expertise is associated with enhanced auditory processing and cognitive abilities. Recent research has examined the relationship between musicians' advantage and implicit statistical learning skills. In the present study, we assessed a variety of auditory processing skills, cognitive processing skills, and statistical learning (auditory and visual forms) in age-matched musicians (N = 17) and non-musicians (N = 18). Musicians had significantly better performance than non-musicians on frequency discrimination and backward digit span. A key finding was that musicians had better auditory, but not visual, statistical learning than non-musicians. Performance on the statistical learning tasks was not correlated with performance on auditory and cognitive measures. Musicians' superior performance on auditory (but not visual) statistical learning suggests that musical expertise is associated with an enhanced ability to detect statistical regularities in auditory stimuli. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. How Attitudes towards Statistics Courses and the Field of Statistics Predicts Statistics Anxiety among Undergraduate Social Science Majors: A Validation of the Statistical Anxiety Scale

    ERIC Educational Resources Information Center

    O'Bryant, Monique J.

    2017-01-01

    The aim of this study was to validate an instrument that can be used by instructors or social scientist who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) was examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…

  17. [Confidence interval or p-value--similarities and differences between two important methods of statistical inference of quantitative studies].

    PubMed

    Harari, Gil

    2014-01-01

    Statistical significance, expressed as a p-value, and the confidence interval (CI) are common statistical measures and are essential for the statistical analysis of studies in medicine and the life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article describes the two methodologies, compares them, assesses their suitability for the different needs of study results analysis, and explains the situations in which each method should be used.
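
    A small example showing the complementarity of the two measures for a two-sample comparison of means (the data are simulated):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(5)
      treated = rng.normal(12.0, 3.0, 50)
      control = rng.normal(10.5, 3.0, 50)

      # p-value from the pooled two-sample t test
      t_stat, p = stats.ttest_ind(treated, control)

      # 95% CI for the same mean difference, from the pooled standard error
      n1, n2 = treated.size, control.size
      sp2 = ((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
      se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
      diff = treated.mean() - control.mean()
      margin = stats.t.ppf(0.975, n1 + n2 - 2) * se
      print(f"p = {p:.4f}; difference = {diff:.2f}, "
            f"95% CI = [{diff - margin:.2f}, {diff + margin:.2f}]")

    The p-value answers only whether the difference is statistically distinguishable from zero; the CI shows the range of differences compatible with the data, which is what judgements of clinical significance need.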

  18. Analyzing gene expression from relative codon usage bias in Yeast genome: a statistical significance and biological relevance.

    PubMed

    Das, Shibsankar; Roymondal, Uttam; Sahoo, Satyabrata

    2009-08-15

    Based on the hypothesis that highly expressed genes are often characterized by strong compositional bias in terms of codon usage, there are a number of measures currently in use that quantify codon usage bias in genes, and hence provide numerical indices to predict the expression levels of genes. With the recent advent of an expression measure based on the score of the relative codon usage bias (RCBS), we have explicitly tested the performance of this numerical measure in predicting gene expression levels, and we illustrate this with an analysis of yeast genomes. In contradiction with previous studies, we observe a weak correlation between GC content and RCBS, but a selective pressure on the codon preferences in highly expressed genes. The assertion that the expression of a given gene depends on the score of relative codon usage bias (RCBS) is supported by the data. We further observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at higher levels. We also attempt a statistical analysis to assess the strength of relative codon bias in genes as a guide to their likely expression level, suggesting a decrease of the informational entropy in highly expressed genes.
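
    The entropy observation is easy to illustrate; a minimal sketch of the Shannon entropy of a gene's codon usage (a generic bias indicator, not the RCBS score itself):

      from collections import Counter
      from math import log2

      def codon_entropy(cds):
          """Shannon entropy of codon usage, in bits per codon (lower = more biased)."""
          codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
          counts = Counter(codons)
          total = sum(counts.values())
          return -sum(c / total * log2(c / total) for c in counts.values())

      biased = "ATGGCTGCTGCTGCTGCTGCTGCTTAA"    # alanine always encoded by GCT
      mixed = "ATGGCTGCAGCCGCGGCTGCAGCCTAA"     # alanine spread over synonymous codons
      print(codon_entropy(biased), codon_entropy(mixed))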

  19. A study of statistics anxiety levels of graduate dental hygiene students.

    PubMed

    Welch, Paul S; Jacks, Mary E; Smiley, Lynn A; Walden, Carolyn E; Clark, William D; Nguyen, Carol A

    2015-02-01

    In light of increased emphasis on evidence-based practice in the profession of dental hygiene, it is important that today's dental hygienist comprehend statistical measures to fully understand research articles, and thereby apply scientific evidence to practice. Therefore, the purpose of this study was to investigate statistics anxiety among graduate dental hygiene students in the U.S. A web-based self-report, anonymous survey was emailed to directors of 17 MSDH programs in the U.S. with a request to distribute to graduate students. The survey collected data on statistics anxiety, sociodemographic characteristics and evidence-based practice. Statistics anxiety was assessed using the Statistical Anxiety Rating Scale. The significance level for the study was α=0.05. Only 8 of the 17 invited programs participated in the study. Statistical Anxiety Rating Scale data revealed graduate dental hygiene students experience low to moderate levels of statistics anxiety. Specifically, the level of anxiety on the Interpretation Anxiety factor indicated this population could struggle with making sense of scientific research. A decisive majority (92%) of students indicated statistics is essential for evidence-based practice and should be a required course for all dental hygienists. This study served to identify statistics anxiety in a previously unexplored population. The findings should be useful in both theory building and in practical applications. Furthermore, the results can be used to direct future research. Copyright © 2015 The American Dental Hygienists’ Association.

  20. Statistical analysis of Geopotential Height (GH) timeseries based on Tsallis non-extensive statistical mechanics

    NASA Astrophysics Data System (ADS)

    Karakatsanis, L. P.; Iliopoulos, A. C.; Pavlos, E. G.; Pavlos, G. P.

    2018-02-01

    In this paper, we perform statistical analysis of time series deriving from Earth's climate. The time series concern Geopotential Height (GH) and correspond to temporal and spatial components of the global distribution of monthly average values during the period 1948-2012. The analysis is based on Tsallis non-extensive statistical mechanics, and in particular on the estimation of the Tsallis q-triplet, namely {qstat, qsens, qrel}, the reconstructed phase space, and the estimation of the correlation dimension and the Hurst exponent from rescaled range (R/S) analysis. The deviation of the Tsallis q-triplet from unity indicates a non-Gaussian (Tsallis q-Gaussian) non-extensive character with heavy-tailed probability density functions (PDFs), multifractal behavior and long-range dependences for all the timeseries considered. Noticeable differences in the q-triplet estimates were also found between timeseries at distinct local or temporal regions. Moreover, the reconstructed phase space revealed a lower-dimensional fractal set in the GH dynamical phase space (strong self-organization), and the estimation of the Hurst exponent indicated multifractality, non-Gaussianity and persistence. The analysis provides significant information for identifying and characterizing the dynamical characteristics of Earth's climate.
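
    A compact sketch of the rescaled range (R/S) step (the other estimators in the q-triplet require more machinery):

      import numpy as np

      def hurst_rs(x, min_win=8):
          """Hurst exponent from classic rescaled-range (R/S) analysis."""
          x = np.asarray(x, dtype=float)
          wins, rs = [], []
          n = min_win
          while n <= x.size // 2:
              ratios = []
              for start in range(0, x.size - n + 1, n):
                  seg = x[start:start + n]
                  dev = np.cumsum(seg - seg.mean())   # mean-adjusted cumulative deviations
                  s = seg.std(ddof=1)
                  if s > 0:
                      ratios.append((dev.max() - dev.min()) / s)
              wins.append(n)
              rs.append(np.mean(ratios))
              n *= 2
          # slope of log(R/S) against log(n) estimates H
          h, _ = np.polyfit(np.log(wins), np.log(rs), 1)
          return h

      rng = np.random.default_rng(6)
      print(hurst_rs(rng.normal(size=4096)))          # near 0.5 for uncorrelated noise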

  1. Assessing attitudes towards statistics among medical students: psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS).

    PubMed

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. The present study provided evidence for the appropriate metric properties of the Serbian version of SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS might be a reliable and valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.

  2. The significance of organ prolapse in gastroschisis.

    PubMed

    Koehler, Shannon M; Szabo, Aniko; Loichinger, Matt; Peterson, Erika; Christensen, Melissa; Wagner, Amy J

    2017-12-01

    The aim of this study was to evaluate the incidence and importance of organ prolapse (stomach, bladder, reproductive organs) in gastroschisis. This is a retrospective review of gastroschisis patients from 2000 to 2014 at a single tertiary institution. Statistical analysis was performed using a chi-square test, Student's t test, log-rank test, or Cox regression analysis models. All tests were conducted as two-tailed tests, and p-values <0.05 were considered statistically significant. One hundred seventy-one gastroschisis patients were identified. Sixty-nine (40.6%) had at least one prolapsed organ besides bowel. The most commonly prolapsed organs were stomach (n=45, 26.3%), reproductive organs (n=34, 19.9%), and bladder (n=15, 8.8%). Patients with prolapsed organs were more likely to have simple gastroschisis with significant decreases in the rate of atresia and necrosis/perforation. They progressed to earlier enteral feeds, discontinuation of parenteral nutrition, and discharge. Likewise, these patients were less likely to have complications such as central line infections, sepsis, and short gut syndrome. Gastroschisis is typically described as isolated bowel herniation, but a large proportion of patients have prolapse of other organs. Prolapsed organs are associated with simple gastroschisis and improved outcomes, most likely due to a larger fascial defect. This may be useful for prenatal and postnatal counseling of families. Case Control/Retrospective Comparative Study. Level III. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Statistical Reviewers Improve Reporting in Biomedical Articles: A Randomized Trial

    PubMed Central

    Cobo, Erik; Selva-O'Callagham, Albert; Ribera, Josep-Maria; Cardellach, Francesc; Dominguez, Ruth; Vilardell, Miquel

    2007-01-01

    Background Although peer review is widely considered to be the most credible way of selecting manuscripts and improving the quality of accepted papers in scientific journals, there is little evidence to support its use. Our aim was to estimate the effects on manuscript quality of either adding a statistical peer reviewer or suggesting the use of checklists such as CONSORT or STARD to clinical reviewers or both. Methodology and Principal Findings Interventions were defined as 1) the addition of a statistical reviewer to the clinical peer review process, and 2) suggesting reporting guidelines to reviewers; with “no statistical expert” and “no checklist” as controls. The two interventions were crossed in a 2×2 balanced factorial design including original research articles consecutively selected, between May 2004 and March 2005, by the Medicina Clinica (Barc) editorial committee. We randomized manuscripts to minimize differences in terms of baseline quality and type of study (intervention, longitudinal, cross-sectional, others). Sample-size calculations indicated that 100 papers provide an 80% power to test a 55% standardized difference. We specified the main outcome as the increment in quality of papers as measured on the Goodman Scale. Two blinded evaluators rated the quality of manuscripts at initial submission and the final post-peer-review version. Of the 327 manuscripts submitted to the journal, 131 were accepted for further review, and 129 were randomized. Of those, 14 that were lost to follow-up showed no differences in initial quality from the followed-up papers. Hence, 115 were included in the main analysis, with 16 rejected for publication after peer review. 21 (18.3%) of the 115 included papers were interventions, 46 (40.0%) were longitudinal designs, 28 (24.3%) cross-sectional and 20 (17.4%) others. The 16 (13.9%) rejected papers had a significantly lower initial score on the overall Goodman scale than accepted papers (difference 15.0, 95% CI: 4.6–24

  4. The environment associated with significant tornadoes in Bangladesh

    NASA Astrophysics Data System (ADS)

    Bikos, Dan; Finch, Jonathan; Case, Jonathan L.

    2016-01-01

    This paper investigates the environmental parameters favoring significant tornadoes in Bangladesh through a simulation of ten high-impact events. A climatological perspective is first presented on classifying significant tornadoes in Bangladesh, noting the challenges since reports of tornadoes are not documented in a formal manner. The statistical relationship between United States and Bangladesh tornado-related deaths suggests that significant tornadoes do occur in Bangladesh so this paper identifies the most significant tornadic events and analyzes the environmental conditions associated with these events. Given the scarcity of observational data to assess the near-storm environment in this region, high-resolution (3-km horizontal grid spacing) numerical weather prediction simulations are performed for events identified to be associated with a significant tornado. In comparison to similar events over the United States, significant tornado environments in Bangladesh are characterized by relatively high convective available potential energy, sufficient deep-layer vertical shear, and a propensity for deviant (i.e., well to the right of the mean flow) storm motion along a low-level convergence boundary.

  5. Significance testing testate amoeba water table reconstructions

    NASA Astrophysics Data System (ADS)

    Payne, Richard J.; Babeshko, Kirill V.; van Bellen, Simon; Blackford, Jeffrey J.; Booth, Robert K.; Charman, Dan J.; Ellershaw, Megan R.; Gilbert, Daniel; Hughes, Paul D. M.; Jassey, Vincent E. J.; Lamentowicz, Łukasz; Lamentowicz, Mariusz; Malysheva, Elena A.; Mauquoy, Dmitri; Mazei, Yuri; Mitchell, Edward A. D.; Swindles, Graeme T.; Tsyganov, Andrey N.; Turner, T. Edward; Telford, Richard J.

    2016-04-01

    Transfer functions are valuable tools in palaeoecology, but their output may not always be meaningful. A recently-developed statistical test ('randomTF') offers the potential to distinguish among reconstructions which are more likely to be useful, and those less so. We applied this test to a large number of reconstructions of peatland water table depth based on testate amoebae. Contrary to our expectations, a substantial majority (25 of 30) of these reconstructions gave non-significant results (P > 0.05). The underlying reasons for this outcome are unclear. We found no significant correlation between randomTF P-value and transfer function performance, the properties of the training set and reconstruction, or measures of transfer function fit. These results give cause for concern but we believe it would be extremely premature to discount the results of non-significant reconstructions. We stress the need for more critical assessment of transfer function output, replication of results and ecologically-informed interpretation of palaeoecological data.

  6. Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).

    PubMed

    Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal

    2016-01-01

    This article compares the study designs and statistical methods used in the 2005, 2010 and 2015 volumes of the Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015), of which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A variety of statistical methods was found in the analysis. The most frequent methods include descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and Student's t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over the time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and that their frequency increased from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles, while the cross-sectional design was the most common study design.

  7. Noise level and MPEG-2 encoder statistics

    NASA Astrophysics Data System (ADS)

    Lee, Jungwoo

    1997-01-01

    Most source material in the movie and broadcasting industries is still in analog film or tape format, which typically contains random noise originating from film, CCD cameras, and tape recording. The performance of the MPEG-2 encoder may be significantly degraded by this noise. It is also affected by the scene type, which includes spatial and temporal activity. The statistical properties of noise originating from the camera and the tape player are analyzed, and models for the two types of noise are developed. The relationship between the noise, the scene type, and encoder statistics for a number of MPEG-2 parameters such as motion vector magnitude, prediction error, and quant scale is discussed. This analysis is intended to be a tool for designing robust MPEG encoding algorithms for functions such as preprocessing and rate control.

  8. Bayesian statistics in radionuclide metrology: measurement of a decaying source

    NASA Astrophysics Data System (ADS)

    Bochud, François O.; Bailat, Claude J.; Laedermann, Jean-Pascal

    2007-08-01

    The most intuitive way of defining a probability is perhaps through the frequency at which it appears when a large number of trials are realized in identical conditions. The probability derived from the obtained histogram characterizes the so-called frequentist or conventional statistical approach. In this sense, probability is defined as a physical property of the observed system. By contrast, in Bayesian statistics, a probability is not a physical property or a directly observable quantity, but a degree of belief or an element of inference. The goal of this paper is to show how Bayesian statistics can be used in radionuclide metrology and what its advantages and disadvantages are compared with conventional statistics. This is performed through the example of an yttrium-90 source typically encountered in environmental surveillance measurement. Because of the very low activity of this kind of source and the short half-life of the radionuclide, this measurement takes several days, during which the source decays significantly. Several methods are proposed to compute simultaneously the number of unstable nuclei at a given reference time, the decay constant and the background. Asymptotically, all approaches give the same result. However, Bayesian statistics produces coherent estimates and confidence intervals from a much smaller number of measurements. Apart from the conceptual understanding of statistics, the main difficulty that could deter radionuclide metrologists from using Bayesian statistics is the complexity of the computation.
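
    A minimal sketch of the Bayesian computation for a decaying source counted in time bins (flat priors, known background, grid posterior; all numbers are illustrative, not the paper's):

      import numpy as np

      rng = np.random.default_rng(7)
      t = np.arange(0.0, 120.0, 1.0)                  # bin start times [h], five days
      lam_true = np.log(2) / 64.1                     # yttrium-90 half-life ~ 64.1 h
      a0_true, bkg = 50.0, 5.0                        # source and background counts/bin
      counts = rng.poisson(a0_true * np.exp(-lam_true * t) + bkg)

      # grid posterior over (A0, lambda) with flat priors; background assumed known
      a0 = np.linspace(20.0, 80.0, 200)
      lam = np.linspace(0.005, 0.02, 200)
      A0, LAM = np.meshgrid(a0, lam, indexing="ij")
      logL = np.zeros_like(A0)
      for ti, ci in zip(t, counts):
          m = A0 * np.exp(-LAM * ti) + bkg            # expected counts in this bin
          logL += ci * np.log(m) - m                  # Poisson log-likelihood (no constant)
      post = np.exp(logL - logL.max())
      post /= post.sum()

      lam_marg = post.sum(axis=0)                     # marginalize over A0
      lam_mean = np.sum(lam * lam_marg)
      print(f"posterior mean half-life: {np.log(2) / lam_mean:.1f} h (true: 64.1 h)")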

  9. Universal Recurrence Time Statistics of Characteristic Earthquakes

    NASA Astrophysics Data System (ADS)

    Goltz, C.; Turcotte, D. L.; Abaimov, S.; Nadeau, R. M.

    2006-12-01

    Characteristic earthquakes are defined to occur quasi-periodically on major faults. Do recurrence time statistics of such earthquakes follow a particular statistical distribution? If so, which one? The answer is fundamental and has important implications for hazard assessment. The problem cannot be solved by comparing the goodness of statistical fits as the available sequences are too short. The Parkfield sequence of M ≍ 6 earthquakes, one of the most extensive reliable data sets available, has grown to merely seven events with the last earthquake in 2004, for example. Recently, however, advances in seismological monitoring and improved processing methods have unveiled so-called micro-repeaters, micro-earthquakes which recur exactly in the same location on a fault. It seems plausible to regard these earthquakes as a miniature version of the classic characteristic earthquakes. Micro-repeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. Due to their recent discovery, however, available sequences contain less than 20 events at present. In this paper we present results for the analysis of recurrence times for several micro-repeater sequences from Parkfield and adjacent regions. To improve the statistical significance of our findings, we combine several sequences into one by rescaling the individual sets by their respective mean recurrence intervals and Weibull exponents. This novel approach of rescaled combination yields the most extensive data set possible. We find that the resulting statistics can be fitted well by an exponential distribution, confirming the universal applicability of the Weibull distribution to characteristic earthquakes. A similar result is obtained from rescaled combination, however, with regard to the lognormal distribution.
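
    A sketch of the distributional test on a rescaled, combined set of recurrence intervals (simulated here, since the micro-repeater catalogs are not reproduced):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(8)
      # recurrence intervals rescaled by their mean, as in the rescaled combination
      intervals = rng.weibull(1.0, 200)               # shape 1 = exponential special case

      shape, loc, scale = stats.weibull_min.fit(intervals, floc=0)
      print(f"fitted Weibull shape = {shape:.2f}, scale = {scale:.2f}")
      # a fitted shape near 1 means the exponential distribution describes the
      # combined data well, consistent with the abstract's conclusion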

  10. Transport Statistics - Transport - UNECE

    Science.gov Websites

    Two new datasets have been added to the transport statistics database: bus and coach statistics.

  11. Cancer Statistics Animator

    Cancer.gov

    This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex. Provides access to incidence, mortality, and survival. Select the type of statistic, variables, format, and then extract the statistics in a delimited format for further analyses.

  12. Self-selection contributes significantly to the lower adiposity of faster, longer-distanced, male and female walkers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Paul T.

    2006-01-06

    Although cross-sectional studies show active individuals are leaner than their sedentary counterparts, it remains to be determined to what extent this is due to initially leaner men and women choosing to exercise longer and more intensely (self-selection bias). In this report walking volume (weekly distance) and intensity (speed) were compared to current BMI (BMIcurrent) and BMI at the start of walking (BMIstarting) in 20,353 women and 5,174 men who had walked regularly for exercise for 7.2 and 10.6 years, respectively. The relationships of BMIcurrent and BMIstarting with distance and intensity were nonlinear (convex). On average, BMIstarting explained >70 percent of the association between BMIcurrent and intensity, and 40 percent and 17 percent of the association between BMIcurrent and distance in women and men, respectively. Although the declines in BMIcurrent with distance and intensity were greater among fatter than leaner individuals, the portions attributable to BMIstarting remained relatively constant regardless of fatness. Thus self-selection bias accounts for most of the decline in BMI with walking intensity and smaller, albeit significant, proportions of the decline with distance. This demonstration of self-selection is germane to other cross-sectional comparisons in epidemiological research, given self-selection is unlikely to be limited to weight or peculiar to physical activity.

  13. Applied extreme-value statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kinnison, R.R.

    1983-05-01

    The statistical theory of extreme values is a well established part of theoretical statistics. Unfortunately, it is seldom part of applied statistics and is infrequently a part of statistical curricula except in advanced studies programs. This has resulted in the impression that it is difficult to understand and not of practical value. In recent environmental and pollution literature, several short articles have appeared with the purpose of documenting all that is necessary for the practical application of extreme value theory to field problems (for example, Roberts, 1979). These articles are so concise that only a statistician can recognise all the subtleties and assumptions necessary for the correct use of the material presented. The intent of this text is to expand upon several recent articles, and to provide the necessary statistical background so that the non-statistician scientist can recognize an extreme value problem when it occurs in his work, be confident in handling simple extreme value problems himself, and know when the problem is statistically beyond his capabilities and requires consultation.
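    In the spirit of that text, a small worked example of applied extreme-value statistics (a sketch assuming scipy; the annual maxima are simulated and the 100-year return level is purely illustrative):

```python
import numpy as np
from scipy import stats

# simulated annual maxima of some environmental variable (e.g. a pollutant level)
rng = np.random.default_rng(1)
annual_max = 50 + 8 * rng.gumbel(size=30)

# fit a Gumbel (Type I extreme-value) distribution to the block maxima
loc, scale = stats.gumbel_r.fit(annual_max)

# 100-year return level: the value exceeded on average once per 100 years
return_level = stats.gumbel_r.ppf(1 - 1 / 100, loc, scale)
print(f"estimated 100-year return level: {return_level:.1f}")
```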

  14. 2016 Annual Disability Statistics Compendium

    ERIC Educational Resources Information Center

    Lauer, E. A.; Houtenville, A. J.

    2017-01-01

    The "Annual Disability Statistics Compendium" is a publication of statistics about people with disabilities and about the government programs which serve them. The "Compendium" is designed to serve as a summary of government statistics. The 2016 "Annual Disability Statistics Compendium" was substantially revised and…

  15. High Impact = High Statistical Standards? Not Necessarily So

    PubMed Central

    Tressoldi, Patrizio E.; Giofré, David; Sella, Francesco; Cumming, Geoff

    2013-01-01

    What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors. PMID:23418533

  16. High impact = high statistical standards? Not necessarily so.

    PubMed

    Tressoldi, Patrizio E; Giofré, David; Sella, Francesco; Cumming, Geoff

    2013-01-01

    What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation, is the prevalent statistical practice used in articles published in Nature, 89%, followed by articles published in Science, 42%. By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors.

  17. Non-Markovian full counting statistics in quantum dot molecules

    PubMed Central

    Xue, Hai-Bin; Jiao, Hu-Jun; Liang, Jiu-Qing; Liu, Wu-Ming

    2015-01-01

    Full counting statistics of electron transport is a powerful diagnostic tool for probing the nature of quantum transport beyond what is obtainable from the average current or conductance measurement alone. In particular, the non-Markovian dynamics of a quantum dot molecule plays an important role in nonequilibrium electron tunneling processes. It is thus necessary to understand the non-Markovian full counting statistics in a quantum dot molecule. Here we study the non-Markovian full counting statistics in two typical quantum dot molecules, namely, serially coupled and side-coupled double quantum dots with high quantum coherence in a certain parameter regime. We demonstrate that the non-Markovian effect manifests itself through the quantum coherence of the quantum dot molecule system, and has a significant impact on the full counting statistics in the highly quantum-coherent case, depending on the coupling of the quantum dot molecule with the source and drain electrodes. These results indicate that the influence of the non-Markovian effect on the full counting statistics of electron transport should be considered in highly quantum-coherent quantum dot molecule systems, as it can provide a better understanding of electron transport through quantum dot molecules. PMID:25752245

  18. Intervention for Maltreating Fathers: Statistically and Clinically Significant Change

    ERIC Educational Resources Information Center

    Scott, Katreena L.; Lishak, Vicky

    2012-01-01

    Objective: Fathers are seldom the focus of efforts to address child maltreatment and little is currently known about the effectiveness of intervention for this population. To address this gap, we examined the efficacy of a community-based group treatment program for fathers who had abused or neglected their children or exposed their children to…

  19. Research Design and Statistical Methods in Indian Medical Journals: A Retrospective Survey

    PubMed Central

    Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S.; Mayya, Shreemathi S.

    2015-01-01

    Good quality medical research generally requires not only expertise in the chosen medical field of interest but also a sound knowledge of statistical methodology. The number of medical research articles which have been published in Indian medical journals has increased quite substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. Main outcomes considered in the present study were – study design types and their frequencies, error/defects proportion in study design, statistical analyses, and implementation of the CONSORT checklist in RCTs (randomized clinical trials). From 2003 to 2013: The proportion of erroneous statistical analyses did not decrease (χ2=0.592, Φ=0.027, p=0.4418), 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7% (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12, p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, the proportion of randomized clinical trial designs remained very low (7.3%, 43/588), with the majority showing some errors (41 papers, 95.3%). The majority of the published studies were retrospective in nature both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325/490), and…

  20. Research design and statistical methods in Indian medical journals: a retrospective survey.

    PubMed

    Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S; Mayya, Shreemathi S

    2015-01-01

    Good quality medical research generally requires not only expertise in the chosen medical field of interest but also a sound knowledge of statistical methodology. The number of medical research articles which have been published in Indian medical journals has increased quite substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. Main outcomes considered in the present study were - study design types and their frequencies, error/defects proportion in study design, statistical analyses, and implementation of the CONSORT checklist in RCTs (randomized clinical trials). From 2003 to 2013: The proportion of erroneous statistical analyses did not decrease (χ2=0.592, Φ=0.027, p=0.4418), 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7% (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12, p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, the proportion of randomized clinical trial designs remained very low (7.3%, 43/588), with the majority showing some errors (41 papers, 95.3%). The majority of the published studies were retrospective in nature both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325/490), and…

  1. Applications of statistics to medical science, II overview of statistical procedures for general use.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.

  2. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis.

    PubMed

    Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun

    2016-09-14

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprises 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS-specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs.

  3. Estimating annual high-flow statistics and monthly and seasonal low-flow statistics for ungaged sites on streams in Alaska and conterminous basins in Canada

    USGS Publications Warehouse

    Wiley, Jeffrey B.; Curran, Janet H.

    2003-01-01

    Methods for estimating daily mean flow-duration statistics for seven regions in Alaska and low-flow frequencies for one region, southeastern Alaska, were developed from daily mean discharges for streamflow-gaging stations in Alaska and conterminous basins in Canada. The 15-, 10-, 9-, 8-, 7-, 6-, 5-, 4-, 3-, 2-, and 1-percent duration flows were computed for the October-through-September water year for 222 stations in Alaska and conterminous basins in Canada. The 98-, 95-, 90-, 85-, 80-, 70-, 60-, and 50-percent duration flows were computed for the individual months of July, August, and September for 226 stations in Alaska and conterminous basins in Canada. The 98-, 95-, 90-, 85-, 80-, 70-, 60-, and 50-percent duration flows were computed for the season July-through-September for 65 stations in southeastern Alaska. The 7-day, 10-year and 7-day, 2-year low-flow frequencies for the season July-through-September were computed for 65 stations for most of southeastern Alaska. Low-flow analyses were limited to particular months or seasons in order to omit winter low flows, when ice effects reduce the quality of the records and validity of statistical assumptions. Regression equations for estimating the selected high-flow and low-flow statistics for the selected months and seasons for ungaged sites were developed from an ordinary-least-squares regression model using basin characteristics as independent variables. Drainage area and precipitation were significant explanatory variables for high flows, and drainage area, precipitation, mean basin elevation, and area of glaciers were significant explanatory variables for low flows. The estimating equations can be used at ungaged sites in Alaska and conterminous basins in Canada where streamflow regulation, streamflow diversion, urbanization, and natural damming and releasing of water do not affect the streamflow data for the given month or season. Standard errors of estimate ranged from 15 to 56 percent for high-duration flow

  4. Statistical-Dynamical Seasonal Forecasts of Central-Southwest Asian Winter Precipitation.

    NASA Astrophysics Data System (ADS)

    Tippett, Michael K.; Goddard, Lisa; Barnston, Anthony G.

    2005-06-01

    Interannual precipitation variability in central-southwest (CSW) Asia has been associated with East Asian jet stream variability and western Pacific tropical convection. However, atmospheric general circulation models (AGCMs) forced by observed sea surface temperature (SST) poorly simulate the region's interannual precipitation variability. The statistical-dynamical approach uses statistical methods to correct systematic deficiencies in the response of AGCMs to SST forcing. Statistical correction methods linking model-simulated Indo-west Pacific precipitation and observed CSW Asia precipitation result in modest, but statistically significant, cross-validated simulation skill in the northeast part of the domain for the period from 1951 to 1998. The statistical-dynamical method is also applied to recent (winter 1998/99 to 2002/03) multimodel, two-tier December-March precipitation forecasts initiated in October. This period includes 4 yr (winter of 1998/99 to 2001/02) of severe drought. Tercile probability forecasts are produced using ensemble-mean forecasts and forecast error estimates. The statistical-dynamical forecasts show enhanced probability of below-normal precipitation for the four drought years and capture the return to normal conditions in part of the region during the winter of 2002/03. "May Kabul be without gold, but not without snow." (Traditional Afghan proverb)

  5. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  6. Statistical assessment of crosstalk enrichment between gene groups in biological networks.

    PubMed

    McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L

    2013-01-01

    Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
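    The crosstalk z-score idea can be sketched as follows (a simplified permutation null that relabels network nodes; CrossTalkZ itself randomizes the network while preserving second-order topological properties, which this toy version does not):

```python
import numpy as np

def crosstalk_z(edges, group_a, group_b, n_perm=1000, seed=0):
    """Z-score for the number of links between two gene groups, against a
    node-relabeling null. Assumes every gene in the groups appears in the
    network; a simplified stand-in for CrossTalkZ's degree-preserving null."""
    rng = np.random.default_rng(seed)
    nodes = sorted({n for e in edges for n in e})

    def n_links(a, b):
        return sum(((u in a) and (v in b)) or ((u in b) and (v in a))
                   for u, v in edges)

    observed = n_links(set(group_a), set(group_b))
    null = []
    for _ in range(n_perm):
        relabel = dict(zip(nodes, rng.permutation(nodes)))
        null.append(n_links({relabel[g] for g in group_a},
                            {relabel[g] for g in group_b}))
    return (observed - np.mean(null)) / np.std(null)
```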

  7. Statistical characteristics of trajectories of diamagnetic unicellular organisms in a magnetic field.

    PubMed

    Gorobets, Yu I; Gorobets, O Yu

    2015-01-01

    A statistical model is proposed in this paper for describing the orientation of trajectories of unicellular diamagnetic organisms in a magnetic field. A statistical parameter, the effective energy, is calculated on the basis of this model. The resulting effective energy is a statistical characteristic of the trajectories of diamagnetic microorganisms in a magnetic field connected with their metabolism. The statistical model is applicable to the case when the energy of the thermal motion of bacteria is negligible in comparison with their energy in a magnetic field and the bacteria manifest significant "active random movement", i.e. there is randomizing motion of the bacteria of a non-thermal nature, for example, movement of bacteria by means of flagella. The energy of this randomizing active self-motion of bacteria is characterized by a new statistical parameter for biological objects. The parameter replaces the energy of the randomizing thermal motion in the calculation of the statistical distribution. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118
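    A hedged sketch of the Gumbel-based adjustment (scipy assumed; the null statistics are toy chi-square draws standing in for Monte Carlo replicates of a local maximum scan statistic):

```python
import numpy as np
from scipy import stats

def gumbel_adjusted_pvalue(observed_max, null_maxima):
    """Fit a Gumbel distribution to Monte Carlo maxima of the local scan
    statistic and return the upper-tail probability of the observed value."""
    loc, scale = stats.gumbel_r.fit(null_maxima)
    return stats.gumbel_r.sf(observed_max, loc, scale)

# toy null: each replicate is the max of 50 likelihood-ratio-like statistics
null_max = stats.chi2.rvs(df=1, size=(999, 50), random_state=2).max(axis=1)
print(gumbel_adjusted_pvalue(14.0, null_max))
```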

  9. Advances in statistics

    Treesearch

    Howard Stauffer; Nadav Nur

    2005-01-01

    The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examine a hierarchical model...

  10. Experimental Mathematics and Computational Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H.; Borwein, Jonathan M.

    2009-04-30

    The field of statistics has long been noted for techniques to detect patterns and regularities in numerical data. In this article we explore connections between statistics and the emerging field of 'experimental mathematics'. These include both applications of experimental mathematics in statistics, as well as statistical methods applied to computational mathematics.

  11. Whose statistical reasoning is facilitated by a causal structure intervention?

    PubMed

    McNair, Simon; Feeney, Aidan

    2015-02-01

    People often struggle when making Bayesian probabilistic estimates on the basis of competing sources of statistical evidence. Recently, Krynski and Tenenbaum (Journal of Experimental Psychology: General, 136, 430-450, 2007) proposed that a causal Bayesian framework accounts for people's errors in Bayesian reasoning and showed that, by clarifying the causal relations among the pieces of evidence, judgments on a classic statistical reasoning problem could be significantly improved. We aimed to understand whose statistical reasoning is facilitated by the causal structure intervention. In Experiment 1, although we observed causal facilitation effects overall, the effect was confined to participants high in numeracy. We did not find an overall facilitation effect in Experiment 2 but did replicate the earlier interaction between numerical ability and the presence or absence of causal content. This effect held when we controlled for general cognitive ability and thinking disposition. Our results suggest that clarifying causal structure facilitates Bayesian judgments, but only for participants with sufficient understanding of basic concepts in probability and statistics.
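    The problems at issue are Bayesian updates from a base rate and test characteristics; a minimal worked example (the numbers are the classic illustrative screening values, not data from this study):

```python
def posterior(base_rate, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' rule."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 1% base rate, 80% sensitivity, 9.6% false-positive rate -> about 7.8%,
# far below the intuitive answers most participants give
print(f"{posterior(0.01, 0.80, 0.096):.3f}")
```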

  12. Using spatial statistics to identify emerging hot spots of forest loss

    NASA Astrophysics Data System (ADS)

    Harris, Nancy L.; Goldman, Elizabeth; Gabris, Christopher; Nordling, Jon; Minnemeyer, Susan; Ansari, Stephen; Lippmann, Michael; Bennett, Lauren; Raad, Mansour; Hansen, Matthew; Potapov, Peter

    2017-02-01

    As sources of data for global forest monitoring grow larger, more complex and numerous, data analysis and interpretation become critical bottlenecks for effectively using them to inform land use policy discussions. In this paper, we present a method that combines big data analytical tools with Emerging Hot Spot Analysis (ArcGIS) to identify statistically significant spatiotemporal trends of forest loss in Brazil, Indonesia and the Democratic Republic of Congo (DRC) between 2000 and 2014. Results indicate that while the overall rate of forest loss in Brazil declined over the 14-year time period, spatiotemporal patterns of loss shifted, with forest loss significantly diminishing within the Amazonian states of Mato Grosso and Rondônia and intensifying within the cerrado biome. In Indonesia, forest loss intensified in Riau province in Sumatra and in Sukamara and West Kotawaringin regencies in Central Kalimantan. Substantial portions of West Kalimantan became new and statistically significant hot spots of forest loss in the years 2013 and 2014. Similarly, vast areas of DRC emerged as significant new hot spots of forest loss, with intensified loss radiating out from city centers such as Beni and Kisangani. While our results focus on identifying significant trends at the national scale, we also demonstrate the scalability of our approach to smaller or larger regions depending on the area of interest and specific research question involved. When combined with other contextual information, these statistical data models can help isolate the most significant clusters of loss occurring over dynamic forest landscapes and provide more coherent guidance for the allocation of resources for forest monitoring and enforcement efforts.
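    Emerging Hot Spot Analysis combines a local clustering statistic (Getis-Ord Gi*) with a Mann-Kendall trend test on each cell's time series; the trend component might be sketched as follows (no correction for tied values; the yearly counts are hypothetical):

```python
import numpy as np
from scipy import stats

def mann_kendall(series):
    """Mann-Kendall trend test for one cell's yearly time series
    (sketch without the correction for tied values)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s)  # continuity correction
    return z, 2 * stats.norm.sf(abs(z))

# hypothetical yearly forest-loss counts for one grid cell, 2000-2014
print(mann_kendall([3, 4, 4, 6, 5, 7, 8, 9, 9, 11, 12, 12, 14, 15, 17]))
```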

  13. Reporting of Numerical and Statistical Differences in Abstracts

    PubMed Central

    Dryver, Eric; Hux, Janet E

    2002-01-01

    OBJECTIVE The reporting of relative risk reductions (RRRs) or absolute risk reductions (ARRs) to quantify binary outcomes in trials engenders differing perceptions of therapeutic efficacy, and the merits of P values versus confidence intervals (CIs) are also controversial. We describe the manner in which numerical and statistical difference in treatment outcomes is presented in published abstracts. DESIGN A descriptive study of abstracts published in 1986 and 1996 in 8 general medical and specialty journals. Inclusion criteria: controlled, intervention trials with a binary primary or secondary outcome. Seven items were recorded: raw data (outcomes for each treatment arm), measure of relative difference (e.g., RRR), ARR, number needed to treat, P value, CI, and verbal statement of statistical significance. The prevalence of these items was compared between journals and across time. RESULTS Of 5,293 abstracts, 300 met the inclusion criteria. In 1986, 60% of abstracts did not provide both the raw data and a corresponding P value or CI, while 28% failed to do so in 1996 (P < .001; RRR of 53%; ARR of 32%; CI for ARR 21% to 43%). The variability between journals was highly significant (P < .001). In 1986, 100% of abstracts lacked a measure of absolute difference while 88% of 1996 abstracts did so (P < .001). In 1986, 98% of abstracts lacked a CI while 65% of 1996 abstracts did so (P < .001). CONCLUSIONS The provision of quantitative outcome and statistical quantitative information has significantly increased between 1986 and 1996. However, further progress can be made to make abstracts more informative. PMID:11929506
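    The quantities compared in this study are straightforward to compute; a short sketch (normal-approximation CI; the counts are illustrative, chosen only to mirror the 60% versus 28% proportions above):

```python
import math

def risk_summary(events_ctrl, n_ctrl, events_trt, n_trt, z=1.96):
    """ARR, RRR, NNT and a normal-approximation 95% CI for the ARR."""
    p_c, p_t = events_ctrl / n_ctrl, events_trt / n_trt
    arr = p_c - p_t                      # absolute risk reduction
    rrr = arr / p_c                      # relative risk reduction
    nnt = math.inf if arr == 0 else 1 / arr
    se = math.sqrt(p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_trt)
    return arr, rrr, nnt, (arr - z * se, arr + z * se)

arr, rrr, nnt, ci = risk_summary(60, 100, 28, 100)
print(f"ARR={arr:.0%} RRR={rrr:.0%} NNT={nnt:.1f} CI=({ci[0]:.0%}, {ci[1]:.0%})")
```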

  14. Statistics and Informatics in Space Astrophysics

    NASA Astrophysics Data System (ADS)

    Feigelson, E.

    2017-12-01

    The interest in statistical and computational methodology has seen rapid growth in space-based astrophysics, parallel to the growth seen in Earth remote sensing. There is widespread agreement that scientific interpretation of the cosmic microwave background, discovery of exoplanets, and classifying multiwavelength surveys is too complex to be accomplished with traditional techniques. NASA operates several well-functioning Science Archive Research Centers providing 0.5 PBy datasets to the research community. These databases are integrated with full-text journal articles in the NASA Astrophysics Data System (200K pageviews/day). Data products use interoperable formats and protocols established by the International Virtual Observatory Alliance. NASA supercomputers also support complex astrophysical models of systems such as accretion disks and planet formation. Academic researcher interest in methodology has significantly grown in areas such as Bayesian inference and machine learning, and statistical research is underway to treat problems such as irregularly spaced time series and astrophysical model uncertainties. Several scholarly societies have created interest groups in astrostatistics and astroinformatics. Improvements are needed on several fronts. Community education in advanced methodology is not sufficiently rapid to meet the research needs. Statistical procedures within NASA science analysis software are sometimes not optimal, and pipeline development may not use modern software engineering techniques. NASA offers few grant opportunities supporting research in astroinformatics and astrostatistics.

  15. Water Quality Statistics

    ERIC Educational Resources Information Center

    Hodgson, Ted; Andersen, Lyle; Robison-Cox, Jim; Jones, Clain

    2004-01-01

    Water quality experiments, especially the use of macroinvertebrates as indicators of water quality, offer an ideal context for connecting statistics and science. In the STAR program for secondary students and teachers, water quality experiments were also used as a context for teaching statistics. In this article, we trace one activity that uses…

  16. Teaching MBA Statistics Online: A Pedagogically Sound Process Approach

    ERIC Educational Resources Information Center

    Grandzol, John R.

    2004-01-01

    Delivering MBA statistics in the online environment presents significant challenges to education and students alike because of varying student preparedness levels, complexity of content, difficulty in assessing learning outcomes, and faculty availability and technological expertise. In this article, the author suggests a process model that…

  17. Truth, Damn Truth, and Statistics

    ERIC Educational Resources Information Center

    Velleman, Paul F.

    2008-01-01

    Statisticians and Statistics teachers often have to push back against the popular impression that Statistics teaches how to lie with data. Those who believe incorrectly that Statistics is solely a branch of Mathematics (and thus algorithmic), often see the use of judgment in Statistics as evidence that we do indeed manipulate our results. In the…

  18. Does income inequality get under the skin? A multilevel analysis of depression, anxiety and mental disorders in Sao Paulo, Brazil.

    PubMed

    Chiavegatto Filho, Alexandre Dias Porto; Kawachi, Ichiro; Wang, Yuan Pang; Viana, Maria Carmen; Andrade, Laura Helena Silveira Guerra

    2013-11-01

    We tested the original income inequality theory by analysing its association with depression, anxiety and any mental disorder. We analysed a sample of 3542 individuals aged 18 years and older, selected through a stratified, multistage area probability sample of households from the São Paulo Metropolitan Area. Mental disorder symptoms were assessed using the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria. Bayesian multilevel logistic models were performed. Living in areas with medium and high income inequality was statistically associated with increased risk of depression, relative to low-inequality areas (OR 1.76; 95% CI 1.21 to 2.55, and OR 1.53; 95% CI 1.07 to 2.19, respectively). The same was not true for anxiety (OR 1.25; 95% CI 0.90 to 1.73, and OR 1.07; 95% CI 0.79 to 1.46). In the case of any mental disorder, results were mixed. In general, our findings were consistent with the income inequality theory, that is, people living in places with higher income inequality had overall higher odds of mental disorders, albeit not always statistically significant.

  19. Statistical and molecular analyses of evolutionary significance of red-green color vision and color blindness in vertebrates.

    PubMed

    Yokoyama, Shozo; Takenaka, Naomi

    2005-04-01

    Red-green color vision is strongly suspected to enhance the survival of its possessors. Despite being red-green color blind, however, many species have successfully competed in nature, which brings into question the evolutionary advantage of achieving red-green color vision. Here, we propose a new method of identifying positive selection at individual amino acid sites with the premise that if positive Darwinian selection has driven the evolution of the protein under consideration, then it should be found mostly at the branches in the phylogenetic tree where its function had changed. The statistical and molecular methods have been applied to 29 visual pigments with the wavelengths of maximal absorption at approximately 510-540 nm (green- or middle wavelength-sensitive [MWS] pigments) and at approximately 560 nm (red- or long wavelength-sensitive [LWS] pigments), which are sampled from a diverse range of vertebrate species. The results show that the MWS pigments are positively selected through amino acid replacements S180A, Y277F, and T285A and that the LWS pigments have been subjected to strong evolutionary conservation. The fact that these positively selected M/LWS pigments are found not only in animals with red-green color vision but also in those with red-green color blindness strongly suggests that both red-green color vision and color blindness have undergone adaptive evolution independently in different species.

  20. Evaluating collective significance of climatic trends: A comparison of methods on synthetic data

    NASA Astrophysics Data System (ADS)

    Huth, Radan; Dubrovský, Martin

    2017-04-01

    The common approach to determining whether climatic trends are significantly different from zero is to conduct individual (local) tests at each single site (station or gridpoint). Whether the number of sites where the trends are significantly non-zero could have occurred by chance is almost never evaluated in trend studies. That is, collective (global) significance of trends is ignored. We compare three approaches to evaluating collective statistical significance of trends at a network of sites, using the following statistics: (i) the number of successful local tests (a successful test means here a test in which the null hypothesis of no trend is rejected); this is a standard way of assessing collective significance in various applications in atmospheric sciences; (ii) the smallest p-value among the local tests (Walker test); and (iii) the counts of positive and negative trends regardless of their magnitudes and local significance. The third approach is a new procedure that we propose; the rationale behind it is that it is reasonable to assume that the prevalence of one sign of trends at individual sites is indicative of a high confidence in the trend not being zero, regardless of the (in)significance of individual local trends. A potentially large amount of information contained in trends that are not locally significant, which are typically deemed irrelevant and neglected, is thus not lost and is retained in the analysis. In this contribution we examine the feasibility of the proposed way of significance testing on synthetic data, produced by a multi-site stochastic generator, and compare it with the two other, now well-established ways of assessing collective significance. The synthetic dataset, mimicking annual mean temperature on an array of stations (or gridpoints), is constructed assuming a given statistical structure characterized by (i) spatial separation (density of the station network), (ii) local variance, (iii) temporal and spatial
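    The three competing statistics can be sketched directly (scipy assumed; the binomial nulls below treat sites as independent, an assumption that real, spatially correlated station networks violate, which is why the paper relies on a multi-site stochastic generator):

```python
import numpy as np
from scipy import stats

def count_test(pvals, alpha=0.05):
    """(i) Global p-value for the number of locally significant tests,
    under a binomial null with independent sites."""
    k = int(np.sum(np.asarray(pvals) <= alpha))
    return stats.binom.sf(k - 1, len(pvals), alpha)

def walker_test(pvals):
    """(ii) Walker test: global p-value from the smallest local p-value."""
    return 1.0 - (1.0 - float(np.min(pvals))) ** len(pvals)

def sign_count_test(trends):
    """(iii) Proposed approach: two-sided binomial test on the prevalence
    of one trend sign, ignoring magnitudes and local significance."""
    pos = int(np.sum(np.asarray(trends) > 0))
    return stats.binomtest(pos, len(trends), 0.5).pvalue
```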

  1. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research.

    PubMed

    Amrhein, Valentin; Korner-Nievergelt, Fränzi; Roth, Tobias

    2017-01-01

    The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also, significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be 'conflicting', meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis.
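    The claim that two adequately powered studies 'conflict' about one third of the time is easy to check by simulation (a sketch: two-sample t-tests with n = 52 per group and a standardized effect of 0.55, which gives roughly 80% power):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, effect, reps = 52, 0.55, 5000  # ~80% power for a two-sided test at 0.05

def one_study_significant():
    a = rng.normal(effect, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    return stats.ttest_ind(a, b).pvalue <= 0.05

pairs = [(one_study_significant(), one_study_significant()) for _ in range(reps)]
conflict = np.mean([s1 != s2 for s1, s2 in pairs])
print(f"fraction of 'conflicting' study pairs: {conflict:.2f}")  # about 0.32
```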

  2. Outcome of temporal lobe epilepsy surgery predicted by statistical parametric PET imaging.

    PubMed

    Wong, C Y; Geller, E B; Chen, E Q; MacIntyre, W J; Morris, H H; Raja, S; Saha, G B; Lüders, H O; Cook, S A; Go, R T

    1996-07-01

    PET is useful in the presurgical evaluation of temporal lobe epilepsy. The purpose of this retrospective study was to assess the clinical use of statistical parametric imaging in predicting surgical outcome. Interictal 18FDG-PET scans in 17 patients with surgically-treated temporal lobe epilepsy (group A = 13 seizure-free, group B = 4 not seizure-free at 6 mo) were transformed into statistical parametric images, with each pixel representing a z-score value computed using the mean and s.d. of the count distribution in each individual patient, for both visual and quantitative analysis. Mean z-scores were significantly more negative in the anterolateral (AL) and mesial (M) regions on the operated side than on the nonoperated side in group A (AL: p < 0.00005, M: p = 0.0097), but not in group B (AL: p = 0.46, M: p = 0.08). Statistical parametric imaging correctly lateralized 16 out of 17 patients. Only the AL region, however, was significant in predicting surgical outcome (F = 29.03, p < 0.00005). Using a cut-off z-score value of -1.5, statistical parametric imaging correctly classified 92% of temporal lobes from group A and 88% of those from group B. These preliminary results indicate that statistical parametric imaging provides both clinically useful information for lateralization in temporal lobe epilepsy and a reliable predictive indicator of clinical outcome following surgical treatment.
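    The transform described here is a per-patient z-scoring of the image; a minimal sketch (the -1.5 cutoff follows the abstract, while the array is a random stand-in for real PET counts):

```python
import numpy as np

def z_score_image(counts):
    """Per-patient statistical parametric image: z-score each pixel using
    the mean and s.d. of that patient's own count distribution."""
    img = np.asarray(counts, dtype=float)
    return (img - img.mean()) / img.std()

# stand-in image; in the study, mean z in the anterolateral temporal region
# below a cutoff of -1.5 indicated the operated (seizure-free) side
z = z_score_image(np.random.default_rng(4).poisson(100, size=(64, 64)))
print(z.mean(), z.std())  # ~0 and ~1 by construction
```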

  3. RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2007-01-01

    Background The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. These t-tests may be used to measure the reliability of the reported statistics. When combined with the reported P-value for a peptide hit under a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253

  4. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as a feature selection method for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation process has statistically proven that the ACO-KNN algorithm is significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a quality, optimal feature subset that can represent the actual data in customer review data.

  5. Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS)

    PubMed Central

    Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M.; Khan, Ajmal

    2016-01-01

    Objective: This article compares the study design and statistical methods used in the 2005, 2010 and 2015 volumes of the Pakistan Journal of Medical Sciences (PJMS). Methods: Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. Results: A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015), of which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A variety of statistical methods were found in the analysis. The most frequent methods included: descriptive statistics (n=315, 73.4%), chi-square/Fisher’s exact tests (n=205, 47.8%) and Student’s t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over the time period: t-test, chi-square/Fisher’s exact test, logistic regression, epidemiological statistics, and non-parametric tests. Conclusion: This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and that their frequency increased from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles, while the cross-sectional study design was the most common study design. PMID:27022365

  6. Statistics in the pharmacy literature.

    PubMed

    Lee, Charlene M; Soin, Herpreet K; Einarson, Thomas R

    2004-09-01

    Research in statistical methods is essential for maintenance of high quality of the published literature. To update previous reports of the types and frequencies of statistical terms and procedures in research studies of selected professional pharmacy journals. We obtained all research articles published in 2001 in 6 journals: American Journal of Health-System Pharmacy, The Annals of Pharmacotherapy, Canadian Journal of Hospital Pharmacy, Formulary, Hospital Pharmacy, and Journal of the American Pharmaceutical Association. Two independent reviewers identified and recorded descriptive and inferential statistical terms/procedures found in the methods, results, and discussion sections of each article. Results were determined by tallying the total number of times, as well as the percentage, that each statistical term or procedure appeared in the articles. One hundred forty-four articles were included. Ninety-eight percent employed descriptive statistics; of these, 28% used only descriptive statistics. The most common descriptive statistical terms were percentage (90%), mean (74%), standard deviation (58%), and range (46%). Sixty-nine percent of the articles used inferential statistics, the most frequent being χ2 (33%), Student's t-test (26%), Pearson's correlation coefficient r (18%), ANOVA (14%), and logistic regression (11%). Statistical terms and procedures were found in nearly all of the research articles published in pharmacy journals. Thus, pharmacy education should aim to provide current and future pharmacists with an understanding of the common statistical terms and procedures identified to facilitate the appropriate appraisal and consequential utilization of the information available in research articles.

  7. Kepler Planet Detection Metrics: Statistical Bootstrap Test

    NASA Technical Reports Server (NTRS)

    Jenkins, Jon M.; Burke, Christopher J.

    2016-01-01

    This document describes the data produced by the Statistical Bootstrap Test over the final three Threshold Crossing Event (TCE) deliveries to NExScI: SOC 9.1 (Q1-Q16) (Tenenbaum et al. 2014), SOC 9.2 (Q1-Q17), aka DR24 (Seader et al. 2015), and SOC 9.3 (Q1-Q17), aka DR25 (Twicken et al. 2016). The last few years have seen significant improvements in the SOC science data processing pipeline, leading to higher quality light curves and more sensitive transit searches. The statistical bootstrap analysis results presented here and the numerical results archived at NASA's Exoplanet Science Institute (NExScI) bear witness to these software improvements. This document attempts to introduce and describe the main features and differences between these three data sets as a consequence of the software changes.

  8. Students' Perceptions of Statistics: An Exploration of Attitudes, Conceptualizations, and Content Knowledge of Statistics

    ERIC Educational Resources Information Center

    Bond, Marjorie E.; Perkins, Susan N.; Ramirez, Caroline

    2012-01-01

    Although statistics education research has focused on students' learning and conceptual understanding of statistics, researchers have only recently begun investigating students' perceptions of statistics. The term perception describes the overlap between cognitive and non-cognitive factors. In this mixed-methods study, undergraduate students…

  9. Significance of specificity of Tinetti B-POMA test and fall risk factor in third age of life.

    PubMed

    Avdić, Dijana; Pecar, Dzemal

    2006-02-01

    In the third age of life, psychophysical abilities gradually decrease, and the ability to adapt to endogenous and exogenous burdens declines. In 1987, Harada et al. (1) found that 9.5 million persons in the USA have difficulties performing daily activities, and that 59% of them (5.6 million) are older than 65 years. The study encompassed 77 subjects of both sexes with an average age of 71.73 ± 5.63 years (range 65-90 years), chosen by random sampling. Each patient was interviewed in his/her own home and was made thoroughly familiar with the methodology and aims of the questionnaire. Women made up 64.94% (50 patients) of the sample and men 35.06% (27 patients). For the risk factor score obtained from the questionnaire and the B-POMA test, there were statistically significant differences between men and women, as well as between patients who had fallen and those who never had. With respect to way of life (living alone or in a community), there were no statistically significant differences. Average B-POMA test results in this study were statistically significantly higher in men and in patients who did not report falling, while there was no statistically significant difference by way of life. In relation to the percentage of the maximum number of positive answers to particular questions, regarding gender, way of life and falling history, there were no statistically significant differences between the value of the B-POMA test and the risk factor score (the questionnaire).

  10. Deconstructing Statistical Analysis

    ERIC Educational Resources Information Center

    Snell, Joel

    2014-01-01

    Using a very complex statistical analysis and research method for the sake of enhancing the prestige of an article, or to make a new product or service appear legitimate, needs to be monitored and questioned for accuracy. 1) The more complicated the statistical analysis and research, the fewer learned readers can understand it. This adds a…

  11. Statistical theory of dynamo

    NASA Astrophysics Data System (ADS)

    Kim, E.; Newton, A. P.

    2012-04-01

    One major problem in dynamo theory is the multi-scale nature of MHD turbulence, which requires a statistical theory in terms of probability distribution functions. In this contribution, we present the statistical theory of magnetic fields in a simplified mean field α-Ω dynamo model by varying the statistical properties of alpha, including marginal stability and intermittency, and then utilize observational data of solar activity to fine-tune the mean field dynamo model. Specifically, we first present a comprehensive investigation into the effect of the stochastic parameters in a simplified α-Ω dynamo model. Through considering the manifold of marginal stability (the region of parameter space where the mean growth rate is zero), we show that stochastic fluctuations are conducive to dynamo action. Furthermore, by considering the cases of fluctuating alpha that are periodic and Gaussian coloured random noise with identical characteristic time-scales and fluctuating amplitudes, we show that the transition to dynamo is significantly facilitated for stochastic alpha with random noise. We also show that probability density functions (PDFs) of the growth rate, magnetic field and magnetic energy can provide a wealth of useful information regarding dynamo behaviour and intermittency. Finally, the precise statistical properties of the dynamo, such as temporal correlation and fluctuating amplitude, are found to depend on the distribution of the fluctuations of the stochastic parameters. We then use observations of solar activity to constrain parameters relating to the α effect in stochastic α-Ω nonlinear dynamo models. This is achieved by performing a comprehensive statistical comparison, computing PDFs of solar activity from observations and from our simulation of the mean field dynamo model. The observational data used are the time history of solar activity inferred from C14 data over the past 11000 years on a long time scale, and direct observations of the sun spot

  12. The choice of statistical methods for comparisons of dosimetric data in radiotherapy.

    PubMed

    Chaikh, Abdulhamid; Giraud, Jean-Yves; Perrin, Emmanuel; Bresciani, Jean-Pierre; Balosso, Jacques

    2014-09-18

    Novel irradiation techniques are continuously introduced in radiotherapy to optimize the accuracy, the safety and the clinical outcome of treatments. These changes may raise the question of discontinuity in dosimetric presentation and the subsequent need for practice adjustments in case of significant modifications. This study proposes a comprehensive approach to compare different techniques and test whether their respective dose calculation algorithms give rise to statistically significant differences in the treatment doses for the patient. Statistical investigation principles are presented in the framework of a clinical example based on 62 fields of radiotherapy for lung cancer. The delivered doses in monitor units were calculated using three different dose calculation methods: the reference method computes the dose without tissue density correction using the Pencil Beam Convolution (PBC) algorithm, whereas the newer methods calculate the dose with tissue density correction, in 1D with the Modified Batho (MB) method and in 3D with the Equivalent Tissue Air Ratio (ETAR) method, respectively. The normality of the data and the homogeneity of variance between groups were tested using the Shapiro-Wilk and Levene tests, respectively; non-parametric statistical tests were then performed. Specifically, the dose means estimated by the different calculation methods were compared using Friedman's test and the Wilcoxon signed-rank test. In addition, the correlation between the doses calculated by the three methods was assessed using Spearman's rank and Kendall's rank tests. Friedman's test showed a significant effect of the calculation method on the delivered dose for lung cancer patients (p < 0.001). The density correction methods yielded lower doses than PBC, by on average -5 (± 4.4 SD) for MB and -4.7 (± 5 SD) for ETAR. Post-hoc Wilcoxon signed-rank tests of paired comparisons indicated that the delivered dose was significantly reduced using density-corrected methods
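    The testing pipeline described (Friedman omnibus test, then paired Wilcoxon post-hoc comparisons) looks like this in outline (scipy assumed; the dose arrays are synthetic placeholders with roughly the reported offsets, not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
pbc = rng.normal(100.0, 10.0, 62)          # reference: no density correction
mb = pbc - rng.normal(5.0, 4.4, 62)        # 1D density correction (MB)
etar = pbc - rng.normal(4.7, 5.0, 62)      # 3D density correction (ETAR)

# omnibus non-parametric comparison of the three paired calculation methods
print(stats.friedmanchisquare(pbc, mb, etar))

# post-hoc paired comparisons against the reference, Bonferroni-adjusted
for name, doses in [("MB", mb), ("ETAR", etar)]:
    p = stats.wilcoxon(pbc, doses).pvalue
    print(f"{name} vs PBC: adjusted p = {min(1.0, 2 * p):.3g}")
```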

  13. Comparing geological and statistical approaches for element selection in sediment tracing research

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon

    2015-04-01

    Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasing importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geological approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminant Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores was from mafic-derived sources and 64% (+/- 9%) from felsic-derived sources. The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings with only
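
    The Kruskal-Wallis screening step lends itself to a short sketch. The code below uses illustrative data and element names rather than the study's measurements: an element is kept for the mixing model only if its concentrations differ significantly between source groups.

      # Illustrative element screening via the Kruskal-Wallis H-test.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      elements = ["Fe2O3", "SiO2", "Al2O3", "CaO"]       # hypothetical candidate tracers
      sources = ["Obi Obi", "Lexys", "Falls", "Bridge"]
      retained = []
      for el in elements:
          # One concentration array per source group (synthetic data)
          groups = [rng.normal(loc, 1.0, 12) for loc in rng.uniform(5, 8, len(sources))]
          h, p = stats.kruskal(*groups)
          if p < 0.05:
              retained.append(el)
      print("elements retained for modelling:", retained)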

  14. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    PubMed Central

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
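
    The core idea can be sketched compactly. The code below is one simple instance of the approach under stated assumptions, not the authors' exact procedure: instead of a naive t-test on subject means, each subject's mean effect is weighted by the inverse of its estimated variance, and the weighted mean is tested against zero.

      # Hedged sketch: inverse-variance weighting of per-subject effects.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)
      n_subjects = 20
      means = rng.normal(0.3, 0.5, n_subjects)            # per-subject mean effects (synthetic)
      var_of_mean = rng.uniform(0.01, 0.5, n_subjects)    # per-subject variance of each mean

      w = 1.0 / var_of_mean                               # precision weights
      weighted_mean = np.sum(w * means) / np.sum(w)
      se = np.sqrt(1.0 / np.sum(w))                       # SE of the weighted mean
      z = weighted_mean / se
      p = 2 * stats.norm.sf(abs(z))
      print(f"weighted z = {z:.2f}, p = {p:.4f}")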

  15. Statistical physics inspired energy-efficient coded-modulation for optical communications.

    PubMed

    Djordjevic, Ivan B; Xu, Lei; Wang, Ting

    2012-04-15

    Because Shannon's entropy can be obtained by Stirling's approximation of thermodynamics entropy, the statistical physics energy minimization methods are directly applicable to the signal constellation design. We demonstrate that statistical physics inspired energy-efficient (EE) signal constellation designs, in combination with large-girth low-density parity-check (LDPC) codes, significantly outperform conventional LDPC-coded polarization-division multiplexed quadrature amplitude modulation schemes. We also describe an EE signal constellation design algorithm. Finally, we propose the discrete-time implementation of D-dimensional transceiver and corresponding EE polarization-division multiplexed system. © 2012 Optical Society of America

  16. Transportation statistics annual report 1995

    DOT National Transportation Integrated Search

    1995-01-01

    The summary of transportation statistics programs and many of the tables and graphs pioneered in last year's Transportation Statistics Annual Report have been incorporated into the companion volume, National Transportation Statistics. The...

  17. Experimental statistics for biological sciences.

    PubMed

    Bang, Heejung; Davidian, Marie

    2010-01-01

    In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression" - which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter readers will understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inferences), and how to interpret the results. This text will be most useful as supplemental material while readers take their own statistics courses, or as a reference text accompanying the manual of any statistical software, serving as a self-teaching guide.

  18. Practical statistics in pain research.

    PubMed

    Kim, Tae Kyun

    2017-10-01

    Pain is subjective, while statistics related to pain research are objective. This review was written to help researchers involved in pain research make statistical decisions. The main issues relate to the levels of the scales that are often used in pain research, the choice between parametric and nonparametric statistical methods, and problems arising from repeated measurements. In the field of pain research, parametric statistics have often been applied erroneously. This is closely related to the scales of the data and to repeated measurements. The levels of scales include nominal, ordinal, interval, and ratio scales, and the level of scale affects the choice between parametric and non-parametric methods. In pain research, the most frequently used pain assessment scale is the ordinal scale, which includes the visual analogue scale (VAS). There has, however, been another view that considers the VAS to be an interval or ratio scale, so that the use of parametric statistics can be accepted in some practical cases. Repeated measurements of the same subjects always complicate the statistics: the measurements are inevitably correlated with each other, which precludes one-way ANOVA, in which independence between the measurements is necessary. Repeated-measures ANOVA (RMANOVA), however, permits comparison between correlated measurements as long as the sphericity assumption is satisfied. In conclusion, parametric statistical methods should be used only when the assumptions of parametric statistics, such as normality and sphericity, are established.
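
    The decision logic sketched in this review (check assumptions, then choose a parametric or non-parametric test) can be expressed in a few lines. The sketch below uses hypothetical paired VAS scores and is only one simple instance of that logic, not the article's procedure.

      # Hedged sketch: assumption check followed by test selection.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      vas_before = rng.integers(4, 10, 30).astype(float)   # hypothetical VAS scores
      vas_after = vas_before - rng.integers(0, 4, 30)

      _, p_norm = stats.shapiro(vas_after - vas_before)    # normality of paired differences
      if p_norm > 0.05:
          print(stats.ttest_rel(vas_before, vas_after))    # parametric, paired
      else:
          print(stats.wilcoxon(vas_before, vas_after))     # non-parametric, paired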

  19. Statistical validation of normal tissue complication probability models.

    PubMed

    Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis

    2012-09-01

    To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
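
    A permutation test of model performance of the kind recommended here can be sketched briefly. In the code below the data are illustrative and, for brevity, the "model" is a fixed score vector rather than a refitted LASSO: the observed AUC is compared with its null distribution under shuffled outcomes.

      # Hedged sketch: permutation test of AUC against a shuffled-label null.
      import numpy as np

      rng = np.random.default_rng(5)
      scores = rng.normal(0, 1, 100)                       # model-predicted NTCP (illustrative)
      outcome = (scores + rng.normal(0, 1, 100)) > 0.5     # binary complication outcomes

      def auc(s, y):
          # Mann-Whitney formulation of the area under the ROC curve
          pos, neg = s[y], s[~y]
          return (pos[:, None] > neg[None, :]).mean()

      observed = auc(scores, outcome)
      null = np.array([auc(scores, rng.permutation(outcome)) for _ in range(2000)])
      p = (null >= observed).mean()
      print(f"AUC = {observed:.3f}, permutation p = {p:.4f}")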

  20. Alternative Derivations of the Statistical Mechanical Distribution Laws

    PubMed Central

    Wall, Frederick T.

    1971-01-01

    A new approach is presented for the derivation of statistical mechanical distribution laws. The derivations are accomplished by minimizing the Helmholtz free energy under constant temperature and volume, instead of maximizing the entropy under constant energy and volume. An alternative method involves stipulating equality of chemical potential, or equality of activity, for particles in different energy levels. This approach leads to a general statement of distribution laws applicable to all systems for which thermodynamic probabilities can be written. The methods also avoid use of the calculus of variations, Lagrangian multipliers, and Stirling's approximation for the factorial. The results are applied specifically to Boltzmann, Fermi-Dirac, and Bose-Einstein statistics. The special significance of chemical potential and activity is discussed for microscopic systems. PMID:16578712
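
    For illustration, a compact version of the minimization idea in the Boltzmann case can be written out. Note that this short form uses Stirling's approximation for brevity, which Wall's derivation deliberately avoids, so it shows the destination rather than the paper's route.

      % Minimize the Helmholtz free energy F = U - TS at fixed T and V,
      % with equality of chemical potential across energy levels.
      \begin{align*}
      F &= \sum_i n_i \epsilon_i - k_B T \ln W, \qquad W = \frac{N!}{\prod_i n_i!},\\
      \frac{\partial F}{\partial n_i} &= \epsilon_i + k_B T \ln\frac{n_i}{N} = \mu
      \quad\Longrightarrow\quad
      n_i = N\,\frac{e^{-\epsilon_i/k_B T}}{\sum_j e^{-\epsilon_j/k_B T}}.
      \end{align*}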

  1. Alternative derivations of the statistical mechanical distribution laws.

    PubMed

    Wall, F T

    1971-08-01

    A new approach is presented for the derivation of statistical mechanical distribution laws. The derivations are accomplished by minimizing the Helmholtz free energy under constant temperature and volume, instead of maximizing the entropy under constant energy and volume. An alternative method involves stipulating equality of chemical potential, or equality of activity, for particles in different energy levels. This approach leads to a general statement of distribution laws applicable to all systems for which thermodynamic probabilities can be written. The methods also avoid use of the calculus of variations, Lagrangian multipliers, and Stirling's approximation for the factorial. The results are applied specifically to Boltzmann, Fermi-Dirac, and Bose-Einstein statistics. The special significance of chemical potential and activity is discussed for microscopic systems.

  2. Statistics at the Chinese Universities.

    DTIC Science & Technology

    1981-09-01

    education in China in the postwar years is provided to give some perspective. My observations on statistics at the Chinese universities are necessarily... has been accepted as a member society of ISI. 3. Education in China. Understanding of statistics in universities in China will be enhanced through some... programming), Statistical Mathematics (inference, data analysis, industrial statistics, information theory), Mathematical Physics (differential

  3. Significance testing of rules in rule-based models of human problem solving

    NASA Technical Reports Server (NTRS)

    Lewis, C. M.; Hammer, J. M.

    1986-01-01

    Rule-based models of human problem solving have typically not been tested for statistical significance. Three methods of testing rules - analysis of variance, randomization, and contingency tables - are presented. Advantages and disadvantages of the methods are also described.

  4. The use and misuse of statistical analyses. [in geophysics and space physics

    NASA Technical Reports Server (NTRS)

    Reiff, P. H.

    1983-01-01

    The statistical techniques most often used in space physics include Fourier analysis, linear correlation, auto- and cross-correlation, power spectral density, and superposed epoch analysis. Tests are presented which can evaluate the significance of the results obtained through each of these. Data presented without some form of error analysis are frequently useless, since they offer no way of assessing whether a bump on a spectrum or on a superposed epoch analysis is real or merely a statistical fluctuation. Among many of the published linear correlations, for instance, the uncertainty in the intercept and slope is not given, so that the significance of the fitted parameters cannot be assessed.
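
    As a minimal illustration of the point about reporting fit uncertainties, the sketch below shows that standard errors of both slope and intercept come directly from an ordinary least-squares fit via scipy.stats.linregress; the intercept_stderr attribute assumes SciPy >= 1.6, and the data are synthetic.

      # Sketch: uncertainties of fitted slope and intercept from linregress.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(6)
      x = np.linspace(0, 10, 50)
      y = 2.0 * x + 1.0 + rng.normal(0, 3, 50)   # synthetic noisy linear relation

      res = stats.linregress(x, y)
      print(f"slope = {res.slope:.2f} +/- {res.stderr:.2f}")
      print(f"intercept = {res.intercept:.2f} +/- {res.intercept_stderr:.2f}")
      print(f"r = {res.rvalue:.2f}, p = {res.pvalue:.3g}")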

  5. Statistical correlations in an ideal gas of particles obeying fractional exclusion statistics.

    PubMed

    Pellegrino, F M D; Angilella, G G N; March, N H; Pucci, R

    2007-12-01

    After a brief discussion of the concepts of fractional exchange and fractional exclusion statistics, we report partly analytical and partly numerical results on thermodynamic properties of assemblies of particles obeying fractional exclusion statistics. The effect of dimensionality is one focal point, the ratio μ/k_B T of chemical potential to thermal energy being obtained numerically as a function of a scaled particle density. Pair correlation functions are also presented as a function of the statistical parameter, with Friedel oscillations developing close to the fermion limit, for sufficiently large density.

  6. [Again review of research design and statistical methods of Chinese Journal of Cardiology].

    PubMed

    Kong, Qun-yu; Yu, Jin-ming; Jia, Gong-xian; Lin, Fan-li

    2012-11-01

    To re-evaluate and compare the research design and the use of statistical methods in the Chinese Journal of Cardiology. We summarized the research designs and statistical methods in all of the original papers published in the Chinese Journal of Cardiology throughout 2011 and compared the results with the evaluation of 2008. (1) There was no difference between the two volumes in the distribution of research designs. Compared with the earlier volume, the use of survival regression and non-parametric tests increased, while the proportion of articles with no statistical analysis decreased. (2) The proportions of problematic articles in the later volume were significantly lower than in the former: 6 (4%) with flaws in design, 5 (3%) with flaws in presentation, and 9 (5%) with incomplete analyses. (3) The rate of correct use of analysis of variance increased, as did that of multi-group comparisons and tests of normality. The error rate of usage decreased (from 25% to 17%), though without statistical significance, owing to neglect of the test of homogeneity of variance. The Chinese Journal of Cardiology showed many improvements, such as better regulation of design and statistics; homogeneity of variance should receive more attention in future applications.

  7. Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beggs, W.J.

    1981-02-01

    This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.

  8. Transportation statistics annual report 2004

    DOT National Transportation Integrated Search

    2004-09-01

    In this edition of the Transportation Statistics Annual Report, the Bureau of Transportation Statistics (BTS) focuses on transportation indicators related to 15 specific topics (chapter 2) and on the state of transportation statistics (chapter ...

  9. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N eff  = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  10. Selected Streamflow Statistics and Regression Equations for Predicting Statistics at Stream Locations in Monroe County, Pennsylvania

    USGS Publications Warehouse

    Thompson, Ronald E.; Hoffman, Scott A.

    2006-01-01

    A suite of 28 streamflow statistics, ranging from extreme low to high flows, was computed for 17 continuous-record streamflow-gaging stations and predicted for 20 partial-record stations in Monroe County and contiguous counties in north-eastern Pennsylvania. The predicted statistics for the partial-record stations were based on regression analyses relating intermittent flow measurements made at the partial-record stations indexed to concurrent daily mean flows at continuous-record stations during base-flow conditions. The same statistics also were predicted for 134 ungaged stream locations in Monroe County on the basis of regression analyses relating the statistics to GIS-determined basin characteristics for the continuous-record station drainage areas. The prediction methodology for developing the regression equations used to estimate statistics was developed for estimating low-flow frequencies. This study and a companion study found that the methodology also has application potential for predicting intermediate- and high-flow statistics. The statistics included mean monthly flows, mean annual flow, 7-day low flows for three recurrence intervals, nine flow durations, mean annual base flow, and annual mean base flows for two recurrence intervals. Low standard errors of prediction and high coefficients of determination (R2) indicated good results in using the regression equations to predict the statistics. Regression equations for the larger flow statistics tended to have lower standard errors of prediction and higher coefficients of determination (R2) than equations for the smaller flow statistics. The report discusses the methodologies used in determining the statistics and the limitations of the statistics and the equations used to predict the statistics. Caution is indicated in using the predicted statistics for small drainage area situations. Study results constitute input needed by water-resource managers in Monroe County for planning purposes and evaluation

  11. General Aviation Avionics Statistics : 1975

    DOT National Transportation Integrated Search

    1978-06-01

    This report presents avionics statistics for the 1975 general aviation (GA) aircraft fleet and updates a previous publication, General Aviation Avionics Statistics: 1974. The statistics are presented in a capability group framework which enables one ...

  12. OSPAR standard method and software for statistical analysis of beach litter data.

    PubMed

    Schulz, Marcus; van Loon, Willem; Fleet, David M; Baggelaar, Paul; van der Meulen, Eit

    2017-09-15

    The aim of this study is to develop standard statistical methods and software for the analysis of beach litter data. The optimal ensemble of statistical methods comprises the Mann-Kendall trend test, the Theil-Sen slope estimation, the Wilcoxon step trend test and basic descriptive statistics. The application of Litter Analyst, a tailor-made software for analysing the results of beach litter surveys, to OSPAR beach litter data from seven beaches bordering on the south-eastern North Sea, revealed 23 significant trends in the abundances of beach litter types for the period 2009-2014. Litter Analyst revealed a large variation in the abundance of litter types between beaches. To reduce the effects of spatial variation, trend analysis of beach litter data can most effectively be performed at the beach or national level. Spatial aggregation of beach litter data within a region is possible, but resulted in a considerable reduction in the number of significant trends. Copyright © 2017 Elsevier Ltd. All rights reserved.
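
    The two core trend tests can be sketched directly in SciPy: the Mann-Kendall test is equivalent to Kendall's tau between abundance and time, and the Theil-Sen slope is available as scipy.stats.theilslopes. The survey data below are illustrative, not OSPAR data.

      # Hedged sketch: Mann-Kendall trend test and Theil-Sen slope.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      years = np.arange(2009, 2015)
      counts = 120 - 8 * (years - 2009) + rng.normal(0, 5, len(years))  # synthetic litter counts

      tau, p = stats.kendalltau(years, counts)                # Mann-Kendall trend test
      slope, intercept, lo, hi = stats.theilslopes(counts, years)
      print(f"tau = {tau:.2f}, p = {p:.3f}; "
            f"Theil-Sen slope = {slope:.1f} items/yr (CI {lo:.1f} to {hi:.1f})")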

  13. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
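
    The descriptive step can be sketched in a few lines: the Fisher mean direction of a set of unit vectors is the normalised vector resultant, and the resultant length measures concentration. The vectors below are synthetic stand-ins for per-ROI principal diffusion directions, and the kappa estimate uses the standard large-concentration approximation.

      # Hedged sketch: Fisher mean direction and concentration of unit vectors.
      import numpy as np

      rng = np.random.default_rng(8)
      v = rng.normal([0, 0, 1], 0.2, (30, 3))          # synthetic directions near +z
      v /= np.linalg.norm(v, axis=1, keepdims=True)    # normalise to unit vectors

      resultant = v.sum(axis=0)
      Rtot = np.linalg.norm(resultant)                 # resultant length
      mean_dir = resultant / Rtot                      # Fisher mean direction
      kappa = (len(v) - 1) / (len(v) - Rtot)           # large-kappa approximation
      print("mean direction:", np.round(mean_dir, 3), " kappa ~", round(kappa, 1))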

  14. Insights into Corona Formation through Statistical Analyses

    NASA Technical Reports Server (NTRS)

    Glaze, L. S.; Stofan, E. R.; Smrekar, S. E.; Baloga, S. M.

    2002-01-01

    Statistical analysis of an expanded database of coronae on Venus indicates that the populations of Type 1 (with fracture annuli) and 2 (without fracture annuli) corona diameters are statistically indistinguishable, and therefore we have no basis for assuming different formation mechanisms. Analysis of the topography and diameters of coronae shows that coronae that are depressions, rimmed depressions, and domes tend to be significantly smaller than those that are plateaus, rimmed plateaus, or domes with surrounding rims. This is consistent with the model of Smrekar and Stofan and inconsistent with predictions of the spreading drop model of Koch and Manga. The diameter range for domes, the initial stage of corona formation, provides a broad constraint on the buoyancy of corona-forming plumes. Coronae are only slightly more likely to be topographically raised than depressions, with Type 1 coronae most frequently occurring as rimmed depressions and Type 2 coronae most frequently occurring with flat interiors and raised rims. Most Type 1 coronae are located along chasmata systems or fracture belts, while Type 2 coronae are found predominantly as isolated features in the plains. Coronae at hotspot rises tend to be significantly larger than coronae in other settings, consistent with a hotter upper mantle at hotspot rises and their active state.

  15. Statistical tests to compare motif count exceptionalities

    PubMed Central

    Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

    2007-01-01

    Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
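
    The exact binomial test rests on a standard conditioning argument: if the counts in the two sequences are Poisson, then conditional on their total, the count in sequence 1 is binomial with success probability given by the ratio of expected counts. The sketch below uses illustrative counts and expectations, not the paper's E. coli data.

      # Hedged sketch: exact binomial comparison of motif exceptionality.
      from scipy.stats import binomtest

      n1, n2 = 47, 21          # observed motif counts in the two sequences (illustrative)
      e1, e2 = 30.0, 25.0      # expected counts under the sequence models (illustrative)
      p0 = e1 / (e1 + e2)      # null success probability for sequence 1

      res = binomtest(n1, n1 + n2, p0, alternative="two-sided")
      print(f"p-value = {res.pvalue:.4f}")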

  16. 20 CFR 634.4 - Statistical standards.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    20 Employees' Benefits 3 (2011-04-01). § 634.4 Statistical standards. Recipients shall agree to provide required data following the statistical standards prescribed by the Bureau of Labor Statistics for cooperative statistical programs.

  17. 20 CFR 634.4 - Statistical standards.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    20 Employees' Benefits 3 (2010-04-01). § 634.4 Statistical standards. Recipients shall agree to provide required data following the statistical standards prescribed by the Bureau of Labor Statistics for cooperative statistical programs.

  18. Assessing Attitudes towards Statistics among Medical Students: Psychometric Properties of the Serbian Version of the Survey of Attitudes Towards Statistics (SATS)

    PubMed Central

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Background Medical statistics has become important and relevant for future doctors, enabling them to practice evidence based medicine. Recent studies report that students’ attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. Methods The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Results Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051–0.078) was below the suggested value of ≤0.08. Cronbach’s alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Conclusion The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students’ attitudes towards statistics in the Serbian educational context.

  19. Applications of spatial statistical network models to stream data

    Treesearch

    Daniel J. Isaak; Erin E. Peterson; Jay M. Ver Hoef; Seth J. Wenger; Jeffrey A. Falke; Christian E. Torgersen; Colin Sowder; E. Ashley Steel; Marie-Josee Fortin; Chris E. Jordan; Aaron S. Ruesch; Nicholas Som; Pascal Monestiez

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...

  20. Thou Shalt Not Bear False Witness against Null Hypothesis Significance Testing

    ERIC Educational Resources Information Center

    García-Pérez, Miguel A.

    2017-01-01

    Null hypothesis significance testing (NHST) has been the subject of debate for decades and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem and application of statistical methods cannot reveal whether effects…

  1. Statistical analysis and digital processing of the Mössbauer spectra

    NASA Astrophysics Data System (ADS)

    Prochazka, Roman; Tucek, Pavel; Tucek, Jiri; Marek, Jaroslav; Mashlan, Miroslav; Pechousek, Jiri

    2010-02-01

    This work is focused on the use of statistical methods and the development of filtration procedures for signal processing in Mössbauer spectroscopy. Statistical tools for noise filtering in measured spectra are used in many scientific areas; here, the use of a purely statistical approach to the filtration of accumulated Mössbauer spectra is described. In Mössbauer spectroscopy, the noise can be considered a Poisson statistical process, with a Gaussian distribution for high numbers of observations. This noise is a superposition of the counting of non-resonant photons, electronic noise (from the γ-ray detection and discrimination units), and the quality of the velocity system, which can be characterized by its velocity nonlinearities. A noise-reducing process using a newly designed statistical filter procedure is described. This mathematical procedure improves the signal-to-noise ratio and thus makes it easier to determine the hyperfine parameters of the given Mössbauer spectra. The filter procedure is based on a periodogram method that makes it possible to identify the statistically important components in the spectral domain. The significance level for these components is then feedback-controlled using the results of a correlation coefficient test. The theoretical correlation coefficient level corresponding to the spectrum resolution is estimated. The correlation coefficient test is based on a comparison of the theoretical and experimental correlation coefficients given by the Spearman method. The correctness of this solution was analyzed by a series of statistical tests and confirmed on many spectra measured with increasing statistical quality for a given sample (absorber). The effect of this filter procedure depends on the signal-to-noise ratio, and the applicability of the method is subject to binding conditions.
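
    The general shape of spectral-domain filtering can be sketched briefly. The code below is a rough placeholder under stated assumptions, not the authors' procedure: it uses a crude amplitude threshold in place of the correlation-coefficient feedback described above, and an idealised Gaussian absorption line in place of a measured spectrum.

      # Rough sketch: keep only "significant" spectral components, then invert.
      import numpy as np

      rng = np.random.default_rng(9)
      n = 1024
      x = np.linspace(-10, 10, n)
      signal = -np.exp(-0.5 * (x / 0.8) ** 2)            # idealised absorption line
      noisy = signal + rng.normal(0, 0.05, n)            # Poisson noise ~ Gaussian at high counts

      spec = np.fft.rfft(noisy)
      threshold = 4 * np.median(np.abs(spec))            # crude significance cut (assumption)
      spec[np.abs(spec) < threshold] = 0
      filtered = np.fft.irfft(spec, n)
      print("residual noise std before/after:",
            np.std(noisy - signal), np.std(filtered - signal))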

  2. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
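
    The Gumbel idea can be sketched in a few lines: fit a Gumbel distribution to Monte Carlo replicates of a maximum test statistic and read the p-value of the observed maximum from its tail, which needs far fewer replicates than rank-based p-values. The null maxima below are synthetic stand-ins for scan-statistic maxima.

      # Hedged sketch: Gumbel approximation to the null of a maximum statistic.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(10)
      null_maxima = rng.normal(0, 1, (999, 50)).max(axis=1)  # stand-in for scan maxima

      loc, scale = stats.gumbel_r.fit(null_maxima)
      observed = 3.8                                          # illustrative observed maximum
      p = stats.gumbel_r.sf(observed, loc, scale)
      print(f"Gumbel-approximated p = {p:.4f}")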

  3. Statistical representation of multiphase flow

    NASA Astrophysics Data System (ADS)

    Subramaniam

    2000-11-01

    The relationship between two common statistical representations of multiphase flow, namely, the single-point Eulerian statistical representation of two-phase flow (D. A. Drew, Ann. Rev. Fluid Mech. (15), 1983), and the Lagrangian statistical representation of a spray using the droplet distribution function (F. A. Williams, Phys. Fluids 1 (6), 1958) is established for spherical dispersed-phase elements. This relationship is based on recent work which relates the droplet distribution function to single-droplet pdfs starting from a Liouville description of a spray (Subramaniam, Phys. Fluids 10 (12), 2000). The Eulerian representation, which is based on a random-field model of the flow, is shown to contain different statistical information from the Lagrangian representation, which is based on a point-process model. The two descriptions are shown to be simply related for spherical, monodisperse elements in statistically homogeneous two-phase flow, whereas such a simple relationship is precluded by the inclusion of polydispersity and statistical inhomogeneity. The common origin of these two representations is traced to a more fundamental statistical representation of a multiphase flow, whose concepts derive from a theory for dense sprays recently proposed by Edwards (Atomization and Sprays 10 (3--5), 2000). The issue of what constitutes a minimally complete statistical representation of a multiphase flow is resolved.

  4. Organ donation and transplantation statistics in Belgium for 2012 and 2013.

    PubMed

    Desschans, B; Evrard, P

    2014-11-01

    The 2012 and 2013 solid organ transplantation statistics were presented during the annual meeting of the Belgian Transplant Society. All data presented were collected from the Eurotransplant International Foundation and/or from the individual Belgian transplant centers. The highest number of deceased donors yet (1310) was detected, of which 47.8% became effective organ donors, corresponding to 29 per million inhabitants (pmi) in 2012 and 27.4 pmi in 2013. Out of 626 effective deceased organ donors, 491 (79%) were donors after brain death (DBD) and 135 (21%) donors after circulatory death (DCD). The majority (125/135; 93%) of DCD donors were DCD Maastricht category III donors, and there were 7 (5%) donations following euthanasia. Family refusal tended to be lower for DCD (10.4%) than for DBD donors (13.4%). Despite the increasing DCD donation rate, DBD donation remains stable in Belgium. The donor age is still increasing, reaching a median of 53 years (range 0-90). Spontaneous intracranial bleeding (39.3%) and cranio-cerebral trauma (25%) remained the most frequent causes of death. The number of living related kidney transplantations (57 in 2012 and 63 in 2013) followed the international trend, albeit at a level that is still very limited in Belgium. Nevertheless, this activity could explain why the number of patients waiting for kidney transplantation (770) reached an absolute minimum in 2013. Except for the reduced waiting list for lung transplantation (from 119 patients in 2011 to 85 in 2013), the waiting lists remained stable for the other organs, but almost 200 patients still died while on the waiting list. Belgium thus demonstrated the highest number of effective organ donors, corresponding to 29 pmi in 2012 and 27.4 pmi in 2013. Thus far, and in contrast with other countries, there is no erosion of DBD in the DCD donor organ pool, but it is the important responsibility of all transplant centers and donor hospitals to

  5. National transportation statistics 2011

    DOT National Transportation Integrated Search

    2011-04-01

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics presents information on the U.S. transportation system, including its physical components, safety reco...

  6. National transportation statistics 2005

    DOT National Transportation Integrated Search

    2005-12-01

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics 2004 presents information on the U.S. transportation system, including its physical components, sa...

  7. National transportation statistics 2006

    DOT National Transportation Integrated Search

    2006-12-01

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics 2006 presents information on the U.S. transportation system, including its physical components, sa...

  8. National Transportation Statistics 2009

    DOT National Transportation Integrated Search

    2010-01-21

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics presents information on the U.S. transportation system, including its physical components, safety record, ...

  9. National transportation statistics 2004

    DOT National Transportation Integrated Search

    2005-01-01

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics 2004 presents information on the U.S. transportation system, including its physical components, sa...

  10. National Transportation Statistics 2007

    DOT National Transportation Integrated Search

    2007-04-12

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics presents information on the U.S. transportation system, including its physical components, safety record...

  11. National Transportation Statistics 2008

    DOT National Transportation Integrated Search

    2009-01-08

    Compiled and published by the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS), National Transportation Statistics presents information on the U.S. transportation system, including its physical components, safety record...

  12. Confidence intervals for effect sizes: compliance and clinical significance in the Journal of Consulting and clinical Psychology.

    PubMed

    Odgaard, Eric C; Fowler, Robert L

    2010-06-01

    In 2005, the Journal of Consulting and Clinical Psychology (JCCP) became the first American Psychological Association (APA) journal to require statistical measures of clinical significance, plus effect sizes (ESs) and associated confidence intervals (CIs), for primary outcomes (La Greca, 2005). As this represents the single largest editorial effort to improve statistical reporting practices in any APA journal in at least a decade, in this article we investigate the efficacy of that change. All intervention studies published in JCCP in 2003, 2004, 2007, and 2008 were reviewed. Each article was coded for method of clinical significance, type of ES, and type of associated CI, broken down by statistical test (F, t, chi-square, r/R2, and multivariate modeling). By 2008, clinical significance compliance was 75% (up from 31%), with 94% of studies reporting some measure of ES (reporting improved for individual statistical tests, ranging from eta-squared = .05 to .17, with reasonable CIs). Reporting of CIs for ESs also improved, although only to 40%. Also, the vast majority of reported CIs used approximations, which become progressively less accurate for smaller sample sizes and larger ESs (cf. Algina & Keselman, 2003). Changes are near asymptote for ESs and clinical significance, but CIs lag behind. As CIs for ESs are required for primary outcomes, we show how to compute CIs for the vast majority of ESs reported in JCCP, with an example of how to use CIs for ESs as a method to assess clinical significance.
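
    One common approximation the article alludes to can be sketched directly: a large-sample confidence interval for Cohen's d between two independent groups, using the asymptotic variance of d. This is exactly the kind of approximation noted above as losing accuracy for small samples and large effects; the data below are synthetic.

      # Hedged sketch: approximate CI for Cohen's d (large-sample formula).
      import numpy as np
      from scipy import stats

      def cohens_d_ci(x, y, alpha=0.05):
          nx, ny = len(x), len(y)
          sp = np.sqrt(((nx - 1) * np.var(x, ddof=1)
                        + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
          d = (np.mean(x) - np.mean(y)) / sp
          se = np.sqrt((nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny)))  # asymptotic SE of d
          z = stats.norm.ppf(1 - alpha / 2)
          return d, d - z * se, d + z * se

      rng = np.random.default_rng(11)
      print(cohens_d_ci(rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40)))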

  13. General Aviation Avionics Statistics : 1976

    DOT National Transportation Integrated Search

    1979-11-01

    This report presents avionics statistics for the 1976 general aviation (GA) aircraft fleet and is the third in a series titled "General Aviation Avionics Statistics." The statistics are presented in a capability group framework which enables one to r...

  14. 2017 Annual Disability Statistics Supplement

    ERIC Educational Resources Information Center

    Lauer, E. A; Houtenville, A. J.

    2018-01-01

    The "Annual Disability Statistics Supplement" is a companion report to the "Annual Disability Statistics Compendium." The "Supplement" presents statistics on the same topics as the "Compendium," with additional categorizations by demographic characteristics including age, gender and race/ethnicity. In…

  15. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters' shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power both in rejecting the null hypothesis and in accurately detecting the coexisting clusters. In a real study of hand-foot-and-mouth disease data in Pingdu City, a true cluster town is successfully detected by our proposed method; it could not be evaluated as statistically significant by the standard method because of another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. The (mis)reporting of statistical results in psychology journals.

    PubMed

    Bakker, Marjan; Wicherts, Jelte M

    2011-09-01

    In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
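
    The consistency check at the heart of such studies is simple to sketch: recompute the p-value from the reported test statistic and degrees of freedom and compare it with the reported p-value. The reported values and rounding tolerance below are illustrative assumptions.

      # Sketch: recompute p from a reported t statistic and compare.
      from scipy import stats

      reported = {"t": 2.10, "df": 28, "p": 0.04}           # illustrative reported result
      recomputed = 2 * stats.t.sf(abs(reported["t"]), reported["df"])
      consistent = abs(recomputed - reported["p"]) < 0.005  # rounding tolerance (assumption)
      print(f"recomputed p = {recomputed:.4f}; consistent: {consistent}")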

  17. Self-assessed performance improves statistical fusion of image labels

    PubMed Central

    Bryan, Frederick W.; Xu, Zhoubing; Asman, Andrew J.; Allen, Wade M.; Reich, Daniel S.; Landman, Bennett A.

    2014-01-01

    Statistical fusion resulted in statistically indistinguishable performance from self-assessed weighted voting. The authors developed a new theoretical basis for using self-assessed performance in the framework of statistical fusion and demonstrated that the combined sources of information (both statistical assessment and self-assessment) yielded statistically significant improvement over the methods considered separately. Conclusions: The authors present the first systematic characterization of self-assessed performance in manual labeling. The authors demonstrate that self-assessment and statistical fusion yield similar, but complementary, benefits for label fusion. Finally, the authors present a new theoretical basis for combining self-assessments with statistical label fusion. PMID:24593721

  18. Prognostic significance of MCM 2 and Ki-67 in neuroblastic tumors in children.

    PubMed

    Lewandowska, Magdalena; Taran, Katarzyna; Sitkiewicz, Anna; Andrzejewska, Ewa

    2015-12-02

    Neuroblastic tumors can be characterized by three features: spontaneous regression, maturation and aggressive proliferation. The most common and routinely used method of assessing tumor cell proliferation is to determine the Ki-67 index in the tumor tissue. Despite numerous studies, neuroblastoma biology is not fully understood, which makes treatment results unsatisfactory. MCM 2 is a potential prognostic factor in the neuroblastoma group. The study is based on a retrospective analysis of 35 patients treated for neuroblastic tumors in the Department of Pediatric Surgery and Oncology of the Medical University of Lodz during the period 2001-2011. The material comprised tissues of 16 tumors excised during operation and 19 biopsy specimens. Immunohistochemical examinations were performed with immunoperoxidase using mouse monoclonal anti-MCM 2 and anti-Ki-67 antibodies. We observed that MCM 2 expression ranged from 2% to 98% and the Ki-67 index ranged from 0 to 95%. There was a statistically significant correlation between expression of MCM 2 and the value of the Ki-67 index, and a correlation close to statistical significance between expression of MCM 2 and unfavorable histopathology. There was no statistical relationship between expression of MCM 2 and age over 1 year or N-myc amplification. The presented research shows that MCM 2 may have prognostic significance in neuroblastic pediatric tumors and, as a potential prognostic factor, could be the starting point of new individualized therapy.

  19. National transportation statistics 2000

    DOT National Transportation Integrated Search

    2000-04-13

    Compiled and published by the Bureau of Transportation Statistics (BTS), U.S. Department of Transportation, National Transportation Statistics 2000 presents information on the U.S. transportation system, including its physical components, safety reco...

  20. National transportation statistics 2003

    DOT National Transportation Integrated Search

    2004-03-01

    Compiled and published by the Bureau of Transportation Statistics (BTS), U.S. Department of Transportation, National Transportation Statistics 2002 presents information on the U.S. transportation system, including its physical components, safe...