Sample records for additional statistical analyses

  1. Use of Statistical Analyses in the Ophthalmic Literature

    PubMed Central

    Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.

    2014-01-01

    Purpose: To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design: Cross-sectional study. Methods: All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures: Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally, we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results: Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. To understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in the retina and glaucoma subspecialties tended to use more complex analyses than those in cornea. Conclusions: Readers of clinical journals in ophthalmology need substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical
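    The "accumulated percentage of interpretable articles" measure can be sketched as below. This is a minimal illustration, not the authors' procedure: it assumes an article counts as interpretable only when every statistical category it uses is already in the reader's repertoire, and that categories are learned in descending order of how many articles use them. The article-category lists are hypothetical toy data, not the 780 reviewed articles.

```python
from collections import Counter


def cumulative_comprehension(articles):
    """Return (category, cumulative % interpretable) as categories are learned.

    Categories are added greedily in descending order of usage frequency
    (an assumption for illustration). An article is interpretable once all
    of its categories are known; articles with no statistics count from the start.
    """
    usage = Counter(cat for cats in articles for cat in set(cats))
    order = [cat for cat, _ in usage.most_common()]
    total = len(articles)
    known = set()
    baseline = 100 * sum(1 for cats in articles if not cats) / total
    curve = [("(none)", baseline)]
    for cat in order:
        known.add(cat)
        readable = sum(1 for cats in articles if set(cats) <= known)
        curve.append((cat, 100 * readable / total))
    return curve


# Hypothetical toy data: each article lists the categories of analysis it uses.
articles = [
    [],
    ["descriptive"],
    ["descriptive", "t-test"],
    ["descriptive", "t-test", "regression"],
    ["survival"],
]
for cat, pct in cumulative_comprehension(articles):
    print(f"{cat:12s} {pct:5.1f}%")
```

    The greedy, most-used-first ordering is one plausible way to model a reader "sequentially adding" techniques; other orderings would give different curves.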

  2. Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

    PubMed

    Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A

    2009-10-01

    Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.

  3. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, as well as general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most

  4. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, as well as general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most

  5. Statistical Selection of Biological Models for Genome-Wide Association Analyses.

    PubMed

    Bi, Wenjian; Kang, Guolian; Pounds, Stanley B

    2018-05-24

    Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc.) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
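    The five biological models differ in how a genotype is coded as a predictor. A minimal sketch, assuming the genotype is given as the count (0, 1, or 2) of a designated effect allele and that "co-dominant" means the general two-indicator genotypic coding; it shows only the standard codings, not the authors' model-selection procedure or their R package.

```python
def encode_genotype(g, model):
    """Code a genotype (count of the effect allele: 0, 1, or 2) under a model.

    Returns a tuple of predictor values; the null model has no predictors.
    """
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if model == "null":
        return ()
    if model == "additive":
        return (g,)                        # per-allele trend: 0, 1, 2
    if model == "dominant":
        return (int(g >= 1),)              # one copy suffices: 0, 1, 1
    if model == "recessive":
        return (int(g == 2),)              # two copies needed: 0, 0, 1
    if model == "co-dominant":
        return (int(g == 1), int(g == 2))  # separate effect for each genotype
    raise ValueError(f"unknown model: {model}")


for model in ("null", "additive", "dominant", "recessive", "co-dominant"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```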

  6. [Clinical research = design × measurements × statistical analyses].

    PubMed

    Furukawa, Toshiaki

    2012-06-01

    A clinical study must address true endpoints that matter for the patients and the doctors. A good clinical study starts with a good clinical question. Formulating a clinical question in the form of PECO (patient, exposure, comparison, outcome) can sharpen one's original question. To perform a good clinical study, one must have knowledge of study design, measurements and statistical analyses: the first is taught by epidemiology, the second by psychometrics and the third by biostatistics.

  7. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multispecies coalescent, is considered to be a dominant cause of gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically
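    The effect of weighting re-calculated gene trees by bin size can be illustrated with a toy example. This is a sketch under simplifying assumptions: each bin yields one re-estimated gene tree, topologies are hypothetical strings, and the downstream summary method is replaced by simple weighted topology frequencies. It is not MP-EST or the authors' pipeline.

```python
from collections import Counter


def topology_frequencies(bins, weighted):
    """Relative frequency of each gene-tree topology across bins.

    bins: list of (topology, number_of_genes_in_bin).
    weighted=False counts each bin's tree once (unweighted binning);
    weighted=True counts it once per gene in the bin (weighted binning).
    """
    counts = Counter()
    for topology, size in bins:
        counts[topology] += size if weighted else 1
    total = sum(counts.values())
    return {topo: n / total for topo, n in counts.items()}


# Hypothetical bins: one large bin supporting ((A,B),C) and two singleton bins.
bins = [("((A,B),C)", 8), ("((A,C),B)", 1), ("((B,C),A)", 1)]
print(topology_frequencies(bins, weighted=False))
print(topology_frequencies(bins, weighted=True))
```

    Without weighting, the eight-gene bin counts as a single tree, so its signal is diluted to one third; weighting restores each original gene's contribution, which matches the intuition behind weighting by bin size described above.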

  8. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth

    ERIC Educational Resources Information Center

    Steyvers, Mark; Tenenbaum, Joshua B.

    2005-01-01

    We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of…
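    Two of the small-world statistics mentioned, local clustering and average shortest path length, can be computed on an undirected graph as below. A minimal pure-Python sketch; the toy graph is hypothetical and unrelated to the word-association, WordNet, or Roget's Thesaurus data.

```python
from collections import deque
from itertools import combinations


def clustering(graph, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for n, d in dist.items() if n != source)
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical word graph (adjacency sets, undirected).
graph = {
    "cat": {"dog", "pet"},
    "dog": {"cat", "pet", "bone"},
    "pet": {"cat", "dog"},
    "bone": {"dog"},
}
print({w: round(clustering(graph, w), 3) for w in graph})
print(round(average_path_length(graph), 3))
```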

  9. Study Designs and Statistical Analyses for Biomarker Research

    PubMed Central

    Gosho, Masahiko; Nagashima, Kengo; Sato, Yasunori

    2012-01-01

    Biomarkers are becoming increasingly important for streamlining drug discovery and development. In addition, biomarkers are widely expected to be used as a tool for disease diagnosis, personalized medication, and surrogate endpoints in clinical research. In this paper, we highlight several important aspects related to study design and statistical analysis for clinical research incorporating biomarkers. We describe the typical and current study designs for exploring, detecting, and utilizing biomarkers. Furthermore, we introduce statistical issues such as confounding and multiplicity for statistical tests in biomarker research. PMID:23012528

  10. Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Udey, Ruth Norma

    Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.

  11. Statistical analyses of the relative risk.

    PubMed Central

    Gart, J J

    1979-01-01

    Let P1 be the probability of a disease in one population and P2 be the probability of a disease in a second population. The ratio of these quantities, R = P1/P2, is termed the relative risk. We consider first the analyses of the relative risk from retrospective studies. The relation between the relative risk and the odds ratio (or cross-product ratio) is developed. The odds ratio can be considered a parameter of an exponential model possessing sufficient statistics. This permits the development of exact significance tests and confidence intervals in the conditional space. Unconditional tests and intervals are also considered briefly. The consequences of misclassification errors and ignoring matching or stratifying are also considered. The various methods are extended to combination of results over the strata. Examples of case-control studies testing the association between HL-A frequencies and cancer illustrate the techniques. The parallel analyses of prospective studies are given. If P1 and P2 are small and sample sizes are large, the appropriate model is a Poisson distribution. This yields an exponential model with sufficient statistics. Exact conditional tests and confidence intervals can then be developed. Here we consider the case where two populations are compared adjusting for sex differences as well as for the strata (or covariate) differences such as age. The methods are applied to two examples: (1) testing in the two sexes the ratio of relative risks of skin cancer in people living in different latitudes, and (2) testing over time the ratio of the relative risks of cancer in two cities, one of which fluoridated its drinking water and one that did not. PMID:540589
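    For a 2×2 table, the relative risk R = P1/P2 and the odds ratio can be estimated as follows. This sketch uses the common large-sample, log-scale (Wald-type) confidence intervals, which is a swapped-in simplification: it does not implement the exact conditional tests and intervals developed in the paper. The cell labels a, b, c, d and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk_and_odds_ratio(a, b, c, d, level=0.95):
    """Estimate RR and OR with approximate log-scale confidence intervals.

    Assumed table layout:          disease   no disease
        population 1                  a          b
        population 2                  c          d
    All cells must be positive for the log-scale intervals.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p2 = a / (a + b), c / (c + d)
    rr = p1 / p2
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    odds_ratio = (a * d) / (b * c)          # cross-product ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    or_ci = (odds_ratio * math.exp(-z * se_log_or),
             odds_ratio * math.exp(z * se_log_or))
    return rr, rr_ci, odds_ratio, or_ci


rr, rr_ci, odds_ratio, or_ci = relative_risk_and_odds_ratio(10, 90, 5, 95)
print(f"RR = {rr:.2f}, 95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f}")
print(f"OR = {odds_ratio:.2f}, 95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f}")
```

    When the disease is rare in both populations, the odds ratio approximates the relative risk (here 2.11 versus 2.00), which is why the odds ratio from a retrospective study can stand in for R.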

  12. Statistical Analyses of Raw Material Data for MTM45-1/CF7442A-36% RW: CMH Cure Cycle

    NASA Technical Reports Server (NTRS)

    Coroneos, Rula; Pai, Shantaram S.; Murthy, Pappu

    2013-01-01

    This report describes statistical characterization of physical properties of the composite material system MTM45-1/CF7442A, which has been tested and is currently being considered for use on spacecraft structures. This composite system is made of 6K plain weave graphite fibers in a highly toughened resin system. This report summarizes the distribution types and statistical details of the tests and the conditions for the experimental data generated. These distributions will be used in multivariate regression analyses to help determine material and design allowables for similar material systems and to establish a procedure for other material systems. Additionally, these distributions will be used in future probabilistic analyses of spacecraft structures. The properties characterized, using a commercially available statistical package, are the ultimate strength, modulus, and Poisson's ratio. Results are displayed using graphical and semigraphical methods and are included in the accompanying appendixes.

  13. Scripts for TRUMP data analyses. Part II (HLA-related data): statistical analyses specific for hematopoietic stem cell transplantation.

    PubMed

    Kanda, Junya

    2016-01-01

    The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and have changed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors was collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific to hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.

  14. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper reviews two motivations for conducting "what if" analyses using Excel and "R" to understand statistical significance tests in the context of sample size. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
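    A "what if" analysis of this kind holds the observed effect size fixed and asks how the p-value would change at different sample sizes. A minimal sketch, assuming a one-sample z-test with a standardized effect size d (so z = d·sqrt(n)); the paper works in Excel or R, and this Python version is only an illustration with a hypothetical effect size.

```python
import math
from statistics import NormalDist


def two_sided_p(d, n):
    """Two-sided p-value of a one-sample z-test with standardized effect d and size n."""
    z = d * math.sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


effect = 0.2  # hypothetical effect size, held constant across scenarios
for n in (25, 50, 100, 200, 400):
    p = two_sided_p(effect, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:4d}: p = {p:.4f} ({verdict} at 0.05)")
```

    The same effect moves from "not significant" to "significant" purely because n grows, which is the point such analyses are meant to make about what a significance test does and does not tell us.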

  15. Statistical analyses of commercial vehicle accident factors. Volume 1 Part 1

    DOT National Transportation Integrated Search

    1978-02-01

    Procedures for conducting statistical analyses of commercial vehicle accidents have been established and initially applied. A file of some 3,000 California Highway Patrol accident reports from two areas of California during a period of about one year...

  16. The intervals method: a new approach to analyse finite element outputs using multivariate statistics

    PubMed Central

    De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep

    2017-01-01

    Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
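    The variable-construction step of the intervals' method can be sketched as follows. Assumptions for illustration only: each finite element is summarized by its area and a single scalar stress value, the interval boundaries are chosen by the analyst, out-of-range stresses are clamped into the first or last interval, and the element data are hypothetical. The resulting percentages would then be passed to a multivariate method, which is not shown.

```python
from bisect import bisect_right


def interval_variables(areas, stresses, edges):
    """Percentage of total area whose stress falls in each interval.

    edges: sorted boundaries [e0, e1, ..., ek] defining k intervals
    [e0, e1), [e1, e2), ..., [e(k-1), ek]; stresses outside the range
    are clamped into the first or last interval (an assumption here).
    """
    k = len(edges) - 1
    totals = [0.0] * k
    for area, stress in zip(areas, stresses):
        idx = bisect_right(edges, stress) - 1
        idx = min(max(idx, 0), k - 1)    # clamp to a valid interval
        totals[idx] += area
    whole = sum(areas)
    return [100 * t / whole for t in totals]


areas = [1.0, 1.0, 2.0, 0.5]       # hypothetical element areas
stresses = [0.5, 1.5, 2.5, 3.4]    # hypothetical element stresses
edges = [0.0, 1.0, 2.0, 3.0]
print([round(v, 1) for v in interval_variables(areas, stresses, edges)])
```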

  17. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    PubMed

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
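    For the simplest correlation case, the power of a two-sided test of H0: rho = 0 can be approximated with Fisher's z transformation. This is a textbook normal approximation, not G*Power's own procedure (G*Power can use exact distributions for some of these tests), so its results may differ slightly from the program's.

```python
import math
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of rho = 0 (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    # Probability that the test statistic lands in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for n in (20, 50, 85, 120):
    print(n, round(correlation_power(0.3, n), 3))
```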

  18. A new statistical method for design and analyses of component tolerance

    NASA Astrophysics Data System (ADS)

    Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam

    2017-03-01

    Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. The use of statistical methods for tolerancing is not new, but engineers typically assume known distributions, such as the normal distribution. When the statistical distribution of the given variable is unknown, a different statistical method is needed to design the tolerance. In this paper, we use the generalized lambda distribution to design and analyze component tolerance, and we use the percentile method (PM) to estimate the distribution parameters. The findings indicated that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerance. Moreover, in the case of assembled sets, wider tolerances can be allowed for each component while achieving the same target performance.
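    The generalized lambda distribution is convenient for tolerancing because it is defined through its quantile function, so tolerance limits follow directly from percentiles. A minimal sketch, assuming the Ramberg-Schmeiser parameterization Q(u) = lambda1 + (u^lambda3 - (1 - u)^lambda4) / lambda2; the parameter values are hypothetical, and the percentile-method fit itself (solving for the four lambdas from sample percentiles) is not shown.

```python
def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function of the generalized lambda distribution (RS form)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def tolerance_limits(coverage, lam1, lam2, lam3, lam4):
    """Percentile limits enclosing the given central coverage."""
    tail = (1.0 - coverage) / 2.0
    return (gld_quantile(tail, lam1, lam2, lam3, lam4),
            gld_quantile(1.0 - tail, lam1, lam2, lam3, lam4))


# Hypothetical parameters for a component dimension (mm);
# lam3 == lam4 gives a symmetric shape centred on lam1.
params = (10.0, 2.0, 0.14, 0.14)
low, high = tolerance_limits(0.9973, *params)
print(f"99.73% limits: {low:.4f} to {high:.4f} mm")
```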

  19. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from the analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
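    The first two procedures (linear relationships via correlation coefficients, monotonic relationships via rank correlation coefficients) can be contrasted on a single scatterplot. A minimal pure-Python sketch; tied values receive average ranks, and the sample data are hypothetical rather than output of the two-phase flow model.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical input/output pairs with a monotonic but non-linear pattern.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [v ** 3 for v in x]
print(f"Pearson  r   = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

    A rank correlation of 1 alongside a noticeably lower Pearson coefficient signals a monotonic but non-linear relationship, the kind of pattern the second procedure is meant to catch.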

  20. Living systematic reviews: 3. Statistical methods for updating meta-analyses.

    PubMed

    Simmonds, Mark; Salanti, Georgia; McKenzie, Joanne; Elliott, Julian

    2017-11-01

    A living systematic review (LSR) should keep the review current as new research evidence emerges. Any meta-analyses included in the review will also need updating as new material is identified. If the aim of the review is solely to present the best current evidence, standard meta-analysis may be sufficient, provided reviewers are aware that results may change at later updates. If the review is used in a decision-making context, more caution may be needed. When using standard meta-analysis methods, the chance of incorrectly concluding that any updated meta-analysis is statistically significant when there is no effect (the type I error) increases rapidly as more updates are performed. Inaccurate estimation of any heterogeneity across studies may also lead to inappropriate conclusions. This paper considers four methods to avoid some of these statistical problems when updating meta-analyses: two methods, the law of the iterated logarithm and the Shuster method, control primarily for inflation of type I error, and two other methods, trial sequential analysis and sequential meta-analysis, control for type I and II errors (failing to detect a genuine effect) and take account of heterogeneity. This paper compares the methods and considers how they could be applied to LSRs. Copyright © 2017 Elsevier Inc. All rights reserved.
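    The type I error inflation from repeatedly updating a standard meta-analysis can be demonstrated by simulation. A minimal sketch under stated assumptions: every study estimates a true effect of zero with the same known standard error, the meta-analysis is a fixed-effect inverse-variance pooling re-tested at two-sided 0.05 after each new study, and the numbers of updates and replicates are arbitrary. None of the four corrective methods is implemented here.

```python
import math
import random


def false_positive_rate(n_updates, n_sims=10000, se=1.0, seed=1):
    """Share of null simulations in which ANY update is 'significant' at 0.05."""
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # two-sided 5% critical value of N(0, 1)
    hits = 0
    for _ in range(n_sims):
        weighted_sum, weight_total = 0.0, 0.0
        for _ in range(n_updates):
            estimate = rng.gauss(0.0, se)           # new study, true effect = 0
            w = 1.0 / se ** 2                       # inverse-variance weight
            weighted_sum += w * estimate
            weight_total += w
            pooled = weighted_sum / weight_total
            z = pooled * math.sqrt(weight_total)    # pooled SE = 1 / sqrt(sum of w)
            if abs(z) > z_crit:
                hits += 1
                break                               # a false 'finding' was declared
    return hits / n_sims


for updates in (1, 5, 10, 20):
    print(updates, false_positive_rate(updates))
```

    With a single analysis the rate stays near the nominal 5%, but testing after every update lets chance fluctuations cross the threshold at some point, so the cumulative false-positive rate climbs well above 5%.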

  1. Hydrometeorological and statistical analyses of heavy rainfall in Midwestern USA

    NASA Astrophysics Data System (ADS)

    Thorndahl, S.; Smith, J. A.; Krajewski, W. F.

    2012-04-01

    During the last two decades, the Midwestern states of the United States have been heavily affected by flood-producing rainfall. Several of these storms appear to have similar hydrometeorological properties in terms of pattern, track, evolution, life cycle, clustering, etc., which raises the question of whether general characteristics of the space-time structure of these heavy storms can be derived. This is important for understanding hydrometeorological features, e.g. how storms evolve and how frequently extreme storms can be expected to occur. In the literature, most studies of extreme rainfall are based on point measurements (rain gauges). However, with high-resolution, high-quality radar records now exceeding two decades, long-term spatio-temporal statistical analyses of extremes have become possible. This makes it possible to link return periods to distributed rainfall estimates and to study the precipitation structures that cause floods. However, frequency analyses of rainfall based on radar observations introduce challenges that do not arise in traditional analyses of rain gauge data, notably converting radar reflectivity observations to "true" rainfall. For example, it is difficult to distinguish reflectivity from high-intensity rain from reflectivity from other hydrometeors such as hail, especially with the single-polarization radars used in this study. Furthermore, reflectivity from the bright band (melting layer) should be discarded and anomalous propagation should be corrected in order to produce valid statistics of extreme radar rainfall. Other challenges include combining observations from several radars into one mosaic, bias correction against rain gauges, range correction, Z-R relationships, etc. The present study analyzes radar rainfall observations from 1996 to 2011 based on the American NEXRAD network of radars over an area covering parts of Iowa, Wisconsin, Illinois, and
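    Converting reflectivity to rain rate relies on a power-law Z-R relationship, Z = a·R^b, with Z in mm^6 m^-3 and R in mm/h. A minimal sketch, assuming the Marshall-Palmer coefficients a = 200 and b = 1.6 as defaults; the coefficients actually used in the study are not stated here, and operational choices may differ.

```python
def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Rain rate R (mm/h) from reflectivity in dBZ via Z = a * R**b.

    Z (mm^6 m^-3) = 10 ** (dBZ / 10); solving for R gives R = (Z / a) ** (1 / b).
    Defaults are the Marshall-Palmer coefficients (an assumption here).
    """
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)


for dbz in (20, 30, 40, 50):
    print(f"{dbz} dBZ -> {dbz_to_rain_rate(dbz):.1f} mm/h")
```

    Because R grows steeply with reflectivity, hail or bright-band echoes misread as rain inflate the estimated extremes, which is why those corrections matter for the frequency analysis.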

  2. Influence of peer review on the reporting of primary outcome(s) and statistical analyses of randomised trials.

    PubMed

    Hopewell, Sally; Witt, Claudia M; Linde, Klaus; Icke, Katja; Adedire, Olubusola; Kirtley, Shona; Altman, Douglas G

    2018-01-11

    Selective reporting of outcomes in clinical trials is a serious problem. We aimed to investigate the influence of the peer review process within biomedical journals on reporting of primary outcome(s) and statistical analyses within reports of randomised trials. Each month, PubMed (May 2014 to April 2015) was searched to identify primary reports of randomised trials published in six high-impact general and 12 high-impact specialty journals. The corresponding author of each trial was invited to complete an online survey asking authors about changes made to their manuscript as part of the peer review process. Our main outcomes were to assess: (1) the nature and extent of changes as part of the peer review process, in relation to reporting of the primary outcome(s) and/or primary statistical analysis; (2) how often authors followed these requests; and (3) whether this was related to specific journal or trial characteristics. Of 893 corresponding authors who were invited to take part in the online survey, 258 (29%) responded. The majority of trials were multicentre (n = 191; 74%); median sample size 325 (IQR 138 to 1010). The primary outcome was clearly defined in 92% (n = 238), of which the direction of treatment effect was statistically significant in 49%. On a 1-10 Likert scale, the majority responded that they were satisfied with the overall handling (mean 8.6, SD 1.5) and quality of peer review (mean 8.5, SD 1.5) of their manuscript. Only 3% (n = 8) said that the editor or peer reviewers had asked them to change or clarify the trial's primary outcome. However, 27% (n = 69) reported they were asked to change or clarify the statistical analysis of the primary outcome; most had fulfilled the request, the main motivation being to improve the statistical methods (n = 38; 55%) or avoid rejection (n = 30; 44%). Overall, there was little association between authors being asked to make this change and the type of journal, intervention, significance of the

  3. Analyses of the 1981-82 Illinois Public Library Statistics.

    ERIC Educational Resources Information Center

    Wallace, Danny P.

    Using data provided by the annual reports of Illinois public libraries and by the Illinois state library, this publication is a companion to the November 1982 issue of "Illinois Libraries," which enumerated the 16 data elements upon which the analyses are based. Three additional types of information are provided for each of six…

  4. Statistical analyses of Higgs- and Z -portal dark matter models

    NASA Astrophysics Data System (ADS)

    Ellis, John; Fowlie, Andrew; Marzola, Luca; Raidal, Martti

    2018-06-01

    We perform frequentist and Bayesian statistical analyses of Higgs- and Z-portal models of dark matter particles with spin 0, 1/2, and 1. Our analyses incorporate data from direct detection and indirect detection experiments, as well as LHC searches for monojet and monophoton events, and we also analyze the potential impacts of future direct detection experiments. We find acceptable regions of the parameter spaces for Higgs-portal models with real scalar, neutral vector, Majorana, or Dirac fermion dark matter particles, and Z-portal models with Majorana or Dirac fermion dark matter particles. In many of these cases, there are interesting prospects for discovering dark matter particles in Higgs or Z decays, as well as dark matter particles weighing ≳100 GeV. Negative results from planned direct detection experiments would still allow acceptable regions for Higgs- and Z-portal models with Majorana or Dirac fermion dark matter particles.

  5. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small, which requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMAs is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and its geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors, does not require the sharing of individual-level genotype data, and does not breach individual privacy.

  6. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small, which requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMAs is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and its geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors, does not require the sharing of individual-level genotype data, and does not breach individual privacy. PMID:27552965

  7. A retrospective survey of research design and statistical analyses in selected Chinese medical journals in 1998 and 2008.

    PubMed

    Jin, Zhichao; Yu, Danghui; Zhang, Luoman; Meng, Hong; Lu, Jian; Gao, Qingbin; Cao, Yang; Ma, Xiuqiang; Wu, Cheng; He, Qian; Wang, Rui; He, Jia

    2010-05-25

    High-quality clinical research requires not only advanced professional knowledge but also sound study design and correct statistical analyses. The number of clinical research articles published in Chinese medical journals has increased immensely in the past decade, but study design quality and statistical analyses have remained suboptimal. The aim of this investigation was to gather evidence on the quality of study design and statistical analyses in clinical research conducted in China in the first decade of the new millennium. Ten leading Chinese medical journals were selected, and all original articles published in 1998 (N = 1,335) and 2008 (N = 1,578) were thoroughly categorized and reviewed. A well-defined and validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation. Main outcomes were the frequencies of different types of study design, the error/defect proportion in design and statistical analyses, and the implementation of CONSORT in randomized clinical trials. From 1998 to 2008, the error/defect proportion in statistical analyses decreased significantly (χ² = 12.03, p<0.001), from 59.8% (545/1,335) in 1998 to 52.2% (664/1,578) in 2008. The overall error/defect proportion of study design also decreased (χ² = 21.22, p<0.001), from 50.9% (680/1,335) to 42.4% (669/1,578). In 2008, randomized clinical trials remained in the single digits (3.8%, 60/1,578), and 44 of them (73.3%) showed defects in results reporting. Nearly half of the published studies were retrospective, 49.3% (658/1,335) in 1998 compared to 48.2% (761/1,578) in 2008. Defect proportions decreased in both results presentation (χ² = 93.26, p<0.001), from 92.7% (945/1,019) to 78.2% (1023/1,309), and interpretation (χ² = 27.26, p<0.001), from 9.7% (99/1,019) to 4.3% (56/1,309), although some serious defects persisted. Chinese medical research seems to have made
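
    The study-design and interpretation comparisons can be checked against the counts given. Treating each as a 2×2 table (defective vs. non-defective articles, 1998 vs. 2008) and applying an uncorrected Pearson chi-square test reproduces the reported values of 21.22 and 27.26. The abstract does not name the test, so the uncorrected Pearson statistic is an assumption that happens to match these two figures; the helper names are illustrative.

```python
import math


def pearson_chi2_2x2(events_1, n_1, events_2, n_2):
    """Uncorrected Pearson chi-square comparing two proportions.

    Table layout: [[events_1, n_1 - events_1], [events_2, n_2 - events_2]].
    Uses the shortcut N * (ad - bc)^2 / (product of row and column totals).
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


design = pearson_chi2_2x2(680, 1335, 669, 1578)        # study-design defects
interpretation = pearson_chi2_2x2(99, 1019, 56, 1309)  # interpretation defects
print(f"{design:.2f} {interpretation:.2f}")  # prints "21.22 27.26"
print(p_value_df1(design) < 0.001)           # prints "True"
```

    The helper applies no continuity correction; a Yates-corrected statistic would come out slightly smaller for the same tables.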

  8. Chasing the peak: optimal statistics for weak shear analyses

    NASA Astrophysics Data System (ADS)

    Smit, Merijn; Kuijken, Konrad

    2018-01-01

    Context. Weak gravitational lensing analyses are fundamentally limited by the intrinsic distribution of galaxy shapes. It is well known that this distribution of galaxy ellipticity is non-Gaussian, and the traditional estimation methods, explicitly or implicitly assuming Gaussianity, are not necessarily optimal. Aims: We aim to explore alternative statistics for samples of ellipticity measurements. An optimal estimator needs to be asymptotically unbiased, efficient, and robust in retaining these properties for various possible sample distributions. We take the non-linear mapping of gravitational shear and the effect of noise into account. We then discuss how the distribution of individual galaxy shapes in the observed field of view can be modeled by fitting Fourier modes to the shear pattern directly. This allows scientific analyses using statistical information of the whole field of view, instead of locally sparse and poorly constrained estimates. Methods: We simulated samples of galaxy ellipticities, using both theoretical distributions and data for ellipticities and noise. We determined the possible bias Δe, the efficiency η and the robustness of the least absolute deviations, the biweight, and the convex hull peeling (CHP) estimators, compared to the canonical weighted mean. Using these statistics for regression, we have shown the applicability of direct Fourier mode fitting. Results: We find an improved performance of all estimators when iteratively reducing the residuals after de-shearing the ellipticity samples by the estimated shear, which removes the asymmetry in the ellipticity distributions. We show that these estimators are then unbiased in the absence of noise, and decrease noise bias by more than 30%. Our results show that the CHP estimator distribution is skewed but still centered around the underlying shear, and that its bias is the least affected by noise. We find the least absolute deviations estimator to be the most efficient estimator in almost all

  9. A Retrospective Survey of Research Design and Statistical Analyses in Selected Chinese Medical Journals in 1998 and 2008

    PubMed Central

    Jin, Zhichao; Yu, Danghui; Zhang, Luoman; Meng, Hong; Lu, Jian; Gao, Qingbin; Cao, Yang; Ma, Xiuqiang; Wu, Cheng; He, Qian; Wang, Rui; He, Jia

    2010-01-01

    Background High-quality clinical research requires not only advanced professional knowledge but also sound study design and correct statistical analyses. The number of clinical research articles published in Chinese medical journals has increased immensely in the past decade, but study design quality and statistical analyses have remained suboptimal. The aim of this investigation was to gather evidence on the quality of study design and statistical analyses in clinical research conducted in China in the first decade of the new millennium. Methodology/Principal Findings Ten leading Chinese medical journals were selected, and all original articles published in 1998 (N = 1,335) and 2008 (N = 1,578) were thoroughly categorized and reviewed. A well-defined and validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation. Main outcomes were the frequencies of different types of study design, the error/defect proportion in design and statistical analyses, and the implementation of CONSORT in randomized clinical trials. From 1998 to 2008, the error/defect proportion in statistical analyses decreased significantly (χ² = 12.03, p<0.001), from 59.8% (545/1,335) in 1998 to 52.2% (664/1,578) in 2008. The overall error/defect proportion of study design also decreased (χ² = 21.22, p<0.001), from 50.9% (680/1,335) to 42.4% (669/1,578). In 2008, randomized clinical trials remained in the single digits (3.8%, 60/1,578), and 44 of them (73.3%) showed defects in results reporting. Nearly half of the published studies were retrospective, 49.3% (658/1,335) in 1998 compared to 48.2% (761/1,578) in 2008. Defect proportions decreased in both results presentation (χ² = 93.26, p<0.001), from 92.7% (945/1,019) to 78.2% (1023/1,309), and interpretation (χ² = 27.26, p<0.001), from 9.7% (99/1,019) to 4.3% (56/1,309), although some serious

  10. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

    PubMed

    Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
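
    The kind of consistency check described here recomputes a p-value from the reported test statistic and compares it with the reported p-value. The minimal sketch below is restricted to two-tailed z tests so that it needs only the standard library; the same idea extends to t, F and χ² statistics given their degrees of freedom. The rounding rule, alpha = .05 and the function names are assumptions for illustration, not the exact procedure used in the study.

```python
import math


def p_from_z(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_z_report(z, reported_p, decimals, alpha=0.05):
    """Compare a reported p-value with the one implied by its z statistic.

    Returns (recomputed_p, inconsistent, gross). The report counts as
    consistent if the recomputed p lies within half a unit of the last
    reported decimal, i.e. it rounds to the reported value. It is a gross
    inconsistency if, in addition, the two p-values fall on opposite sides
    of alpha, so the significance decision would change.
    """
    p = p_from_z(z)
    inconsistent = abs(p - reported_p) > 0.5 * 10 ** (-decimals)
    gross = inconsistent and ((p < alpha) != (reported_p < alpha))
    return p, inconsistent, gross


# Hypothetical reports: "z = 1.96, p = .05", "z = 2.50, p = .03", "z = 1.50, p = .04"
for z, p_rep in [(1.96, 0.05), (2.50, 0.03), (1.50, 0.04)]:
    p, bad, gross = check_z_report(z, p_rep, decimals=2)
    print(f"z = {z:.2f}: recomputed p = {p:.4f}, inconsistent = {bad}, gross = {gross}")
```

    The first hypothetical report is consistent, the second is inconsistent without changing the conclusion at α = .05, and the third is a gross inconsistency.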

  11. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science

    PubMed Central

    Veldkamp, Coosje L. S.; Nuijten, Michèle B.; Dominguez-Alvarez, Linda; van Assen, Marcel A. L. M.; Wicherts, Jelte M.

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors. PMID:25493918

  12. Statistical parameters of random heterogeneity estimated by analysing coda waves based on finite difference method

    NASA Astrophysics Data System (ADS)

    Emoto, K.; Saito, T.; Shiomi, K.

    2017-12-01

    Short-period (<1 s) seismograms are strongly affected by small-scale (<10 km) heterogeneities in the lithosphere. In general, short-period seismograms are analysed based on the statistical method by considering the interaction between seismic waves and randomly distributed small-scale heterogeneities. Statistical properties of the random heterogeneities have been estimated by analysing short-period seismograms. However, generally, the small-scale random heterogeneity is not taken into account for the modelling of long-period (>2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range including the intensity around the corner wavenumber as P(m) = 8πε²a³/(1 + a²m²)², where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we
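
    Evaluating the fitted spectrum directly shows what the corner wavenumber means: P(m) is roughly flat for m well below 1/a and falls off as m⁻⁴ well above it, with P(1/a) = P(0)/4. This form is the one conventionally associated with an exponential autocorrelation function in three dimensions. The sketch below simply evaluates the formula with the reported ε = 0.05 and a = 3.1 km; reading ε as an RMS fractional velocity fluctuation is the usual random-medium interpretation and is assumed here, and the function name and sample wavenumbers are illustrative.

```python
import math


def random_medium_psd(m, eps=0.05, a=3.1):
    """P(m) = 8*pi*eps^2*a^3 / (1 + a^2*m^2)^2.

    m: wavenumber in 1/km; a: correlation distance in km;
    eps: RMS fractional fluctuation (assumed interpretation, dimensionless).
    The result has units of km^3.
    """
    return 8.0 * math.pi * eps**2 * a**3 / (1.0 + (a * m) ** 2) ** 2


corner = 1.0 / 3.1  # wavenumber where a*m = 1
for m in (0.01, corner, 1.0, 10.0):
    print(f"m = {m:7.3f} 1/km   P(m) = {random_medium_psd(m):.3e} km^3")
```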

  13. A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses.

    PubMed

    Buttigieg, Pier Luigi; Ramette, Alban

    2014-12-01

    The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.

  14. Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses.

    PubMed

    Noble, Daniel W A; Lagisz, Malgorzata; O'dea, Rose E; Nakagawa, Shinichi

    2017-05-01

    Meta-analysis is an important tool for synthesizing research on a variety of topics in ecology and evolution, including molecular ecology, but can be susceptible to nonindependence. Nonindependence can affect two major interrelated components of a meta-analysis: (i) the calculation of effect size statistics and (ii) the estimation of overall meta-analytic estimates and their uncertainty. While some solutions to nonindependence exist at the statistical analysis stages, there is little advice on what to do when complex analyses are not possible, or when studies with nonindependent experimental designs exist in the data. Here we argue that exploring the effects of procedural decisions in a meta-analysis (e.g. inclusion of different quality data, choice of effect size) and statistical assumptions (e.g. assuming no phylogenetic covariance) using sensitivity analyses are extremely important in assessing the impact of nonindependence. Sensitivity analyses can provide greater confidence in results and highlight important limitations of empirical work (e.g. impact of study design on overall effects). Despite their importance, sensitivity analyses are seldom applied to problems of nonindependence. To encourage better practice for dealing with nonindependence in meta-analytic studies, we present accessible examples demonstrating the impact that ignoring nonindependence can have on meta-analytic estimates. We also provide pragmatic solutions for dealing with nonindependent study designs, and for analysing dependent effect sizes. Additionally, we offer reporting guidelines that will facilitate disclosure of the sources of nonindependence in meta-analyses, leading to greater transparency and more robust conclusions. © 2017 John Wiley & Sons Ltd.
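
    One simple sensitivity analysis of the kind argued for here is to re-run the meta-analysis with each study (and all of its possibly dependent effect sizes) left out in turn, and see how far the pooled estimate moves. The sketch below assumes a fixed-effect, inverse-variance model purely for brevity; that model itself treats effect sizes as independent, so it illustrates the sensitivity check rather than a remedy for nonindependence. Study labels and numbers are hypothetical.

```python
def pooled_estimate(effects, variances):
    """Fixed-effect inverse-variance weighted mean of effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_study_out(study_ids, effects, variances):
    """Pooled estimate after dropping every effect size from one study at a time."""
    rows = list(zip(study_ids, effects, variances))
    results = {}
    for study in dict.fromkeys(study_ids):  # unique ids, in first-seen order
        kept = [(y, v) for s, y, v in rows if s != study]
        results[study] = pooled_estimate([y for y, _ in kept], [v for _, v in kept])
    return results


ids = ["A", "A", "B", "C"]  # study A contributes two dependent effect sizes
effects = [0.2, 0.3, 0.5, 0.9]
variances = [0.04, 0.04, 0.04, 0.04]
print("all studies:", round(pooled_estimate(effects, variances), 3))
for study, est in leave_one_study_out(ids, effects, variances).items():
    print(f"without {study}:", round(est, 3))
```

    Dropping a study by identifier, rather than one effect size at a time, keeps dependent effect sizes from the same study together, which is what makes the check informative about nonindependence.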

  15. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  16. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data pose a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption about the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
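
    The abstract does not spell out the WU-SEQ statistic, so the sketch below shows only the generic shape of a weighted U-statistic for association: an average, over all pairs of individuals, of a genetic-similarity weight times a phenotype-similarity kernel. The specific choices are assumptions for illustration: identity-by-state similarity for genotypes coded 0/1/2, and a rank-based phenotype kernel (the product of centred ranks), which avoids assuming any phenotype distribution. A positive value means genetically similar pairs tend to have similar phenotypes.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def ibs_weight(g1, g2):
    """Identity-by-state similarity (0..1) for genotypes coded 0/1/2."""
    return sum(2 - abs(x - y) for x, y in zip(g1, g2)) / (2.0 * len(g1))


def weighted_u(phenotypes, genotypes):
    """Average over pairs i < j of w_ij * f_i * f_j.

    w_ij: genetic similarity; f_i: centred phenotype rank (hypothetical kernel).
    """
    r = average_ranks(phenotypes)
    centre = sum(r) / len(r)
    f = [x - centre for x in r]
    pairs = list(combinations(range(len(phenotypes)), 2))
    total = sum(ibs_weight(genotypes[i], genotypes[j]) * f[i] * f[j] for i, j in pairs)
    return total / len(pairs)


phenotypes = [1.0, 2.0, 3.0, 4.0]
genotypes = [[0, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 2]]  # made-up data
print(weighted_u(phenotypes, genotypes))
```

    Significance would normally be judged against a null distribution, for example by permuting phenotypes across individuals; that step is omitted here.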

  17. Statistical analyses on sandstones: Systematic approach for predicting petrographical and petrophysical properties

    NASA Astrophysics Data System (ADS)

    Stück, H. L.; Siegesmund, S.

    2012-04-01

    Sandstones are a popular natural stone due to their wide occurrence and availability. The different applications for these stones have led to an increase in demand. From the viewpoint of conservation and the natural stone industry, an understanding of the material behaviour of this construction material is very important. Sandstones are a highly heterogeneous material. Based on statistical analyses with a sufficiently large dataset, a systematic approach to predicting the material behaviour should be possible. Since the literature already contains a large volume of data concerning the petrographical and petrophysical properties of sandstones, a large dataset could be compiled for the statistical analyses. The aim of this study is to develop constraints on the material behaviour and especially on the weathering behaviour of sandstones. Approximately 300 samples from historical and presently mined natural sandstones in Germany, as well as sandstones described worldwide, were included in the statistical approach. The mineralogical composition and fabric characteristics were determined from detailed thin section analyses and descriptions in the literature. Particular attention was paid to evaluating the compositional and textural maturity, grain contacts and contact thickness, type of cement, degree of alteration and the intergranular volume. Statistical methods were used to test for normal distributions and to calculate linear regressions of the basic petrophysical properties of density, porosity and water uptake, as well as strength. The sandstones were classified into three different pore size distributions and evaluated with the other petrophysical properties. Weathering tests such as hygric swelling and salt loading were also included. To identify similarities between individual sandstones or to define groups of specific sandstone types, principal component analysis, cluster analysis and factor analysis were applied. Our results show that composition and porosity

  18. Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

    PubMed Central

    Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.

    2012-01-01

    Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate the overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials
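
    The "two-stage" approach can be sketched in a few lines: stage one reduces each trial's individual participant data to a log relative risk and its variance, and stage two pools those summaries. The abstract does not say which pooling model was used, so a fixed-effect inverse-variance model is assumed here. The helper `make_trial` and its counts are invented for illustration, and trials with zero events in an arm would need a continuity correction that this sketch does not apply.

```python
import math


def stage_one(participants):
    """Summarise one trial's IPD as (log relative risk, variance).

    participants: iterable of (treated, had_event) booleans, one per participant.
    Assumes at least one event in each arm.
    """
    e_t = n_t = e_c = n_c = 0
    for treated, had_event in participants:
        if treated:
            n_t += 1
            e_t += had_event
        else:
            n_c += 1
            e_c += had_event
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    var = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, var


def stage_two(summaries, z=1.96):
    """Fixed-effect inverse-variance pooling; returns (RR, lower, upper)."""
    weights = [1 / v for _, v in summaries]
    pooled = sum(w * y for w, (y, _) in zip(weights, summaries)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


def make_trial(e_t, n_t, e_c, n_c):
    """Hypothetical helper: build made-up IPD with given event counts per arm."""
    return ([(True, i < e_t) for i in range(n_t)]
            + [(False, i < e_c) for i in range(n_c)])


trials = [make_trial(10, 100, 20, 100), make_trial(30, 200, 36, 200)]
print(stage_two([stage_one(t) for t in trials]))
```

    A one-stage analysis would instead fit a single model to all participants at once, typically with trial-specific terms, which is where the extra modelling flexibility and the extra statistical support mentioned above come in.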

  19. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.

    PubMed

    Deng, Yangqing; Pan, Wei

    2017-12-01

    There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the

  20. Methods in pharmacoepidemiology: a review of statistical analyses and data reporting in pediatric drug utilization studies.

    PubMed

    Sequi, Marco; Campi, Rita; Clavenna, Antonio; Bonati, Maurizio

    2013-03-01

    To evaluate the quality of data reporting and statistical methods performed in drug utilization studies in the pediatric population. Drug utilization studies evaluating all drug prescriptions to children and adolescents published between January 1994 and December 2011 were retrieved and analyzed. For each study, information on measures of exposure/consumption, the covariates considered, descriptive and inferential analyses, statistical tests, and methods of data reporting was extracted. An overall quality score was created for each study using a 12-item checklist that took into account the presence of outcome measures, covariates of measures, descriptive measures, statistical tests, and graphical representation. A total of 22 studies were reviewed and analyzed. Of these, 20 studies reported at least one descriptive measure. The mean was the most commonly used measure (18 studies), but only five of these also reported the standard deviation. Statistical analyses were performed in 12 studies, with the chi-square test being the most commonly performed test. Graphs were presented in 14 papers. Sixteen papers reported the number of drug prescriptions and/or packages, and ten reported the prevalence of the drug prescription. The mean quality score was 8 (median 9). Only seven of the 22 studies received a score of ≥10, while four studies received a score of <6. Our findings document that only a few of the studies reviewed applied statistical methods and reported data in a satisfactory manner. We therefore conclude that the methodology of drug utilization studies needs to be improved.

  1. Transformation (normalization) of slope gradient and surface curvatures, automated for statistical analyses from DEMs

    NASA Astrophysics Data System (ADS)

    Csillik, O.; Evans, I. S.; Drăguţ, L.

    2015-03-01

    Automated procedures are developed to alleviate long tails in frequency distributions of morphometric variables. They minimize the skewness of slope gradient frequency distributions, and modify the kurtosis of profile and plan curvature distributions toward that of the Gaussian (normal) model. Box-Cox (for slope) and arctangent (for curvature) transformations are tested on nine digital elevation models (DEMs) of varying origin and resolution, and different landscapes, and shown to be effective. Resulting histograms are illustrated and show considerable improvements over those for previously recommended slope transformations (sine, square root of sine, and logarithm of tangent). Unlike previous approaches, the proposed method evaluates the frequency distribution of slope gradient values in a given area and applies the most appropriate transform if required. Sensitivity of the arctangent transformation is tested, showing that Gaussian-kurtosis transformations are acceptable also in terms of histogram shape. Cube root transformations of curvatures produced bimodal histograms. The transforms are applicable to morphometric variables and many others with skewed or long-tailed distributions. By avoiding long tails and outliers, they permit parametric statistics such as correlation, regression and principal component analyses to be applied, with greater confidence that requirements for linearity, additivity and even scatter of residuals (constancy of error variance) are likely to be met. It is suggested that such transformations should be routinely applied in all parametric analyses of long-tailed variables. Our Box-Cox and curvature automated transformations are based on a Python script, implemented as an easy-to-use script tool in ArcGIS.
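
    The slope step can be sketched as a search over the Box-Cox parameter λ for the value that brings sample skewness closest to zero. This is a minimal sketch under stated assumptions: the grid of λ values, the moment-based skewness, and the requirement that all values be strictly positive (a slope of exactly zero would need an offset first) are choices made here, not details from the paper, whose own tool is an ArcGIS script. The arctangent transform for curvatures is shown only in its basic form, with the scale k left as a hypothetical user-supplied parameter.

```python
import math


def skewness(xs):
    """Moment-based sample skewness m3 / m2^1.5 (xs must not be constant)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def box_cox(xs, lam):
    """Box-Cox transform; lam == 0 is the log limit. Requires xs > 0."""
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


def best_box_cox_lambda(xs, grid=None):
    """Grid-search the lambda whose transform has skewness closest to zero."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # assumed grid: -2.0 .. 2.0
    return min(grid, key=lambda lam: abs(skewness(box_cox(xs, lam))))


def arctan_transform(xs, k):
    """Arctangent transform for curvature; k is a hypothetical scale factor."""
    return [math.atan(k * x) for x in xs]


slopes = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 30.0]  # made-up, right-skewed
lam = best_box_cox_lambda(slopes)
print(lam, round(skewness(slopes), 2), round(skewness(box_cox(slopes, lam)), 2))
```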

  2. A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L

    2014-01-01

    We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
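    For reference, the sketch below computes the usual between-groups d that the single-case statistic is designed to match, with Hedges' small-sample correction J = 1 - 3/(4·df - 1). It is only that familiar benchmark. It is not the authors' single-case estimator, which is built from within- and between-case variance components, and it is not their SPSS macros. The two three-point samples are hypothetical.

```python
import math
from statistics import fmean, variance


def hedges_g(treatment, control):
    """Between-groups standardised mean difference d (pooled SD) and its
    small-sample-corrected version g = J * d, with J = 1 - 3 / (4 * df - 1)."""
    n1, n2 = len(treatment), len(control)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / df
    d = (fmean(treatment) - fmean(control)) / math.sqrt(pooled_var)
    j = 1 - 3 / (4 * df - 1)
    return d, j * d


d, g = hedges_g([2, 4, 6], [1, 3, 5])  # hypothetical scores
print(f"d = {d:.2f}, g = {g:.2f}")  # prints "d = 0.50, g = 0.40"
```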

  3. Review of Statistical Methods for Analysing Healthcare Resources and Costs

    PubMed Central

    Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G

    2011-01-01

    We review statistical methods for analysing healthcare resource use and costs, assessing their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease of general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of the above data characteristics, may be preferable, but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344
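    Category (II) methods analyse transformed costs, most often log costs. The practical difficulty is getting back to the mean-cost scale that decision makers need. The Python sketch below contrasts the naive back-transform with Duan's smearing estimator. It assumes an intercept-only log model and strictly positive hypothetical costs. The log is undefined at zero, which is one reason the two-part models in category (VI) exist.

```python
import math
from statistics import fmean


def naive_back_transform(costs):
    """exp(mean(log y)): this is the geometric mean, which understates the mean cost."""
    return math.exp(fmean(math.log(c) for c in costs))


def smearing_estimate(costs):
    """Duan's smearing estimator for an intercept-only log model:
    exp(mean log y) multiplied by the mean of exp(residuals)."""
    logs = [math.log(c) for c in costs]
    mu = fmean(logs)
    return math.exp(mu) * fmean(math.exp(l - mu) for l in logs)


costs = [120.0, 150.0, 180.0, 240.0, 400.0, 950.0, 3200.0]  # hypothetical, all > 0
print(round(naive_back_transform(costs), 1), round(smearing_estimate(costs), 1), round(fmean(costs), 1))
```

    In this intercept-only case the smearing estimate reproduces the arithmetic mean exactly. With covariates the two estimators differ less trivially. The general point still holds: the naive back-transform estimates a geometric mean, which is smaller than the arithmetic mean for right-skewed costs.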

  4. [Statistical analysis using freely-available "EZR (Easy R)" software].

    PubMed

    Kanda, Yoshinobu

    2015-10-01

    Clinicians must often perform statistical analyses for purposes such as evaluating preexisting evidence and designing or executing clinical studies. R is a free software environment for statistical computing. R supports many statistical analysis functions, but does not incorporate a statistical graphical user interface (GUI). The R commander provides an easy-to-use basic-statistics GUI for R. However, the statistical function of the R commander is limited, especially in the field of biostatistics. Therefore, the author added several important statistical functions to the R commander and named it "EZR (Easy R)", which is now being distributed on the following website: http://www.jichi.ac.jp/saitama-sct/. EZR allows the application of statistical functions that are frequently used in clinical studies, such as survival analyses, including competing risk analyses and the use of time-dependent covariates, by point-and-click access. In addition, by saving the script automatically created by EZR, users can learn R script writing, maintain the traceability of the analysis, and ensure that the statistical process is overseen by a supervisor.

  5. A systematic review of the quality of statistical methods employed for analysing quality of life data in cancer randomised controlled trials.

    PubMed

    Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew

    2017-09-01

    Over the last decades, Health-related Quality of Life (HRQoL) end-points have become an important outcome in randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed the HRQoL methods reported in a total of 33 RCTs published since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: their multidimensional and longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time for a period ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time, again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered, leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.
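    The multiple-comparison problem can be made concrete. Assume, for illustration only, 15 independent sub-dimension tests, each run at α = 0.05. The chance of at least one false positive is then 1 - 0.95^15, about 54%. A common remedy is Holm's step-down adjustment, sketched below in Python with hypothetical p-values.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive among m independent,
    unadjusted tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns a reject flag per p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are retained
    return reject


print(round(familywise_error(0.05, 15), 3))
print(holm_reject([0.001, 0.04, 0.012, 0.3]))  # hypothetical p-values
```

    Holm's procedure controls the family-wise error rate without assuming independent tests. It does not, however, address the longitudinal structure of HRQoL data, which still calls for methods suited to repeated measurements.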

  6. Statistical technique for analysing functional connectivity of multiple spike trains.

    PubMed

    Masud, Mohammad Shahed; Borisyuk, Roman

    2011-03-15

    A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains. Copyright © 2011 Elsevier B.V. All rights reserved.

  7. Statistical Analyses of Femur Parameters for Designing Anatomical Plates.

    PubMed

    Wang, Lin; He, Kunjin; Chen, Zhengming

    2016-01-01

    Femur parameters are key prerequisites for scientifically designing anatomical plates. However, individual differences in femurs make it challenging to design well-fitting anatomical plates. Therefore, to design anatomical plates more scientifically, femur parameters were analysed with statistical methods in this study. The specific steps were as follows. First, taking eight anatomical femur parameters as variables, 100 femur samples were classified into three classes with factor analysis and Q-type cluster analysis. Second, based on the mean parameter values of the three classes of femurs, three sizes of average anatomical plates corresponding to the three classes were designed. Finally, based on Bayes discriminant analysis, a new femur could be assigned to the proper class, and the average anatomical plate suitable for that femur was then selected from the three available sizes. Experimental results showed that the classification of femurs was quite reasonable in terms of femur anatomy. For instance, three sizes of condylar buttress plates were designed, 20 new femurs were assigned to their classes, and suitable condylar buttress plates were then selected for them.

  8. Statistical analyses to support guidelines for marine avian sampling. Final report

    USGS Publications Warehouse

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and to map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of the Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.

  9. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.

    PubMed

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-09-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernels regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.

  10. Statistical analyses of influence of solar and geomagnetic activities on car accident events

    NASA Astrophysics Data System (ADS)

    Alania, M. V.; Gil, A.; Wieliczuk, R.

    2001-01-01

    Statistical analyses of the influence of solar and geomagnetic activity, the sector structure of the interplanetary magnetic field and galactic cosmic ray Forbush effects on car accident events in Poland for the period 1990-1999 have been carried out. Using auto-correlation, cross-correlation, spectral analyses and superposed epoch methods, it has been shown that there are separate periods when car accident events are directly correlated with the Ap index of geomagnetic activity, the sector structure of the interplanetary magnetic field and Forbush decreases of galactic cosmic rays. Nevertheless, a single-valued direct correlation cannot be revealed for the whole period 1990-1999. A periodicity of 7 days and its second harmonic (3.5 days) have been reliably revealed in the car accident data for Poland in each year of the period 1990-1999. It is shown that the maximum number of car accidents in Poland occurs on Fridays and practically does not depend on the level of solar and geomagnetic activity.
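    A weekly periodicity like the one reported above shows up at lag 7 of a sample autocorrelation function. The Python sketch below uses simulated daily counts with an assumed Friday peak. The weekday profile and the noise level are invented for illustration and are not the Polish accident data.

```python
import random
from statistics import fmean


def autocorrelation(xs, lag):
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    m = fmean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    num = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(len(xs) - lag))
    return num / denom


# Hypothetical daily counts: Monday..Sunday profile with a Friday peak, plus noise.
random.seed(7)
weekly_profile = [90, 92, 95, 98, 120, 100, 85]
daily = [weekly_profile[d % 7] + random.gauss(0, 5) for d in range(365)]
for lag in (1, 3, 7, 14):
    print(lag, round(autocorrelation(daily, lag), 2))
```

    A strong positive value at lags 7 and 14, with weaker or negative values at intermediate lags, is the time-domain counterpart of a 7-day spectral peak.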

  11. Using a Five-Step Procedure for Inferential Statistical Analyses

    ERIC Educational Resources Information Center

    Kamin, Lawrence F.

    2010-01-01

    Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…

  12. Neurotoxicological and statistical analyses of a mixture of five organophosphorus pesticides using a ray design.

    PubMed

    Moser, V C; Casey, M; Hamm, A; Carter, W H; Simmons, J E; Gennings, C

    2005-07-01

    Environmental exposures generally involve chemical mixtures instead of single chemicals. Statistical models such as the fixed-ratio ray design, wherein the mixing ratio (proportions) of the chemicals is fixed across increasing mixture doses, allow for the detection and characterization of interactions among the chemicals. In this study, we tested for interaction(s) in a mixture of five organophosphorus (OP) pesticides (chlorpyrifos, diazinon, dimethoate, acephate, and malathion). The ratio of the five pesticides (full ray) reflected the relative dietary exposure estimates of the general population as projected by the US EPA Dietary Exposure Evaluation Model (DEEM). A second mixture was tested using the same dose levels of all pesticides, but excluding malathion (reduced ray). The experimental approach first required characterization of dose-response curves for the individual OPs to build a dose-additivity model. A series of behavioral measures were evaluated in adult male Long-Evans rats at the time of peak effect following a single oral dose, and then tissues were collected for measurement of cholinesterase (ChE) activity. Neurochemical (blood and brain ChE activity) and behavioral (motor activity, gait score, tail-pinch response score) endpoints were evaluated statistically for evidence of additivity. The additivity model constructed from the single-chemical data was used to predict the effects of the pesticide mixture along the full ray (10-450 mg/kg) and the reduced ray (1.75-78.8 mg/kg). The experimental mixture data were also modeled and statistically compared to the additivity models. Analysis of the 5-OP mixture (the full ray) revealed significant deviation from additivity for all endpoints except tail-pinch response. Greater-than-additive responses (synergism) were observed at the lower doses of the 5-OP mixture, which contained non-effective dose levels of each of the components. The predicted effective doses (ED20, ED50) were about half

  13. Statistical analyses and characteristics of volcanic tremor on Stromboli Volcano (Italy)

    NASA Astrophysics Data System (ADS)

    Falsaperla, S.; Langer, H.; Spampinato, S.

    A study of volcanic tremor on Stromboli is carried out on the basis of data recorded daily between 1993 and 1995 by a permanent seismic station (STR) located 1.8 km away from the active craters. We also consider the signal of a second station (TF1), which operated for a shorter time span. Changes in the spectral tremor characteristics can be related to modifications in volcanic activity, particularly to lava effusions and explosive sequences. Statistical analyses were carried out on a set of spectra calculated daily from seismic signals where explosion quakes were present or excluded. Principal component analysis and cluster analysis were applied to identify different classes of spectra. Three clusters of spectra are associated with two different states of volcanic activity. One cluster corresponds to a state of low to moderate activity, whereas the two other clusters are present during phases with a high magma column, as inferred from the occurrence of lava fountains or effusions. We therefore conclude that variations in volcanic activity at Stromboli are usually linked to changes in the spectral characteristics of volcanic tremor. Site effects are evident when comparing the spectra calculated from signals synchronously recorded at STR and TF1. However, some major spectral peaks at both stations may reflect source properties. Statistical considerations and polarization analysis favor a predominance of P-waves in the tremor signal, along with a source located northwest of the craters at shallow depth.

  14. "Who Was 'Shadow'?" The Computer Knows: Applying Grammar-Program Statistics in Content Analyses to Solve Mysteries about Authorship.

    ERIC Educational Resources Information Center

    Ellis, Barbara G.; Dick, Steven J.

    1996-01-01

    Employs the statistics-documentation portion of a word-processing program's grammar-check feature together with qualitative analyses to determine that Henry Watterson, long-time editor of the "Louisville Courier-Journal," was probably the South's famed Civil War correspondent "Shadow." (TB)

  15. Statistical analyses support power law distributions found in neuronal avalanches.

    PubMed

    Klaus, Andreas; Yu, Shan; Plenz, Dietmar

    2011-01-01

    The size distribution of neuronal avalanches in cortical networks has been reported to follow a power law distribution with exponent close to -1.5, which is a reflection of long-range spatial correlations in spontaneous neuronal activity. However, identifying power law scaling in empirical data can be difficult and sometimes controversial. In the present study, we tested the power law hypothesis for neuronal avalanches by using more stringent statistical analyses. In particular, we performed the following steps: (i) analysis of finite-size scaling to identify scale-free dynamics in neuronal avalanches, (ii) model parameter estimation to determine the specific exponent of the power law, and (iii) comparison of the power law to alternative model distributions. Consistent with critical state dynamics, avalanche size distributions exhibited robust scaling behavior in which the maximum avalanche size was limited only by the spatial extent of sampling ("finite size" effect). This scale-free dynamics suggests the power law as a model for the distribution of avalanche sizes. Using both the Kolmogorov-Smirnov statistic and a maximum likelihood approach, we found the slope to be close to -1.5, which is in line with previous reports. Finally, the power law model for neuronal avalanches was compared to the exponential and to various heavy-tail distributions based on the Kolmogorov-Smirnov distance and by using a log-likelihood ratio test. Both the power law distribution without and with exponential cut-off provided significantly better fits to the cluster size distributions in neuronal avalanches than the exponential, the lognormal and the gamma distribution. In summary, our findings strongly support the power law scaling in neuronal avalanches, providing further evidence for critical state dynamics in superficial layers of cortex.
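    The maximum likelihood step can be sketched with the continuous power-law estimator α̂ = 1 + n / Σ ln(x_i / x_min), together with a Kolmogorov-Smirnov distance between the empirical and fitted tail CDFs. This Python sketch is an illustration under stated assumptions, not the authors' pipeline:

    - x_min is taken as known.
    - The data are simulated from a power law with exponent 1.5, i.e. a slope of -1.5 on log-log axes.
    - The continuous formula is only an approximation for discrete avalanche sizes.

```python
import math
import random


def powerlaw_mle(xs, xmin):
    """Continuous MLE for p(x) ~ x**(-alpha), x >= xmin, and its standard error."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)
    return alpha, (alpha - 1) / math.sqrt(n)


def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF
    F(x) = 1 - (x / xmin) ** (1 - alpha) on the tail x >= xmin."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        fitted = 1 - (x / xmin) ** (1 - alpha)
        d = max(d, abs((i + 1) / n - fitted), abs(i / n - fitted))
    return d


# Simulated sizes by inverse-transform sampling: x = xmin * (1 - u) ** (-1 / (alpha - 1)).
random.seed(42)
xmin, true_alpha = 1.0, 1.5
sizes = [xmin * (1 - random.random()) ** (-1 / (true_alpha - 1)) for _ in range(20000)]
alpha_hat, se = powerlaw_mle(sizes, xmin)
print(round(alpha_hat, 3), round(se, 4), round(ks_distance(sizes, xmin, alpha_hat), 4))
```

    Comparing alternatives such as the lognormal would repeat the likelihood calculation for each candidate and form a log-likelihood ratio, as the abstract describes.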

  16. Tafamidis delays disease progression in patients with early stage transthyretin familial amyloid polyneuropathy: additional supportive analyses from the pivotal trial.

    PubMed

    Keohane, Denis; Schwartz, Jeffrey; Gundapaneni, Balarama; Stewart, Michelle; Amass, Leslie

    2017-03-01

    Tafamidis, a non-NSAID highly specific transthyretin stabilizer, delayed neurologic disease progression as measured by Neuropathy Impairment Score-Lower Limbs (NIS-LL) in an 18-month, double-blind, placebo-controlled randomized trial in 128 patients with early-stage transthyretin V30M familial amyloid polyneuropathy (ATTRV30M-FAP). The current post hoc analyses aimed to further evaluate the effects of tafamidis in delaying ATTRV30M-FAP progression in this trial. Pre-specified, repeated-measures analysis of change from baseline in NIS-LL in this trial (ClinicalTrials.gov NCT00409175) was repeated with addition of baseline as covariate and multiple imputation analysis for missing data by treatment group. Change in NIS-LL plus three small-fiber nerve tests (NIS-LL + Σ3) and NIS-LL plus seven nerve tests (NIS-LL + Σ7) were assessed without baseline as covariate. Treatment outcomes over the NIS-LL, Σ3, Σ7, modified body mass index and Norfolk Quality of Life-Diabetic Neuropathy Total Quality of Life Score were also examined using multivariate analysis techniques. Neuropathy progression based on NIS-LL change from baseline to Month 18 remained significantly reduced for tafamidis versus placebo in the baseline-adjusted and multiple imputation analyses. NIS-LL + Σ3 and NIS-LL + Σ7 captured significant treatment group differences. Multivariate analyses provided strong statistical evidence for a superior tafamidis treatment effect. These supportive analyses confirm that tafamidis delays neurologic progression in early-stage ATTRV30M-FAP. NCT00409175.

  17. Statistical contact angle analyses; "slow moving" drops on a horizontal silicon-oxide surface.

    PubMed

    Schmitt, M; Grub, J; Heib, F

    2015-06-01

    Sessile drop experiments on horizontal surfaces are commonly used to characterise surface properties in science and in industry. The advancing angle and the receding angle are measurable on every solid, although on horizontal surfaces in particular some authors question the notions themselves. Building a standard, reproducible and valid method of measuring and defining specific (advancing/receding) contact angles is an important challenge of surface science. We recently developed approaches based on sigmoid fitting and on independent and dependent statistical analyses, which are practicable for determining specific angles/slopes when the sample surface is inclined. These approaches yield contact angle data that are independent of "user-skills" and the subjectivity of the operator, which is also urgently needed for evaluating dynamic measurements of contact angles. In this contribution we show that slightly modified procedures are also applicable for finding specific angles in experiments on horizontal surfaces. As an example, droplets on a flat, freshly cleaned silicon-oxide surface (wafer) are measured dynamically by the sessile drop technique while the volume of the liquid is increased/decreased. The triple points, the time, and the contact angles during the advancing and the receding of the drop, obtained by high-precision drop shape analysis, are statistically analysed. As stated in the previous contribution, the procedure is called "slow movement" analysis because of the small distance covered and the dominance of data points with low velocity. Even the smallest variations in velocity, such as the minimal advancing motion during the withdrawal of the liquid, are identifiable, which confirms the flatness and the chemical homogeneity of the sample surface and the high sensitivity of the presented approaches. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Lessons learned from additional research analyses of unsolved clinical exome cases.

    PubMed

    Eldomery, Mohammad K; Coban-Akdemir, Zeynep; Harel, Tamar; Rosenfeld, Jill A; Gambin, Tomasz; Stray-Pedersen, Asbjørg; Küry, Sébastien; Mercier, Sandra; Lessel, Davor; Denecke, Jonas; Wiszniewski, Wojciech; Penney, Samantha; Liu, Pengfei; Bi, Weimin; Lalani, Seema R; Schaaf, Christian P; Wangler, Michael F; Bacino, Carlos A; Lewis, Richard Alan; Potocki, Lorraine; Graham, Brett H; Belmont, John W; Scaglia, Fernando; Orange, Jordan S; Jhangiani, Shalini N; Chiang, Theodore; Doddapaneni, Harsha; Hu, Jianhong; Muzny, Donna M; Xia, Fan; Beaudet, Arthur L; Boerwinkle, Eric; Eng, Christine M; Plon, Sharon E; Sutton, V Reid; Gibbs, Richard A; Posey, Jennifer E; Yang, Yaping; Lupski, James R

    2017-03-21

    Given the rarity of most single-gene Mendelian disorders, concerted efforts of data exchange between clinical and scientific communities are critical to optimize molecular diagnosis and novel disease gene discovery. We designed and implemented protocols for the study of cases for which a plausible molecular diagnosis was not achieved in a clinical genomics diagnostic laboratory (i.e. unsolved clinical exomes). Such cases were recruited to a research laboratory for further analyses, in order to potentially: (1) accelerate novel disease gene discovery; (2) increase the molecular diagnostic yield of whole exome sequencing (WES); and (3) gain insight into the genetic mechanisms of disease. Pilot project data included 74 families, consisting mostly of parent-offspring trios. Analyses performed on a research basis employed both WES from additional family members and complementary bioinformatics approaches and protocols. Analysis of all possible modes of Mendelian inheritance, focusing on both single nucleotide variants (SNV) and copy number variant (CNV) alleles, yielded a likely contributory variant in 36% (27/74) of cases. If one includes candidate genes with variants identified within a single family, a potential contributory variant was identified in a total of ~51% (38/74) of cases enrolled in this pilot study. The molecular diagnosis was achieved in 30/63 trios (47.6%). In addition, the analysis workflow yielded evidence for pathogenic variants in disease-associated genes in 4/6 singleton cases (66.6%), 1/1 multiplex family involving three affected siblings, and 3/4 (75%) quartet families. Both the analytical pipeline and the collaborative efforts between the diagnostic and research laboratories provided insights that allowed recent disease gene discoveries (PURA, TANGO2, EMC1, GNB5, ATAD3A, and MIPEP) and increased the number of novel genes, defined in this study as genes identified in more than one family (DHX30 and EBF3). An efficient genomics pipeline in which

  19. Detailed statistical contact angle analyses; "slow moving" drops on inclining silicon-oxide surfaces.

    PubMed

    Schmitt, M; Groß, K; Grub, J; Heib, F

    2015-06-01

    Contact angle determination by the sessile drop technique is essential for characterising surface properties in science and in industry. On every solid, different specific angles can be observed that are correlated with the advancing or the receding of the triple line. Different procedures and definitions for determining specific angles exist, and they are often neither comprehensible nor reproducible. Therefore, one of the most important tasks in this area is to build standard, reproducible and valid methods for determining advancing/receding contact angles. This contribution introduces novel techniques for analysing dynamic contact angle measurements (sessile drop) in detail, which are applicable to axisymmetric and non-axisymmetric drops. Not only the recently presented fit solution by sigmoid function and the independent analysis of the different parameters (inclination, contact angle, velocity of the triple point) but also the dependent analysis are explained in detail for the first time. These approaches yield contact angle data, and different routes to specific contact angles, that are independent of "user-skills" and the subjectivity of the operator. As an example, the motion behaviour of droplets on flat silicon-oxide surfaces after different surface treatments is dynamically measured by the sessile drop technique while the sample plate is inclined. The triple points, the inclination angles, the downhill (advancing motion) and the uphill angles (receding motion) obtained by high-precision drop shape analysis are statistically analysed, both independently and dependently. Because of the small distance covered in the dependent analysis (<0.4 mm) and the dominance of counted events with small velocity, the measurements are less influenced by motion dynamics, and the procedure can be called "slow moving" analysis. The presented procedures are especially sensitive in the range from static to "slow moving" dynamic contact angle determination. They are characterised by

  20. The heterogeneity statistic I(2) can be biased in small meta-analyses.

    PubMed

    von Hippel, Paul T

    2015-04-14

    Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I(2). We calculate the bias of I(2), focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer. We use Mathematica software to calculate the expectation and bias of I(2). I(2) has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I(2) will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I(2) can underestimate heterogeneity by an average of 28 percentage points. Biases of 12-28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I(2) estimate is 21 percent. The point estimate I(2) should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I(2).
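    The positive bias at zero heterogeneity can be reproduced by simulation. This Python sketch assumes 7 studies with equal sampling variances and no true heterogeneity, so Cochran's Q follows a chi-square distribution with 6 degrees of freedom. It is a Monte Carlo check, not the author's Mathematica calculation.

```python
import random
from statistics import fmean


def i_squared(effects, variances):
    """I^2 = max(0, (Q - df) / Q), with Cochran's Q from inverse-variance weights."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


def mean_i_squared_no_heterogeneity(k=7, reps=20000, seed=0):
    """Average I^2 over simulated meta-analyses of k studies, each with
    sampling variance 1 and no true heterogeneity (true I^2 = 0)."""
    rng = random.Random(seed)
    return fmean(
        i_squared([rng.gauss(0, 1) for _ in range(k)], [1.0] * k)
        for _ in range(reps)
    )


print(round(mean_i_squared_no_heterogeneity(), 3))
```

    Under these assumptions the average estimate comes out near 0.12 even though the true value is 0. This matches the 12-percentage-point overestimate quoted above.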

  1. Football goal distributions and extremal statistics

    NASA Astrophysics Data System (ADS)

    Greenhough, J.; Birch, P. C.; Chapman, S. C.; Rowlands, G.

    2002-12-01

    We analyse the distributions of the number of goals scored by home teams, away teams, and the total scored in the match, in domestic football games from 169 countries between 1999 and 2001. The probability density functions (PDFs) of goals scored are too heavy-tailed to be fitted over their entire ranges by Poisson or negative binomial distributions, which would be expected for uncorrelated processes. Log-normal distributions cannot include zero scores; here we find that the PDFs are consistent with those arising from extremal statistics. In addition, we show that it is sufficient to model English top division and FA Cup matches in the seasons of 1970/71-2000/01 with Poisson or negative binomial distributions, as reported in analyses of earlier seasons, and that these are not consistent with extremal statistics.
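    The contrast between Poisson and negative binomial models can be sketched by fitting both with the method of moments. This Python sketch uses a small hypothetical list of match goal totals, not the paper's data. It does not reproduce the extremal-statistics fit. It only shows how the negative binomial absorbs variance in excess of the mean, which the Poisson cannot.

```python
import math
from statistics import fmean, pvariance


def poisson_pmf(k, mu):
    """Poisson probability of k goals given mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def neg_binomial_pmf(k, r, p):
    """Negative binomial probability of k failures before the r-th success."""
    return math.exp(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1 - p)
    )


def fit_by_moments(goals):
    """Match the sample mean (Poisson) or mean and variance (negative binomial).
    The negative binomial fit requires variance > mean (overdispersion)."""
    mu, var = fmean(goals), pvariance(goals)
    if var <= mu:
        raise ValueError("no overdispersion: the Poisson model already suffices")
    r = mu * mu / (var - mu)
    p = r / (r + mu)
    return mu, var, r, p


goals = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7]  # hypothetical match totals
mu, var, r, p = fit_by_moments(goals)
n = len(goals)
for k in range(8):
    expected_pois = n * poisson_pmf(k, mu)
    expected_nb = n * neg_binomial_pmf(k, r, p)
    print(k, goals.count(k), round(expected_pois, 2), round(expected_nb, 2))
```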

  2. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I² statistic.

    PubMed

    Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R

    2016-12-01

    MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the 'NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the I² statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it I²_GX. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of I²_GX), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We
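
    The three quantities in this abstract can be sketched in a few lines of Python, assuming per-variant summary statistics: SNP-exposure estimates `beta_x` with standard errors `se_x`, and SNP-outcome estimates `beta_y` with standard errors `se_y` (all names hypothetical). The sketch uses first-order inverse-variance weights, orients each variant so its SNP-exposure estimate is positive before the MR-Egger fit, and computes I²_GX as the I² statistic of the oriented SNP-exposure estimates weighted by 1/se_x². It illustrates the definitions under those assumptions; it is not the authors' code and omits standard errors and the simulation extrapolation correction.

```python
def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger: weighted regression of SNP-outcome on SNP-exposure estimates, with intercept."""
    # Orient each variant so that its SNP-exposure estimate is positive.
    bx = [abs(gx) for gx in beta_x]
    by = [gy if gx >= 0 else -gy for gx, gy in zip(beta_x, beta_y)]
    w = [1.0 / s ** 2 for s in se_y]  # first-order inverse-variance weights
    mx, my = weighted_mean(bx, w), weighted_mean(by, w)
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx  # causal effect estimate allowing for directional pleiotropy
    intercept = my - slope * mx  # a non-zero intercept suggests directional pleiotropy
    return intercept, slope

def ivw(beta_x, beta_y, se_y):
    """IVW estimate: the same weighted regression forced through the origin."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    den = sum(wi * x * x for wi, x in zip(w, beta_x))
    return num / den

def i_squared_gx(beta_x, se_x):
    """I^2_GX: I^2 of the oriented SNP-exposure estimates, weighted by 1/se_x^2."""
    bx = [abs(g) for g in beta_x]
    w = [1.0 / s ** 2 for s in se_x]
    centre = weighted_mean(bx, w)
    q = sum(wi * (g - centre) ** 2 for wi, g in zip(w, bx))
    df = len(bx) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0  # truncated to [0, 1]
```

    Values of I²_GX close to 1 correspond to SNP-exposure estimates that are precise relative to their spread, which is the setting in which the NOME assumption is closest to holding.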

  3. Additional Measurements and Analyses of H₂¹⁷O and H₂¹⁸O

    NASA Astrophysics Data System (ADS)

    Pearson, John; Yu, Shanshan; Walters, Adam; Daly, Adam M.

    2015-06-01

    Historically the analysis of the spectrum of water has been a balance between the quality of the data set and the applicability of the Hamiltonian to a highly non-rigid molecule. Recently, a number of different non-rigid analysis approaches have successfully been applied to ¹⁶O water, resulting in a self-consistent set of transitions and energy levels to high J, which allowed the spectrum to be modeled to experimental precision. The data sets for ¹⁷O and ¹⁸O water were previously reviewed and many of the problematic measurements identified, but Hamiltonian modeling of the remaining data resulted in significantly poorer quality fits than that for the ¹⁶O parent. As a result, we have made additional microwave measurements and modeled the existing ¹⁷O and ¹⁸O data sets with an Euler series model. This effort has illuminated a number of additional problematic measurements in the previous data sets and has resulted in analyses of ¹⁷O and ¹⁸O water that are of similar quality to the ¹⁶O analysis. We report the new lines and the analyses, and make recommendations on the quality of the experimental data sets. S. Yu, J.C. Pearson, B.J. Drouin et al., J. Mol. Spectrosc. 279, 16-25 (2012); J. Tennyson, P.F. Bernath, L.R. Brown et al., J. Quant. Spectrosc. Rad. Trans. 117, 29-58 (2013); J. Tennyson, P.F. Bernath, L.R. Brown et al., J. Quant. Spectrosc. Rad. Trans. 110, 573-596 (2009); H.M. Pickett, J.C. Pearson, C.E. Miller, J. Mol. Spectrosc. 233, 174-179 (2005).

  4. Systematic Mapping and Statistical Analyses of Valley Landform and Vegetation Asymmetries Across Hydroclimatic Gradients

    NASA Astrophysics Data System (ADS)

    Poulos, M. J.; Pierce, J. L.; McNamara, J. P.; Flores, A. N.; Benner, S. G.

    2015-12-01

    Terrain aspect alters the spatial distribution of insolation across topography, driving eco-pedo-hydro-geomorphic feedbacks that can alter landform evolution and result in valley asymmetries for a suite of land surface characteristics (e.g., slope length and steepness, vegetation, soil properties, and drainage development). Asymmetric valleys serve as natural laboratories for studying how landscapes respond to climate perturbation. In the semi-arid montane granodioritic terrain of the Idaho batholith, Northern Rocky Mountains, USA, prior work indicates that reduced insolation on northern (pole-facing) aspects prolongs snowpack persistence and is associated with thicker, finer-grained soils that retain more water, prolong the growing season, support coniferous forest rather than sagebrush steppe ecosystems, stabilize slopes at steeper angles, and produce sparser drainage networks. We hypothesize that the primary drivers of valley asymmetry development are changes in the pedon-scale water balance that coalesce to alter catchment-scale runoff and drainage development, and ultimately cause the divide between north- and south-facing land surfaces to migrate northward. We explore this conceptual framework by coupling land surface analyses with statistical modeling to assess relationships and the relative importance of land surface characteristics. Throughout the Idaho batholith, we systematically mapped and tabulated various statistical measures of landforms, land cover, and hydroclimate within discrete valley segments (n ≈ 10,000). We developed a random-forest-based statistical model to predict valley slope asymmetry based upon numerous measures (n > 300) of landscape asymmetries. Preliminary results suggest that drainages are tightly coupled with hillslopes throughout the region, with drainage-network slope being one of the strongest predictors of land-surface-averaged slope asymmetry. When slope-related statistics are excluded, due to possible autocorrelation, valley

  5. Insights into Corona Formation through Statistical Analyses

    NASA Technical Reports Server (NTRS)

    Glaze, L. S.; Stofan, E. R.; Smrekar, S. E.; Baloga, S. M.

    2002-01-01

    Statistical analysis of an expanded database of coronae on Venus indicates that the diameter populations of Type 1 (with fracture annuli) and Type 2 (without fracture annuli) coronae are statistically indistinguishable, and therefore we have no basis for assuming different formation mechanisms. Analysis of the topography and diameters of coronae shows that coronae that are depressions, rimmed depressions, and domes tend to be significantly smaller than those that are plateaus, rimmed plateaus, or domes with surrounding rims. This is consistent with the model of Smrekar and Stofan and inconsistent with predictions of the spreading drop model of Koch and Manga. The diameter range for domes, the initial stage of corona formation, provides a broad constraint on the buoyancy of corona-forming plumes. Coronae are only slightly more likely to be topographically raised than depressions, with Type 1 coronae most frequently occurring as rimmed depressions and Type 2 coronae most frequently occurring with flat interiors and raised rims. Most Type 1 coronae are located along chasmata systems or fracture belts, while Type 2 coronae are found predominantly as isolated features in the plains. Coronae at hotspot rises tend to be significantly larger than coronae in other settings, consistent with a hotter upper mantle at hotspot rises and their active state.

  6. Authigenic oxide Neodymium Isotopic composition as a proxy of seawater: applying multivariate statistical analyses.

    NASA Astrophysics Data System (ADS)

    McKinley, C. C.; Scudder, R.; Thomas, D. J.

    2016-12-01

    The Neodymium Isotopic composition (Nd IC) of oxide coatings has been applied as a tracer of water mass composition and used to address fundamental questions about past ocean conditions. The leached authigenic oxide coating from marine sediment is widely assumed to reflect the dissolved trace metal composition of the bottom water interacting with sediment at the seafloor. However, recent studies have shown that readily reducible sediment components, in addition to trace metal fluxes from the pore water, are incorporated into the bottom water, influencing the trace metal composition of leached oxide coatings. This challenges the prevailing application of the authigenic oxide Nd IC as a proxy of seawater composition. Therefore, it is important to identify the component end-members that create sediments of different lithology and determine if, or how, they might contribute to the Nd IC of oxide coatings. To investigate lithologic influence on the results of sequential leaching, we selected two sites with complete bulk sediment statistical characterization. Site U1370 in the South Pacific Gyre is predominantly composed of rhyolite (~60%) and has a distinguishable (~10%) Fe-Mn oxyhydroxide component (Dunlea et al., 2015). Site 1149 near the Izu-Bonin Arc is predominantly composed of dispersed ash (~20-50%) and eolian dust from Asia (~50-80%) (Scudder et al., 2014). We perform a two-step leaching procedure: first, a one-hour leach in 14 mL of 0.02 M hydroxylamine hydrochloride (HH) in 20% acetic acid buffered to pH 4, targeting metals bound to the Fe- and Mn-oxide fractions; second, a 12-hour HH leach designed to remove any remaining oxides from the residual component. We analyze all three resulting fractions for a large suite of major, trace, and rare earth elements; a subset of the samples is also analyzed for Nd IC. We use multivariate statistical analyses of the resulting geochemical data to identify how each component of the sediment partitions across the sequential

  7. ADDITIONAL STRESS AND FRACTURE MECHANICS ANALYSES OF PRESSURIZED WATER REACTOR PRESSURE VESSEL NOZZLES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Walter, Matthew; Yin, Shengjun; Stevens, Gary

    2012-01-01

    In past years, the authors have undertaken various studies of nozzles in both boiling water reactors (BWRs) and pressurized water reactors (PWRs) located in the reactor pressure vessel (RPV) adjacent to the core beltline region. Those studies described stress and fracture mechanics analyses performed to assess various RPV nozzle geometries, which were selected based on their proximity to the core beltline region, i.e., those nozzle configurations that are located close enough to the core region such that they may receive sufficient fluence prior to end-of-life (EOL) to require evaluation of embrittlement as part of the RPV analyses associated with pressure-temperature (P-T) limits. In this paper, additional stress and fracture analyses are summarized that were performed for additional PWR nozzles with the following objectives: (1) to expand the population of PWR nozzle configurations evaluated, which was limited in the previous work to just two nozzles (one inlet and one outlet nozzle); (2) to model and understand differences in stress results obtained for an internal pressure load case using a two-dimensional (2-D) axisymmetric finite element model (FEM) vs. a three-dimensional (3-D) FEM for these PWR nozzles, in particular the ovalization (stress concentration) effect of two intersecting cylinders, which is typical of RPV nozzle configurations; and (3) to investigate the applicability of previously recommended linear elastic fracture mechanics (LEFM) hand solutions for calculating the Mode I stress intensity factor for a postulated nozzle corner crack under pressure loading for these PWR nozzles. These analyses were performed to further expand earlier work completed to support potential revision and refinement of Title 10 of the U.S. Code of Federal Regulations (CFR), Part 50, Appendix G, Fracture Toughness Requirements, and are intended to supplement similar evaluation of nozzles presented at the 2008, 2009, and 2011 Pressure Vessels and Piping (PVP

  8. A weighted U statistic for association analyses considering genetic heterogeneity.

    PubMed

    Wei, Changshuai; Elston, Robert C; Lu, Qing

    2016-07-20

    Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7 h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Correlating tephras and cryptotephras using glass compositional analyses and numerical and statistical methods: Review and evaluation

    NASA Astrophysics Data System (ADS)

    Lowe, David J.; Pearce, Nicholas J. G.; Jorgensen, Murray A.; Kuehn, Stephen C.; Tryon, Christian A.; Hayward, Chris L.

    2017-11-01

    preferably from the same analytical session, should be presented alongside new analytical data. In part 2 of the review, we describe, critically assess, and recommend ways in which tephras or cryptotephras can be correlated (in conjunction with other information) using numerical or statistical analyses of compositional data. Statistical methods provide a less subjective means of dealing with analytical data pertaining to tephra components (usually glass or crystals/phenocrysts) than heuristic alternatives. They enable a better understanding of relationships among the data from multiple viewpoints to be developed and help quantify the degree of uncertainty in establishing correlations. In common with other scientific hypothesis testing, it is easier to infer using such analysis that two or more tephras are different rather than the same. Adding stratigraphic, chronological, spatial, or palaeoenvironmental data (i.e. multiple criteria) is usually necessary and allows for more robust correlations to be made. A two-stage approach is useful, the first focussed on differences in the mean composition of samples, or their range, which can be visualised graphically via scatterplot matrices or bivariate plots coupled with the use of statistical tools such as distance measures, similarity coefficients, hierarchical cluster analysis (informed by distance measures or similarity or cophenetic coefficients), and principal components analysis (PCA). Some statistical methods (cluster analysis, discriminant analysis) are referred to as 'machine learning' in the computing literature. The second stage examines sample variance and the degree of compositional similarity so that sample equivalence or otherwise can be established on a statistical basis. This stage may involve discriminant function analysis (DFA), support vector machines (SVMs), canonical variates analysis (CVA), and ANOVA or MANOVA (or its two-sample special case, the Hotelling two-sample T² test). Randomization tests can be used
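
    The Hotelling two-sample T² test mentioned above compares mean composition vectors while accounting for covariance between oxides. A minimal pure-Python sketch follows, assuming each sample is a list of shard analyses (rows) over the same oxides (columns); the inputs are hypothetical. Because closed compositions summing to 100% give a singular covariance matrix, in practice one oxide would be dropped or a log-ratio transform applied first; the sketch does neither. Under the null hypothesis, and assuming multivariate normality with equal covariance matrices, the returned F value follows an F(p, n1 + n2 − p − 1) distribution.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, centre):
    """Sum of outer products of deviations from the sample mean."""
    p = len(centre)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - centre[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def hotelling_t2(sample_a, sample_b):
    """Two-sample Hotelling T^2 with pooled covariance, plus its F transformation."""
    n1, n2, p = len(sample_a), len(sample_b), len(sample_a[0])
    ma, mb = mean_vector(sample_a), mean_vector(sample_b)
    sa, sb = scatter_matrix(sample_a, ma), scatter_matrix(sample_b, mb)
    pooled = [[(sa[i][j] + sb[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    diff = [ma[j] - mb[j] for j in range(p)]
    x = solve(pooled, diff)  # pooled^{-1} @ diff without forming the inverse
    t2 = (n1 * n2 / (n1 + n2)) * sum(d * xi for d, xi in zip(diff, x))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f
```

    With a single oxide (p = 1) the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.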

  10. Additive interaction between heterogeneous environmental ...

    EPA Pesticide Factsheets

    BACKGROUND: Environmental exposures often occur in tandem; however, epidemiological research often focuses on singular exposures. Statistical interactions among broad, well-characterized environmental domains have not yet been evaluated in association with health. We address this gap by conducting a county-level cross-sectional analysis of interactions between Environmental Quality Index (EQI) domain indices on preterm birth in the United States from 2000-2005. METHODS: The EQI, a county-level index for the 2000-2005 time period, was constructed from five domain-specific indices (air, water, land, built and sociodemographic) using principal component analyses. County-level preterm birth rates (n=3141) were estimated using live births from the National Center for Health Statistics. Linear regression was used to estimate prevalence differences (PD) and 95% confidence intervals (CI) comparing worse environmental quality to better quality for each model for a) each individual domain main effect, b) the interaction contrast, and c) the two main effects plus interaction effect (i.e., the “net effect”) to show departure from additive interaction for all U.S. counties. Analyses were also performed for subgroupings by four urban/rural strata. RESULTS: We found the suggestion of antagonistic interactions but no synergism, along with several purely additive (i.e., no interaction) associations. In the non-stratified model, we observed antagonistic interac
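
    The three quantities estimated here (main-effect PDs, the interaction contrast, and the net effect) can be illustrated with a short sketch. It assumes each county has a preterm birth rate and two binary indicators, 1 for the worse-quality category of each EQI domain and 0 for the better; how the indices were dichotomized, and the function name, are hypothetical. Without covariates, the ordinary least squares coefficients of rate ~ A + B + A×B equal these cell-mean contrasts, so the sketch computes them directly and omits confidence intervals.

```python
from statistics import mean

def interaction_contrast(rates, worse_a, worse_b):
    """Additive-scale contrasts for two binary exposures (1 = worse quality, 0 = better).

    Without covariates, these equal the OLS coefficients of rate ~ A + B + A:B.
    Every one of the four exposure cells must contain at least one county.
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for y, a, b in zip(rates, worse_a, worse_b):
        cells[(a, b)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    pd_a = m[(1, 0)] - m[(0, 0)]  # PD for worse A when B is better
    pd_b = m[(0, 1)] - m[(0, 0)]  # PD for worse B when A is better
    ic = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]  # departure from additivity
    return {
        "pd_a": pd_a,
        "pd_b": pd_b,
        "interaction_contrast": ic,
        "net_effect": pd_a + pd_b + ic,  # equals m[(1, 1)] - m[(0, 0)]
    }
```

    When both main effects are harmful (positive PDs), a negative interaction contrast indicates an antagonistic, sub-additive joint effect and a positive one indicates synergism.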

  11. A Meta-Meta-Analysis: Empirical Review of Statistical Power, Type I Error Rates, Effect Sizes, and Model Selection of Meta-Analyses Published in Psychology

    ERIC Educational Resources Information Center

    Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.

    2010-01-01

    This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…

  12. Improved analyses using function datasets and statistical modeling

    Treesearch

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  13. Temporal scaling and spatial statistical analyses of groundwater level fluctuations

    NASA Astrophysics Data System (ADS)

    Sun, H.; Yuan, L., Sr.; Zhang, Y.

    2017-12-01

    Natural dynamics such as groundwater level fluctuations can exhibit multifractionality and/or multifractality, likely due to multi-scale aquifer heterogeneity and controlling factors, and their statistics require efficient quantification methods. This study explores multifractionality and non-Gaussian properties in groundwater dynamics, expressed by time series of daily level fluctuations at three wells located in the lower Mississippi valley, after removing the seasonal cycle in the temporal scaling and spatial statistical analysis. First, using time-scale multifractional analysis, a systematic statistical method is developed to analyze groundwater level fluctuations quantified by the time-scale local Hurst exponent (TS-LHE). Results show that the TS-LHE does not remain constant, implying that the fractal-scaling behavior changes with time and location. Hence, we can distinguish the potentially location-dependent scaling feature, which may characterize the hydrological dynamic system. Second, spatial statistical analysis shows that the increment of groundwater level fluctuations exhibits a heavy-tailed, non-Gaussian distribution, which can be better quantified by a Lévy stable distribution. Monte Carlo simulations of the fluctuation process also show that the linear fractional stable motion model can depict the transient dynamics (i.e., fractal non-Gaussian property) of groundwater level well, while fractional Brownian motion is inadequate to describe natural processes with anomalous dynamics. Analysis of temporal scaling and spatial statistics therefore may provide useful information and quantification to further understand the nature of complex dynamics in hydrology.
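
    The TS-LHE method is not described here in enough detail to reproduce. As a much simpler stand-in, the Python below estimates one Hurst exponent H from how the standard deviation of level increments grows with lag (σ(τ) ∝ τ^H for a self-similar process), and a hypothetical `windowed_hurst` helper applies it over sliding windows for a crude time-varying estimate. For ordinary Brownian motion H ≈ 0.5. The estimator relies on finite-variance increments, so it is only a rough diagnostic for heavy-tailed, Lévy-stable-like data such as those reported above.

```python
import math
import random

def increment_std(series, lag):
    """Sample standard deviation of the increments series[t + lag] - series[t]."""
    diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    m = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - m) ** 2 for d in diffs) / (len(diffs) - 1))

def hurst_from_increments(series, lags=(1, 2, 4, 8, 16, 32)):
    """Least-squares slope of log(increment std) against log(lag)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(increment_std(series, lag)) for lag in lags]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def windowed_hurst(series, window=2048, step=1024):
    """Hypothetical helper: global estimator applied to overlapping windows."""
    return [hurst_from_increments(series[s:s + window])
            for s in range(0, len(series) - window + 1, step)]
```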

  14. One-dimensional statistical parametric mapping in Python.

    PubMed

    Pataky, Todd C

    2012-01-01

    Statistical parametric mapping (SPM) is a topological methodology for detecting field changes in smooth n-dimensional continua. Many classes of biomechanical data are smooth and contained within discrete bounds and as such are well suited to SPM analyses. The current paper accompanies release of 'SPM1D', a free and open-source Python package for conducting SPM analyses on a set of registered 1D curves. Three example applications are presented: (i) kinematics, (ii) ground reaction forces and (iii) contact pressure distribution in probabilistic finite element modelling. In addition to offering a high-level interface to a variety of common statistical tests like t tests, regression and ANOVA, SPM1D also emphasises fundamental concepts of SPM theory through stand-alone example scripts. Source code and documentation are available at: www.tpataky.net/spm1d/.
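
    Conceptually, the first step of an SPM two-sample t test is to compute the t statistic separately at every node of the registered curves, producing a 1D statistical field, SPM{t}. The pure-Python sketch below computes only that field, assuming equal variances and curves of equal length; it does not use the SPM1D API and omits the random field theory inference that sets a critical threshold for the whole field.

```python
import math

def spm_t_field(group_a, group_b):
    """Two-sample t statistic (pooled variance) at every node of registered 1D curves."""
    n1, n2 = len(group_a), len(group_b)
    field = []
    for node in range(len(group_a[0])):
        a = [curve[node] for curve in group_a]
        b = [curve[node] for curve in group_b]
        ma, mb = sum(a) / n1, sum(b) / n2
        ss_a = sum((x - ma) ** 2 for x in a)
        ss_b = sum((x - mb) ** 2 for x in b)
        pooled_var = (ss_a + ss_b) / (n1 + n2 - 2)
        field.append((ma - mb) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2)))
    return field
```

    Treating the field node by node like this ignores the smoothness of the curves; accounting for that smoothness when deciding which supra-threshold regions are significant is what SPM inference adds.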

  15. Statistical Analyses for Probabilistic Assessments of the Reactor Pressure Vessel Structural Integrity: Building a Master Curve on an Extract of the 'Euro' Fracture Toughness Dataset, Controlling Statistical Uncertainty for Both Mono-Temperature and multi-temperature tests

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Josse, Florent; Lefebvre, Yannick; Todeschini, Patrick

    2006-07-01

    Assessing the structural integrity of a nuclear Reactor Pressure Vessel (RPV) subjected to pressurized-thermal-shock (PTS) transients is extremely important to safety. In addition to conventional deterministic calculations to confirm RPV integrity, Electricite de France (EDF) carries out probabilistic analyses. Probabilistic analyses are interesting because some key variables, albeit conventionally taken at conservative values, can be modeled more accurately through statistical variability. One variable which significantly affects RPV structural integrity assessment is cleavage fracture initiation toughness. The reference fracture toughness method currently in use at EDF is the RCCM and ASME Code lower-bound K_IC based on the indexing parameter RT_NDT. However, in order to quantify the toughness scatter for probabilistic analyses, the master curve method is being analyzed at present. Furthermore, the master curve method is a direct means of evaluating fracture toughness based on K_JC data. In the framework of the master curve investigation undertaken by EDF, this article deals with the following two statistical items: building a master curve from an extract of a fracture toughness dataset (from the European project 'Unified Reference Fracture Toughness Design curves for RPV Steels') and controlling statistical uncertainty for both mono-temperature and multi-temperature tests. Concerning the first point, master curve temperature dependence is empirical in nature. To determine the 'original' master curve, Wallin postulated that a unified description of fracture toughness temperature dependence for ferritic steels is possible, and used a large number of data corresponding to nuclear-grade pressure vessel steels and welds. Our working hypothesis is that some ferritic steels may behave in slightly different ways. Therefore we focused exclusively on the basic French reactor vessel metal of types A508 Class 3 and A 533 grade B Class 1, taking the
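
    The master curve method referred to here is standardized in ASTM E1921. As a sketch assuming that standard's usual constants (K_min = 20 MPa√m, a Weibull shape of 4, and a temperature coefficient of 0.019 °C⁻¹), the Python below evaluates the 1T master curve at a chosen cumulative failure probability and estimates the reference temperature T0 from a single-temperature data set in which every value is valid. It is not EDF's procedure; size adjustment, censoring, and the multi-temperature estimator are omitted.

```python
import math

KMIN = 20.0  # MPa*sqrt(m), fixed threshold toughness in the master curve formulation

def master_curve_kjc(temp_c, t0_c, probability=0.5):
    """K_Jc (MPa*sqrt(m), 1T size) at a given cumulative failure probability."""
    k0 = 31.0 + 77.0 * math.exp(0.019 * (temp_c - t0_c))  # Weibull scale parameter K0
    return KMIN + (k0 - KMIN) * math.log(1.0 / (1.0 - probability)) ** 0.25

def t0_single_temperature(temp_c, kjc_values):
    """Estimate T0 from 1T-equivalent K_Jc values measured at one temperature.

    Assumes all values are valid (no censored results), so r = len(kjc_values).
    """
    r = len(kjc_values)
    k0 = (sum((k - KMIN) ** 4 for k in kjc_values) / (r - 0.3068)) ** 0.25 + KMIN
    kjc_median = KMIN + (k0 - KMIN) * math.log(2.0) ** 0.25
    # Invert the median curve K_Jc(med) = 30 + 70 exp[0.019 (T - T0)]
    return temp_c - math.log((kjc_median - 30.0) / 70.0) / 0.019
```

    The median curve is commonly quoted as 30 + 70 exp[0.019(T − T0)], which equals 100 MPa√m at T = T0; the Weibull form above gives about 100.3 MPa√m there, so the two agree closely.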

  16. Additive effects in high-voltage layered-oxide cells: A statistics of mixtures approach

    DOE PAGES

    Sahore, Ritu; Peebles, Cameron; Abraham, Daniel P.; ...

    2017-07-20

    Li1.03(Ni0.5Mn0.3Co0.2)0.97O2 (NMC)-based coin cells containing the electrolyte additives vinylene carbonate (VC) and tris(trimethylsilyl)phosphite (TMSPi) in the range of 0-2 wt% were cycled between 3.0 and 4.4 V. The changes in capacity at rates of C/10 and C/1 and resistance at 60% state of charge were found to follow linear-with-time kinetic rate laws. Further, the C/10 capacity and resistance data were amenable to modeling by a statistics of mixtures approach. Applying physical meaning to the terms in the empirical models indicated that the interactions between the electrolyte and additives were not simple. For example, there were strong, synergistic interactions between VC and TMSPi affecting C/10 capacity loss, as expected, but there were other, more subtle interactions between the electrolyte components. In conclusion, the interactions between these components controlled the C/10 capacity decline and resistance increase.
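
    The abstract does not give the model form, but the classic statistics-of-mixtures model is Scheffé's canonical polynomial, which has no intercept because the component proportions sum to one. As an assumption-laden sketch, the Python below fits the quadratic Scheffé model y = Σ b_i x_i + Σ_{i<j} b_ij x_i x_j by ordinary least squares for hypothetical component proportions (for example, base electrolyte, VC, and TMSPi expressed as pseudo-component fractions). The authors' models may contain other terms.

```python
from itertools import combinations

def scheffe_terms(x):
    """Scheffé quadratic terms: linear blending terms x_i, then products x_i*x_j (no intercept)."""
    pairs = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return list(x) + pairs

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_scheffe_quadratic(proportions, responses):
    """Least-squares coefficients [b_1..b_q, b_12, b_13, ...] via the normal equations."""
    rows = [scheffe_terms(x) for x in proportions]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, responses)) for a in range(p)]
    return solve(xtx, xty)
```

    A non-zero b_ij signals non-additive blending between components i and j; whether a given sign is beneficial depends on whether the response is, for example, a capacity loss or a retained capacity.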

  17. Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

    NASA Astrophysics Data System (ADS)

    Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

    2013-04-01

    Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments that measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time-resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature, ranging from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method continues. In this context, this exploratory study investigates a statistical method to analyse time-resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data have been used from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films. In this workplace study, the background issue has been addressed through the near-field and far-field approaches, and several size-integrated and time-resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters: one measured at the source of the released particles while the other measured in parallel in the far field. The Bayesian probabilistic approach allows probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. The probability distributions derived from time-resolved data obtained at the source can be compared with those derived from the time-resolved data obtained in the far field, leading in a

  18. A multi-criteria evaluation system for marine litter pollution based on statistical analyses of OSPAR beach litter monitoring time series.

    PubMed

    Schulz, Marcus; Neumann, Daniel; Fleet, David M; Matthies, Michael

    2013-12-01

    Over the last decades, marine pollution with anthropogenic litter has become a major worldwide environmental concern. Standardized monitoring of litter since 2001 on 78 beaches selected within the framework of the Convention for the Protection of the Marine Environment of the North-East Atlantic (OSPAR) has been used to identify temporal trends of marine litter. Based on statistical analyses of this dataset a two-part multi-criteria evaluation system for beach litter pollution of the North-East Atlantic and the North Sea is proposed. Canonical correlation analyses, linear regression analyses, and non-parametric analyses of variance were used to identify different temporal trends. A classification of beaches was derived from cluster analyses and served to define different states of beach quality according to abundances of 17 input variables. The evaluation system is easily applicable and relies on the above-mentioned classification and on significant temporal trends implied by significant rank correlations. Copyright © 2013 Elsevier Ltd. All rights reserved.
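
    The evaluation system relies on significant rank correlations between litter abundance and time. As a minimal sketch, the Python below computes Spearman's ρ as the Pearson correlation of average ranks, which handles tied counts; the yearly counts used with it are hypothetical, and no significance test is included (that step would compare ρ against its null distribution for the number of survey years).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

    Correlating counts with survey year in this way detects monotonic trends without assuming linearity, which suits count data with occasional extreme years.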

  19. Statistical analyses and computational prediction of helical kinks in membrane proteins

    NASA Astrophysics Data System (ADS)

    Huang, Y.-H.; Chen, C.-M.

    2012-10-01

    We have carried out statistical analyses and computer simulations of helical kinks for TM helices in the PDBTM database. About 59% of 1562 TM helices showed a significant kink, and 38% of these kinks are associated with prolines within ±4 residues. Our analyses show that helical kinks are more populated in the central region of helices, particularly 1-3 residues away from the helix center. Among 1,053 helical kinks analyzed, 88% of kinks are bends (change in helix axis without loss of helical character) and 12% are disruptions (change in helix axis and loss of helical character). Proline residues tend to cause larger kink angles in helical bends, while this effect is not observed in helical disruptions. A further analysis of these kinked helices suggests that a kinked helix usually has 1-2 broken backbone hydrogen bonds with the corresponding N-O distance in the range of 4.2-8.7 Å, whose distribution is sharply peaked at 4.9 Å followed by an exponential decay with increasing distance. The main aims of this study are to understand the formation of helical kinks and to predict their structural features. Therefore, we further performed molecular dynamics (MD) simulations under four simulation scenarios to investigate kink formation in 37 kinked TM helices and 5 unkinked TM helices. The representative models of these kinked helices are predicted by a clustering algorithm, SPICKER, from numerous decoy structures possessing the above generic features of kinked helices. Our results show an accuracy of 95% in predicting the kink position of kinked TM helices and an error of less than 10° in the angle prediction for 71.4% of kinked helices. For unkinked helices, based on various structure similarity tests, our predicted models are highly consistent with their crystal structures. These results provide strong support for the validity of our method in predicting the structure of TM helices.

  20. Diurnal fluctuations in brain volume: Statistical analyses of MRI from large populations.

    PubMed

    Nakamura, Kunio; Brown, Robert A; Narayanan, Sridar; Collins, D Louis; Arnold, Douglas L

    2015-09-01

    We investigated fluctuations in brain volume throughout the day using statistical modeling of magnetic resonance imaging (MRI) from large populations. We applied fully automated image analysis software to measure the brain parenchymal fraction (BPF), defined as the ratio of the brain parenchymal volume and intracranial volume, thus accounting for variations in head size. The MRI data came from serial scans of multiple sclerosis (MS) patients in clinical trials (n=755, 3269 scans) and from subjects participating in the Alzheimer's Disease Neuroimaging Initiative (ADNI, n=834, 6114 scans). The percent change in BPF was modeled with a linear mixed effect (LME) model, and the model was applied separately to the MS and ADNI datasets. The LME model for the MS datasets included random subject effects (intercept and slope over time) and fixed effects for the time-of-day, time from the baseline scan, and trial, which accounted for trial-related effects (for example, different inclusion criteria and imaging protocol). The model for ADNI additionally included the demographics (baseline age, sex, subject type [normal, mild cognitive impairment, or Alzheimer's disease], and interaction between subject type and time from baseline). There was a statistically significant effect of time-of-day on the BPF change in MS clinical trial datasets (-0.180 per day, that is, 0.180% of intracranial volume, p=0.019) as well as the ADNI dataset (-0.438 per day, that is, 0.438% of intracranial volume, p<0.0001), showing that the brain volume is greater in the morning. Linearly correcting the BPF values with the time-of-day reduced the required sample size to detect a 25% treatment effect (80% power and 0.05 significance level) on change in brain volume from 2 time-points over a period of 1year by 2.6%. 
Our results have significant implications for future brain volumetric studies, suggesting that there is a potential acquisition time bias that should be randomized or statistically controlled to
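    As a minimal sketch of the quantities in this abstract (not the authors' image-analysis software or their LME model), the Python below computes BPF, removes a linear time-of-day effect given a per-day slope such as the reported -0.180, and evaluates the usual normal-approximation sample-size formula for a two-arm comparison of mean change. The function names, the 12:00 reference hour and the two-arm formula are assumptions for illustration.

```python
from statistics import NormalDist


def brain_parenchymal_fraction(parenchymal_ml: float, intracranial_ml: float) -> float:
    """BPF: brain parenchymal volume divided by intracranial volume."""
    return parenchymal_ml / intracranial_ml


def correct_for_time_of_day(bpf_change_pct: float, scan_hour: float,
                            slope_per_day: float, reference_hour: float = 12.0) -> float:
    """Remove a linear time-of-day effect from a percent BPF change.

    slope_per_day is a fitted fixed effect in percent BPF per day
    (one day = 24 h of clock time). The value is shifted to what the
    linear model predicts for a scan taken at reference_hour.
    """
    fraction_of_day = (scan_hour - reference_hour) / 24.0
    return bpf_change_pct - slope_per_day * fraction_of_day


def n_per_arm(sd_change: float, mean_change: float, effect: float = 0.25,
              alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation sample size per arm (two-sided test) to detect
    a relative reduction `effect` in the mean change."""
    z = NormalDist().inv_cdf
    delta = effect * abs(mean_change)
    return 2.0 * ((z(1 - alpha / 2) + z(power)) * sd_change / delta) ** 2
```

    Because n scales with the variance of the outcome, under this formula the reported 2.6% reduction in sample size would correspond to a standard deviation of BPF change roughly 1.3% smaller (0.974 is about 0.987 squared).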

  1. Reservoir zonation based on statistical analyses: A case study of the Nubian sandstone, Gulf of Suez, Egypt

    NASA Astrophysics Data System (ADS)

    El Sharawy, Mohamed S.; Gaafar, Gamal R.

    2016-12-01

    Both reservoir engineers and petrophysicists have long been concerned with dividing a reservoir into zones for engineering and petrophysical purposes, and several techniques and approaches have been introduced over the decades. Of these, statistical reservoir zonation, the stratigraphic modified Lorenz (SML) plot, and principal component and cluster analyses were applied to the Nubian sandstone reservoir (of Palaeozoic to Lower Cretaceous age), Gulf of Suez, Egypt, using data from five adjacent wells. The studied reservoir consists mainly of sandstone with intercalated shale layers whose thickness varies from one well to another. Permeability ranged from less than 1 md to more than 1000 md. The statistical reservoir zonation technique, based on core permeability, indicated that the cored interval of the studied reservoir can be divided into two zones. Zonation based on other reservoir properties (porosity, bulk density, acoustic impedance and interval transit time) also indicated two zones, but with obvious variation in separation depth and zone continuity. The SML plot indicated the presence of more than 9 flow units in the cored interval as well as a high degree of microscopic heterogeneity. Principal component and cluster analyses of well-log data (gamma ray, sonic, density and neutron), on the other hand, indicated that the whole reservoir can be divided into at least four electrofacies with noticeably different reservoir quality, as correlated with the measured permeability. The continuity or discontinuity of the reservoir zones can also be determined using this analysis.
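    An SML plot is commonly built by keeping samples in stratigraphic (depth) order and plotting cumulative flow capacity (permeability times thickness) against cumulative storage capacity (porosity times thickness); changes in slope mark candidate flow-unit boundaries. A minimal Python sketch of that calculation follows, with hypothetical function names and inputs. Picking the inflection points that define flow units is not automated here.

```python
from itertools import accumulate


def sml_curve(permeability_md, porosity, thickness_m):
    """Stratigraphic modified Lorenz plot points.

    Samples must already be in stratigraphic (depth) order. Returns
    (cumulative storage fraction, cumulative flow fraction) pairs,
    starting at (0, 0) and ending at (1, 1).
    """
    flow = [k * h for k, h in zip(permeability_md, thickness_m)]    # k*h
    storage = [phi * h for phi, h in zip(porosity, thickness_m)]    # phi*h
    cum_flow, cum_storage = list(accumulate(flow)), list(accumulate(storage))
    total_flow, total_storage = cum_flow[-1], cum_storage[-1]
    points = [(0.0, 0.0)]
    points += [(s / total_storage, f / total_flow)
               for s, f in zip(cum_storage, cum_flow)]
    return points


def segment_slopes(points):
    """Fraction of flow capacity per fraction of storage capacity for each
    segment (assumes every sample has non-zero porosity). Slopes well
    above 1 suggest high-flow intervals; slopes near 0 suggest baffles."""
    return [(f1 - f0) / (s1 - s0)
            for (s0, f0), (s1, f1) in zip(points, points[1:])]
```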

  2. Statistical methods for meta-analyses including information from studies without any events-add nothing to nothing and succeed nevertheless.

    PubMed

    Kuss, O

    2015-03-30

    Meta-analyses of rare events, especially those that include studies with no events in one ('single-zero') or even both ('double-zero') treatment arms, remain a statistical challenge. In the case of double-zero studies, researchers generally either delete these studies or apply continuity corrections to handle them. A number of arguments have been raised against both options, and statistical methods that use the information from double-zero studies without continuity corrections have been proposed. In this paper, we collect these methods and compare them by simulation. The simulation study tries to mirror real-life situations as closely as possible by deriving the true underlying parameters from empirical data on meta-analyses that were actually performed. It is shown that for each of the commonly used effect measures, valid statistical methods are available that use the information from double-zero studies without continuity corrections. Interestingly, all of them are true random-effects models, so even the current standard method for very sparse data recommended by the Cochrane Collaboration, the Yusuf-Peto odds ratio, can be improved upon. For actual analyses, we recommend beta-binomial regression methods to obtain summary estimates of the odds ratio, the relative risk, or the risk difference. Methods that ignore information from double-zero studies or use continuity corrections should no longer be used. We illustrate the situation with an example in which the original analysis ignores 35 double-zero studies, whereas a superior analysis discovers a clinically relevant advantage of off-pump surgery in coronary artery bypass grafting. Copyright © 2014 John Wiley & Sons, Ltd.
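    To make the problem concrete, the Python sketch below computes a single study's log odds ratio and Woolf variance with the conventional 0.5 continuity correction, one of the practices this paper argues should be abandoned. It does not implement the recommended beta-binomial regression; the function name and counts are hypothetical.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Log odds ratio and Woolf variance for one two-arm study.

    If any cell of the 2x2 table is zero, `correction` is added to every
    cell (the conventional continuity correction criticized in the paper).
    """
    a, b = events_t, n_t - events_t      # treatment arm: events, non-events
    c, d = events_c, n_c - events_c      # control arm: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance
```

    A double-zero study with 50 patients per arm comes out with a log odds ratio of exactly 0 and a variance of about 4.04, so the correction simply pins it to the null. With `correction=0` the same study raises `ZeroDivisionError`, which is why such studies are often dropped altogether.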

  3. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    PubMed

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-08-07

    There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis

  4. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. 
Conclusion Taverna can

  5. Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature

    PubMed Central

    Gadbury, Gary L.; Allison, David B.

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a “near significant p-value” to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called “fiddling”) in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000. PMID:23056287

  6. Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature.

    PubMed

    Gadbury, Gary L; Allison, David B

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a "near significant p-value" to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called "fiddling") in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000.

  7. Epidemiology Characteristics, Methodological Assessment and Reporting of Statistical Analysis of Network Meta-Analyses in the Field of Cancer

    PubMed Central

    Ge, Long; Tian, Jin-hui; Li, Xiu-xia; Song, Fujian; Li, Lun; Zhang, Jun; Li, Ge; Pei, Gai-qin; Qiu, Xia; Yang, Ke-hu

    2016-01-01

    Because of their methodological complexity, network meta-analyses (NMAs) may be more vulnerable to methodological risks than conventional pairwise meta-analyses. Our study aims to investigate the epidemiological characteristics, conduct of the literature search, methodological quality and reporting of the statistical analysis process of NMAs in the field of cancer, based on the PRISMA extension statement and a modified AMSTAR checklist. We identified and included 102 NMAs in the field of cancer. Sixty-one NMAs were conducted using a Bayesian framework. Of these, more than half (60.66%) did not report an assessment of convergence. Inconsistency was assessed in 27.87%. Assessment of heterogeneity was more common in traditional meta-analyses (42.62%) than in NMAs (6.56%). Most did not report an assessment of similarity (86.89%) and did not use the GRADE tool to assess the quality of evidence (95.08%). Forty-three NMAs were adjusted indirect comparisons; the methods used were described in 53.49% of them, only 4.65% described the details of handling multi-group trials, and 6.98% described the methods of similarity assessment. The median total AMSTAR score was 8.00 (IQR: 6.00–8.25). Methodological quality and reporting of statistical analysis did not substantially differ by selected general characteristics. Overall, the quality of NMAs in the field of cancer was generally acceptable. PMID:27848997
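    The abstract does not state which convergence diagnostics the reviewed Bayesian NMAs could have reported. One widely used option is the Gelman-Rubin potential scale reduction factor (R-hat), sketched below in its basic (non-split) form for one parameter. The 1.1 threshold in the comment is a common rule of thumb, not a criterion from this study.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for a single parameter.

    `chains` is a list of m equal-length lists of post-burn-in draws.
    Values near 1 suggest the chains have mixed; values clearly above 1
    (a common rule of thumb is > 1.1) suggest non-convergence.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    between = n * variance(chain_means)                # B: between-chain
    within = mean([variance(c) for c in chains])       # W: within-chain
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5
```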

  8. Topographic ERP analyses: a step-by-step tutorial review.

    PubMed

    Murray, Micah M; Brunet, Denis; Michel, Christoph M

    2008-06-01

    In this tutorial review, we detail both the rationale for and the implementation of a set of analyses of surface-recorded event-related potentials (ERPs) that use the reference-free spatial (i.e. topographic) information available from high-density electrode montages to provide statistical information concerning modulations in response strength, latency, and topography, both between and within experimental conditions. In these and other ways, these topographic analysis methods allow the experimenter to glean additional information and neurophysiologic interpretability beyond what is available from canonical waveform analyses. In this tutorial we use the example of somatosensory evoked potentials (SEPs) in response to stimulation of each hand to illustrate these points. For each step of these analyses, we provide the reader with both a conceptual and a mathematical description of how the analysis is carried out, what it yields, and how to interpret its statistical outcome. We show that these topographic analysis methods are intuitive and easy-to-use approaches that can remove much of the guesswork often confronting ERP researchers and can also help identify the information contained within high-density ERP datasets.
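    Two reference-free measures used in this style of topographic analysis are global field power (GFP), the spatial standard deviation of the average-referenced map, and global dissimilarity (DISS), computed between GFP-normalized maps. The Python sketch below follows those textbook definitions for single time points, with electrode values as plain lists; the randomization-based statistical testing described in the tutorial is not shown.

```python
import math


def average_reference(map_uv):
    """Re-reference a scalp map to the average across all electrodes."""
    m = sum(map_uv) / len(map_uv)
    return [u - m for u in map_uv]


def global_field_power(map_uv):
    """GFP: spatial standard deviation of the average-referenced map."""
    centered = average_reference(map_uv)
    return math.sqrt(sum(u * u for u in centered) / len(centered))


def global_dissimilarity(map_u, map_v):
    """DISS between two maps after average referencing and GFP scaling.

    Ranges from 0 (same topography) to 2 (inverted topography) and does
    not depend on overall response strength. Assumes neither map is flat.
    """
    gu, gv = global_field_power(map_u), global_field_power(map_v)
    u_n = [x / gu for x in average_reference(map_u)]
    v_n = [x / gv for x in average_reference(map_v)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_n, v_n)) / len(u_n))
```

    Scaling a map changes its GFP but not its DISS relative to the original, which is how these measures separate response strength from topography.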

  9. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2012-01-01

    The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the…

  10. Integration of statistical and physiological analyses of adaptation of near-isogenic barley lines.

    PubMed

    Romagosa, I; Fox, P N; García Del Moral, L F; Ramos, J M; García Del Moral, B; Roca de Togores, F; Molina-Cano, J L

    1993-08-01

    Seven near-isogenic barley lines, differing in three independent mutant genes, were grown in 15 environments in Spain. Genotype x environment interaction (G x E) for grain yield was examined with the Additive Main Effects and Multiplicative Interaction (AMMI) model. The results of this statistical analysis of multilocation yield data were compared with a morpho-physiological characterization of the lines at two sites (Molina-Cano et al. 1990). The first two principal component axes from the AMMI analysis were strongly associated with the morpho-physiological characters. The independent but parallel discrimination among genotypes reflects genetic differences and highlights the power of the AMMI analysis as a tool to investigate G x E. Characters that appear to be positively associated with yield in the germplasm under study could be identified for some environments.
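    The AMMI model combines an additive two-way analysis (grand mean, genotype and environment main effects) with a principal component analysis, via singular value decomposition, of the remaining G x E interaction. A minimal NumPy sketch follows, assuming a genotype-by-environment table of mean yields (for example 7 lines by 15 environments) and omitting the replication-based error term and significance tests of the axes.

```python
import numpy as np


def ammi(yield_matrix, n_axes=2):
    """Fit additive main effects, then decompose the interaction by SVD.

    Rows are genotypes, columns are environments. Returns the grand mean,
    genotype and environment effects, all singular values, and genotype
    and environment scores on the first `n_axes` interaction PC axes.
    """
    Y = np.asarray(yield_matrix, dtype=float)
    grand = Y.mean()
    g_eff = Y.mean(axis=1) - grand                  # genotype main effects
    e_eff = Y.mean(axis=0) - grand                  # environment main effects
    resid = Y - grand - g_eff[:, None] - e_eff[None, :]   # G x E residuals
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    root_s = np.sqrt(s[:n_axes])                    # split each singular value
    g_scores = U[:, :n_axes] * root_s               # genotype IPCA scores
    e_scores = Vt[:n_axes].T * root_s               # environment IPCA scores
    return grand, g_eff, e_eff, s, g_scores, e_scores
```

    The first two columns of `g_scores` would play the role of the two principal component axes that the abstract compares with morpho-physiological characters; a purely additive table yields singular values of zero.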

  11. Radiation Induced Chromatin Conformation Changes Analysed by Fluorescent Localization Microscopy, Statistical Physics, and Graph Theory

    PubMed Central

    Müller, Patrick; Hillebrandt, Sabina; Krufczik, Matthias; Bach, Margund; Kaufmann, Rainer; Hausmann, Michael; Heermann, Dieter W.

    2015-01-01

    It has been well established that the architecture of chromatin in cell nuclei is not random but functionally correlated. Chromatin damage caused by ionizing radiation triggers complex repair machinery. This is accompanied by local chromatin rearrangements and structural changes which may, for instance, improve the accessibility of damaged sites for repair protein complexes. Using stably transfected HeLa cells expressing either green fluorescent protein (GFP)-labelled histone H2B or yellow fluorescent protein (YFP)-labelled histone H2A, we investigated the positioning of individual histone proteins in cell nuclei by means of high-resolution localization microscopy (Spectral Position Determination Microscopy, SPDM). The cells were exposed to ionizing radiation of different doses, and aliquots were fixed after different repair times for SPDM imaging. In addition to the repair-dependent histone protein pattern, the positioning of antibodies specific for heterochromatin and euchromatin was separately recorded by SPDM. The present paper aims to provide a quantitative description of structural changes of chromatin after irradiation and during repair. It introduces a novel approach to analysing SPDM images by means of statistical physics and graph theory. The method is based on the calculation of radial distribution functions as well as edge length distributions for graphs defined by a triangulation of the marker positions. The results show that, across the whole cell nucleus, the different chromatin rearrangements detected by the fluorescent nucleosomal pattern average out. In contrast, heterochromatic regions alone show a relaxation after radiation exposure and re-condensation during repair, whereas euchromatin seemed to be unaffected or to behave in the opposite way. SPDM in combination with the analysis techniques applied allows the systematic elucidation of chromatin re-arrangements after irradiation and during repair, if selected sub-regions of nuclei are

  12. Radiation induced chromatin conformation changes analysed by fluorescent localization microscopy, statistical physics, and graph theory.

    PubMed

    Zhang, Yang; Máté, Gabriell; Müller, Patrick; Hillebrandt, Sabina; Krufczik, Matthias; Bach, Margund; Kaufmann, Rainer; Hausmann, Michael; Heermann, Dieter W

    2015-01-01

    It has been well established that the architecture of chromatin in cell nuclei is not random but functionally correlated. Chromatin damage caused by ionizing radiation triggers complex repair machinery. This is accompanied by local chromatin rearrangements and structural changes which may, for instance, improve the accessibility of damaged sites for repair protein complexes. Using stably transfected HeLa cells expressing either green fluorescent protein (GFP)-labelled histone H2B or yellow fluorescent protein (YFP)-labelled histone H2A, we investigated the positioning of individual histone proteins in cell nuclei by means of high-resolution localization microscopy (Spectral Position Determination Microscopy, SPDM). The cells were exposed to ionizing radiation of different doses, and aliquots were fixed after different repair times for SPDM imaging. In addition to the repair-dependent histone protein pattern, the positioning of antibodies specific for heterochromatin and euchromatin was separately recorded by SPDM. The present paper aims to provide a quantitative description of structural changes of chromatin after irradiation and during repair. It introduces a novel approach to analysing SPDM images by means of statistical physics and graph theory. The method is based on the calculation of radial distribution functions as well as edge length distributions for graphs defined by a triangulation of the marker positions. The results show that, across the whole cell nucleus, the different chromatin rearrangements detected by the fluorescent nucleosomal pattern average out. In contrast, heterochromatic regions alone show a relaxation after radiation exposure and re-condensation during repair, whereas euchromatin seemed to be unaffected or to behave in the opposite way. SPDM in combination with the analysis techniques applied allows the systematic elucidation of chromatin re-arrangements after irradiation and during repair, if selected sub-regions of nuclei are
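    A radial distribution function of the kind mentioned in these abstracts can be estimated from 2-D marker positions by histogramming pairwise distances and dividing by the count expected for a uniform random pattern of the same density. The NumPy sketch below does this without edge correction, so values are biased low as r approaches the size of the region; the coordinates, area and function name are assumed inputs, and the triangulation-based edge-length distributions (which could be built from `scipy.spatial.Delaunay`) are not shown.

```python
import numpy as np


def radial_distribution_2d(points, area, r_max, n_bins=20):
    """Estimate g(r) for 2-D points in a region of known area.

    No edge correction: shells extending outside the region are not
    compensated, so g(r) drifts below 1 at larger r. Uses O(n^2) memory.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]        # all pairwise offsets
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = dist[np.triu_indices(n, k=1)]          # each unordered pair once
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(pair_d, bins=edges)
    density = n / area
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    # A uniform pattern gives about n * density * shell_area / 2 unordered
    # pairs per shell, hence the factor of 2 below.
    g = 2.0 * counts / (n * density * shell_area)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g
```

    For uniformly scattered markers g(r) stays near 1; clustering (condensation) pushes g above 1 at short distances, while relaxation flattens that peak.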

  13. Sunspot activity and influenza pandemics: a statistical assessment of the purported association.

    PubMed

    Towers, S

    2017-10-01

    Since 1978, a series of papers has claimed to find a significant association between sunspot activity and the timing of influenza pandemics. This paper examines these analyses and attempts to recreate the three most recent, by Ertel (1994), Tapping et al. (2001), and Yeung (2006), all of which purported to find a significant relationship between sunspot numbers and pandemic influenza. As will be discussed, each analysis had errors in the data. In addition, each analysis made arbitrary selections or assumptions, and the authors did not assess the robustness of their analyses to changes in those assumptions. Replacing the arbitrary assumptions with other, equally valid, assumptions negates the claims of significance. Indeed, an arbitrary selection made in one of the analyses appears to have produced almost maximal apparent significance; changing it only slightly yields a null result. This analysis applies statistically rigorous methodology to examine the purported sunspot/pandemic link, using more statistically powerful un-binned analysis methods rather than relying on arbitrarily binned data. The analyses are repeated using both the Wolf and Group sunspot numbers. In all cases, no statistically significant evidence of any association was found. However, while the focus of this particular analysis was the purported relationship of influenza pandemics to sunspot activity, the faults found in the past analyses are common pitfalls: inattention to analysis reproducibility and robustness assessment are common problems in the sciences that are unfortunately not noted often enough in review.

  14. Cancer Statistics Animator

    Cancer.gov

    This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex. Provides access to incidence, mortality, and survival. Select the type of statistic, variables, format, and then extract the statistics in a delimited format for further analyses.

  15. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

    PubMed Central

    2010-01-01

    Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. When statistical methods are applied, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM), which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether the residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power, though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster, and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log-odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three cases. Conclusions The GAM permutation testing methods

  16. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

    PubMed

    Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

    2010-07-19

    A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. When statistical methods are applied, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM), which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether the residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power, though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster, and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log-odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three cases. The GAM permutation testing methods provide a regression

  17. Statistical approaches in published ophthalmic clinical science papers: a comparison to statistical practice two decades ago.

    PubMed

    Zhang, Harrison G; Ying, Gui-Shuang

    2018-02-09

    The aim of this study is to evaluate the current practice of statistical analysis of eye data in clinical science papers published in the British Journal of Ophthalmology (BJO) and to determine whether this practice has improved over the past two decades. All clinical science papers (n=125) published in BJO from January to June 2017 were reviewed for the statistical approaches used to analyse the primary ocular measure. We compared our findings with the results of a previous paper that reviewed BJO papers from 1995. Of 112 papers eligible for analysis, half of the studies analysed the data at an individual level because of the nature of the observation, 16 (14%) analysed data from one eye only, 36 (32%) analysed data from both eyes at the ocular level, one (1%) analysed the overall summary of ocular findings per individual and three (3%) used paired comparisons. Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, compared with 60 (90%) of 67 papers in 1995 (P=0.96). Among studies that analysed data from both eyes at the ocular level, 33 (92%) of 36 completely ignored the intereye correlation in 2017, compared with 16 (89%) of 18 in 1995 (P=0.40). A majority of studies did not analyse the data properly when data from both eyes were available. The practice of statistical analysis did not improve over the past two decades. Collaborative efforts should be made in the vision research community to improve the practice of statistical analysis for ocular data. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
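    One standard way to see why ignoring the intereye correlation matters is the design effect for clustered data: with two eyes per subject and intereye correlation rho, the variance of a mean is inflated by a factor of 1 + rho relative to treating every eye as independent. The Python sketch below computes the resulting effective sample size; the function names and correlation values are hypothetical.

```python
import math


def design_effect(intereye_corr: float, eyes_per_subject: int = 2) -> float:
    """Variance inflation for a mean over clusters of equally correlated eyes."""
    return 1.0 + (eyes_per_subject - 1) * intereye_corr


def effective_sample_size(n_subjects: int, intereye_corr: float,
                          eyes_per_subject: int = 2) -> float:
    """Number of independent observations the correlated eyes are worth."""
    n_eyes = n_subjects * eyes_per_subject
    return n_eyes / design_effect(intereye_corr, eyes_per_subject)


def naive_se_understatement(intereye_corr: float, eyes_per_subject: int = 2) -> float:
    """Factor by which the true standard error exceeds the naive one that
    treats every eye as an independent observation."""
    return math.sqrt(design_effect(intereye_corr, eyes_per_subject))
```

    For example, with 100 patients contributing 200 eyes and a hypothetical correlation of 0.5, the eyes carry about as much information as 133 independent observations, and standard errors that ignore the correlation are too small by a factor of about 1.22.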

  18. Statistics for X-chromosome associations.

    PubMed

    Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor

    2018-06-13

    In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.
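    As an illustration of the modelling choices discussed, the sketch below codes X-linked genotypes as additive allele dosages. Under an assumption of X inactivation, a hemizygous male is commonly coded like a homozygous female (0/2); without that assumption, males are coded 0/1. The design row also keeps a sex main effect, since the abstract reports that models without a sex effect can have inflated type I error. Function names and the 'M'/'F' encoding are hypothetical.

```python
def x_dosage(sex: str, minor_alleles: int, assume_x_inactivation: bool = True) -> int:
    """Code an X-linked genotype as an additive allele dosage.

    Females carry 0, 1 or 2 copies. Males are hemizygous (0 or 1 copy);
    with the X-inactivation assumption they are coded 0/2, otherwise 0/1.
    """
    if sex == "F":
        if minor_alleles not in (0, 1, 2):
            raise ValueError("females carry 0, 1 or 2 copies")
        return minor_alleles
    if sex == "M":
        if minor_alleles not in (0, 1):
            raise ValueError("males carry 0 or 1 copy")
        return 2 * minor_alleles if assume_x_inactivation else minor_alleles
    raise ValueError("sex must be 'M' or 'F'")


def design_row(sex: str, minor_alleles: int, assume_x_inactivation: bool = True):
    """Regression design row: intercept, male indicator, allele dosage."""
    return [1.0,
            1.0 if sex == "M" else 0.0,
            float(x_dosage(sex, minor_alleles, assume_x_inactivation))]
```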

  19. Time Series Expression Analyses Using RNA-seq: A Statistical Approach

    PubMed Central

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis as its cost continues to fall. It has considerable advantages over conventional technologies (microarrays) because it allows direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulation of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore time-dependent changes in gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model time-course RNA-seq data, including the statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021

  20. Time series expression analyses using RNA-seq: a statistical approach.

    PubMed

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis as its cost continues to fall. It has considerable advantages over conventional technologies (microarrays) because it allows direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulation of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore time-dependent changes in gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model time-course RNA-seq data, including the statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
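    An autoregressive time-lagged regression of order 1 regresses each value of a series on the one before it. The generic least-squares AR(1) fit below is a sketch for a single gene's expression series, assumed to be already normalized (for example log-transformed counts); it is not the specific implementation evaluated in the paper.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of y_t = a + b * y_{t-1}; returns (a, b).

    Needs at least three time points with non-constant lagged values.
    """
    x, y = series[:-1], series[1:]                  # lagged and current values
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                                   # lag-1 coefficient
    a = my - b * mx                                 # intercept
    return a, b
```

    A lag coefficient `b` near 0 means successive time points carry little information about each other, while values near 1 indicate strong temporal dependence of the kind these methods are meant to account for.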

  1. Contour plot assessment of existing meta-analyses confirms robust association of statin use and acute kidney injury risk.

    PubMed

    Chevance, Aurélie; Schuster, Tibor; Steele, Russell; Ternès, Nils; Platt, Robert W

    2015-10-01

    The robustness of an existing meta-analysis can justify decisions on whether to conduct an additional study addressing the same research question. We illustrate the graphical assessment of the potential impact of an additional study on an existing meta-analysis using published data on statin use and the risk of acute kidney injury. A previously proposed graphical augmentation approach is used to assess the sensitivity of the current test and heterogeneity statistics extracted from existing meta-analysis data. In addition, we extended the graphical augmentation approach to assess potential changes in the pooled effect estimate after updating a current meta-analysis, and applied the three graphical contour definitions to data from meta-analyses on statin use and acute kidney injury risk. In the example data considered, the pooled effect estimates and heterogeneity indices proved considerably robust to the addition of a future study. In further support of the association, for some previously inconclusive meta-analyses, an update with one further study might yield a statistically significant increase in kidney injury risk associated with higher statin exposure. The illustrated contour approach should become a standard tool for assessing the robustness of meta-analyses. It can guide decisions on whether to conduct additional studies addressing a relevant research question. Copyright © 2015 Elsevier Inc. All rights reserved.
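    Graphical augmentation asks how the pooled result would change if one more study were added. The sketch below is a minimal illustration, assuming a fixed-effect inverse-variance model on the log risk-ratio scale; the published contour approach may rely on other models, and all study values are hypothetical. Evaluating `augment` over a grid of hypothetical new-study effects and standard errors gives the values one would draw as contours.

```python
import math


def fixed_effect(estimates, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


def augment(estimates, ses, new_effect, new_se):
    """Pooled estimate, z statistic and I^2 after adding one hypothetical study."""
    pooled, pooled_se, _, i2 = fixed_effect(estimates + [new_effect], ses + [new_se])
    return pooled, pooled / pooled_se, i2


# Hypothetical log risk ratios and standard errors from an existing meta-analysis.
log_rr = [0.10, 0.25, 0.05, 0.18]
se = [0.12, 0.15, 0.10, 0.20]
for new_effect in (-0.4, 0.0, 0.4):
    pooled, z, i2 = augment(log_rr, se, new_effect, 0.15)
    print(f"new study {new_effect:+.1f}: pooled RR={math.exp(pooled):.2f}, "
          f"z={z:.2f}, I2={i2:.2f}")
```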

  2. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have been used extensively to analyze patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems that make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all the information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses about changes in species richness across localities, or changes in the abundance of one species or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved robust for standardizing all samples to a common size. Even a method as simple as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
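    To make the contrast concrete: the Shannon-Wiener index collapses a community's abundance vector into one number, whereas rarefaction standardises samples to a common size before comparing richness. The sketch below is a minimal illustration, assuming raw per-species counts. The rarefaction formula is Hurlbert's expected species count under sampling without replacement, E(S_n) = Σ_i [1 − C(N − N_i, n) / C(N, n)]; the example communities are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals
    drawn without replacement (Hurlbert's rarefaction)."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("subsample size must be between 1 and the sample total")
    all_ways = math.comb(total, n)
    # Sum over species of 1 - P(species i is absent from the subsample).
    return sum(1 - math.comb(total - c, n) / all_ways for c in counts if c > 0)


# Two hypothetical samples of different size.
site_a = [50, 30, 10, 5, 5]      # 100 individuals, 5 species
site_b = [20, 10, 5, 3, 1, 1]    # 40 individuals, 6 species
print(shannon_wiener(site_a), shannon_wiener(site_b))
# Compare expected richness at the size of the smaller sample.
n = min(sum(site_a), sum(site_b))
print(rarefied_richness(site_a, n), rarefied_richness(site_b, n))
```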

  3. [Statistics for statistics?--Thoughts about psychological tools].

    PubMed

    Berger, Uwe; Stöbel-Richter, Yve

    2007-12-01

    Statistical methods occupy a prominent place in psychologists' educational programs. Because this content is known to be difficult to understand and hard to learn, students fear it. Those who do not aspire to a research career at a university quickly forget the drilled content. Furthermore, because at first glance it does not seem to apply to work with patients and other target groups, methodological education as a whole has often been questioned. For many psychological practitioners, statistical education makes sense only as a way of commanding respect from other professions, namely physicians. In their own practice, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics deals with numbers, while psychotherapy deals with subjects. So is statistics an end in itself? In this article, we try to answer the question of whether and how statistical methods are represented in psychotherapeutic and psychological research. To this end, we analyzed 46 original articles from a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analysis methods were applied, 89 per cent of which were directly based on statistics. Being able to write and critically read original articles, the backbone of research, presupposes a high degree of statistical education. To ignore statistics means to ignore research and, at the very least, to expose one's own professional work to arbitrariness.

  4. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-tests and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated in different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
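    The contrast can be made concrete on one toy dataset. A classical two-sample t statistic asks whether group means differ in the sample at hand, whereas a cross-validated classifier asks how well held-out observations are predicted. The sketch below is a minimal illustration, assuming one measurement per subject (for example, the signal of one brain region). It uses a nearest-centroid classifier chosen only for brevity; the data and function names are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Classical in-sample test statistic for a difference in means."""
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def loo_nearest_centroid_accuracy(group_a, group_b):
    """Out-of-sample accuracy of a 1-D nearest-centroid classifier,
    estimated by leave-one-out cross-validation (each group needs >= 2 samples)."""
    samples = [(x, 0) for x in group_a] + [(x, 1) for x in group_b]
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]          # hold out sample i
        centroid_0 = statistics.mean(v for v, lab in train if lab == 0)
        centroid_1 = statistics.mean(v for v, lab in train if lab == 1)
        predicted = 0 if abs(x - centroid_0) <= abs(x - centroid_1) else 1
        correct += predicted == label
    return correct / len(samples)


# Hypothetical regional signal in patients and controls.
patients = [2.1, 2.6, 1.9, 2.4, 2.8]
controls = [1.8, 1.5, 2.0, 1.7, 1.6]
print(welch_t(patients, controls))
print(loo_nearest_centroid_accuracy(patients, controls))
```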

  5. Dispensing Processes Impact Apparent Biological Activity as Determined by Computational and Statistical Analyses

    PubMed Central

    Ekins, Sean; Olechno, Joe; Williams, Antony J.

    2013-01-01

    Dispensing and dilution processes may profoundly influence estimates of the biological activity of compounds. Published data show that Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing and via acoustic dispensing with direct dilution differ by orders of magnitude, with no correlation or consistent ranking between the datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generate a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein interactions and with automatically generated pharmacophores derived from these structural data. In contrast, the tip-based data suggest a pharmacophore with two hydrogen bond acceptors, one hydrogen bond donor and no hydrophobic features. This pharmacophore is inconsistent with the X-ray crystallographic studies and automatically generated pharmacophores. In short, traditional dispensing processes are another important source of error in high-throughput screening that affects computational and statistical analyses. These findings have far-reaching implications in biological research. PMID:23658723

  6. Evaluation of General Classes of Reliability Estimators Often Used in Statistical Analyses of Quasi-Experimental Designs

    NASA Astrophysics Data System (ADS)

    Saini, K. K.; Sehgal, R. K.; Sethi, B. L.

    2008-10-01

    In this paper, the major reliability estimators are analyzed and their comparative results are discussed. Their strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when the measure is an observation. However, it requires multiple raters or observers. As an alternative, one could look at the correlation between ratings made by the same single observer on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower than the parallel-forms and internal-consistency estimates because they involve measuring at different times or with different raters. This comparison matters because reliability estimates are often used in statistical analyses of quasi-experimental designs.
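    Two of these estimator families can be illustrated with standard textbook formulas: Cronbach's alpha for internal consistency and Cohen's kappa for agreement between two raters on categorical labels. The sketch below is minimal; the function names and example data are hypothetical, and the paper may evaluate different estimators.

```python
import statistics


def cronbach_alpha(items):
    """Internal consistency. `items` holds one list of scores per item,
    each list covering the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent totals
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on categorical labels."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data: three questionnaire items answered by five respondents ...
items = [[3, 4, 2, 5, 4], [2, 4, 3, 5, 4], [3, 5, 2, 4, 4]]
print(round(cronbach_alpha(items), 3))
# ... and two observers classifying the same six episodes.
obs1 = ["on", "on", "off", "on", "off", "off"]
obs2 = ["on", "off", "off", "on", "off", "on"]
print(round(cohen_kappa(obs1, obs2), 3))
```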

  7. Analysing recurrent hospitalizations in heart failure: a review of statistical methodology, with application to CHARM-Preserved.

    PubMed

    Rogers, Jennifer K; Pocock, Stuart J; McMurray, John J V; Granger, Christopher B; Michelson, Eric L; Östergren, Jan; Pfeffer, Marc A; Solomon, Scott D; Swedberg, Karl; Yusuf, Salim

    2014-01-01

    Heart failure is characterized by recurrent hospitalizations, but often only the first event is considered in clinical trial reports. In chronic diseases, such as heart failure, analysing all events gives a more complete picture of treatment benefit. We describe methods of analysing repeat hospitalizations, and illustrate their value in one major trial. The Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity (CHARM)-Preserved study compared candesartan with placebo in 3023 patients with heart failure and preserved systolic function. The heart failure hospitalization rates were 12.5 and 8.9 per 100 patient-years in the placebo and candesartan groups, respectively. The repeat hospitalizations were analysed using the Andersen-Gill, Poisson, and negative binomial methods. Death was incorporated into analyses by treating it as an additional event. The win ratio method and a method that jointly models hospitalizations and mortality were also considered. Using repeat events gave larger treatment benefits than time to first event analysis. The negative binomial method for the composite of recurrent heart failure hospitalizations and cardiovascular death gave a rate ratio of 0.75 [95% confidence interval (CI) 0.62-0.91, P = 0.003], whereas the hazard ratio for time to first heart failure hospitalization or cardiovascular death was 0.86 (95% CI 0.74-1.00, P = 0.050). In patients with preserved ejection fraction, candesartan reduces the rate of admissions for worsening heart failure to a greater extent than is apparent from analysing only first hospitalizations. Recurrent events should be routinely incorporated into the analysis of future clinical trials in heart failure. © 2013 The Authors. European Journal of Heart Failure © 2013 European Society of Cardiology.
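    The reported rates correspond to a crude rate ratio of 8.9 / 12.5 ≈ 0.71 (candesartan versus placebo). The sketch below shows the rate arithmetic with a Poisson-based Wald interval on the log scale. It is a simplification that assumes events occur independently; repeat hospitalizations in the same patient are usually over-dispersed, which is why negative binomial and Andersen-Gill models are also used. The event counts and follow-up times are hypothetical, not CHARM-Preserved data.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def rate_per_100_py(events, person_years):
    """Event rate per 100 patient-years of follow-up."""
    return 100.0 * events / person_years


def poisson_rate_ratio(events_trt, py_trt, events_ctl, py_ctl, z=Z95):
    """Rate ratio (treatment / control) with a Wald CI on the log scale,
    assuming Poisson-distributed counts (no over-dispersion)."""
    ratio = (events_trt / py_trt) / (events_ctl / py_ctl)
    se_log = math.sqrt(1.0 / events_trt + 1.0 / events_ctl)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts: 40 events in 500 patient-years vs 60 events in 500.
print(rate_per_100_py(40, 500), rate_per_100_py(60, 500))
print(poisson_rate_ratio(40, 500, 60, 500))
```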

  8. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  9. Statistical analysis of fNIRS data: a comprehensive review.

    PubMed

    Tak, Sungho; Ye, Jong Chul

    2014-01-15

    Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activity using changes in optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are heavily corrupted by measurement noise and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments in the statistical analysis of fNIRS signals, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and the statistical parametric mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed-effects model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
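    False discovery rate control across many fNIRS channels is commonly done with the Benjamini-Hochberg step-up procedure. The sketch below is minimal and assumes one p-value per channel has already been computed; the function name and inputs are hypothetical and not taken from the review.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.
    Returns a list of booleans: True where the hypothesis is rejected at FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank            # largest rank whose p-value meets its threshold
    rejected = [False] * m
    for i in order[:k_max]:         # reject every hypothesis up to that rank
        rejected[i] = True
    return rejected


# Hypothetical channel-wise p-values.
channel_p = [0.001, 0.020, 0.030, 0.300, 0.004, 0.700]
print(benjamini_hochberg(channel_p, q=0.05))
```

    Because the procedure steps up, a p-value can be rejected even when it misses its own threshold, as long as a larger p-value further down the ordering meets its threshold.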

  10. Statistical Analyses of Hydrophobic Interactions: A Mini-Review

    DOE PAGES

    Pratt, Lawrence R.; Chaudhari, Mangesh I.; Rempe, Susan B.

    2016-07-14

    This review focuses on the striking recent progress in solving for hydrophobic interactions between small inert molecules. We discuss several new insights. First, the inverse temperature phenomenology of hydrophobic interactions, i.e., the strengthening of hydrophobic bonds with increasing temperature, is decisively exhibited by hydrophobic interactions between atomic-scale hard-sphere solutes in water. Second, including the attractive interactions associated with atomic-size hydrophobic reference cases leads to substantial, nontrivial corrections to reference results for purely repulsive solutes. Hydrophobic bonds are weakened by adding solute dispersion forces to the treatment of reference cases. The classic statistical mechanical theory for those corrections is not accurate in this application, but molecular quasi-chemical theory shows promise. Lastly, because of the masking roles of excluded volume and attractive interactions, comparisons that do not discriminate between the different possibilities face an interpretive danger.

  11. Statistical issues in quality control of proteomic analyses: good experimental design and planning.

    PubMed

    Cairns, David A

    2011-03-01

    Quality control is becoming increasingly important in proteomic investigations as experiments become more multivariate and quantitative. Quality control applies to all stages of an investigation and statistics can play a key role. In this review, the role of statistical ideas in the design and planning of an investigation is described. This involves the design of unbiased experiments using key concepts from statistical experimental design, the understanding of the biological and analytical variation in a system using variance components analysis and the determination of a required sample size to perform a statistically powerful investigation. These concepts are described through simple examples and an example data set from a 2-D DIGE pilot experiment. Each of these concepts can prove useful in producing better and more reproducible data. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
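    Variance components feed directly into sample-size planning: biological variance is fixed per sample, while technical variance shrinks when technical replicates are averaged. The sketch below is minimal and assumes the standard normal-approximation formula for a two-sided, two-sample comparison of means with equal variances, n = 2(z_{1−α/2} + z_{1−β})² σ² / δ². It also assumes the variance components are known, for example on a log-ratio scale. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, var_biological, var_technical, technical_reps=1,
                alpha=0.05, power=0.80):
    """Biological samples per group needed to detect a mean difference `delta`
    (two-sided test, normal approximation). Technical variance is divided by
    the number of technical replicates averaged per sample."""
    variance = var_biological + var_technical / technical_reps
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical variance components on a log2 abundance-ratio scale.
print(n_per_group(delta=0.5, var_biological=0.10, var_technical=0.05))
print(n_per_group(delta=0.5, var_biological=0.10, var_technical=0.05,
                  technical_reps=3))
```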

  12. Reporting characteristics of meta-analyses in orthodontics: methodological assessment and statistical recommendations.

    PubMed

    Papageorgiou, Spyridon N; Papadopoulos, Moschos A; Athanasiou, Athanasios E

    2014-02-01

    Ideally, meta-analyses (MAs) should consolidate the characteristics of orthodontic research in order to produce an evidence-based answer. However, severe flaws are frequently observed in most of them. The aim of this study was to evaluate the statistical methods, the methodology, and the quality characteristics of orthodontic MAs and to assess changes in their reporting quality in recent years. Electronic databases were searched for MAs (with or without a proper systematic review) in the field of orthodontics, indexed up to 2011. The AMSTAR tool was used for quality assessment of the included articles. Data were analyzed with Student's t-test, one-way ANOVA, and generalized linear modelling. Risk ratios with 95% confidence intervals were calculated to represent changes over the years in the reporting of key items associated with quality. A total of 80 MAs with 1086 primary studies were included in this evaluation. Using the AMSTAR tool, 25 (31.3%) of the MAs were found to be of low quality, 37 (46.3%) of medium quality, and 18 (22.5%) of high quality. Specific characteristics such as explicit protocol definition, extensive searches, and quality assessment of included trials were associated with a higher AMSTAR score. Model selection and dealing with heterogeneity or publication bias were often problematic in the identified reviews. The number of published orthodontic MAs is constantly increasing, while their overall quality is considered to range from low to medium. Although the number of medium- and high-quality MAs seems to have risen lately, several other aspects need improvement to increase their overall quality.
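    A risk ratio comparing how often a key item is reported in two periods can be computed from simple counts. The sketch below is minimal and uses the usual Wald interval on the log scale, assuming a 2x2 table with no zero cells. The counts are hypothetical, and the study's own models may differ.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.959963984540054):
    """Risk ratio (group A / group B) with a Wald 95% CI on the log scale.
    Assumes no zero cells."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical: 24 of 40 recent MAs vs 12 of 40 earlier MAs report a protocol.
print(risk_ratio(24, 40, 12, 40))
```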

  13. Statistics for NAEG: past efforts, new results, and future plans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, R.O.; Simpson, J.C.; Kinnison, R.R.

    A brief review of Nevada Applied Ecology Group (NAEG) objectives is followed by a summary of past statistical analyses conducted by Pacific Northwest Laboratory for the NAEG. Estimates of spatial pattern of radionuclides and other statistical analyses at NS's 201, 219 and 221 are reviewed as background for new analyses presented in this paper. Suggested NAEG activities and statistical analyses needed for the projected termination date of NAEG studies in March 1986 are given.

  14. Natural time analysis and Tsallis non-additive entropy statistical mechanics.

    NASA Astrophysics Data System (ADS)

    Sarlis, N. V.; Skordas, E. S.; Varotsos, P.

    2016-12-01

    Upon analyzing seismic data in natural time and employing a sliding natural time window comprising a number of events that would occur in a few months, it has recently been uncovered [1] that a precursory Seismic Electric Signals activity [2] initiates almost simultaneously with the appearance of a minimum in the fluctuations of the order parameter of seismicity [3]. Such minima have been ascertained [4] during periods of the magnitude time series exhibiting long-range correlations [5] a few months before all earthquakes of magnitude 7.6 or larger that occurred in the entire Japanese area from 1 January 1984 to 11 March 2011 (the day of the M9 Tohoku-Oki earthquake). Before and after these minima, characteristic changes in the temporal correlations between earthquake magnitudes are observed that cannot be captured by Tsallis non-additive entropy statistical mechanics, within whose framework it has been suggested that kappa distributions arise [6]. Here, we extend the study of such minima to a large area that includes the Aegean Sea and its surroundings, which in general exhibits seismotectonics [7] different from those of the entire Japanese area.

    References
    [1] P. A. Varotsos et al., Tectonophysics 589 (2013) 116.
    [2] P. Varotsos and M. Lazaridou, Tectonophysics 188 (1991) 321.
    [3] P. A. Varotsos et al., Phys Rev E 72 (2005) 041103.
    [4] N. V. Sarlis et al., Proc Natl Acad Sci USA 110 (2013) 13734.
    [5] P. A. Varotsos, N. V. Sarlis, and E. S. Skordas, J Geophys Res Space Physics 119 (2014) 9192, doi:10.1002/2014JA0205800.
    [6] G. Livadiotis and D. J. McComas, J Geophys Res 114 (2009) A11105, doi:10.1029/2009JA014352.
    [7] S. Uyeda et al., Tectonophysics 304 (1999) 41.
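    In natural time analysis, the k-th of N events is assigned natural time χ_k = k/N and weight p_k = Q_k / Σ_n Q_n, where Q_k is proportional to the event's energy. The variance κ1 = Σ p_k χ_k² − (Σ p_k χ_k)² serves as the order parameter of seismicity. The sketch below is minimal and assumes Q_k ∝ 10^(1.5 M_k); the proportionality constant cancels in p_k. It uses a simple sliding window over a magnitude list and summarises fluctuations as β = σ(κ1)/μ(κ1). The windowing and function names are illustrative and are not the authors' exact procedure.

```python
import statistics


def kappa1(energies):
    """Variance of natural time chi_k = k/N, weighted by normalised energies."""
    n = len(energies)
    total = sum(energies)
    weights = [q / total for q in energies]
    chi = [k / n for k in range(1, n + 1)]
    mean = sum(p * c for p, c in zip(weights, chi))
    mean_sq = sum(p * c * c for p, c in zip(weights, chi))
    return mean_sq - mean ** 2


def sliding_kappa1(magnitudes, window):
    """kappa1 for every run of `window` consecutive events (energy ~ 10**(1.5 M))."""
    energies = [10 ** (1.5 * m) for m in magnitudes]
    return [kappa1(energies[i:i + window])
            for i in range(len(energies) - window + 1)]


def variability(kappa_values):
    """Fluctuation measure beta = std / mean of a set of kappa1 values."""
    return statistics.pstdev(kappa_values) / statistics.mean(kappa_values)


# Hypothetical magnitude sequence.
mags = [4.1, 4.5, 4.0, 5.2, 4.3, 4.8, 4.2, 4.6, 5.0, 4.4]
values = sliding_kappa1(mags, window=6)
print(values, variability(values))
```

    For N events of equal energy, κ1 = (N² − 1)/(12N²), which approaches 1/12 as N grows.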

  15. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

    PubMed Central

    Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

    2015-01-01

    Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481
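    Summary-statistics imputation of the kind evaluated above is commonly done by predicting an untyped variant's z-score from typed z-scores through a reference correlation (LD) matrix: z_u ≈ Σ_ut (Σ_tt + λI)⁻¹ z_t. The sketch below is minimal and assumes NumPy is available; the ridge term λ is an assumed regulariser. The `mix_ld` helper, which blends panels with fixed, user-chosen weights, is a hypothetical simplification. Adapt-Mix estimates local correlation structure from all available panels, and this sketch does not reproduce that estimation.

```python
import numpy as np


def mix_ld(panels, weights):
    """Convex combination of reference-panel LD (correlation) matrices.
    The weights are supplied by the user (hypothetical simplification)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(p, dtype=float) for wi, p in zip(w, panels))


def impute_z(ld, typed, untyped, z_typed, ridge=0.1):
    """Conditional-mean imputation of untyped z-scores from typed ones."""
    ld = np.asarray(ld, dtype=float)
    s_tt = ld[np.ix_(typed, typed)] + ridge * np.eye(len(typed))
    s_ut = ld[np.ix_(untyped, typed)]
    return s_ut @ np.linalg.solve(s_tt, np.asarray(z_typed, dtype=float))


# Hypothetical LD among three variants in two reference panels.
panel_1 = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.4], [0.3, 0.4, 1.0]]
panel_2 = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
ld = mix_ld([panel_1, panel_2], [0.7, 0.3])
print(impute_z(ld, typed=[0, 2], untyped=[1], z_typed=[4.0, 1.5]))
```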

  16. Statistical power of intervention analyses: simulation and empirical application to treated lumber prices

    Treesearch

    Jeffrey P. Prestemon

    2009-01-01

    Timber product markets are subject to large shocks deriving from natural disturbances and policy shifts. Statistical modeling of shocks is often done to assess their economic importance. In this article, I simulate the statistical power of univariate and bivariate methods of shock detection using time series intervention models. Simulations show that bivariate methods...

  17. Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

    PubMed

    Baqué, Michèle; Amendt, Jens

    2013-01-01

    Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets do not take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). In line with the Daubert standard and the need for improvements in forensic science, new statistical tools such as smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model that describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data, to assure their quality and to show the importance of checking them carefully prior to conducting the statistical tests, and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets, using generalised additive mixed models for the first time.

  18. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015.

    PubMed

    Petropoulou, Maria; Nikolakopoulou, Adriani; Veroniki, Areti-Angeliki; Rios, Patricia; Vafaei, Afshin; Zarin, Wasifa; Giannatsi, Myrsini; Sullivan, Shannon; Tricco, Andrea C; Chaimani, Anna; Egger, Matthias; Salanti, Georgia

    2017-02-01

    To assess the characteristics and core statistical methodology specific to network meta-analyses (NMAs) in clinical research articles. We searched MEDLINE, EMBASE, and the Cochrane Database of Systematic Reviews from inception until April 14, 2015, for NMAs of randomized controlled trials including at least four different interventions. Two reviewers independently screened potential studies, whereas data abstraction was performed by a single reviewer and verified by a second. A total of 456 NMAs, which included a median (interquartile range) of 21 (13-40) studies and 7 (5-9) treatment nodes, were assessed. A total of 125 NMAs (27%) were star networks; this proportion declined from 100% in 2005 to 19% in 2015 (P = 0.01 by test of trend). An increasing number of NMAs discussed transitivity or inconsistency (0% in 2005, 86% in 2015, P < 0.01) and 150 (45%) used appropriate methods to test for inconsistency (14% in 2006, 74% in 2015, P < 0.01). Heterogeneity was explored in 256 NMAs (56%), with no change over time (P = 0.10). All pairwise effects were reported in 234 NMAs (51%), with some increase over time (P = 0.02). The hierarchy of treatments was presented in 195 NMAs (43%); the probability of being best was the most commonly reported measure (137 NMAs, 70%), but use of the surface under the cumulative ranking curve (SUCRA) increased steeply (0% in 2005, 33% in 2015, P < 0.01). Many NMAs published in the medical literature have significant limitations in both the conduct and reporting of the statistical analysis and numerical results. The situation has, however, improved in recent years, in particular with respect to the evaluation of the underlying assumptions, but considerable room for further improvements remains. Copyright © 2016 Elsevier Inc. All rights reserved.
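    SUCRA summarises a treatment's whole rank distribution rather than only its probability of being best. For a treatments, it averages the cumulative rank probabilities: SUCRA_j = Σ_{b=1}^{a−1} cum_{j,b} / (a − 1). The sketch below is minimal and assumes a matrix of rank probabilities is already available from a fitted NMA (rows are treatments; column b is the probability of rank b + 1, with rank 1 best). The three-treatment example is hypothetical and shows how the two summaries can disagree.

```python
def sucra(rank_probs):
    """rank_probs[j][b] = P(treatment j has rank b + 1), with rank 1 = best.
    Returns one SUCRA value in [0, 1] per treatment."""
    a = len(rank_probs)
    if a < 2:
        raise ValueError("need at least two treatments")
    scores = []
    for probs in rank_probs:
        cumulative, area = 0.0, 0.0
        for b in range(a - 1):          # ranks 1 .. a-1
            cumulative += probs[b]      # P(rank <= b + 1)
            area += cumulative
        scores.append(area / (a - 1))
    return scores


# Hypothetical rank probabilities for treatments A, B, C (columns sum to 1).
probs = [[0.5, 0.0, 0.5],   # A: often best, often worst
         [0.2, 0.8, 0.0],   # B: rarely best, never worst
         [0.3, 0.2, 0.5]]   # C
p_best = [row[0] for row in probs]
print(p_best)        # A has the highest probability of being best
print(sucra(probs))  # but B has the highest SUCRA
```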

  19. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    PubMed

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. 
This makes it easy to join, extend, combine datasets and hence work collectively, but

  20. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    PubMed Central

    2010-01-01

    Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join, extend, combine datasets

  1. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  2. The use and misuse of statistical analyses. [in geophysics and space physics]

    NASA Technical Reports Server (NTRS)

    Reiff, P. H.

    1983-01-01

    The statistical techniques most often used in space physics include Fourier analysis, linear correlation, auto- and cross-correlation, power spectral density, and superposed epoch analysis. Tests are presented that can evaluate the significance of the results obtained with each of these. Data presented without some form of error analysis are frequently useless, since they offer no way of assessing whether a bump on a spectrum or in a superposed epoch analysis is real or merely a statistical fluctuation. For many published linear correlations, for instance, the uncertainties in the intercept and slope are not given, so the significance of the fitted parameters cannot be assessed.
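    The intercept and slope uncertainties mentioned above follow directly from ordinary least squares. This is a minimal sketch that assumes independent errors with constant variance, an assumption that autocorrelated geophysical time series often violate. The function name is illustrative.

```python
import math


def linear_fit_with_errors(x, y):
    """Ordinary least-squares fit y = a + b*x with standard errors.

    Returns (a, b, se_a, se_b). Assumes independent, constant-variance errors.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate residual variance")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return a, b, se_a, se_b


print(linear_fit_with_errors([0, 1, 2, 3], [0, 1, 1, 2]))
```

    You can then report a fitted slope as b ± se_b and compare b / se_b against a t distribution with n - 2 degrees of freedom.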

  3. Illustrating the practice of statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Christina A; Hamada, Michael S

    2009-01-01

    The practice of statistics involves analyzing data and planning data collection schemes to answer scientific questions. Issues often arise with the data that must be dealt with and can lead to new procedures. In analyzing data, these issues can sometimes be addressed through the statistical models that are developed. Simulation can also be helpful in evaluating a new procedure. Moreover, simulation coupled with optimization can be used to plan a data collection scheme. The practice of statistics as just described is much more than just using a statistical package. In analyzing the data, it involves understanding the scientific problem and incorporating the scientist's knowledge. In modeling the data, it involves understanding how the data were collected and accounting for limitations of the data where possible. Moreover, the modeling is likely to be iterative, considering a series of models and evaluating the fit of these models. Designing a data collection scheme involves understanding the scientist's goal and staying within his/her budget in terms of time and the available resources. Consequently, a practicing statistician is faced with such tasks and requires skills and tools to do them quickly. We have written this article for students to provide a glimpse of the practice of statistics. To illustrate the practice of statistics, we consider a problem motivated by some precipitation data that our relative, Masaru Hamada, collected some years ago. We describe his rain gauge observational study in Section 2. We describe modeling and an initial analysis of the precipitation data in Section 3. In Section 4, we consider alternative analyses that address potential issues with the precipitation data. In Section 5, we consider the impact of incorporating additional information. We design a data collection scheme to illustrate the use of simulation and optimization in Section 6. We conclude this article in Section 7 with a discussion.

  4. Errors in statistical decision making Chapter 2 in Applied Statistics in Agricultural, Biological, and Environmental Sciences

    USDA-ARS's Scientific Manuscript database

    Agronomic and environmental research experiments result in data that are analyzed using statistical methods. These data are unavoidably accompanied by uncertainty. Decisions about hypotheses based on statistical analyses of these data are therefore subject to error. This error is of three types,...

  5. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    PubMed

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation achieves a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false-positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power of 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci.
Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian

  6. Evaluation and application of summary statistic imputation to discover new height-associated loci

    PubMed Central

    2018-01-01

    As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation achieves a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false-positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power of 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci.
Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian
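    Summary statistics imputation is commonly formulated as a conditional expectation under a multivariate normal model for z-scores. The z-score of an untyped variant is predicted from the z-scores of typed variants and from reference-panel linkage disequilibrium (LD). The sketch below shows that widely used basic form, with a ridge term `lam` added to the LD matrix. It does not reproduce the variable-sample-size extension described above. The function names and the `lam` default are illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def impute_z(z_typed, ld_typed, ld_target_typed, lam=0.1):
    """Impute the z-score of an untyped variant from typed-variant z-scores.

    z_hat = c^T (R + lam*I)^{-1} z, where R is the LD (correlation) matrix among
    typed variants and c holds the LD between the target and each typed variant.
    """
    n = len(z_typed)
    r_reg = [[ld_typed[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    w = solve(r_reg, z_typed)  # w = (R + lam*I)^{-1} z
    return sum(ci * wi for ci, wi in zip(ld_target_typed, w))


# Illustrative LD values and z-scores for three typed variants.
ld = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]]
print(impute_z([4.0, 2.5, 0.8], ld, [0.9, 0.5, 0.1]))
```

    The ridge term keeps the LD matrix invertible when typed variants are in near-perfect LD. Larger values shrink imputed z-scores toward zero.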

  7. Informal Statistics Help Desk

    NASA Technical Reports Server (NTRS)

    Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.

    2017-01-01

    Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.

  8. Selected Streamflow Statistics and Regression Equations for Predicting Statistics at Stream Locations in Monroe County, Pennsylvania

    USGS Publications Warehouse

    Thompson, Ronald E.; Hoffman, Scott A.

    2006-01-01

    A suite of 28 streamflow statistics, ranging from extreme low to high flows, was computed for 17 continuous-record streamflow-gaging stations and predicted for 20 partial-record stations in Monroe County and contiguous counties in northeastern Pennsylvania. The predicted statistics for the partial-record stations were based on regression analyses relating intermittent flow measurements made at the partial-record stations indexed to concurrent daily mean flows at continuous-record stations during base-flow conditions. The same statistics also were predicted for 134 ungaged stream locations in Monroe County on the basis of regression analyses relating the statistics to GIS-determined basin characteristics for the continuous-record station drainage areas. The methodology used to develop the regression equations was originally developed for estimating low-flow frequencies. This study and a companion study found that the methodology also has potential application for predicting intermediate- and high-flow statistics. The statistics included mean monthly flows, mean annual flow, 7-day low flows for three recurrence intervals, nine flow durations, mean annual base flow, and annual mean base flows for two recurrence intervals. Low standard errors of prediction and high coefficients of determination (R²) indicated good results in using the regression equations to predict the statistics. Regression equations for the larger flow statistics tended to have lower standard errors of prediction and higher coefficients of determination (R²) than equations for the smaller flow statistics. The report discusses the methodologies used in determining the statistics and the limitations of the statistics and of the equations used to predict them. Caution is advised when using the predicted statistics for small drainage areas.
Study results constitute input needed by water-resource managers in Monroe County for planning purposes and evaluation

  9. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power

    PubMed Central

    Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated that direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%–155% (dependent on selection criteria) to achieve statistical power equivalent to that obtained with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%–71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide the planning of experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power. PMID:28479943

  10. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power.

    PubMed

    Miciak, Jeremy; Taylor, W Pat; Stuebing, Karla K; Fletcher, Jack M; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated that direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%-155% (dependent on selection criteria) to achieve statistical power equivalent to that obtained with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%-71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide the planning of experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power.
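    A small simulation can illustrate the direct-truncation scenario. This sketch assumes standard bivariate normal pretest and posttest scores with correlation `rho` and selects the lowest-scoring quarter on the pretest. It also uses the common approximation that the sample size needed with a covariate scales with (1 - r^2). All names and default values are illustrative, and the indirect range-restriction scenario is not modeled.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_direct_restriction(rho=0.7, n=20000, cutoff=-0.674, seed=1):
    """Correlation in the full sample and after keeping only pretest < cutoff.

    cutoff = -0.674 keeps roughly the lowest 25% of a standard normal pretest.
    """
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        pre.append(x)
        post.append(rho * x + math.sqrt(1 - rho ** 2) * e)
    kept = [(x, y) for x, y in zip(pre, post) if x < cutoff]
    r_restricted = pearson([k[0] for k in kept], [k[1] for k in kept])
    return pearson(pre, post), r_restricted


def ancova_sample_size_ratio(r_full, r_restricted):
    """Approximate ratio of required sample sizes, assuming n is proportional to (1 - r^2)."""
    return (1 - r_restricted ** 2) / (1 - r_full ** 2)


r_full, r_res = simulate_direct_restriction()
print(round(r_full, 3), round(r_res, 3), round(ancova_sample_size_ratio(r_full, r_res), 2))
```

    Selecting on the pretest lowers the pretest-posttest correlation in the retained sample. As a result, the ratio returned by `ancova_sample_size_ratio` exceeds 1, which matches the direction the abstract describes.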

  11. New software for statistical analysis of Cambridge Structural Database data

    PubMed Central

    Sykes, Richard A.; McCabe, Patrick; Allen, Frank H.; Battle, Gary M.; Bruno, Ian J.; Wood, Peter A.

    2011-01-01

    A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments. PMID:22477784

  12. Statistical power analysis in wildlife research

    USGS Publications Warehouse

    Steidl, R.J.; Hayes, J.P.

    1997-01-01

    Statistical power analysis can be used to increase the efficiency of research efforts and to clarify research results. Power analysis is most valuable in the design or planning phases of research efforts. Such prospective (a priori) power analyses can be used to guide research design and to estimate the number of samples necessary to achieve a high probability of detecting biologically significant effects. Retrospective (a posteriori) power analysis has been advocated as a method to increase information about hypothesis tests that were not rejected. However, estimating power for tests of null hypotheses that were not rejected with the effect size observed in the study is incorrect; these power estimates will always be ≤0.50 when bias adjusted and have no relation to true power. Therefore, retrospective power estimates based on the observed effect size for hypothesis tests that were not rejected are misleading; retrospective power estimates are only meaningful when based on effect sizes other than the observed effect size, such as those effect sizes hypothesized to be biologically significant. Retrospective power analysis can be used effectively to estimate the number of samples or effect size that would have been necessary for a completed study to have rejected a specific null hypothesis. Simply presenting confidence intervals can provide additional information about null hypotheses that were not rejected, including information about the size of the true effect and whether or not there is adequate evidence to 'accept' a null hypothesis as true.
We suggest that (1) statistical power analyses be routinely incorporated into research planning efforts to increase their efficiency, (2) confidence intervals be used in lieu of retrospective power analyses for null hypotheses that were not rejected to assess the likely size of the true effect, (3) minimum biologically significant effect sizes be used for all power analyses, and (4) if retrospective power estimates are to
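    The claim that observed-effect ("post hoc") power is uninformative can be checked in the simplest case, a one-sided z-test. The sketch assumes known variance and a one-sided test at level `alpha`, and it omits the bias adjustment mentioned above. The function names are illustrative. Observed power is contrasted with power and a priori sample size for a hypothesized effect size, in line with recommendations (1) and (3).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def observed_power_one_sided(z_obs, alpha=0.05):
    """Retrospective power that treats the observed effect as the true effect.

    For a one-sided z-test, power at a true standardized effect delta is
    Phi(delta - z_crit). Plugging in delta = z_obs gives Phi(z_obs - z_crit),
    which is below 0.5 whenever the test was not rejected (z_obs < z_crit).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_obs - z_crit)


def power_for_effect(effect, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z-test against a hypothesized effect."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(effect * math.sqrt(n) / sigma - z_crit)


def n_for_power(effect, sigma, power=0.8, alpha=0.05):
    """A priori sample size n = ((z_{1-alpha} + z_{power}) * sigma / effect)^2, rounded up."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha)
    z_b = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_a + z_b) * sigma / effect) ** 2)


print(observed_power_one_sided(1.2), n_for_power(0.5, 1.0))
```

    For any non-rejected test, where `z_obs` is below the critical value, observed power is below 0.5 and tells you nothing beyond the p-value. By contrast, `n_for_power` answers the planning question for a chosen, biologically meaningful effect size.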

  13. Views of medical students: what, when and how do they want statistics taught?

    PubMed

    Fielding, S; Poobalan, A; Prescott, G J; Marais, D; Aucott, L

    2015-11-01

    A key skill for a practising clinician is being able to do research, understand the statistical analyses and interpret results in the medical literature. Basic statistics has become essential within medical education, but when, what and in which format is uncertain. To inform curriculum design/development we undertook a quantitative survey of fifth-year medical students and followed them up with a series of focus groups to obtain their opinions as to what statistics teaching they want, when and how. A total of 145 students undertook the survey and five focus groups were held with between 3 and 9 participants each. Previous statistical training varied; students recognised that their knowledge was inadequate and were keen to see additional training implemented. Students were aware of the importance of statistics to their future careers, but apprehensive about learning. Face-to-face teaching supported by online resources was popular. Focus groups indicated the need for statistical training early in their degree and highlighted their lack of confidence and inconsistencies in support. The study found that the students see the importance of statistics training in the medical curriculum but that timing and mode of delivery are key. The findings have informed the design of a new course to be implemented in the third undergraduate year. Teaching will be based around published studies, aiming to equip students with the basics required, with additional resources available through a virtual learning environment. © The Author(s) 2015.

  14. Statistical analysis and interpretation of prenatal diagnostic imaging studies, Part 2: descriptive and inferential statistical methods.

    PubMed

    Tuuli, Methodius G; Odibo, Anthony O

    2011-08-01

    The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.

  15. 2017 Annual Disability Statistics Supplement

    ERIC Educational Resources Information Center

    Lauer, E. A; Houtenville, A. J.

    2018-01-01

    The "Annual Disability Statistics Supplement" is a companion report to the "Annual Disability Statistics Compendium." The "Supplement" presents statistics on the same topics as the "Compendium," with additional categorizations by demographic characteristics including age, gender and race/ethnicity. In…

  16. Limitations of Using Microsoft Excel Version 2016 (MS Excel 2016) for Statistical Analysis for Medical Research.

    PubMed

    Tanavalee, Chotetawan; Luksanapruksa, Panya; Singhatanadgige, Weerasak

    2016-06-01

    Microsoft Excel (MS Excel) is a commonly used program for data collection and statistical analysis in biomedical research. However, this program has many limitations, including fewer functions that can be used for analysis and a limited number of total cells compared with dedicated statistical programs. MS Excel cannot complete analyses with blank cells, and cells must be selected manually for analysis. In addition, it requires multiple steps of data transformation and formulas to plot survival analysis graphs, among others. The Megastat add-on program, which will be supported by MS Excel 2016 soon, would eliminate some limitations of using statistical formulas within MS Excel.

  17. A Genome-Wide Association Analysis Reveals Epistatic Cancellation of Additive Genetic Variance for Root Length in Arabidopsis thaliana.

    PubMed

    Lachowiec, Jennifer; Shen, Xia; Queitsch, Christine; Carlborg, Örjan

    2015-01-01

    Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. Here, we examined the genetics of Arabidopsis thaliana root length and found that the genomic narrow-sense heritability for this trait in the examined population was statistically zero. The low amount of additive genetic variance that could be captured by the genome-wide genotypes likely explains why no associations to root length could be found using standard additive-model-based genome-wide association (GWA) approaches. However, as the broad-sense heritability for root length was significantly larger, and primarily due to epistasis, we also performed an epistatic GWA analysis to map loci contributing to the epistatic genetic variance. Four interacting pairs of loci were revealed, involving seven chromosomal loci that passed a standard multiple-testing corrected significance threshold. The genotype-phenotype maps for these pairs revealed epistasis that cancelled out the additive genetic variance, explaining why these loci were not detected in the additive GWA analysis. Small population sizes, such as in our experiment, increase the risk of identifying false epistatic interactions due to testing for associations with very large numbers of multi-marker genotypes in few phenotyped individuals. Therefore, we estimated the false-positive risk using a new statistical approach that suggested half of the associated pairs to be true positive associations. Our experimental evaluation of candidate genes within the seven associated loci suggests that this estimate is conservative; we identified functional candidate genes that affected root development in four loci that were part of three of the pairs. The statistical epistatic analyses were thus indispensable for confirming known, and identifying new, candidate genes for root length in this population of wild-collected A. thaliana accessions. 
We also illustrate how epistatic cancellation of the additive genetic variance
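    A toy two-locus genotype-phenotype map shows how epistatic cancellation works. The sketch assumes two genotype classes per locus, coded 0 and 1 as for largely homozygous accessions, and equal genotype frequencies. The maps and function name are illustrative, not estimates from the study. For each locus the code computes the marginal (additive) effect, which is what a standard additive GWA model tests, along with the interaction contrast.

```python
from itertools import product


def marginal_effects(gp_map, freqs=None):
    """Marginal effect of each locus and the interaction in a two-locus map.

    gp_map: {(g1, g2): phenotype} with each g in {0, 1}.
    freqs: {(g1, g2): frequency}; defaults to equal frequencies.
    Returns (effect1, effect2, interaction), where each effect is
    mean phenotype for genotype 1 minus mean phenotype for genotype 0.
    """
    if freqs is None:
        freqs = {g: 0.25 for g in product((0, 1), repeat=2)}

    def mean_where(locus, allele):
        keys = [g for g in gp_map if g[locus] == allele]
        w = sum(freqs[g] for g in keys)
        return sum(freqs[g] * gp_map[g] for g in keys) / w

    eff1 = mean_where(0, 1) - mean_where(0, 0)
    eff2 = mean_where(1, 1) - mean_where(1, 0)
    # Interaction contrast: deviation of the double-genotype class from additivity.
    inter = gp_map[(1, 1)] - gp_map[(1, 0)] - gp_map[(0, 1)] + gp_map[(0, 0)]
    return eff1, eff2, inter


xor_map = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
print(marginal_effects(xor_map))  # (0.0, 0.0, 4.0)
```

    In the XOR-like map both marginal effects are zero even though genotype strongly affects the phenotype. An additive scan therefore finds nothing, while the interaction contrast is large. With unequal genotype frequencies the cancellation is generally only partial.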

  18. Statistical Analyses of High-Resolution Aircraft and Satellite Observations of Sea Ice: Applications for Improving Model Simulations

    NASA Astrophysics Data System (ADS)

    Farrell, S. L.; Kurtz, N. T.; Richter-Menge, J.; Harbeck, J. P.; Onana, V.

    2012-12-01

    Satellite-derived estimates of ice thickness and observations of ice extent over the last decade point to a downward trend in the basin-scale ice volume of the Arctic Ocean. This loss has broad-ranging impacts on the regional climate and ecosystems, as well as implications for regional infrastructure, marine navigation, national security, and resource exploration. New observational datasets at small spatial and temporal scales are now required to improve our understanding of physical processes occurring within the ice pack and advance parameterizations in the next generation of numerical sea-ice models. High-resolution airborne and satellite observations of the sea ice are now available at meter-scale resolution or better, providing new details on the properties and morphology of the ice pack across basin scales. For example, the NASA IceBridge airborne campaign routinely surveys the sea ice of the Arctic and Southern Oceans with an advanced sensor suite including laser and radar altimeters and digital cameras that together provide high-resolution measurements of sea ice freeboard, thickness, snow depth and lead distribution. Here we present statistical analyses of the ice pack primarily derived from the following IceBridge instruments: the Digital Mapping System (DMS), a nadir-looking, high-resolution digital camera; the Airborne Topographic Mapper, a scanning lidar; and the University of Kansas snow radar, a novel instrument designed to estimate snow depth on sea ice. Together these instruments provide data from which a wide range of sea ice properties may be derived. We provide statistics on lead distribution and spacing, lead width and area, floe size and distance between floes, as well as ridge height, frequency and distribution. The goals of this study are to (i) identify unique statistics that can be used to describe the characteristics of specific ice regions, for example first-year/multi-year ice, diffuse ice edge/consolidated ice pack, and convergent

  19. Statistical Analyses of Brain Surfaces Using Gaussian Random Fields on 2-D Manifolds

    PubMed Central

    Staib, Lawrence H.; Xu, Dongrong; Zhu, Hongtu; Peterson, Bradley S.

    2008-01-01

    Interest in the morphometric analysis of the brain and its subregions has recently intensified because growth or degeneration of the brain in health or illness affects not only the volume but also the shape of cortical and subcortical brain regions, and new image processing techniques permit detection of small and highly localized perturbations in shape or localized volume, with remarkable precision. An appropriate statistical representation of the shape of a brain region is essential, however, for detecting, localizing, and interpreting variability in its surface contour and for identifying differences in volume of the underlying tissue that produce that variability across individuals and groups of individuals. Our statistical representation of the shape of a brain region is defined by a reference region for that region and by a Gaussian random field (GRF) that is defined across the entire surface of the region. We first select a reference region from a set of segmented brain images of healthy individuals. The GRF is then estimated as the signed Euclidean distances between points on the surface of the reference region and the corresponding points on the same region in images of brains that have been coregistered to the reference. Correspondences between points on these surfaces are defined through deformations of each region of a brain into the coordinate space of the reference region using the principles of fluid dynamics. The warped, coregistered region of each subject is then unwarped into its native space, simultaneously bringing into that space the map of corresponding points that was established when the surfaces of the subject and reference regions were tightly coregistered. The proposed statistical description of the shape of surface contours makes no assumptions, other than smoothness, about the shape of the region or its GRF.
The description also allows for the detection and localization of statistically significant differences in the shapes of

  20. Statistical innovations in diagnostic device evaluation.

    PubMed

    Yu, Tinghui; Li, Qin; Gray, Gerry; Yue, Lilly Q

    2016-01-01

    Due to rapid technological development, innovations in diagnostic devices are proceeding at an extremely fast pace. Accordingly, the need to adopt innovative statistical methods in the evaluation of diagnostic devices has emerged. Statisticians in the Center for Devices and Radiological Health at the Food and Drug Administration have provided leadership in implementing statistical innovations. The innovations discussed in this article include the adoption of bootstrap and jackknife methods, the implementation of appropriate multiple-reader multiple-case (MRMC) study designs, the application of robustness analyses for missing data, and the development of study designs and data analyses for companion diagnostics.
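    The jackknife and the bootstrap are resampling methods for estimating a statistic's standard error. The sketch below is a generic illustration, not FDA guidance or a specific device analysis. The function names and the example data are illustrative: the data are test results among hypothetical diseased subjects, so their mean is the sensitivity.

```python
import math
import random


def jackknife_se(data, statistic):
    """Leave-one-out jackknife estimate of the standard error of `statistic`."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap SE: resample with replacement, take the SD of replicates."""
    data = list(data)
    rng = random.Random(seed)
    n = len(data)
    reps = [statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    m = sum(reps) / n_boot
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (n_boot - 1))


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical results among 20 diseased subjects: 1 = test positive, 0 = negative.
results = [1] * 16 + [0] * 4
print(mean(results), jackknife_se(results, mean), bootstrap_se(results, mean))
```

    For the sample mean, the jackknife reproduces the usual s / sqrt(n) exactly. The method is most useful for statistics that lack a simple variance formula.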

  1. Statistic analyses of the color experience according to the age of the observer.

    PubMed

    Hunjet, Anica; Parac-Osterman, Durdica; Vucaj, Edita

    2013-04-01

    The psychological experience of color is a real state of communication between the environment and color, and it depends on the light source, the viewing angle, and particularly on the observer and his or her health condition. Hering's theory, the theory of opponent processes, proposes that the cones situated in the retina of the eye are not sensitive to three chromatic domains (red, green and purple-blue) but instead produce signals based on the principle of opposed pairs of colors. This theory is supported by the fact that certain disorders of color vision, which include blindness to certain colors, cause blindness to pairs of opponent colors. This paper presents a demonstration of the experience of blue and yellow tones according to the age of the observer. To test for statistically significant differences in the omission in color experience according to the color of the background, we used the following statistical tests: the Mann-Whitney U test, Kruskal-Wallis ANOVA and the median test. The differences were shown to be statistically significant in older persons (older than 35 years).
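    The Mann-Whitney U test compares two independent groups using ranks rather than raw values. The Kruskal-Wallis test extends the same rank-based idea to more than two groups. Below is a minimal sketch of the two-group case, with the p-value taken from the large-sample normal approximation without tie correction. The function names and example data are illustrative, not the study's measurements.

```python
import math
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs with x_i > y_j, ties counting 1/2."""
    n_x, n_y = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u_x = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n_x, n_y = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    mu = n_x * n_y / 2
    sigma = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u_x - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative scores for two age groups.
younger = [2, 3, 3, 4, 5, 5, 6]
older = [5, 6, 7, 7, 8, 9, 9]
print(mann_whitney_u(younger, older), round(mann_whitney_p(younger, older), 4))
```

    With heavy ties or small groups, prefer an exact test or a tie-corrected variance over this approximation.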

  2. Geographical origin discrimination of lentils (Lens culinaris Medik.) using 1H NMR fingerprinting and multivariate statistical analyses.

    PubMed

    Longobardi, Francesco; Innamorato, Valentina; Di Gioia, Annalisa; Ventrella, Andrea; Lippolis, Vincenzo; Logrieco, Antonio F; Catucci, Lucia; Agostiano, Angela

    2017-12-15

    Lentil samples from two different countries, i.e. Italy and Canada, were analysed using untargeted ¹H NMR fingerprinting in combination with chemometrics in order to build models able to classify them according to their geographical origin. To this end, Soft Independent Modelling of Class Analogy (SIMCA), k-Nearest Neighbor (k-NN), Principal Component Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were applied to the NMR data and the results were compared. The best combination of average recognition (100%) and cross-validation prediction abilities (96.7%) was obtained for PCA-LDA. All the statistical models were validated both by using a test set and by carrying out a Monte Carlo Cross Validation: the obtained performances were found to be satisfactory for all the models, with prediction abilities higher than 95%, demonstrating the suitability of the developed methods. Finally, the metabolites that contributed most to the lentil discrimination were indicated. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. An Embedded Statistical Method for Coupling Molecular Dynamics and Finite Element Analyses

    NASA Technical Reports Server (NTRS)

    Saether, E.; Glaessgen, E.H.; Yamakov, V.

    2008-01-01

    The coupling of molecular dynamics (MD) simulations with finite element methods (FEM) yields computationally efficient models that link fundamental material processes at the atomistic level with continuum field responses at higher length scales. The theoretical challenge involves developing a seamless connection along an interface between two inherently different simulation frameworks. Various specialized methods have been developed to solve particular classes of problems. Many of these methods link the kinematics of individual MD atoms with FEM nodes at their common interface, necessarily requiring that the finite element mesh be refined to atomic resolution. Some of these coupling approaches also require simulations to be carried out at 0 K and restrict modeling to two-dimensional material domains due to difficulties in simulating full three-dimensional material processes. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the standard boundary value problem used to define a coupled domain. The method replaces a direct linkage of individual MD atoms and finite element (FE) nodes with a statistical averaging of atomistic displacements in local atomic volumes associated with each FE node in an interface region. The FEM and MD computational systems are effectively independent and communicate only through an iterative update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM). ESCM provides an enhanced coupling methodology that is inherently applicable to three-dimensional domains, avoids discretization of the continuum model to atomic scale resolution, and permits finite temperature states to be applied.
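    The averaging step at the core of the coupling can be sketched as follows. This sketch assumes that each FE node's local atomic volume is a sphere of fixed radius and that positions and displacements are plain 3-D tuples. The function name, the spherical volume and the data are illustrative assumptions. The iterative boundary-condition exchange between the MD and FEM solvers is not shown.

```python
import math


def node_averaged_displacements(nodes, atom_positions, atom_displacements, radius):
    """Average atomistic displacements over a local volume around each FE node.

    nodes, atom_positions: lists of (x, y, z) coordinates.
    atom_displacements: list of (ux, uy, uz), one per atom.
    Returns one averaged displacement per node, or None if no atom lies within radius.
    """
    result = []
    for node in nodes:
        nearby = [
            u for p, u in zip(atom_positions, atom_displacements)
            if math.dist(p, node) <= radius
        ]
        if not nearby:
            result.append(None)
            continue
        k = len(nearby)
        result.append(tuple(sum(u[d] for u in nearby) / k for d in range(3)))
    return result


nodes = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (10.2, 0.0, 0.0)]
disp = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(node_averaged_displacements(nodes, atoms, disp, radius=1.0))
```

    Each node is tied to an average over a volume of atoms rather than to individual atoms. This is what removes the need to refine the finite element mesh to atomic resolution.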

  4. Alcohol intake and gastric cancer: Meta-analyses of published data versus individual participant data pooled analyses (StoP Project).

    PubMed

    Ferro, Ana; Morais, Samantha; Rota, Matteo; Pelucchi, Claudio; Bertuccio, Paola; Bonzi, Rossella; Galeone, Carlotta; Zhang, Zuo-Feng; Matsuo, Keitaro; Ito, Hidemi; Hu, Jinfu; Johnson, Kenneth C; Yu, Guo-Pei; Palli, Domenico; Ferraroni, Monica; Muscat, Joshua; Malekzadeh, Reza; Ye, Weimin; Song, Huan; Zaridze, David; Maximovitch, Dmitry; Fernández de Larrea, Nerea; Kogevinas, Manolis; Vioque, Jesus; Navarrete-Muñoz, Eva M; Pakseresht, Mohammadreza; Pourfarzi, Farhad; Wolk, Alicja; Orsini, Nicola; Bellavia, Andrea; Håkansson, Niclas; Mu, Lina; Pastorino, Roberta; Kurtz, Robert C; Derakhshan, Mohammad H; Lagiou, Areti; Lagiou, Pagona; Boffetta, Paolo; Boccia, Stefania; Negri, Eva; La Vecchia, Carlo; Peleteiro, Bárbara; Lunet, Nuno

    2018-05-01

    Individual participant data pooled analyses allow access to non-published data and statistical reanalyses based on more homogeneous criteria than meta-analyses based on systematic reviews. We quantified the impact of publication-related biases and heterogeneity in data analysis and presentation on summary estimates of the association between alcohol drinking and gastric cancer. We compared estimates obtained from conventional meta-analyses, using only data available in published reports from studies that take part in the Stomach Cancer Pooling (StoP) Project, with individual participant data pooled analyses including the same studies. A total of 22 studies from the StoP Project assessed the relation between alcohol intake and gastric cancer; 19 had specific data for levels of consumption and 18 according to cancer location. Published reports addressing these associations were available from 18, 5 and 5 studies, respectively. The summary odds ratio [OR (95% CI)] for drinkers vs. non-drinkers obtained with published data was 10% higher than the one obtained with individual StoP data [18 vs. 22 studies: 1.21 (1.07-1.36) vs. 1.10 (0.99-1.23)] and more heterogeneous (I²: 63.6% vs. 54.4%). In general, published data yielded less precise summary estimates (standard errors up to 2.6 times higher). Funnel plot analysis suggested publication bias. Meta-analyses of the association between alcohol drinking and gastric cancer tended to overestimate the magnitude of the effects, possibly due to publication bias. Additionally, individual participant data pooled analyses yielded more precise estimates for different levels of exposure or cancer subtypes. Copyright © 2018 Elsevier Ltd. All rights reserved.
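
    The I² heterogeneity figures quoted in this abstract can be illustrated with a generic inverse-variance, fixed-effect pooling of odds ratios. This is a minimal sketch under several assumptions. The abstract does not say which pooling model StoP used (random-effects models are common in this setting). Each study's standard error is back-calculated from its 95% CI on the log scale. The function names are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper):
    """Back-calculate the standard error of log(OR) from a 95% CI on the OR scale."""
    return (log(upper) - log(lower)) / (2 * Z95)


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of odds ratios.

    estimates: list of (odds_ratio, ci_lower, ci_upper), one tuple per study.
    Returns (pooled OR, CI lower, CI upper, Cochran's Q, I^2 in percent).
    """
    log_ors = [log(odds) for odds, _, _ in estimates]
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in estimates]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = sqrt(1 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled log OR
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (exp(pooled), exp(pooled - Z95 * pooled_se),
            exp(pooled + Z95 * pooled_se), q, i2)
```

    A random-effects model (for example DerSimonian-Laird) would add a between-study variance term to each study's variance before weighting. That step is omitted here to keep the sketch short.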

  5. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain, making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. The availability of several methods for analysing FNI data indicates that none is optimal for all purposes. To make optimal use of the available methods, it is important to know their limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview of some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  6. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed

    West, Brady T; Sakshaug, Joseph W; Aurelien, Guy Alain S

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.

  7. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed Central

    West, Brady T.; Sakshaug, Joseph W.; Aurelien, Guy Alain S.

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data. PMID:27355817

  8. Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

    PubMed Central

    Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

    2006-01-01

    In recent years, multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user-friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It is highly integrated with the SPM (statistical parametric mapping) software and relies on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than the widely used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709

  9. The effect of rare variants on inflation of the test statistics in case-control analyses.

    PubMed

    Pirie, Ailith; Wood, Angela; Lush, Michael; Tyrer, Jonathan; Pharoah, Paul D P

    2015-02-20

    The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself, particularly in the analysis of rare variants. We compared the properties of the three most commonly used association tests (the likelihood ratio test, the Wald test and the score test) when testing rare variants for association using simulated data. We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for variants with fewer than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic, which may mask the presence of population structure.
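
    The median-based measure described above is often called the genomic inflation factor, λ. It can be computed as in this minimal sketch, which assumes 1-degree-of-freedom chi-square association statistics. The function names are hypothetical. Under the null hypothesis the expected median of a 1-df chi-square is about 0.455, so λ near 1 indicates no inflation, λ above 1 indicates inflation, and λ below 1 indicates under-inflation.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df equals (Phi^-1(0.75))^2, about 0.455
EXPECTED_MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Observed median 1-df test statistic divided by its expected null median."""
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2_1DF


def inflation_from_p_values(p_values):
    """Convert two-sided p-values (0 < p <= 1) to 1-df chi-square statistics, then take the ratio."""
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return inflation_factor(chi2)
```

    The abstract reports that, with many rare variants, this ratio can move above 1 (likelihood ratio and score tests) or below 1 (Wald test) even without population structure. One possible safeguard, offered here as an assumption rather than a recommendation from the paper, is to compute λ only on variants with at least 20 heterozygotes.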

  10. A statistical anomaly indicates symbiotic origins of eukaryotic membranes

    PubMed Central

    Bansal, Suneyna; Mittal, Aditya

    2015-01-01

    Compositional analyses of nucleic acids and proteins have shed light on possible origins of living cells. In this work, rigorous compositional analyses of ∼5000 plasma membrane lipid constituents of 273 species in the three life domains (archaea, eubacteria, and eukaryotes) revealed a remarkable statistical paradox, indicating symbiotic origins of eukaryotic cells involving eubacteria. For lipids common to plasma membranes of the three domains, the number of carbon atoms in eubacteria was found to be similar to that in eukaryotes. However, mutually exclusive subsets of the same data show exactly the opposite: the number of carbon atoms in lipids of eukaryotes was higher than in eubacteria. This statistical paradox, known as Simpson's paradox, was absent for lipids in archaea and for lipids not common to plasma membranes of the three domains. This indicates the presence of interaction(s) and/or association(s) in lipids forming plasma membranes of eubacteria and eukaryotes but not in those of archaea. Further inspection of membrane lipid structures affecting physicochemical properties of plasma membranes provides the first evidence (to our knowledge) of the symbiotic origins of eukaryotic cells based on the “third front” (i.e., lipids), in addition to the growing compositional data from nucleic acids and proteins. PMID:25631820
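
    Simpson's paradox occurs when a comparison that holds within every subset weakens or reverses once the subsets are pooled. This happens because the groups contribute different numbers of observations to each subset. The toy example below uses invented counts and means, not data from the paper, and shows a full reversal. In the study itself the pooled comparison showed similar values rather than a reversal.

```python
def pooled_mean(subsets):
    """Overall mean from per-subset (count, mean) pairs, i.e. a count-weighted mean."""
    total = sum(n for n, _ in subsets)
    return sum(n * m for n, m in subsets) / total


# Hypothetical (number of lipids, mean carbon atoms) per subset; NOT data from the paper.
eubacteria = {"subset_1": (10, 16.0), "subset_2": (90, 40.0)}
eukaryotes = {"subset_1": (90, 17.0), "subset_2": (10, 41.0)}

# Within every subset, eukaryotic lipids have more carbon atoms on average...
within = {k: eukaryotes[k][1] > eubacteria[k][1] for k in eubacteria}

# ...yet pooling reverses the comparison, because each domain contributes
# very different numbers of lipids to each subset.
overall_eubacteria = pooled_mean(eubacteria.values())
overall_eukaryotes = pooled_mean(eukaryotes.values())

print(within, overall_eubacteria, overall_eukaryotes)
```

    Here eukaryotes have the higher mean in both subsets, yet the pooled means are 37.6 for eubacteria and 19.4 for eukaryotes.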

  11. Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    PubMed Central

    Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.

    2014-01-01

    Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including the Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., the proportion of phenotypic variability explained by the genetic architecture, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
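
    Of the nonparametric methods listed, the Nadaraya-Watson estimator is the simplest to write down. The predicted phenotype is a kernel-weighted average of training phenotypes, with weights that decay with genotypic distance. This is a minimal sketch that assumes a Gaussian kernel on squared Euclidean distance between marker-code vectors and a user-chosen bandwidth. The abstract does not describe the paper's kernel or tuning.

```python
from math import exp


def nadaraya_watson(train_x, train_y, query, bandwidth):
    """Kernel-weighted average of training phenotypes (Gaussian kernel, assumed).

    train_x:   list of genotype vectors (e.g. marker codes 0/1/2)
    train_y:   list of phenotypic values, same order as train_x
    query:     genotype vector whose phenotype is predicted
    bandwidth: kernel width; smaller values make the average more local
    """
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(exp(-sq_dist / (2 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

    The prediction is a local average rather than a sum of per-marker effects, so it does not assume additivity. This is one intuition, not a claim from the paper, for why nonparametric methods can cope with epistatic architectures where additive parametric models fail.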

  12. Tipping points in the arctic: eyeballing or statistical significance?

    PubMed

    Carstensen, Jacob; Weydmann, Agata

    2012-02-01

    Arctic ecosystems have experienced, and are projected to continue experiencing, large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent and suggest that the loss rate of sea ice accelerated by a factor of ~5 in 1996, as reported in other studies, but that increases in random fluctuations, an early warning signal, were already observed in 1990. We recommend employing the proposed methods more systematically for analyzing tipping points to document effects of climate change in the Arctic.
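
    The abstract's point that changes in variability, not just in the mean, should be tracked can be sketched as a moving-window standard deviation of detrended values. The window length, the per-window linear detrending, and the function name are assumptions for illustration. The abstract does not give the paper's exact procedure.

```python
from statistics import stdev


def rolling_residual_sd(series, window):
    """Standard deviation of residuals around a linear trend fitted in each window.

    Detrending within each window keeps a steady change in the mean (e.g. a
    declining sea ice extent) from being mistaken for increased fluctuation.
    """
    if window < 3:
        raise ValueError("window must contain at least 3 points")
    x = list(range(window))
    x_mean = sum(x) / window
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    out = []
    for start in range(len(series) - window + 1):
        y = series[start:start + window]
        y_mean = sum(y) / window
        # Ordinary least-squares slope within this window
        slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
        residuals = [yi - (y_mean + slope * (xi - x_mean)) for xi, yi in zip(x, y)]
        out.append(stdev(residuals))
    return out
```

    A sustained rise in this series, even while the trend itself is unchanged, would be the kind of "increase in random fluctuations" the abstract treats as an early warning signal.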

  13. Primarily Statistics: Developing an Introductory Statistics Course for Pre-Service Elementary Teachers

    ERIC Educational Resources Information Center

    Green, Jennifer L.; Blankenship, Erin E.

    2013-01-01

    We developed an introductory statistics course for pre-service elementary teachers. In this paper, we describe the goals and structure of the course, as well as the assessments we implemented. Additionally, we use example course work to demonstrate pre-service teachers' progress both in learning statistics and as novice teachers. Overall, the…

  14. Using R-Project for Free Statistical Analysis in Extension Research

    ERIC Educational Resources Information Center

    Mangiafico, Salvatore S.

    2013-01-01

    One option for Extension professionals wishing to use free statistical software is to use online calculators, which are useful for common, simple analyses. A second option is to use a free computing environment capable of performing statistical analyses, like R-project. R-project is free, cross-platform, powerful, and respected, but may be…

  15. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the quality of statistics and operational research groups and the number of researchers in them. As in other academic disciplines, we find evidence of a linear dependence of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  16. Grey literature in meta-analyses.

    PubMed

    Conn, Vicki S; Valentine, Jeffrey C; Cooper, Harris M; Rantz, Marilyn J

    2003-01-01

    In meta-analysis, researchers combine the results of individual studies to arrive at cumulative conclusions. Meta-analysts sometimes include "grey literature" in their evidential base, which includes unpublished studies and studies published outside widely available journals. Because grey literature is a source of data that might not employ peer review, critics have questioned the validity of its data and the results of meta-analyses that include it. The objective is to examine evidence regarding whether grey literature should be included in meta-analyses, and strategies to manage grey literature in quantitative synthesis. This article reviews evidence on whether the results of studies published in peer-reviewed journals are representative of results from broader samplings of research on a topic, as a rationale for including grey literature. Strategies to enhance access to grey literature are addressed. The most consistent and robust difference between published and grey literature is that published research is more likely to contain results that are statistically significant. Effect size estimates of published research are about one-third larger than those of unpublished studies. Unfunded and small-sample studies are less likely to be published. Yet, importantly, methodological rigor does not differ between published and grey literature. Meta-analyses that exclude grey literature likely (a) over-represent studies with statistically significant findings, (b) inflate effect size estimates, and (c) provide less precise effect size estimates than meta-analyses including grey literature. Meta-analyses should include grey literature to fully reflect the existing evidential base and should assess the impact of methodological variations through moderator analysis.

  17. Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials.

    PubMed

    Wallach, Joshua D; Sullivan, Patrick G; Trepanowski, John F; Sainani, Kristin L; Steyerberg, Ewout W; Ioannidis, John P A

    2017-04-01

    Many published randomized clinical trials (RCTs) make claims for subgroup differences. The objective was to evaluate how often subgroup claims reported in the abstracts of RCTs are actually supported by statistical evidence (P < .05 from an interaction test) and corroborated by subsequent RCTs and meta-analyses. This meta-epidemiological survey examines data sets of trials with at least 1 subgroup claim, including Subgroup Analysis of Trials Is Rarely Easy (SATIRE) articles and Discontinuation of Randomized Trials (DISCO) articles. We used Scopus (updated July 2016) to search for English-language articles citing each of the eligible index articles with at least 1 subgroup finding in the abstract. Exposures were articles with a subgroup claim in the abstract, with or without evidence of statistical heterogeneity (P < .05 from an interaction test) in the text, and articles attempting to corroborate the subgroup findings. Study characteristics of trials with at least 1 subgroup claim in the abstract were recorded. Two reviewers extracted the data necessary to calculate subgroup-level effect sizes, standard errors, and the P values for interaction. For individual RCTs and meta-analyses that attempted to corroborate the subgroup findings from the index articles, trial characteristics were extracted. The Cochran Q test was used to reevaluate heterogeneity with the data from all available trials. The main outcomes were the number of subgroup claims in the abstracts of RCTs, the number of those claims with statistical support (subgroup findings), and the number of subgroup findings corroborated by subsequent RCTs and meta-analyses. Sixty-four eligible RCTs made a total of 117 subgroup claims in their abstracts. Of these 117 claims, only 46 (39.3%) in 33 articles had evidence of statistically significant heterogeneity from a test for interaction. In addition, out of these 46 subgroup findings, only 16 (34.8%) ensured balance between randomization groups within the subgroups (eg, through stratified
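
    A P value for interaction between two subgroups can be obtained from subgroup-level effect sizes and standard errors, the quantities the reviewers extracted. This minimal sketch uses the standard z-test for the difference between two independent estimates on an additive scale, such as log hazard ratios. The abstract does not state the exact test the authors used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interaction_p_value(effect_a, se_a, effect_b, se_b):
    """Two-sided P value for the difference between two independent subgroup effects.

    Effects must be on an additive scale (e.g. log hazard ratios or log odds
    ratios), with their standard errors.
    """
    z = (effect_a - effect_b) / sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

    As an illustration with invented numbers, subgroup A with log HR -0.30 (SE 0.12) is individually significant, while subgroup B with -0.05 (SE 0.15) is not. The interaction test nevertheless gives P ≈ 0.19. A "significant in one subgroup, not in the other" pattern is therefore not, by itself, statistical evidence of a subgroup difference.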

  18. Impact of ontology evolution on functional analyses.

    PubMed

    Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard

    2012-10-15

    Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
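
    To see why ontology and annotation changes can move enrichment results, consider one common enrichment test: a one-sided hypergeometric test, also known as a one-sided Fisher's exact test. The abstract does not say which test was used, so this choice, the function name, and the parameter names are assumptions.

```python
from math import comb


def enrichment_p_value(n_genome, n_term, n_study, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) for GO term enrichment.

    n_genome:  annotated genes in the background (population)
    n_term:    background genes annotated to the GO term
    n_study:   genes in the study set (e.g. differentially expressed genes)
    n_overlap: study-set genes annotated to the term
    """
    denom = comb(n_genome, n_study)
    upper = min(n_term, n_study)
    # Sum the probabilities of observing n_overlap or more annotated study genes
    tail = sum(comb(n_term, k) * comb(n_genome - n_term, n_study - k)
               for k in range(n_overlap, upper + 1))
    return tail / denom
```

    If a later GO release adds or removes annotations so that `n_term` or `n_overlap` changes, the P value changes as well. This is one simple route by which ontology evolution can shift the set of significant categories.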

  19. Students' attitudes towards learning statistics

    NASA Astrophysics Data System (ADS)

    Ghulami, Hassan Rahnaward; Hamid, Mohd Rashid Ab; Zakaria, Roslinazairimah

    2015-05-01

    A positive attitude towards learning is vital for mastering the core content of a subject, and statistics courses, especially at university level, are no exception. This study therefore investigates students' attitudes towards learning statistics. Six variables or constructs were identified: affect, cognitive competence, value, difficulty, interest, and effort. The instrument used in the study is a questionnaire adopted and adapted from the established and reliable Survey of Attitudes towards Statistics (SATS©). The study was conducted among engineering undergraduate students at a university on the East Coast of Malaysia. The respondents were students from different faculties who were taking the applied statistics course. The results are analysed descriptively and contribute to a descriptive understanding of students' attitudes towards the teaching and learning of statistics.

  20. Statistics for Radiology Research.

    PubMed

    Obuchowski, Nancy A; Subhas, Naveen; Polster, Joshua

    2017-02-01

    Biostatistics is an essential component in most original research studies in imaging. In this article we discuss five key statistical concepts for study design and analyses in modern imaging research: statistical hypothesis testing, particularly focusing on noninferiority studies; imaging outcomes especially when there is no reference standard; dealing with the multiplicity problem without spending all your study power; relevance of confidence intervals in reporting and interpreting study results; and finally tools for assessing quantitative imaging biomarkers. These concepts are presented first as examples of conversations between investigator and biostatistician, and then more detailed discussions of the statistical concepts follow. Three skeletal radiology examples are used to illustrate the concepts. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
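
    One common way to carry out a noninferiority comparison, one of the concepts the article covers, is to compare the lower confidence bound of the difference (new minus standard) with a prespecified margin. This minimal sketch assumes two independent groups and a simple Wald interval for a difference in proportions, such as sensitivities. Paired reader designs, which are common in imaging, would need a method that accounts for the pairing. The function name and margin values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def noninferior(successes_new, n_new, successes_std, n_std, margin, alpha=0.025):
    """Non-inferiority check for a difference in proportions (independent groups).

    Uses a one-sided Wald lower bound (a simplification) for p_new - p_std and
    concludes non-inferiority when that bound exceeds -margin.
    Returns (difference, lower bound, decision).
    """
    p_new = successes_new / n_new
    p_std = successes_std / n_std
    diff = p_new - p_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower = diff - NormalDist().inv_cdf(1 - alpha) * se
    return diff, lower, lower > -margin
```

    With identical observed sensitivities of 90/100, the lower bound is about -0.083. Noninferiority is therefore concluded for a 10-point margin but not for a 5-point margin, which shows how strongly the conclusion depends on the prespecified margin and the sample size.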

  1. Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results

    NASA Technical Reports Server (NTRS)

    Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    In the last three years, extensive performance data have been reported for parallel machines based on both the NAS Parallel Benchmarks and LINPACK. In this study we used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS Parallel Benchmarks, we also included the peak performance of each machine and the LINPACK n and n_1/2 values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format and will present the data of our statistical analysis in detail.

  2. Identifying Frequent Users of an Urban Emergency Medical Service Using Descriptive Statistics and Regression Analyses.

    PubMed

    Norman, Chenelle; Mello, Michael; Choi, Bryan

    2016-01-01

    This retrospective cohort study provides a descriptive analysis of a population that frequently uses an urban emergency medical service (EMS) and identifies factors that contribute to use among all frequent users. For the purposes of this study, we divided frequent users into the following groups: low-frequent users (4 EMS transports in 2012), medium-frequent users (5 to 6 EMS transports in 2012), high-frequent users (7 to 10 EMS transports in 2012) and super-frequent users (11 or more EMS transports in 2012). Overall, we identified 539 individuals as frequent users. For all groups of EMS frequent users (i.e., low, medium, high and super), one or more hospital admissions, receiving a referral for follow-up care upon discharge, and having no insurance were statistically significantly associated with frequent EMS use (P<0.05). Within the diagnostic categories, 41.61% of super-frequent users had a diagnosis of "primarily substance abuse/misuse", and among low-frequent users a majority, 53.33%, were identified as having a "reoccurring (medical) diagnosis." Lastly, relative risk ratios for the highest group of users, super-frequent users, were 3.34 (95% CI [1.90-5.87]) for obtaining at least one referral for follow-up care, 13.67 (95% CI [5.60-33.34]) for having four or more hospital admissions and 5.95 (95% CI [1.80-19.63]) for having a diagnosis of primarily substance abuse/misuse. Findings from this study demonstrate that among low- and medium-frequent users a majority of patients are using EMS for recurring medical conditions. This could potentially be avoided with better care management. In addition, this study adds to the current literature that illustrates a strong correlation between substance abuse/misuse and high/super-frequent EMS use. For the subgroup analysis among individuals 65 years of age and older, we did not find any of the independent variables included in our model to be statistically significantly associated with frequent EMS use.

  3. Statistics for Learning Genetics

    ERIC Educational Resources Information Center

    Charles, Abigail Sheena

    2012-01-01

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing…

  4. Additive scales in degenerative disease--calculation of effect sizes and clinical judgment.

    PubMed

    Riepe, Matthias W; Wilkinson, David; Förstl, Hans; Brieden, Andreas

    2011-12-16

    The therapeutic efficacy of an intervention is often assessed in clinical trials by scales measuring multiple diverse activities that are added to produce a cumulative global score. Medical communities and health care systems subsequently use these data to calculate pooled effect sizes to compare treatments. This is done because major doubt has been cast over the clinical relevance of statistically significant findings relying on p values, which have the potential to report chance findings. Hence, to overcome this, pooling the results of clinical studies into a meta-analysis with a statistical calculus has been assumed to be a more definitive way of deciding on efficacy. We simulate the therapeutic effects as measured with additive scales in patient cohorts with different disease severity and assess the limitations, proven mathematically, of effect size calculations for additive scales. We demonstrate that the major problem, which cannot be overcome by current numerical methods, is the complex nature and neurobiological foundation of clinical psychiatric endpoints in particular and additive scales in general. This is particularly relevant for endpoints used in dementia research. 'Cognition' is composed of functions such as memory, attention, orientation and many more. These individual functions decline in varied and non-linear ways. Here we demonstrate that, with progressive diseases, cumulative values from multidimensional scales are subject to distortion by the limitations of the additive scale. The non-linearity of the decline of function impedes the calculation of effect sizes based on cumulative values from these multidimensional scales. Statistical analysis needs to be guided by boundaries of the biological condition. Alternatively, we suggest a different approach avoiding the error imposed by over-analysis of cumulative global scores from additive scales.
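
    The pooled effect sizes the authors critique are typically standardized mean differences. This is a minimal sketch of Cohen's d with a pooled standard deviation. It is a generic textbook formula, not the authors' simulation. It is included only to make concrete the quantity whose interpretation, according to the abstract, breaks down when the inputs are cumulative scores from an additive scale.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)
```

    The formula treats a one-point change in the summed score as equivalent wherever on the scale it occurs. That is exactly the assumption the abstract questions when subdomains decline non-linearly.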

  5. Statistical Literacy as a Function of Online versus Hybrid Course Delivery Format for an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul

    2017-01-01

    Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the form of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…

  6. Statistical Literacy in the Data Science Workplace

    ERIC Educational Resources Information Center

    Grant, Robert

    2017-01-01

    Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Communication is essential not only with the consumers of information but also within the team. Writing from the…

  7. Introduction to Bayesian statistical approaches to compositional analyses of transgenic crops 1. Model validation and setting the stage.

    PubMed

    Harrison, Jay M; Breeze, Matthew L; Harrigan, George G

    2011-08-01

    Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.
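    As a rough illustration of what a Bayesian comparison of a compositional analyte can look like, the sketch below draws from the posterior of each group mean under a normal model with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, then forms the posterior of the GM-minus-conventional difference. This is an assumed textbook model, not the model validated in the study; the protein values, the function names and the equivalence bound of 1.0 are hypothetical.

```python
import math
import random
import statistics

def posterior_mean_draws(data, n_draws, rng):
    """Posterior draws of a group mean under a normal model with the
    noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2."""
    n = len(data)
    ybar = statistics.fmean(data)
    s2 = statistics.variance(data)  # sample variance (n - 1 denominator)
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y ~ (n - 1) s^2 / chi^2_{n-1}; chi^2_k is Gamma(shape=k/2, scale=2)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y ~ Normal(ybar, sigma^2 / n)
        draws.append(rng.gauss(ybar, math.sqrt(sigma2 / n)))
    return draws

def posterior_difference(gm, conventional, n_draws=20000, seed=1):
    """Posterior draws of mean(GM) - mean(conventional), groups independent."""
    rng = random.Random(seed)
    a = posterior_mean_draws(gm, n_draws, rng)
    b = posterior_mean_draws(conventional, n_draws, rng)
    return [x - y for x, y in zip(a, b)]

# Hypothetical protein values (% dry weight); NOT data from the study.
gm = [36.1, 35.8, 36.4, 36.0, 35.9, 36.2]
conventional = [35.7, 36.0, 35.9, 35.6, 36.1, 35.8]
diff = posterior_difference(gm, conventional)
print("posterior mean difference:", round(statistics.fmean(diff), 3))
# Posterior probability that the difference lies inside a hypothetical +/-1.0 bound.
print("P(|difference| < 1.0):", sum(abs(d) < 1.0 for d in diff) / len(diff))
```

    Reporting a posterior probability for a stated bound, rather than a p value, is one way the Bayesian framing can yield directly interpretable results; model checking of the kind the abstract stresses would still be needed before relying on it.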

  8. Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)

    PubMed Central

    Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

    2016-01-01

    We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497
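    For readers unfamiliar with the one-parameter Rasch model mentioned above, the sketch below shows its item response function, in which the probability of a correct answer depends only on the difference between student ability theta and item difficulty b (both in logits). The twelve difficulty values are hypothetical placeholders, not SRBCI estimates, and the function names are illustrative.

```python
import math

def rasch_probability(theta, difficulty):
    """One-parameter (Rasch) model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def expected_score(theta, difficulties):
    """Expected number of items answered correctly by a student of ability theta."""
    return sum(rasch_probability(theta, b) for b in difficulties)

# Hypothetical item difficulties (logits) for a 12-item inventory, spread widely
# so the instrument carries information across a range of abilities.
difficulties = [-2.0, -1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 2.5]
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(expected_score(theta, difficulties), 2))
```

    When theta equals an item's difficulty the success probability is exactly 0.5, which is why items of widely varying difficulty are needed to discriminate among students across the novice-to-expert spectrum.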

  9. Statistical quality control through overall vibration analysis

    NASA Astrophysics Data System (ADS)

    Carnero, M. a. Carmen; González-Palma, Rafael; Almorza, David; Mayorga, Pedro; López-Escobar, Carlos

    2010-05-01

    The present study introduces the concept of statistical quality control in automotive wheel bearing manufacturing processes. Defects in the products under analysis can directly affect passengers' safety and comfort. At present, vibration analysis on machine tools is not widely used for quality control in manufacturing facilities. Noise and vibration are common quality problems in bearings. These failure modes are likely to occur under certain operating conditions; they do not require high vibration amplitudes but are related to specific vibration frequencies. The vibration frequencies are affected by the type of surface problem (chattering) on ball races generated during grinding. The purpose of this paper is to identify grinding process variables that affect bearing quality by applying statistical principles to machine tools. In addition, the quality of the finished parts is evaluated under different combinations of process variables. This paper intends to establish the foundations for predicting product quality through the analysis of self-induced vibrations during contact between the grinding wheel and the parts. To achieve this goal, the overall self-induced vibration readings under different combinations of process variables are analysed using statistical tools. The analysis of data and design of experiments follow a classical approach, considering all potential interactions between variables. Data sets that meet normality and homoscedasticity criteria are analysed through analysis of variance (ANOVA). The paper uses several other statistical tools to support its conclusions, such as the chi-squared, Shapiro-Wilk, symmetry, kurtosis, Cochran, Bartlett, Hartley and Kruskal-Wallis tests. The analysis presented is the starting point for extending the use of predictive techniques (vibration analysis) to quality control. This paper demonstrates the existence
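    The ANOVA step described above can be sketched in a few lines. This minimal version computes only the one-way F statistic and its degrees of freedom, and it assumes the normality and equal-variance checks have already passed; the readings and the three feed settings are hypothetical. In practice a library routine such as SciPy's `scipy.stats.f_oneway` (or `scipy.stats.kruskal` when the assumptions fail) would typically supply the p value as well.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Assumes normality and homoscedasticity have already been checked.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical overall vibration readings under three hypothetical feed settings.
readings = [[1.1, 1.3, 1.2, 1.4], [1.6, 1.8, 1.7, 1.5], [2.1, 2.0, 2.3, 2.2]]
print(one_way_anova_f(readings))
```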

  10. The disagreeable behaviour of the kappa statistic.

    PubMed

    Flight, Laura; Julious, Steven A

    2015-01-01

    It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is commonly used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics, such as the proportion of concordance, the maximum attainable kappa and the prevalence- and bias-adjusted kappa, should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted in the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
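    The companion statistics named above are straightforward to compute for two raters and two categories. The sketch below uses the standard definitions of Cohen's kappa, maximum attainable kappa given the marginals, and prevalence- and bias-adjusted kappa (PABAK = 2p_o - 1 for two categories); the two tables are hypothetical and only illustrate the sensitivity to marginal totals.

```python
def kappa_summary(table):
    """Agreement statistics for a 2x2 table [[a, b], [c, d]] from two raters.

    Rows are rater 1's categories, columns rater 2's.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    p_observed = (a + d) / n  # proportion of concordance
    p_expected = sum(r * k for r, k in zip(rows, cols)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Maximum attainable kappa given the observed marginal totals.
    p_max = sum(min(r, k) for r, k in zip(rows, cols)) / n
    kappa_max = (p_max - p_expected) / (1 - p_expected)
    pabak = 2 * p_observed - 1  # prevalence- and bias-adjusted kappa (2 categories)
    return {"p_o": p_observed, "kappa": kappa, "kappa_max": kappa_max, "pabak": pabak}

# Hypothetical tables: the second has a highly skewed prevalence.
table_a = kappa_summary([[40, 10], [5, 45]])
table_b = kappa_summary([[85, 5], [5, 5]])
print(table_a)
print(table_b)
```

    Here the skewed table shows higher raw agreement (0.90 versus 0.85) yet a lower kappa (about 0.44 versus 0.70), while its PABAK of 0.80 tracks the raw agreement; this is the kind of discrepancy the companion statistics are meant to expose.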

  11. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics

    PubMed Central

    2016-01-01

    Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence

  12. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    PubMed

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. The objective was to report the features of IBMWA and discuss how this software compares, subjectively and objectively, with other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered predictions similar to those of several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix
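    Since the abstract notes that these measures were missing from the IBMWA output, a minimal sketch of how they follow from a 2x2 confusion matrix may be useful. It uses the standard definitions and a Wald 95% confidence interval on the log odds ratio; the counts are hypothetical and the function name is illustrative, not part of any IBMWA API.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity and odds ratio (with Wald CI) from a 2x2 confusion matrix.

    Assumes all four cells are non-zero; otherwise a continuity correction is needed.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn)
    # Standard error of log(OR) and a symmetric interval on the log scale.
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return sensitivity, specificity, odds_ratio, ci

# Hypothetical counts from a classifier evaluated on a labelled test set.
print(diagnostic_metrics(tp=40, fp=5, fn=10, tn=45))
```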

  13. How to Make Nothing Out of Something: Analyses of the Impact of Study Sampling and Statistical Interpretation in Misleading Meta-Analytic Conclusions

    PubMed Central

    Cunningham, Michael R.; Baumeister, Roy F.

    2016-01-01

    The limited resource model states that self-control is governed by a relatively finite set of inner resources on which people draw when exerting willpower. Once self-control resources have been used up or depleted, they are less available for other self-control tasks, leading to a decrement in subsequent self-control success. The depletion effect has been studied for over 20 years, tested or extended in more than 600 studies, and supported in an independent meta-analysis (Hagger et al., 2010). Meta-analyses are supposed to reduce bias in literature reviews. Carter et al.’s (2015) meta-analysis, by contrast, included a series of questionable decisions involving sampling, methods, and data analysis. We provide quantitative analyses of key sampling issues: exclusion of many of the best depletion studies based on idiosyncratic criteria and the emphasis on mini meta-analyses with low statistical power as opposed to the overall depletion effect. We discuss two key methodological issues: failure to code for research quality, and the quantitative impact of weak studies by novice researchers. We discuss two key data analysis issues: questionable interpretation of the results of trim and fill and Funnel Plot Asymmetry test procedures, and the use and misinterpretation of the untested Precision Effect Test and Precision Effect Estimate with Standard Error (PEESE) procedures. Despite these serious problems, the Carter et al. (2015) meta-analysis results actually indicate that there is a real depletion effect – contrary to their title. PMID:27826272
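    The PET and PEESE procedures discussed above are weighted regressions of study effect sizes on their standard errors (PET) or on their squared standard errors (PEESE), with inverse-variance weights; the intercept is read as the effect expected from a hypothetical study with a standard error of zero. The sketch below computes only the two intercepts. It omits standard errors, significance tests and the conditional rule for choosing between the two estimates, and the function names and example data are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def pet_peese(effects, standard_errors):
    """Return (PET intercept, PEESE intercept), both fitted with weights 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    pet_intercept, _ = weighted_linear_fit(standard_errors, effects, weights)
    squared = [se ** 2 for se in standard_errors]
    peese_intercept, _ = weighted_linear_fit(squared, effects, weights)
    return pet_intercept, peese_intercept

# Hypothetical studies in which smaller studies (larger SE) report larger effects.
ses = [0.10, 0.20, 0.30, 0.40, 0.50]
effects = [0.2 + se for se in ses]
print(pet_peese(effects, ses))
```

    Because the intercept is an extrapolation to SE = 0, its value depends heavily on the assumed functional form (linear in SE versus in SE squared), which is one reason the interpretation of these procedures is contested.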

  14. Dispersal of potato cyst nematodes measured using historical and spatial statistical analyses.

    PubMed

    Banks, N C; Hodda, M; Singh, S K; Matveeva, E M

    2012-06-01

    Rates and modes of dispersal of potato cyst nematodes (PCNs) were investigated. Analysis of records from eight countries suggested that PCNs spread a mean distance of 5.3 km/year radially from the site of first detection, and spread 212 km over ≈40 years before detection. Data from four countries with more detailed histories of invasion were analyzed further, using distance from first detection, distance from previous detection, distance from nearest detection, straight line distance, and road distance. Linear distance from first detection was significantly related to the time since the first detection. Estimated rate of spread was 5.7 km/year, and did not differ statistically between countries. Time between the first detection and estimated introduction date varied between 0 and 20 years, and differed among countries. Road distances from nearest and first detection were statistically significantly related to time, and gave slightly higher estimates for rate of spread of 6.0 and 7.9 km/year, respectively. These results indicate that the original site of introduction of PCNs may act as a source for subsequent spread and that this may occur at a relatively constant rate over time regardless of whether this distance is measured by road or by a straight line. The implications of this constant radial rate of dispersal for biosecurity and pest management are discussed, along with the effects of control strategies.
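    A rate of spread expressed in km/year is, in the simplest reading, the slope of a linear regression of distance from first detection on time since first detection. The sketch below fits that slope by ordinary least squares; the detection records are invented for illustration and are not the study's data, and the study's own models (for example, by country or by road distance) may differ.

```python
def ols_slope(x, y):
    """Ordinary least squares slope and intercept of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical records: years since first detection and straight-line
# distance (km) of each later detection from the first one.
years = [2, 5, 8, 12, 15, 20]
distance_km = [9.0, 30.5, 44.0, 70.0, 84.5, 115.0]
rate, intercept = ols_slope(years, distance_km)
print(f"estimated radial spread: {rate:.1f} km/year")
```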

  15. Narrative Review of Statistical Reporting Checklists, Mandatory Statistical Editing, and Rectifying Common Problems in the Reporting of Scientific Articles.

    PubMed

    Dexter, Franklin; Shafer, Steven L

    2017-03-01

    Considerable attention has been drawn to poor reproducibility in the biomedical literature. One explanation is inadequate reporting of statistical methods by authors and inadequate assessment of statistical reporting and methods during peer review. In this narrative review, we examine scientific studies of several well-publicized efforts to improve statistical reporting. We also review several retrospective assessments of the impact of these efforts. First, these studies show that instructions to authors and statistical checklists are not sufficient; no findings suggested that either improves the quality of statistical methods and reporting. Second, even basic statistics, such as power analyses, are frequently missing or incorrectly performed. Third, statistical review is needed for all papers that involve data analysis. A consistent finding in the studies was that nonstatistical reviewers (eg, "scientific reviewers") and journal editors generally assess statistical quality poorly. We finish by discussing our experience with statistical review at Anesthesia & Analgesia from 2006 to 2016.

  16. Graphical augmentations to the funnel plot assess the impact of additional evidence on a meta-analysis.

    PubMed

    Langan, Dean; Higgins, Julian P T; Gregory, Walter; Sutton, Alexander J

    2012-05-01

    We aim to illustrate the potential impact of a new study on a meta-analysis, which gives an indication of the robustness of the meta-analysis. A number of augmentations are proposed to one of the most widely used graphical displays, the funnel plot. Namely, 1) statistical significance contours, which define regions of the funnel plot in which a new study would have to be located to change the statistical significance of the meta-analysis; and 2) heterogeneity contours, which show how a new study would affect the extent of heterogeneity in a given meta-analysis. Several other features are also described, and the use of multiple features simultaneously is considered. The statistical significance contours suggest that one additional study, no matter how large, may have a very limited impact on the statistical significance of a meta-analysis. The heterogeneity contours illustrate that one outlying study can increase the level of heterogeneity dramatically. The additional features of the funnel plot have applications including 1) informing sample size calculations for the design of future studies eligible for inclusion in the meta-analysis; and 2) informing the prioritization of updates to a portfolio of meta-analyses such as those prepared by the Cochrane Collaboration. Copyright © 2012 Elsevier Inc. All rights reserved.
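    The contours described above come from recomputing the meta-analysis with one hypothetical extra study placed at each point of the funnel plot. The sketch below does that recomputation for a single point, using an inverse-variance fixed-effect model with Cochran's Q and I^2; the fixed-effect choice is a simplifying assumption (the paper's contours may be based on other models), and all study values are hypothetical. Scanning `with_new_study` over a grid of (effect, SE) pairs and marking where |z| crosses 1.96, or where I^2 crosses a chosen level, would trace the corresponding contours.

```python
import math

def fixed_effect_summary(effects, standard_errors):
    """Inverse-variance fixed-effect pooled estimate, z statistic, Cochran's Q and I^2."""
    w = [1 / se ** 2 for se in standard_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "z": pooled / se_pooled, "Q": q, "I2": i2}

def with_new_study(effects, standard_errors, new_effect, new_se):
    """Summary after adding one hypothetical future study (inputs are lists)."""
    return fixed_effect_summary(effects + [new_effect], standard_errors + [new_se])

# Hypothetical existing meta-analysis and a hypothetical outlying new study.
effects = [0.42, 0.10, 0.35, 0.05]
ses = [0.20, 0.15, 0.25, 0.30]
print(fixed_effect_summary(effects, ses))
print(with_new_study(effects, ses, new_effect=-0.2, new_se=0.05))
```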

  17. Quantile regression for the statistical analysis of immunological data with many non-detects.

    PubMed

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance because classical statistical analyses, such as ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
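    The reason quantiles tolerate non-detects can be shown with the simplest case: when the only covariate is a group indicator, quantile regression reduces to comparing group quantiles. The sketch below computes an order-statistic quantile with non-detects replaced by an arbitrary value below the detection limit and shows the result does not depend on that value, provided fewer than tau*n observations are non-detects. The data, the 0.75 quantile and the names are hypothetical; fitting linear trends would need a full quantile regression routine (statsmodels, for example, provides `QuantReg`).

```python
import math

NON_DETECT = None  # marker for a value below the detection limit

def group_quantile(values, tau, fill):
    """Empirical tau-quantile (order statistic ceil(tau * n)) with non-detects set to `fill`.

    `fill` must lie below the detection limit; the result does not depend on it
    as long as fewer than tau * n of the values are non-detects.
    """
    filled = sorted(fill if v is NON_DETECT else v for v in values)
    k = max(1, math.ceil(tau * len(filled)))
    return filled[k - 1]

# Hypothetical measurements (detection limit 1.0); 62.5% non-detects per group.
placebo = [None, None, None, None, None, 1.2, 1.5, 2.0]
treated = [None, None, None, None, None, 1.05, 1.1, 1.3]
for name, group in (("placebo", placebo), ("treated", treated)):
    print(name, group_quantile(group, 0.75, fill=0.0), group_quantile(group, 0.75, fill=-100.0))
```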

  18. Global atmospheric circulation statistics, 1000-1 mb

    NASA Technical Reports Server (NTRS)

    Randel, William J.

    1992-01-01

    The atlas presents atmospheric general circulation statistics derived from twelve years (1979-90) of daily National Meteorological Center (NMC) operational geopotential height analyses; it is an update of a prior atlas using data over 1979-1986. These global analyses are available on pressure levels covering 1000-1 mb (approximately 0-50 km). The geopotential grids are a combined product of the Climate Analysis Center (which produces analyses over 70-1 mb) and operational NMC analyses (over 1000-100 mb). Balanced horizontal winds and hydrostatic temperatures are derived from the geopotential fields.
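    The hydrostatic temperatures mentioned above follow from the hydrostatic relation dPhi/d(ln p) = -R T, which, integrated between two pressure levels, gives the hypsometric equation linking layer thickness to layer-mean temperature. The sketch below applies it in both directions; the constants are standard values chosen here, and the atlas's actual processing (vertical differencing, grids, constants) may differ.

```python
import math

R_DRY = 287.05   # J kg^-1 K^-1, gas constant for dry air (assumed value)
G0 = 9.80665     # m s^-2, standard gravity; geopotential height = geopotential / G0

def layer_mean_temperature(z_lower, z_upper, p_lower, p_upper):
    """Hydrostatic layer-mean temperature (K) from geopotential heights (m) at two
    pressure levels: T = G0 * (Z2 - Z1) / (R * ln(p1 / p2))."""
    return G0 * (z_upper - z_lower) / (R_DRY * math.log(p_lower / p_upper))

def thickness(temperature, p_lower, p_upper):
    """Inverse relation: geopotential thickness (m) of a layer with mean temperature T."""
    return R_DRY * temperature * math.log(p_lower / p_upper) / G0

# Hypothetical 1000-500 mb thickness of 5500 m.
print(round(layer_mean_temperature(0.0, 5500.0, 1000.0, 500.0), 1))
```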

  19. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary In this paper, we study joint testing of the associations of a genetic variant with multiple correlated phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and compared their power with that of existing methods in various settings. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
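    To make the idea of summary-statistic-only testing concrete, the sketch below uses a simpler, standard omnibus test rather than the common-mean and variance-component tests proposed in the paper: it estimates the between-phenotype correlation from Z-scores of variants assumed to be null, then computes Z^T R^{-1} Z, which is chi-square with 2 degrees of freedom for two phenotypes (survival function exp(-x/2)). All Z-scores and names are hypothetical; real analyses would estimate the correlation from a very large number of null variants.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def omnibus_two_phenotypes(z1, z2, rho):
    """Z^T R^{-1} Z for two phenotypes with correlation rho, and its chi-square(2) p value."""
    stat = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return stat, math.exp(-stat / 2)

# Hypothetical Z-scores of variants assumed unassociated with either phenotype.
null_z1 = [0.3, -1.2, 0.8, 1.5, -0.4, -0.9, 0.1, 1.1]
null_z2 = [0.5, -0.8, 0.4, 1.9, -0.1, -1.3, 0.3, 0.7]
rho = pearson(null_z1, null_z2)
print(rho, omnibus_two_phenotypes(2.1, 2.4, rho))
```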

  20. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9% improvement, p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  1. Teaching statistics in biology: using inquiry-based learning to strengthen understanding of statistical analysis in biology laboratory courses.

    PubMed

    Metz, Anneke M

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9% improvement, p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.

  2. 24 CFR 81.65 - Other information and analyses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Other information and analyses. 81... information and analyses. When deemed appropriate and requested in writing, on a case-by-case basis, by the... conduct additional analyses concerning any such report. A GSE shall submit additional reports or other...

  3. 24 CFR 81.65 - Other information and analyses.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Other information and analyses. 81... information and analyses. When deemed appropriate and requested in writing, on a case-by-case basis, by the... conduct additional analyses concerning any such report. A GSE shall submit additional reports or other...

  4. 24 CFR 81.65 - Other information and analyses.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Other information and analyses. 81... information and analyses. When deemed appropriate and requested in writing, on a case-by-case basis, by the... conduct additional analyses concerning any such report. A GSE shall submit additional reports or other...

  5. 24 CFR 81.65 - Other information and analyses.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Other information and analyses. 81... information and analyses. When deemed appropriate and requested in writing, on a case-by-case basis, by the... conduct additional analyses concerning any such report. A GSE shall submit additional reports or other...

  6. 24 CFR 81.65 - Other information and analyses.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Other information and analyses. 81... information and analyses. When deemed appropriate and requested in writing, on a case-by-case basis, by the... conduct additional analyses concerning any such report. A GSE shall submit additional reports or other...

  7. Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens.

    PubMed

    Henden, Lyndal; Lee, Stuart; Mueller, Ivo; Barry, Alyssa; Bahlo, Melanie

    2018-05-01

    Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. Unfortunately, however, IBD analyses have been underutilized in the analysis of other organisms, including human pathogens. This is in part due to the lack of statistical methodologies for non-diploid genomes in addition to the added complexity of multiclonal infections. As such, we have developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries. IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a

  8. Research Pearls: The Significance of Statistics and Perils of Pooling. Part 3: Pearls and Pitfalls of Meta-analyses and Systematic Reviews.

    PubMed

    Harris, Joshua D; Brand, Jefferson C; Cote, Mark P; Dhawan, Aman

    2017-08-01

    Within the health care environment, there has been a recent and appropriate trend towards emphasizing the value of care provision. Reduced cost and higher quality improve the value of care. Quality is a challenging, heterogeneous, variably defined concept. At the core of quality is the patient's outcome, quantified by a vast assortment of subjective and objective outcome measures. There has been a recent evolution towards evidence-based medicine in health care, clearly elucidating the role of high-quality evidence across groups of patients and studies. Synthetic studies, such as systematic reviews and meta-analyses, are at the top of the evidence-based medicine hierarchy. Thus, these investigations may be the best potential source for guiding diagnostic, therapeutic, prognostic, and economic medical decision making. Systematic reviews critically appraise and synthesize the best available evidence to provide a conclusion statement (a "take-home point") in response to a specific answerable clinical question. A meta-analysis uses statistical methods to quantitatively combine data from single studies. Meta-analyses should be performed with homogeneous studies of high methodological quality (Level I or II evidence, or randomized studies) to minimize confounding variable bias. When the literature is known to be inadequate, or a recent systematic review has already demonstrated insufficient data, a new systematic review does not add anything meaningful to the literature. PROSPERO registration and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines assist authors in the design and conduct of systematic reviews and should always be used. Complete transparency in the conduct of the review permits reproducibility and improves the fidelity of the conclusions. Pooling of data from overly dissimilar investigations should be avoided. This particularly applies to Level IV evidence, that is, noncomparative investigations

  9. Reporting and methodological quality of meta-analyses in urological literature.

    PubMed

    Xia, Leilei; Xu, Jing; Guzzo, Thomas J

    2017-01-01

    To assess the overall quality of published urological meta-analyses and identify predictive factors for high quality. We systematically searched PubMed to identify meta-analyses published from January 1st, 2011 to December 31st, 2015 in 10 predetermined major paper-based urology journals. The characteristics of the included meta-analyses were collected, and their reporting and methodological qualities were assessed with the PRISMA checklist (27 items) and the AMSTAR tool (11 items), respectively. Descriptive statistics were used for individual items as a measure of overall compliance, and PRISMA and AMSTAR scores were calculated as the sum of adequately reported domains. Logistic regression was used to identify predictive factors for high quality. A total of 183 meta-analyses were included. The mean PRISMA and AMSTAR scores were 22.74 ± 2.04 and 7.57 ± 1.41, respectively. PRISMA item 5 (protocol and registration), items 15 and 22 (risk of bias across studies), and items 16 and 23 (additional analyses) had less than 50% adherence. AMSTAR item 1 ("a priori" design), item 5 (list of studies), and item 10 (publication bias) had less than 50% adherence. Logistic regression analyses showed that funding support and "a priori" design were associated with superior reporting quality; following the PRISMA guideline and "a priori" design were associated with superior methodological quality. Reporting and methodological qualities of recently published meta-analyses in major paper-based urology journals are generally good. Further improvement could potentially be achieved by strictly adhering to the PRISMA guideline and having an "a priori" protocol.

  10. Comparing Visual and Statistical Analysis of Multiple Baseline Design Graphs.

    PubMed

    Wolfe, Katie; Dickenson, Tammiee S; Miller, Bridget; McGrath, Kathleen V

    2018-04-01

    A growing number of statistical analyses are being developed for single-case research. One important factor in evaluating these methods is the extent to which each corresponds to visual analysis. Few studies have compared statistical and visual analysis, and information about more recently developed statistics is scarce. Therefore, our purpose was to evaluate the agreement between visual analysis and four statistical analyses: improvement rate difference (IRD); Tau-U; Hedges, Pustejovsky, Shadish (HPS) effect size; and between-case standardized mean difference (BC-SMD). Results indicate that IRD and BC-SMD had the strongest overall agreement with visual analysis. Although Tau-U had strong agreement with visual analysis on raw values, it had poorer agreement when those values were dichotomized to represent the presence or absence of a functional relation. Overall, visual analysis appeared to be more conservative than statistical analysis, but further research is needed to evaluate the nature of these disagreements.
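
    Of the four statistics compared, Tau is the simplest to show. The sketch below computes the A-versus-B nonoverlap Tau (improving minus deteriorating baseline/intervention pairs, divided by nA × nB) and one commonly described Tau-U variant that subtracts the Kendall S for monotonic trend within the baseline phase. It assumes higher scores mean improvement; the study may have used a different Tau-U variant, and the function names are hypothetical.

```python
from itertools import combinations

def _sign(x):
    return (x > 0) - (x < 0)

def tau_ab(phase_a, phase_b):
    """Nonoverlap Tau: (#improving pairs - #deteriorating pairs) / (nA * nB)."""
    s = sum(_sign(b - a) for a in phase_a for b in phase_b)
    return s / (len(phase_a) * len(phase_b))

def tau_u_trend_corrected(phase_a, phase_b):
    """Tau-U variant: A-vs-B S minus the Kendall S of trend within phase A."""
    s_ab = sum(_sign(b - a) for a in phase_a for b in phase_b)
    # combinations() keeps time order, so each pair is (earlier, later)
    s_trend_a = sum(_sign(later - earlier)
                    for earlier, later in combinations(phase_a, 2))
    return (s_ab - s_trend_a) / (len(phase_a) * len(phase_b))
```

    With a baseline of 1, 2, 3 and an intervention phase of 4, 5, 6, Tau is 1.0 (complete nonoverlap), but the trend-corrected value drops to about 0.67 because the baseline was already improving. Sensitivity to such corrections is one reason agreement with visual analysis can differ between statistics.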

  11. Statistical Prediction in Proprietary Rehabilitation.

    ERIC Educational Resources Information Center

    Johnson, Kurt L.; And Others

    1987-01-01

    Applied statistical methods to predict case expenditures for low back pain rehabilitation cases in proprietary rehabilitation. Extracted predictor variables from case records of 175 workers compensation claimants with some degree of permanent disability due to back injury. Performed several multiple regression analyses resulting in a formula that…

  12. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  13. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  14. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G
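
    Gaussian imputation from summary statistics rests on the idea that z-scores at nearby SNPs are approximately multivariate normal with covariance given by linkage disequilibrium (LD), so an untyped SNP's z-score can be predicted as a conditional mean. The sketch below shows that textbook conditional-normal step. The ridge term `lam` is an assumption standing in for the abstract's "accounts for the limited sample size of the reference panel"; the authors' exact adjustment may differ, and all names are hypothetical.

```python
import numpy as np

def impute_z(ld, z_obs, obs_idx, unobs_idx, lam=0.1):
    """Impute z-scores at untyped SNPs as a multivariate-normal conditional
    mean: z_u = Sigma_uo (Sigma_oo + lam*I)^-1 z_o.

    ld: reference-panel LD (correlation) matrix over all SNPs;
    z_obs: z-scores at the typed SNPs listed in obs_idx;
    unobs_idx: indices of SNPs to impute; lam: assumed ridge term added to the
    diagonal to damp noise from a finite reference panel.
    Returns (imputed z-scores, approximate imputation r^2 per SNP)."""
    ld = np.asarray(ld, dtype=float)
    s_oo = ld[np.ix_(obs_idx, obs_idx)] + lam * np.eye(len(obs_idx))
    s_uo = ld[np.ix_(unobs_idx, obs_idx)]
    # Solve instead of inverting: W = Sigma_uo Sigma_oo^-1 (Sigma_oo is symmetric)
    weights = np.linalg.solve(s_oo, s_uo.T).T
    z_imp = weights @ np.asarray(z_obs, dtype=float)
    # r^2_i = (W Sigma_ou)_ii: share of z variance explained by the typed SNPs
    r2 = np.einsum("ij,ij->i", weights, s_uo)
    return z_imp, r2
```

    Only z-scores and a reference LD matrix are used, never individual genotypes, which is what allows imputation on published summary statistics.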

  15. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis

  16. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    estimates on one of these streams can be calculated at an ungaged location that has a drainage area between 40 percent of the drainage area of the farthest upstream streamgage and 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of the drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40 and 150 percent of the drainage area of the streamgage, and the ungaged location must be on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions within the State, representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were
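
    The second method (drainage-area ratio transfer) can be sketched in a few lines. The excerpt does not give the exact transfer equation, so this assumes the simplest form, scaling the gaged statistic linearly by the area ratio (exponent 1), and enforces the 40 to 150 percent applicability range stated above. The function name is hypothetical.

```python
def drainage_area_ratio_estimate(q_gaged, area_ungaged, area_gaged,
                                 min_ratio=0.4, max_ratio=1.5):
    """Transfer a low-flow statistic from a streamgage to an ungaged site on
    the same stream: Q_u = Q_g * (A_u / A_g), assuming linear scaling.
    Raises ValueError outside the 40-150 percent applicability range."""
    if area_gaged <= 0 or area_ungaged <= 0:
        raise ValueError("drainage areas must be positive")
    ratio = area_ungaged / area_gaged
    if not (min_ratio <= ratio <= max_ratio):
        raise ValueError(
            f"drainage-area ratio {ratio:.2f} outside {min_ratio}-{max_ratio}")
    return q_gaged * ratio
```

    For example, a hypothetical 7-day, 10-year low flow of 10 ft³/s at a gage draining 100 mi² would transfer as 5 ft³/s to a site draining 50 mi² on the same stream; a site draining 30 mi² would be rejected as outside the method's range.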

  17. Versatility of Cooperative Transcriptional Activation: A Thermodynamical Modeling Analysis for Greater-Than-Additive and Less-Than-Additive Effects

    PubMed Central

    Frank, Till D.; Carmody, Aimée M.; Kholodenko, Boris N.

    2012-01-01

    We derive a statistical model of transcriptional activation using equilibrium thermodynamics of chemical reactions. We examine to what extent this statistical model predicts synergy effects of cooperative activation of gene expression. We determine parameter domains in which greater-than-additive and less-than-additive effects are predicted for cooperative regulation by two activators. We show that the statistical approach can be used to identify different causes of synergistic greater-than-additive effects: nonlinearities of the thermostatistical transcriptional machinery and three-body interactions between RNA polymerase and two activators. In particular, our model-based analysis suggests that at low transcription factor concentrations cooperative activation cannot yield synergistic greater-than-additive effects, i.e., DNA transcription can only exhibit less-than-additive effects. Accordingly, transcriptional activity turns from synergistic greater-than-additive responses at relatively high transcription factor concentrations into less-than-additive responses at relatively low concentrations. In addition, two types of re-entrant phenomena are predicted. First, our analysis predicts that under particular circumstances transcriptional activity will feature a sequence of less-than-additive, greater-than-additive, and eventually less-than-additive effects when, for fixed activator concentrations, the regulatory impact of activators on the binding of RNA polymerase to the promoter increases from weak, to moderate, to strong. Second, under appropriate promoter conditions, the same re-entrant sequence of less-than-additive, greater-than-additive, and less-than-additive effects is predicted as activator concentrations are increased. Finally, our model-based analysis suggests that even for weak activators that individually induce only negligible increases in promoter activity, promoter activity can exhibit greater-than-additive responses when
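
    To make "greater-than-additive" and "less-than-additive" concrete, the sketch below uses a generic Shea-Ackers-style equilibrium model: RNA polymerase occupancy is a ratio of statistical weights, and each activator, when bound, multiplies the polymerase weight by an interaction factor. This is a hypothetical toy model, not the authors' model; it omits three-body terms and does not attempt to reproduce their parameter domains. The synergy measure compares the joint increase in occupancy with the sum of the single-activator increases.

```python
def p_bound(p, a1, a2, f1, f2):
    """RNAP promoter occupancy in a toy equilibrium model.

    p, a1, a2: statistical weights (concentration / dissociation constant)
    of RNAP and the two activators; f1, f2: RNAP-activator interaction
    factors. Activators bind independently of each other (no three-body term)."""
    unbound = (1 + a1) * (1 + a2)               # all states without RNAP
    bound = p * (1 + a1 * f1) * (1 + a2 * f2)   # all states with RNAP
    return bound / (unbound + bound)

def additivity_excess(p, a1, a2, f1, f2):
    """Joint increase minus the sum of single-activator increases:
    > 0 greater-than-additive, < 0 less-than-additive."""
    base = p_bound(p, 0, 0, f1, f2)
    d1 = p_bound(p, a1, 0, f1, f2) - base
    d2 = p_bound(p, 0, a2, f1, f2) - base
    d12 = p_bound(p, a1, a2, f1, f2) - base
    return d12 - (d1 + d2)
```

    In this toy model, with a1 = a2 = 1 and f1 = f2 = 10, a weak promoter (p = 0.01) gives a positive excess, while a strong promoter (p = 10) gives a negative one because occupancy saturates. This illustrates how a single thermodynamic model can produce both regimes.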

  18. Septic tank additive impacts on microbial populations.

    PubMed

    Pradhan, S; Hoover, M T; Clark, G H; Gumpertz, M; Wollum, A G; Cobb, C; Strock, J

    2008-01-01

    Environmental health specialists, other onsite wastewater professionals, scientists, and homeowners have questioned the effectiveness of septic tank additives. This paper describes an independent, third-party, field-scale research study of the effects of three liquid bacterial septic tank additives and a control (no additive) on septic tank microbial populations. Microbial populations were measured quarterly in a field study for 12 months in 48 full-size, functioning septic tanks. Bacterial populations in the 48 septic tanks were statistically analyzed with a mixed linear model. Additive effects were assessed for three septic tank maintenance levels (low, intermediate, and high). Dunnett's t-test for tank bacteria (alpha = .05) indicated that none of the treatments were significantly different, overall, from the control at the statistical level tested. In addition, the additives had no significant effects on septic tank bacterial populations at any of the septic tank maintenance levels. Additional controlled, field-based research is warranted, however, to evaluate other additives and experimental conditions.

  19. OdorMapComparer: an application for quantitative analyses and comparisons of fMRI brain odor maps.

    PubMed

    Liu, Nian; Xu, Fuqiang; Miller, Perry L; Shepherd, Gordon M

    2007-01-01

    Brain odor maps are reconstructed flat images that describe the spatial activity patterns in the glomerular layer of the olfactory bulbs in animals exposed to different odor stimuli. We have developed a software application, OdorMapComparer, to carry out quantitative analyses and comparisons of the fMRI odor maps. This application is an open-source window program that first loads the two odor map images to be compared. It allows image transformations including scaling, flipping, rotating, and warping so that the two images can be appropriately aligned to each other. It performs simple subtraction, addition, and averaging of signals in the two images. It also provides comparative statistics including the normalized correlation (NC) and spatial correlation coefficient. Experimental studies showed that the rodent fMRI odor maps for aliphatic aldehydes displayed spatial activity patterns that are similar in gross outlines but somewhat different in specific subregions. Analyses with OdorMapComparer indicate that the similarity between odor maps decreases with increasing difference in the length of carbon chains. For example, the map of butanal is more closely related to that of pentanal (NC = 0.617) than to that of octanal (NC = 0.082), which is consistent with animal behavioral studies. The study also indicates that fMRI odor maps are statistically odor-specific and repeatable across both intra- and intersubject trials. OdorMapComparer thus provides a tool for quantitative, statistical analyses and comparisons of fMRI odor maps in a fashion that is integrated with the overall odor mapping techniques.
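
    The abstract reports normalized correlation (NC) values between aligned maps but does not define NC. A common definition, assumed here, is the zero-mean normalized correlation over corresponding pixels (equivalent to Pearson's r on the flattened images). OdorMapComparer's exact formula may differ, for example by omitting mean-centering. The optional mask is a hypothetical way to restrict the comparison to pixels inside the bulb outline.

```python
import numpy as np

def normalized_correlation(map_a, map_b, mask=None):
    """Zero-mean normalized correlation between two aligned 2-D maps.
    mask: optional boolean array selecting the pixels to compare."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be aligned to the same shape first")
    if mask is not None:
        a, b = a[mask], b[mask]
    a = a - a.mean()  # remove overall signal level so only the pattern counts
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for a constant map")
    return float((a * b).sum() / denom)
```

    A value near 1 means the two activity patterns rise and fall together across the bulb, and a value near 0 (as reported for butanal versus octanal) means little shared spatial structure.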

  20. Statistics for People Who (Think They) Hate Statistics. Third Edition

    ERIC Educational Resources Information Center

    Salkind, Neil J.

    2007-01-01

    This text teaches an often intimidating and difficult subject in a way that is informative, personable, and clear. The author takes students through various statistical procedures, beginning with correlation and graphical representation of data and ending with inferential techniques and analysis of variance. In addition, the text covers SPSS, and…

  1. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  2. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and the use of statistics to describe characteristics of data (descriptive statistics), as well as inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently sampled, and normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transforming the data (e.g., with logarithms) may normalize them, allowing the use of parametric tests. Alternatively, nonparametric tests, some of which rank the data before analysis, can be used with skewed data. Experimental designs and analyses need to balance the risks of committing type 1 errors (false positives) and type 2 errors (false negatives). For clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are used. Although both use 2 × 2 tables, their premises and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. The examples and references in this chapter are intended to support continued investigation of experimental designs and appropriate data analysis.
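
    The difference between a relative risk and an odds ratio computed from the same 2 × 2 table can be shown directly. The sketch below uses the conventional layout (exposed/unexposed rows, outcome present/absent columns); the function name and counts are illustrative. Zero cells raise ZeroDivisionError here; real analyses often add a continuity correction, which is omitted.

```python
def two_by_two(a, b, c, d):
    """2 x 2 table:
                  outcome+  outcome-
       exposed       a         b
       unexposed     c         d
    Returns (relative risk, odds ratio)."""
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    risk_exposed = a / (a + b)        # incidence among the exposed
    risk_unexposed = c / (c + d)      # incidence among the unexposed
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)    # cross-product ratio
    return relative_risk, odds_ratio
```

    With a = 20, b = 80, c = 10, d = 90, the relative risk is 2.0 but the odds ratio is 2.25; the two converge only when the outcome is rare. In a case-control study the investigator fixes the numbers of cases and controls, so the row "risks" are not meaningful and only the odds ratio is estimable. This is the difference in premise the chapter refers to.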

  3. Reporting and methodological quality of meta-analyses in urological literature

    PubMed Central

    Xu, Jing

    2017-01-01

    Purpose To assess the overall quality of published urological meta-analyses and identify predictive factors for high quality. Materials and Methods We systematically searched PubMed to identify meta-analyses published from January 1st, 2011 to December 31st, 2015 in 10 predetermined major paper-based urology journals. The characteristics of the included meta-analyses were collected, and their reporting and methodological qualities were assessed with the PRISMA checklist (27 items) and the AMSTAR tool (11 items), respectively. Descriptive statistics were used for individual items as a measure of overall compliance, and PRISMA and AMSTAR scores were calculated as the sum of adequately reported domains. Logistic regression was used to identify predictive factors for high quality. Results A total of 183 meta-analyses were included. The mean PRISMA and AMSTAR scores were 22.74 ± 2.04 and 7.57 ± 1.41, respectively. PRISMA item 5 (protocol and registration), items 15 and 22 (risk of bias across studies), and items 16 and 23 (additional analyses) had less than 50% adherence. AMSTAR item 1 (“a priori” design), item 5 (list of studies), and item 10 (publication bias) had less than 50% adherence. Logistic regression analyses showed that funding support and an “a priori” design were associated with superior reporting quality, while following the PRISMA guideline and an “a priori” design were associated with superior methodological quality. Conclusions Reporting and methodological qualities of recently published meta-analyses in major paper-based urology journals are generally good. Further improvement could potentially be achieved by strictly adhering to the PRISMA guideline and having an “a priori” protocol. PMID:28439452

  4. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    In this article, we study how to jointly test the associations of a genetic variant with multiple correlated phenotypes using the summary statistics of single-phenotype analyses from genome-wide association studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual-phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes that account for between-phenotype correlation without requiring access to individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple-phenotype testing procedures that jointly test a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their power with that of existing methods in various settings. We applied the proposed tests to a Global Lipids Genetics Consortium GWAS summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
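
    The two ingredients described above, estimating the between-phenotype correlation from summary statistics and then testing one variant across all phenotypes, can be sketched with a simpler, widely used alternative to the authors' procedure: the chi-square omnibus test T = zᵀR⁻¹z with K degrees of freedom. This is plainly not the authors' common-mean plus variance-component test. Estimating R from SNPs with small |z| in every phenotype (an approximately null set) is a common heuristic, assumed here together with the threshold value. The chi-square tail uses closed forms for integer degrees of freedom so that only the standard library and NumPy are needed.

```python
import math
import numpy as np

def estimate_phenotype_correlation(z, null_threshold=2.0):
    """z: array (n_snps, n_phenotypes) of GWAS z-scores. Uses SNPs with
    |z| < threshold in every phenotype as an approximately null set."""
    z = np.asarray(z, dtype=float)
    null_rows = np.all(np.abs(z) < null_threshold, axis=1)
    return np.corrcoef(z[null_rows], rowvar=False)

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** k / math.factorial(k)
                                     for k in range(df // 2))
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                                  for k in range(1, (df - 1) // 2 + 1))
    return tail

def omnibus_test(z_variant, r):
    """T = z' R^-1 z, chi-square with K = len(z) degrees of freedom under H0."""
    z_variant = np.asarray(z_variant, dtype=float)
    stat = float(z_variant @ np.linalg.solve(np.asarray(r, dtype=float), z_variant))
    return stat, chi2_sf(stat, len(z_variant))
```

    Because the p-value has a closed form, the test scales to millions of variants, the same computational point the abstract makes for its analytic p-values. The omnibus test, however, spends K degrees of freedom regardless of the effect pattern, which is one motivation for more targeted tests such as the authors'.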

  5. Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI).

    PubMed

    Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

    2016-01-01

    We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non-expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI's robustness and sensitivity in capturing useful data relating to the students' conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. © 2016 T. Deane et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
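
    The one-parameter Rasch model used to analyze SRBCI responses links a student's ability θ and an item's difficulty b through a logistic curve. The sketch shows the response probability and the item's Fisher information. Information peaks where ability equals difficulty, which is why a set of items that "vary widely in difficulty" can measure students across a broad ability range. Function names and the example difficulties are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

# Hypothetical item difficulties spread from easy (-2) to hard (+2)
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
test_information = sum(item_information(0.0, b) for b in difficulties)
```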

  6. Bayesian statistics in medicine: a 25 year review.

    PubMed

    Ashby, Deborah

    2006-11-15

    This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.

  7. Exploring the statistical and clinical impact of two interim analyses on the Phase II design with option for direct assignment.

    PubMed

    An, Ming-Wen; Mandrekar, Sumithra J; Edelman, Martin J; Sargent, Daniel J

    2014-07-01

    The primary goal of Phase II clinical trials is to better understand a treatment's safety and efficacy to inform a Phase III go/no-go decision. Many Phase II designs have been proposed, incorporating randomization, interim analyses, adaptation, and patient selection. The Phase II design with an option for direct assignment (i.e. stop randomization and assign all patients to the experimental arm based on a single interim analysis (IA) at 50% accrual) was recently proposed [An et al., 2012]. We discuss this design in the context of existing designs, and extend it from a single-IA to a two-IA design. We compared the statistical properties and clinical relevance of the direct assignment design with two IA (DAD-2) versus a balanced randomized design with two IA (BRD-2) and a direct assignment design with one IA (DAD-1), over a range of response rate ratios (2.0-3.0). The DAD-2 has minimal loss in power (<2.2%) and minimal increase in type I error rate (T1ER; <1.6%) compared to a BRD-2. As many as 80% more patients were treated with experimental vs. control in the DAD-2 than with the BRD-2 (experimental vs. control ratio: 1.8 vs. 1.0), and as many as 64% more in the DAD-2 than with the DAD-1 (1.8 vs. 1.1). We illustrate the DAD-2 using a case study in lung cancer. In the spectrum of Phase II designs, the direct assignment design, especially with two IA, provides a middle ground with desirable statistical properties and likely appeal to both clinicians and patients. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Unconscious analyses of visual scenes based on feature conjunctions.

    PubMed

    Tachibana, Ryosuke; Noguchi, Yasuki

    2015-06-01

    To efficiently process a cluttered scene, the visual system analyzes statistical properties or regularities of visual elements embedded in the scene. It is controversial, however, whether such scene analyses also work for unconsciously perceived stimuli. Here we show that the brain performs unconscious scene analyses not only using a single featural cue (e.g., orientation) but also based on conjunctions of multiple visual features (e.g., combinations of color and orientation information). Subjects foveally viewed a stimulus array (duration: 50 ms) in which 4 types of bars (red-horizontal, red-vertical, green-horizontal, and green-vertical) were intermixed. Although conscious perception of those bars was inhibited by a subsequent mask stimulus, the brain correctly analyzed the information about color, orientation, and color-orientation conjunctions of those invisible bars. The information about those features was then used for the unconscious configuration analysis (statistical processing) of the central bars, which induced a perceptual bias and illusory feature binding in visible stimuli at peripheral locations. While statistical analyses and feature binding are normally two key functions of the visual system in constructing coherent percepts of visual scenes, our results show that a high-level analysis combining those two functions is correctly performed by unconscious computations in the brain. (c) 2015 APA, all rights reserved.

  9. Comparative statistical component analysis of transgenic, cyanophycin-producing potatoes in greenhouse and field trials.

    PubMed

    Schmidt, Kerstin; Schmidtke, Jörg; Mast, Yvonne; Waldvogel, Eva; Wohlleben, Wolfgang; Klemke, Friederike; Lockau, Wolfgang; Hausmann, Tina; Hühns, Maja; Broer, Inge

    2017-08-01

    Potatoes are a promising system for industrial production of the biopolymer cyanophycin as a second compound in addition to starch. To assess the efficiency of the system in the field, we analysed its stability, specifically its sensitivity to environmental factors. Field and greenhouse trials with transgenic potatoes (two independent events) were carried out for three years. The influence of environmental factors was measured, and target compounds in the transgenic plants (cyanophycin, amino acids) were analysed for differences from control plants. Furthermore, non-target parameters (starch content, number, weight and size of tubers) were analysed for equivalence with control plants. The large volume of data collected was handled using modern statistical approaches to model the correlation between influencing environmental factors (year of cultivation, nitrogen fertilization, origin of plants, greenhouse or field cultivation) and key components (starch, amino acids, cyanophycin) and agronomic characteristics. General linear models were used for modelling, and standard effect sizes were applied to compare conventional and genetically modified plants. Altogether, the field trials prove that significant cyanophycin production is possible without reduction of starch content. Non-target compound composition seems to be equivalent under varying environmental conditions. Additionally, a quick test to measure cyanophycin content gives results similar to those of the extensive enzymatic test. This work facilitates the commercial cultivation of cyanophycin potatoes.
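
    The abstract does not say which effect size was used to compare conventional and genetically modified plants. As one common choice, assumed here, the sketch computes Cohen's d with a pooled sample standard deviation. Because d is unit-free, a tuber-weight difference and a starch-content difference can be judged on the same scale. A formal equivalence analysis would typically check whether a confidence interval for d lies within predefined limits; that step is not shown. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_effect_size(treated, control):
    """Cohen's d: (mean_treated - mean_control) / pooled sample SD."""
    n1, n2 = len(treated), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two observations per group")
    # Pool the two sample variances, weighting by degrees of freedom
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)
```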

  10. Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

    PubMed

    Gaskin, Cadeyrn J; Happell, Brenda

    2014-05-01

    To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial
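
    Two of the quantities audited above can be approximated in a few lines. The experiment-wise (familywise) Type I error rate for k tests at level α is 1 − (1 − α)^k if the tests are independent, an assumption that rarely holds exactly in real papers. Power for a two-sided two-sample comparison at Cohen's conventional small, medium, and large effects (d = 0.2, 0.5, 0.8) can be approximated with the normal distribution instead of the noncentral t. Both are simplifications for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

def experimentwise_error(n_tests, alpha=0.05):
    """P(at least one Type I error) across independent tests: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** n_tests

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test for standardized
    effect d, using the normal approximation to the t distribution."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected test statistic under H1
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

    About 15 independent tests at α = .05 already give a familywise error near .54. With 64 participants per group, power is roughly .80 for a medium effect but only about .20 for a small one. This pattern matches the study's finding that small effects were usually underpowered.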

  11. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be poorer in surgical journals than in medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical

  12. Statistical issues on the analysis of change in follow-up studies in dental research.

    PubMed

    Blance, Andrew; Tu, Yu-Kang; Baelum, Vibeke; Gilthorpe, Mark S

    2007-12-01

    To provide an overview of the problems in study design and associated analyses of follow-up studies in dental research, particularly addressing three issues: treatment-baseline interactions, statistical power, and nonrandomization. Our previous work has shown that many studies purport an interaction between change (from baseline) and baseline values, which is often based on inappropriate statistical analyses. A priori power calculations are essential for randomized controlled trials (RCTs), but in the pre-test/post-test RCT design it is not well known to dental researchers that the choice of statistical method affects power, and that power is affected by treatment-baseline interactions. A common (good) practice in the analysis of RCT data is to adjust for baseline outcome values using ANCOVA, thereby increasing statistical power. However, an important requirement for ANCOVA is that there be no interaction between the groups and baseline outcome (i.e. effective randomization); the patient-selection process should not cause differences in mean baseline values across groups. This assumption is often violated in nonrandomized (observational) studies, and the use of ANCOVA is thus problematic, potentially giving biased estimates, invoking Lord's paradox and leading to difficulties in the interpretation of results. Baseline interaction issues can be overcome by statistical methods not widely practiced in dental research: Oldham's method and multilevel modelling; the latter is preferred for its greater flexibility to deal with more than one follow-up occasion as well as additional covariates. To illustrate these three key issues, hypothetical examples are considered from the fields of periodontology, orthodontics, and oral implantology. Caution needs to be exercised when considering the design and analysis of follow-up studies. ANCOVA is generally inappropriate for nonrandomized studies, and causal inferences from observational data should be avoided.
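    Oldham's method, mentioned above, replaces the correlation between change and baseline with the correlation between change and the average of baseline and follow-up. The sketch below uses simulated data, under the illustrative assumption of no treatment effect and no true relation between change and baseline, to show why this matters: the naive correlation is driven by the measurement error that baseline and change share, whereas Oldham's version is close to zero when baseline and follow-up variances are equal. The function names and simulation settings are hypothetical.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_pre_post(n=5000, seed=1):
    """Hypothetical data: each subject has a stable true value; baseline and
    follow-up add independent measurement error; there is no treatment effect."""
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n):
        true_value = rng.gauss(0, 1)
        pre.append(true_value + rng.gauss(0, 1))
        post.append(true_value + rng.gauss(0, 1))
    return pre, post

pre, post = simulate_pre_post()
change = [b - a for a, b in zip(pre, post)]
average = [(a + b) / 2 for a, b in zip(pre, post)]

# Naive analysis: change vs baseline is spuriously negative (about -0.5 in this setup)
r_naive = pearson(change, pre)
# Oldham's method: change vs the mean of baseline and follow-up (about 0 here)
r_oldham = pearson(change, average)
print(round(r_naive, 2), round(r_oldham, 2))
```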

  13. AMAS: a fast tool for alignment manipulation and computing of summary statistics.

    PubMed

    Borowiec, Marek L

    2016-01-01

    The amount of data used in phylogenetics has grown explosively in recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
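    To make two of the listed statistics concrete, the sketch below counts variable sites and parsimony-informative sites (sites with at least two character states, each present in at least two taxa), plus the percentage of undetermined cells, for a small nucleotide alignment stored as a Python dictionary. It is a hypothetical helper, not part of AMAS's command-line or package interface, and it assumes that only `-`, `?` and `N` mark undetermined characters (other ambiguity codes are treated as ordinary states).

```python
from collections import Counter

# Assumption: nucleotide alignment; these symbols count as undetermined
UNDETERMINED = {"-", "?", "N"}

def site_summary(alignment: dict[str, str]) -> dict[str, float]:
    """Summary statistics for an aligned set of sequences (taxon -> sequence)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    variable = informative = undetermined = 0
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        undetermined += sum(c in UNDETERMINED for c in column)
        states = Counter(c for c in column if c not in UNDETERMINED)
        if len(states) > 1:
            variable += 1
            # Parsimony-informative: at least two states, each seen in >= 2 taxa
            if sum(1 for count in states.values() if count >= 2) >= 2:
                informative += 1
    cells = length * len(seqs)
    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "variable_sites": variable,
        "parsimony_informative": informative,
        "missing_percent": 100 * undetermined / cells,
    }

example = {"a": "ACGTA", "b": "ACGTT", "c": "AGGAT", "d": "AGG-T"}
print(site_summary(example))
```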

  14. Simultaneous assessment of phase chemistry, phase abundance and bulk chemistry with statistical electron probe micro-analyses: Application to cement clinkers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, William; Krakowiak, Konrad J.; Ulm, Franz-Josef, E-mail: ulm@mit.edu

    2014-01-15

    According to recent developments in cement clinker engineering, the optimization of chemical substitutions in the main clinker phases offers a promising approach to improve both reactivity and grindability of clinkers. Thus, monitoring the chemistry of the phases may become part of the quality control at the cement plants, along with the usual measurements of the abundance of the mineralogical phases (quantitative X-ray diffraction) and the bulk chemistry (X-ray fluorescence). This paper presents a new method to assess these three complementary quantities with a single experiment. The method is based on electron microprobe spot analyses, performed over a grid located on a representative surface of the sample and interpreted with advanced statistical tools. This paper describes the method and the experimental program performed on industrial clinkers to establish the accuracy in comparison to conventional methods. Highlights: • A new method of clinker characterization. • Combination of electron probe technique with cluster analysis. • Simultaneous assessment of phase abundance, composition and bulk chemistry. • Experimental validation performed on industrial clinkers.

  15. Extreme between-study homogeneity in meta-analyses could offer useful insights.

    PubMed

    Ioannidis, John P A; Trikalinos, Thomas A; Zintzaras, Elias

    2006-10-01

    Meta-analyses are routinely evaluated for the presence of large between-study heterogeneity. We examined whether it is also important to probe whether there is extreme between-study homogeneity. We used heterogeneity tests with left-sided statistical significance for inference and developed a Monte Carlo simulation test for testing extreme homogeneity in risk ratios across studies, using the empiric distribution of the summary risk ratio and heterogeneity statistic. A left-sided P=0.01 threshold was set for claiming extreme homogeneity to minimize type I error. Among 11,803 meta-analyses with binary contrasts from the Cochrane Library, 143 (1.21%) had left-sided P-value <0.01 for the asymptotic Q statistic and 1,004 (8.50%) had left-sided P-value <0.10. The frequency of extreme between-study homogeneity did not depend on the number of studies in the meta-analyses. We identified examples where extreme between-study homogeneity (left-sided P-value <0.01) could result from various possibilities beyond chance. These included inappropriate statistical inference (asymptotic vs. Monte Carlo), use of a specific effect metric, correlated data or stratification using strong predictors of outcome, and biases and potential fraud. Extreme between-study homogeneity may provide useful insights about a meta-analysis and its constituent studies.
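    The asymptotic left-sided test described above asks whether Cochran's Q is improbably small for k - 1 degrees of freedom. The sketch below computes Q from log risk ratios and their variances and reports the left-sided P value as the chi-square CDF evaluated at Q, using only the standard library (the CDF is computed from the series expansion of the regularized lower incomplete gamma function, adequate for moderate Q). It does not implement the authors' Monte Carlo test, and the input values are hypothetical.

```python
import math

def cochran_q(effects, variances):
    """Cochran's Q for inverse-variance weighted effects (e.g. log risk ratios)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def chi2_cdf(x, df, max_terms=10_000):
    """Chi-square CDF = regularized lower incomplete gamma P(df/2, x/2), via its series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def left_sided_p(effects, variances):
    """Probability of a Q this small or smaller if the studies were homogeneous."""
    q = cochran_q(effects, variances)
    return chi2_cdf(q, len(effects) - 1)

# Hypothetical log risk ratios that agree far more closely than their variances imply
log_rr = [0.20, 0.21, 0.19, 0.20, 0.20]
var = [0.04, 0.05, 0.03, 0.04, 0.06]
print(left_sided_p(log_rr, var))
```

    A very small left-sided P value here, like the P < 0.01 threshold in the abstract, flags results that agree more closely than sampling error alone would suggest.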

  16. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  17. Critical analysis of adsorption data statistically

    NASA Astrophysics Data System (ADS)

    Kaushal, Achla; Singh, S. K.

    2017-10-01

    Experimental data can be presented, computed, and critically analysed in different ways using statistics. A variety of statistical tests are used to make decisions about the significance and validity of experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² favoured the data collected from the experiment, as shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficients for the Langmuir and Freundlich adsorption isotherms were 0.99 and 0.95 respectively, indicating a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with mango leaf powder.
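    Correlation coefficients for isotherms such as those reported above are typically obtained from linearized forms. The sketch below fits the common linearizations, Ce/qe = Ce/qm + 1/(K·qm) for Langmuir and log qe = log Kf + (1/n) log Ce for Freundlich, by ordinary least squares and returns Pearson's r. The abstract does not state which linearization the authors used, so this choice is an assumption, and the example data are hypothetical, noise-free values generated from known parameters rather than the study's measurements.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept, plus Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

def fit_langmuir(ce, qe):
    """Linearized Langmuir: Ce/qe = Ce/qm + 1/(K*qm). Returns (qm, K, r)."""
    slope, intercept, r = linear_fit(ce, [c / q for c, q in zip(ce, qe)])
    return 1 / slope, slope / intercept, r

def fit_freundlich(ce, qe):
    """Linearized Freundlich: log qe = log Kf + (1/n) log Ce. Returns (Kf, 1/n, r)."""
    slope, intercept, r = linear_fit([math.log10(c) for c in ce],
                                     [math.log10(q) for q in qe])
    return 10 ** intercept, slope, r

# Hypothetical equilibrium concentrations and uptakes (qm = 20, K = 0.5)
ce = [1.0, 2.0, 5.0, 10.0, 20.0]
qe = [20 * 0.5 * c / (1 + 0.5 * c) for c in ce]
print(fit_langmuir(ce, qe))
print(fit_freundlich(ce, qe))
```

    A Freundlich exponent 1/n below 1 is conventionally read as favourable adsorption; with real, noisy data r falls below 1, as in the values reported above.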

  18. Titanic: A Statistical Exploration.

    ERIC Educational Resources Information Center

    Takis, Sandra L.

    1999-01-01

    Uses the available data about the Titanic's passengers to interest students in exploring categorical data and the chi-square distribution. Describes activities incorporated into a statistics class and gives additional resources for collecting information about the Titanic. (ASK)
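    The core calculation in such a classroom activity is Pearson's chi-square test of independence on a contingency table. The sketch below computes the statistic and its degrees of freedom without a continuity correction. The counts are hypothetical placeholders, not actual Titanic passenger figures; the resulting statistic would be compared with the chi-square critical value (about 3.84 for 1 degree of freedom at α = .05).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts (rows: two passenger groups; columns: survived, died)
counts = [[10, 20], [20, 10]]
stat, df = chi_square_independence(counts)
print(stat, df)
```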

  19. A Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important means of extracting post-disaster collapsed building information, owing to its great versatility and its almost all-weather, day-and-night working capability. Because the inherent statistical distribution of speckle in SAR images has not been used to extract collapsed building information, this paper proposes a novel texture feature based on statistical models of SAR images to extract collapsed buildings. In the proposed feature, the texture parameter of the G0 distribution of SAR images is used to reflect the uniformity of the target in order to extract collapsed buildings. This feature not only takes the statistical distribution of SAR images into account, providing a more accurate description of object texture, but can also be applied to extract collapsed building information from single-, dual- or full-polarization SAR data. RADARSAT-2 data of the Yushu earthquake, acquired on April 21, 2010, are used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is analysed, which provides decision support for selecting data for collapsed building information extraction.

  20. Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.

    PubMed

    Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W

    2018-05-18

    Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.

  1. Statistical Exposé of a Multiple-Compartment Anaerobic Reactor Treating Domestic Wastewater.

    PubMed

    Pfluger, Andrew R; Hahn, Martha J; Hering, Amanda S; Munakata-Marr, Junko; Figueroa, Linda

    2018-06-01

    Mainstream anaerobic treatment of domestic wastewater is a promising energy-generating treatment strategy; however, such reactors operated in colder regions are not well characterized. Performance data from a pilot-scale, multiple-compartment anaerobic reactor taken over 786 days were subjected to comprehensive statistical analyses. Results suggest that chemical oxygen demand (COD) was a poor proxy for organics in anaerobic systems, as oxygen demand from dissolved inorganic material, dissolved methane, and colloidal material influences dissolved and particulate COD measurements. Additionally, univariate and functional boxplots were useful in visualizing variability in contaminant concentrations and identifying statistical outliers. Further, significantly different dissolved organic removal and methane production were observed between operational years, suggesting that anaerobic reactor systems may not achieve steady-state performance within one year. Finally, modeling multiple-compartment reactor systems will require data collected over at least two years to capture seasonal variations of the major anaerobic microbial functions occurring within each reactor compartment.
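    Univariate boxplots conventionally flag outliers with Tukey's fences: points more than 1.5 interquartile ranges beyond the first or third quartile. The sketch below assumes that conventional rule; the abstract does not state the multiplier or quartile method used, and functional boxplots for whole time series are outside its scope. Note that quartile definitions differ between software packages, so fences can shift slightly. The example concentrations are made up.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # statistics' default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical effluent concentrations (mg/L)
print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```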

  2. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis.

    PubMed

    Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun

    2016-09-14

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprises 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS-specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high-throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs .

  3. Spacelab Charcoal Analyses

    NASA Technical Reports Server (NTRS)

    Slivon, L. E.; Hernon-Kenny, L. A.; Katona, V. R.; Dejarme, L. E.

    1995-01-01

    This report describes analytical methods and results obtained from chemical analysis of 31 charcoal samples in five sets. Each set was obtained from a single scrubber used to filter ambient air on board a Spacelab mission. Analysis of the charcoal samples was conducted by thermal desorption followed by gas chromatography/mass spectrometry (GC/MS). All samples were analyzed using identical methods. The method used for these analyses was able to detect compounds independent of their polarity or volatility. In addition to the charcoal samples, analyses of three Environmental Control and Life Support System (ECLSS) water samples were conducted specifically for trimethylamine.

  4. Using generalized additive (mixed) models to analyze single case designs.

    PubMed

    Shadish, William R; Zuur, Alain F; Sullivan, Kristynn J

    2014-04-01

    This article shows how to apply generalized additive models and generalized additive mixed models to single-case design data. These models excel at detecting the functional form between two variables (often called trend), that is, whether trend exists, and if it does, what its shape is (e.g., linear and nonlinear). In many respects, however, these models are also an ideal vehicle for analyzing single-case designs because they can consider level, trend, variability, overlap, immediacy of effect, and phase consistency that single-case design researchers examine when interpreting a functional relation. We show how these models can be implemented in a wide variety of ways to test whether treatment is effective, whether cases differ from each other, whether treatment effects vary over cases, and whether trend varies over cases. We illustrate diagnostic statistics and graphs, and we discuss overdispersion of data in detail, with examples of quasibinomial models for overdispersed data, including how to compute dispersion and quasi-AIC fit indices in generalized additive models. We show how generalized additive mixed models can be used to estimate autoregressive models and random effects and discuss the limitations of the mixed models compared to generalized additive models. We provide extensive annotated syntax for doing all these analyses in the free computer program R. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  5. Statistical reporting of clinical pharmacology research.

    PubMed

    Ring, Arne; Schall, Robert; Loke, Yoon K; Day, Simon

    2017-06-01

    Research in clinical pharmacology covers a wide range of experiments, trials and investigations: clinical trials, systematic reviews and meta-analyses of drug usage after market approval, the investigation of pharmacokinetic-pharmacodynamic relationships, the search for mechanisms of action or for potential signals for efficacy and safety using biomarkers. Often these investigations are exploratory in nature, which has implications for the way the data should be analysed and presented. Here we summarize some of the statistical issues that are of particular importance in clinical pharmacology research. © 2017 The British Pharmacological Society.

  6. How the Mastery Rubric for Statistical Literacy Can Generate Actionable Evidence about Statistical and Quantitative Learning Outcomes

    ERIC Educational Resources Information Center

    Tractenberg, Rochelle E.

    2017-01-01

    Statistical literacy is essential to an informed citizenry; and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards "big" data: while automated analyses can exploit massive amounts of data, the interpretation--and possibly more importantly, the replication--of results are…

  7. Post Hoc Analyses of ApoE Genotype-Defined Subgroups in Clinical Trials.

    PubMed

    Kennedy, Richard E; Cutter, Gary R; Wang, Guoqiao; Schneider, Lon S

    2016-01-01

    Many post hoc analyses of clinical trials in Alzheimer's disease (AD) and mild cognitive impairment (MCI) are in small Phase 2 trials. Subject heterogeneity may lead to statistically significant post hoc results that cannot be replicated in larger follow-up studies. We investigated the extent of this problem using simulation studies mimicking current trial methods with post hoc analyses based on ApoE4 carrier status. We used a meta-database of 24 studies, including 3,574 subjects with mild AD and 1,171 subjects with MCI/prodromal AD, to simulate clinical trial scenarios. Post hoc analyses examined if rates of progression on the Alzheimer's Disease Assessment Scale-cognitive (ADAS-cog) differed between ApoE4 carriers and non-carriers. Across studies, ApoE4 carriers were younger and had lower baseline scores, greater rates of progression, and greater variability on the ADAS-cog. Up to 18% of post hoc analyses for 18-month trials in AD showed greater rates of progression for ApoE4 non-carriers that were statistically significant but unlikely to be confirmed in follow-up studies. The frequency of erroneous conclusions dropped below 3% with trials of 100 subjects per arm. In MCI, rates of statistically significant differences with greater progression in ApoE4 non-carriers remained below 3% unless sample sizes were below 25 subjects per arm. Statistically significant differences for ApoE4 in post hoc analyses often reflect heterogeneity among small samples rather than true differential effect among ApoE4 subtypes. Such analyses must be viewed cautiously. ApoE genotype should be incorporated into the design stage to minimize erroneous conclusions.

  8. Statistical Treatment of Looking-Time Data

    ERIC Educational Resources Information Center

    Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté

    2016-01-01

    Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…

  9. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

    PubMed

    Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

    2014-04-11

    Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

  10. Dissecting the genetics of complex traits using summary association statistics.

    PubMed

    Pasaniuc, Bogdan; Price, Alkes L

    2017-02-01

    During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

  11. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have now been developed in various directions to capture various types of complex relationships in data. A rich variety of advanced and recent statistical models is available, mostly in open-source software (R being one example). However, these advanced models are not very friendly to novice R users, since they rely on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and Shiny) so that the most recent and advanced statistical models are readily available, accessible and applicable on the web. We previously built interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them into a data-analysis interface, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical models easier to apply and easier to compare in order to find the most appropriate model for the data.

  12. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
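    The Monte Carlo confidence interval mentioned above rests on simulating the sampling distribution of the product a·b. The sketch below shows only the single-trial building block: it draws a and b from independent normal distributions centred on their estimates with their standard errors and takes percentiles of the products. It does not implement the paper's multi-trial, random-effects marginal likelihood method; the independence of a and b and the example coefficients are assumptions.

```python
import random

def monte_carlo_ci(a, se_a, b, se_b, level=0.95, draws=20_000, seed=0):
    """Point estimate and Monte Carlo confidence interval for the mediated
    effect a*b, assuming independent normal sampling distributions for a and b."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    tail = round((1 - level) / 2 * draws)  # number of draws cut from each tail
    return a * b, products[tail], products[draws - 1 - tail]

# Hypothetical path coefficients and standard errors from a single trial
estimate, lower, upper = monte_carlo_ci(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

    Because the distribution of a product of normals is skewed, the percentile interval is generally asymmetric around a·b, which is one reason such intervals are preferred over a symmetric normal-theory interval.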

  13. Multiple statistical tests: Lessons from a d20.

    PubMed

    Madan, Christopher R

    2016-01-01

    Statistical analyses are often conducted with α = .05. When multiple statistical tests are conducted, this procedure needs to be adjusted to compensate for the otherwise inflated Type I error. In tabletop gaming, it is sometimes desired to roll a 20-sided die (or 'd20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (Type I error) at least once across repeated, independent statistical tests.
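The d20 case reduces to the familiar formula for the probability of at least one Type I error across k independent tests, 1 - (1 - α)^k. With α = 1/20 and two rolls, the chance of a natural 20 at least once (equivalently, the greater of the two rolls being 20) is 1 - (19/20)² = 39/400 = 0.0975. The Python sketch below uses exact fractions for the die; the Šidák per-test level is included as one standard adjustment, not necessarily the one the author recommends. Function names are illustrative.

```python
from fractions import Fraction

def prob_at_least_once(p, k):
    """P(a per-test event of probability p happens at least once in k independent tests)."""
    return 1 - (1 - p) ** k

def sidak_alpha(family_alpha, k):
    """Per-test level so that k independent tests have family-wise error family_alpha."""
    return 1 - (1 - family_alpha) ** (1 / k)

d20_face = Fraction(1, 20)                  # chance of one specific face, i.e. alpha = .05
print(prob_at_least_once(d20_face, 2))      # natural 20 when rolling twice: 39/400
print(float(prob_at_least_once(0.05, 10)))  # ten independent tests at alpha = .05
print(sidak_alpha(0.05, 10))                # per-test alpha keeping the family at .05
```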

  14. Trends in selected streamflow statistics at 19 long-term streamflow-gaging stations indicative of outflows from Texas to Arkansas, Louisiana, Galveston Bay, and the Gulf of Mexico, 1922-2009

    USGS Publications Warehouse

    Barbie, Dana L.; Wehmeyer, Loren L.

    2012-01-01

    Trends in selected streamflow statistics during 1922-2009 were evaluated at 19 long-term streamflow-gaging stations considered indicative of outflows from Texas to Arkansas, Louisiana, Galveston Bay, and the Gulf of Mexico. The U.S. Geological Survey, in cooperation with the Texas Water Development Board, evaluated streamflow data from streamflow-gaging stations with more than 50 years of record that were active as of 2009. The outflows into Arkansas and Louisiana were represented by 3 streamflow-gaging stations, and outflows into the Gulf of Mexico, including Galveston Bay, were represented by 16 streamflow-gaging stations. Monotonic trend analyses were done using the following three streamflow statistics generated from daily mean values of streamflow: (1) annual mean daily discharge, (2) annual maximum daily discharge, and (3) annual minimum daily discharge. The trend analyses were based on the nonparametric Kendall's Tau test, which is useful for the detection of monotonic upward or downward trends with time. A total of 69 trend analyses by Kendall's Tau were computed: 19 periods of streamflow multiplied by the 3 streamflow statistics, plus 12 additional trend analyses because the periods of record for 2 streamflow-gaging stations were divided into periods representing pre- and post-reservoir impoundment. Unless otherwise described, each trend analysis used the entire period of record for each streamflow-gaging station. The monotonic trend analysis detected 11 statistically significant downward trends, 37 instances of no trend, and 21 statistically significant upward trends. One general region studied, which seems to have relatively more upward trends for many of the streamflow statistics analyzed, includes the rivers and associated creeks and bayous to Galveston Bay in the Houston metropolitan area. Lastly, the westernmost river basins considered (the Nueces and Rio Grande) had statistically significant downward trends for many of the streamflow statistics
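Kendall's tau computed between a series and its time order is the basis of the Mann-Kendall monotonic trend test. The Python sketch below is a minimal version, not the report's code: it assumes no tied values (a tie-corrected variance would otherwise be needed) and uses the normal approximation with a continuity correction for the two-sided p-value. The function name and the example series are hypothetical.

```python
import math

def mann_kendall(values):
    """Mann-Kendall trend test (Kendall's tau of values against time order).

    Assumes no tied values; returns (tau, z, two-sided p-value) using the
    normal approximation with a continuity correction.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1 for a later increase, -1 for a decrease
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return tau, z, p

# Hypothetical annual mean discharges (not data from the report)
print(mann_kendall([12.1, 13.4, 11.8, 14.9, 15.2, 14.1, 16.7, 17.3]))
```

A positive tau with a small p-value would be read as a significant upward trend, a negative tau as a downward trend.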

  15. A statistical package for computing time and frequency domain analysis

    NASA Technical Reports Server (NTRS)

    Brownlow, J.

    1978-01-01

    The spectrum analysis (SPA) program is a general purpose digital computer program designed to aid in data analysis. The program does time and frequency domain statistical analyses as well as some preanalysis data preparation. The capabilities of the SPA program include linear trend removal and/or digital filtering of data, plotting and/or listing of both filtered and unfiltered data, time domain statistical characterization of data, and frequency domain statistical characterization of data.
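The SPA program's internals are not described in the abstract. As a minimal sketch of two of the listed capabilities, linear trend removal and frequency-domain characterization, the Python below subtracts a least-squares straight line and then computes a raw periodogram with a direct DFT. The function names are hypothetical, and the O(n²) DFT is only suitable for short illustrative series; a practical analysis would use an FFT and usually tapering or averaging.

```python
import cmath
import math

def remove_linear_trend(x):
    """Subtract the least-squares straight line fitted against the sample index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (xi - x_mean) for t, xi in enumerate(x)) / stt
    return [xi - (x_mean + slope * (t - t_mean)) for t, xi in enumerate(x)]

def periodogram(x):
    """Raw periodogram |DFT|^2 / n at frequencies k/n (cycles per sample), k = 0..n//2."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        coef = sum(xi * cmath.exp(-2j * math.pi * k * t / n) for t, xi in enumerate(x))
        out.append((k / n, abs(coef) ** 2 / n))
    return out

# Hypothetical record: a linear drift plus a sinusoid with 4 cycles in 64 samples
signal = [0.3 * t + math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
spectrum = periodogram(remove_linear_trend(signal))
print(max(spectrum, key=lambda fp: fp[1]))
```

Removing the trend first matters because an unremoved drift concentrates spurious power at the lowest frequencies.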

  16. Statistical alignment: computational properties, homology testing and goodness-of-fit.

    PubMed

    Hein, J; Wiuf, C; Knudsen, B; Møller, M B; Wibling, G

    2000-09-08

    The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, each about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. Copyright 2000 Academic Press.

  17. Basic statistical analyses of candidate nickel-hydrogen cells for the Space Station Freedom

    NASA Technical Reports Server (NTRS)

    Maloney, Thomas M.; Frate, David T.

    1993-01-01

    Nickel-Hydrogen (Ni/H2) secondary batteries will be implemented as a power source for the Space Station Freedom as well as for other NASA missions. Consequently, characterization tests of Ni/H2 cells from Eagle-Picher, Whittaker-Yardney, and Hughes were completed at the NASA Lewis Research Center. Watt-hour efficiencies of each Ni/H2 cell were measured for regulated charge and discharge cycles as a function of temperature, charge rate, discharge rate, and state of charge. Temperatures ranged from -5 °C to 30 °C, charge rates ranged from C/10 to 1C, discharge rates ranged from C/10 to 2C, and states of charge ranged from 20 percent to 100 percent. Results from regression analyses and analyses of mean watt-hour efficiencies demonstrated that overall performance was best at temperatures between 10 °C and 20 °C, while the discharge rate correlated most strongly with watt-hour efficiency. In general, the best performers were the cell with a back-to-back electrode arrangement, single stack, 26 percent KOH, and serrated zircar separator, and the cell with a recirculating electrode arrangement, unit stack, 31 percent KOH, and zircar separators.

  18. 10 CFR 436.24 - Uncertainty analyses.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Procedures for Life Cycle Cost Analyses § 436.24 Uncertainty analyses. If particular items of cost data or... impact of uncertainty on the calculation of life cycle cost effectiveness or the assignment of rank order... and probabilistic analysis. If additional analysis casts substantial doubt on the life cycle cost...

  19. 10 CFR 436.24 - Uncertainty analyses.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... Procedures for Life Cycle Cost Analyses § 436.24 Uncertainty analyses. If particular items of cost data or... impact of uncertainty on the calculation of life cycle cost effectiveness or the assignment of rank order... and probabilistic analysis. If additional analysis casts substantial doubt on the life cycle cost...

  20. 10 CFR 436.24 - Uncertainty analyses.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Procedures for Life Cycle Cost Analyses § 436.24 Uncertainty analyses. If particular items of cost data or... impact of uncertainty on the calculation of life cycle cost effectiveness or the assignment of rank order... and probabilistic analysis. If additional analysis casts substantial doubt on the life cycle cost...

  1. 10 CFR 436.24 - Uncertainty analyses.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... Procedures for Life Cycle Cost Analyses § 436.24 Uncertainty analyses. If particular items of cost data or... impact of uncertainty on the calculation of life cycle cost effectiveness or the assignment of rank order... and probabilistic analysis. If additional analysis casts substantial doubt on the life cycle cost...

  2. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses.

    PubMed

    Baudard, Marie; Yavchitz, Amélie; Ravaud, Philippe; Perrodeau, Elodie; Boutron, Isabelle

    2017-02-17

    Objective: To evaluate the impact of searching clinical trial registries in systematic reviews. Design: Methodological systematic review and reanalyses of meta-analyses. Data sources: Medline was searched to identify systematic reviews of randomised controlled trials (RCTs) assessing pharmaceutical treatments published between June 2014 and January 2015. For all systematic reviews that did not report a trial registry search but reported the information needed to perform it, the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP) search portal was searched for completed or terminated RCTs not originally included in the systematic review. Data extraction: For each systematic review, two researchers independently extracted the outcomes analysed, the number of patients included, and the treatment effect estimated. For each RCT identified, two researchers independently determined whether the results were available (ie, posted, published, or available on the sponsor website) and extracted the data. When additional data were retrieved, we reanalysed meta-analyses and calculated the weight of the additional RCTs and the change in summary statistics by comparison with the original meta-analysis. Results: Among 223 selected systematic reviews, 116 (52%) did not report a search of trial registries; 21 of these did not report the information needed to perform the search (key words, search date). A search was performed for 95 systematic reviews; for 54 (57%), no additional RCTs were found and for 41 (43%), 122 additional RCTs were identified. The search increased the number of patients by more than 10% in 19 systematic reviews, 20% in 10, 30% in seven, and 50% in four. Moreover, 63 RCTs had results available; the results for 45 could be included in a meta-analysis. Fourteen systematic reviews including 45 RCTs were reanalysed. The weight of the additional RCTs in the recalculated meta-analyses ranged from 0% to 58% and was greater than 10% in five of 14
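The "weight of the additional RCTs" is the share of the total weight those trials receive in the pooled estimate. Assuming a fixed-effect inverse-variance model (the reanalyses may well have used a random-effects model, which adds a between-trial variance term to each trial's variance before weighting), the computation can be sketched in Python as follows; function names and all numbers are illustrative only.

```python
import math

def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect inverse-variance pooled estimate, its SE, and the trial weights."""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total), weights

def share_of_weight(weights, indices):
    """Fraction of the total weight carried by the trials at the given indices."""
    return sum(weights[i] for i in indices) / sum(weights)

# Hypothetical effects (e.g. log odds ratios): two original trials, one registry-found trial
estimates = [0.2, 0.4, 0.0]
std_errors = [0.1, 0.1, 0.2]
pooled, se, weights = inverse_variance_pool(estimates, std_errors)
original, _, _ = inverse_variance_pool(estimates[:2], std_errors[:2])
print(original, pooled, share_of_weight(weights, [2]))
```

Comparing `original` with `pooled` gives the change in the summary estimate once the additional trial is included.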

  3. Monitoring the quality consistency of Weibizhi tablets by micellar electrokinetic chromatography fingerprints combined with multivariate statistical analyses, the simple quantified ratio fingerprint method, and the fingerprint-efficacy relationship.

    PubMed

    Liu, Yingchun; Sun, Guoxiang; Wang, Yan; Yang, Lanping; Yang, Fangliang

    2015-06-01

    Micellar electrokinetic chromatography fingerprinting combined with quantification was successfully developed and applied to monitor the quality consistency of Weibizhi tablets, a classical compound preparation used to treat gastric ulcers. A background electrolyte composed of 57 mmol/L sodium borate, 21 mmol/L sodium dodecylsulfate and 100 mmol/L sodium hydroxide was used to separate compounds. To optimize capillary electrophoresis conditions, multivariate statistical analyses were applied. First, the background electrolyte concentrations were identified as the most important factors influencing sample electrophoretic behavior. Then, a Box-Behnken design response surface strategy, using the resolution index RF as an integrated response, was set up to correlate factors with response. RF reflects the effective signal amount, resolution, and signal homogenization in an electropherogram; thus, it was regarded as an excellent indicator. In fingerprint assessments, a simple quantified ratio fingerprint method was established for comprehensive quality discrimination of traditional Chinese medicines/herbal medicines from qualitative and quantitative perspectives, by which the quality of 27 samples from the same manufacturer was well differentiated. In addition, the fingerprint-efficacy relationship between fingerprints and antioxidant activities was established using partial least squares regression, which provided important medicinal efficacy information for quality control. The present study offered an efficient means for monitoring Weibizhi tablet quality consistency. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Computational modeling and statistical analyses on individual contact rate and exposure to disease in complex and confined transportation hubs

    NASA Astrophysics Data System (ADS)

    Wang, W. L.; Tsui, K. L.; Lo, S. M.; Liu, S. B.

    2018-01-01

    Crowded transportation hubs such as metro stations are thought to be ideal places for the development and spread of epidemics. However, because of their complex spatial layout and confined environment with a large number of highly mobile individuals, it is difficult to quantify human contacts in such environments, and disease-spreading dynamics there have been less explored in previous studies. Owing to the heterogeneity and dynamic nature of human interactions, a growing number of studies have shown the importance of contact distance and length of contact in transmission probabilities. In this study, we show how detailed information on contact and exposure patterns can be obtained by statistical analyses of microscopic crowd simulation data. Specifically, a pedestrian simulation model, CityFlow, was employed to reproduce individuals' movements in a metro station based on site survey data, and values and distributions of individual contact rate and exposure in different simulation cases were obtained and analyzed. Interestingly, a Weibull distribution fitted the histogram of individual-based exposure values in each case very well. Moreover, we found that both individual contact rate and exposure had a linear relationship with the average crowd density of the environment. The results obtained in this paper can provide a reference for epidemic studies in complex and confined transportation hubs and help refine existing disease-spreading models.
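The abstract does not say how the Weibull distribution was fitted. One common choice is maximum likelihood; the Python sketch below, offered as an assumption rather than the authors' procedure, solves the profile-likelihood equation for the shape k, Σ x^k ln x / Σ x^k - 1/k - mean(ln x) = 0, by bisection (the left side increases with k), then computes the scale λ = (mean of x^k)^(1/k) in closed form. The function name and synthetic data are hypothetical.

```python
import math
import random

def fit_weibull_mle(data, tol=1e-10):
    """Maximum-likelihood Weibull fit; returns (shape k, scale lam).

    Data must be positive and not all identical.
    """
    if len(set(data)) < 2:
        raise ValueError("need at least two distinct positive values")
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(logs)

    def shape_equation(k):
        # Profile-likelihood equation for k; increasing in k, zero at the MLE
        w = [math.exp(k * lx) for lx in logs]
        return sum(wi * lx for wi, lx in zip(w, logs)) / sum(w) - 1 / k - mean_log

    lo, hi = 1e-3, 1.0
    while shape_equation(hi) < 0:  # expand the bracket until the root is inside
        hi *= 2
    while hi - lo > tol:           # bisection on the monotone equation
        mid = (lo + hi) / 2
        if shape_equation(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    lam = (sum(math.exp(k * lx) for lx in logs) / len(logs)) ** (1 / k)
    return k, lam

# Synthetic exposures drawn from a Weibull with scale 3 and shape 2 (illustration only)
rng = random.Random(42)
sample = [rng.weibullvariate(3.0, 2.0) for _ in range(5000)]
print(fit_weibull_mle(sample))
```

Bisection is chosen over Newton's method for robustness: the equation is monotone in k, so a bracketed search always converges without needing derivatives.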

  5. Categorization of the trophic status of a hydroelectric power plant reservoir in the Brazilian Amazon by statistical analyses and fuzzy approaches.

    PubMed

    da Costa Lobato, Tarcísio; Hauser-Davis, Rachel Ann; de Oliveira, Terezinha Ferreira; Maciel, Marinalva Cardoso; Tavares, Maria Regina Madruga; da Silveira, Antônio Morais; Saraiva, Augusto Cesar Fonseca

    2015-02-15

    The Amazon area has been increasingly suffering from anthropogenic impacts, especially due to the construction of hydroelectric power plant reservoirs. The analysis and categorization of the trophic status of these reservoirs are of interest to indicate man-made changes in the environment. In this context, the present study aimed to categorize the trophic status of a hydroelectric power plant reservoir located in the Brazilian Amazon by constructing a novel Water Quality Index (WQI) and Trophic State Index (TSI) for the reservoir using major ion concentrations and physico-chemical water parameters determined in the area and taking into account the sampling locations and the local hydrological regimes. After applying statistical analyses (factor analysis and cluster analysis) and establishing a rule base of a fuzzy system to these indicators, the results obtained by the proposed method were then compared to the generally applied Carlson and a modified Lamparelli trophic state index (TSI), specific for trophic regions. The categorization of the trophic status by the proposed fuzzy method was shown to be more reliable, since it takes into account the specificities of the study area, while the Carlson and Lamparelli TSI do not and thus tend to over- or underestimate the trophic status of these ecosystems. The statistical techniques proposed and applied in the present study are, therefore, relevant in cases of environmental management and policy decision-making processes, aiding in the identification of the ecological status of water bodies. With this, it is possible to identify which factors should be further investigated and/or adjusted in order to attempt the recovery of degraded water bodies. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. The Economic Cost of Homosexuality: Multilevel Analyses

    ERIC Educational Resources Information Center

    Baumle, Amanda K.; Poston, Dudley, Jr.

    2011-01-01

    This article builds on earlier studies that have examined "the economic cost of homosexuality," by using data from the 2000 U.S. Census and by employing multilevel analyses. Our findings indicate that partnered gay men experience a 12.5 percent earnings penalty compared to married heterosexual men, and a statistically insignificant earnings…

  7. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Ordinary analyses such as BLAST and conservation analysis alone could hardly distinguish which structure a given sequence would take. However, by combining these analyses with the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which parts would play an important role in structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  8. Publication bias in dermatology systematic reviews and meta-analyses.

    PubMed

    Atakpo, Paul; Vassar, Matt

    2016-05-01

    Systematic reviews and meta-analyses in dermatology provide high-level evidence for clinicians and policy makers that influence clinical decision making and treatment guidelines. One methodological problem with systematic reviews is the underrepresentation of unpublished studies. This problem is due in part to publication bias. Omission of statistically non-significant data from meta-analyses may result in overestimation of treatment effect sizes, which may have clinical consequences. Our goal was to assess whether systematic reviewers in dermatology evaluate and report publication bias. Further, we wanted to conduct our own evaluation of publication bias on meta-analyses that failed to do so. Our study considered systematic reviews and meta-analyses from ten dermatology journals from 2006 to 2016. A PubMed search was conducted, and all full-text articles that met our inclusion criteria were retrieved and coded by the primary author. 293 articles were included in our analysis. Additionally, we formally evaluated publication bias in meta-analyses that failed to do so using trim and fill and cumulative meta-analysis by precision methods. Publication bias was mentioned in 107 articles (36.5%) and was formally evaluated in 64 articles (21.8%). Visual inspection of a funnel plot was the most common method of evaluating publication bias. Publication bias was present in 45 articles (15.3%), not present in 57 articles (19.5%) and not determined in 191 articles (65.2%). Using the trim and fill method, 7 meta-analyses (33.33%) showed evidence of publication bias. Although the trim and fill method only found evidence of publication bias in 7 meta-analyses, the cumulative meta-analysis by precision method found evidence of publication bias in 15 meta-analyses (71.4%). Many of the reviews in our study did not mention or evaluate publication bias. Further, of the 42 articles that stated following PRISMA reporting guidelines, 19 (45.2%) evaluated publication bias. In

  9. Putting Meaning Back Into the Mean: A Comment on the Misuse of Elementary Statistics in a Sample of Manuscripts Submitted to Clinical Therapeutics.

    PubMed

    Forrester, Janet E

    2015-12-01

    Errors in the statistical presentation and analyses of data in the medical literature remain common despite efforts to improve the review process, including the creation of guidelines for authors and the use of statistical reviewers. This article discusses common elementary statistical errors seen in manuscripts recently submitted to Clinical Therapeutics and describes some ways in which authors and reviewers can identify errors and thus correct them before publication. A nonsystematic sample of manuscripts submitted to Clinical Therapeutics over the past year was examined for elementary statistical errors. Manuscripts submitted to Clinical Therapeutics contain many of the same errors that reportedly exist in other journals. Authors require additional guidance to avoid elementary statistical errors and incentives to use the guidance. Implementation of reporting guidelines for authors and reviewers by journals such as Clinical Therapeutics may be a good approach to reduce the rate of statistical errors. Copyright © 2015 Elsevier HS Journals, Inc. All rights reserved.

  10. Descriptive and inferential statistical methods used in burns research.

    PubMed

    Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars

    2010-05-01

    Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/co-variance (33%), chi-square (χ²) test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals

  11. The Influence of Statistical versus Exemplar Appeals on Indian Adults' Health Intentions: An Investigation of Direct Effects and Intervening Persuasion Processes.

    PubMed

    McKinley, Christopher J; Limbu, Yam; Jayachandran, C N

    2017-04-01

    In two separate investigations, we examined the persuasive effectiveness of statistical versus exemplar appeals on Indian adults' smoking cessation and mammography screening intentions. To more comprehensively address persuasion processes, we explored whether message response and perceived message effectiveness functioned as antecedents to persuasive effects. Results showed that statistical appeals led to higher levels of health intentions than exemplar appeals. In addition, findings from both studies indicated that statistical appeals stimulated more attention and were perceived as more effective than anecdotal accounts. Among male smokers, statistical appeals also generated greater cognitive processing than exemplar appeals. Subsequent mediation analyses revealed that message response and perceived message effectiveness fully carried the influence of appeal format on health intentions. Given these findings, future public health initiatives conducted among similar populations should design messages that include substantive factual information while ensuring that this content is perceived as credible and valuable.

  12. Challenges and solutions to pre- and post-randomization subgroup analyses.

    PubMed

    Desai, Manisha; Pieper, Karen S; Mahaffey, Ken

    2014-01-01

    Subgroup analyses are commonly performed in the clinical trial setting with the purpose of illustrating that the treatment effect was consistent across different patient characteristics or identifying characteristics that should be targeted for treatment. There are statistical issues involved in performing subgroup analyses, however. These have been given considerable attention in the literature for analyses where subgroups are defined by a pre-randomization feature. Although subgroup analyses are often performed with subgroups defined by a post-randomization feature--including analyses that estimate the treatment effect among compliers--discussion of these analyses has been neglected in the clinical literature. Such analyses pose a high risk of presenting biased descriptions of treatment effects. We summarize the challenges of doing all types of subgroup analyses described in the literature. In particular, we emphasize issues with post-randomization subgroup analyses. Finally, we provide guidelines on how to proceed across the spectrum of subgroup analyses.

  13. Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

    PubMed Central

    Hart, Corey B.; Giszter, Simon F.

    2013-01-01

    We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data show that the statistic is robust for most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG shows that data from these preparations are clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low-level execution primitives comprising pulsed SS in both frog and rat, and in both episodic and rhythmic behaviors. PMID:23675341

  14. ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)

    EPA Science Inventory

    The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...

  15. Using DEWIS and R for Multi-Staged Statistics e-Assessments

    ERIC Educational Resources Information Center

    Gwynllyw, D. Rhys; Weir, Iain S.; Henderson, Karen L.

    2016-01-01

    We demonstrate how the DEWIS e-Assessment system may use embedded R code to facilitate the assessment of students' ability to perform involved statistical analyses. The R code has been written to emulate SPSS output and thus the statistical results for each bespoke data set can be generated efficiently and accurately using standard R routines.…

  16. Combined Analyses of Bacterial, Fungal and Nematode Communities in Andosolic Agricultural Soils in Japan

    PubMed Central

    Bao, Zhihua; Ikunaga, Yoko; Matsushita, Yuko; Morimoto, Sho; Takada-Hoshino, Yuko; Okada, Hiroaki; Oba, Hirosuke; Takemoto, Shuhei; Niwa, Shigeru; Ohigashi, Kentaro; Suzuki, Chika; Nagaoka, Kazunari; Takenaka, Makoto; Urashima, Yasufumi; Sekiguchi, Hiroyuki; Kushida, Atsuhiko; Toyota, Koki; Saito, Masanori; Tsushima, Seiya

    2012-01-01

    We simultaneously examined the bacteria, fungi and nematode communities in Andosols from four agro-geographical sites in Japan using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and statistical analyses to test the effects of environmental factors including soil properties on these communities depending on geographical sites. Statistical analyses such as Principal component analysis (PCA) and Redundancy analysis (RDA) revealed that the compositions of the three soil biota communities were strongly affected by geographical sites, which were in turn strongly associated with soil characteristics such as total C (TC), total N (TN), C/N ratio and annual mean soil temperature (ST). In particular, the TC, TN and C/N ratio had stronger effects on bacterial and fungal communities than on the nematode community. Additionally, two-way cluster analysis using the combined DGGE profile also indicated that all soil samples were classified into four clusters corresponding to the four sites, showing high site specificity of soil samples, and all DNA bands were classified into four clusters, showing the coexistence of specific DGGE bands of bacteria, fungi and nematodes in Andosol fields. The results of this study suggest that geography relative to soil properties has a simultaneous impact on soil microbial and nematode community compositions. This is the first combined profile analysis of bacteria, fungi and nematodes at different sites with agricultural Andosols. PMID:22223474

  17. Combined analyses of bacterial, fungal and nematode communities in andosolic agricultural soils in Japan.

    PubMed

    Bao, Zhihua; Ikunaga, Yoko; Matsushita, Yuko; Morimoto, Sho; Takada-Hoshino, Yuko; Okada, Hiroaki; Oba, Hirosuke; Takemoto, Shuhei; Niwa, Shigeru; Ohigashi, Kentaro; Suzuki, Chika; Nagaoka, Kazunari; Takenaka, Makoto; Urashima, Yasufumi; Sekiguchi, Hiroyuki; Kushida, Atsuhiko; Toyota, Koki; Saito, Masanori; Tsushima, Seiya

    2012-01-01

    We simultaneously examined the bacteria, fungi and nematode communities in Andosols from four agro-geographical sites in Japan using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and statistical analyses to test the effects of environmental factors including soil properties on these communities depending on geographical sites. Statistical analyses such as Principal component analysis (PCA) and Redundancy analysis (RDA) revealed that the compositions of the three soil biota communities were strongly affected by geographical sites, which were in turn strongly associated with soil characteristics such as total C (TC), total N (TN), C/N ratio and annual mean soil temperature (ST). In particular, the TC, TN and C/N ratio had stronger effects on bacterial and fungal communities than on the nematode community. Additionally, two-way cluster analysis using the combined DGGE profile also indicated that all soil samples were classified into four clusters corresponding to the four sites, showing high site specificity of soil samples, and all DNA bands were classified into four clusters, showing the coexistence of specific DGGE bands of bacteria, fungi and nematodes in Andosol fields. The results of this study suggest that geography relative to soil properties has a simultaneous impact on soil microbial and nematode community compositions. This is the first combined profile analysis of bacteria, fungi and nematodes at different sites with agricultural Andosols.

  18. Citation of previous meta-analyses on the same topic: a clue to perpetuation of incorrect methods?

    PubMed

    Li, Tianjing; Dickersin, Kay

    2013-06-01

    Systematic reviews and meta-analyses serve as a basis for decision-making and clinical practice guidelines and should be carried out using appropriate methodology to avoid incorrect inferences. We describe the characteristics, statistical methods used for meta-analyses, and citation patterns of all 21 glaucoma systematic reviews we identified pertaining to the effectiveness of prostaglandin analog eye drops in treating primary open-angle glaucoma, published between December 2000 and February 2012. We abstracted data, assessed whether appropriate statistical methods were applied in meta-analyses, and examined citation patterns of included reviews. We identified two forms of problematic statistical analyses in 9 of the 21 systematic reviews examined. Except in 1 case, none of the 9 reviews that used incorrect statistical methods cited a previously published review that used appropriate methods. Reviews that used incorrect methods were cited 2.6 times more often than reviews that used appropriate statistical methods. We speculate that by emulating the statistical methodology of previous systematic reviews, systematic review authors may have perpetuated incorrect approaches to meta-analysis. The use of incorrect statistical methods, perhaps through emulating methods described in previous research, calls conclusions of systematic reviews into question and may lead to inappropriate patient care. We urge systematic review authors and journal editors to seek the advice of experienced statisticians before undertaking or accepting for publication a systematic review and meta-analysis. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  19. A proposal for the measurement of graphical statistics effectiveness: Does it enhance or interfere with statistical reasoning?

    NASA Astrophysics Data System (ADS)

    Agus, M.; Penna, M. P.; Peró-Cebollero, M.; Guàrdia-Olmos, J.

    2015-02-01

    Numerous studies have examined students' difficulties in understanding some notions related to statistical problems. Some authors observed that presenting distinct visual representations could improve statistical reasoning, supporting the principle of graphical facilitation. Other researchers disagree with this viewpoint, emphasising the impediments related to the use of illustrations, which could overload the cognitive system with insignificant data. In this work we aim to compare probabilistic statistical reasoning across two different formats of problem presentation: graphical and verbal-numerical. We designed five pairs of homologous simple problems in the verbal-numerical and graphical formats and presented them to 311 undergraduate Psychology students (n=156 in Italy and n=155 in Spain) without statistical expertise. The purpose of our work was to evaluate the effect of graphical facilitation on probabilistic statistical reasoning. Every undergraduate solved each pair of problems in both formats, with different problem presentation orders and sequences. Data analyses showed that the effect of graphical facilitation is infrequent among psychology undergraduates. This effect is related to many factors (such as knowledge, abilities, attitudes, and anxiety); moreover, it might be considered the result of an interaction between individual and task characteristics.

  20. Statistical Design Model (SDM) of satellite thermal control subsystem

    NASA Astrophysics Data System (ADS)

    Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi

    2016-07-01

    The thermal control subsystem of a satellite is responsible for keeping the satellite's components within their survival and operating temperature ranges. Its performance plays a key role in meeting the satellite's operational requirements, and designing it is part of satellite design. On the other hand, owing to the lack of information provided by companies and designers, this subsystem still has no specific design process, even though it is one of the fundamental subsystems. The aim of this paper is to identify and extract statistical design models of the spacecraft thermal control subsystem using the SDM design method. This method analyses statistical data with a particular procedure. To implement the SDM method, a complete database is required. Therefore, we first collect spacecraft data and create a database, and then we extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. The input parameters of the method are the mass, mission, and lifetime of the satellite. For this purpose, the thermal control subsystem is first introduced, and the hardware used in this subsystem and its variants are investigated. Next, different statistical models are described and briefly compared. Finally, a particular statistical model is extracted from the collected statistical data. The accuracy of the method is tested and verified using a case study: a comparison between the thermal control subsystem specifications of a fabricated satellite and the results of the analyses showed the methodology in this paper to be effective. Key Words: Thermal control subsystem design, Statistical design model (SDM), Satellite conceptual design, Thermal hardware

  1. Perception of ensemble statistics requires attention.

    PubMed

    Jackson-Nielsen, Molly; Cohen, Michael A; Pitts, Michael A

    2017-02-01

    To overcome inherent limitations in perceptual bandwidth, many aspects of the visual world are represented as summary statistics (e.g., average size, orientation, or density of objects). Here, we investigated the relationship between summary (ensemble) statistics and visual attention. Recently, it was claimed that one ensemble statistic in particular, color diversity, can be perceived without focal attention. However, a broader debate exists over the attentional requirements of conscious perception, and it is possible that some form of attention is necessary for ensemble perception. To test this idea, we employed a modified inattentional blindness paradigm and found that multiple types of summary statistics (color and size) often go unnoticed without attention. In addition, we found attentional costs in dual-task situations, further implicating a role for attention in statistical perception. Overall, we conclude that while visual ensembles may be processed efficiently, some amount of attention is necessary for conscious perception of ensemble statistics. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Lubricant and additive effects on spur gear fatigue life

    NASA Technical Reports Server (NTRS)

    Townsend, D. P.; Zaretsky, E. V.; Scibbe, H. W.

    1985-01-01

    Spur gear endurance tests were conducted with six lubricants using a single lot of consumable-electrode vacuum melted (CVM) AISI 9310 spur gears. The sixth lubricant was divided into four batches, each of which had a different additive content. Lubricants tested with a phosphorus-type load-carrying additive showed a statistically significant improvement in life over lubricants without this type of additive. The presence of sulfur-type antiwear additives in the lubricant did not appear to affect the surface fatigue life of the gears. No statistically significant difference in life was found between lubricants of different base stocks that had similar viscosity, pressure-viscosity coefficients and antiwear additives. Gears tested with 0.1 wt % sulfur and 0.1 wt % phosphorus EP additives in the lubricant had reactive films that were 200 to 400 Å (0.8 to 1.6 μin.) thick.

  3. Improving qPCR telomere length assays: Controlling for well position effects increases statistical power.

    PubMed

    Eisenberg, Dan T A; Kuzawa, Christopher W; Hayes, M Geoffrey

    2015-01-01

    Telomere length (TL) is commonly measured using quantitative PCR (qPCR). Although qPCR is easier than the Southern blot of terminal restriction fragments (TRF) method of TL measurement, one drawback is that it introduces greater measurement error and thus reduces the statistical power of analyses. To address a potential source of measurement error, we consider the effect of well position on qPCR TL measurements. qPCR TL data from 3,638 people run on a Bio-Rad iCycler iQ are reanalyzed here. To evaluate measurement validity, correspondence with TRF, correspondence with age, and correspondence between mother and offspring are examined. First, we present evidence for systematic variation in qPCR TL measurements in relation to thermocycler well position. Controlling for these well-position effects consistently improves measurement validity and yields estimated improvements in statistical power equivalent to increasing sample sizes by 16%. We additionally evaluated the linearity of the relationships between telomere and single copy gene control amplicons and between qPCR and TRF measures. We find that, unlike some previous reports, our data exhibit linear relationships. We introduce the standard error in percent, a superior method for quantifying measurement error compared to the commonly used coefficient of variation. Using this measure, we find that excluding samples with high measurement error does not improve measurement validity in our study. Future studies using block-based thermocyclers should consider well position effects. Since additional information can be gleaned from well position corrections, rerunning analyses of previous results with well position correction could serve as an independent test of the validity of these results. © 2015 Wiley Periodicals, Inc.

  4. Gait patterns for crime fighting: statistical evaluation

    NASA Astrophysics Data System (ADS)

    Sulovská, Kateřina; Bělašková, Silvie; Adámek, Milan

    2013-10-01

    Crime has been omnipresent throughout human history. Modern technology brings novel opportunities for identifying a perpetrator. One of these is the analysis of video recordings, which may be taken during the crime itself or before or after it. Video analysis can be classed as identification analysis, that is, identification of a person by external features. The study of bipedal locomotion focuses on human movement on the basis of anatomical and physiological features. Nowadays, human gait is tested by many laboratories to learn whether identification via bipedal locomotion is possible. The aim of our study is to use 2D components of 3D data from the VICON Mocap system for in-depth statistical analyses. This paper presents recent results of a fundamental study of various gait patterns under different conditions. The study contains data from 12 participants. Curves obtained from these measurements were sorted, averaged and statistically tested to estimate the stability and distinctiveness of this biometric. Results show satisfactory distinctness for some chosen points, while others show no significant difference. However, the results presented in this paper represent the initial phase of further, more detailed and exacting analyses of gait patterns under different conditions.

  5. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. Our aim is to introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis, and to present their implementation in the statistical software R so that the results can be reproduced. We use a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples: the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  6. Quantifying Trace Amounts of Aggregates in Biopharmaceuticals Using Analytical Ultracentrifugation Sedimentation Velocity: Bayesian Analyses and F Statistics.

    PubMed

    Wafer, Lucas; Kloczewiak, Marek; Luo, Yin

    2016-07-01

    Analytical ultracentrifugation-sedimentation velocity (AUC-SV) is often used to quantify high molar mass species (HMMS) present in biopharmaceuticals. Although these species are often present in trace quantities, they have received significant attention due to their potential immunogenicity. Commonly, AUC-SV data is analyzed as a diffusion-corrected, sedimentation coefficient distribution, or c(s), using SEDFIT to numerically solve Lamm-type equations. SEDFIT also utilizes maximum entropy or Tikhonov-Phillips regularization to further allow the user to determine relevant sample information, including the number of species present, their sedimentation coefficients, and their relative abundance. However, this methodology has several, often unstated, limitations, which may impact the final analysis of protein therapeutics. These include regularization-specific effects, artificial "ripple peaks," and spurious shifts in the sedimentation coefficients. In this investigation, we experimentally verified that an explicit Bayesian approach, as implemented in SEDFIT, can largely correct for these effects. Clear guidelines on how to implement this technique and interpret the resulting data, especially for samples containing micro-heterogeneity (e.g., differential glycosylation), are also provided. In addition, we demonstrated how the Bayesian approach can be combined with F statistics to draw more accurate conclusions and rigorously exclude artifactual peaks. Numerous examples with an antibody and an antibody-drug conjugate were used to illustrate the strengths and drawbacks of each technique.

  7. Homeopathy: meta-analyses of pooled clinical data.

    PubMed

    Hahn, Robert G

    2013-01-01

    In the first decade of the evidence-based era, which began in the mid-1990s, meta-analyses were used to scrutinize homeopathy for evidence of beneficial effects in medical conditions. In this review, meta-analyses including pooled data from placebo-controlled clinical trials of homeopathy and the aftermath in the form of debate articles were analyzed. In 1997 Klaus Linde and co-workers identified 89 clinical trials that showed an overall odds ratio of 2.45 in favor of homeopathy over placebo. There was a trend toward smaller benefit from studies of the highest quality, but the 10 trials with the highest Jadad score still showed homeopathy had a statistically significant effect. These results challenged academics to perform alternative analyses that, to demonstrate the lack of effect, relied on extensive exclusion of studies, often to the degree that conclusions were based on only 5-10% of the material, or on virtual data. The ultimate argument against homeopathy is the 'funnel plot' published by Aijing Shang's research group in 2005. However, the funnel plot is flawed when applied to a mixture of diseases, because studies with expected strong treatment effects are, for ethical reasons, powered lower than studies with expected weak or unclear treatment effects. To conclude that homeopathy lacks clinical effect, more than 90% of the available clinical trials had to be disregarded. Alternatively, flawed statistical methods had to be applied. Future meta-analyses should focus on the use of homeopathy in specific diseases or groups of diseases instead of pooling data from all clinical trials. © 2013 S. Karger GmbH, Freiburg.
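
    An "overall odds ratio" for a set of trials is a weighted combination of the per-trial odds ratios. The abstract does not say how Linde and co-workers pooled their data. The sketch below shows one standard approach: fixed-effect inverse-variance pooling of log odds ratios, using Woolf's standard error sqrt(1/a + 1/b + 1/c + 1/d). The 2×2 tables are made up for illustration. The per-trial standard error is also the quantity commonly plotted against the effect estimate in a funnel plot.

```python
from math import exp, log, sqrt


def log_odds_ratio(a: int, b: int, c: int, d: int):
    """Log odds ratio and its Woolf standard error for one trial.

    a, b = events / non-events in the treatment arm; c, d = the same in the control arm.
    Assumes no zero cells (a continuity correction would be needed otherwise).
    """
    lor = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def pooled_odds_ratio(tables):
    """Fixed-effect inverse-variance pooling; returns (OR, 95% CI lower, 95% CI upper)."""
    weights_sum = 0.0
    weighted_lor_sum = 0.0
    for table in tables:
        lor, se = log_odds_ratio(*table)
        w = 1 / se**2  # more precise trials get more weight
        weights_sum += w
        weighted_lor_sum += w * lor
    pooled = weighted_lor_sum / weights_sum
    pooled_se = sqrt(1 / weights_sum)
    return exp(pooled), exp(pooled - 1.96 * pooled_se), exp(pooled + 1.96 * pooled_se)


# Hypothetical trials (not data from any review): (a, b, c, d) per trial.
trials = [(20, 10, 10, 20), (15, 15, 9, 21), (30, 20, 22, 28)]
or_, lo, hi = pooled_odds_ratio(trials)
print(f"pooled OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

    A random-effects model would add a between-trial variance term to each weight. That matters when trials cover a mixture of conditions, which is the situation the review criticises.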

  8. Statistical Primer on Biosimilar Clinical Development.

    PubMed

    Isakov, Leah; Jin, Bo; Jacobs, Ira Allen

    A biosimilar is a biological product that is highly similar to a licensed reference (originator) biological product, has no clinically meaningful differences from it in terms of safety, purity, and potency, and is approved under specific regulatory approval processes. Because both the originator and the potential biosimilar are large and structurally complex proteins, biosimilars are not generic equivalents of the originator. Thus, the regulatory approach for a small-molecule generic is not appropriate for a potential biosimilar. As a result, different study designs and statistical approaches are used in the assessment of a potential biosimilar. This review covers concepts and terminology used in statistical analyses in the clinical development of biosimilars so that clinicians can understand how similarity is evaluated. This should allow the clinician to understand the statistical considerations in biosimilar clinical trials and make informed prescribing decisions when an approved biosimilar is available.
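
    Biosimilar efficacy trials are commonly analysed as equivalence trials. Similarity is concluded when a confidence interval for the difference between biosimilar and reference lies entirely within a prespecified equivalence margin. A two-sided 90% interval corresponds to two one-sided tests at the 5% level. The minimal sketch below rests on several assumptions: a binary response endpoint, a normal (Wald) approximation for the difference in proportions, and illustrative margins. The function names and counts are hypothetical and do not come from the review.

```python
from math import sqrt
from statistics import NormalDist


def equivalence_ci(x_bio: int, n_bio: int, x_ref: int, n_ref: int, level: float = 0.90):
    """Wald confidence interval for the difference in response rates (biosimilar - reference)."""
    p_bio, p_ref = x_bio / n_bio, x_ref / n_ref
    diff = p_bio - p_ref
    se = sqrt(p_bio * (1 - p_bio) / n_bio + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for a 90% two-sided interval
    return diff - z * se, diff + z * se


def is_equivalent(ci, margin: float) -> bool:
    """Similarity is concluded only if the whole interval lies inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


# Illustrative counts only: 180/300 responders vs 174/300 responders.
ci = equivalence_ci(180, 300, 174, 300)
print(f"90% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
print("equivalent at margin 0.15:", is_equivalent(ci, 0.15))
print("equivalent at margin 0.05:", is_equivalent(ci, 0.05))
```

    The same data can support or fail to support similarity depending on the margin. For this reason, the margin has to be justified clinically before the trial rather than chosen afterwards.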

  9. Research Design and Statistical Methods in Indian Medical Journals: A Retrospective Survey

    PubMed Central

    Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S.; Mayya, Shreemathi S.

    2015-01-01

    Good-quality medical research generally requires not only expertise in the chosen medical field of interest but also sound knowledge of statistical methodology. The number of medical research articles published in Indian medical journals has increased substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors, and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. The main outcomes considered in the present study were: study design types and their frequencies, the proportion of errors/defects in study design and statistical analyses, and implementation of the CONSORT checklist in RCTs (randomized clinical trials). From 2003 to 2013, the proportion of erroneous statistical analyses did not decrease significantly (χ2=0.592, Φ=0.027, p=0.4418): 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7% (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12, p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, the proportion of randomized clinical trial designs remained very low (7.3%, 43/588), with the majority showing some errors (41 papers, 95.3%). The majority of the published studies were retrospective in nature, both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325
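
    The χ² and Φ values quoted in this abstract come from 2×2 comparisons of proportions between the two survey years. The minimal sketch below recomputes them from the reported counts. It assumes a Pearson χ² test without continuity correction, because the abstract does not state the exact procedure. For one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)). Under these assumptions the recomputed χ² values match the reported ones. Φ is taken here as √(χ²/N). That definition reproduces the reported 0.027, but it may not be the convention behind the other two reported Φ values.

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no Yates correction) for the table [[a, b], [c, d]].

    Returns (chi2, phi, p) where phi = sqrt(chi2 / N) and p is the 1-df upper tail.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    phi = sqrt(chi2 / n)
    p = erfc(sqrt(chi2 / 2))  # P(chi2_1 > x) = erfc(sqrt(x / 2))
    return chi2, phi, p


# Rows: 2003, 2013. Columns: "yes", "no", built from the counts reported in the abstract.
comparisons = {
    "erroneous statistical analyses": (80, 320 - 80, 111, 490 - 111),
    "papers using statistical tests": (250, 588 - 250, 439, 774 - 439),
    "errors in study design": (243, 588 - 243, 237, 774 - 237),
}
for label, table in comparisons.items():
    chi2, phi, p = chi2_2x2(*table)
    print(f"{label}: chi2={chi2:.3f}, phi={phi:.3f}, p={p:.4g}")
```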

  10. Research design and statistical methods in Indian medical journals: a retrospective survey.

    PubMed

    Hassan, Shabbeer; Yellur, Rajashree; Subramani, Pooventhan; Adiga, Poornima; Gokhale, Manoj; Iyer, Manasa S; Mayya, Shreemathi S

    2015-01-01

    Good-quality medical research generally requires not only expertise in the chosen medical field of interest but also sound knowledge of statistical methodology. The number of medical research articles published in Indian medical journals has increased substantially in the past decade. The aim of this study was to collate all evidence on study design quality and statistical analyses used in selected leading Indian medical journals. Ten (10) leading Indian medical journals were selected based on impact factors, and all original research articles published in 2003 (N = 588) and 2013 (N = 774) were categorized and reviewed. A validated checklist on study design, statistical analyses, results presentation, and interpretation was used for review and evaluation of the articles. The main outcomes considered in the present study were: study design types and their frequencies, the proportion of errors/defects in study design and statistical analyses, and implementation of the CONSORT checklist in RCTs (randomized clinical trials). From 2003 to 2013, the proportion of erroneous statistical analyses did not decrease significantly (χ2=0.592, Φ=0.027, p=0.4418): 25% (80/320) in 2003 compared to 22.6% (111/490) in 2013. Compared with 2003, significant improvement was seen in 2013; the proportion of papers using statistical tests increased significantly (χ2=26.96, Φ=0.16, p<0.0001) from 42.5% (250/588) to 56.7% (439/774). The overall proportion of errors in study design decreased significantly (χ2=16.783, Φ=0.12, p<0.0001), 41.3% (243/588) compared to 30.6% (237/774). In 2013, the proportion of randomized clinical trial designs remained very low (7.3%, 43/588), with the majority showing some errors (41 papers, 95.3%). The majority of the published studies were retrospective in nature, both in 2003 [79.1% (465/588)] and in 2013 [78.2% (605/774)]. Major decreases in error proportions were observed in both results presentation (χ2=24.477, Φ=0.17, p<0.0001), 82.2% (263/320) compared to 66.3% (325/490) and

  11. Statistical Characterization of School Bus Drive Cycles Collected via Onboard Logging Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duran, A.; Walkowicz, K.

    In an effort to characterize the dynamics typical of school bus operation, National Renewable Energy Laboratory (NREL) researchers set out to gather in-use duty cycle data from school bus fleets operating across the country. Employing a combination of Isaac Instruments GPS/CAN data loggers in conjunction with existing onboard telemetric systems resulted in the capture of operating information for more than 200 individual vehicles in three geographically unique domestic locations. In total, over 1,500 individual operational route shifts from Washington, New York, and Colorado were collected. Upon completing the collection of in-use field data using either NREL-installed data acquisition devices or existing onboard telemetry systems, large-scale duty-cycle statistical analyses were performed to examine underlying vehicle dynamics trends within the data and to explore vehicle operation variations between fleet locations. Based on the results of these analyses, high, low, and average vehicle dynamics requirements were determined, resulting in the selection of representative standard chassis dynamometer test cycles for each condition. In this paper, the methodology and accompanying results of the large-scale duty-cycle statistical analysis are presented, including graphical and tabular representations of a number of relationships between key duty-cycle metrics observed within the larger data set. In addition to presenting the results of this analysis, conclusions are drawn and presented regarding potential applications of advanced vehicle technology as it relates specifically to school buses.

  12. Statistical considerations for agroforestry studies

    Treesearch

    James A. Baldwin

    1993-01-01

    Statistical topics that relate to agroforestry studies are discussed. These include study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and much-improved histogram display is described.

  13. Statistical Analysis Techniques for Small Sample Sizes

    NASA Technical Reports Server (NTRS)

    Navard, S. E.

    1984-01-01

    The problem of small sample sizes, which is encountered when analysing space-flight data, is examined. Because of the small amount of data available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined, and the various tests that can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on the considerations needed to choose the most appropriate test for a given type of analysis.

  14. Quasi-Static Probabilistic Structural Analyses Process and Criteria

    NASA Technical Reports Server (NTRS)

    Goldberg, B.; Verderaime, V.

    1999-01-01

    Current deterministic structural methods are easily applied to substructures and components, and analysts have built great design insights and confidence in them over the years. However, deterministic methods cannot support systems risk analyses, and it was recently reported that deterministic treatment of statistical data is inconsistent with error propagation laws, which can result in unevenly conservative structural predictions. Assuming normal distributions and using statistical data formats throughout the prevailing deterministic stress processes leads to a safety factor in statistical format which, when integrated into the safety index, provides a relationship between the safety factor and first-order reliability. The safety factor embedded in the safety index expression allows a historically based risk to be determined and verified over a variety of quasi-static metallic substructures, consistent with the traditional safety factor methods and NASA Std. 5001 criteria.
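
    The link between a safety factor and first-order reliability can be illustrated with the textbook resistance-versus-stress case. For independent, normally distributed resistance R and stress S, the safety index is β = (μ_R − μ_S)/√(σ_R² + σ_S²) and the failure probability is Φ(−β). Writing the central safety factor as n = μ_R/μ_S and the coefficients of variation as V_R and V_S, the same index becomes β = (n − 1)/√(n²V_R² + V_S²). The sketch assumes this textbook form. It is not the specific formulation of the NASA paper or of NASA Std. 5001, and the numbers are illustrative.

```python
from math import erfc, sqrt


def safety_index(mu_r: float, sd_r: float, mu_s: float, sd_s: float) -> float:
    """Reliability (safety) index beta for independent normal resistance R and stress S."""
    return (mu_r - mu_s) / sqrt(sd_r**2 + sd_s**2)


def failure_probability(beta: float) -> float:
    """P(R - S < 0) = Phi(-beta), written with erfc to avoid extra dependencies."""
    return 0.5 * erfc(beta / sqrt(2))


def beta_from_safety_factor(n: float, cov_r: float, cov_s: float) -> float:
    """Same beta expressed through the central safety factor n = mu_R / mu_S
    and the coefficients of variation V_R = sd_R / mu_R, V_S = sd_S / mu_S."""
    return (n - 1) / sqrt((n * cov_r) ** 2 + cov_s**2)


# Illustrative values: resistance 150 +/- 15, stress 100 +/- 20 (arbitrary units).
beta = safety_index(150, 15, 100, 20)
print(f"beta = {beta:.2f}, P_f = {failure_probability(beta):.4f}")
print(f"via safety factor: beta = {beta_from_safety_factor(1.5, 0.10, 0.20):.2f}")
```

    The second form shows why the same safety factor can imply very different risks. With n fixed, larger scatter in resistance or stress lowers β and raises the failure probability.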

  15. Distinguishing Mediational Models and Analyses in Clinical Psychology: Atemporal Associations Do Not Imply Causation.

    PubMed

    Winer, E Samuel; Cervone, Daniel; Bryant, Jessica; McKinney, Cliff; Liu, Richard T; Nadorff, Michael R

    2016-09-01

    A popular way to attempt to discern causality in clinical psychology is through mediation analysis. However, mediation analysis is sometimes applied to research questions in clinical psychology when inferring causality is impossible. This practice may soon increase with new, readily available, and easy-to-use statistical advances. Thus, we here provide a heuristic to remind clinical psychological scientists of the assumptions of mediation analyses. We describe recent statistical advances and unpack assumptions of causality in mediation, underscoring the importance of time in understanding mediational hypotheses and analyses in clinical psychology. Example analyses demonstrate that statistical mediation can occur despite theoretical mediation being improbable. We propose a delineation of mediational effects derived from cross-sectional designs into the terms temporal and atemporal associations to emphasize time in conceptualizing process models in clinical psychology. The general implications for mediational hypotheses and the temporal frameworks from within which they may be drawn are discussed. © 2016 Wiley Periodicals, Inc.
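
    The simplest cross-sectional mediation model estimates the indirect effect as the product a·b. Here a is the slope of the mediator M on the predictor X, and b is the slope of the outcome Y on M controlling for X. With ordinary least squares, the total effect c equals the direct effect c′ plus a·b. The pure-Python sketch below uses made-up data and omits inference such as bootstrap confidence intervals. It shows that this arithmetic goes through for any data set. A nonzero a·b computed from atemporal data therefore does not, on its own, establish that X precedes M or that M precedes Y.

```python
from statistics import fmean


def _center(values):
    m = fmean(values)
    return [v - m for v in values]


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def simple_mediation(x, m, y):
    """Product-of-coefficients mediation via OLS on centered data.

    Returns (a, b, c_prime, c): a = slope of M on X, b and c_prime = slopes of
    Y on M and X in Y ~ X + M, c = total effect (slope of Y on X).
    With OLS, c equals c_prime + a*b up to floating-point rounding.
    """
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)

    a = sxm / sxx  # mediator regressed on predictor
    c = sxy / sxx  # total effect of X on Y

    # Solve the 2x2 normal equations for Y ~ X + M.
    det = sxx * smm - sxm**2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, c_prime, c


# Made-up data in which Y = X + 2*M exactly.
x = [0, 1, 2, 3]
m = [0, 3, 3, 6]
y = [xi + 2 * mi for xi, mi in zip(x, m)]
a, b, c_prime, c = simple_mediation(x, m, y)
print(f"a={a:.2f}, b={b:.2f}, indirect a*b={a * b:.2f}, direct c'={c_prime:.2f}, total c={c:.2f}")
```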

  16. BRepertoire: a user-friendly web server for analysing antibody repertoire data.

    PubMed

    Margreitter, Christian; Lu, Hui-Chun; Townsend, Catherine; Stewart, Alexander; Dunn-Walters, Deborah K; Fraternali, Franca

    2018-04-14

    Antibody repertoire analysis by high throughput sequencing is now widely used, but a persisting challenge is enabling immunologists to explore their data to discover discriminating repertoire features for their own particular investigations. Computational methods are necessary for large-scale evaluation of antibody properties. We have developed BRepertoire, a suite of user-friendly web-based software tools for large-scale statistical analyses of repertoire data. The software is able to use data preprocessed by IMGT, and performs statistical and comparative analyses with versatile plotting options. BRepertoire has been designed to operate in various modes, for example analysing sequence-specific V(D)J gene usage, discerning physico-chemical properties of the CDR regions and clustering of clonotypes. Those analyses are performed on the fly by a number of R packages and are deployed by a shiny web platform. The user can download the analysed data in different table formats and save the generated plots as image files ready for publication. We believe BRepertoire to be a versatile analytical tool that complements experimental studies of immune repertoires. To illustrate the server's functionality, we show use cases including differential gene usage in a vaccination dataset and analysis of CDR3H properties in old and young individuals. The server is accessible under http://mabra.biomed.kcl.ac.uk/BRepertoire.

  17. Bayesian analyses of seasonal runoff forecasts

    NASA Astrophysics Data System (ADS)

    Krzysztofowicz, R.; Reese, S.

    1991-12-01

    Forecasts of seasonal snowmelt runoff volume provide indispensable information for rational decision making by water project operators, irrigation district managers, and farmers in the western United States. Bayesian statistical models and communication frames have been researched in order to enhance the forecast information disseminated to the users, and to characterize forecast skill from the decision maker's point of view. Four products are presented: (i) a Bayesian Processor of Forecasts, which provides a statistical filter for calibrating the forecasts, and a procedure for estimating the posterior probability distribution of the seasonal runoff; (ii) the Bayesian Correlation Score, a new measure of forecast skill, which is related monotonically to the ex ante economic value of forecasts for decision making; (iii) a statistical predictor of monthly cumulative runoffs within the snowmelt season, conditional on the total seasonal runoff forecast; and (iv) a framing of the forecast message that conveys the uncertainty associated with the forecast estimates to the users. All analyses are illustrated with numerical examples of forecasts for six gauging stations from the period 1971-1988.
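
    The sketch below illustrates the idea behind a Bayesian processor of forecasts in the simplest normal-linear case. That model is an assumption, not necessarily the processor developed by Krzysztofowicz and Reese. The seasonal runoff W has a climatological prior N(M, S²). Given W = w, the forecast X is modelled as N(a·w + b, σ²), with a, b and σ assumed to be estimated from past forecast-observation pairs. Bayes' theorem then gives a normal posterior for W. The function name and all numbers are hypothetical.

```python
from math import sqrt


def normal_linear_posterior(x: float, prior_mean: float, prior_sd: float,
                            a: float, b: float, noise_sd: float):
    """Posterior mean and sd of runoff W given forecast x.

    Prior:      W ~ N(prior_mean, prior_sd**2)          (climatology)
    Likelihood: X | W = w ~ N(a*w + b, noise_sd**2)     (calibration of the forecast)
    """
    prior_precision = 1 / prior_sd**2
    data_precision = a**2 / noise_sd**2  # information the forecast carries about W
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + (a / noise_sd**2) * (x - b)) / post_precision
    return post_mean, sqrt(1 / post_precision)


# Hypothetical calibration and forecast (volume units arbitrary).
mean, sd = normal_linear_posterior(x=580, prior_mean=500, prior_sd=100, a=0.9, b=40, noise_sd=60)
print(f"posterior runoff: mean {mean:.1f}, sd {sd:.1f}")
```

    Because precisions add, the posterior spread is never wider than the climatological one. A forecast whose noise_sd is large relative to a barely shifts the prior.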

  18. HAPRAP: a haplotype-based iterative method for statistical fine mapping using GWAS summary statistics.

    PubMed

    Zheng, Jie; Rodriguez, Santiago; Laurin, Charles; Baird, Denis; Trela-Larsen, Lea; Erzurumluoglu, Mesut A; Zheng, Yi; White, Jon; Giambartolomei, Claudia; Zabaneh, Delilah; Morris, Richard; Kumari, Meena; Casas, Juan P; Hingorani, Aroon D; Evans, David M; Gaunt, Tom R; Day, Ian N M

    2017-01-01

    Fine mapping is a widely used approach for identifying the causal variant(s) at disease-associated loci. Standard methods (e.g. multiple regression) require individual-level genotypes. Recent fine mapping methods using summary-level data require the pairwise correlation coefficients of the variants. However, haplotypes, rather than pairwise correlations, are the true biological representation of linkage disequilibrium (LD) among multiple loci. In this article, we present an empirical iterative method, HAPlotype Regional Association analysis Program (HAPRAP), that enables fine mapping using summary statistics and haplotype information from an individual-level reference panel. Simulations with individual-level genotypes show that the results of HAPRAP and multiple regression are highly consistent. In simulations with summary-level data, we demonstrate that HAPRAP is less sensitive to poor LD estimates. In a parametric simulation using Genetic Investigation of ANthropometric Traits height data, HAPRAP performs well with a small training sample size (N < 2000) while other methods become suboptimal. Moreover, HAPRAP's performance is not affected substantially by single nucleotide polymorphisms (SNPs) with low minor allele frequencies. We applied the method to existing quantitative trait and binary outcome meta-analyses (human height, QTc interval and gallbladder disease); all previously reported association signals were replicated, and two additional variants were independently associated with human height. Due to the growing availability of summary-level data, the value of HAPRAP is likely to increase markedly for future analyses (e.g. functional prediction and identification of instruments for Mendelian randomization). The HAPRAP package and documentation are available at http://apps.biocompute.org.uk/haprap/ CONTACT: jie.zheng@bristol.ac.uk or tom.gaunt@bristol.ac.uk Supplementary information: Supplementary data are available at

  19. Accuracy of medical subject heading indexing of dental survival analyses.

    PubMed

    Layton, Danielle M; Clarke, Michael

    2014-01-01

    This study aimed to assess the Medical Subject Headings (MeSH) indexing of articles that employed time-to-event analyses to report outcomes of dental treatment in patients. Articles published in 2008 in 50 dental journals with the highest impact factors were hand searched to identify articles reporting dental treatment outcomes over time in human subjects with time-to-event statistics (included, n = 95), without time-to-event statistics (active controls, n = 91), and all other articles (passive controls, n = 6,769). The search was systematic (kappa 0.92 for screening, 0.86 for eligibility). Outcome-, statistic- and time-related MeSH were identified, and differences in allocation between groups were analyzed with chi-square and Fisher exact tests. The most frequently allocated MeSH for included and active control articles were "dental restoration failure" (77% and 52%, respectively) and "treatment outcome" (54% and 48%, respectively). Outcome MeSH was similar between these groups (86% and 77%, respectively) and significantly greater than in passive controls (10%, P < .001). Significantly more statistical MeSH were allocated to the included articles than to the active or passive controls (67%, 15%, and 1%, respectively, P < .001). Sixty-nine included articles specifically used Kaplan-Meier or life table analyses, but only 42% (n = 29) were indexed as such. Significantly more time-related MeSH were allocated to the included than to the active controls (92% and 79%, respectively, P = .02), or to the passive controls (22%, P < .001). MeSH allocation within MEDLINE to time-to-event dental articles was inaccurate and inconsistent. Statistical MeSH were omitted from 30% of the included articles and incorrectly allocated to 15% of active controls. Such errors adversely impact search accuracy.
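    The Fisher exact test used for these group comparisons can be computed directly from hypergeometric probabilities. A minimal dependency-free sketch (Python 3.8+ for `math.comb`); the function name is hypothetical and any counts passed to it are illustrative, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights make the "no more likely than observed" comparison exact.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)
```

    For example, `fisher_exact_two_sided([[1, 9], [11, 3]])` returns about 0.00276. Comparing integer weights avoids the floating-point tolerance that a probability-based comparison would need.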

  20. Inferential Statistics in "Language Teaching Research": A Review and Ways Forward

    ERIC Educational Resources Information Center

    Lindstromberg, Seth

    2016-01-01

    This article reviews all (quasi)experimental studies appearing in the first 19 volumes (1997-2015) of "Language Teaching Research" (LTR). Specifically, it provides an overview of how statistical analyses were conducted in these studies and of how the analyses were reported. The overall conclusion is that there has been a tight adherence…

  1. A Framework for Assessing High School Students' Statistical Reasoning.

    PubMed

    Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter included describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in a second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Intended as a contribution to statistics education, the new statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.

  2. An operational definition of a statistically meaningful trend.

    PubMed

    Bryhn, Andreas C; Dimberg, Peter H

    2011-04-28

    Linear trend analysis of time series is standard procedure in many scientific disciplines. If the number of data points is large, a trend may be statistically significant even if the data are scattered far from the trend line. This study introduces and tests a quality criterion for time trends referred to as statistical meaningfulness, which is a stricter quality criterion for trends than high statistical significance. The time series is divided into intervals and interval mean values are calculated. Thereafter, r² and p values are calculated from regressions of the interval mean values on time. If r² ≥ 0.65 at p ≤ 0.05 in any of these regressions, then the trend is regarded as statistically meaningful. Out of ten investigated time series from different scientific disciplines, five displayed statistically meaningful trends. A Microsoft Excel application (add-in) was developed which can perform statistical meaningfulness tests and which may make the test easier to apply. The presented method for distinguishing statistically meaningful trends should be reasonably uncomplicated for researchers with basic statistics skills and may thus be useful for determining which trends are worth analysing further, for instance with respect to causal factors. The method can also be used for determining which segments of a time trend may be particularly worthwhile to focus on.
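    The criterion can be sketched in plain Python: average the series over intervals, regress the interval means on interval mean times, and accept the trend if any split gives r² ≥ 0.65 with p ≤ 0.05. The abstract does not say how the intervals are chosen. This sketch *assumes* k equal-length consecutive intervals for k = 3 up to half the series length, with any trailing remainder dropped. The slope p-value comes from a Simpson's-rule integral of the Student t density so that no SciPy is needed. This is not the authors' Excel add-in.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a Student t statistic, by Simpson's rule (steps must be even)."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    s = density(0.0) + density(t)
    s += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = s * h / 3  # P(0 <= T <= t)
    return max(0.0, 1.0 - 2.0 * area)


def linear_fit_r2_p(x, y):
    """r^2 and two-sided slope p-value of an ordinary least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # no variation, so no trend can be estimated
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return 1.0, 0.0  # perfect fit
    df = n - 2
    t = math.sqrt(r2 * df / (1.0 - r2))
    return r2, t_two_sided_p(t, df)


def is_statistically_meaningful(times, values, r2_min=0.65, p_max=0.05, max_intervals=None):
    """Check interval-mean regressions for a statistically meaningful trend.

    Assumption: k equal-length consecutive intervals (remainder dropped),
    tried for k = 3 .. max_intervals (default: half the series length).
    Returns (True, k, r2, p) for the first passing k, else (False, None, None, None).
    """
    n = len(values)
    max_intervals = min(max_intervals or n // 2, n)
    for k in range(3, max_intervals + 1):
        size = n // k
        xs = [sum(times[i * size:(i + 1) * size]) / size for i in range(k)]
        ys = [sum(values[i * size:(i + 1) * size]) / size for i in range(k)]
        r2, p = linear_fit_r2_p(xs, ys)
        if r2 >= r2_min and p <= p_max:
            return True, k, r2, p
    return False, None, None, None
```

    A steadily rising series passes at k = 3, while a series that merely alternates between two values never reaches r² ≥ 0.65 for any split.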

  3. Statistical process control: A feasibility study of the application of time-series measurement in early neurorehabilitation after acquired brain injury.

    PubMed

    Markovic, Gabriela; Schult, Marie-Louise; Bartfai, Aniko; Elg, Mattias

    2017-01-31

    Progress in early cognitive recovery after acquired brain injury is uneven and unpredictable, and thus the evaluation of rehabilitation is complex. Time-series measurements are susceptible to statistical change due to process variation. The aim of this study was to evaluate the feasibility of using a time-series method, statistical process control, in early cognitive rehabilitation. Participants were 27 patients with acquired brain injury undergoing interdisciplinary rehabilitation of attention within 4 months post-injury. The outcome measure, the Paced Auditory Serial Addition Test, was analysed using statistical process control. Statistical process control identified if and when change occurred in the process, and patients followed 3 patterns: rapid, steady or stationary performers. The statistical process control method was adjusted, in terms of constructing the baseline and the total number of measurement points, in order to measure a process in change. Statistical process control methodology is feasible for use in early cognitive rehabilitation, since it provides information about change in a process, thus enabling adjustment of the individual treatment response. Together with the results indicating discernible subgroups that respond differently to rehabilitation, statistical process control could be a valid tool in clinical decision-making. This study is a starting point in understanding the rehabilitation process using a real-time-measurements approach.
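    A common statistical process control tool for one patient's repeated test scores is the individuals (XmR) chart. Limits come from a baseline, set at the mean ± 2.66 × the mean moving range, and later points are checked for special-cause signals. The abstract does not describe the authors' exact baseline construction or signal rules. The two rules below (a point outside the limits, or 8 consecutive points on one side of the centre line) are assumed, widely used conventions, and the function names are hypothetical.

```python
def xmr_limits(baseline):
    """Lower limit, centre line and upper limit of an individuals (XmR) chart."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two observations")
    centre = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def xmr_signals(baseline, new_points, run_length=8):
    """Indices of new points that signal special-cause change (assumed rules)."""
    lcl, centre, ucl = xmr_limits(baseline)
    signals = []
    run, side = 0, 0
    for i, x in enumerate(new_points):
        this_side = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the line
        if this_side != 0 and this_side == side:
            run += 1
        else:
            run = 1 if this_side != 0 else 0
        side = this_side
        if x > ucl or x < lcl or run >= run_length:
            signals.append(i)
    return signals
```

    With baseline scores `[10, 12, 11, 13, 12]` the limits are about 7.61 and 15.59, so follow-up scores `[12, 16, 8, 7]` signal at positions 1 and 3. A long run on one side of the centre line is the kind of signal that would separate "steady" improvers from "stationary" ones.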

  4. Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics

    PubMed Central

    2011-01-01

    Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well-known Q and I² statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I² statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747
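    The standard and generalised Q statistics differ only in their weights. Cochran's Q uses w_i = 1/v_i, while the generalised Q(τ²) uses w_i = 1/(v_i + τ²) and decreases as τ² grows. Solving Q(τ²) = k − 1 gives the Paule-Mandel estimate of τ², one common use of the generalised Q. The sketch below computes Q, I², the DerSimonian-Laird τ² and that Paule-Mandel τ². It is not the software distributed with the paper, and the bisection upper bound is an assumption.

```python
def fixed_effect(y, v, tau2=0.0):
    """Inverse-variance weighted mean with weights 1 / (v_i + tau2)."""
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, w


def generalised_q(y, v, tau2):
    """Q(tau2) = sum_i w_i(tau2) * (y_i - mu(tau2))^2; tau2 = 0 gives Cochran's Q."""
    mu, w = fixed_effect(y, v, tau2)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))


def heterogeneity(y, v):
    """Cochran's Q, I^2 and the DerSimonian-Laird tau^2 for effects y with variances v."""
    k = len(y)
    q = generalised_q(y, v, 0.0)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    _, w = fixed_effect(y, v)
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2_dl = max(0.0, (q - (k - 1)) / c)
    return q, i2, tau2_dl


def tau2_paule_mandel(y, v, upper=10.0, tol=1e-10):
    """Solve generalised Q(tau2) = k - 1 by bisection on [0, upper] (upper is assumed)."""
    k = len(y)
    if generalised_q(y, v, 0.0) <= k - 1:
        return 0.0  # no excess heterogeneity
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if generalised_q(y, v, mid) > k - 1:
            lo = mid  # Q still too large, so tau2 must be bigger
        else:
            hi = mid
    return (lo + hi) / 2
```

    For three effects 0.1, 0.3 and 0.5, each with variance 0.01, Q = 8, I² = 0.75, and both estimators give τ² = 0.03. The estimators coincide here only because the within-study variances are equal.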

  5. Additivity of nonlinear biomass equations

    Treesearch

    Bernard R. Parresol

    2001-01-01

    Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...

  6. Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform

    ERIC Educational Resources Information Center

    Norris, John M.

    2015-01-01

    Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…

  7. A wind proxy based on migrating dunes at the Baltic coast: statistical analysis of the link between wind conditions and sand movement

    NASA Astrophysics Data System (ADS)

    Bierstedt, Svenja E.; Hünicke, Birgit; Zorita, Eduardo; Ludwig, Juliane

    2017-07-01

    We statistically analyse the relationship between the structure of migrating dunes in the southern Baltic and the driving wind conditions over the past 26 years, with the long-term aim of using migrating dunes as a proxy for past wind conditions at an interannual resolution. The present analysis is based on the dune record derived from geo-radar measurements by Ludwig et al. (2017). The dune system is located at the Baltic Sea coast of Poland and is migrating from west to east along the coast. The dunes present layers with different thicknesses that can be assigned to absolute dates at interannual timescales and put in relation to seasonal wind conditions. To statistically analyse this record and calibrate it as a wind proxy, we used a gridded regional meteorological reanalysis data set (coastDat2) covering recent decades. The identified link between the dune annual layers and wind conditions was additionally supported by the co-variability between dune layers and observed sea level variations in the southern Baltic Sea. We include precipitation and temperature in our analysis, in addition to wind, to learn more about the dependency between these three atmospheric factors and their common influence on the dune system. We set up a statistical linear model based on the correlation between the frequency of days with specific wind conditions in a given season and dune migration velocities derived for that season. To some extent, the dune records can be seen as analogous to tree-ring width records, and hence we use a proxy validation method usually applied in dendrochronology, cross-validation with the leave-one-out method, when the observational record is short. The correlations between the wind record from the reanalysis and the wind record derived from the dune structure range between 0.28 and 0.63, yielding statistical validation skill similar to that of dendroclimatological records.
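    Leave-one-out cross-validation of a one-predictor linear calibration works as follows. Each year is predicted from a line fitted to all the other years, and the predictions are then correlated with the observations. A minimal sketch; the variable names (a dune-derived proxy and a wind target) and any data passed in are hypothetical, and the paper's actual model may include more predictors.

```python
import math


def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def loo_validate(proxy, target):
    """Leave-one-out: refit the calibration without year i, then predict year i."""
    preds = []
    for i in range(len(proxy)):
        x_train = proxy[:i] + proxy[i + 1:]
        y_train = target[:i] + target[i + 1:]
        a, b = ols(x_train, y_train)
        preds.append(a + b * proxy[i])
    return preds, pearson_r(preds, target)
```

    Because each prediction is made without the year it predicts, the resulting correlation is an honest skill estimate for a short record, which is why the approach is standard in dendrochronology.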

  8. Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA.

    PubMed

    Festing, M F

    2001-01-01

    In vitro experiments need to be well designed and correctly analysed if they are to achieve their full potential to replace the use of animals in research. An "experiment" is a procedure for collecting scientific data in order to test a hypothesis, or to provide material for generating new hypotheses, and differs from a survey because the scientist has control over the treatments that can be applied. Most experiments can be classified into one of a few formal designs, the most common being completely randomised and randomised block designs. These are quite common with in vitro experiments, which are often replicated in time. Some experiments involve a single independent (treatment) variable, while other "factorial" designs simultaneously vary two or more independent variables, such as drug treatment and cell line. Factorial designs often provide additional information at little extra cost. Experiments need to be carefully planned to avoid bias, be powerful yet simple, provide for a valid statistical analysis and, in some cases, have a wide range of applicability. Virtually all experiments need some sort of statistical analysis in order to take account of biological variation among the experimental subjects. Parametric methods using the t test or analysis of variance are usually more powerful than non-parametric methods, provided the underlying assumptions of normality of the residuals and equal variances are approximately valid. The statistical analyses of data from a completely randomised design and from a randomised-block design are demonstrated in Appendices 1 and 2, and methods of determining sample size are discussed in Appendix 3. Appendix 4 gives a checklist for authors submitting papers to ATLA.
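    For a randomised block design with one observation per block and treatment (for example, each replicate run in time treated as a block), the analysis of variance splits the total sum of squares into treatment, block and error parts. The sketch below is a generic version, not the paper's Appendix 2 worked example. It returns the treatment F ratio and its degrees of freedom; the p-value would then be read from an F table or F distribution.

```python
def rcbd_anova(data):
    """Randomised complete block ANOVA (one observation per block x treatment cell).

    data[i][j] is the response in block i under treatment j.
    """
    b, t = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (b * t)
    treat_means = [sum(row[j] for row in data) / b for j in range(t)]
    block_means = [sum(row) / t for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block  # residual after removing block effects
    df_treat, df_error = t - 1, (t - 1) * (b - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"F": f_treat, "df_treat": df_treat, "df_error": df_error,
            "ss_treat": ss_treat, "ss_block": ss_block, "ss_error": ss_error}
```

    For three blocks of two treatments, `[[10, 12], [11, 14], [9, 13]]`, the block sum of squares (3.0) is removed from the error term, leaving F = 27 on (1, 2) degrees of freedom. Removing block-to-block variation in this way is what makes the blocked design more powerful than ignoring the blocks.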

  9. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem of the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% of the power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association tests, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
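    For reference, the *standard* score-based meta-analysis that the abstract improves on sums the per-study score statistics U_k and their variances V_k, and tests T = U² / V against a 1-df chi-square. The sketch shows only that baseline. The authors' corrections for unbalanced case-control ratios and population stratification are not reproduced, and the function name is hypothetical.

```python
import math


def meta_score_test(scores, variances):
    """Standard meta-analysis of single-variant score statistics.

    U = sum of per-study scores U_k, V = sum of their variances V_k,
    T = U^2 / V, approximately chi-square with 1 df under H0.
    Returns (T, p-value).
    """
    u = sum(scores)
    v = sum(variances)
    t = u * u / v
    # For 1 df, P(chi2 > T) = P(|Z| > sqrt(T)) = erfc(sqrt(T / 2)).
    return t, math.erfc(math.sqrt(t / 2))
```

    A combined score of 1.96 with variance 1 gives p of about 0.05, as expected for a standard normal statistic.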

  10. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered extensive research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages such as automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.
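    The simplest instance of statistical boosting is component-wise L2 boosting with linear base-learners. At each step, every covariate is fitted to the current residuals, only the best-fitting one is updated, and its coefficient is added with a small step length ν. Covariates that are never selected keep a coefficient of exactly zero, which is where the automated variable selection comes from. A minimal sketch, assuming mean-centred covariates; it is not the `mboost` implementation discussed in the review.

```python
def l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with simple linear base-learners.

    X: list of rows whose columns are assumed mean-centred; y: responses.
    Returns (offset, coefficients).
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    coef = [0.0] * p
    resid = [yi - offset for yi in y]
    cols = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, xj in enumerate(cols):
            sxx = sum(x * x for x in xj)
            if sxx == 0:
                continue
            beta = sum(x * r for x, r in zip(xj, resid)) / sxx  # least squares, no intercept
            rss = sum((r - beta * x) ** 2 for x, r in zip(xj, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        coef[best_j] += nu * best_beta  # shrunken update = implicit regularization
        resid = [r - nu * best_beta * x for x, r in zip(cols[best_j], resid)]
    return offset, coef
```

    Stopping early (a small `n_iter`) leaves coefficients shrunk towards zero. In practice the number of iterations is tuned, for example by cross-validation.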

  11. Which statistics should tropical biologists learn?

    PubMed

    Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián

    2011-09-01

    Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions, the need for efficient, good-quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor-quality data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used in two leading tropical journals, the Revista de Biología Tropical and Biotropica, during a year. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's t Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well-designed one-semester course should be enough for their basic requirements.

  12. Voxel-based statistical analysis of cerebral blood flow using Tc-99m ECD brain SPECT in patients with traumatic brain injury: group and individual analyses.

    PubMed

    Shin, Yong Beom; Kim, Seong-Jang; Kim, In-Ju; Kim, Yong-Ki; Kim, Dong-Soo; Park, Jae Heung; Yeom, Seok-Ran

    2006-06-01

    Statistical parametric mapping (SPM) was applied to brain perfusion single photon emission computed tomography (SPECT) images in patients with traumatic brain injury (TBI) to investigate regional cerebral abnormalities compared with age-matched normal controls. Thirteen patients with TBI who underwent brain perfusion SPECT were included in this study (10 males, three females; mean age 39.8 ± 18.2, range 21-74). SPM2 software implemented in MATLAB 5.3 was used for spatial pre-processing and analysis and to determine the quantitative differences between TBI patients and age-matched normal controls. Three large voxel clusters of significantly decreased cerebral blood perfusion were found in patients with TBI. The largest cluster was an area including the medial frontal gyrus in both hemispheres (voxel number 3642, peak Z-values = 4.31 and 4.27, p < 0.001). The second largest cluster was an area including the cingulate gyrus and anterior cingulate gyrus of the left hemisphere (voxel number 381, peak Z-values = 3.67 and 3.62, p < 0.001). Other clusters were the parahippocampal gyrus (voxel number 173, peak Z-value = 3.40, p < 0.001) and hippocampus (voxel number 173, peak Z-value = 3.23, p = 0.001) in the left hemisphere. The false discovery rate (FDR) was less than 0.04. In this study, group and individual SPM2 analyses clearly identified the perfusion abnormalities on brain SPECT in patients with TBI. Group analysis with SPM2 showed a hypoperfusion pattern in areas including the medial frontal gyrus of both hemispheres and the cingulate gyrus, anterior cingulate gyrus, parahippocampal gyrus and hippocampus in the left hemisphere, compared with age-matched normal controls. Also, the left parahippocampal gyrus and left hippocampus were additional hypoperfusion areas. However, these findings warrant further investigation in a larger number of patients to better validate objective SPM analysis in patients with TBI.
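    Voxel-wise maps involve thousands of simultaneous tests, which is why results are reported with a false discovery rate. The standard FDR procedure is Benjamini-Hochberg. Sort the p-values, find the largest rank k with p_(k) ≤ k·q/m, and reject every hypothesis up to that rank. A minimal sketch with hypothetical p-values; SPM2's own FDR implementation is not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns a list of booleans (True = reject H0), in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= k * q / m; everything up to that rank is rejected.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

    The procedure is "step-up". For p-values `[0.04, 0.03, 0.02, 0.045]`, none of the three smallest passes its own threshold, but the largest one passes 0.05, so all four are rejected.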

  13. Statistical models and NMR analysis of polymer microstructure

    USDA-ARS's Scientific Manuscript database

    Statistical models can be used in conjunction with NMR spectroscopy to study polymer microstructure and polymerization mechanisms. Thus, Bernoullian, Markovian, and enantiomorphic-site models are well known. Many additional models have been formulated over the years for additional situations. Typica...
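    Under the Bernoullian model, the stereosequence is generated by independent meso/racemo additions with probability P_m. The triad fractions are then mm = P_m², mr = 2P_m(1 − P_m) and rr = (1 − P_m)². Conversely, P_m can be estimated from NMR triad intensities as mm + mr/2, and (mr)² = 4(mm)(rr) gives a quick consistency check. A sketch of these textbook relations; the function names and any triad fractions passed in are hypothetical.

```python
def bernoullian_triads(pm):
    """Triad fractions (mm, mr, rr) under Bernoullian statistics with meso probability pm."""
    return pm * pm, 2 * pm * (1 - pm), (1 - pm) ** 2


def estimate_pm(mm, mr, rr):
    """Meso-diad fraction from observed triad fractions: Pm = mm + mr / 2 (normalised)."""
    total = mm + mr + rr
    return (mm + mr / 2) / total


def bernoullian_check(mm, mr, rr):
    """Ratio 4*mm*rr / mr^2: equals 1 for ideal Bernoullian triads.

    Departures from 1 suggest a Markovian or enantiomorphic-site model may fit better.
    """
    return 4 * mm * rr / (mr * mr)
```

    A ratio well away from 1 on measured triads is the usual first hint that the Bernoullian model is insufficient. Confirming that would require fitting the alternative models to pentad or higher-order data.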

  14. Analyses of global sea surface temperature 1856-1991

    NASA Astrophysics Data System (ADS)

    Kaplan, Alexey; Cane, Mark A.; Kushnir, Yochanan; Clement, Amy C.; Blumenthal, M. Benno; Rajagopalan, Balaji

    1998-08-01

    Global analyses of monthly sea surface temperature (SST) anomalies from 1856 to 1991 are produced using three statistically based methods: optimal smoothing (OS), the Kalman filter (KF) and optimal interpolation (OI). Each of these is accompanied by estimates of the error covariance of the analyzed fields. The spatial covariance function these methods require is estimated from the available data; the time-marching model is a first-order autoregressive model, again estimated from data. The data input for the analyses are monthly anomalies from the United Kingdom Meteorological Office historical sea surface temperature data set (MOHSST5) [Parker et al., 1994] of the Global Ocean Surface Temperature Atlas (GOSTA) [Bottomley et al., 1990]. These analyses are compared with each other, with GOSTA, and with an analysis generated by projection (P) onto a set of empirical orthogonal functions (as in Smith et al. [1996]). In theory, the quality of the analyses should rank in the order OS, KF, OI, P, and GOSTA. It is found that the first four give comparable results in the data-rich periods (1951-1991), but at times when data are sparse the first three differ significantly from P and GOSTA. At these times the latter two often have extreme and fluctuating values, prima facie evidence of error. The statistical schemes are also verified against data not used in any of the analyses (proxy records derived from corals and air temperature records from coastal and island stations). We also present evidence that the analysis error estimates are indeed indicative of the quality of the products. At most times the OS and KF products are close to the OI product, but at times of especially poor coverage their use of information from other times is advantageous. The methods appear to reconstruct the major features of the global SST field from very sparse data. Comparison with other indications of the El Niño-Southern Oscillation cycle shows that the analyses provide usable information on
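    The Kalman filter's predict/update cycle with a first-order autoregressive time-marching model can be shown on a single grid point. The state follows x_t = a·x_{t−1} + w_t, and each observation is y_t = x_t + v_t. When an observation is missing, only the prediction step runs, so the error variance grows, mirroring the larger analysis errors in data-sparse periods. This scalar sketch is a simplification: the paper's filter is multivariate, with a spatial covariance estimated from data. The parameter values in the usage are hypothetical.

```python
def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q), v_t ~ N(0, r). A None observation skips the update step.
    Returns lists of filtered estimates and their error variances.
    """
    x, p = x0, p0
    xs, ps = [], []
    for y in observations:
        # Predict with the AR(1) time-marching model.
        x, p = a * x, a * a * p + q
        if y is not None:
            k = p / (p + r)  # Kalman gain: weight given to the new observation
            x = x + k * (y - x)
            p = (1 - k) * p
        xs.append(x)
        ps.append(p)
    return xs, ps
```

    With a = 0.5 and q = r = 1, two missing months raise the error variance from 1.25 to 1.3125. A single observation of 2.0 instead pulls the estimate to 10/9 and cuts the variance to 5/9.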

  15. Using Meta-analyses for Comparative Effectiveness Research

    PubMed Central

    Ruppar, Todd M.; Phillips, Lorraine J.; Chase, Jo-Ana D.

    2012-01-01

    Comparative effectiveness research seeks to identify the most effective interventions for particular patient populations. Meta-analysis is an especially valuable form of comparative effectiveness research because it emphasizes the magnitude of intervention effects rather than relying on tests of statistical significance among primary studies. Overall effects can be calculated for diverse clinical and patient-centered variables to determine the outcome patterns. Moderator analyses compare intervention characteristics among primary studies by determining if effect sizes vary among studies with different intervention characteristics. Intervention effectiveness can be linked to patient characteristics to provide evidence for patient-centered care. Moderator analyses often answer questions never posed by primary studies because neither multiple intervention characteristics nor populations are compared in single primary studies. Thus meta-analyses provide unique contributions to knowledge. Although meta-analysis is a powerful comparative effectiveness strategy, methodological challenges and limitations in primary research must be acknowledged to interpret findings. PMID:22789450
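    A basic form of the moderator analysis described in this abstract is a subgroup test. Studies are grouped by an intervention characteristic, and the between-group heterogeneity Q_between = Q_total − ΣQ_within is compared with a chi-square distribution on (groups − 1) degrees of freedom. A minimal fixed-effect sketch; the grouping labels and effect sizes are hypothetical, and real moderator analyses often use random-effects or meta-regression models instead.

```python
def fe_summary(y, v):
    """Fixed-effect (inverse-variance) mean and Cochran's Q for one set of studies."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return mu, q


def q_between(groups):
    """Subgroup (moderator) test statistic: Q_between = Q_total - sum of Q_within.

    groups maps a moderator level to (effect sizes, variances).
    Returns (Q_between, degrees of freedom).
    """
    all_y = [yi for y, _ in groups.values() for yi in y]
    all_v = [vi for _, v in groups.values() for vi in v]
    _, q_total = fe_summary(all_y, all_v)
    q_within = sum(fe_summary(y, v)[1] for y, v in groups.values())
    return q_total - q_within, len(groups) - 1
```

    A large Q_between relative to its degrees of freedom indicates that effect sizes differ by the moderator, which is the kind of question single primary studies rarely address.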

  16. Diatremes of the Hopi Buttes, Arizona; chemical and statistical analyses

    USGS Publications Warehouse

    Wenrich, K.J.; Mascarenas, J.F.

    1982-01-01

    Lacustrine sediments deposited in maar lakes of the Hopi Buttes diatremes are hosts for uranium mineralization of as much as 1500 ppm. The monchiquites and limburgite tuffs erupted from the diatremes are distinguished from normal alkalic basalts of the Colorado Plateau by their extreme silica undersaturation and high water, TiO2, and P2O5 contents. Many trace elements are also unusually abundant, including Ag, As, Ba, Be, Ce, Dy, Eu, F, Gd, Hf, La, Nd, Pb, Rb, Se, Sm, Sn, Sr, Ta, Tb, Th, U, V, Zn, and Zr. The lacustrine sediments, which consist predominantly of travertine and clastic rocks, are the hosts for syngenetic and epigenetic uranium mineralization of as much as 1500 ppm uranium. Fission track maps show the uranium to be disseminated within the travertine and clastic rocks, and although microprobe analyses have not, as yet, revealed discrete uranium-bearing phases, the clastic rocks show a correlation of high Fe, Ti, and P with areas of high U. Correlation coefficients show that for the travertines, clastics, and limburgite tuffs, Mo, As, Sr, Co, and V appear to have the most consistent and strongest correlations with uranium. Many elements, including many of the rare-earth elements, that are high in these three rocks are also high in the monchiquites, as compared to the average crustal abundance for the respective rock type. This similar suite of anomalous elements, which includes such immobile elements as the rare earths, suggests that fluids which deposited the travertines were related to the monchiquitic magma. The similar age of about 5 m.y. for both the lake beds and the monchiquites also appears to support this source for the mineralizing fluids.
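    Finding the elements that correlate most strongly with uranium amounts to computing a Pearson coefficient for each element against U across samples and ranking by |r|. A minimal sketch; the element concentrations used below are hypothetical and not the survey's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(target, elements):
    """(name, r) pairs sorted by |r| with the target concentrations, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in elements.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

    Ranking by |r| treats strong negative associations as seriously as positive ones. Geochemical concentrations are often log-transformed first, a choice this sketch leaves to the caller.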

  17. Perceived Effectiveness among College Students of Selected Statistical Measures in Motivating Exercise Behavior

    ERIC Educational Resources Information Center

    Merrill, Ray M.; Chatterley, Amanda; Shields, Eric C.

    2005-01-01

    This study explored the effectiveness of selected statistical measures at motivating or maintaining regular exercise among college students. The study also considered whether ease in understanding these statistical measures was associated with perceived effectiveness at motivating or maintaining regular exercise. Analyses were based on a…

  18. Additive Interaction between Heterogeneous Environmental Quality Domains (Air, Water, Land, Sociodemographic, and Built Environment) on Preterm Birth.

    PubMed

    Grabich, Shannon C; Rappazzo, Kristen M; Gray, Christine L; Jagai, Jyotsna S; Jian, Yun; Messer, Lynne C; Lobdell, Danelle T

    2016-01-01

    Environmental exposures often occur in tandem; however, epidemiological research often focuses on singular exposures. Statistical interactions among broad, well-characterized environmental domains have not yet been evaluated in association with health. We address this gap by conducting a county-level cross-sectional analysis of interactions between Environmental Quality Index (EQI) domain indices on preterm birth in the United States from 2000 to 2005. The EQI, a county-level index for the 2000-2005 time period, was constructed from five domain-specific indices (air, water, land, built, and sociodemographic) using principal component analyses. County-level preterm birth rates (n = 3141) were estimated using live births from the National Center for Health Statistics. Linear regression was used to estimate prevalence differences (PDs) and 95% confidence intervals (CIs) comparing worse environmental quality to better quality for each model for (a) each individual domain main effect, (b) the interaction contrast, and (c) the two main effects plus interaction effect (i.e., the "net effect") to show departure from additivity for all U.S. counties. Analyses were also performed for subgroupings by four urban/rural strata. We found the suggestion of antagonistic interactions but no synergism, along with several purely additive (i.e., no interaction) associations. In the non-stratified model, we observed antagonistic interactions between the sociodemographic/air domains [net effect (i.e., the association, including main effects and interaction effects) PD: -0.004 (95% CI: -0.007, 0.000), interaction contrast: -0.013 (95% CI: -0.020, -0.007)] and built/air domains [net effect PD: 0.008 (95% CI 0.004, 0.011), interaction contrast: -0.008 (95% CI: -0.015, -0.002)]. Most interactions were between the air domain and other respective domains. Interactions differed by urbanicity, with more interactions observed in non-metropolitan regions. Observed
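    For two dichotomised domains A and B (worse vs. better quality), the additive-scale quantities in this abstract can be read off the four cell means. The interaction contrast is m11 − m10 − m01 + m00, and the net effect (both main effects plus the interaction) equals m11 − m00. Without covariates, these are exactly the coefficients of the saturated model y = b0 + b1·A + b2·B + b3·A·B. A sketch with hypothetical prevalences; the paper's regression models and confidence intervals are not reproduced, and in practice the sign should be judged from the CI, not the point estimate.

```python
def additive_interaction(m00, m10, m01, m11):
    """Additive-scale interaction from four cell means (e.g. preterm-birth prevalences).

    m00: both domains better (reference); m10: A worse only; m01: B worse only;
    m11: both worse. Mirrors b1, b2, b3 of y = b0 + b1*A + b2*B + b3*A*B.
    """
    pd_a = m10 - m00  # main effect of A (prevalence difference)
    pd_b = m01 - m00  # main effect of B
    ic = m11 - m10 - m01 + m00  # interaction contrast b3
    net = pd_a + pd_b + ic  # net effect, equal to m11 - m00
    if ic > 0:
        kind = "synergistic"
    elif ic < 0:
        kind = "antagonistic"
    else:
        kind = "additive"  # point estimate only; use the CI in practice
    return {"pd_a": pd_a, "pd_b": pd_b, "ic": ic, "net": net, "kind": kind}
```

    With cell prevalences 0.10, 0.12, 0.11 and 0.12, each domain alone raises prevalence, but the joint effect (0.02) is smaller than the sum of the main effects (0.03). That gives an interaction contrast of −0.01, an antagonistic pattern like those reported.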

  19. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This reduces the biases of estimators and improves the accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
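    In the beta-binomial model, the number of affected fetuses K in a litter of size n has probability C(n, k)·B(k + α, n − k + β) / B(α, β). Here B is the beta function, and the intra-litter correlation is ρ = 1/(α + β + 1). The sketch evaluates these via `math.lgamma` for numerical stability. It illustrates the model only, not the separated-likelihood inference reviewed in the paper, and the parameter values are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) affected fetuses in a litter of size n with Beta(alpha, beta) litter effects."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * math.exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def intra_litter_correlation(alpha, beta):
    """Correlation between responses of two littermates: rho = 1 / (alpha + beta + 1)."""
    return 1.0 / (alpha + beta + 1.0)
```

    With α = β = 1 and litters of two, the outcomes 0, 1 and 2 are equally likely (1/3 each). That is far more spread than a binomial with p = 0.5 would give, which is the overdispersion the model is designed to capture.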

  20. Analyses of non-fatal accidents in an opencast mine by logistic regression model - a case study.

    PubMed

    Onder, Seyhan; Mutlu, Mert

    2017-09-01

    Accidents cause major damage for both workers and enterprises in the mining industry. To reduce the number of occupational accidents, these incidents should be properly registered and carefully analysed. This study examines the Aegean Lignite Enterprise (ELI) of Turkish Coal Enterprises (TKI) in Soma between 2006 and 2011, using opencast coal mine occupational accident records for statistical analyses. A total of 231 occupational accidents were analysed for this study. The accident records were categorized into seven groups: area, reason, occupation, part of body, age, shift hour and lost days. The SPSS package program was used for logistic regression analyses, which predicted the probability of non-fatal injuries resulting in more or fewer than 3 lost workdays. Social facilities-area of surface installations, workshops and opencast mining areas are the areas with the highest probability of accidents with more than 3 lost workdays for non-fatal injuries, while the reasons with the highest probability for these types of accidents are transporting and manual handling. Additionally, the model was tested on such accidents reported in 2012 for the ELI in Soma and correctly estimated the probability of exposure to accidents with lost workdays in 70% of cases.
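    The study fitted its models in SPSS. The same kind of model, a binary logistic regression for whether an injury causes more than 3 lost workdays, can be sketched with a hand-written Newton-Raphson fit and a classification-accuracy check. Some details are assumptions: the 0/1 outcome coding, the 0.5 classification threshold and the toy data in the usage are all hypothetical, and the study's categorical predictors would first need dummy coding.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: rows of covariates with a leading 1 for the intercept; y: 0/1 outcomes.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def accuracy(beta, X, y, threshold=0.5):
    """Share of cases whose predicted class (probability >= threshold) matches y."""
    hits = 0
    for xi, yi in zip(X, y):
        prob = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        hits += int((prob >= threshold) == bool(yi))
    return hits / len(y)
```

    The share of correctly classified cases in held-out data is one way to express a validation result such as the 70% reported for the 2012 accidents. The fixed iteration count assumes the data are not perfectly separable, since the maximum-likelihood estimates would otherwise diverge.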

  1. Statistical analysis of sperm sorting

    NASA Astrophysics Data System (ADS)

    Koh, James; Marcos, Marcos

    2017-11-01

    The success rate of assisted reproduction depends on the proportion of morphologically normal sperm. An external field can be used to manipulate and sort sperm, and the extent of the response varies with morphology. Because sperm morphology is widely distributed even among individuals, the resulting distribution of kinematic behaviour, and consequently the feasibility of sorting, should be analysed statistically. In this theoretical work, Resistive Force Theory and Slender Body Theory will be applied and compared.

  2. Using Additional Analyses to Clarify the Functions of Problem Behavior: An Analysis of Two Cases

    ERIC Educational Resources Information Center

    Payne, Steven W.; Dozier, Claudia L.; Neidert, Pamela L.; Jowett, Erica S.; Newquist, Matthew H.

    2014-01-01

    Functional analyses (FA) have proven useful for identifying contingencies that influence problem behavior. Research has shown that some problem behavior may only occur in specific contexts or be influenced by multiple or idiosyncratic variables. When these contexts or sources of influence are not assessed in an FA, further assessment may be…

  3. Statistics for Learning Genetics

    NASA Astrophysics Data System (ADS)

    Charles, Abigail Sheena

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To fulfill this, two college-level classes at two universities were surveyed. One university was located in the northeastern US and the other in the West Indies. The sample comprised 42 students, and supplementary interviews were conducted with 9 selected students. Interviews were also conducted with professors in the field to gain insight into the teaching of statistics in genetics. Key findings indicated that 55% of students had little or no background in statistics. Although students performed well on exams, with 60% receiving an A or B grade, 77% of them did not give good explanations for a probability question on the normal distribution included in the survey. The scope and presentation of the applicable statistics/mathematics in some of the most widely used genetics textbooks, as well as the genetics syllabi used by instructors, do not help the issue. The textbooks were often found either to give ineffective explanations or to omit certain topics entirely. The same omission of statistical/mathematical topics was found in the genetics syllabi reviewed for this study. Nonetheless

  4. Low-flow statistics of selected streams in Chester County, Pennsylvania

    USGS Publications Warehouse

    Schreffler, Curtis L.

    1998-01-01

    Low-flow statistics for many streams in Chester County, Pa., were determined on the basis of data from 14 continuous-record streamflow stations in Chester County and data from 1 station in Maryland and 1 station in Delaware. The stations in Maryland and Delaware are on streams that drain large areas within Chester County. Streamflow data through the 1994 water year were used in the analyses. The low-flow statistics summarized are the 1Q10, 7Q10, 30Q10, and harmonic mean. Low-flow statistics were estimated at 34 partial-record stream sites throughout Chester County.
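
    The 1Q10, 7Q10 and 30Q10 are n-day, 10-year low flows: the lowest average flow over n consecutive days with a 10-year recurrence interval. The Python sketch below covers the two building blocks, the annual minimum n-day mean flow and the harmonic mean, assuming daily mean flows already grouped by year (the USGS typically uses a climatic year rather than the calendar year; that grouping is left to the caller). The 10-year recurrence step requires fitting a frequency distribution to the annual minima (commonly log-Pearson Type III) and is not shown. The flow values are hypothetical.

```python
from statistics import harmonic_mean


def min_n_day_mean(daily_flows: list[float], n: int) -> float:
    """Lowest n-day moving-average flow within one year of daily mean flows."""
    if n < 1 or len(daily_flows) < n:
        raise ValueError("need at least n daily values")
    window_sum = sum(daily_flows[:n])
    lowest = window_sum
    for i in range(n, len(daily_flows)):
        # Slide the window one day: add the new day and drop the oldest one.
        window_sum += daily_flows[i] - daily_flows[i - n]
        lowest = min(lowest, window_sum)
    return lowest / n


def annual_minima(flows_by_year: dict[int, list[float]], n: int) -> dict[int, float]:
    """Series of annual n-day minima, the input to a low-flow frequency analysis."""
    return {year: min_n_day_mean(q, n) for year, q in flows_by_year.items()}


# Hypothetical, very short "years" of daily flows (cubic feet per second).
flows = {1993: [5, 4, 3, 2, 1, 2, 3, 4, 5, 6], 1994: [8, 7, 7, 6, 6, 6, 7, 9, 9, 8]}
print(annual_minima(flows, 3))
# Harmonic mean over the whole record; statistics.harmonic_mean returns 0 if any flow is 0.
print(harmonic_mean([q for year in flows.values() for q in year]))
```

    Real records contain zero-flow days, which need special handling before a harmonic mean is meaningful.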

  5. Analysis methodology and development of a statistical tool for biodistribution data from internal contamination with actinides.

    PubMed

    Lamart, Stephanie; Griffiths, Nina M; Tchitchek, Nicolas; Angulo, Jaime F; Van der Meeren, Anne

    2017-03-01

    The aim of this work was to develop a computational tool that integrates several statistical analysis features for biodistribution data from internal contamination experiments. These data represent actinide levels in biological compartments as a function of time and are derived from activity measurements in tissues and excreta. These experiments aim to assess the influence of different contamination conditions (e.g. intake route or radioelement) on the biological behavior of the contaminant. The ever-increasing number of datasets and diversity of experimental conditions make the handling and analysis of biodistribution data difficult. This work sought to facilitate the statistical analysis of a large number of datasets and the comparison of results from diverse experimental conditions. Functional modules were developed in the open-source programming language R for specific operations: descriptive statistics, visual comparison, curve fitting, and implementation of biokinetic models. In addition, the structure of the datasets was harmonized using a common table format. Analysis outputs can be written to text files, and updated data can be written in the same consistent table format. A data repository is thus built progressively, which is essential for the optimal use of animal data. Graphical representations can be generated automatically and saved as image files. The resulting computational tool was applied to data derived from wound contamination experiments conducted under different conditions. By facilitating biodistribution data handling and statistical analyses, this computational tool provides faster analyses and better reproducibility than the use of multiple office software applications. Furthermore, re-analysis of archival data and comparison of data from different sources are made much easier. This tool should therefore help clarify the influence of contamination characteristics on actinide biokinetics. Our approach can aid
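
    The abstract lists curve fitting among the tool's modules without specifying the model. As a hedged illustration only, the Python sketch below fits a single-exponential retention curve A·exp(−k·t) to hypothetical tissue activity data by linear least squares on log-activity; the tool itself is written in R, and its actual models (including the biokinetic models it implements) may differ. A log-linear fit weights relative rather than absolute errors, which is a design choice rather than a requirement.

```python
from math import exp, log


def fit_single_exponential(times: list[float], activities: list[float]) -> tuple[float, float]:
    """Fit activity(t) = A * exp(-k * t) by least squares on log(activity); returns (A, k)."""
    if len(times) != len(activities) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    logs = [log(a) for a in activities]  # activities must be positive
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return exp(intercept), -slope


# Hypothetical percentage of the initial activity retained in a tissue at days 1, 3, 7 and 14.
A, k = fit_single_exponential([1, 3, 7, 14], [40.0, 33.0, 24.0, 13.0])
print(f"A = {A:.1f} %, k = {k:.3f} per day, half-life = {log(2) / k:.1f} days")
```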

  6. Statistical methods for convergence detection of multi-objective evolutionary algorithms.

    PubMed

    Trautmann, H; Wagner, T; Naujoks, B; Preuss, M; Mehnen, J

    2009-01-01

    In this paper, two approaches are introduced for estimating the generation in which a multi-objective evolutionary algorithm (MOEA) shows statistically significant signs of convergence. A set-based perspective is taken, in which convergence is measured by performance indicators. The proposed techniques fulfill the requirements of proper statistical assessment on the one hand and of efficient optimisation for real-world problems on the other. The first approach accounts for the stochastic nature of the MOEA by repeating the optimisation runs for increasing numbers of generations and analysing the performance indicators with statistical tools. This technique results in a very robust offline procedure. In addition, an online convergence detection method is introduced. This method automatically stops the MOEA when either the variance of the performance indicators falls below a specified threshold or a stagnation of their overall trend is detected. Both methods are analysed and compared for two MOEAs on different classes of benchmark functions. It is shown that the methods operate successfully on all stated problems, requiring fewer function evaluations while preserving good approximation quality.
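
    The online stopping rule can be sketched as a check over a sliding window of performance-indicator values (for example, one hypervolume value per generation). The Python sketch below uses a fixed variance threshold, as the abstract describes, but detects stagnation with a simple cut-off on the least-squares slope over the window; the paper's exact stagnation criterion is not reproduced, and the window length and thresholds are hypothetical.

```python
from statistics import variance


def should_stop(history: list[float], window: int = 10,
                var_threshold: float = 1e-4, slope_threshold: float = 1e-3) -> bool:
    """Stop when recent indicator values have low variance or no clear trend."""
    if window < 2 or len(history) < window:
        return False  # not enough generations observed yet
    recent = history[-window:]
    if variance(recent) < var_threshold:
        return True
    # Least-squares slope of the indicator against generation index within the window.
    x_mean = (window - 1) / 2
    y_mean = sum(recent) / window
    sxx = sum((x - x_mean) ** 2 for x in range(window))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(window), recent))
    return abs(sxy / sxx) < slope_threshold


# Hypothetical indicator trace: steady improvement, then a plateau.
trace = [1.0 - 0.05 * g for g in range(15)] + [0.3] * 10
stop_at = next(g for g in range(1, len(trace) + 1) if should_stop(trace[:g]))
print("stop after generation", stop_at)
```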

  7. Progressive statistics for studies in sports medicine and exercise science.

    PubMed

    Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri

    2009-01-01

    Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.
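
    The recommendation to show SD rather than SEM rests on a simple relation: SEM = SD/√n, so the SEM shrinks as the sample grows while the SD keeps describing the spread of individual values. A minimal Python sketch with hypothetical measurements:

```python
from math import sqrt
from statistics import mean, stdev


def summarize(sample: list[float]) -> dict[str, float]:
    """Mean, sample SD (n - 1 denominator) and SEM = SD / sqrt(n)."""
    sd = stdev(sample)
    return {"mean": mean(sample), "sd": sd, "sem": sd / sqrt(len(sample))}


# Hypothetical sprint times (s) for eight athletes.
times = [12.1, 11.8, 12.6, 12.0, 11.5, 12.9, 12.3, 11.9]
print(summarize(times))
# Quadrupling n (here by repeating the data) roughly halves the SEM, while the SD barely changes.
print(summarize(times * 4))
```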

  8. A Framework for Assessing High School Students' Statistical Reasoning

    PubMed Central

    2016-01-01

    Based on a synthesis of the literature, earlier studies, and analyses of and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The levels were idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The constructs were describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in a second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this new statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091

  9. Application of Natural Mineral Additives in Construction

    NASA Astrophysics Data System (ADS)

    Linek, Malgorzata; Nita, Piotr; Wolka, Paweł; Zebrowski, Wojciech

    2017-12-01

    The article concerns the use of selected mineral additives in the composition of pavement quality concrete. The research was based on modifying cement concrete intended for airfield pavements. The use of two additives, metakaolinite and natural zeolite, was proposed. The analyses included an assessment of the basic physical properties of the modifiers: screening analysis, assessment of microstructure and chemical microanalysis were conducted for these materials. The influence of the applied additives on the concrete mix parameters is also presented. The impact of zeolite and metakaolinite on mix density, oxygen content and consistency class was analysed. The influence of the modifiers on the physical and mechanical properties of the hardened cement concrete (concrete density, compressive strength and bending strength at fracture) is discussed for different research periods. The impact of the applied additives on the internal structure of cement concrete is also discussed; the concrete microstructure was observed using a scanning electron microscope. According to the laboratory test results, the parameters of the applied modifiers and their influence on the internal structure of cement concrete are reflected in improved mechanical properties of pavement quality concrete. An increase in compressive and bending strength was demonstrated for all analysed research periods.

  10. Illinois crash facts and statistics, 2002

    DOT National Transportation Integrated Search

    2002-01-01

    This publication, Illinois Traffic Crash Facts and Statistics for 2002, is designed to provide an overview of motor vehicle crash experience in Illinois. In addition to a plethora of crash data, the publication includes key events in th...

  11. Illinois crash facts and statistics, 2001

    DOT National Transportation Integrated Search

    2001-01-01

    This publication, Illinois Traffic Crash Facts and Statistics for 2001, is designed to provide an overview of motor vehicle crash experience in Illinois. In addition to a plethora of crash data, the publication includes key events in th...

  12. Illinois crash facts and statistics, 2003

    DOT National Transportation Integrated Search

    2003-01-01

    This publication, Illinois Traffic Crash Facts and Statistics for 2003, is designed to provide an overview of motor vehicle crash experience in Illinois. In addition to a plethora of crash data, the publication includes key events in th...

  13. An innovative statistical approach for analysing non-continuous variables in environmental monitoring: assessing temporal trends of TBT pollution.

    PubMed

    Santos, José António; Galante-Oliveira, Susana; Barroso, Carlos

    2011-03-01

    The current work presents an innovative statistical approach to modelling ordinal variables in environmental monitoring studies. An ordinal variable has values that can only be compared as "less", "equal" or "greater", and no information is available about the size of the difference between two particular values. The ordinal variable examined here is the vas deferens sequence (VDS) used in imposex (superimposition of male sexual characters onto prosobranch females) field assessment programmes for monitoring tributyltin (TBT) pollution. The statistical methodology presented here is the ordered logit regression model. It assumes that the VDS is an ordinal variable whose values correspond to stages of a process of imposex development that can be considered continuous in both the biological and the statistical sense and can be described by a latent, non-observable continuous variable. This model was applied to the case study of Nucella lapillus imposex monitoring surveys conducted on the Portuguese coast between 2003 and 2008 to evaluate the temporal evolution of TBT pollution in this country. To produce more reliable conclusions, the proposed model includes covariates other than TBT that may influence the imposex response (e.g. shell size). The model also provides an analysis of the environmental risk associated with TBT pollution by estimating the probability of the occurrence of females with VDS ≥ 2 in each year, according to OSPAR criteria. We consider that the proposed application of this statistical methodology has great potential in environmental monitoring whenever variables that can only be assessed on an ordinal scale need to be modelled.
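
    In the ordered logit (proportional-odds) model, the cumulative probabilities are P(VDS ≤ j | x) = logistic(θ_j − η), where θ_0 < θ_1 < … are cut-points and η = x·β is the linear predictor; hence P(VDS ≥ 2) = 1 − logistic(θ_1 − η). The Python sketch below computes the category probabilities and this exceedance probability. The cut-point values, the number of categories and the linear predictor are hypothetical, not estimates from the Portuguese surveys.

```python
from math import exp


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def category_probs(cutpoints: list[float], eta: float) -> list[float]:
    """P(Y = j) for j = 0..len(cutpoints), with P(Y <= j) = logistic(cutpoints[j] - eta)."""
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cut-points must be strictly increasing")
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def prob_at_least(cutpoints: list[float], eta: float, level: int) -> float:
    """P(Y >= level) = 1 - P(Y <= level - 1)."""
    return 1.0 if level <= 0 else 1.0 - logistic(cutpoints[level - 1] - eta)


# Hypothetical fit: eta = year effect + shell-size coefficient * shell height (mm).
eta = 0.4 + 0.05 * 25.0
cuts = [-0.5, 1.0, 2.2, 3.5]  # five categories 0..4 in this toy example
print(category_probs(cuts, eta), prob_at_least(cuts, eta, 2))
```

    Because the covariates shift only η, the same cut-points apply in every year, which is the proportional-odds assumption behind the latent-variable interpretation.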

  14. Understanding Statistics - Cancer Statistics

    Cancer.gov

    Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.

  15. [Cause-of-death statistics and ICD, quo vadis?]

    PubMed

    Eckert, Olaf; Vogel, Ulrich

    2018-07-01

    The International Statistical Classification of Diseases and Related Health Problems (ICD) is the worldwide binding standard for generating underlying cause-of-death statistics. What are the effects of former revisions of the ICD on underlying cause-of-death statistics, and which opportunities and challenges are becoming apparent in a possible transition from ICD-10 to ICD-11? This article presents the calculation of the exploitation grade of ICD-9 and ICD-10 in the German cause-of-death statistics and the quality of documentation. Approximately 67,000 anonymized German death certificates were processed by Iris/MUSE, and official German cause-of-death statistics were analyzed. In addition to substantial changes in the exploitation grade in the transition from ICD-9 to ICD-10, regional effects become visible. The rate of so-called "ill-defined" conditions exceeds 10%. Despite substantial improvements in the ICD revisions, there are long-known deficits in the coroner's inquest, in filling out death certificates, and in the quality of coding. To make better use of the ICD as a methodological framework for mortality statistics and health reporting in Germany, the following measures are necessary: 1. general use of Iris/MUSE, 2. establishing multiple underlying cause-of-death statistics, 3. introduction of an electronic death certificate, 4. improvement of the medical assessment of cause of death. The WHO will shortly release the 11th revision of the ICD, which will provide additional opportunities for the development of underlying cause-of-death statistics and their use in science, public health and politics. A coordinated effort involving participants in the process and users is necessary to meet the related challenges.

  16. Mixed Approach Retrospective Analyses of Suicide and Suicidal Ideation for Brand Compared with Generic Central Nervous System Drugs.

    PubMed

    Cheng, Ning; Rahman, Md Motiur; Alatawi, Yasser; Qian, Jingjing; Peissig, Peggy L; Berg, Richard L; Page, C David; Hansen, Richard A

    2018-04-01

    Several different types of drugs acting on the central nervous system (CNS) have previously been associated with an increased risk of suicide and suicidal ideation (broadly referred to as suicide). However, a differential association between brand and generic CNS drugs and suicide has not been reported. This study compares suicide adverse event rates for brand versus generic CNS drugs using multiple sources of data. Selected examples of CNS drugs (sertraline, gabapentin, zolpidem, and methylphenidate) were evaluated via the US FDA Adverse Event Reporting System (FAERS) for a hypothesis-generating study, and then via administrative claims and electronic health record (EHR) data for a more rigorous retrospective cohort study. Disproportionality analyses with reporting odds ratios and 95% confidence intervals (CIs) were used in the FAERS analyses to quantify the association between each drug and reported suicide. For the cohort studies, Cox proportional hazards models were used, controlling for demographic and clinical characteristics as well as the background risk of suicide in the insured population. The FAERS analyses found significantly lower suicide reporting rates for brands compared with generics for all four studied products (Breslow-Day P < 0.05). In the claims- and EHR-based cohort study, the adjusted hazard ratio (HR) was statistically significant only for sertraline (HR 0.58; 95% CI 0.38-0.88). Suicide reporting rates were disproportionately higher for generic than for brand CNS drugs in FAERS, and adjusted retrospective cohort analyses remained significant only for sertraline. However, even for sertraline, temporal confounding related to the close proximity of black box warnings and generic availability is possible. Additional analyses in larger data sources with additional drugs are needed.
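
    The reporting odds ratio used in disproportionality analyses compares the odds of the event of interest among reports for one drug with the odds among all other reports, using a 2×2 table of report counts. A minimal Python sketch with the usual Wald interval on the log scale; the counts are hypothetical, and the sketch implements neither the Breslow-Day test nor any correction for zero cells.

```python
from math import exp, log, sqrt


def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """ROR with a Wald 95% CI from a 2x2 table of spontaneous reports.

    a: drug of interest, event of interest    b: drug of interest, other events
    c: other drugs, event of interest         d: other drugs, other events
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple interval")
    ror = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(ROR)
    return ror, exp(log(ror) - z * se_log), exp(log(ror) + z * se_log)


# Hypothetical report counts.
print(reporting_odds_ratio(20, 80, 100, 1900))
```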

  17. Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions.

    PubMed

    Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu

    2017-11-01

    This paper concerns folded concave penalized sparse linear regression (FCPSLR), a popular class of sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses of local minimizers or of specific FCPSLR-based learning algorithms, it remains an open question whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure statistical performance, and whether that performance can be independent of the specific design of the computing procedure. To address these questions, this paper presents three results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S³ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S³ONC solution likely recovers the oracle solution. This result also shows that the goal of improving statistical performance is consistent with the optimization criterion of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with the minimax concave penalty (MCP) and show that, under the restricted eigenvalue condition, any S³ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to that of the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to that of the Lasso in general. Furthermore, to guarantee
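
    For reference, the minimax concave penalty (MCP) with parameters λ > 0 and a > 1 is p(t) = λ|t| − t²/(2a) for |t| ≤ aλ and aλ²/2 otherwise. The Python sketch below evaluates it and, for the one-dimensional problem min over θ of ½(z − θ)² + p(θ), compares the closed-form MCP solution with Lasso soft-thresholding. This shows why MCP leaves large coefficients unshrunk. It is a textbook illustration, not the S³ONC analysis or any algorithm from the paper, and the λ and a values are arbitrary.

```python
def mcp_penalty(t: float, lam: float, a: float) -> float:
    """Minimax concave penalty for one coefficient (requires lam > 0, a > 1)."""
    at = abs(t)
    if at <= a * lam:
        return lam * at - at * at / (2 * a)
    return a * lam * lam / 2


def soft_threshold(z: float, lam: float) -> float:
    """Lasso solution of min_theta 0.5*(z - theta)^2 + lam*|theta|."""
    return (1 if z > 0 else -1) * max(abs(z) - lam, 0.0)


def mcp_threshold(z: float, lam: float, a: float) -> float:
    """MCP solution of min_theta 0.5*(z - theta)^2 + mcp_penalty(theta) (firm thresholding)."""
    az = abs(z)
    if az <= lam:
        return 0.0
    if az <= a * lam:
        return (1 if z > 0 else -1) * (az - lam) / (1 - 1 / a)
    return z  # large signals are left unshrunk, unlike the Lasso


for z in (0.5, 2.0, 5.0):
    print(z, soft_threshold(z, 1.0), mcp_threshold(z, 1.0, 3.0))
```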

  18. NeuroVault.org: A repository for sharing unthresholded statistical maps, parcellations, and atlases of the human brain.

    PubMed

    Gorgolewski, Krzysztof J; Varoquaux, Gael; Rivera, Gabriel; Schwartz, Yannick; Sochat, Vanessa V; Ghosh, Satrajit S; Maumet, Camille; Nichols, Thomas E; Poline, Jean-Baptiste; Yarkoni, Tal; Margulies, Daniel S; Poldrack, Russell A

    2016-01-01

    NeuroVault.org is dedicated to storing outputs of analyses in the form of statistical maps, parcellations and atlases, a unique strategy that contrasts with most neuroimaging repositories, which store raw acquisition data or stereotaxic coordinates. Such maps are indispensable for performing meta-analyses, validating novel methodology, and deciding on precise outlines for regions of interest (ROIs). NeuroVault is open to maps derived from both healthy and clinical populations, as well as from various imaging modalities (sMRI, fMRI, EEG, MEG, PET, etc.). The repository uses modern web technologies such as interactive web-based visualization, cognitive decoding, and comparison with other maps to provide researchers with efficient, intuitive tools to improve the understanding of their results. Each dataset and map is assigned a permanent Uniform Resource Locator (URL), and all of the data are accessible through a REST Application Programming Interface (API). Additionally, the repository supports the NIDM-Results standard and can parse outputs from the popular FSL and SPM software packages to extract relevant metadata automatically. This ease of use, modern web integration, and pioneering functionality hold promise for improving the workflow for making inferences about and sharing whole-brain statistical maps. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Perception in statistical graphics

    NASA Astrophysics Data System (ADS)

    VanderPlas, Susan Ruth

    There has been considerable research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, yet research on the interplay between graph, eye, brain, and mind is substantially scarcer and not yet sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs, which produce fewer distortions and viewer errors while reducing the cognitive load needed to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of the numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to communicate the important message from data or statistical models clearly by constructing graphics designed specifically for the perceptual system.

  20. Mathematical background and attitudes toward statistics in a sample of Spanish college students.

    PubMed

    Carmona, José; Martínez, Rafael J; Sánchez, Manuel

    2005-08-01

    To examine the relation between the mathematical background and the initial attitudes toward statistics of Spanish college students in the social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. The analysis suggested that grades in previous courses are more closely related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications for further research are discussed.

  1. Full in-vitro analyses of new-generation bulk fill dental composites cured by halogen light.

    PubMed

    Tekin, Tuçe Hazal; Kantürk Figen, Aysel; Yılmaz Atalı, Pınar; Coşkuner Filiz, Bilge; Pişkin, Mehmet Burçin

    2017-08-01

    The objective of this study was to carry out full in-vitro analyses of new-generation bulk-fill dental composites cured by halogen light (HLG). Four composites of two types were studied: Surefill SDR (SDR) and Xtra Base (XB) as flowable bulk-fill materials, and QuixFill (QF) and XtraFill (XF) as packable bulk-fill materials. Samples were prepared for each analysis and test using the same procedure, but with diameters and thicknesses appropriate to the requirements of each analysis and test. Thermal properties were determined by thermogravimetric analysis (TG/DTG) and differential scanning calorimetry (DSC); the Vickers microhardness (VHN) was measured after 1, 7, 15 and 30 days of storage in water. The degree of conversion (DC, %) of the materials was measured immediately using near-infrared spectroscopy (FT-IR). The surface morphology of the composites was investigated by scanning electron microscopy (SEM) and atomic force microscopy (AFM). Sorption and solubility measurements were also performed after 1, 7, 15 and 30 days of storage in water. In addition, the data were statistically analyzed using one-way analysis of variance and both the Newman-Keuls and Tukey multiple comparison tests. The significance level was set at p<0.05. According to the ISO 4049 standard, all the tested materials showed acceptable water sorption and solubility, and a halogen light source was an option for polymerizing bulk-fill, resin-based dental composites. Copyright © 2017 Elsevier B.V. All rights reserved.
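
    One-way ANOVA compares group means through F = [SS_between/(k − 1)] / [SS_within/(N − k)] for k groups and N observations. The Python sketch below computes F and its degrees of freedom; the microhardness values are hypothetical, and the p-value (which needs the F distribution, e.g. via SciPy's `scipy.stats.f_oneway`) and the Newman-Keuls and Tukey post hoc tests are not reproduced.

```python
def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for two or more independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical Vickers microhardness (VHN) after 7 days for three materials.
print(one_way_anova([52.1, 50.8, 53.4], [61.0, 59.7, 62.2], [48.3, 47.9, 49.6]))
```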

  2. The statistical big bang of 1911: ideology, technological innovation and the production of medical statistics.

    PubMed

    Higgs, W

    1996-12-01

    This paper examines the relationship between intellectual debate, technologies for analysing information, and the production of statistics in the General Register Office (GRO) in London in the early twentieth century. It argues that controversy between eugenicists and public health officials respecting the cause and effect of class-specific variations in fertility led to the introduction of questions in the 1911 census on marital fertility. The increasing complexity of the census necessitated a shift from manual to mechanised forms of data processing within the GRO. The subsequent increase in processing power allowed the GRO to make important changes to the medical and demographic statistics it published in the annual Reports of the Registrar General. These included substituting administrative sanitary districts for registration districts as units of analysis, consistently transferring deaths in institutions back to place of residence, and abstracting deaths according to the International List of Causes of Death.

  3. Introductory Statistics and Fish Management.

    ERIC Educational Resources Information Center

    Jardine, Dick

    2002-01-01

    Describes how fisheries research and management data (available on a website) have been incorporated into an Introductory Statistics course. In addition to the motivation gained from seeing the practical relevance of the course, some students have participated in the data collection and analysis for the New Hampshire Fish and Game Department. (MM)

  4. Valid randomization-based p-values for partially post hoc subgroup analyses.

    PubMed

    Lee, Joseph J; Rubin, Donald B

    2015-10-30

    By 'partially post hoc' subgroup analyses, we mean analyses that compare existing data from a randomized experiment, from which a subgroup specification is derived, to new, subgroup-only experimental data. We describe a motivating example in which partially post hoc subgroup analyses instigated statistical debate about a medical device's efficacy. We clarify the source of such analyses' invalidity and then propose a randomization-based approach for generating valid posterior predictive p-values for such partially post hoc subgroups. Lastly, we investigate the approach's operating characteristics in a simple illustrative setting through a series of simulations, showing that it can have desirable properties under both null and alternative hypotheses. Copyright © 2015 John Wiley & Sons, Ltd.
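
    The basic ingredient of a randomization-based p-value is re-randomizing the treatment labels and recomputing the test statistic. The Python sketch below is a plain Monte Carlo randomization test for a difference in means under complete randomization; it is not Lee and Rubin's posterior predictive procedure for partially post hoc subgroups, and the outcome values are hypothetical.

```python
import random
from statistics import mean


def randomization_p_value(treated, control, n_draws: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo randomization p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(pooled)  # re-randomize which units count as treated
        if abs(mean(pooled[:n_t]) - mean(pooled[n_t:])) >= observed:
            extreme += 1
    # Count the observed assignment as one draw so the p-value is never zero.
    return (extreme + 1) / (n_draws + 1)


# Hypothetical subgroup outcomes: device arm vs. control arm.
print(randomization_p_value([7.1, 6.4, 8.0, 7.7, 6.9], [5.9, 6.1, 6.6, 5.4, 6.0]))
```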

  5. Exponential order statistic models of software reliability growth

    NASA Technical Reports Server (NTRS)

    Miller, D. R.

    1985-01-01

    Failure times of a software reliability growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinski-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many other examples as well. Various characterizations, properties, and examples of this class of models are developed and presented.
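
    As a rough illustration of this model class (a sketch, not Miller's derivations), failure times can be simulated as the sorted values of independent exponential detection times, one per fault. The Jelinski-Moranda case gives every fault the same rate; the geometrically decreasing rates are a hypothetical nonidentical example, and all numeric values are assumptions.

```python
import math
import random


def simulate_failure_times(rates, rng):
    """Sorted failure times: the order statistics of independent exponential
    detection times, one per remaining fault, with possibly unequal rates."""
    return sorted(rng.expovariate(lam) for lam in rates)


def expected_failures(rates, t):
    """Mean number of failures observed by time t; fault i has been
    detected by t with probability 1 - exp(-rate_i * t)."""
    return sum(1.0 - math.exp(-lam * t) for lam in rates)


# Jelinski-Moranda special case: 30 faults sharing one (assumed) rate.
jm_rates = [0.02] * 30
# Hypothetical nonidentical case: geometrically decreasing fault rates.
geo_rates = [0.05 * 0.8 ** i for i in range(30)]

rng = random.Random(42)
times = simulate_failure_times(jm_rates, rng)
print([round(x, 1) for x in times[:5]])
print(round(expected_failures(jm_rates, 50.0), 3), round(expected_failures(geo_rates, 50.0), 3))
```

    The expected-failures curve is the mean-value function of the growth process; swapping in a different list of rates gives a different member of the family.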

  6. A statistical and experimental approach for assessing the preservation of plant lipids in soil

    NASA Astrophysics Data System (ADS)

    Mueller, K. E.; Eissenstat, D. M.; Oleksyn, J.; Freeman, K. H.

    2011-12-01

    Plant-derived lipids contribute to stable soil organic matter, but further interpretations of their abundance in soils are limited because the factors that control lipid preservation are poorly understood. Using data from a long-term field experiment and simple statistical models, we provide novel constraints on several predictors of the concentration of hydrolyzable lipids in forest mineral soils. Focal lipids included common monomers of cutin, suberin, and plant waxes present in tree leaves and roots. Soil lipid concentrations were most strongly influenced by the concentrations of lipids in leaves and roots of the overlying trees, but were also affected by the type of lipid (e.g. alcohols vs. acids), lipid chain length, and whether lipids originated in leaves or roots. Collectively, these factors explained ~80% of the variation in soil lipid concentrations beneath 11 different tree species. In order to use soil lipid analyses to test and improve conceptual models of soil organic matter stabilization, additional studies that provide experimental and quantitative (i.e. statistical) constraints on plant lipid preservation are needed.

  7. Bayesian approach to inverse statistical mechanics.

    PubMed

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  8. Bayesian approach to inverse statistical mechanics

    NASA Astrophysics Data System (ADS)

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.
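
    The simplest problem mentioned, estimating a temperature, can be illustrated with a small sketch. It is not the paper's sequential Monte Carlo algorithm: it assumes a hypothetical system with a handful of energy levels, so the partition function is tractable and the posterior over the inverse temperature beta can be evaluated on a grid under a flat prior.

```python
import math
import random


def log_partition(beta, energies):
    """log Z(beta) = log sum_k exp(-beta * E_k), computed stably."""
    m = max(-beta * e for e in energies)
    return m + math.log(sum(math.exp(-beta * e - m) for e in energies))


def posterior_on_grid(samples, energies, grid):
    """Posterior over inverse temperature beta for i.i.d. Boltzmann samples,
    assuming a flat prior on the grid; returns normalized grid weights."""
    n, total_e = len(samples), sum(samples)
    log_post = [-b * total_e - n * log_partition(b, energies) for b in grid]
    m = max(log_post)
    w = [math.exp(lp - m) for lp in log_post]
    z = sum(w)
    return [x / z for x in w]


energies = [0.0, 1.0, 2.0, 3.0]          # hypothetical energy levels
true_beta = 0.7
rng = random.Random(0)
samples = rng.choices(energies, weights=[math.exp(-true_beta * e) for e in energies], k=2000)

grid = [0.01 * i for i in range(1, 301)]  # beta in (0, 3]
post = posterior_on_grid(samples, energies, grid)
beta_mean = sum(b * p for b, p in zip(grid, post))
print(round(beta_mean, 3))
```

    In the inverse problems the paper targets, Z cannot be summed exactly like this, which is why it has to be estimated along with the interaction parameters.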

  9. Statistical Modeling for Radiation Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, Raymond L.

    2014-01-01

    We cover the models and statistics associated with single event effects (and total ionizing dose): why we need them and how to use them. We discuss what models are used, what errors exist in real test data, and what the model allows us to say about the device under test (DUT). We also cover how to use other sources of data, such as historical, heritage, and similar-part data, and how to apply experience, physics, and expert opinion to the analysis. Concepts of Bayesian statistics, data fitting, and bounding rates are also included.
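
    Bounding rates is often done with a Poisson model: if n single-event effects are observed at fluence F, an upper confidence bound on the cross-section is the upper Poisson bound on the mean count divided by F. The sketch assumes that approach (the talk may also use others, including Bayesian ones), and the fluence is a hypothetical value. With zero events the 95% bound reduces to -ln(0.05), about 3.0 events.

```python
import math


def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu)."""
    term = total = math.exp(-mu)
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_poisson_bound(n_events, confidence=0.95):
    """One-sided upper confidence bound on a Poisson mean after observing
    n_events: the mu with P(X <= n_events; mu) = 1 - confidence (bisection)."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 10.0 + 10.0 * n_events
    while poisson_cdf(n_events, hi) > alpha:   # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_events, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


fluence = 1.0e7   # hypothetical test fluence, particles/cm^2
for n in (0, 1, 5):
    # upper bound on the SEE cross-section (cm^2) = upper bound on counts / fluence
    print(n, f"{upper_poisson_bound(n) / fluence:.3e}")
```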

  10. Polygenic scores via penalized regression on summary statistics.

    PubMed

    Mak, Timothy Shin Heng; Porsch, Robert Milan; Choi, Shing Wan; Zhou, Xueya; Sham, Pak Chung

    2017-09-01

    Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and there has recently been much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method, pseudovalidation, for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy comparable to using a dataset with a validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred. © 2017 WILEY PERIODICALS, INC.
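
    The penalized regression can be sketched as coordinate descent on summary statistics alone: with marginal SNP-trait correlations r and a reference-panel LD matrix R, minimize b'R_s b - 2b'r + 2 lam ||b||_1. Shrinking R toward the identity as R_s = (1 - s)R + sI reflects our reading of lassosum and should be treated as an assumption, as should the toy values of r, R, lam and s; the function names are hypothetical and are not the lassosum package's API.

```python
def soft_threshold(u, lam):
    """Lasso shrinkage operator S(u, lam) = sign(u) * max(|u| - lam, 0)."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lasso_from_summary_stats(r, R, lam, s=0.5, n_iter=500):
    """Coordinate descent for  min_b  b'R_s b - 2 b'r + 2 lam ||b||_1
    with R_s = (1 - s) R + s I (shrunken LD matrix)."""
    p = len(r)
    Rs = [[(1 - s) * R[i][j] + (s if i == j else 0.0) for j in range(p)]
          for i in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation left for SNP j after accounting for the other SNPs' effects
            u = r[j] - sum(Rs[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(u, lam) / Rs[j][j]
    return b


def polygenic_score(genotypes, b):
    """Score for one person: weighted sum of (standardized) genotypes."""
    return sum(g * w for g, w in zip(genotypes, b))


r = [0.10, 0.08, 0.01]                                     # toy marginal correlations
R = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]   # toy LD matrix
b = lasso_from_summary_stats(r, R, lam=0.02)
print([round(x, 4) for x in b])
print(round(polygenic_score([1.2, -0.3, 0.5], b), 4))
```

    Larger lam zeroes out more SNPs; choosing lam (and s) without a validation phenotype is the problem the abstract's pseudovalidation addresses.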

  11. Grain-Size Based Additivity Models for Scaling Multi-rate Uranyl Surface Complexation in Subsurface Sediments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.

    This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment, including surface area, reactive site concentration, reaction rate, and extent, can be predicted from the field-scale grain size distribution by linearly adding reaction properties for individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants, using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions has to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8 mm), which is often ignored in modeling U(VI) sorption and desorption, makes a statistically significant contribution to U(VI) desorption in the sediment.
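
    The additivity idea can be sketched as a weighted sum: each grain-size fraction contributes its own multi-rate first-order desorption curve, weighted by its share of the sorbed U(VI) in the composite. The first-order multi-rate form, the weighting, and every numeric value are illustrative assumptions, not the study's fitted parameters.

```python
import math


def fraction_desorbed(t, rate_constants, site_fractions):
    """Multi-rate first-order desorption of one grain-size fraction:
    F(t) = sum_j p_j * (1 - exp(-k_j * t)), with the p_j summing to 1."""
    return sum(p * (1.0 - math.exp(-k * t))
               for k, p in zip(rate_constants, site_fractions))


def additive_prediction(t, size_fractions):
    """Composite-sediment prediction as a linear, weighted sum over fractions.
    Each entry is (weight, rate_constants, site_fractions), where weight is the
    fraction's share of the total sorbed U(VI) (an assumption of this sketch)."""
    return sum(w * fraction_desorbed(t, k, p) for w, k, p in size_fractions)


# Hypothetical parameters (rate constants in 1/h) for three size fractions.
fractions = [
    (0.6, [0.5, 0.05, 0.005], [0.3, 0.4, 0.3]),   # fine fraction
    (0.3, [0.8, 0.08], [0.5, 0.5]),               # sand fraction
    (0.1, [0.2, 0.01], [0.6, 0.4]),               # gravel fraction (2-8 mm)
]
for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, round(additive_prediction(t, fractions), 4))
```

    Note that this adds desorption curves, not rate constants: a single "composite" set of rate constants generally cannot reproduce the sum, which mirrors the abstract's point that rate constants were not directly scalable.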

  12. GIS and statistical analysis for landslide susceptibility mapping in the Daunia area, Italy

    NASA Astrophysics Data System (ADS)

    Mancini, F.; Ceppi, C.; Ritrovato, G.

    2010-09-01

    This study focuses on landslide susceptibility mapping in the Daunia area (Apulian Apennines, Italy) and achieves this by using a multivariate statistical method and data processing in a Geographical Information System (GIS). The Logistic Regression (hereafter LR) method was chosen to produce a susceptibility map over an area of 130 000 ha where small settlements are historically threatened by landslide phenomena. The LR analysis assessed the propensity for landslide occurrence by relating a landslide inventory (dependent variable) to a series of causal factors (independent variables) managed in the GIS; the statistical analyses were performed with the SPSS (Statistical Package for the Social Sciences) software. The LR analysis produced a reliable susceptibility map of the investigated area, and the probability of landslide occurrence was ranked in four classes. The overall performance of the LR analysis was assessed by local comparison between the expected susceptibility and an independent dataset extrapolated from the landslide inventory. Of the samples classified as susceptible to landslide occurrence, 85% correspond to areas where landslides have actually occurred. In addition, the regression coefficients showed that the "land cover" and "lithology" causal factors play a major role in determining the occurrence and distribution of landslide phenomena in the Apulian Apennines.
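
    The LR step can be sketched outside SPSS as maximum-likelihood logistic regression fitted by Newton-Raphson, followed by binning predicted probabilities into four classes. Everything here is an illustrative assumption: the two causal factors, the simulated inventory, and the class cut-offs stand in for the study's GIS layers and its unstated classification rule.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    X: (n, p) array of causal factors; an intercept column is prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))           # fitted probabilities
        grad = X1.T @ (y - p)                           # score vector
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical mapping units: slope angle (degrees) and a binary lithology flag.
rng = np.random.default_rng(1)
n = 2000
slope = rng.uniform(0.0, 40.0, n)
clay = rng.integers(0, 2, n)
true_logit = -4.0 + 0.1 * slope + 1.2 * clay
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([slope, clay])
beta = fit_logistic(X, y)
prob = 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))
# Rank probabilities into four susceptibility classes (0 = low ... 3 = very high);
# the 0.25/0.5/0.75 cut-offs are an assumption.
classes = np.digitize(prob, [0.25, 0.5, 0.75])
print(np.round(beta, 3), np.bincount(classes, minlength=4))
```

    The fitted coefficients play the role the abstract describes for "land cover" and "lithology": larger coefficients (on comparable scales) indicate factors with a stronger influence on occurrence.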

  13. Development and Assessment of a Preliminary Randomization-Based Introductory Statistics Curriculum

    ERIC Educational Resources Information Center

    Tintle, Nathan; VanderStoep, Jill; Holmes, Vicki-Lynn; Quisenberry, Brooke; Swanson, Todd

    2011-01-01

    The algebra-based introductory statistics course is the most popular undergraduate course in statistics. While there is a general consensus for the content of the curriculum, the recent Guidelines for Assessment and Instruction in Statistics Education (GAISE) have challenged the pedagogy of this course. Additionally, some arguments have been made…

  14. 34 CFR 668.49 - Institutional fire safety policies and fire statistics.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 34 Education 3 2010-07-01 2010-07-01 false Institutional fire safety policies and fire statistics... fire statistics. (a) Additional definitions that apply to this section. Cause of fire: The factor or... statistics described in paragraph (c) of this section. (2) A description of each on-campus student housing...

  15. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows: given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest, and they can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks, and the predicted ranks were then compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.

  16. Constitutive Analyses of Nontraditional Stabilization Additives

    DTIC Science & Technology

    2004-11-01

    ... Figure 29. FTIR/ATR spectrum of Ven-Set 950 soil stabilization agent. Based on the information provided in the MSDS and the FTIR analysis above...emulsion. The MSDS states that it is composed of an acrylic polymer (52 percent) with zinc oxide (2 percent), activated carbon (8 to 9 percent), and...water. The polymer as yet is unidentified. However, it appears to be an acrylate/methacrylate with some aromaticity (peak about 1,635 cm⁻¹). The

  17. A decade of individual participant data meta-analyses: A review of current practice.

    PubMed

    Simmonds, Mark; Stewart, Gavin; Stewart, Lesley

    2015-11-01

    Individual participant data (IPD) systematic reviews and meta-analyses are often considered to be the gold standard for meta-analysis. In the ten years since the first review of the methodology and reporting practice of IPD reviews was published, much has changed in the field. This paper investigates current reporting and statistical practice in IPD systematic reviews. A systematic review was performed to identify systematic reviews that collected and analysed IPD. Data were extracted from each included publication on a variety of issues related to the reporting of the IPD review process and the statistical methods used. There has been considerable growth in the use of "one-stage" methods to perform IPD meta-analyses. The majority of reviews consider at least one covariate other than the primary intervention, either using subgroup analysis or including covariates in one-stage regression models. Random-effects analyses, however, are not often used. Reporting of review methods was often limited, with few reviews presenting a risk-of-bias assessment. Details on issues specific to the use of IPD were little reported, including how IPD were obtained; how data were managed and checked for consistency and errors; and for how many studies and participants IPD were sought and obtained. While the last ten years have seen substantial changes in how IPD meta-analyses are performed, there remains considerable scope for improving the quality of reporting for both the process of IPD systematic reviews and the statistical methods employed in them. It is to be hoped that the publication of the PRISMA-IPD guidelines specific to IPD reviews will improve reporting in this area. Copyright © 2015 Elsevier Inc. All rights reserved.
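
    For contrast with the one-stage methods discussed, the second stage of a two-stage IPD meta-analysis can be sketched as random-effects pooling of per-study estimates. The DerSimonian-Laird estimator of between-study variance used here is one common choice, not necessarily the one used by the reviewed meta-analyses, and the study estimates are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of per-study estimates (stage two of a two-stage
    IPD meta-analysis) with the DerSimonian-Laird between-study variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


# Stage one (not shown) would fit the same model to each study's IPD; these
# per-study log hazard ratios and variances are hypothetical.
log_hr = [-0.30, -0.10, -0.45, 0.05]
var = [0.020, 0.035, 0.050, 0.015]
pooled, se, tau2 = dersimonian_laird(log_hr, var)
print(f"pooled HR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * se):.2f}-{math.exp(pooled + 1.96 * se):.2f}), tau^2 = {tau2:.4f}")
```

    When tau^2 is estimated as zero the result coincides with a fixed-effect analysis; a positive tau^2 widens the confidence interval to reflect between-study heterogeneity.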

  18. NASA Pocket Statistics

    NASA Technical Reports Server (NTRS)

    1996-01-01

    This booklet of pocket statistics includes the 1996 NASA Major Launch Record, NASA Procurement, Financial, and Workforce data. The NASA Major Launch Record includes all launches of Scout class and larger vehicles. Vehicle and spacecraft development flights are also included in the Major Launch Record. Shuttle missions are counted as one launch and one payload when free-flying payloads are not involved. Satellites deployed from the cargo bay of the Shuttle and placed in a separate orbit or trajectory are counted as an additional payload.

  19. Genetic variation maintained in multilocus models of additive quantitative traits under stabilizing selection.

    PubMed Central

    Bürger, R; Gimelfarb, A

    1999-01-01

    Stabilizing selection for an intermediate optimum is generally considered to deplete genetic variation in quantitative traits. However, conflicting results from various types of models have been obtained. While classical analyses assuming a large number of independent additive loci with individually small effects indicated that no genetic variation is preserved under stabilizing selection, several analyses of two-locus models showed the contrary. We perform a complete analysis of a generalization of Wright's two-locus quadratic-optimum model and investigate numerically the ability of quadratic stabilizing selection to maintain genetic variation in additive quantitative traits controlled by up to five loci. A statistical approach is employed by randomly choosing 4000 parameter sets (allelic effects, recombination rates, and strength of selection) for a given number of loci. For each parameter set we iterate the recursion equations that describe the dynamics of gamete frequencies, starting from 20 randomly chosen initial conditions, until an equilibrium is reached, record the quantities of interest, and calculate their corresponding mean values. As the number of loci increases from two to five, the fraction of the genome expected to be polymorphic declines surprisingly rapidly, and the loci that are polymorphic increasingly are those with small effects on the trait. As a result, the genetic variance expected to be maintained under stabilizing selection decreases very rapidly with increased number of loci. The equilibrium structure expected under stabilizing selection on an additive trait differs markedly from that expected under selection with no constraints on genotypic fitness values. The expected genetic variance, the expected polymorphic fraction of the genome, as well as other quantities of interest, are only weakly dependent on the selection intensity and the level of recombination. PMID:10353920
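
    The numerical procedure can be sketched for the two-locus case: gamete frequencies are updated by selection on an additive trait followed by recombination, and the recursion is iterated until the frequencies stop changing. The fitness w = 1 - s(z - opt)^2 is the usual quadratic-optimum form; the allelic effects, s, r and optimum are hypothetical, and this is a sketch, not the authors' code.

```python
import random

# Gametes in order AB, Ab, aB, ab, stored as (allele at locus 1, allele at locus 2),
# where 1 denotes A or B and 0 denotes a or b.
GAMETES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def gamete_index(a, b):
    return 2 * (1 - a) + (1 - b)


def next_generation(x, effects, r, s, opt):
    """Selection on an additive trait, then recombination at rate r.
    Fitness of genotype (i, j) is w = 1 - s * (z - opt)**2, where z sums the
    allelic effects carried by both gametes."""
    e1, e2 = effects
    value = [e1 * a + e2 * b for a, b in GAMETES]
    new, mean_w = [0.0] * 4, 0.0
    for i in range(4):
        for j in range(4):
            f = x[i] * x[j] * (1.0 - s * (value[i] + value[j] - opt) ** 2)
            mean_w += f
            # non-recombinant gametes are the parental ones
            new[i] += f * (1 - r) / 2
            new[j] += f * (1 - r) / 2
            # recombinant gametes swap the locus-2 allele between parents
            new[gamete_index(GAMETES[i][0], GAMETES[j][1])] += f * r / 2
            new[gamete_index(GAMETES[j][0], GAMETES[i][1])] += f * r / 2
    return [v / mean_w for v in new]


def iterate_to_equilibrium(x, effects, r, s, opt, tol=1e-10, max_gen=50_000):
    """Iterate the recursion until no gamete frequency changes by more than tol."""
    for _ in range(max_gen):
        y = next_generation(x, effects, r, s, opt)
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


rng = random.Random(1)
raw = [rng.random() for _ in range(4)]
x0 = [v / sum(raw) for v in raw]               # random initial gamete frequencies
eq = iterate_to_equilibrium(x0, effects=(1.0, 0.5), r=0.1, s=0.1, opt=1.5)
p_A, p_B = eq[0] + eq[1], eq[0] + eq[2]
print([round(v, 4) for v in eq], round(p_A, 4), round(p_B, 4))
```

    Repeating this over many random parameter sets and initial conditions, and recording which loci remain polymorphic, is the statistical approach the abstract describes; with s = 0 the recursion reduces to linkage disequilibrium decaying by a factor (1 - r) per generation.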

  20. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. The effect has been particularly noticeable in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and it is often not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
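
    As one concrete instance of the exploratory methods such reviews compare, principal coordinates analysis (PCoA) on Bray-Curtis dissimilarities can be sketched in a few lines of NumPy. The choice of PCoA and Bray-Curtis, and the toy abundance table, are illustrative assumptions; because Bray-Curtis is not Euclidean, negative eigenvalues can occur and are clipped to zero here.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between the rows (samples) of a
    samples x taxa abundance table."""
    X = np.asarray(counts, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / (X[i] + X[j]).sum()
    return D


def pcoa(D, k=2):
    """Principal coordinates analysis: double-centre -D**2 / 2 and scale the
    top-k eigenvectors by the square roots of their eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)              # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None)), vals


counts = [[10, 0, 5, 3, 2], [8, 1, 6, 2, 3], [0, 12, 1, 9, 0], [1, 10, 0, 8, 1]]
coords, eigvals = pcoa(bray_curtis(counts))
print(np.round(coords, 3))
```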

  1. How to Create Automatically Graded Spreadsheets for Statistics Courses

    ERIC Educational Resources Information Center

    LoSchiavo, Frank M.

    2016-01-01

    Instructors often use spreadsheet software (e.g., Microsoft Excel) in their statistics courses so that students can gain experience conducting computerized analyses. Unfortunately, students tend to make several predictable errors when programming spreadsheets. Without immediate feedback, programming errors are likely to go undetected, and as a…

  2. On sufficient statistics of least-squares superposition of vector sets.

    PubMed

    Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M

    2015-06-01

    The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from their constituent vector sets under addition or deletion operations, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
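
    The additivity of the statistics can be illustrated with a Python sketch (the authors' library is C++, and its API is not reproduced here). Each vector set is reduced to a count, coordinate sums, sums of squared norms, and a 3 x 3 cross-product matrix; these add or subtract component-wise, and the optimal RMSD follows from an SVD of the centred cross-covariance (the Kabsch solution) without revisiting the points. The class and method names are hypothetical.

```python
import numpy as np


class SuperpositionStats:
    """Additive sufficient statistics for least-squares superposition of
    corresponding 3-D point sets X -> Y."""

    def __init__(self, n, sx, sy, sxx, syy, sxy):
        self.n, self.sx, self.sy = n, sx, sy          # count, coordinate sums
        self.sxx, self.syy = sxx, syy                 # sums of squared norms
        self.sxy = sxy                                # sum of outer products x y^T

    @classmethod
    def from_points(cls, X, Y):
        X, Y = np.asarray(X, float), np.asarray(Y, float)
        return cls(len(X), X.sum(0), Y.sum(0), (X * X).sum(), (Y * Y).sum(), X.T @ Y)

    def __add__(self, o):   # merge two sets without touching their points
        return SuperpositionStats(self.n + o.n, self.sx + o.sx, self.sy + o.sy,
                                  self.sxx + o.sxx, self.syy + o.syy, self.sxy + o.sxy)

    def __sub__(self, o):   # delete a subset whose statistics are known
        return SuperpositionStats(self.n - o.n, self.sx - o.sx, self.sy - o.sy,
                                  self.sxx - o.sxx, self.syy - o.syy, self.sxy - o.sxy)

    def rmsd(self):
        """Minimal RMSD after optimal rotation and translation (Kabsch via SVD)."""
        n = self.n
        H = self.sxy - np.outer(self.sx, self.sy) / n   # centred cross-covariance
        gx = self.sxx - self.sx @ self.sx / n           # centred sum of squares, X
        gy = self.syy - self.sy @ self.sy / n
        U, S, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))          # exclude reflections
        e = gx + gy - 2.0 * (S[0] + S[1] + d * S[2])
        return float(np.sqrt(max(e, 0.0) / n))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Y = X @ Rz.T + np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=(20, 3))

whole = SuperpositionStats.from_points(X, Y)
merged = SuperpositionStats.from_points(X[:8], Y[:8]) + SuperpositionStats.from_points(X[8:], Y[8:])
print(round(whole.rmsd(), 6), round(merged.rmsd(), 6))
```

    Merging or removing a fragment costs a fixed number of 3 x 3 operations regardless of how many points it contains, which is the constant-time property the abstract describes.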

  3. Difficulties in learning and teaching statistics: teacher views

    NASA Astrophysics Data System (ADS)

    Koparan, Timur

    2015-01-01

    The purpose of this study is to define teacher views about the difficulties in learning and teaching middle school statistics subjects. To serve this aim, interviews were conducted with 10 middle school maths teachers in the province of Trabzon during the 2011-2012 school year, using the semi-structured interview technique, one of the qualitative descriptive research methods. In accordance with the aim, teacher opinions about the statistics subjects were examined and analysed. Similar responses from the teachers were grouped and evaluated. The teachers stated that it was positive that middle school statistics subjects were taught gradually in every grade, but that some difficulties were experienced in teaching this subject. The findings are presented in eight themes, including context, sample, data representation, central tendency and dispersion measures, probability, variance, and other difficulties.

  4. Statistics of high-level scene context

    PubMed Central

    Greene, Michelle R.

    2013-01-01

    Context is critical for recognizing environments and for searching for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships. A total of 48,167 objects were hand-labeled in 3499 scenes using the LabelMe tool (Russell et al., 2008). From these data, I computed a variety of descriptive statistics at three different levels of analysis: the ensemble statistics that describe the density and spatial distribution of unnamed “things” in the scene; the bag of words level where scenes are described by the list of objects contained within them; and the structural level where the spatial distribution and relationships between the objects are measured. The utility of each level of description for scene categorization was assessed through the use of linear classifiers, and the plausibility of each level for modeling human scene categorization is discussed. Of the three levels, ensemble statistics were found to be the most informative (per feature), and also best explained human patterns of categorization errors. Although a bag of words classifier had similar performance to human observers, it had a markedly different pattern of errors. However, certain objects are more useful than others, and ceiling classification performance could be achieved using only the 64 most informative objects. As object location tends not to vary as a function of category, structural information provided little additional information. Additionally, these data provide valuable information on natural scene redundancy that can be exploited for machine vision, and can help the visual cognition community to design experiments guided by

  5. Statistics of high-level scene context.

    PubMed

    Greene, Michelle R

    2013-01-01

    Context is critical for recognizing environments and for searching for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships. A total of 48,167 objects were hand-labeled in 3499 scenes using the LabelMe tool (Russell et al., 2008). From these data, I computed a variety of descriptive statistics at three different levels of analysis: the ensemble statistics that describe the density and spatial distribution of unnamed "things" in the scene; the bag of words level where scenes are described by the list of objects contained within them; and the structural level where the spatial distribution and relationships between the objects are measured. The utility of each level of description for scene categorization was assessed through the use of linear classifiers, and the plausibility of each level for modeling human scene categorization is discussed. Of the three levels, ensemble statistics were found to be the most informative (per feature), and also best explained human patterns of categorization errors. Although a bag of words classifier had similar performance to human observers, it had a markedly different pattern of errors. However, certain objects are more useful than others, and ceiling classification performance could be achieved using only the 64 most informative objects. As object location tends not to vary as a function of category, structural information provided little additional information. Additionally, these data provide valuable information on natural scene redundancy that can be exploited for machine vision, and can help the visual cognition community to design experiments guided by statistics

  6. Histometric analyses of cancellous and cortical interface in autogenous bone grafting

    PubMed Central

    Netto, Henrique Duque; Olate, Sergio; Klüppel, Leandro; do Carmo, Antonio Marcio Resende; Vásquez, Bélgica; Albergaria-Barbosa, Jose

    2013-01-01

    Surgical procedures involving the rehabilitation of the maxillofacial region frequently require bone grafts; the aim of this research was to evaluate the interface between recipient and graft with cortical or cancellous contact. Six adult beagle dogs weighing 15 kg were included in the study. Under general anesthesia, an 8 mm diameter block was obtained from the parietal bone of each animal and fixed to the frontal bone with a 1.5 mm screw, 12 mm long. The lag screw technique was used to improve contact between the recipient and the graft. Animals were euthanized at 3 and 6 weeks for histometric evaluation. Hematoxylin-eosin was used in a routine histologic technique, and histomorphometry was performed with ImageJ software. The t test was used for data analyses, with p<0.05 considered statistically significant. The results showed some differences in descriptive histology but no statistically significant differences in the interface between cortical and cancellous bone at 3 or 6 weeks; as expected, bone integration 6 weeks after surgery was better than, and statistically superior to, that at 3 weeks. We conclude that cortical and cancellous contact can both be used, with no difference in integration. PMID:23923071

  7. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    PubMed

    Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi

    2013-12-01

    Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Data generated by mass spectrometry commonly have many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes that all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation performance of a mixture model with those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, whereas estimates from the mixture model were unbiased except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.
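
    The two ways of treating missing values differ only in the likelihood contribution of a missing observation. The sketch assumes log-normal intensities (so the AFT model reduces to a left-censored normal model on the log scale), a known detection limit, and hypothetical data; it is not the authors' implementation. Setting the point-mass probability pi to zero recovers the censoring-only model.

```python
import math


def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_loglik(obs, n_missing, mu, sigma, log_dl):
    """Every missing value is a log-intensity left-censored at log_dl."""
    ll = sum(norm_logpdf((y - mu) / sigma) - math.log(sigma) for y in obs)
    return ll + n_missing * math.log(norm_cdf((log_dl - mu) / sigma))


def mixture_loglik(obs, n_missing, mu, sigma, log_dl, pi):
    """A missing value is either a true absence (point mass, probability pi)
    or a present compound below the detection limit."""
    ll = sum(math.log(1.0 - pi) + norm_logpdf((y - mu) / sigma) - math.log(sigma)
             for y in obs)
    p_missing = pi + (1.0 - pi) * norm_cdf((log_dl - mu) / sigma)
    return ll + n_missing * math.log(p_missing)


obs = [2.1, 2.8, 3.4, 1.9, 2.6, 3.1]     # hypothetical observed log-intensities
n_missing, log_dl = 4, 1.5               # hypothetical missing count and detection limit
print(round(censored_loglik(obs, n_missing, 2.0, 0.8, log_dl), 3))
print(round(mixture_loglik(obs, n_missing, 2.5, 0.5, log_dl, pi=0.3), 3))
```

    Maximizing either function over its parameters gives the corresponding estimates; the censoring-only model must push mu down and sigma up to explain many missing values, which is one way to see the bias the abstract reports when true absences are common.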

  8. Kidney function changes with aging in adults: comparison between cross-sectional and longitudinal data analyses in renal function assessment.

    PubMed

    Chung, Sang M; Lee, David J; Hand, Austin; Young, Philip; Vaidyanathan, Jayabharathi; Sahajwalla, Chandrahas

    2015-12-01

    The study evaluated whether the yearly rate of renal function decline with age in adults varies between two primary statistical analyses: cross-sectional (CS), using one observation per subject, and longitudinal (LT), using multiple observations per subject over time. A total of 16628 records (3946 subjects; age range 30-92 years) of creatinine clearance and relevant demographic data were used. On average, four samples per subject were collected over up to 2364 days (mean: 793 days). A simple linear regression model and a random coefficient model were selected for the CS and LT analyses, respectively. The renal function decline rates were 1.33 and 0.95 ml/min/year for the CS and LT analyses, respectively; that is, the decline was slower when repeated individual measurements were considered. The study confirms that the estimated rates differ depending on the statistical analysis, and that a statistically robust longitudinal model with a proper sampling design provides reliable individual as well as population estimates of the yearly renal function decline rate with age in adults. In conclusion, our findings indicate that one should be cautious in interpreting information on the rate of renal function decline with aging, because its estimation was highly dependent on the statistical analysis. From our analyses, a population longitudinal analysis (e.g. a random coefficient model) is recommended if individualization is critical, such as a dose adjustment based on renal function during a chronic therapy. Copyright © 2015 John Wiley & Sons, Ltd.

  9. Tsallis statistics and neurodegenerative disorders

    NASA Astrophysics Data System (ADS)

    Iliopoulos, Aggelos C.; Tsolaki, Magdalini; Aifantis, Elias C.

    2016-08-01

    In this paper, we perform a statistical analysis of time series derived from four neurodegenerative disorders, namely epilepsy, amyotrophic lateral sclerosis (ALS), Parkinson's disease (PD) and Huntington's disease (HD). The time series consist of electroencephalograms (EEGs) recorded in healthy and epileptic states, as well as gait dynamics (in particular, stride intervals) in ALS, PD and HD. We analyze data from one subject for each neurodegenerative disorder and from one healthy control. The analysis is based on Tsallis non-extensive statistical mechanics, in particular on the estimation of the Tsallis q-triplet {qstat, qsen, qrel}. The deviation of the Tsallis q-triplet from unity indicates non-Gaussian statistics and long-range dependencies in all time series considered. In addition, the results show that Tsallis statistics efficiently capture differences in brain dynamics between healthy and epileptic states, as well as differences between ALS, PD and HD and healthy control subjects. The results indicate that Tsallis q-indices could be used as possible biomarkers, along with others, for improving the classification and prediction of epileptic seizures, as well as for studying the complex gait dynamics of various diseases, providing new insights into severity, medications and fall risk and helping to improve therapeutic interventions.
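    The claim that q values different from 1 signal non-Gaussian statistics follows from the q-Gaussian family, which reduces to the ordinary Gaussian at q = 1. The sketch below evaluates the unnormalized shape exp_q(-beta x^2); in the q-triplet literature, qstat is commonly obtained by fitting such a curve to an empirical distribution, but the fitting step and the q-sensitivity and q-relaxation indices are not shown. Function names are hypothetical.

```python
import math

def q_exponential(x, q):
    """Tsallis q-exponential e_q(x) = [1 + (1 - q) x]_+ ** (1 / (1 - q)).

    Only x <= 0 is supported here, which is all the q-Gaussian below needs.
    """
    if x > 0:
        raise ValueError("this sketch only handles x <= 0")
    if q == 1.0:
        return math.exp(x)          # ordinary exponential is the q -> 1 limit
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        return 0.0                  # Tsallis cut-off (reachable only for q < 1)
    return base ** (1.0 / (1.0 - q))

def q_gaussian_shape(x, q, beta=1.0):
    """Unnormalized q-Gaussian exp_q(-beta * x**2); a Gaussian when q == 1."""
    return q_exponential(-beta * x * x, q)
```

    For q > 1 the tails decay as a power law rather than exponentially (at x = 3, the q = 1.5 curve is roughly 270 times higher than the Gaussian), which is why an estimated qstat above 1 is read as evidence of heavy-tailed, non-Gaussian behaviour.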

  10. Publication of statistically significant research findings in prosthodontics & implant dentistry in the context of other dental specialties.

    PubMed

    Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos

    2015-10-01

    To assess the hypothesis that there is excessive reporting of statistically significant studies in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with those of similar studies in dentistry using random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the proportion of significant results ranging from 47% to 86%. Multivariable modeling identified geographical area and involvement of a statistician as predictors of statistically significant results. Compared with interventional studies, the odds of reporting statistically significant results were 120% higher for in vitro studies (OR: 2.20, 95% CI: 1.66-2.92) and 35% higher for observational studies (OR: 1.35, 95% CI: 1.05-1.73). The probability of statistically significant results from randomized controlled trials was significantly lower than that of other study designs (difference: 30%, 95% CI: 11-49%). Likewise, the probability of statistically significant results in prosthodontics and implant dentistry was lower than in other dental specialties, but this difference did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
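    The odds ratios above are easiest to read as a percentage change in odds: an OR of 2.20 means the odds are 2.20 times as large, i.e. 120% higher. The sketch below shows that conversion together with a crude odds ratio and Woolf (log-scale) 95% confidence interval from a 2x2 table. The table counts are hypothetical, and the ORs in the abstract come from multivariable logistic regression, so they are adjusted estimates rather than crude 2x2 ratios.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

def percent_change_in_odds(or_):
    """Convert an OR into a percentage change in odds (2.20 -> 120%)."""
    return (or_ - 1.0) * 100.0

# Hypothetical table: 20/10 significant/non-significant in one design,
# 10/20 in the reference design.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
```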

  11. SPA- STATISTICAL PACKAGE FOR TIME AND FREQUENCY DOMAIN ANALYSIS

    NASA Technical Reports Server (NTRS)

    Brownlow, J. D.

    1994-01-01

    The need for statistical analysis often arises when data are in the form of a time series. This type of data is usually a collection of numerical observations made at specified time intervals. Two kinds of analysis may be performed on the data. First, the time series may be treated as a set of independent observations, using a time domain analysis to derive the usual statistical properties, including the mean, variance, and distribution form. Second, the order and time intervals of the observations may be used in a frequency domain analysis to examine the time series for periodicities. In almost all practical applications, the collected data are a mixture of the desired signal and noise, recorded over a finite time period with finite precision. Therefore, any statistical calculations and analyses are actually estimates. The Spectrum Analysis (SPA) program was developed to perform a wide range of statistical estimation functions. SPA can provide the data analyst with a rigorous tool for performing time and frequency domain studies. In a time domain statistical analysis, the SPA program computes the mean, variance, standard deviation, mean square, and root mean square. It also lists the data maximum, data minimum, and the number of observations included in the sample. In addition, a histogram of the time domain data is generated, a normal curve is fit to the histogram, and a goodness-of-fit test is performed. These time domain calculations may be performed on both raw and filtered data. For a frequency domain statistical analysis, the SPA program computes the power spectrum, cross spectrum, coherence, phase angle, amplitude ratio, and transfer function. The estimates of the frequency domain parameters may be smoothed with Hann-Tukey, Hamming, Bartlett, or moving average windows. Various digital filters are available to isolate data frequency components. Frequency components with periods longer than the data collection interval
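    The time domain and frequency domain quantities listed above can be sketched in a few lines. This is a generic illustration, not SPA's code: the use of the n - 1 variance divisor, the raw DFT periodogram normalization, and the end-point rule in `hann_smooth` are all assumptions, since the abstract does not specify SPA's exact estimators.

```python
import cmath
import math

def time_domain_summary(x):
    """Summary statistics of the kind SPA reports (variance uses n - 1)."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)
    mean_square = sum(v * v for v in x) / n
    return {
        "n": n, "mean": mean, "variance": variance,
        "std": math.sqrt(variance), "mean_square": mean_square,
        "rms": math.sqrt(mean_square), "max": max(x), "min": min(x),
    }

def periodogram(x):
    """Raw power spectrum |DFT_k|^2 / n for k = 0 .. n // 2, mean removed."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))) ** 2 / n
            for k in range(n // 2 + 1)]

def hann_smooth(spectrum):
    """Hanning smoothing with weights 1/4, 1/2, 1/4 (ends use 1/2, 1/2)."""
    s = list(spectrum)
    if len(s) < 2:
        return s
    out = [0.5 * s[0] + 0.5 * s[1]]
    out += [0.25 * s[i - 1] + 0.5 * s[i] + 0.25 * s[i + 1]
            for i in range(1, len(s) - 1)]
    out.append(0.5 * s[-2] + 0.5 * s[-1])
    return out
```

    The direct O(n^2) DFT keeps the sketch dependency-free; a practical tool would use an FFT, and smoothing trades frequency resolution for lower variance in the spectral estimate.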

  12. Exploratory Visual Analysis of Statistical Results from Microarray Experiments Comparing High and Low Grade Glioma

    PubMed Central

    Reif, David M.; Israel, Mark A.; Moore, Jason H.

    2007-01-01

    The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666

  13. Australasian Resuscitation In Sepsis Evaluation trial statistical analysis plan.

    PubMed

    Delaney, Anthony; Peake, Sandra L; Bellomo, Rinaldo; Cameron, Peter; Holdgate, Anna; Howe, Belinda; Higgins, Alisa; Presneill, Jeffrey; Webb, Steve

    2013-10-01

    The Australasian Resuscitation In Sepsis Evaluation (ARISE) study is an international, multicentre, randomised, controlled trial designed to evaluate the effectiveness of early goal-directed therapy compared with standard care for patients presenting to the ED with severe sepsis. In keeping with current practice, and taking into consideration aspects of trial design and reporting specific to non-pharmacologic interventions, this document outlines the principles and methods for analysing and reporting the trial results. The document was prepared prior to completion of recruitment into the ARISE study, without knowledge of the results of the interim analysis conducted by the data safety and monitoring committee, and prior to completion of the two related international studies. The statistical analysis plan was designed by the ARISE chief investigators, and reviewed and approved by the ARISE steering committee. The data collected by the research team as specified in the study protocol, and detailed in the study case report form, were reviewed. Information related to baseline characteristics, characteristics of delivery of the trial interventions, details of resuscitation and other related therapies, and other relevant data are described with appropriate comparisons between groups. The primary, secondary and tertiary outcomes for the study are defined, with a description of the planned statistical analyses. A statistical analysis plan was developed, along with a trial profile, mock-up tables and figures. A plan for presenting baseline characteristics, microbiological and antibiotic therapy, details of the interventions, processes of care and concomitant therapies, along with adverse events, is described. The primary, secondary and tertiary outcomes are described along with identification of subgroups to be analysed. A statistical analysis plan for the ARISE study has been developed, and is available in the public domain, prior to the completion of recruitment into the

  14. Basic statistics (the fundamental concepts).

    PubMed

    Lim, Eric

    2014-12-01

    An appreciation and understanding of statistics is important to all practising clinicians, not simply researchers. This is because mathematics is the fundamental basis on which we base clinical decisions, usually with reference to the benefit in relation to risk. Unless a clinician has a basic understanding of statistics, he or she will never be in a position to question healthcare management decisions that have been handed down from generation to generation, and will be able neither to conduct research effectively nor to evaluate the validity of published evidence (usually making an assumption that most published work is either all good or all bad). This article provides a brief introduction to basic statistical methods and illustrates their use in common clinical scenarios. Pitfalls of incorrect usage are also highlighted. However, it is not meant to be a substitute for formal training or consultation with a qualified and experienced medical statistician prior to starting any research project.

  15. Quantifying and reducing statistical uncertainty in sample-based health program costing studies in low- and middle-income countries.

    PubMed

    Rivera-Rodriguez, Claudia L; Resch, Stephen; Haneuse, Sebastien

    2018-01-01

    In many low- and middle-income countries, the costs of delivering public health programs, such as those for HIV/AIDS, nutrition, and immunization, are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating total program costs. This may be due to a lack of awareness or understanding of (1) the technical details of uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various specific sources that contribute to overall uncertainty. We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when estimating the total cost of
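    The two ideas in this abstract, bootstrap standard errors and calibration on a known auxiliary total, can be combined in a short sketch. It assumes simple random sampling of facilities, uses the ratio estimator as one of the simplest calibration-type estimators (calibrating to the known total doses), and ignores stratification, clustering and the finite-population correction that a real costing survey would need. All names and figures are hypothetical.

```python
import random
import statistics

def expansion_total(costs, n_facilities):
    """Expansion estimator of the program total: N * sample mean cost."""
    return n_facilities * statistics.mean(costs)

def ratio_total(costs, doses, total_doses):
    """Ratio (calibration-type) estimator using the known total doses."""
    return total_doses * sum(costs) / sum(doses)

def bootstrap_se(estimator, costs, doses, reps=2000, seed=0):
    """Naive bootstrap SE: resample facilities with replacement and
    recompute the estimator on each resample."""
    rng = random.Random(seed)
    n = len(costs)
    values = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(estimator([costs[i] for i in idx],
                                [doses[i] for i in idx]))
    return statistics.stdev(values)

# Hypothetical sample of 5 facilities out of N = 200, with 50,000 doses
# known to have been administered program-wide.
doses = [100, 250, 400, 80, 320]
costs = [2 * d for d in doses]          # cost exactly proportional to doses
se_expansion = bootstrap_se(lambda c, d: expansion_total(c, 200), costs, doses)
se_ratio = bootstrap_se(lambda c, d: ratio_total(c, d, 50000), costs, doses)
```

    In this deliberately extreme example, cost is exactly proportional to doses, so the ratio estimator has zero bootstrap variability while the expansion estimator does not. With real data, the gain from calibration depends on how strongly facility costs track the auxiliary variable.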

  16. Quantifying and reducing statistical uncertainty in sample-based health program costing studies in low- and middle-income countries

    PubMed Central

    Resch, Stephen

    2018-01-01

    Objectives: In many low- and middle-income countries, the costs of delivering public health programs, such as those for HIV/AIDS, nutrition, and immunization, are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating total program costs. This may be due to a lack of awareness or understanding of (1) the technical details of uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various specific sources that contribute to overall uncertainty. Methods: We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. Results: A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when

  17. Increased left hemisphere impairment in high-functioning autism: a tract based spatial statistics study.

    PubMed

    Perkins, Thomas John; Stokes, Mark Andrew; McGillivray, Jane Anne; Mussap, Alexander Julien; Cox, Ivanna Anne; Maller, Jerome Joseph; Bittar, Richard Garth

    2014-11-30

    There is emerging evidence from Diffusion Tensor Imaging (DTI) research that autism spectrum disorders (ASD) are associated with greater impairment in the left hemisphere. Although this has been quantified with volumetric region-of-interest analyses, it has yet to be tested with white matter integrity analysis. In the present study, tract-based spatial statistics was used to contrast the white matter integrity of 12 participants with high-functioning autism or Asperger's syndrome (HFA/AS) with that of 12 typically developing individuals. Fractional Anisotropy (FA) was examined, in addition to axial, radial and mean diffusivity (AD, RD and MD). In the left hemisphere, participants with HFA/AS demonstrated significantly reduced FA, predominantly in thalamic and fronto-parietal pathways, and increased RD. Symmetry analyses confirmed that in the HFA/AS group, white matter (WM) disturbance was significantly greater in the left than in the right hemisphere. These findings contribute to a growing body of literature suggestive of reduced FA in ASD, and provide preliminary evidence for RD impairments in the left hemisphere. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
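    The four scalar measures compared in this study are standard functions of the three eigenvalues of the voxel's diffusion tensor. The sketch below computes them for a single voxel; the function name is hypothetical, and the TBSS steps themselves (registration, skeleton projection, voxelwise group statistics) are not shown.

```python
import math

def diffusion_metrics(l1, l2, l3):
    """Scalar DTI metrics from the tensor eigenvalues, ordered l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (principal direction)
    rd = (l2 + l3) / 2.0               # radial diffusivity (perpendicular)
    fa = (math.sqrt(0.5)
          * math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
          / math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2))   # fractional anisotropy
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}
```

    FA is 0 for isotropic diffusion and approaches 1 when diffusion is confined to one direction. An increase in RD with AD unchanged lowers FA, which is consistent with the co-occurring reduced FA and increased RD reported above.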

  18. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology

    PubMed Central

    Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao

    2009-01-01

    Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have advanced methodologies for the analysis of DNA, RNA, proteins and low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool for quantitatively analyzing information about biological macromolecules. This article describes general terms used in the statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at the population, tissue, cellular, and molecular levels. In addition, we highlight various sources of experimental variation in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provide guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
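    Power analysis and sample size calculation tie the type I error rate (alpha) and the type II error rate (beta = 1 - power) to a target effect size. The sketch below uses the usual normal-approximation formula for comparing two group means, n per group = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2. It assumes equal group sizes, a common known standard deviation and a two-sided test; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided, two groups)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)   # controls the type I error
    z_beta = _Z.inv_cdf(power)            # controls the type II error
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

def approx_power(n, delta, sigma, alpha=0.05):
    """Power with n subjects per group (ignores the negligible opposite tail)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(delta / (sigma * math.sqrt(2 / n)) - z_alpha)
```

    For a difference of half a standard deviation (`delta=0.5, sigma=1.0`), this gives 63 subjects per group. A t-test-based calculation gives a slightly larger number, because the normal approximation ignores the uncertainty in estimating sigma.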

  19. Walking through the statistical black boxes of plant breeding.

    PubMed

    Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

    2016-10-01

    The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
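    To make the mixed-model "black box" concrete, the sketch below computes BLUPs for the simplest case: a one-way random genotype model y_ij = mu + g_i + e_ij with known variance components, where mu is estimated by generalized least squares. Each line mean is shrunk toward mu by the factor n sigma_g^2 / (n sigma_g^2 + sigma_e^2). This is a minimal illustration with hypothetical names; pedigree-based or genomic prediction would replace the independent genotype effects with a relationship-matrix covariance and solve Henderson's mixed model equations.

```python
from statistics import mean

def blup_one_way(data, var_g, var_e):
    """BLUPs of genotype effects in y_ij = mu + g_i + e_ij, variances known.

    `data` maps each genotype (line) to its list of phenotypic records.
    """
    # GLS estimate of mu: weight each line mean by 1 / Var(line mean).
    weights = {k: 1.0 / (var_g + var_e / len(v)) for k, v in data.items()}
    line_means = {k: mean(v) for k, v in data.items()}
    mu_hat = (sum(weights[k] * line_means[k] for k in data)
              / sum(weights.values()))
    blups = {}
    for k, v in data.items():
        n = len(v)
        shrink = n * var_g / (n * var_g + var_e)   # reliability of the line mean
        blups[k] = shrink * (line_means[k] - mu_hat)
    return mu_hat, blups
```

    With equal genetic and residual variances, a line observed once is shrunk halfway toward the mean, while a line with four replicates keeps 80% of its deviation. This is the selection-theory intuition behind BLUP: predictions are regressed toward the mean in proportion to how little information supports them.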

  20. Hybrid perturbation methods based on statistical time series models

    NASA Astrophysics Data System (ADS)

    San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario

    2016-04-01

    In this work, we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies: to simplify the expressions and subsequent computations, not all the forces involved are taken into account and only low-order terms are considered, and, moreover, mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing the dynamics missing from the previously integrated approximation. This combination improves the precision of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. To validate this methodology, we present a family of three hybrid orbit propagators formed by combining three different orders of approximation of an analytical theory with a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three analytical components considered are the integration of the Kepler problem and first-order and second-order analytical theories, whereas the prediction technique is the same in all three cases, namely an additive Holt-Winters method.
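    The prediction component named above, the additive Holt-Winters method, maintains a level, a trend and a set of seasonal terms that are updated with smoothing constants alpha, beta and gamma. The sketch below is a generic textbook version for a series with season length `m`. The initialization from the first two seasons is one simple heuristic chosen here as an assumption, and the function name and test series are hypothetical; in the hybrid propagators, the series being forecast would be the error left by the analytical theory.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for `horizon` steps beyond series y."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    # Heuristic initialization from the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    trend = (second - first) / m
    level = first + trend * (m - 1) / 2          # level at time m - 1
    seasonals = [y[i] - (first + trend * (i - (m - 1) / 2)) for i in range(m)]

    for t in range(m, len(y)):
        s_prev = seasonals[t % m]                # seasonal term from t - m
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev

    n = len(y)
    return [level + h * trend + seasonals[(n + h - 1) % m]
            for h in range(1, horizon + 1)]

# Hypothetical series: linear trend plus a period-4 pattern.
pattern = [3.0, -1.0, -3.0, 1.0]
series = [10 + 2 * t + pattern[t % 4] for t in range(12)]
forecast = holt_winters_additive(series, m=4, alpha=0.5, beta=0.3,
                                 gamma=0.2, horizon=4)
```

    For this noise-free series, the method reproduces the next four values exactly. With real residuals, alpha, beta and gamma are normally tuned by minimizing one-step-ahead forecast error.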

  1. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods such as χ² minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov test. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have seen increased use of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and steady hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to

  2. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study.

    PubMed

    Egbewale, Bolaji E; Lewis, Martyn; Sim, Julius

    2014-04-09

    Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. A total of 126 hypothetical trial scenarios (126,000 datasets) were evaluated, each with continuous data simulated using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Unlike ANCOVA, which produced unbiased estimates, both ANOVA and CSA are subject to bias that depends on the pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when the pretest-posttest correlation is ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. The apparently greater power of ANOVA and CSA under certain imbalances is achieved at the cost of a biased estimate of the treatment effect. Across a range of correlations between pre- and post-treatment scores and at varying levels and directions of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power.
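    The direction of the bias can be reproduced with a tiny noise-free hypothetical trial: true treatment effect 5, posttest = 0.5 x pretest (so the pretest-posttest relationship is weaker than the slope of 1 that CSA implicitly assumes), and a treated group that starts 4 points higher. The ANCOVA estimate is computed as the difference in posttest means minus the pooled within-group slope times the baseline difference. This equals the treatment coefficient from an ordinary least-squares fit of posttest on group and pretest. The function name and data are hypothetical, and the simulation design of the study is not reproduced.

```python
from statistics import mean

def treatment_estimates(pre_c, post_c, pre_t, post_t):
    """ANOVA, change-score (CSA) and ANCOVA treatment-effect estimates."""
    anova = mean(post_t) - mean(post_c)
    csa = (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))
    # Pooled within-group slope of posttest on pretest (ANCOVA covariate).
    sxy = sxx = 0.0
    for pre, post in ((pre_c, post_c), (pre_t, post_t)):
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    slope = sxy / sxx
    ancova = anova - slope * (mean(pre_t) - mean(pre_c))
    return {"ANOVA": anova, "CSA": csa, "ANCOVA": ancova}

# Hypothetical trial with baseline imbalance (treated group starts 4 higher).
pre_c = [10.0, 12.0, 14.0]
post_c = [0.5 * x for x in pre_c]
pre_t = [14.0, 16.0, 18.0]
post_t = [0.5 * x + 5.0 for x in pre_t]
estimates = treatment_estimates(pre_c, post_c, pre_t, post_t)
```

    ANOVA overstates the effect (7) because it attributes part of the baseline advantage to treatment. CSA understates it (3) because it subtracts the whole baseline difference even though only half carries over. ANCOVA recovers 5 by adjusting with the estimated slope.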

  3. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study

    PubMed Central

    2014-01-01

    Background Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. Methods A total of 126 hypothetical trial scenarios (126 000 datasets) were evaluated, each with continuous data simulated using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Results Unlike ANCOVA, which produced unbiased estimates, both ANOVA and CSA are subject to bias that depends on the pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when the pretest-posttest correlation is ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. The apparently greater power of ANOVA and CSA under certain imbalances is achieved at the cost of a biased estimate of the treatment effect. Conclusions Across a range of correlations between pre- and post-treatment scores and at varying levels and directions of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power. PMID:24712304

  4. An Analysis Pipeline with Statistical and Visualization-Guided Knowledge Discovery for Michigan-Style Learning Classifier Systems

    PubMed Central

    Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.

    2014-01-01

    Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However, their application to complex real-world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes call for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work, we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544

  5. Trends in statistical methods in articles published in Archives of Plastic Surgery between 2012 and 2017.

    PubMed

    Han, Kyunghwa; Jung, Inkyung

    2018-05-01

    This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the Archives of Plastic Surgery (APS) from 2012 to 2017. We reviewed 388 original articles published in APS between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in APS between 2012 and 2017 used one or more statistical methods. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcomes. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was the predominant software in the articles that presented statistical analyses. We found that the use of statistical methods in APS has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in APS.

  6. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students' Statistical Reasoning and Quantitative Literacy Skills.

    PubMed

    Olimpo, Jeffrey T; Pevey, Ryan S; McCabe, Thomas M

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students' reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce.

  7. Mutual interference between statistical summary perception and statistical learning.

    PubMed

    Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B

    2011-09-01

    The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.

  8. Statistical Data Editing in Scientific Articles.

    PubMed

    Habibzadeh, Farrokh

    2017-07-01

    Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and must therefore rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. Standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used to describe normally and non-normally distributed data, respectively. If possible, it is better to report 95% confidence intervals (CIs) for statistics, at least for the main outcome variables. P values should be presented, and interpreted with caution, when a hypothesis is being tested. To advance the knowledge and skills of their members, associations of journal editors would do well to develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
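
    The reporting recommendations above (mean (SD) for normally distributed data, median (IQR) otherwise, and a 95% CI for the mean) can be sketched with Python's `statistics` module and SciPy's t quantile. The helper names are hypothetical; quartiles use the module's default "exclusive" method, and other quartile definitions give slightly different IQRs. The code also makes explicit that SEM = SD/√n measures the precision of the mean, which is why it is not an index of dispersion.

```python
import math
import statistics

from scipy import stats


def mean_sd(x, digits=1):
    """'mean (SD)', for approximately normally distributed data."""
    return f"{statistics.mean(x):.{digits}f} ({statistics.stdev(x):.{digits}f})"


def median_iqr(x, digits=1):
    """'median (Q1-Q3)', for skewed or otherwise non-normal data."""
    q1, _, q3 = statistics.quantiles(x, n=4)  # default 'exclusive' method
    return f"{statistics.median(x):.{digits}f} ({q1:.{digits}f}-{q3:.{digits}f})"


def mean_ci95(x):
    """95% CI for the mean from the t distribution with n - 1 degrees of freedom."""
    n = len(x)
    sem = statistics.stdev(x) / math.sqrt(n)  # precision of the mean, not spread of the data
    half_width = stats.t.ppf(0.975, n - 1) * sem
    m = statistics.mean(x)
    return m - half_width, m + half_width


data = [1, 2, 3, 4, 5]
print(mean_sd(data))     # 3.0 (1.6)
print(median_iqr(data))  # 3.0 (1.5-4.5)
```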

  9. Statistical analysis of iron geochemical data suggests limited late Proterozoic oxygenation

    NASA Astrophysics Data System (ADS)

    Sperling, Erik A.; Wolock, Charles J.; Morgan, Alex S.; Gill, Benjamin C.; Kunzmann, Marcus; Halverson, Galen P.; MacDonald, Francis A.; Knoll, Andrew H.; Johnston, David T.

    2015-07-01

    Sedimentary rocks deposited across the Proterozoic-Phanerozoic transition record extreme climate fluctuations, a potential rise in atmospheric oxygen or re-organization of the seafloor redox landscape, and the initial diversification of animals. It is widely assumed that the inferred redox change facilitated the observed trends in biodiversity. Establishing this palaeoenvironmental context, however, requires that changes in marine redox structure be tracked by means of geochemical proxies and translated into estimates of atmospheric oxygen. Iron-based proxies are among the most effective tools for tracking the redox chemistry of ancient oceans. These proxies are inherently local, but have global implications when analysed collectively and statistically. Here we analyse about 4,700 iron-speciation measurements from shales 2,300 to 360 million years old. Our statistical analyses suggest that subsurface water masses in mid-Proterozoic oceans were predominantly anoxic and ferruginous (depleted in dissolved oxygen and iron-bearing), but with a tendency towards euxinia (sulfide-bearing) that is not observed in the Neoproterozoic era. Analyses further indicate that early animals did not experience appreciable benthic sulfide stress. Finally, unlike proxies based on redox-sensitive trace-metal abundances, iron geochemical data do not show a statistically significant change in oxygen content through the Ediacaran and Cambrian periods, sharply constraining the magnitude of the end-Proterozoic oxygen increase. Indeed, this re-analysis of trace-metal data is consistent with oxygenation continuing well into the Palaeozoic era. Therefore, if changing redox conditions facilitated animal diversification, it did so through a limited rise in oxygen past critical functional and ecological thresholds, as is seen in modern oxygen minimum zone benthic animal communities.

  10. Wildfire cluster detection using space-time scan statistics

    NASA Astrophysics Data System (ADS)

    Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

    2009-04-01

    The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Scan statistics work by comparing the set of events occurring inside a scanning window (a space-time cylinder for spatio-temporal data) with those that lie outside it. Windows of increasing size scan the study area across space and time, and a likelihood ratio is calculated for each window by comparing the ratio of observed to expected cases inside and outside it; the window with the maximum value is taken as the most likely cluster, followed by secondary clusters in decreasing order of likelihood ratio. Under the null hypothesis of spatial and temporal randomness, events are distributed according to a known discrete-state random process (Poisson or Bernoulli) whose parameters can be estimated. Given this assumption, it is possible to test whether the null hypothesis holds in a specific area. To handle the fire data, the space-time permutation scan statistic was applied, since it does not require explicit specification of the population at risk in each cylinder. The case study uses daily fire detections in Florida from the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product for the period 2003-2006. As a result, statistically significant clusters were identified. When the analyses were performed over the entire period, three of the five most likely clusters were identified in forested areas in the north of the state; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies of the Everglades. The analyses were also performed separately for each of the four years to determine whether wildfires recur each year during the same period. It emerges that clusters of forest fires are more frequent in hot seasons (spring and summer), while in the southern areas they are widely
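
    The scan procedure described above can be sketched for the space-time permutation variant. This is a minimal sketch, assuming cases are binned into a zones × days count matrix and that candidate cylinders are supplied as (zone list, first day, last day) tuples; the function names are hypothetical and this is not the authors' implementation. Expected counts come only from the observed marginals (cases per zone times cases per day, divided by the total), which is why no population at risk is needed, and significance is assessed by Monte Carlo permutation of the case dates.

```python
import numpy as np


def expected_counts(counts):
    """Expected cases per (zone, day) cell under the space-time permutation null:
    mu_zd = C_z * C_d / C, built only from the observed marginals."""
    return np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()


def log_glr(counts, zones, d0, d1):
    """Poisson log generalized likelihood ratio for the cylinder made of
    `zones` (list of zone indices) over days d0..d1 inclusive."""
    total = counts.sum()
    cells = np.ix_(zones, range(d0, d1 + 1))
    c_a = counts[cells].sum()
    mu_a = expected_counts(counts)[cells].sum()
    if c_a <= mu_a:
        return 0.0  # only an excess of cases counts as a cluster
    llr = c_a * np.log(c_a / mu_a)
    if total > c_a:
        llr += (total - c_a) * np.log((total - c_a) / (total - mu_a))
    return float(llr)


def scan_test(case_zone, case_day, n_zones, n_days, cylinders, n_perm=999, seed=0):
    """Maximum log GLR over candidate cylinders and its Monte Carlo p-value.
    Shuffling the days among cases keeps both the spatial and temporal marginals."""
    case_zone = np.asarray(case_zone)
    rng = np.random.default_rng(seed)

    def max_llr(days):
        counts = np.zeros((n_zones, n_days))
        np.add.at(counts, (case_zone, days), 1)
        return max(log_glr(counts, z, a, b) for z, a, b in cylinders)

    observed = max_llr(np.asarray(case_day))
    exceed = sum(max_llr(rng.permutation(case_day)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)
```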

  11. Statistical Parametric Mapping to Identify Differences between Consensus-Based Joint Patterns during Gait in Children with Cerebral Palsy.

    PubMed

    Nieuwenhuys, Angela; Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne

    2017-01-01

    Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) with the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses were tested: (a) joint motion patterns with 'no or minor gait deviations' (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait, and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules; and (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with 'no or minor gait deviations' differed, somewhat unexpectedly, from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait, and the significant locations within the gait cycle indicated by the statistical analyses coincided well with the classification rules; and (c) joint motion patterns at the level of each joint differed significantly from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus study
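
    The idea behind statistical parametric mapping, a test statistic computed at every node of a time-normalized waveform and compared against a threshold that controls error across the whole curve, can be sketched as follows. This is a minimal illustration, assuming two groups of curves stored as NumPy arrays of shape (trials, nodes); it uses a nonparametric max-|t| permutation threshold in place of the random field theory threshold usually used in parametric SPM, and it is not the software used in the study.

```python
import numpy as np


def pointwise_t(a, b):
    """Pooled-variance two-sample t statistic at every node of the gait cycle.
    a: (n_a, Q) array and b: (n_b, Q) array of time-normalized waveforms."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))


def permutation_threshold(a, b, alpha=0.05, n_perm=1000, seed=0):
    """Critical |t| that controls the family-wise error rate across the whole curve:
    the (1 - alpha) quantile of max |t| over nodes when group labels are shuffled."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([a, b])
    na = len(a)
    max_abs_t = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(pooled))
        max_abs_t[i] = np.abs(pointwise_t(pooled[idx[:na]], pooled[idx[na:]])).max()
    return float(np.quantile(max_abs_t, 1 - alpha))
```

    Nodes where `np.abs(pointwise_t(a, b))` exceeds the threshold mark the phases of the gait cycle in which the two groups differ.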

  12. Meta-analyses of Adverse Effects Data Derived from Randomised Controlled Trials as Compared to Observational Studies: Methodological Overview

    PubMed Central

    Golder, Su; Loke, Yoon K.; Bland, Martin

    2011-01-01

    Background There is considerable debate as to the relative merits of using randomised controlled trial (RCT) data as opposed to observational data in systematic reviews of adverse effects. This meta-analysis of meta-analyses aimed to assess the level of agreement or disagreement in the estimates of harm derived from meta-analysis of RCTs as compared to meta-analysis of observational studies. Methods and Findings Searches were carried out in ten databases in addition to reference checking, contacting experts, citation searches, and hand-searching key journals, conference proceedings, and Web sites. Studies were included where a pooled relative measure of an adverse effect (odds ratio or risk ratio) from RCTs could be directly compared, using the ratio of odds ratios, with the pooled estimate for the same adverse effect arising from observational studies. Nineteen studies, yielding 58 meta-analyses, were identified for inclusion. The pooled ratio of odds ratios of RCTs compared to observational studies was estimated to be 1.03 (95% confidence interval 0.93–1.15). There was less discrepancy with larger studies. The symmetric funnel plot suggests that there is no consistent difference between risk estimates from meta-analysis of RCT data and those from meta-analysis of observational studies. In almost all instances, the estimates of harm from meta-analyses of the different study designs had 95% confidence intervals that overlapped (54/58, 93%). In terms of statistical significance, in nearly two-thirds (37/58, 64%), the results agreed (both studies showing a significant increase or significant decrease or both showing no significant difference). In only one meta-analysis about one adverse effect was there opposing statistical significance. Conclusions Empirical evidence from this overview indicates that there is no difference on average in the risk estimate of adverse effects of an intervention derived from meta-analyses of RCTs and meta-analyses of observational
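
    The ratio of odds ratios used above compares two pooled estimates on the log scale. A minimal Python sketch, assuming each pooled OR comes with a 95% CI that is symmetric on the log scale (so its standard error can be recovered from the interval) and that the RCT and observational estimates are independent. The fixed-effect inverse-variance pooling of log RORs is an illustrative choice; the pooling model used in the overview is not stated here.

```python
import math

Z95 = 1.959963984540054  # standard normal 97.5th percentile


def log_se_from_ci(lower, upper, z=Z95):
    """SE of log(OR) recovered from a 95% CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_odds_ratios(or_rct, ci_rct, or_obs, ci_obs, z=Z95):
    """ROR = OR_RCT / OR_observational with a 95% CI, assuming independent estimates."""
    log_ror = math.log(or_rct) - math.log(or_obs)
    se = math.hypot(log_se_from_ci(*ci_rct), log_se_from_ci(*ci_obs))
    return math.exp(log_ror), (math.exp(log_ror - z * se), math.exp(log_ror + z * se))


def pool_fixed_effect(log_rors, ses, z=Z95):
    """Inverse-variance fixed-effect pooling of several log RORs."""
    weights = [1 / s ** 2 for s in ses]
    est = sum(w * y for w, y in zip(weights, log_rors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(est), (math.exp(est - z * se), math.exp(est + z * se))
```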

  13. Premature death of adult adoptees: analyses of a case-cohort sample.

    PubMed

    Petersen, Liselotte; Andersen, Per Kragh; Sørensen, Thorkild I A

    2005-05-01

    Genetic and environmental influences on the risk of premature death in adulthood were investigated by estimating the associations in total and cause-specific mortality between adult Danish adoptees and their biological and adoptive parents. Among all 14,425 non-familial adoptions formally granted in Denmark during the period 1924 through 1947, we selected the study population according to a case-cohort sampling design. Like the case-control design, the case-cohort design has the advantage of economical data collection with little loss in statistical efficiency, but the case-cohort sample has the additional advantages that rate ratio estimates may be obtained and that the cohort sample can be re-used in future studies of other outcomes. Analyses were performed using Kalbfleisch and Lawless's estimator for hazard ratios, with robust estimation of variances. In the main analyses, the sample was restricted to adoptees born in 1924 or later who were transferred to their adoptive parents before 7 years of age, and age at death was restricted to 16 to 70 years. The results showed higher mortality among adoptees whose biological parents died in the age range of 16 to 70 years; this association was significant for deaths from natural causes, vascular causes and all causes. No influence was seen from early death of adoptive parents, regardless of cause of death. (c) 2005 Wiley-Liss, Inc.

  14. 75 FR 24718 - Guidance for Industry on Documenting Statistical Analysis Programs and Data Files; Availability

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-05

    ...] Guidance for Industry on Documenting Statistical Analysis Programs and Data Files; Availability AGENCY... Programs and Data Files.'' This guidance is provided to inform study statisticians of recommendations for documenting statistical analyses and data files submitted to the Center for Veterinary Medicine (CVM) for the...

  15. Teaching Biology through Statistics: Application of Statistical Methods in Genetics and Zoology Courses

    PubMed Central

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A.

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math–biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology. PMID:21885822

  16. Teaching biology through statistics: application of statistical methods in genetics and zoology courses.

    PubMed

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.

  17. Fundamentals and Catalytic Innovation: The Statistical and Data Management Center of the Antibacterial Resistance Leadership Group

    PubMed Central

    Huvane, Jacqueline; Komarow, Lauren; Hill, Carol; Tran, Thuy Tien T.; Pereira, Carol; Rosenkranz, Susan L.; Finnemeyer, Matt; Earley, Michelle; Jiang, Hongyu (Jeanne); Wang, Rui; Lok, Judith

    2017-01-01

    Abstract The Statistical and Data Management Center (SDMC) provides the Antibacterial Resistance Leadership Group (ARLG) with statistical and data management expertise to advance the ARLG research agenda. The SDMC is active at all stages of a study, including design; data collection and monitoring; data analyses and archival; and publication of study results. The SDMC enhances the scientific integrity of ARLG studies through the development and implementation of innovative and practical statistical methodologies and by educating research colleagues regarding the application of clinical trial fundamentals. This article summarizes the challenges and roles, as well as the innovative contributions in the design, monitoring, and analyses of clinical trials and diagnostic studies, of the ARLG SDMC. PMID:28350899

  18. Power considerations for λ inflation factor in meta-analyses of genome-wide association studies.

    PubMed

    Georgiopoulos, Georgios; Evangelou, Evangelos

    2016-05-19

    The genomic control (GC) approach is extensively used to control false positive signals due to population stratification in genome-wide association studies (GWAS). However, GC affects the statistical power of GWAS. The loss of power depends on the magnitude of the inflation factor (λ) that is used for GC. We simulated meta-analyses of different GWAS. Minor allele frequency (MAF) ranged from 0·001 to 0·5, and λ was sampled from two scenarios: (i) a random scenario (empirically derived distribution of real λ values) and (ii) a selected scenario based on modification of the simulation parameters. Adjustment for λ was considered under single correction (within-study corrected standard errors) and double correction (an additional λ-corrected summary estimate). MAF was a pivotal determinant of observed power. In the random λ scenario, double correction induced a symmetric power reduction in comparison with single correction. For MAF 1·2 and MAF >5%. Our results provide a quick but detailed index for power considerations of future meta-analyses of GWAS that enables a more flexible design from early steps based on the number of studies accumulated in different groups and the λ values observed in the single studies.
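
    Genomic control and the single versus double correction described above can be sketched in Python. This is a minimal illustration under stated assumptions: λ is estimated as the median of the squared z statistics divided by the median of a 1-df chi-square distribution (about 0.4549), λ values below 1 are truncated to 1, and studies are combined by fixed-effect inverse-variance meta-analysis. The function names are hypothetical and this is not the simulation code of the study.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364231195724  # median of the chi-square distribution with 1 df


def inflation_factor(z):
    """Genomic-control lambda: median(z**2) / median of chi-square(1 df)."""
    return float(np.median(np.asarray(z, dtype=float) ** 2) / CHI2_1DF_MEDIAN)


def gc_correct(se, lam):
    """Inflate standard errors by sqrt(lambda); lambda < 1 is truncated to 1 (an assumption)."""
    return np.asarray(se, dtype=float) * np.sqrt(max(lam, 1.0))


def meta_analyse(betas, ses):
    """Fixed-effect inverse-variance meta-analysis; inputs shaped (n_studies, n_snps)."""
    w = 1.0 / ses ** 2
    beta = (w * betas).sum(axis=0) / w.sum(axis=0)
    return beta, np.sqrt(1.0 / w.sum(axis=0))


def double_gc(betas, ses):
    """Single correction within each study, then a second lambda on the summary z scores."""
    ses_single = np.vstack([gc_correct(s, inflation_factor(b / s)) for b, s in zip(betas, ses)])
    beta, se = meta_analyse(betas, ses_single)
    lam_meta = inflation_factor(beta / se)
    return beta, gc_correct(se, lam_meta), lam_meta
```

    Stopping after `meta_analyse` corresponds to single correction; the extra step in `double_gc` can only enlarge the standard errors further, which is the source of the additional power loss.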

  19. Statistically Modeling Individual Students' Learning over Successive Collaborative Practice Opportunities

    ERIC Educational Resources Information Center

    Olsen, Jennifer; Aleven, Vincent; Rummel, Nikol

    2017-01-01

    Within educational data mining, many statistical models capture the learning of students working individually. However, not much work has been done to extend these statistical models of individual learning to a collaborative setting, despite the effectiveness of collaborative learning activities. We extend a widely used model (the additive factors…
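
    The snippet is truncated, but the widely used model it names is presumably the Additive Factors Model (AFM), in which the log-odds of a correct response is a student proficiency term plus, for each skill the item requires, a skill easiness term and a learning rate multiplied by the student's prior practice opportunities on that skill. A minimal Python sketch of that individual-learning formula, with hypothetical parameter names; the collaborative extension described in the paper is not shown.

```python
import math


def afm_probability(theta, q_row, beta, gamma, opportunities):
    """AFM: logit P(correct) = theta_i + sum_k q_jk * (beta_k + gamma_k * T_ik).
    theta: student proficiency; q_row: 0/1 skills required by item j (Q-matrix row);
    beta: skill easiness; gamma: skill learning rate; opportunities: prior practice T_ik."""
    logit = theta + sum(q * (b + g * t)
                        for q, b, g, t in zip(q_row, beta, gamma, opportunities))
    return 1.0 / (1.0 + math.exp(-logit))


# Predicted success on a one-skill item before practice and after 5 opportunities.
print(afm_probability(0.0, [1], [-0.5], [0.2], [0]))
print(afm_probability(0.0, [1], [-0.5], [0.2], [5]))
```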

  20. Statistical strategies to quantify respiratory sinus arrhythmia: Are commonly used metrics equivalent?

    PubMed Central

    Lewis, Gregory F.; Furman, Senta A.; McCool, Martha F.; Porges, Stephen W.

    2011-01-01

    Three frequently used RSA metrics are investigated to document violations of assumptions for parametric analyses, moderation by respiration, influences of nonstationarity, and sensitivity to vagal blockade. Although all metrics are highly correlated, new findings illustrate that the metrics are noticeably different on the above dimensions. Only one method conforms to the assumptions for parametric analyses, is not moderated by respiration, is not influenced by nonstationarity, and reliably generates stronger effect sizes. Moreover, this method is also the most sensitive to vagal blockade. Specific features of this method may provide insights into improving the statistical characteristics of other commonly used RSA metrics. These data provide the evidence to question, based on statistical grounds, published reports using particular metrics of RSA. PMID:22138367

  1. Mass spectrometry-based protein identification with accurate statistical significance assignment.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2015-03-01

    Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein-level statistics remain challenging. We have constructed a protein ID method that combines the peptide evidence for a candidate protein based on a rigorous formula derived earlier; in this formula, the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested. The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  2. Methodological difficulties of conducting agroecological studies from a statistical perspective

    USDA-ARS's Scientific Manuscript database

    Statistical methods for analysing agroecological data might not be able to help agroecologists to solve all of the current problems concerning crop and animal husbandry, but such methods could well help agroecologists to assess, tackle, and resolve several agroecological issues in a more reliable an...

  3. Pooling sexes when assessing ground reaction forces during walking: Statistical Parametric Mapping versus traditional approach.

    PubMed

    Castro, Marcelo P; Pataky, Todd C; Sole, Gisela; Vilas-Boas, Joao Paulo

    2015-07-16

    Ground reaction force (GRF) data from men and women are commonly pooled for analyses. However, it may not be justifiable to pool sexes on the basis of discrete parameters extracted from continuous GRF gait waveforms, because this can miss continuous effects. Forty healthy participants (20 men and 20 women) walked at a cadence of 100 steps per minute across two force plates, which recorded GRFs. Two statistical methods were used to test the null hypothesis of no mean GRF differences between sexes: (i) Statistical Parametric Mapping, using the entire three-component GRF waveform; and (ii) the traditional approach, using the first and second vertical GRF peaks. Statistical Parametric Mapping results suggested large sex differences, which post-hoc analyses suggested were due predominantly to higher anterior-posterior and vertical GRFs in early stance in women compared to men. Statistically significant differences were observed for the first GRF peak, whereas values for the second GRF peak were similar. These contrasting results emphasise that different parts of the waveform have different signal strengths, and thus that the traditional approach allows one to choose arbitrary metrics and draw arbitrary conclusions. We suggest that researchers and clinicians consider both the entire gait waveforms and sex-specificity when analysing GRF data. Copyright © 2015 Elsevier Ltd. All rights reserved.
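
    The traditional approach reduces each vertical GRF curve to two scalars. A minimal Python sketch, assuming the first peak lies in the first half of stance and the second in the second half (a simplifying rule; real pipelines often locate peaks more carefully); the function name and example curves are hypothetical. Every other sample of the waveform is discarded, which is how continuous effects elsewhere in the curve can be missed.

```python
import numpy as np


def vertical_peaks(fz):
    """Return (first_peak, second_peak) of one vertical GRF curve over stance.
    Simplifying assumption: the first peak lies in the first half of stance and the
    second peak in the second half."""
    fz = np.asarray(fz, dtype=float)
    mid = len(fz) // 2
    return float(fz[:mid].max()), float(fz[mid:].max())


# Each participant's curve collapses to two numbers; all other samples are ignored.
curves = [np.array([0.0, 1.2, 0.8, 0.7, 1.1, 0.0]),
          np.array([0.0, 1.1, 0.9, 0.8, 1.1, 0.0])]
peaks = np.array([vertical_peaks(c) for c in curves])  # shape (participants, 2)
```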

  4. NMC stratospheric analyses during the 1987 Antarctic expedition

    NASA Technical Reports Server (NTRS)

    Gelman, Melvyn E.; Newman, Paul A.

    1988-01-01

    Stratospheric constant pressure analyses of geopotential height and temperature, produced as part of regular operations at the National Meteorological Center (NMC), were used by several participants of the Antarctic Ozone Expedition. A brief description is given of the NMC stratospheric analyses and the data used to derive them. In addition, comparisons of the analysis values at the locations of radiosonde and aircraft data are presented to help assess the representativeness of the NMC stratospheric analyses during the 1987 Antarctic winter-spring period.

  5. The Relationship between Visual Analysis and Five Statistical Analyses in a Simple AB Single-Case Research Design

    ERIC Educational Resources Information Center

    Brossart, Daniel F.; Parker, Richard I.; Olson, Elizabeth A.; Mahadevan, Lakshmi

    2006-01-01

    This study explored some practical issues for single-case researchers who rely on visual analysis of graphed data, but who also may consider supplemental use of promising statistical analysis techniques. The study sought to answer three major questions: (a) What is a typical range of effect sizes from these analytic techniques for data from…

  6. WAIS-IV subtest covariance structure: conceptual and statistical considerations.

    PubMed

    Ward, L Charles; Bergman, Maria A; Hebert, Katina R

    2012-06-01

    D. Wechsler (2008b) reported confirmatory factor analyses (CFAs) with standardization data (ages 16-69 years) for 10 core and 5 supplemental subtests from the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV). Analyses of the 15 subtests supported 4 hypothesized oblique factors (Verbal Comprehension, Working Memory, Perceptual Reasoning, and Processing Speed) but also revealed unexplained covariance between Block Design and Visual Puzzles (Perceptual Reasoning subtests). That covariance was not included in the final models. Instead, a path was added from Working Memory to Figure Weights (Perceptual Reasoning subtest) to improve fit and achieve a desired factor pattern. The present research with the same data (N = 1,800) showed that the path from Working Memory to Figure Weights increases the association between Working Memory and Matrix Reasoning. Specifying both paths improves model fit and largely eliminates unexplained covariance between Block Design and Visual Puzzles but with the undesirable consequence that Figure Weights and Matrix Reasoning are equally determined by Perceptual Reasoning and Working Memory. An alternative 4-factor model was proposed that explained theory-implied covariance between Block Design and Visual Puzzles and between Arithmetic and Figure Weights while maintaining compatibility with WAIS-IV Index structure. The proposed model compared favorably with a 5-factor model based on Cattell-Horn-Carroll theory. The present findings emphasize that covariance model comparisons should involve considerations of conceptual coherence and theoretical adherence in addition to statistical fit. (c) 2012 APA, all rights reserved

  7. 2016 Workplace and Gender Relations Survey of Active Duty Members: Statistical Methodology Report

    DTIC Science & Technology

    2017-03-01

    2016 Workplace and Gender Relations Survey of Active Duty Members: Statistical Methodology Report. Office of People Analytics (OPA), Defense Research, Surveys, and Statistics Center, 4800 Mark Center Drive...

  8. Optimal allocation of testing resources for statistical simulations

    NASA Astrophysics Data System (ADS)

    Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick

    2015-07-01

    Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data on the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The proposed methodology determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses the multivariate t-distribution and the Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles both independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable on the output function, and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.
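
    A univariate analogue of the resampling idea can be sketched in Python. Assuming a noninformative (Jeffreys) prior, realizations of the population variance are drawn as (n - 1)s²/χ² with n - 1 degrees of freedom, and of the mean as a normal draw around the sample mean (marginally a t-distribution); the paper's multivariate t and Wishart sampling generalizes this to correlated inputs. The output function g(X) = X², the assumption that sample statistics stay fixed as data are added, and the function names are illustrative, and the particle swarm optimization over several variables is not shown.

```python
import numpy as np


def output_moment_realizations(xbar, s2, n, n_real=20000, seed=0):
    """Realizations of E[g(X)] = mu**2 + sigma**2 for the illustrative output g(X) = X**2.
    (mu, sigma^2) are drawn from their posterior under a Jeffreys prior given n observations
    with sample mean xbar and sample variance s2:
    sigma^2 = (n - 1) * s2 / chi2_{n-1},  mu | sigma^2 ~ Normal(xbar, sigma^2 / n)."""
    rng = np.random.default_rng(seed)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, n_real)
    mu = xbar + np.sqrt(sigma2 / n) * rng.standard_normal(n_real)
    return mu ** 2 + sigma2


def variance_after_extra_tests(xbar, s2, n_now, extra):
    """Variance of the output moment if `extra` further tests were run,
    assuming the sample mean and variance stay at their current values."""
    return float(output_moment_realizations(xbar, s2, n_now + extra).var())


for extra in (0, 10, 40):
    print(extra, variance_after_extra_tests(1.0, 1.0, 10, extra))
```

    Comparing this variance across candidate values of `extra`, weighted by the cost of each experiment, is the trade-off an allocation search would optimize.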

  9. Extraction of information from major element chemical analyses of lunar basalts

    NASA Technical Reports Server (NTRS)

    Butler, J. C.

    1985-01-01

    Major element chemical analyses often form the framework within which similarities and differences among analyzed specimens are noted and used to propose or devise models. When percentages are formed, the ratios of pairs of components are preserved, whereas many familiar statistical and geometrical descriptors are likely to change substantially. This ratio-preserving property forms the basis for a proposed framework. To investigate this proposal, the compositional variability within a data set of 42 major element analyses of lunar reference samples was analyzed.
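
    The ratio-preserving property of closure (rescaling analyses to percentages) is easy to check numerically. A minimal Python sketch with hypothetical function names; the centred log-ratio transform is included as one common ratio-based representation from later compositional data analysis, not necessarily the framework proposed here.

```python
import math


def close(parts, total=100.0):
    """Closure: rescale non-negative parts so they sum to `total` (e.g. wt%)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


raw = [10.0, 20.0, 30.0]
pct = close(raw)        # roughly [16.7, 33.3, 50.0]
print(pct[1] / pct[0])  # 2.0: the ratio 20/10 is preserved
```

    Because `clr` depends only on ratios between parts, it returns the same values for `raw` and `pct`, whereas raw component values and their differences change under closure.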

  10. Covariate adjustment of event histories estimated from Markov chains: the additive approach.

    PubMed

    Aalen, O O; Borgan, O; Fekjaer, H

    2001-12-01

    Markov chain models are frequently used for studying event histories that include transitions between several states. An empirical transition matrix for nonhomogeneous Markov chains has previously been developed, including a detailed statistical theory based on counting processes and martingales. In this article, we show how to estimate transition probabilities dependent on covariates. This technique may, e.g., be used for making estimates of individual prognosis in epidemiological or clinical studies. The covariates are included through nonparametric additive models on the transition intensities of the Markov chain. The additive model allows for estimation of covariate-dependent transition intensities, and again a detailed theory exists based on counting processes. The martingale setting now allows for a very natural combination of the empirical transition matrix and the additive model, resulting in estimates that can be expressed as stochastic integrals, and hence their properties are easily evaluated. Two medical examples will be given. In the first example, we study how the lung cancer mortality of uranium miners depends on smoking and radon exposure. In the second example, we study how the probability of being in response depends on patient group and prophylactic treatment for leukemia patients who have had a bone marrow transplantation. A program in R and S-PLUS that can carry out the analyses described here has been developed and is freely available on the Internet.
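
    The empirical transition matrix mentioned above is the Aalen–Johansen product-limit estimator, P = ∏ (I + dA(t)), where the off-diagonal entries of dA(t) are the Nelson–Aalen increments. The following is a minimal, unadjusted sketch of that estimator. It does not include the covariate-dependent additive intensities that are the article's contribution, and the input format and function names are assumptions for illustration.

```python
from collections import defaultdict


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def aalen_johansen(subjects, n_states):
    """Empirical transition matrix P(0, t_max) for a nonhomogeneous Markov chain.

    subjects: list of (initial_state, transitions, end_time), where transitions is
    a time-sorted list of (time, new_state) and end_time is the last follow-up time.
    """
    times = sorted({t for _, trans, _ in subjects for t, _ in trans})
    P = [[float(i == j) for j in range(n_states)] for i in range(n_states)]
    for t in times:
        at_risk = [0] * n_states
        moves = defaultdict(int)  # (from_state, to_state) -> number of jumps at t
        for state, trans, end in subjects:
            if end < t:
                continue  # no longer under observation at time t
            for tt, new in trans:
                if tt < t:
                    state = new  # track the state occupied just before t
                else:
                    if tt == t:
                        moves[(state, new)] += 1
                    break
            at_risk[state] += 1
        # Build I + dA(t) from the Nelson-Aalen increments at time t.
        step = [[float(i == j) for j in range(n_states)] for i in range(n_states)]
        for (h, j), count in moves.items():
            step[h][j] += count / at_risk[h]
            step[h][h] -= count / at_risk[h]
        P = matmul(P, step)
    return P
```

    With only two states (alive → dead), the estimator reduces to the Kaplan–Meier estimator, which makes a convenient sanity check.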

  11. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
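
    A back-of-the-envelope calculation suggests why specificity matters more than sensitivity for rare variants. This sketch assumes nondifferential errors at a single site; the function names are illustrative. True carriers are called with probability equal to the sensitivity, and non-carriers are miscalled with probability 1 − specificity. When the carrier frequency p is tiny, even a small false-positive rate can rival p, diluting real carriers with noise. Burden tests aggregate many such sites, so the false calls add up.

```python
def observed_carrier_rate(p, sensitivity, specificity):
    """Expected fraction of samples called as carriers at one site."""
    return sensitivity * p + (1 - specificity) * (1 - p)


def true_carrier_fraction(p, sensitivity, specificity):
    """Share of called carriers who actually carry the variant."""
    return sensitivity * p / observed_carrier_rate(p, sensitivity, specificity)


p = 0.001  # hypothetical rare-variant carrier frequency
print(true_carrier_fraction(p, 0.9, 1.0))    # 10% sensitivity loss
print(true_carrier_fraction(p, 1.0, 0.999))  # 0.1% specificity loss
```

    In this example, losing 10% sensitivity leaves every called carrier genuine, whereas losing just 0.1% specificity makes about half of the calls false.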

  12. The Australasian Resuscitation in Sepsis Evaluation (ARISE) trial statistical analysis plan.

    PubMed

    Delaney, Anthony P; Peake, Sandra L; Bellomo, Rinaldo; Cameron, Peter; Holdgate, Anna; Howe, Belinda; Higgins, Alisa; Presneill, Jeffrey; Webb, Steve

    2013-09-01

    The Australasian Resuscitation in Sepsis Evaluation (ARISE) study is an international, multicentre, randomised, controlled trial designed to evaluate the effectiveness of early goal-directed therapy compared with standard care for patients presenting to the emergency department with severe sepsis. In keeping with current practice, and considering aspects of trial design and reporting specific to non-pharmacological interventions, our plan outlines the principles and methods for analysing and reporting the trial results. The document is prepared before completion of recruitment into the ARISE study, without knowledge of the results of the interim analysis conducted by the data safety and monitoring committee and before completion of the two related international studies. Our statistical analysis plan was designed by the ARISE chief investigators, and reviewed and approved by the ARISE steering committee. We reviewed the data collected by the research team as specified in the study protocol and detailed in the study case report form. We describe information related to baseline characteristics, characteristics of delivery of the trial interventions, details of resuscitation, other related therapies and other relevant data with appropriate comparisons between groups. We define the primary, secondary and tertiary outcomes for the study, with description of the planned statistical analyses. We have developed a statistical analysis plan with a trial profile, mock-up tables and figures. We describe a plan for presenting baseline characteristics, microbiological and antibiotic therapy, details of the interventions, processes of care and concomitant therapies and adverse events. We describe the primary, secondary and tertiary outcomes with identification of subgroups to be analysed. We have developed a statistical analysis plan for the ARISE study, available in the public domain, before the completion of recruitment into the study. This will minimise analytical bias and

  13. ResidPlots-2: Computer Software for IRT Graphical Residual Analyses

    ERIC Educational Resources Information Center

    Liang, Tie; Han, Kyung T.; Hambleton, Ronald K.

    2009-01-01

    This article discusses the ResidPlots-2, a computer software that provides a powerful tool for IRT graphical residual analyses. ResidPlots-2 consists of two components: a component for computing residual statistics and another component for communicating with users and for plotting the residual graphs. The features of the ResidPlots-2 software are…

  14. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students’ Statistical Reasoning and Quantitative Literacy Skills †

    PubMed Central

    Olimpo, Jeffrey T.; Pevey, Ryan S.; McCabe, Thomas M.

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students’ reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce. PMID:29904549

  15. A preliminary study of the statistical analyses and sampling strategies associated with the integration of remote sensing capabilities into the current agricultural crop forecasting system

    NASA Technical Reports Server (NTRS)

    Sand, F.; Christie, R.

    1975-01-01

    Extending the crop survey application of remote sensing from small experimental regions to state and national levels requires that a sample of agricultural fields be chosen for remote sensing of crop acreage, and that a statistical estimate be formulated with measurable characteristics. The critical requirements for the success of the application are reviewed in this report. The problem of sampling in the presence of cloud cover is discussed. Integration of remotely sensed information about crops into current agricultural crop forecasting systems is treated on the basis of the USDA multiple frame survey concepts, with an assumed addition of a new frame derived from remote sensing. Evolution of a crop forecasting system which utilizes LANDSAT and future remote sensing systems is projected for the 1975-1990 time frame.

  16. Using Data Mining to Teach Applied Statistics and Correlation

    ERIC Educational Resources Information Center

    Hartnett, Jessica L.

    2016-01-01

    This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…

  17. Experimental and statistical analyses to characterize in-vehicle fine particulate matter behavior inside public transit buses operating on B20-grade biodiesel fuel

    NASA Astrophysics Data System (ADS)

    Vijayan, Abhilash; Kumar, Ashok

    2010-11-01

    This paper presents results from an in-vehicle air quality study of public transit buses in Toledo, Ohio, involving continuous monitoring and experimental and statistical analyses to understand in-vehicle particulate matter (PM) behavior inside buses operating on B20-grade biodiesel fuel. The study also evaluated the effects of the vehicle's fuel type, operating periods, operation status, passenger counts, traffic conditions, and seasonal and meteorological variation on particulates with aerodynamic diameter less than 1 micron (PM1.0). The study found that average PM1.0 mass concentrations in B20-grade biodiesel-fueled bus compartments were approximately 15 μg m⁻³, while PM2.5 and PM10 concentrations averaged approximately 19 μg m⁻³ and 37 μg m⁻³, respectively. Average hourly concentrations of PM1.0 and PM2.5 were also observed to follow a "μ-shaped" pattern during transit hours. Experimental analyses revealed that in-vehicle PM1.0 mass concentrations were higher inside diesel-fueled buses (10.0-71.0 μg m⁻³, mean 31.8 μg m⁻³) than inside biodiesel buses (3.3-33.5 μg m⁻³, mean 15.3 μg m⁻³) when the windows were kept open. Vehicle idling and open doors were associated with higher concentrations of smaller particles inside the cabin, while closed doors were associated with higher concentrations of larger particles, suggesting that smaller particles originated outside the vehicle and larger particles were generated within the cabin, potentially by passenger activity. The study also found that PM1.0 mass concentrations at the back of the bus compartment (5.7-39.1 μg m⁻³, mean 28.3 μg m⁻³) were higher than those at the front (5.7-25.9 μg m⁻³, mean 21.9 μg m⁻³), and that concentrations inside the bus compartment were generally 30-70% lower than just-outside concentrations. Further, bus route, window position, and time of day were found to affect the in

  18. Statistical inference for the additive hazards model under outcome-dependent sampling.

    PubMed

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study designs, and proper inference procedures for data from such designs, are always of particular interest to investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating its relative efficiency against a simple random sampling design, and we derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure.

  19. Extreme-value statistics reveal rare failure-critical defects in additive manufacturing

    DOE PAGES

    Boyce, Brad L.; Salzbrenner, Bradley C.; Rodelas, Jeffrey M.; ...

    2017-04-21

    Additive manufacturing enables the rapid, cost effective production of large populations of material test coupons such as tensile bars. By adopting streamlined test methods including ‘drop-in’ grips and non-contact extensometry, testing these large populations becomes more efficient. Unlike hardness tests, the tensile test provides a direct measure of yield strength, flow properties, and ductility, which can be directly incorporated into solid mechanics simulations. In the present work, over 1000 nominally identical tensile tests were used to explore the effect of process variability on the mechanical property distributions of a precipitation hardened stainless steel, 17-4PH, produced by a laser powder bed fusion process, also known as direct metal laser sintering. With this large dataset, rare defects are revealed that affect only ~2% of the population, stemming from a single build lot of material. Lastly, the rare defects caused a substantial loss in ductility and were associated with an interconnected network of porosity.

  20. Statistical inference for the additive hazards model under outcome-dependent sampling

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo

    2015-01-01

    Cost-effective study designs, and proper inference procedures for data from such designs, are always of particular interest to investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating its relative efficiency against a simple random sampling design, and we derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure. PMID:26379363

  1. Statistical universals reveal the structures and functions of human music.

    PubMed

    Savage, Patrick E; Brown, Steven; Sakai, Emi; Currie, Thomas E

    2015-07-21

    Music has been called "the universal language of mankind." Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation.

  2. Statistical universals reveal the structures and functions of human music

    PubMed Central

    Savage, Patrick E.; Brown, Steven; Sakai, Emi; Currie, Thomas E.

    2015-01-01

    Music has been called “the universal language of mankind.” Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation. PMID:26124105

  3. Shielding Analyses for VISION Beam Line at SNS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popova, Irina; Gallmeier, Franz X

    2014-01-01

    Full-scale neutron and gamma transport analyses were performed to design shielding around the VISION beam line, the instrument shielding enclosure, the beam stop and the secondary shutter, including a temporary beam stop for the still-closed neighboring beam line. The design requirement is to achieve dose rates below 0.25 mrem/h at 30 cm from the shielding surface. The beam stop and temporary beam stop analyses were performed with the discrete ordinates code DORT in addition to Monte Carlo analyses with the MCNPX code. A comparison of the results is presented.

  4. Statistical analysis of the determinations of the Sun's Galactocentric distance

    NASA Astrophysics Data System (ADS)

    Malkin, Zinovy

    2013-02-01

    Several studies have used the several tens of R0 measurements made during the past two decades to derive the best estimate of R0. Some used simple averaging to derive a result, whereas others provided comprehensive analyses of possible errors in published results. In either case, detailed statistical analyses of the data used were not performed. However, computing the best estimates of the Galactic rotation constants is not only an astronomical but also a metrological task. Here we analyze 53 R0 measurements (published in the past 20 years) to assess the consistency of the data. Our analysis shows that they are internally consistent. It is also shown that any trend in the R0 estimates from the last 20 years is statistically negligible, which renders the presence of a bandwagon effect doubtful. On the other hand, the formal errors in the published R0 estimates have improved significantly with time.
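
    One common way to check whether a set of measurements with quoted errors is internally consistent is to compute the inverse-variance weighted mean and the reduced chi-square of the measurements about it. The abstract does not specify the paper's exact procedure, so this minimal sketch is an assumption about a typical approach, and the example values are placeholders rather than published R0 estimates. A reduced chi-square near 1 indicates that the scatter matches the quoted errors. Values well above 1 suggest underestimated errors, and values well below 1 suggest overestimated ones.

```python
import math


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its formal error."""
    weights = [1 / e ** 2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1 / math.sqrt(sum(weights))


def reduced_chi2(values, errors):
    """Chi-square of the values about their weighted mean, per degree of freedom."""
    mean, _ = weighted_mean(values, errors)
    chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
    return chi2 / (len(values) - 1)


# Placeholder values in kpc, for illustration only.
print(weighted_mean([8.0, 8.3, 7.9], [0.3, 0.4, 0.2]))
print(reduced_chi2([8.0, 8.3, 7.9], [0.3, 0.4, 0.2]))
```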

  5. Methods for meta-analysis of multiple traits using GWAS summary statistics.

    PubMed

    Ray, Debashree; Boehnke, Michael

    2018-03-01

    Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses.
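
    The basic idea of testing one variant against several correlated traits using only summary statistics can be shown with a simpler building block. Given per-trait z-scores z and their null correlation matrix R, the quadratic form zᵀR⁻¹z follows a chi-square distribution with as many degrees of freedom as there are traits. This sketch is a standard multivariate Wald-type test for two traits, not metaUSAT itself, which is designed to stay powerful across association structures where a single test like this one may not. The function name is hypothetical.

```python
import math


def joint_z_test(z1, z2, r):
    """Joint test of two correlated z-scores (null correlation r, |r| < 1).

    Returns the statistic z^T R^{-1} z and its chi-square (2 df) p-value.
    """
    det = 1 - r * r
    stat = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / det
    p_value = math.exp(-stat / 2)  # exact chi-square survival function for 2 df
    return stat, p_value


# Same-direction effects on two positively correlated traits add less
# evidence than the same z-scores on uncorrelated traits.
print(joint_z_test(2.0, 2.0, 0.0))
print(joint_z_test(2.0, 2.0, 0.5))
```

    In practice, r would be estimated from the genome-wide correlation of null z-scores, which also absorbs any sample overlap between studies.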

  6. Statistical mechanics of economics I

    NASA Astrophysics Data System (ADS)

    Kusmartsev, F. V.

    2011-02-01

    We show that statistical mechanics is useful in describing financial crises and economics. Taking a large number of instantaneous snapshots of a market over an interval of time, we construct their ensembles and study their statistical interference. This results in a probabilistic description of the market and gives capital, money, income, wealth and debt distributions, which in most cases take the form of the Bose-Einstein distribution. In addition, statistical mechanics provides the main market equations and laws governing the correlations between the amount of money, debt, product, prices and number of retailers. We applied these relations to study the evolution of the US economy from 1996 to 2008 and observe that over that period the income of the majority of the population is well described by the Bose-Einstein distribution, whose parameters differ from year to year. Each financial crisis corresponds to a peak in the absolute activity coefficient. The analysis correctly indicates past crises and predicts a future one.

  7. Semi-Poisson statistics in quantum chaos.

    PubMed

    García-García, Antonio M; Wang, Jiao

    2006-03-01

    We investigate the quantum properties of a nonrandom Hamiltonian with a steplike singularity. It is shown that the eigenfunctions are multifractals and, in a certain range of parameters, the level statistics is described exactly by semi-Poisson statistics (SP) typical of pseudointegrable systems. It is also shown that our results are universal, namely, they depend exclusively on the presence of the steplike singularity and are not modified by smooth perturbations of the potential or the addition of a magnetic flux. Although the quantum properties of our system are similar to those of a disordered conductor at the Anderson transition, we report important quantitative differences in both the level statistics and the multifractal dimensions controlling the transition. Finally, the study of quantum transport properties suggests that the classical singularity induces quantum anomalous diffusion. We discuss how these findings may be experimentally corroborated using ultracold-atom techniques.
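
    The semi-Poisson nearest-neighbour spacing distribution is P(s) = 4s·e^(−2s), for spacings s normalized to unit mean. For small s it grows linearly, showing level repulsion like the Wigner (GOE) surmise. For large s it decays exponentially, like uncorrelated Poisson levels. This minimal sketch compares the three standard forms and checks numerically that each is normalized with unit mean spacing. The numerical-integration helper and its parameters are illustrative choices.

```python
import math


def semi_poisson(s):
    return 4 * s * math.exp(-2 * s)


def poisson_spacing(s):
    return math.exp(-s)


def wigner_goe(s):
    return (math.pi * s / 2) * math.exp(-math.pi * s * s / 4)


def norm_and_mean(pdf, upper=30.0, steps=100_000):
    """Midpoint-rule estimates of the integral of pdf(s) and s*pdf(s) on [0, upper]."""
    h = upper / steps
    mids = [(k + 0.5) * h for k in range(steps)]
    norm = sum(pdf(s) for s in mids) * h
    mean = sum(s * pdf(s) for s in mids) * h
    return norm, mean


for name, pdf in [("Poisson", poisson_spacing), ("semi-Poisson", semi_poisson),
                  ("Wigner GOE", wigner_goe)]:
    print(name, norm_and_mean(pdf), pdf(0.01))  # small-s value shows (no) repulsion
```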

  8. Metagenomic analyses of the late Pleistocene permafrost - additional tools for reconstruction of environmental conditions

    NASA Astrophysics Data System (ADS)

    Rivkina, Elizaveta; Petrovskaya, Lada; Vishnivetskaya, Tatiana; Krivushin, Kirill; Shmakova, Lyubov; Tutukina, Maria; Meyers, Arthur; Kondrashov, Fyodor

    2016-04-01

    A comparative analysis of the metagenomes from two 30 000-year-old permafrost samples, one of lake-alluvial origin and the other from late Pleistocene Ice Complex sediments, revealed significant differences within microbial communities. The late Pleistocene Ice Complex sediments (which have been characterized by the absence of methane with lower values of redox potential and Fe2+ content) showed a low abundance of methanogenic archaea and enzymes from both the carbon and nitrogen cycles, but a higher abundance of enzymes associated with the sulfur cycle. The metagenomic and geochemical analyses described in the paper provide evidence that the formation of the sampled late Pleistocene Ice Complex sediments likely took place under much more aerobic conditions than lake-alluvial sediments.

  9. Median statistics estimates of Hubble and Newton's constants

    NASA Astrophysics Data System (ADS)

    Bethapudi, Suryarao; Desai, Shantanu

    2017-02-01

    The robustness of any statistic depends on the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness when the number of assumptions we can make about the data is limited. We then apply the median statistics technique to obtain estimates of two constants of nature, the Hubble constant ($H_0$) and Newton's gravitational constant ($G$), both of which show significant differences between different measurements. For $H_0$, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. After grouping the different results according to their primary type of measurement, we find median estimates of $H_0 = 72.5^{+2.5}_{-8}$ km/s/Mpc, with errors corresponding to 95% c.l. (2σ), and $G = 6.674702^{+0.0014}_{-0.0009} \times 10^{-11}$ N m² kg⁻², with errors corresponding to 68% c.l. (1σ).
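
    Median statistics, as introduced by Gott et al. (2001), assumes only that the N measurements are independent and free of systematic error. Under these conditions, each measurement is equally likely to fall above or below the true median. So the probability that exactly i measurements lie below it, placing the true median between the i-th and (i+1)-th sorted values, is C(N, i)/2^N. This sketch computes the median and the probability that the true median lies between two chosen order statistics. The function names and the example values are illustrative only.

```python
import math
import statistics


def median_interval_probs(n):
    """probs[i] = probability that exactly i of n measurements fall below the true
    median, i.e. that it lies between sorted values xs[i-1] and xs[i]
    (open-ended for i = 0 and i = n)."""
    return [math.comb(n, i) / 2 ** n for i in range(n + 1)]


def median_estimate(values, a, b):
    """Median, plus the probability that the true median lies in [xs[a], xs[b]]."""
    xs = sorted(values)
    probs = median_interval_probs(len(xs))
    coverage = sum(probs[a + 1 : b + 1])  # intervals a+1 .. b lie between xs[a] and xs[b]
    return statistics.median(xs), xs[a], xs[b], coverage


# Illustrative values only: with 5 measurements, the full range
# [min, max] brackets the true median with probability 30/32.
print(median_estimate([3.0, 1.0, 2.0, 5.0, 4.0], 0, 4))
```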

  10. Transfusion Indication Threshold Reduction (TITRe2) randomized controlled trial in cardiac surgery: statistical analysis plan.

    PubMed

    Pike, Katie; Nash, Rachel L; Murphy, Gavin J; Reeves, Barnaby C; Rogers, Chris A

    2015-02-22

    The Transfusion Indication Threshold Reduction (TITRe2) trial is the largest randomized controlled trial to date to compare red blood cell transfusion strategies following cardiac surgery. This update presents the statistical analysis plan, detailing how the study will be analyzed and presented. The statistical analysis plan has been written following recommendations from the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, prior to database lock and the final analysis of trial data. Outlined analyses are in line with the Consolidated Standards of Reporting Trials (CONSORT). The study aims to randomize 2000 patients from 17 UK centres. Patients are randomized to either a restrictive (transfuse if haemoglobin concentration <7.5 g/dl) or liberal (transfuse if haemoglobin concentration <9 g/dl) transfusion strategy. The primary outcome is a binary composite outcome of any serious infectious or ischaemic event in the first 3 months following randomization. The statistical analysis plan details how non-adherence with the intervention, withdrawals from the study, and the study population will be derived and dealt with in the analysis. The planned analyses of the trial primary and secondary outcome measures are described in detail, including approaches taken to deal with multiple testing, model assumptions not being met and missing data. Details of planned subgroup and sensitivity analyses and pre-specified ancillary analyses are given, along with potential issues that have been identified with such analyses and possible approaches to overcome such issues. ISRCTN70923932 .

  11. [Methods, challenges and opportunities for big data analyses of microbiome].

    PubMed

    Sheng, Hua-Fang; Zhou, Hong-Wei

    2015-07-01

    The microbiome is a novel research field related to a variety of chronic inflammatory diseases. Technically, there are two major approaches to analyzing the microbiome: metataxonome analysis, by sequencing 16S rRNA variable tags, and metagenome analysis, by shotgun sequencing of the total microbial (mainly bacterial) genome mixture. The 16S rRNA sequencing analysis pipeline includes sequence quality control, diversity analyses, taxonomy and statistics; metagenome analyses further include gene annotation and functional analyses. As sequencing techniques develop, the cost of sequencing will decrease, and big data analyses will become the central task. Data standardization, accumulation, modeling and disease prediction are crucial for future exploitation of these data. Meanwhile, the informational properties of these data and functional verification with culture-dependent and culture-independent experiments will remain the focus of future research. Studies of the human microbiome will bring a better understanding of the relations between the human body and the microbiome, especially in the context of disease diagnosis and therapy, which promises rich research opportunities.
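
    The "diversity analyses" step of a 16S pipeline typically computes alpha-diversity indices from per-taxon read counts. This is a minimal sketch of one widely used index, the Shannon index H = −Σ pᵢ ln pᵢ. The abstract does not say which indices the authors use, so this choice and the example counts are assumptions. Real pipelines usually rarefy or otherwise normalize read depth before comparing samples.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-OTU read counts for two samples of equal depth.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]
print(shannon_diversity(even_sample), shannon_diversity(skewed_sample))
```

    H is largest (ln S for S taxa) when all taxa are equally abundant, so the evenly distributed sample scores higher than the skewed one despite having the same number of taxa.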

  12. Recent meta-analyses neglect previous systematic reviews and meta-analyses about the same topic: a systematic examination.

    PubMed

    Helfer, Bartosz; Prosser, Aaron; Samara, Myrto T; Geddes, John R; Cipriani, Andrea; Davis, John M; Mavridis, Dimitris; Salanti, Georgia; Leucht, Stefan

    2015-04-14

    As the number of systematic reviews is growing rapidly, we systematically investigate whether meta-analyses published in leading medical journals present an outline of available evidence by referring to previous meta-analyses and systematic reviews. We searched PubMed for recent meta-analyses of pharmacological treatments published in high impact factor journals. Previous systematic reviews and meta-analyses were identified with electronic searches of keywords and by searching reference sections. We analyzed the number of meta-analyses and systematic reviews that were cited, described and discussed in each recent meta-analysis. Moreover, we investigated publication characteristics that potentially influence the referencing practices. We identified 52 recent meta-analyses and 242 previous meta-analyses on the same topics. Of these, 66% of identified previous meta-analyses were cited, 36% described, and only 20% discussed by recent meta-analyses. The probability of citing a previous meta-analysis was positively associated with its publication in a journal with a higher impact factor (odds ratio, 1.49; 95% confidence interval, 1.06 to 2.10) and more recent publication year (odds ratio, 1.19; 95% confidence interval 1.03 to 1.37). Additionally, the probability of a previous study being described by the recent meta-analysis was inversely associated with the concordance of results (odds ratio, 0.38; 95% confidence interval, 0.17 to 0.88), and the probability of being discussed was increased for previous studies that employed meta-analytic methods (odds ratio, 32.36; 95% confidence interval, 2.00 to 522.85). Meta-analyses on pharmacological treatments do not consistently refer to and discuss findings of previous meta-analyses on the same topic. Such neglect can lead to research waste and be confusing for readers. Journals should make the discussion of related meta-analyses mandatory.
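
    The odds ratios above come with 95% confidence intervals that are symmetric on the log scale. As a consequence, the point estimate is approximately the geometric mean of the two limits, which is a quick way to sanity-check reported values. This minimal sketch shows the textbook 2×2-table odds ratio with a Wald interval, plus that consistency check. It is not necessarily how the authors computed their estimates, which came from regression models, and the 2×2 counts in the test are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def odds_ratio_ci(a, b, c, d, z=Z95):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: exposed with / without the outcome; c, d: unexposed with / without.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def or_from_ci(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


print(or_from_ci(1.06, 2.10))  # compare with the reported OR of 1.49
print(or_from_ci(1.03, 1.37))  # compare with the reported OR of 1.19
```

    Both reported estimates are recovered to within rounding of the published interval limits.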

  13. From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex.

    PubMed

    Powers, Matthew S; Smith, Phillip H; McKee, Sherry A; Ehringer, Marissa A

    2017-01-01

    Science has come a long way with regard to the consideration of sex differences in clinical and preclinical research, but one field remains behind the curve: human statistical genetics. The goal of this commentary is to raise awareness and discussion about how to best consider and evaluate possible sex effects in the context of large-scale human genetic studies. Over the course of this commentary, we reinforce the importance of interpreting genetic results in the context of biological sex, establish evidence that sex differences are not being considered in human statistical genetics, and discuss how best to conduct and report such analyses. Our recommendation is to run stratified analyses by sex no matter the sample size or the result and report the findings. Summary statistics from stratified analyses are helpful for meta-analyses, and patterns of sex-dependent associations may be hidden in a combined dataset. In the age of declining sequencing costs, large consortia efforts, and a number of useful control samples, it is now time for the field of human genetics to appropriately include sex in the design, analysis, and reporting of results.
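
    Reporting sex-stratified summary statistics, as recommended above, makes two simple downstream calculations possible. One is a test of whether a variant's effect differs between males and females. The other is an inverse-variance (fixed-effect) pooling of the two strata. The sketch below assumes independent strata and approximately normal effect estimates. The effect sizes and standard errors are hypothetical.

```python
import math
from statistics import NormalDist


def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """Two-sided z-test for a difference between independent male and female effects."""
    diff = beta_m - beta_f
    se_diff = math.sqrt(se_m ** 2 + se_f ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def inverse_variance_combine(betas, ses):
    """Fixed-effect (inverse-variance) pooling of stratum-specific estimates."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-sex summary statistics for one variant (not from any study)
z, p = sex_difference_test(beta_m=0.12, se_m=0.03, beta_f=0.01, se_f=0.03)
pooled, pooled_se = inverse_variance_combine([0.12, 0.01], [0.03, 0.03])
```

    In this toy example the pooled estimate (0.065) falls between a clear male effect and a near-null female effect. This shows how a single pooled number can mask a sex-dependent pattern.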

  14. The statistical analysis of global climate change studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hardin, J.W.

    1992-01-01

    This work aims to strengthen the relationship between climatologists and statisticians. Atmospheric scientists have been analyzing global change data for many years, and much of this analysis relies heavily on statistics and statistical inference. Some specific climatological analyses are presented, and their dependence on statistics is documented before each analysis is undertaken. The first problem presented involves the fluctuation-dissipation theorem and its application to global climate models. This problem has a sound theoretical niche in the literature of both climate modeling and physics, but no statistical analysis has yet used data obtained from the model to show the relationship graphically. This gap motivates the author's treatment of the problem. A second problem, concerning the standard errors in estimating global temperatures, is purely statistical in nature, although very little material exists on sampling from such a frame. This problem has not only climatological and statistical ramifications but political ones as well. These results are intended for use in a further analysis of global warming using actual data collected on the earth. To simplify the analysis of these problems, the development of a computer program, MISHA, is presented. This interactive program contains many of the routines, functions, graphics, and map projections the climatologist needs to enter the arena of data visualization effectively.

  15. Multi-country health surveys: are the analyses misleading?

    PubMed

    Masood, Mohd; Reidpath, Daniel D

    2014-05-01

    The aim of this paper was to review the approaches currently used to analyze multi-country survey data. The review focused on design and modeling issues in analyses of major multi-country surveys published in 2010. A systematic search strategy was used to identify the 10 multi-country surveys and the articles published from them in 2010. The surveys were selected to reflect diverse topics and foci and to provide insight into analytic approaches across research themes. The search identified 159 articles appropriate for full-text review and data extraction. The analyses adopted in the multi-country surveys can be broadly classified as univariate/bivariate analyses and multivariate/multivariable analyses. Multivariate/multivariable analyses may be further divided into design-based and model-based analyses. Of the 159 articles reviewed, 129 used model-based analyses and 30 used design-based analyses. Similar patterns were seen in all the individual surveys. Survey statisticians generally agree that complex surveys are most appropriately analyzed using design-based analyses, yet most researchers continued to use the more common model-based approaches. Recent developments in design-based multi-level analysis may offer one way to include all the survey design characteristics. This is a relatively new area, however, and further statistical as well as applied analytic research is required. An important limitation of this study relates to the selection of the surveys used and the choice of year for the analysis, i.e., 2010 only. There is, however, no strong reason to believe that analytic strategies have changed radically in the past few years, and 2010 provides a credible snapshot of current practice.
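
    Design-based analysis starts from the survey's sampling weights. For variance estimation it also uses the survey's strata and clusters. The minimal sketch below only shows how weights change a point estimate compared with an unweighted estimate that ignores the design. It does not compute design-based standard errors, and the data and weights are hypothetical.

```python
def unweighted_mean(values):
    """Treats every respondent as equally representative (ignores the sampling design)."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Ratio (Hajek) estimator: each respondent counts in proportion to their survey weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must align")
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: the first four respondents come from an oversampled group,
# so each represents fewer people in the population (weight 1 versus 4).
outcome = [1, 1, 1, 0, 0, 1, 0, 0]
weights = [1, 1, 1, 1, 4, 4, 4, 4]

naive = unweighted_mean(outcome)
weighted = weighted_mean(outcome, weights)
```

    The unweighted prevalence here is 0.50, while the weighted estimate is 0.35. Ignoring the weights overstates the outcome because the oversampled group reports it more often.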

  16. Reliable mortality statistics for Turkey: Are we there yet?

    PubMed

    Özdemir, Raziye; Rao, Chalapati; Öcek, Zeliha; Dinç Horasan, Gönül

    2015-06-10

    The Turkish government has implemented several reforms since 2009 to improve the Turkish Statistical Institute Death Reporting System (TURKSTAT-DRS). However, the impact of these reforms on cause-of-death statistics has not been assessed. This study attempted to analyse the impact of these reforms on the TURKSTAT-DRS for Turkey as a whole and for Izmir, one of the most developed provinces in Turkey. The evaluation framework comprised three main components, each with specific criteria. Firstly, TURKSTAT data for Turkey and Izmir for the periods 2001-2008 and 2009-2013 were assessed on the following dimensions of mortality statistics quality: (a) completeness of death registration and (b) trends in the proportions of deaths with ill-defined causes. Secondly, death certificates from Izmir in 2010 were analysed for (a) missing information, (b) timeliness of death notifications and (c) characteristics of deaths with ill-defined causes. Finally, TURKSTAT data were analysed to estimate life tables and summary mortality indicators for Turkey and Izmir, as well as the leading causes of death in Turkey in 2013. Registration of adult deaths, in Izmir and at the national level, has considerably improved since the 2009 reforms were introduced. The proportion of deaths assigned ill-defined causes has also declined markedly. Death certificates from Izmir showed significant gaps in recorded demographic and epidemiological variables, particularly for infant deaths. They also showed gaps in the detailed recording of causes of death. Life expectancy at birth estimated from local data is 3-4 years higher than similar estimates for Turkey from international studies, and this requires further investigation and confirmation. The TURKSTAT-DRS is now an improved source of mortality and cause-of-death statistics for Turkey. The reliability and validity of TURKSTAT data needs

  17. Quantifying Special Generator Ridership in Transit Analyses

    DOT National Transportation Integrated Search

    1997-01-01

    In major investment analyses and transit corridor studies, the impact of conventions, sporting matches, and other special events on transit ridership is often of interest. In many locations, it is hypothesized that additional ridership to and from su...

  18. Analysis of Feature Intervisibility and Cumulative Visibility Using GIS, Bayesian and Spatial Statistics: A Study from the Mandara Mountains, Northern Cameroon

    PubMed Central

    Wright, David K.; MacEachern, Scott; Lee, Jaeyong

    2014-01-01

    The locations of diy-geδ-bay (DGB) sites in the Mandara Mountains, northern Cameroon are hypothesized to occur as a function of their ability to see and be seen from points on the surrounding landscape. A series of geostatistical, two-way and Bayesian logistic regression analyses were performed to test two hypotheses related to the intervisibility of the sites to one another and their visual prominence on the landscape. We determine that the intervisibility of the sites to one another is highly statistically significant when compared to 10 stratified-random permutations of DGB sites. Bayesian logistic regression additionally demonstrates that the visibility of the sites to points on the surrounding landscape is statistically significant. The location of sites appears to have also been selected on the basis of lower slope than random permutations of sites. Using statistical measures, many of which are not commonly employed in archaeological research, to evaluate aspects of visibility on the landscape, we conclude that the placement of DGB sites improved their conspicuousness for enhanced ritual, social cooperation and/or competition purposes. PMID:25383883
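
    Comparing sites against random site placements is a form of Monte Carlo permutation test. The sketch below is a generic version, not the authors' procedure. It assumes a hypothetical per-cell visibility score and draws random site sets by simple (not stratified) random sampling. It then reports a one-sided p-value with the usual +1 correction.

```python
import random


def mean_visibility(cells, visibility):
    """Average visibility score over a set of grid cells."""
    return sum(visibility[c] for c in cells) / len(cells)


def permutation_p_value(observed, null_stats):
    """One-sided Monte Carlo p-value with the +1 correction."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


rng = random.Random(42)
n_cells = 1000
# Hypothetical landscape: a visibility score for each grid cell
visibility = {cell: rng.random() for cell in range(n_cells)}
# Toy "sites" deliberately placed on the 20 most visible cells
sites = sorted(visibility, key=visibility.get)[-20:]
observed = mean_visibility(sites, visibility)

# Null distribution: random placements of the same number of sites
null_stats = [
    mean_visibility(rng.sample(range(n_cells), len(sites)), visibility)
    for _ in range(999)
]
p = permutation_p_value(observed, null_stats)
```

    With B null draws, this estimator cannot return a p-value smaller than 1/(B + 1). The number of random permutations therefore limits how finely any result can be resolved.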

  19. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). Compared with the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  20. Smooth extrapolation of unknown anatomy via statistical shape models

    NASA Astrophysics Data System (ADS)

    Grupp, R. B.; Chiang, H.; Otake, Y.; Murphy, R. J.; Gordon, C. R.; Armand, M.; Taylor, R. H.

    2015-03-01

    Several methods for extrapolating unknown anatomy were evaluated. The primary application is to enhance surgical procedures that may use partial medical images or medical images of incomplete anatomy. Le Fort-based, face-jaw-teeth transplant is one such procedure. From CT data of 36 skulls and 21 mandibles, separate Statistical Shape Models of the anatomical surfaces were created. Using the Statistical Shape Models, incomplete surfaces were projected to obtain complete surface estimates. The surface estimates exhibit non-zero error in regions where the true surface is known. It is therefore desirable to keep the true surface and seamlessly merge the estimated unknown surface. Existing extrapolation techniques produce non-smooth transitions from the true surface to the estimated surface, resulting in additional error and a less aesthetically pleasing result. Three extrapolation techniques were evaluated. The first copies and pastes the surface estimate (non-smooth baseline). The second feathers between the patient surface and surface estimate. The third generates an estimate via a Thin Plate Spline trained on displacements between the surface estimate and corresponding vertices of the known patient surface. Feathering and Thin Plate Spline approaches both yielded smooth transitions. However, feathering corrupted known vertex values. Leave-one-out analyses were conducted, with 5% to 50% of known anatomy removed from the left-out patient and estimated via the proposed approaches. The Thin Plate Spline approach yielded smaller errors than the other two approaches. Its average vertex error improved on the baseline approach by 1.46 mm for the skull and 1.38 mm for the mandible.
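
    The trade-off between the baseline and feathering can be seen in a one-dimensional toy profile. This is a sketch under stated assumptions: a 1D profile, known samples forming a prefix, and a linear ramp of hypothetical width `band`. It is not the paper's mesh implementation, and the Thin Plate Spline approach is not shown.

```python
def copy_paste(known, estimate):
    """Baseline: keep known samples, use the estimate only where the surface is unknown."""
    return [k if k is not None else e for k, e in zip(known, estimate)]


def feather(known, estimate, band):
    """Linear blend from known values to the estimate over the last `band` known samples.

    Assumes the known samples form a prefix of the profile (known first, then unknown).
    """
    last_known = max(i for i, k in enumerate(known) if k is not None)
    out = []
    for i, (k, e) in enumerate(zip(known, estimate)):
        if k is None:
            out.append(e)
            continue
        dist = last_known - i                  # samples between i and the boundary
        w = min(1.0, (dist + 1) / (band + 1))  # weight kept on the known value
        out.append(w * k + (1 - w) * e)
    return out


# Hypothetical profile: true surface is flat at 0, the shape-model estimate is offset by 0.5
known = [0.0, 0.0, 0.0, 0.0, None, None]
estimate = [0.5] * 6

baseline = copy_paste(known, estimate)    # step of 0.5 at the boundary
feathered = feather(known, estimate, 2)   # gradual ramp, but samples 2 and 3 are altered
```

    Copy-paste keeps every known value but leaves a 0.5 step at the boundary. Feathering replaces the step with a ramp but changes the last two known samples, mirroring the observation that feathering corrupted known vertex values.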

  1. Statistical analysis of lightning electric field measured under Malaysian condition

    NASA Astrophysics Data System (ADS)

    Salimi, Behnam; Mehranzamir, Kamyar; Abdul-Malek, Zulkurnain

    2014-02-01

    Lightning is an electrical discharge during thunderstorms that can occur either within clouds (intra-cloud) or between clouds and the ground (cloud-to-ground). Lightning characteristics and their statistics are the foundation for designing lightning protection systems and for calculating lightning radiated fields. Various techniques are now available to detect lightning signals and to determine the parameters of a lightning flash, each with its own claimed performance. In this paper, the characteristics of broadband electric fields generated by cloud-to-ground lightning discharges in southern Malaysia are analyzed. A total of 130 cloud-to-ground lightning flashes from 3 separate thunderstorm events (each lasting about 4-5 hours) were examined. Statistical analyses are presented for four signal parameters:
    - preliminary breakdown pulse train duration,
    - time interval between preliminary breakdown and return stroke,
    - stroke multiplicity, and
    - percentage of single-stroke flashes.
    The BIL model is also introduced to characterize lightning signature patterns. The statistical analyses show that about 79% of the lightning signals fit the BIL model well. The maximum and minimum preliminary breakdown durations of the observed lightning signals are 84 ms and 560 µs, respectively. Of the flashes, 7.6% were single-stroke flashes, and the maximum number of strokes recorded was 14 per flash. A preliminary breakdown signature can be identified in more than 95% of the flashes.

  2. Anti-Vascular Endothelial Growth Factor Comparative Effectiveness Trial for Diabetic Macular Edema: Additional Efficacy Post Hoc Analyses of a Randomized Clinical Trial.

    PubMed

    Jampol, Lee M; Glassman, Adam R; Bressler, Neil M; Wells, John A; Ayala, Allison R

    2016-12-01

    Post hoc analyses from the Diabetic Retinopathy Clinical Research Network randomized clinical trial comparing aflibercept, bevacizumab, and ranibizumab for diabetic macular edema (DME) might influence interpretation of study results. To provide additional outcomes comparing 3 anti-vascular endothelial growth factor (VEGF) agents for DME. Post hoc analyses performed from May 3, 2016, to June 21, 2016, of a randomized clinical trial performed from August 22, 2012, to September 23, 2015, of 660 participants comparing 3 anti-VEGF treatments in eyes with center-involved DME causing vision impairment. Randomization to intravitreous aflibercept (2.0 mg), bevacizumab (1.25 mg), or ranibizumab (0.3 mg) administered up to monthly based on a structured retreatment regimen. Focal/grid laser treatment was added after 6 months for the treatment of persistent DME. Change in visual acuity (VA) area under the curve and change in central subfield thickness (CST) within subgroups based on whether an eye received laser treatment for DME during the study. Post hoc analyses were performed for 660 participants (mean [SD] age, 61 [10] years; 47% female, 65% white, 16% black or African American, 16% Hispanic, and 3% other). For eyes with an initial VA of 20/50 or worse, VA improvement was greater with aflibercept than the other agents at 1 year but superior only to bevacizumab at 2 years. Mean (SD) letter change in VA over 2 years (area under curve) was greater with aflibercept (+17.1 [9.7]) than with bevacizumab (+12.1 [9.4]; 95% CI, +1.6 to +7.3; P < .001) or ranibizumab (+13.6 [8.5]; 95% CI, +0.7 to +6.0; P = .009). When VA was 20/50 or worse at baseline, bevacizumab reduced CST less than the other agents at 1 year, but at 2 years the differences had diminished. In subgroups stratified by baseline VA, anti-VEGF agent, and whether focal/grid laser treatment was performed for DME, the only participants to have a substantial reduction in mean CST between 1 and 2 years were those
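
    Visual acuity change expressed as an area under the curve is often summarized in a standard way. The trapezoidal area under the change-from-baseline curve is divided by the length of follow-up, giving a time-averaged letter change. The sketch below assumes that convention and a hypothetical visit schedule. The abstract does not specify the trial's visit timing or how missing visits were handled.

```python
def time_weighted_mean_change(times, changes):
    """Trapezoidal area under the change-from-baseline curve divided by follow-up length."""
    if len(times) != len(changes) or len(times) < 2:
        raise ValueError("need matching times and values for at least two visits")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (changes[i - 1] + changes[i]) / 2
    return area / (times[-1] - times[0])


# Hypothetical eye: visit months and letters gained relative to baseline
months = [0, 4, 8, 12, 16, 20, 24]
letters = [0, 10, 14, 16, 17, 17, 18]
avg_change = time_weighted_mean_change(months, letters)
```

    Here the eye ends at +18 letters but averages about +13.8 letters over 24 months. The early visits, when gains were smaller, also contribute to the area.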

  3. Statistical analysis of field data for aircraft warranties

    NASA Astrophysics Data System (ADS)

    Lakey, Mary J.

    Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions, which were then used to characterize how selected families of equipment typically fail. Families of similar equipment were identified in terms of function, technology and failure patterns. Several statistical methods were applied to characterize the distributions and their failure patterns. These included goodness-of-fit tests, maximum likelihood estimation and derivation of confidence intervals for the probability density function parameters. Statistical and reliability theory relevant to equipment design and operational failures also informed the characterization of the equipment families' failure patterns. Inferences about the families relevant to warranty needs were then made.
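
    The sketch below fits an exponential lifetime model to illustrate maximum likelihood estimation with a confidence interval for a failure-distribution parameter. The exponential choice, the complete (uncensored) failure times, and the data are assumptions for illustration. The abstract does not name the distributions used. Field warranty data are also often censored, which would change the likelihood.

```python
import math
from statistics import NormalDist


def exponential_rate_mle(times):
    """Failure-rate MLE for complete (uncensored) exponential lifetimes: n / total time."""
    n = len(times)
    if n == 0 or any(t <= 0 for t in times):
        raise ValueError("need at least one positive failure time")
    return n / sum(times)


def rate_interval(rate, n, level=0.95):
    """Large-sample interval built on the log scale, where SE(log rate) is about 1/sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z / math.sqrt(n)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical operating hours to failure for one equipment family
hours = [120, 340, 75, 510, 260, 190, 430, 95]
rate = exponential_rate_mle(hours)
low, high = rate_interval(rate, len(hours))
mtbf = 1 / rate  # mean time between failures implied by the exponential model
```

    The interval is built on the log scale so that its lower bound stays positive, which a symmetric interval on the rate itself does not guarantee for small samples.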

  4. Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points

    ERIC Educational Resources Information Center

    Ekol, George

    2015-01-01

    This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…

  5. Study/experimental/research design: much more than statistics.

    PubMed

    Knight, Kenneth L

    2010-01-01

    The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.

  6. Study/Experimental/Research Design: Much More Than Statistics

    PubMed Central

    Knight, Kenneth L.

    2010-01-01

    Abstract Context: The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes “Methods” sections hard to read and understand. Objective: To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. Description: The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Advantages: Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results. PMID:20064054

  7. Interim analyses in 2 x 2 crossover trials.

    PubMed

    Cook, R J

    1995-09-01

    A method is presented for performing interim analyses in long term 2 x 2 crossover trials with serial patient entry. The analyses are based on a linear statistic that combines data from individuals observed for one treatment period with data from individuals observed for both periods. The coefficients in this linear combination can be chosen quite arbitrarily, but we focus on variance-based weights to maximize power for tests regarding direct treatment effects. The type I error rate of this procedure is controlled by utilizing the joint distribution of the linear statistics over analysis stages. Methods for performing power and sample size calculations are indicated. A two-stage sequential design involving simultaneous patient entry and a single between-period interim analysis is considered in detail. The power and average number of measurements required for this design are compared to those of the usual crossover trial. The results indicate that, while there is minimal loss in power relative to the usual crossover design in the absence of differential carry-over effects, the proposed design can have substantially greater power when differential carry-over effects are present. The two-stage crossover design can also lead to more economical studies in terms of the expected number of measurements required, due to the potential for early stopping. Attention is directed toward normally distributed responses.
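
    Two estimates of the same treatment effect can be combined with variance-based weights. This is a generic minimum-variance combination that assumes two independent unbiased estimates. Hypothetically, one comes from patients observed only in period 1 and one from patients observed in both periods. It is not the paper's exact statistic. It also omits the group-sequential boundary that the abstract derives from the joint distribution across stages.

```python
import math


def combine_independent(est_a, var_a, est_b, var_b):
    """Minimum-variance linear combination of two independent unbiased estimates."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)
    w_b = 1 - w_a
    estimate = w_a * est_a + w_b * est_b
    variance = 1 / (1 / var_a + 1 / var_b)
    return estimate, variance, (w_a, w_b)


# Hypothetical inputs: the period-1-only (between-subject) estimate is noisier than
# the within-subject estimate from patients observed in both periods.
est, var, weights = combine_independent(est_a=2.0, var_a=4.0, est_b=1.0, var_b=1.0)
z = est / math.sqrt(var)  # statistic that an interim analysis would compare with a boundary
```

    The noisier estimate receives weight 0.2 and the more precise one 0.8. The combined variance (0.8) is smaller than either input variance.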

  8. Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test

    NASA Technical Reports Server (NTRS)

    Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph

    1998-01-01

    The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.

  9. Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test

    NASA Technical Reports Server (NTRS)

    Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph

    1999-01-01

    The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.

  10. Statistical Evaluation of Molecular Contamination During Spacecraft Thermal Vacuum Test

    NASA Technical Reports Server (NTRS)

    Chen, Philip; Hedgeland, Randy; Montoya, Alex; Roman-Velazquez, Juan; Dunn, Jamie; Colony, Joe; Petitto, Joseph

    1997-01-01

    The purpose of this paper is to evaluate the statistical molecular contamination data with a goal to improve spacecraft contamination control. The statistical data was generated in typical thermal vacuum tests at the National Aeronautics and Space Administration, Goddard Space Flight Center (GSFC). The magnitude of material outgassing was measured using a Quartz Crystal Microbalance (QCM) device during the test. A solvent rinse sample was taken at the conclusion of each test. Then detailed qualitative and quantitative measurements were obtained through chemical analyses. All data used in this study encompassed numerous spacecraft tests in recent years.

  11. Dissecting the genetics of complex traits using summary association statistics

    PubMed Central

    Pasaniuc, Bogdan; Price, Alkes L.

    2017-01-01

    During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases. PMID:27840428

  12. Development of Brassica oleracea-nigra monosomic alien addition lines: genotypic, cytological and morphological analyses.

    PubMed

    Tan, Chen; Cui, Cheng; Xiang, Yi; Ge, Xianhong; Li, Zaiyun

    2017-12-01

    We report the development and characterization of Brassica oleracea-nigra monosomic alien addition lines (MAALs) to dissect the Brassica B genome. Brassica nigra (2n = 16, BB) represents the diploid Brassica B genome, which carries many genes and traits useful for breeding but has received limited study. To dissect the B genome from B. nigra, a previously obtained triploid F1 hybrid (2n = 26, CCB) was used as the maternal parent. This hybrid came from the cross B. oleracea var. alboglabra (2n = 18, CC) × B. nigra, and it was backcrossed successively to the parental B. oleracea. The progenies in the BC1 to BC3 generations were analyzed using FISH and SSR markers. The aim was to screen for MAALs carrying each of the eight B-genome chromosomes added to the C genome (2n = 19, CC + 1B1-8). Seven different MAALs were established. The exception was the line with chromosome B2, which existed only in one triple addition. Most of these MAALs were morphologically distinguishable from each other, as they expressed characters from B. nigra differently and to variable extents. The alien chromosome remained unpaired as a univalent in 86.24% of pollen mother cells at diakinesis or metaphase I, and formed a trivalent with two C-genome chromosomes in 13.76% of cells. Transmission frequency of the added chromosomes was far higher through the ovules (14.40% on average) than through the pollen (2.64%). The B1, B4 and B5 chromosomes were transmitted through the female at much higher rates (22.38-30.00%) than the other four (B3, B6, B7, B8) (5.04-8.42%). The MAALs should be valuable for studying the genome structure and evolution of B. nigra.

  13. The Relationship between Statistics Self-Efficacy, Statistics Anxiety, and Performance in an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Schneider, William R.

    2011-01-01

    The purpose of this study was to determine the relationship between statistics self-efficacy, statistics anxiety, and performance in introductory graduate statistics courses. The study design compared two statistics self-efficacy measures developed by Finney and Schraw (2003), a statistics anxiety measure developed by Cruise and Wilkins (1980),…

  14. Chi-squared and C statistic minimization for low count per bin data

    NASA Astrophysics Data System (ADS)

    Nousek, John A.; Shue, David R.

    1989-07-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
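
    The difference between the two fit statistics in the low-count regime can be reproduced with a one-parameter toy fit. The sketch below assumes a constant (flat) Poisson model. Chi-squared variances are taken from the data, with empty bins assigned variance 1, which is one common convention among several. A brute-force grid search stands in for the Marquardt and Powell minimizers, and the counts are hypothetical.

```python
import math


def cash_c(counts, mu):
    """Cash C statistic for a constant Poisson model, dropping terms that depend only on the data."""
    return 2 * sum(mu - n * math.log(mu) for n in counts)


def data_weighted_chi2(counts, mu):
    """Chi-squared with variances from the data; empty bins get variance 1 (one common convention)."""
    return sum((n - mu) ** 2 / max(n, 1) for n in counts)


def grid_minimize(f, lo, hi, steps=20_000):
    """Brute-force 1D grid search, adequate for a single-parameter demo."""
    best_x, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x


counts = [0, 1, 3, 0, 2, 1, 0, 4, 1, 2]  # hypothetical low-count bins
mu_c = grid_minimize(lambda m: cash_c(counts, m), 0.01, 10.0)
mu_chi2 = grid_minimize(lambda m: data_weighted_chi2(counts, m), 0.01, 10.0)
print(f"C-statistic fit: {mu_c:.3f}, chi-squared fit: {mu_chi2:.3f}")
```

    For a constant model, minimizing C returns the sample mean (1.4 counts per bin here). The data-weighted chi-squared fit is pulled down to about 0.92 because bins with low counts receive the largest weights.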

  15. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
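
    For a binary outcome such as SSI, the C-index is the probability that a randomly chosen patient with an infection has a higher predicted risk than a randomly chosen patient without one. Ties count as one half, making this the same quantity as the area under the ROC curve. The pairwise sketch below uses hypothetical risks. It runs in O(n²) time, so a cohort the size of the one above would call for a rank-based implementation.

```python
def c_index(scores, outcomes):
    """Concordance for a binary outcome: P(score_case > score_noncase), ties count 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted SSI risks and observed infections (1 = SSI)
risk = [0.02, 0.10, 0.05, 0.30, 0.07, 0.05]
ssi = [0, 1, 0, 1, 0, 1]
score = c_index(risk, ssi)
```

    Of the nine case/non-case pairs, seven are concordant and one is tied, giving 7.5 / 9 ≈ 0.83.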

  16. Advanced Statistical Analyses to Reduce Inconsistency of Bond Strength Data.

    PubMed

    Minamino, T; Mine, A; Shintani, A; Higashi, M; Kawaguchi-Uemura, A; Kabetani, T; Hagino, R; Imai, D; Tajiri, Y; Matsumoto, M; Yatani, H

    2017-11-01

    This study was designed to clarify the interrelationship of factors that affect the value of microtensile bond strength (µTBS). It focused on nondestructive testing, by which information about the specimens can be stored and quantified. µTBS test specimens were prepared from 10 noncarious human molars. Six factors of µTBS test specimens were evaluated:
    - presence of voids at the interface,
    - X-ray absorption coefficient of resin,
    - X-ray absorption coefficient of dentin,
    - length of dentin part,
    - size of adhesion area, and
    - individual differences between teeth.
    All specimens were observed nondestructively by optical coherence tomography and micro-computed tomography before µTBS testing. After µTBS testing, the effect of these factors on µTBS data was analyzed with 95% confidence intervals using three models: a general linear model, a linear mixed effects regression model, and a nonlinear regression model. The general linear model showed a significant effect of individual differences between teeth (P < 0.001). A significantly positive correlation was shown between µTBS and length of dentin part (P < 0.001); however, there was no significant nonlinearity (P = 0.157). Moreover, a significantly negative correlation was observed between µTBS and size of adhesion area (P = 0.001), with significant nonlinearity (P = 0.014). No correlation was observed between µTBS and X-ray absorption coefficient of resin (P = 0.147), and there was no significant nonlinearity (P = 0.089). Additionally, a significantly positive correlation was observed between µTBS and X-ray absorption coefficient of dentin (P = 0.022), with significant nonlinearity (P = 0.036). Linear mixed effects regression analysis also showed a significant difference between the presence and absence of voids. Our results showed correlations between various parameters of tooth specimens and µTBS data. To evaluate the performance of the adhesive more precisely, the effect of tooth variability and a method to reduce

  17. NASA Pocket Statistics

    NASA Technical Reports Server (NTRS)

    1995-01-01

    NASA Pocket Statistics is published for the use of NASA managers and their staff. Included herein are Administrative and Organizational information, summaries of Space Flight Activity including the NASA Major Launch Record, and NASA Procurement, Financial, and Manpower data. The NASA Major Launch Record includes all launches of Scout class and larger vehicles. Vehicle and spacecraft development flights are also included in the Major Launch Record. Shuttle missions are counted as one launch and one payload when no free-flying payloads are involved. Satellites deployed from the cargo bay of the Shuttle and placed in a separate orbit or trajectory are counted as an additional payload.

  18. NASA Pocket Statistics

    NASA Technical Reports Server (NTRS)

    1994-01-01

    Pocket Statistics is published for the use of NASA managers and their staff. Included herein are Administrative and Organizational information, summaries of Space Flight Activity including the NASA Major Launch Record, and NASA Procurement, Financial, and Manpower data. The NASA Major Launch Record includes all launches of Scout class and larger vehicles. Vehicle and spacecraft development flights are also included in the Major Launch Record. Shuttle missions are counted as one launch and one payload when no free-flying payloads are involved. Satellites deployed from the cargo bay of the Shuttle and placed in a separate orbit or trajectory are counted as an additional payload.

  19. Positive pressure--analysing the effect of the addition of non-invasive ventilation (NIV) to home airway clearance techniques (ACT) in adult cystic fibrosis (CF) patients.

    PubMed

    Stanford, Gemma; Parrott, Helen; Bilton, Diana; Agent, Penny

    2015-05-01

    There is no published literature on the frequency of use of non-invasive ventilation (NIV) with airway clearance techniques (ACT) throughout the cystic fibrosis (CF) population; registry data report that 3.9% (191 people of 5062 registered) of the United Kingdom CF population older than 16 years use NIV; however, it is not specified whether this is for ACT or for respiratory failure. Using NIV with ACT decreases work of breathing and fatigue during in-patient admissions for CF patients. We hypothesised that these effects could be replicated at home, potentially reducing hospital admissions. Fourteen adult patients with CF scored ease of clearance and breathlessness with ACT, using a visual analog scale, before and after NIV was added to their normal ACT routine. Patient views on NIV with ACT were collected via a structured interview. The number of home intravenous (IV) antibiotic courses and days in hospital were collected for one year before and one year after NIV provision. Patients reported statistically significant improvements in ease of clearance (p = 0.011) and reduced breathlessness during ACT using NIV (p = 0.011). Structured interview results indicated that patients perceived improved sputum clearance. After NIV was set up, in-patient days were lower and home IV days were higher, although neither difference was statistically significant. This study is limited by small numbers; however, trends towards fewer hospital admissions and greater patient ease while using NIV with ACT warrant further investigation.

  20. The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.

    PubMed

    Christensen, G B; Knight, S; Camp, N J

    2009-11-01

    We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
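    Both statistics are simple sums over pedigrees at a locus, and the idea of a permutation null can be sketched in a few lines. The following is a minimal Python sketch, assuming an LOD matrix with one row per pedigree and one column per locus. The nominal-significance cutoff (LOD >= 0.588, roughly a pointwise p of 0.05) and the rotation-based shuffle are illustrative assumptions; they are not the authors' published genome-shuffling procedure.

```python
import random
from typing import List, Sequence

# Assumed cutoff: LOD >= 0.588 is roughly a one-sided pointwise p of 0.05.
# The abstract does not state the cutoff the authors used.
NOMINAL_LOD = 0.588


def sum_link(lods: Sequence[float], threshold: float = NOMINAL_LOD) -> float:
    """sumLINK at one locus: sum of LODs from pedigrees at or above the cutoff."""
    return sum(lod for lod in lods if lod >= threshold)


def sum_lod(lods: Sequence[float]) -> float:
    """sumLOD at one locus: sum of all positive LODs."""
    return sum(lod for lod in lods if lod > 0)


def sumlink_profile(lod_matrix: List[List[float]], threshold: float = NOMINAL_LOD) -> List[float]:
    """lod_matrix[p][k] is the multipoint LOD of pedigree p at locus k."""
    n_loci = len(lod_matrix[0])
    return [sum_link([row[k] for row in lod_matrix], threshold) for k in range(n_loci)]


def rotate_each_pedigree(lod_matrix: List[List[float]], rng: random.Random) -> List[List[float]]:
    """Hypothetical stand-in for genome shuffling: rotate each pedigree's
    LOD profile by an independent random offset. This breaks the alignment
    of peaks across pedigrees but keeps each pedigree's own profile intact."""
    shuffled = []
    for row in lod_matrix:
        k = rng.randrange(len(row))
        shuffled.append(list(row[k:]) + list(row[:k]))
    return shuffled


def empirical_p(lod_matrix: List[List[float]], n_perm: int = 999, seed: int = 0) -> float:
    """Genome-wide empirical p-value for the maximum observed sumLINK."""
    rng = random.Random(seed)
    observed = max(sumlink_profile(lod_matrix))
    exceed = sum(
        max(sumlink_profile(rotate_each_pedigree(lod_matrix, rng))) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

    For example, with pedigree LODs of 1.2, 0.3, -0.5 and 0.7 at one locus, `sum_link` keeps only 1.2 and 0.7 (0.3 is below the assumed cutoff), while `sum_lod` adds all three positive values. Negative evidence from unlinked pedigrees never lowers either statistic, which matches the description above.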

  1. Truths, lies, and statistics.

    PubMed

    Thiese, Matthew S; Walker, Skyler; Lindsey, Jenna

    2017-10-01

    Distribution of valuable research discoveries is needed for the continual advancement of patient care. Publication of, and subsequent reliance on, false study results would be detrimental to patient care. Unfortunately, research misconduct may originate from many sources. While there is evidence of ongoing research misconduct in all its forms, it is challenging to identify the actual occurrence of research misconduct, especially in clinical trials. Research misconduct is difficult to measure, and few studies report the prevalence or underlying causes of research misconduct among biomedical researchers. Reported prevalence estimates of misconduct, which are probably underestimates, range from 0.3% to 4.9%. There have been efforts to measure the prevalence of research misconduct; however, the relatively few published studies are not readily comparable because of varying characterizations of research misconduct and of the methods used for data collection. Some signs may point to an increased possibility of research misconduct; however, there is a need for continued self-policing by biomedical researchers. Existing resources can help ensure appropriate statistical methods and prevent other types of research fraud. These include the "Statistical Analyses and Methods in the Published Literature", also known as the SAMPL guidelines, which help scientists determine the appropriate way to report various statistical methods; the "Strengthening Analytical Thinking for Observational Studies", or STRATOS, which emphasizes the execution and interpretation of results; and the Committee on Publication Ethics (COPE), which was created in 1997 to provide guidance on publication ethics. COPE has a set of views and strategies grounded in the values of honesty and accuracy.

  2. Relationship between Graduate Students' Statistics Self-Efficacy, Statistics Anxiety, Attitude toward Statistics, and Social Support

    ERIC Educational Resources Information Center

    Perepiczka, Michelle; Chandler, Nichelle; Becerra, Michael

    2011-01-01

    Statistics plays an integral role in graduate programs. However, numerous intra- and interpersonal factors may lead to successful completion of needed coursework in this area. The authors examined the extent of the relationship between self-efficacy to learn statistics and statistics anxiety, attitude towards statistics, and social support of 166…

  3. [Clinical research XXIII. From clinical judgment to meta-analyses].

    PubMed

    Rivas-Ruiz, Rodolfo; Castelán-Martínez, Osvaldo D; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Noyola-Castillo, Maura E; Talavera, Juan O

    2014-01-01

    Systematic reviews (SR) are studies conducted to answer clinical questions on the basis of original articles. Meta-analysis (MTA) is the mathematical analysis of an SR. These analyses are divided into two groups: those that evaluate the measured results of quantitative variables (for example, the body mass index, BMI) and those that evaluate qualitative variables (for example, whether a patient is alive or dead, or whether he is healing or not). Quantitative variables are generally analyzed with the mean difference, whereas qualitative variables can be analyzed with several measures: odds ratio (OR), relative risk (RR), absolute risk reduction (ARR), and hazard ratio (HR). These analyses are represented through forest plots, which allow the evaluation of each individual study, as well as of the heterogeneity between studies and the overall effect of the intervention. These analyses are mainly based on Student's t test and the chi-squared test. To make appropriate decisions based on an MTA, it is important to understand the characteristics of the statistical methods in order to avoid misinterpretations.
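    The binary-outcome measures named above can be computed from a 2×2 table, and study-level log odds ratios can be pooled by inverse-variance weighting, which is what the summary estimate of a fixed-effect forest plot typically shows. The following is a minimal sketch, assuming no zero cells (so no continuity correction) and a fixed-effect model; Cochran's Q is included as one common heterogeneity statistic. The hazard ratio requires time-to-event data and is not shown.

```python
import math
from typing import Dict, List, Tuple

# (events_treated, total_treated, events_control, total_control)
Table = Tuple[int, int, int, int]


def effect_measures(t: Table) -> Dict[str, float]:
    """OR, RR and ARR for one 2x2 table (all cells assumed non-zero)."""
    a, n1, c, n2 = t
    b, d = n1 - a, n2 - c          # non-events in each arm
    risk_t, risk_c = a / n1, c / n2
    return {
        "OR": (a * d) / (b * c),
        "RR": risk_t / risk_c,
        "ARR": risk_c - risk_t,    # absolute risk reduction: control minus treated
    }


def pooled_log_or(tables: List[Table]) -> Tuple[float, Tuple[float, float], float]:
    """Fixed-effect inverse-variance pooled OR, its 95% CI, and Cochran's Q."""
    ys, ws = [], []
    for a, n1, c, n2 in tables:
        b, d = n1 - a, n2 - c
        ys.append(math.log((a * d) / (b * c)))
        ws.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # Woolf variance of log OR
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    q = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    ci = (math.exp(y_bar - 1.96 * se), math.exp(y_bar + 1.96 * se))
    return math.exp(y_bar), ci, q
```

    For instance, 10 events in 100 treated patients versus 20 in 100 controls gives RR = 0.5, ARR = 0.1 and OR = 800/1800 (about 0.44). Pooling identical studies returns the same OR with Q = 0, i.e. no heterogeneity.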

  4. Kaolinite flocculation induced by smectite addition - a transmission X-ray microscopic study.

    PubMed

    Zbik, Marek S; Song, Yen-Fang; Frost, Ray L

    2010-09-01

    The influence of smectite addition on kaolinite suspensions in water was investigated by transmission X-ray microscopy (TXM) and scanning electron microscopy (SEM). Sedimentation test screening was also conducted. Micrographs were processed by the STatistic IMage Analysing (STIMAN) program and structural parameters were calculated. The sedimentation tests showed that small smectite additions, of up to about 3 wt.%, have an important influence on kaolinite suspension flocculation. To determine the reason for this smectite impact on kaolinite suspensions, a micro-structural examination of their macroscopic behaviour was undertaken using TXM and SEM. TXM and SEM micrographs of freeze-dried kaolinite-smectite suspensions with up to 20% smectite showed a highly oriented fabric made of highly oriented particles, with the greatest density when 3 wt.% of smectite was added to the 10 wt.% dense kaolinite suspension. In contrast, suspensions containing pure kaolinite do not show such mutual platelet orientation but a homogeneous network of randomly oriented kaolinite platelets. This suggests that in kaolinite-smectite suspensions, smectite forms a highly oriented basic framework to which kaolinite platelets may bond in preferential face-to-face contacts, strengthening the structure and allowing it to show plastic behaviour, which causes the platelet orientation. Copyright 2010 Elsevier Inc. All rights reserved.

  5. Lightning NOx Statistics Derived by NASA Lightning Nitrogen Oxides Model (LNOM) Data Analyses

    NASA Technical Reports Server (NTRS)

    Koshak, William; Peterson, Harold

    2013-01-01

    What is the LNOM? The NASA Marshall Space Flight Center (MSFC) Lightning Nitrogen Oxides Model (LNOM) [Koshak et al., 2009, 2010, 2011; Koshak and Peterson 2011, 2013] analyzes VHF Lightning Mapping Array (LMA) and National Lightning Detection Network™ (NLDN) data to estimate the lightning nitrogen oxides (LNOx) produced by individual flashes. Figure 1 provides an overview of LNOM functionality. Benefits of LNOM: (1) Does away with unrealistic "vertical stick" lightning channel models for estimating LNOx; (2) Uses ground-based VHF data that map out the true channel in space and time to < 100 m accuracy; (3) Therefore, the true channel segment height (ambient air density) is used to compute LNOx; (4) The true channel length is used (typically tens of kilometers, since the channel has many branches and "wiggles"); (5) Ground and cloud flashes are distinguished; (6) For ground flashes, the actual peak current from NLDN is used to compute NOx from the lightning return stroke; (7) NOx is computed for several other lightning discharge processes (based on Cooray et al., 2009 theory): (a) hot core of stepped leaders and dart leaders, (b) corona sheath of stepped leader, (c) K-change, (d) continuing currents, and (e) M-components; and (8) LNOM statistics (see later) can be used to parameterize LNOx production for regional air quality models (like CMAQ) and for global chemical transport models (like GEOS-Chem).

  6. Phospholipid and Respiratory Quinone Analyses From Extreme Environments

    NASA Astrophysics Data System (ADS)

    Pfiffner, S. M.

    2008-12-01

    Extreme environments on Earth have been chosen as surrogate sites to test methods and strategies for the deployment of spacecraft in the search for extraterrestrial life. Surrogate sites for many of the NASA astrobiology institutes include the South African gold mines, Canadian subpermafrost, Atacama Desert, and acid rock drainage. Soils, sediments, rock cores, fracture waters, biofilms, and service and drill waters represent the types of samples collected from these sites. These samples were analyzed by gas chromatography mass spectrometry for phospholipid fatty acid methyl esters and by high performance liquid chromatography atmospheric pressure chemical ionization tandem mass spectrometry for respiratory quinones. Phospholipid analyses provided estimates of biomass, community composition, and compositional changes related to nutritional limitations or exposure to toxic conditions. Similar to phospholipid analyses, respiratory quinone analyses afforded identification of certain types of microorganisms in the community based on respiration and offered clues to in situ redox conditions. Depending on the number of samples analyzed, selected multivariate statistical methods were applied to relate membrane lipid results with site biogeochemical parameters. Successful detection of life signatures and refinement of methodologies at surrogate sites on Earth will be critical for the recognition of extraterrestrial life. At this time, membrane lipid analyses provide useful information not easily obtained by other molecular techniques.

  7. Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.

    PubMed

    Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

    2011-10-15

    This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and concentrations below the limit of detection prevent the application of standard statistics. A working example is taken from an environmental leaching study that was set up to determine whether there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative-treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, owing to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data were somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, Statistical Multiple Imputation (SMI) was applied, and the data were re-analysed successfully. Using SMI was shown not to affect the variance in the data, while it facilitated analysis of the complete dataset. Copyright © 2011 Elsevier B.V. All rights reserved.
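    The abstract does not describe how results from the imputed datasets were combined. One standard approach is Rubin's rules, sketched below as an assumption rather than as the authors' procedure; the imputation model itself (for example, how values below the detection limit are drawn) is not shown.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m completed-data analyses with Rubin's rules.

    estimates[i] is the point estimate from the i-th imputed dataset and
    variances[i] is its squared standard error.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t
```

    With three imputations giving estimates 1.0, 2.0 and 3.0, each with variance 0.5, the pooled estimate is 2.0 and the total variance is 0.5 + (4/3) × 1.0. The between-imputation term is what carries the extra uncertainty caused by the missing values.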

  8. Evidence-based orthodontics. Current statistical trends in published articles in one journal.

    PubMed

    Law, Scott V; Chudasama, Dipak N; Rinchuse, Donald J

    2010-09-01

    To ascertain the number, type, and overall usage of statistics in American Journal of Orthodontics and Dentofacial Orthopedics (AJODO) articles for 2008. These data were then compared to data from three previous years: 1975, 1985, and 2003. Original articles published in AJODO in 2008 were dichotomized into those using statistics and those not using statistics, and the frequency and distribution of the statistics used were recorded. Statistical procedures were then broadly divided into descriptive statistics (mean, standard deviation, range, percentage) and inferential statistics (t-test, analysis of variance). Descriptive statistics were used to make comparisons. In 1975, 1985, 2003, and 2008, AJODO published 72, 87, 134, and 141 original articles, respectively. The percentage of original articles using statistics was 43.1% in 1975, 75.9% in 1985, 94.0% in 2003, and 92.9% in 2008; the share of original articles using statistics stayed relatively constant from 2003 to 2008, with only a small decrease of 1.1 percentage points. The percentage of articles using inferential statistical analyses was 23.7% in 1975, 74.2% in 1985, 92.9% in 2003, and 84.4% in 2008. Comparing AJODO publications in 2003 and 2008, there was an increase of 8.5 percentage points in descriptive articles (from 7.1% to 15.6%) and a decrease of 8.5 percentage points in articles using inferential statistics (from 92.9% to 84.4%).
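    The 2003-to-2008 changes reported above are differences between two percentages, so they are absolute changes in percentage points rather than relative changes. A small sketch of the distinction, using only the figures from the abstract:

```python
from typing import Tuple


def change(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute change in percentage points, relative change as a fraction)."""
    diff = after_pct - before_pct
    return diff, diff / before_pct


# Shares of AJODO original articles, 2003 -> 2008 (figures from the abstract)
descriptive = change(7.1, 15.6)    # about +8.5 points, roughly a 120% relative increase
inferential = change(92.9, 84.4)   # about -8.5 points, roughly a 9% relative decrease
```

    The same 8.5-point shift is large relative to the small descriptive-only share and modest relative to the large inferential share.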

  9. On an additive partial correlation operator and nonparametric estimation of graphical models.

    PubMed

    Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu

    2016-09-01

    We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.
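    The additive partial correlation operator relies on additive function spaces and estimation machinery that the abstract does not spell out. For orientation only, the sketch below computes the classical linear partial correlation that the operator generalizes: the correlation of X and Y after Z has been linearly removed from both. It is not the authors' estimator.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_from_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))
```

    For example, if x = z + e and y = z + f, where e and f are uncorrelated with each other and with z, then x and y are correlated only through z, and their partial correlation given z is zero. Under Gaussian assumptions this zero corresponds to conditional independence. The additive operator is designed to capture the same idea when those assumptions fail.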

  10. On an additive partial correlation operator and nonparametric estimation of graphical models

    PubMed Central

    Li, Bing; Zhao, Hongyu

    2016-01-01

    We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689

  11. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

    PubMed Central

    Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

    2016-01-01

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062

  12. Predation and fragmentation portrayed in the statistical structure of prey time series

    PubMed Central

    Hendrichsen, Ditte K; Topping, Chris J; Forchhammer, Mads C

    2009-01-01

    Background Statistical autoregressive analyses of direct and delayed density dependence are widespread in ecological research. The models suggest that changes in ecological factors affecting density dependence, like predation and landscape heterogeneity, are directly portrayed in the first- and second-order autoregressive parameters, and the models are therefore used to decipher complex biological patterns. However, independent tests of model predictions are complicated by the inherent variability of natural populations, where differences in landscape structure, climate or species composition prevent controlled repeated analyses. To circumvent this problem, we applied second-order autoregressive time series analyses to data generated by a realistic agent-based computer model. The model simulated life history decisions of individual field voles under controlled variations in predator pressure and landscape fragmentation. Analyses were made on three levels: comparisons between predated and non-predated populations, between populations exposed to different types of predators, and between populations experiencing different degrees of habitat fragmentation. Results The results are unambiguous: changes in landscape fragmentation and the numerical response of predators are clearly portrayed in the statistical time series structure, as predicted by the autoregressive model. Populations without predators displayed significantly stronger negative direct density dependence than those exposed to predators, where direct density dependence was only moderately negative. Predation versus no predation had an even stronger effect on the delayed density dependence of the simulated prey populations. In non-predated prey populations, the coefficients of delayed density dependence were distinctly positive, whereas they were negative in predated populations. Similarly, increasing the degree of fragmentation of optimal habitat available to the prey was accompanied with a

  13. Survey of the Methods and Reporting Practices in Published Meta-analyses of Test Performance: 1987 to 2009

    ERIC Educational Resources Information Center

    Dahabreh, Issa J.; Chung, Mei; Kitsios, Georgios D.; Terasawa, Teruhiko; Raman, Gowri; Tatsioni, Athina; Tobar, Annette; Lau, Joseph; Trikalinos, Thomas A.; Schmid, Christopher H.

    2013-01-01

    We performed a survey of meta-analyses of test performance to describe the evolution in their methods and reporting. Studies were identified through MEDLINE (1966-2009), reference lists, and relevant reviews. We extracted information on clinical topics, literature review methods, quality assessment, and statistical analyses. We reviewed 760…

  14. Isotropy analyses of the Planck convergence map

    NASA Astrophysics Data System (ADS)

    Marques, G. A.; Novaes, C. P.; Bernui, A.; Ferreira, I. S.

    2018-01-01

    The presence of matter in the path of relic photons causes distortions in the angular pattern of the cosmic microwave background (CMB) temperature fluctuations, modifying their properties in a slight but measurable way. Recently, the Planck Collaboration released the estimated convergence map, an integrated measure of the large-scale matter distribution that produced the weak gravitational lensing (WL) phenomenon observed in Planck CMB data. We perform exhaustive analyses of this convergence map by calculating the variance in small and large regions of the sky, excluding the area masked because of Galactic contamination, and compare them with the features expected in the set of simulated convergence maps, also released by the Planck Collaboration. Our goal is to search for sky directions or regions where WL imprints anomalous signatures on the variance estimator, as revealed by a χ² analysis at a statistically significant level. In the local analysis of the Planck convergence map, we identified eight patches of the sky that disagree at more than 2σ with the average of the simulations. In contrast, in the large-region analysis we found no statistically significant discrepancies, but, interestingly, the regions with the highest χ² values surround the ecliptic poles. Thus, our results show good agreement with the features expected by the Λ cold dark matter concordance model, as given by the simulations. Yet the outlier regions found here could suggest that the data still contain residual contamination, like noise, due to over- or underestimation of systematic effects in the simulation data set.
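    The paper's exact χ² construction is not given in the abstract. As a hypothetical illustration of comparing a data map with a simulation ensemble, the sketch below scores each patch by the squared deviation of its observed variance from the simulation mean, in units of the simulation standard deviation. This is a one-degree-of-freedom χ². The sketch then flags patches beyond 2σ. The input layout and the per-patch statistic are assumptions; masking and map-making are not shown.

```python
from statistics import mean, stdev
from typing import List, Sequence


def chi2_per_patch(observed: Sequence[float], simulated: Sequence[Sequence[float]]) -> List[float]:
    """observed[k]: variance estimate in patch k of the data map.
    simulated[s][k]: the same estimate in patch k of simulation s.
    Returns (obs - mean_sim)^2 / var_sim for every patch."""
    out = []
    for k, v in enumerate(observed):
        sims = [row[k] for row in simulated]
        mu, sd = mean(sims), stdev(sims)
        out.append(((v - mu) / sd) ** 2)
    return out


def flag_patches(observed: Sequence[float], simulated: Sequence[Sequence[float]],
                 n_sigma: float = 2.0) -> List[int]:
    """Indices of patches deviating from the simulation mean by more than n_sigma."""
    scores = chi2_per_patch(observed, simulated)
    return [k for k, c in enumerate(scores) if c > n_sigma ** 2]
```

    With three toy simulations whose patch values are 1, 2 and 3 (mean 2, standard deviation 1), an observed value of 2.5 scores 0.25 and is not flagged, while 5.0 scores 9 (3σ) and is flagged.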

  15. Mathematics authentic assessment on statistics learning: the case for student mini projects

    NASA Astrophysics Data System (ADS)

    Fauziah, D.; Mardiyana; Saputro, D. R. S.

    2018-03-01

    Mathematics authentic assessment is a form of meaningful measurement of student learning outcomes in the domains of attitude, skill and knowledge in mathematics. Attitude, skill and knowledge are constructed through the fulfilment of tasks that involve an active and creative role for the students. One type of authentic assessment is the student mini project, which runs from planning, data collecting, organizing, processing and analysing to presenting the data. The purpose of this research is to study how teachers use authentic assessments in statistics learning and to discuss specifically the use of mini projects to improve students' learning at school in Surakarta. This research is action research, in which data were collected through the assessment rubrics of student mini projects. Data analysis shows an average rubric score of 82 for the student mini projects, with 96% classical completeness. This study shows that the application of authentic assessment can improve students' mathematics learning outcomes. Findings showed that teachers and students participate actively during the teaching and learning process, both inside and outside of the school. Student mini projects also provide opportunities to interact with other people in a real context while collecting information and giving presentations to the community. Additionally, students are able to excel further in the process of statistics learning through authentic assessment.

  16. Survey of Attitudes Toward Statistics: Factor Structure Invariance by Gender and by Administration Time

    ERIC Educational Resources Information Center

    Hilton, Sterling C.; Schau, Candace; Olsen, Joseph A.

    2004-01-01

    In addition to student learning, positive student attitudes have become an important course outcome for many introductory statistics instructors. To adequately assess changes in mean attitudes across introductory statistics courses, the attitude instruments used should be invariant by administration time. Attitudes toward statistics from 4,910…

  17. Statistical study of air pollutant concentrations via generalized gamma distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marani, A.; Lavagnini, I.; Buttazzoni, C.

    1986-11-01

    This paper deals with modeling observed frequency distributions of air quality data measured in the area of Venice, Italy. The paper discusses the application of the generalized gamma distribution (ggd) which has not been commonly applied to air quality data notwithstanding the fact that it embodies most distribution models used for air quality analyses. The approach yields important simplifications for statistical analyses. A comparison among the ggd and other relevant models (standard gamma, Weibull, lognormal), carried out on daily sulfur dioxide concentrations in the area of Venice underlines the efficiency of ggd models in portraying experimental data.
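    To illustrate how one family can embody several common models, the sketch below uses Stacy's parameterization of the generalized gamma density, f(x; a, d, p) = p x^(d−1) exp(−(x/a)^p) / (a^d Γ(d/p)). This parameterization is an assumption, because the abstract does not state which one was used. Setting p = 1 gives the standard gamma, and d = p gives the Weibull. The lognormal is reached only as a limiting case and is not shown.

```python
import math


def gen_gamma_pdf(x: float, a: float, d: float, p: float) -> float:
    """Stacy's generalized gamma density: scale a > 0, shapes d, p > 0, x > 0."""
    return (p / a ** d) * x ** (d - 1) * math.exp(-((x / a) ** p)) / math.gamma(d / p)


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Standard gamma density with shape k and scale theta."""
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)


def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull density with shape k and scale lam."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


# p = 1 recovers the standard gamma; d = p recovers the Weibull.
x = 3.0
print(gen_gamma_pdf(x, a=2.0, d=2.5, p=1.0), gamma_pdf(x, k=2.5, theta=2.0))
print(gen_gamma_pdf(x, a=2.0, d=1.7, p=1.7), weibull_pdf(x, k=1.7, lam=2.0))
```

    Each printed pair should agree to floating-point precision. Fitting one three-parameter family and checking whether its shape parameters approach these special cases is one way to compare the candidate models in a single framework.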

  18. Mars: Noachian hydrology by its statistics and topology

    NASA Technical Reports Server (NTRS)

    Cabrol, N. A.; Grin, E. A.

    1993-01-01

    Discrimination between fluvial features generated by surface drainage and those generated by subsurface aquifer discharges will provide clues to the understanding of early Mars' climatic history. Our approach is to define the process of formation of the oldest fluvial valleys by statistical and topological analyses. Formation of fluvial valley systems reached its highest statistical concentration during the Noachian Period. Nevertheless, they are a scarce phenomenon in Martian history, localized on the cratered uplands, and subject to a latitudinal distribution. They occur sparsely on Noachian geological units with a low distribution density, and appear in small isolated areas (around 5 × 10³ km²) filled by short streams (100-300 km long). Topological analysis of the internal organization of 71 surveyed Noachian fluvial valley networks also provides information on the mechanisms of formation.

  19. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background Metagenomics, the sequence analysis of genomic DNA isolated directly from the environment, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenomes when compared to non-redundant databases were identified. Conclusion The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025

  20. An application of statistics to comparative metagenomics.

    PubMed

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-03-20

    Metagenomics, the sequence analysis of genomic DNA isolated directly from the environment, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenomes when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.

  1. 2017 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report

    DTIC Science & Technology

    2018-04-30

    2017 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report. Office of People Analytics (OPA), 4800 Mark Center Drive, Suite... Introduction: The Office of People Analytics’ Center for Health and Resilience (OPA[H&R

  2. [Continuity of hospital identifiers in hospital discharge data - Analysis of the nationwide German DRG Statistics from 2005 to 2013].

    PubMed

    Nimptsch, Ulrike; Wengler, Annelene; Mansky, Thomas

    2016-11-01

    In Germany, nationwide hospital discharge data (DRG statistics provided by the research data centers of the Federal Statistical Office and the Statistical Offices of the 'Länder') are increasingly used as a data source for health services research. Within these data, hospitals can be distinguished by their hospital identifier ([Institutionskennzeichen] IK). However, this hospital identifier primarily designates the invoicing unit and is not necessarily equivalent to one hospital location. To investigate the direction and extent of possible bias in hospital-level analyses, this study examines the continuity of the hospital identifier in a cross-sectional and longitudinal approach and compares the results to official hospital census statistics. Within the DRG statistics from 2005 to 2013, the annual number of hospitals as classified by hospital identifiers was counted for each year of observation. The annual number of hospitals derived from DRG statistics was compared to the number of hospitals in the official census statistics 'Grunddaten der Krankenhäuser' (basic hospital data). Subsequently, the temporal continuity of hospital identifiers in the DRG statistics was analyzed within cohorts of hospitals. By 2013, the annual number of hospital identifiers in the DRG statistics had fallen by 175 (from 1,725 to 1,550). This decline affected only providers with small or medium case volume. The number of hospitals identified in the DRG statistics was lower than the number given in the census statistics (e.g., in 2013, 1,550 IK vs. 1,668 hospitals in the census statistics). The longitudinal analyses revealed that the majority of hospital identifiers persisted over the years of observation, while one fifth of hospital identifiers changed. In cross-sectional studies of German hospital discharge data, separating hospitals by their hospital identifier might lead to underestimating the number of hospitals and consequently overestimating the caseload per hospital. Discontinuities of hospital

  3. Statistical assessment of changes in extreme maximum temperatures over Saudi Arabia, 1985-2014

    NASA Astrophysics Data System (ADS)

    Raggad, Bechir

    2018-05-01

    In this study, two statistical approaches were adopted in the analysis of observed maximum temperature data collected from fifteen stations over Saudi Arabia during the period 1985-2014. In the first step, the behavior of extreme temperatures was analyzed and their changes were quantified with respect to the indices of the Expert Team on Climate Change Detection, Monitoring and Indices. The results showed a general warming trend in maximum temperature-related indices at most stations during the period of analysis. In the second step, stationary and non-stationary extreme-value analyses were conducted for the temperature data. The results revealed that the non-stationary model with an increasing linear trend in its location parameter outperforms the other models for two-thirds of the stations. Additionally, the 10-, 50-, and 100-year return levels were found to change considerably with time, and the maximum temperature could start to reappear in the different T-year return period for most stations. This analysis shows the importance of taking into account the change over time in the estimation of return levels and therefore justifies the use of the non-stationary generalized extreme value distribution model to describe most of the data. Furthermore, these findings are in line with the significant warming trends found in the climate indices analyses.
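
    To make the idea of a time-varying return level concrete, the sketch below computes the T-year return level (the 1 - 1/T quantile) of a generalized extreme value distribution whose location parameter drifts linearly, mu(t) = mu0 + mu1 * t, as in the non-stationary model described above. It uses the standard GEV parameterisation; the parameter values are hypothetical and are not the fitted values from this study.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, period: float) -> float:
    """T-year return level of a GEV(mu, sigma, xi) distribution: its 1 - 1/T quantile."""
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def nonstationary_return_level(mu0: float, mu1: float, sigma: float, xi: float,
                               period: float, t: float) -> float:
    """Return level when only the location drifts linearly in time: mu(t) = mu0 + mu1 * t."""
    return gev_return_level(mu0 + mu1 * t, sigma, xi, period)


# Hypothetical parameters (degrees C; t counts years from the first year of the record)
for period in (10, 50, 100):
    start = nonstationary_return_level(44.0, 0.03, 1.2, -0.2, period, t=0)
    end = nonstationary_return_level(44.0, 0.03, 1.2, -0.2, period, t=29)
    print(f"{period:>3}-year level: {start:.2f} -> {end:.2f}")
```

    Because only the location moves, every return level shifts by mu1 * t; a model with a time-varying scale would also widen or narrow the gap between the 10- and 100-year levels.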

  4. Harmonic statistics

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2017-05-01

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, yet they fall short in their 'public relations' for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford's law, and 1/f noise.
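
    A Poisson process on the positive half-line with harmonic intensity c/x cannot be simulated on the whole half-line (it has infinitely many points near 0 and near infinity), but it can be simulated on any window [a, b]. The sketch below does so under these assumptions: the constant c, the windows, and the use of Knuth's Poisson sampler are illustrative choices, not taken from the paper. The expected count on [a, b] is c ln(b/a), so windows related by scaling have the same count distribution, which is the scale invariance mentioned above.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def harmonic_poisson_points(c: float, a: float, b: float, rng: random.Random) -> list[float]:
    """Points of a Poisson process with intensity c / x, observed on the window [a, b]."""
    # Expected count = integral of c / x over [a, b] = c * ln(b / a)
    n = sample_poisson(c * math.log(b / a), rng)
    # Given the count, points are i.i.d. with density proportional to 1/x (log-uniform),
    # generated by inverting the CDF ln(x / a) / ln(b / a)
    return sorted(a * (b / a) ** rng.random() for _ in range(n))


rng = random.Random(42)
runs = 20_000
# Scale invariance: [1, 10] and [100, 1000] share the expected count 2 * ln(10)
low = sum(len(harmonic_poisson_points(2.0, 1.0, 10.0, rng)) for _ in range(runs)) / runs
high = sum(len(harmonic_poisson_points(2.0, 100.0, 1000.0, rng)) for _ in range(runs)) / runs
print(f"mean counts: {low:.3f} and {high:.3f}; theory {2.0 * math.log(10.0):.3f}")
```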

  5. Statistical word learning in children with autism spectrum disorder and specific language impairment.

    PubMed

    Haebig, Eileen; Saffran, Jenny R; Ellis Weismer, Susan

    2017-11-01

    Word learning is an important component of language development that influences child outcomes across multiple domains. Despite the importance of word knowledge, word-learning mechanisms are poorly understood in children with specific language impairment (SLI) and children with autism spectrum disorder (ASD). This study examined underlying mechanisms of word learning, specifically, statistical learning and fast-mapping, in school-aged children with typical and atypical development. Statistical learning was assessed through a word segmentation task and fast-mapping was examined in an object-label association task. We also examined children's ability to map meaning onto newly segmented words in a third task that combined exposure to an artificial language and a fast-mapping task. Children with SLI had poorer performance on the word segmentation and fast-mapping tasks relative to the typically developing and ASD groups, who did not differ from one another. However, when children with SLI were exposed to an artificial language with phonemes used in the subsequent fast-mapping task, they successfully learned more words than in the isolated fast-mapping task. There was some evidence that word segmentation abilities are associated with word learning in school-aged children with typical development and ASD, but not SLI. Follow-up analyses also examined performance in children with ASD who did and did not have a language impairment. Children with ASD with language impairment evidenced intact statistical learning abilities, but subtle weaknesses in fast-mapping abilities. As the Procedural Deficit Hypothesis (PDH) predicts, children with SLI have impairments in statistical learning. However, children with SLI also have impairments in fast-mapping. Nonetheless, they are able to take advantage of additional phonological exposure to boost subsequent word-learning performance. In contrast to the PDH, children with ASD appear to have intact statistical learning, regardless of

  6. Statistical Performances of Resistive Active Power Splitter

    NASA Astrophysics Data System (ADS)

    Lalléchère, Sébastien; Ravelo, Blaise; Thakur, Atul

    2016-03-01

    In this paper, the synthesis and sensitivity analysis of an active power splitter (PWS) are proposed. The splitter is based on an active cell composed of a field-effect transistor in cascade with shunt resistors at the input and the output (resistive amplifier topology). The uncertainty of the PWS with respect to resistance tolerances is assessed using a stochastic method. Furthermore, with the proposed topology, the device gain can easily be controlled by varying a resistance. This provides a useful tool for analysing the statistical sensitivity of the system in an uncertain environment.

  7. Longitudinal data analyses using linear mixed models in SPSS: concepts, procedures and illustrations.

    PubMed

    Shek, Daniel T L; Ma, Cecilia M S

    2011-01-05

    Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analysis package commonly used by researchers, documentation on LMM procedures in SPSS is neither thorough nor user-friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented.

  8. Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations

    PubMed Central

    Shek, Daniel T. L.; Ma, Cecilia M. S.

    2011-01-01

    Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analysis package commonly used by researchers, documentation on LMM procedures in SPSS is neither thorough nor user-friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented. PMID:21218263

  9. Predicting clinical trial results based on announcements of interim analyses

    PubMed Central

    2014-01-01

    Background Announcements of interim analyses of a clinical trial convey information about the results beyond the trial’s Data Safety Monitoring Board (DSMB). The amount of information conveyed may be minimal, but the fact that none of the trial’s stopping boundaries has been crossed implies that the experimental therapy is neither extremely effective nor hopeless. Predicting success of the ongoing trial is of interest to the trial’s sponsor, the medical community, pharmaceutical companies, and investors. We determine the probability of trial success by quantifying only the publicly available information from interim analyses of an ongoing trial. We illustrate our method in the context of the National Surgical Adjuvant Breast and Bowel Project (NSABP) trial, C-08. Methods We simulated trials based on the specifics of the NSABP C-08 protocol that were publicly available. We quantified the uncertainty around the treatment effect using prior weights for the various possibilities in light of other colon cancer studies and other studies of the investigational agent, bevacizumab. We considered alternative prior distributions. Results Subsequent to the trial’s third interim analysis, our predictive probabilities were: that the trial would eventually be successful, 48.0%; would stop for futility, 7.4%; and would continue to completion without statistical significance, 44.5%. The actual trial continued to completion without statistical significance. Conclusions Announcements of interim analyses provide information outside the DSMB’s sphere of confidentiality. This information is potentially helpful to clinical trial prognosticators. ‘Information leakage’ from standard interim analyses such as in NSABP C-08 is conventionally viewed as acceptable even though it may be quite revealing. Whether leakage from more aggressive types of adaptations is acceptable should be assessed at the design stage. PMID:24607270

  10. Replication Unreliability in Psychology: Elusive Phenomena or “Elusive” Statistical Power?

    PubMed Central

    Tressoldi, Patrizio E.

    2012-01-01

    The focus of this paper is to analyze whether the unreliability of results related to certain controversial psychological phenomena may be a consequence of their low statistical power. Under Null Hypothesis Statistical Testing (NHST), still the most widely used statistical approach, unreliability derives from the failure to reject the null hypothesis, in particular when exact or quasi-exact replications of experiments are carried out. Taking as examples the results of meta-analyses related to four different controversial phenomena (subliminal semantic priming, incubation effect for problem solving, unconscious thought theory, and non-local perception), it was found that, except for semantic priming on categorization, the statistical power of the typical study to detect the expected effect size (ES) is low or very low. The low power in most studies undermines the use of NHST to study phenomena with moderate or low ESs. We conclude by providing some suggestions on how to increase the statistical power or use different statistical approaches to help discriminate whether the results obtained may or may not be used to support or to refute the reality of a phenomenon with small ES. PMID:22783215
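
    To see why small effect sizes leave typical studies underpowered, the sketch below uses the usual normal approximation to the power of a two-sided, two-sample test of a standardized mean difference (Cohen's d). The function names and the scenario are hypothetical and are not taken from the meta-analyses discussed above; the approximation ignores the t distribution, so it slightly understates the sample size an exact t-test calculation would give.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal approximation to the power of a two-sided, two-sample test of Cohen's d."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)  # expected test statistic under H1
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Per-group sample size at which the approximate power reaches `power`."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


# Hypothetical scenario: a small effect (d = 0.2) studied with 30 participants per group
print(f"power with n = 30: {approx_power(0.2, 30):.2f}")
print(f"n per group for 80% power: {n_for_power(0.2)}")
```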

  11. Impaired Statistical Learning in Developmental Dyslexia

    PubMed Central

    Thiessen, Erik D.; Holt, Lori L.

    2015-01-01

    Purpose Developmental dyslexia (DD) is commonly thought to arise from phonological impairments. However, an emerging perspective is that a more general procedural learning deficit, not specific to phonological processing, may underlie DD. The current study examined whether individuals with DD are capable of extracting statistical regularities across sequences of passively experienced speech and nonspeech sounds. Such statistical learning is believed to be domain-general, to draw upon procedural learning systems, and to relate to language outcomes. Method DD and control groups were familiarized with a continuous stream of syllables or sine-wave tones, the ordering of which was defined by high or low transitional probabilities across adjacent stimulus pairs. Participants subsequently judged which of two 3-stimulus test items, with either high or low statistical coherence, was most similar to the sounds heard during familiarization. Results As with control participants, the DD group was sensitive to the transitional probability structure of the familiarization materials, as evidenced by above-chance performance. However, the performance of participants with DD was significantly poorer than that of controls across linguistic and nonlinguistic stimuli. In addition, reading-related measures were significantly correlated with statistical learning performance on both speech and nonspeech material. Conclusion Results are discussed in light of procedural learning impairments among participants with DD. PMID:25860795
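
    The transitional probability that defines such familiarization streams is the forward conditional probability P(B | A) = count(A followed by B) / count(A). The sketch below computes it for every adjacent pair in a stream. The syllables and word order are hypothetical and are not the study's stimuli; they are chosen so that within-word transitions have probability 1 while transitions across word boundaries are lower, which is the cue listeners can use to segment the stream.

```python
from collections import Counter


def transitional_probabilities(stream: list[str]) -> dict[tuple[str, str], float]:
    """Forward transitional probability P(B | A) = count(A followed by B) / count(A)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Only positions that have a successor can start a pair
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical trisyllabic "words" concatenated into one continuous stream without pauses
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"]]
order = [0, 1, 2, 0, 2, 1, 0, 1]
stream = [syllable for index in order for syllable in words[index]]
tps = transitional_probabilities(stream)
print(tps[("tu", "pi")], tps[("ro", "go")], tps[("ro", "bi")])
```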

  12. Fundamentals and Catalytic Innovation: The Statistical and Data Management Center of the Antibacterial Resistance Leadership Group.

    PubMed

    Huvane, Jacqueline; Komarow, Lauren; Hill, Carol; Tran, Thuy Tien T; Pereira, Carol; Rosenkranz, Susan L; Finnemeyer, Matt; Earley, Michelle; Jiang, Hongyu Jeanne; Wang, Rui; Lok, Judith; Evans, Scott R

    2017-03-15

    The Statistical and Data Management Center (SDMC) provides the Antibacterial Resistance Leadership Group (ARLG) with statistical and data management expertise to advance the ARLG research agenda. The SDMC is active at all stages of a study, including design; data collection and monitoring; data analyses and archival; and publication of study results. The SDMC enhances the scientific integrity of ARLG studies through the development and implementation of innovative and practical statistical methodologies and by educating research colleagues regarding the application of clinical trial fundamentals. This article summarizes the challenges and roles, as well as the innovative contributions in the design, monitoring, and analyses of clinical trials and diagnostic studies, of the ARLG SDMC. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  13. Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, B.; Fischer, R.; Rieger, M.

    2017-10-01

    In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi, which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization in the web. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a tt̄H cross section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
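
    The "make-like" behaviour described above, where each task declares what it requires and is skipped if its output already exists, can be illustrated without Luigi itself. The sketch below is a stdlib-only toy that mirrors the idea of Luigi's requires/output/run split; it is not Luigi's API. The `Task` class, `build` function, and task names are hypothetical, and an in-memory dict stands in for the file targets a real workflow would check. Cycles are not detected.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Hypothetical task: a name, an action computing its output, and upstream tasks."""
    name: str
    action: Callable[[dict[str, object]], object]
    requires: list["Task"] = field(default_factory=list)


def build(task: Task, outputs: dict[str, object], log: list[str]) -> object:
    """Make-like execution: build dependencies first and skip tasks whose output exists."""
    if task.name in outputs:  # output already present: nothing to do
        return outputs[task.name]
    inputs = {dep.name: build(dep, outputs, log) for dep in task.requires}
    outputs[task.name] = task.action(inputs)
    log.append(task.name)
    return outputs[task.name]


# Hypothetical three-step analysis chain
select = Task("select", lambda inp: [1.2, 3.4, 2.2, 5.0])
count = Task("count", lambda inp: sum(1 for x in inp["select"] if x > 2.0), [select])
fraction = Task("fraction", lambda inp: inp["count"] / 4, [count])

outputs: dict[str, object] = {}
log: list[str] = []
build(fraction, outputs, log)
build(fraction, outputs, log)  # second call finds the output and runs nothing
print(log, outputs["fraction"])
```

    Memoising on the task name also means a task shared by several downstream tasks runs only once, which is the property that lets a single command rebuild only what is missing.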

  14. A simulations approach for meta-analysis of genetic association studies based on additive genetic model.

    PubMed

    John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping

    2018-06-01

    Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.

  15. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials.

    PubMed

    Buyse, Marc; Molenberghs, Geert; Paoletti, Xavier; Oba, Koji; Alonso, Ariel; Van der Elst, Wim; Burzykowski, Tomasz

    2016-01-01

    A surrogate endpoint is intended to replace a clinical endpoint for the evaluation of new treatments when it can be measured more cheaply, more conveniently, more frequently, or earlier than that clinical endpoint. A surrogate endpoint is expected to predict clinical benefit, harm, or lack of these. Besides the biological plausibility of a surrogate, a quantitative assessment of the strength of evidence for surrogacy requires the demonstration of the prognostic value of the surrogate for the clinical outcome, and evidence that treatment effects on the surrogate reliably predict treatment effects on the clinical outcome. We focus on these two conditions, and outline the statistical approaches that have been proposed to assess the extent to which these conditions are fulfilled. When data are available from a single trial, one can assess the "individual level association" between the surrogate and the true endpoint. When data are available from several trials, one can additionally assess the "trial level association" between the treatment effect on the surrogate and the treatment effect on the true endpoint. In the latter case, the "surrogate threshold effect" can be estimated as the minimum effect on the surrogate endpoint that predicts a statistically significant effect on the clinical endpoint. All these concepts are discussed in the context of randomized clinical trials in oncology, and illustrated with two meta-analyses in gastric cancer. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Statistical Parametric Mapping to Identify Differences between Consensus-Based Joint Patterns during Gait in Children with Cerebral Palsy

    PubMed Central

    Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne

    2017-01-01

    Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) to the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses stated that: (a) joint motion patterns with ‘no or minor gait deviations’ (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules; and (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with ‘no or minor gait deviations’ differed somewhat unexpectedly from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait and the significant locations within the gait cycle that were indicated by the statistical analyses coincided well with the classification rules; (c) joint motion patterns at the level of each joint significantly differed from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus

  17. Dynamic Graphics in Excel for Teaching Statistics: Understanding the Probability Density Function

    ERIC Educational Resources Information Center

    Coll-Serrano, Vicente; Blasco-Blasco, Olga; Alvarez-Jareno, Jose A.

    2011-01-01

    In this article, we show a dynamic graphic in Excel that is used to introduce an important concept in our subject, Statistics I: the probability density function. This interactive graphic seeks to facilitate conceptual understanding of the main aspects analysed by the learners.

  18. Statistics Anxiety and Business Statistics: The International Student

    ERIC Educational Resources Information Center

    Bell, James A.

    2008-01-01

    Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…

  19. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, no guidelines currently exist for the visual assessment of statistical uncertainty on such maps. To address this shortcoming, we develop techniques for visual determination of the statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate
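
    Because a population cartogram draws each region with area proportional to its at-risk population, "the area required for significance at a given rate" reduces to a minimum population. The sketch below derives that minimum for an a priori region under assumptions that are ours rather than the paper's: a one-sided test and a normal approximation to a Poisson count. The authors' own formulae, especially for a posteriori regions, may differ. The rates and the cartogram scale (people per unit area) are hypothetical.

```python
import math
from statistics import NormalDist


def min_population_for_significance(rate: float, baseline_rate: float, alpha: float = 0.05) -> float:
    """Smallest at-risk population for which an observed rate is significantly above baseline.

    One-sided normal approximation to a Poisson count: under the null hypothesis the
    expected count is baseline_rate * P, so z = (rate - baseline_rate) * sqrt(P / baseline_rate).
    """
    if rate <= baseline_rate:
        return math.inf  # a rate at or below baseline is never significantly elevated
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return baseline_rate * (z_crit / (rate - baseline_rate)) ** 2


# Hypothetical rates: baseline 10 and observed 15 cases per 100,000 people
needed = min_population_for_significance(15e-5, 10e-5)
people_per_unit_area = 1_000.0  # hypothetical cartogram scale
print(f"population needed: {needed:,.0f}; cartogram area: {needed / people_per_unit_area:,.1f} units")
```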

  1. DISSCO: direct imputation of summary statistics allowing covariates

    PubMed Central

    Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun

    2015-01-01

    Background: Imputation of individual-level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual-level genotype data are not available. Two methods (DIST and ImpG-Summary/LD) that assume a multivariate Gaussian distribution for the association summary statistics have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. Methods: We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We further prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). Results: We consider two real-life scenarios where the distinction between correlation and partial correlation is likely to make a practical difference: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9–15.2% for variants with minor allele frequency <5%. Availability and implementation: http://www.unc.edu/~yunmli/DISSCO. Contact: yunli

  2. DISSCO: direct imputation of summary statistics allowing covariates.

    PubMed

    Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun

    2015-08-01

    Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We continue to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). We consider two real-life scenarios where the correlation and partial correlation likely make practical difference: (i) association studies in admixed populations; (ii) association studies in presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%. © The Author 2015. Published by Oxford University Press. All rights reserved. 
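    The abstract's central claim can be illustrated with the standard multivariate-Gaussian conditional mean used for summary-statistic imputation, z_u = r_ut' R_tt^{-1} z_t, with every genotype correlation replaced by a partial correlation given the covariates. This is a minimal sketch under stated assumptions, not the authors' DISSCO software: it uses a single covariate, computes partial correlations by residualizing genotypes on that covariate, uses toy in-sample genotypes in place of an external reference panel, and exposes an optional diagonal `ridge` term that is a hypothetical regularization choice. All function names are hypothetical.

```python
import math
from statistics import fmean


def residualize(x, covariate):
    """Residuals of x after least-squares regression on one covariate (with intercept)."""
    mx, mc = fmean(x), fmean(covariate)
    scc = sum((c - mc) ** 2 for c in covariate)
    beta = sum((c - mc) * (xi - mx) for xi, c in zip(x, covariate)) / scc
    return [(xi - mx) - beta * (c - mc) for xi, c in zip(x, covariate)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def impute_z(typed_genotypes, untyped_genotype, typed_z, covariate, ridge=0.0):
    """Hypothetical helper: conditional-mean imputation z_u = r_ut' R_tt^{-1} z_t,
    where every correlation is a partial correlation given the covariate."""
    res_t = [residualize(g, covariate) for g in typed_genotypes]
    res_u = residualize(untyped_genotype, covariate)
    k = len(res_t)
    # Partial correlations among typed markers, plus an optional (assumed) ridge term.
    R_tt = [[pearson(res_t[i], res_t[j]) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    r_ut = [pearson(res_u, r) for r in res_t]
    weights = solve(R_tt, r_ut)  # R_tt is symmetric, so w' z_t = r_ut' R_tt^{-1} z_t
    return sum(w * z for w, z in zip(weights, typed_z))
```

    If the covariate is uncorrelated with the genotypes, residualizing only centres them, so the partial correlations reduce to ordinary correlations; this mirrors the abstract's point that the correlation-based methods are justified in the absence of covariates.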

  3. Remote camera-trap methods and analyses reveal impacts of rangeland management on Namibian carnivore communities

    USGS Publications Warehouse

    Kauffman, M.J.; Sanjayan, M.; Lowenstein, J.; Nelson, A.; Jeo, R.M.; Crooks, K.R.

    2007-01-01

    Assessing the abundance and distribution of mammalian carnivores is vital for understanding their ecology and providing for their long-term conservation. Because carnivores are difficult to trap and handle, many studies have relied on abundance indices that may not accurately reflect real abundance and distribution patterns. We developed statistical analyses that detect spatial correlation in visitation data from combined scent-station and camera-trap surveys, and we illustrate how to use such data to make inferences about changes in carnivore assemblages. As a case study, we compared the carnivore communities of adjacent communal and freehold rangelands in central Namibia. We used an index of overdispersion to test for repeat visits to individual camera-trap scent stations and a bootstrap simulation to test for correlations in visits to camera neighbourhoods. After distilling our presence-absence data to the most defensible spatial scale, we assessed overall carnivore visitation using logistic regression. Our analyses confirmed the expected pattern of a depauperate fauna on the communal rangelands compared with the freehold rangelands. Additionally, the species that were not detected on communal sites were the larger-bodied carnivores. By modelling these rare visits as a Poisson process, we illustrate a method for inferring whether such patterns reflect local extinction of species or simply low sampling effort. Our Namibian case study indicates that these field methods and analyses can detect meaningful differences in carnivore communities brought about by anthropogenic influences. © 2007 FFI.
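    Two of the quantities named in the abstract can be sketched as follows, assuming visit counts per scent station and a constant per-night visitation rate: the variance-to-mean ratio (index of dispersion) flags clumped, repeat visits, and a Poisson model gives the chance of recording no visits from a species that is actually present. Rates, effort values and function names are hypothetical, and the authors' bootstrap test for neighbourhood correlation is not reproduced.

```python
import math
from statistics import fmean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of visits per station: about 1 for a Poisson process,
    above 1 when visits are clumped (e.g. repeat visits by the same animal)."""
    return variance(counts) / fmean(counts)


def dispersion_statistic(counts):
    """(n - 1) * s^2 / mean, approximately chi-square with n - 1 df under Poisson sampling."""
    return (len(counts) - 1) * dispersion_index(counts)


def prob_no_detection(rate_per_night, trap_nights):
    """Poisson probability of zero detections of a species that is present."""
    return math.exp(-rate_per_night * trap_nights)


def nights_for_detection(rate_per_night, confidence=0.95):
    """Effort (trap-nights) needed to detect a present species with the given probability."""
    return -math.log(1.0 - confidence) / rate_per_night
```

    Comparing the observed effort with `nights_for_detection` for a plausible visitation rate is one way to judge whether a species' absence is better explained by low sampling effort or by local extinction.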

  4. Back to BaySICS: a user-friendly program for Bayesian Statistical Inference from Coalescent Simulations.

    PubMed

    Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love

    2014-01-01

    Inference of population demographic history has vastly improved in recent years owing to a number of technological and theoretical advances, including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods because of its simple theoretical foundation and exceptional flexibility. However, few user-friendly programs perform ABC analysis, which makes it difficult to implement and means that programming skills are frequently required. In addition, few programs can handle heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although it provides specific features that improve inference from heterochronous datasets, BaySICS also has several capabilities that make it a suitable tool for analysing contemporary genetic datasets. These capabilities include joint analysis of independent tables, a graphical interface and an implementation of likelihood-free Markov chain Monte Carlo.
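    Rejection sampling is the simplest form of ABC and shows the idea such software builds on: draw parameters from the prior, simulate data, and keep the draws whose summary statistic falls close to the observed one. This is a minimal sketch, not BaySICS code; a toy normal model stands in, as an assumption, for the coalescent simulator, and the prior, tolerance and function names are illustrative.

```python
import random
from statistics import fmean


def abc_rejection(observed_stat, simulate_stat, sample_prior, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary statistic lies within
    `tolerance` of the observed one; the kept draws approximate the posterior."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate_stat(theta, rng) - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy stand-in for a coalescent simulator (assumption): the data are 50 draws
# from Normal(theta, 1) and the summary statistic is their sample mean.
def simulate_mean(theta, rng, n_obs=50):
    return fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


rng = random.Random(1)
posterior = abc_rejection(
    observed_stat=3.0,
    simulate_stat=simulate_mean,
    sample_prior=lambda r: r.uniform(0.0, 6.0),  # uniform prior on [0, 6]
    n_draws=5000,
    tolerance=0.1,
    rng=rng,
)
print(len(posterior), round(fmean(posterior), 2))
```

    Shrinking `tolerance` makes the accepted sample a closer approximation to the true posterior at the cost of fewer acceptances per simulation.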

  5. System for analysing sickness absenteeism in Poland.

    PubMed

    Indulski, J A; Szubert, Z

    1997-01-01

    The National System of Sickness Absenteeism Statistics has been functioning in Poland since 1977 as part of the national health statistics. The system is based on a 15% random sample of copies of certificates of temporary incapacity for work issued by all health care units and authorised private medical practitioners. A certificate of temporary incapacity for work is issued to every insured employee who is compelled to stop working because of sickness or an accident, or to care for a sick family member. The certificate is required on the first day of sickness. The annual analyses of disease- and accident-related sickness absenteeism carried out in Poland within this statistical system lead to the following main conclusions: 1. Diseases of the musculoskeletal and peripheral nervous systems, which together account for one third of total sickness absenteeism, are a major health problem of the working population in Poland. During the past five years, incapacity for work caused by these diseases in males increased 2.5-fold. 2. Circulatory diseases, in particular arterial hypertension and ischaemic heart disease (41% and 27% of sickness days, respectively), are an important health problem among males of working age, especially those aged 40 and older. Absenteeism due to these diseases has more than doubled in males.

  6. Bootstrap versus Statistical Effect Size Corrections: A Comparison with Data from the Finding Embedded Figures Test.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Melancon, Janet G.

    Effect sizes have been increasingly emphasized in research as more researchers have recognized that: (1) all parametric analyses (t-tests, analyses of variance, etc.) are correlational; (2) effect sizes have played an important role in meta-analytic work; and (3) statistical significance testing is limited in its capacity to inform scientific…
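    The abstract is truncated, so the specific corrections the authors compared are not stated here. As an assumption-labelled sketch, the code below contrasts one widely used analytic correction for the upward bias of sample R² (Ezekiel's formula) with a bootstrap distribution of R² obtained by resampling cases. Function names and data are hypothetical.

```python
import random
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation (R^2 of a simple linear regression)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def ezekiel_adjusted_r2(r2, n, p):
    """Ezekiel's correction for the positive bias of sample R^2 with p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def bootstrap_r2(x, y, n_boot, rng):
    """R^2 recomputed on case resamples drawn with replacement; resamples in which
    x or y is constant (R^2 undefined) are skipped."""
    n = len(x)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:
            values.append(r_squared(xs, ys))
    return values
```

    The analytic correction returns a single shrunken estimate from a formula, while the bootstrap shows how much the effect size varies across plausible resamples of the same data.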

  7. The Need for Speed in Rodent Locomotion Analyses

    PubMed Central

    Batka, Richard J.; Brown, Todd J.; Mcmillan, Kathryn P.; Meadows, Rena M.; Jones, Kathryn J.; Haulcomb, Melissa M.

    2016-01-01

    Locomotion analysis is now widely used across many animal species to understand motor defects in disease, functional recovery following neural injury, and the effectiveness of various treatments. More recently, rodent locomotion analysis has become an increasingly popular method in a diverse range of research. Speed is an inseparable aspect of locomotion that is still not fully understood, and its effects are often not properly incorporated when data are analyzed. In this hybrid manuscript, we accomplish three things: (1) review the interaction between speed and locomotion variables in rodent studies, (2) comprehensively analyze the relationship between speed and 162 locomotion variables in a group of 16 wild-type mice using the CatWalk gait analysis system, and (3) develop and test a statistical method in which locomotion variables are analyzed and reported in the context of speed. Notable results include the following: (1) over 90% of the variables reported by CatWalk were dependent on speed, with an average R² value of 0.624, (2) most variables were related to speed in a nonlinear manner, (3) current methods of controlling for speed are insufficient, and (4) the linear mixed model is an appropriate and effective statistical method for locomotion analyses that accounts for speed-dependent relationships. Given the pervasive dependency of locomotion variables on speed, we maintain that valid conclusions from locomotion analyses cannot be drawn unless the variables are analyzed and reported within the context of speed. PMID:24890845
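    A simple way to see the speed dependence described above is to regress a gait variable on speed and compare a linear with a nonlinear (here quadratic) fit by R². This is a minimal sketch with hypothetical data and function names; it is not the linear mixed model the authors recommend, which would additionally model repeated runs within each animal.

```python
from statistics import fmean


def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[d]*x^d via the normal equations."""
    m = degree + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the (m x m) normal equations.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, m))) / A[r][r]
    return coef


def r2_of_fit(x, y, coef):
    """Coefficient of determination of a fitted polynomial."""
    my = fmean(y)
    pred = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

    A clearly higher R² for the quadratic than for the linear fit is the kind of evidence that a variable depends on speed nonlinearly, so a single linear speed covariate would under-correct.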

  8. Statistical analysis plan for the Alveolar Recruitment for Acute Respiratory Distress Syndrome Trial (ART). A randomized controlled trial

    PubMed Central

    Damiani, Lucas Petri; Berwanger, Otavio; Paisani, Denise; Laranjeira, Ligia Nasi; Suzumura, Erica Aranha; Amato, Marcelo Britto Passos; Carvalho, Carlos Roberto Ribeiro; Cavalcanti, Alexandre Biasi

    2017-01-01

    Background The Alveolar Recruitment for Acute Respiratory Distress Syndrome Trial (ART) is an international multicenter randomized pragmatic controlled trial with allocation concealment involving 120 intensive care units in Brazil, Argentina, Colombia, Italy, Poland, Portugal, Malaysia, Spain, and Uruguay. The primary objective of ART is to determine whether maximum stepwise alveolar recruitment associated with PEEP titration, adjusted according to the static compliance of the respiratory system (ART strategy), is able to increase 28-day survival in patients with acute respiratory distress syndrome compared to conventional treatment (ARDSNet strategy). Objective To describe the data management process and statistical analysis plan. Methods The statistical analysis plan was designed by the trial executive committee and reviewed and approved by the trial steering committee. We provide an overview of the trial design with a special focus on describing the primary (28-day survival) and secondary outcomes. We describe our data management process, data monitoring committee, interim analyses, and sample size calculation. We describe our planned statistical analyses for primary and secondary outcomes as well as pre-specified subgroup analyses. We also provide details for presenting results, including mock tables for baseline characteristics, adherence to the protocol and effect on clinical outcomes. Conclusion According to best trial practice, we report our statistical analysis plan and data management plan prior to locking the database and beginning analyses. We anticipate that this document will prevent analysis bias and enhance the utility of the reported results. Trial registration ClinicalTrials.gov number, NCT01374022. PMID:28977255
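    The abstract mentions a sample size calculation without giving its inputs. For orientation only, the normal-approximation formula for comparing two proportions is sketched below; the survival proportions, significance level and power are hypothetical, and the trial's actual calculation may have used different assumptions or a time-to-event method.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided comparison of
    two proportions at significance level alpha with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Hypothetical 28-day survival of 50% vs 60%.
print(n_per_group_two_proportions(0.50, 0.60))
```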

  9. A statistical human resources costing and accounting model for analysing the economic effects of an intervention at a workplace.

    PubMed

    Landstad, Bodil J; Gelin, Gunnar; Malmquist, Claes; Vinberg, Stig

    2002-09-15

    The study had two primary aims. The first was to combine a human resources costing and accounting (HRCA) approach with a quantitative statistical approach to obtain an integrated model. The second was to apply this integrated model in a quasi-experimental study to investigate whether a preventive intervention affected sickness absence costs at the company level. The intervention comprised occupational organizational measures, competence development, physical and psychosocial work-environment measures, and individual and rehabilitation measures, delivered on both an individual and a group basis. The study used a quasi-experimental design with a non-randomized control group. Both groups involved cleaning jobs at predominantly female workplaces. The study plan involved before-and-after studies in both groups. The study included only those who were at the same workplace during the whole study period. In the HRCA model used here, the cost of sickness absence is the net difference between the costs, in the form of the value of lost production and the administrative cost, and the benefits in the form of lower labour costs. According to the HRCA model, the intervention counteracted a rise in sickness absence costs at the company level, giving an average net effect of 266.5 Euros per person (full-time working) during an 8-month period. Using an analogous statistical analysis of the whole data set, the intervention counteracted a rise in sickness absence costs at the company level, giving an average net effect of 283.2 Euros. The statistical method also made it possible to study the regression coefficients in sub-groups and calculate p-values for these coefficients: in the younger group, the intervention gave a calculated net contribution of 605.6 Euros with a p-value of 0.073, whereas the net contribution of the intervention in the older group had a very high p-value. Using the statistical model it was

  10. Real-world visual statistics and infants' first-learned object names.

    PubMed

    Clerkin, Elizabeth M; Hart, Elizabeth; Rehg, James M; Yu, Chen; Smith, Linda B

    2017-01-05

    We offer a new solution to the unsolved problem of how infants break into word learning based on the visual statistics of everyday infant-perspective scenes. Images from head-camera video captured by 8½- to 10½-month-old infants at 147 at-home mealtime events were analysed for the objects in view. The images were found to be highly cluttered, with many different objects in view. However, the frequency distribution of object categories was extremely right-skewed, such that a very small set of objects was pervasively present, a fact that may substantially reduce the problem of referential ambiguity. The statistical structure of objects in these infant egocentric scenes differs markedly from that in the training sets used in computational models and in experiments on statistical word-referent learning. Therefore, the results also indicate a need to re-examine current explanations of how infants break into word learning. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).

  11. Real-world visual statistics and infants' first-learned object names

    PubMed Central

    Clerkin, Elizabeth M.; Hart, Elizabeth; Rehg, James M.; Yu, Chen

    2017-01-01

    We offer a new solution to the unsolved problem of how infants break into word learning based on the visual statistics of everyday infant-perspective scenes. Images from head-camera video captured by 8½- to 10½-month-old infants at 147 at-home mealtime events were analysed for the objects in view. The images were found to be highly cluttered, with many different objects in view. However, the frequency distribution of object categories was extremely right-skewed, such that a very small set of objects was pervasively present, a fact that may substantially reduce the problem of referential ambiguity. The statistical structure of objects in these infant egocentric scenes differs markedly from that in the training sets used in computational models and in experiments on statistical word-referent learning. Therefore, the results also indicate a need to re-examine current explanations of how infants break into word learning. This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences’. PMID:27872373
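    The skew described in these two records can be summarised by counting, for each object category, the number of images in which it appears, and then asking what share of all occurrences the few most frequent categories account for. The sketch assumes each image is annotated as a set of category labels; the labels and function names are hypothetical, and the authors' annotation scheme is not described here.

```python
from collections import Counter


def category_frequencies(images):
    """Number of images in which each category appears (each image = iterable of labels)."""
    counts = Counter()
    for labels in images:
        counts.update(set(labels))  # count a category at most once per image
    return counts


def top_k_share(counts, k):
    """Fraction of all category occurrences accounted for by the k most frequent categories."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


# Hypothetical annotations for four frames.
frames = [{"spoon", "bowl"}, {"spoon", "table"}, {"spoon", "bowl", "cup"}, {"spoon"}]
freq = category_frequencies(frames)
print(freq.most_common(2), top_k_share(freq, 1))
```

    In a strongly right-skewed distribution, `top_k_share` approaches 1 for small k even when many distinct categories occur overall.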

  12. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754
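    A short numerical sketch makes the distinction concrete: with a large enough sample, a fixed and arguably trivial difference becomes highly statistically significant, while the standardized effect size does not change. It assumes a two-sample z-test with a known common standard deviation and equal group sizes; the numbers and function names are hypothetical and not taken from the article.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means, assuming a
    known common standard deviation and equal group sizes."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p


def cohens_d(mean_diff, sd):
    """Standardized effect size; it does not depend on the sample size."""
    return mean_diff / sd


# Hypothetical: a 1-unit difference on an outcome whose SD is 10.
for n in (20, 10_000):
    z, p = two_sample_z_test(1.0, 10.0, n)
    print(n, round(z, 2), p, cohens_d(1.0, 10.0))
```

    Whether a 1-unit difference matters to patients is a clinical judgement that neither the p-value nor the sample size can settle.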

  13. Spectral statistics of the uni-modular ensemble

    NASA Astrophysics Data System (ADS)

    Joyner, Christopher H.; Smilansky, Uzy; Weidenmüller, Hans A.

    2017-09-01

    We investigate the spectral statistics of Hermitian matrices whose elements are chosen uniformly from U(1), called the uni-modular ensemble (UME), in the limit of large matrix size. Using three complementary methods (a supersymmetric integration method, a combinatorial graph-theoretical analysis and a Brownian motion approach), we derive expressions for 1/N corrections to the mean spectral moments and also analyse the fluctuations about this mean. By addressing the same ensemble from three different points of view, we can critically compare their relative advantages and derive some new results.
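    The mean spectral moments referred to above can be estimated numerically from a sampled matrix. The sketch assumes off-diagonal entries e^{iθ} with θ uniform on [0, 2π) and, because Hermitian diagonal entries must be real, diagonal entries ±1; the paper's exact convention may differ. It uses the scaling m_k = (1/N) tr((H/√N)^k), whose leading-order (semicircle) values are m_2 = 1 and m_4 = 2; the 1/N corrections derived in the paper are not reproduced. Function names are hypothetical.

```python
import cmath
import math
import random


def unimodular_hermitian(n, rng):
    """Random n x n Hermitian matrix with |H_ij| = 1: e^{i theta} above the
    diagonal, its conjugate below, and +/-1 on the diagonal (assumed convention)."""
    H = [[0j] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = complex(rng.choice((-1.0, 1.0)))
        for j in range(i + 1, n):
            h = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
            H[i][j] = h
            H[j][i] = h.conjugate()
    return H


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def spectral_moments(H):
    """Return (m2, m4) with m_k = (1/N) tr((H / sqrt(N))^k)."""
    n = len(H)
    H2 = matmul(H, H)
    tr2 = sum(H2[i][i].real for i in range(n))
    # H^2 is Hermitian, so tr(H^4) = sum over i, j of |(H^2)_ij|^2.
    tr4 = sum(abs(H2[i][j]) ** 2 for i in range(n) for j in range(n))
    return tr2 / n ** 2, tr4 / n ** 3


rng = random.Random(7)
H = unimodular_hermitian(40, rng)
m2, m4 = spectral_moments(H)
print(round(m2, 6), round(m4, 3))
```

    Here m_2 is exactly 1 up to rounding, because every entry has modulus 1; m_4 deviates from 2 by finite-size terms, which is where 1/N corrections of the kind studied in the paper enter.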

  14. Statistical analysis of Thematic Mapper Simulator data for the geobotanical discrimination of rock types in southwest Oregon

    NASA Technical Reports Server (NTRS)

    Morrissey, L. A.; Weinstock, K. J.; Mouat, D. A.; Card, D. H.

    1984-01-01

    An evaluation of Thematic Mapper Simulator (TMS) data for the geobotanical discrimination of rock types based on vegetative cover characteristics is addressed in this research. A methodology for accomplishing this evaluation utilizing univariate and multivariate techniques is presented. TMS data acquired with a Daedalus DEI-1260 multispectral scanner were integrated with vegetation and geologic information for subsequent statistical analyses, which included a chi-square test, an analysis of variance, stepwise discriminant analysis, and Duncan's multiple range test. Results indicate that ultramafic rock types are spectrally separable from nonultramafics based on vegetative cover through the use of statistical analyses.
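    Of the tests listed, the one-way analysis of variance is the easiest to show compactly: it compares the between-group and within-group variation of a band's values across rock types. This is a minimal sketch; the groupings, values and function name are hypothetical, and the p-value lookup from the F distribution is omitted.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square divided by
    within-group mean square (k - 1 and n - k degrees of freedom)."""
    all_values = [v for g in groups for v in g]
    grand = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical band values over ultramafic vs non-ultramafic sites.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```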

  15. A risk-based statistical investigation of the quantification of polymorphic purity of a pharmaceutical candidate by solid-state 19F NMR.

    PubMed

    Barry, Samantha J; Pham, Tran N; Borman, Phil J; Edwards, Andrew J; Watson, Simon A

    2012-01-27

    The DMAIC (Define, Measure, Analyse, Improve and Control) framework and associated statistical tools have been applied to both identify and reduce variability observed in a quantitative ¹⁹F solid-state NMR (SSNMR) analytical method. The method had been developed to quantify levels of an additional polymorph (Form 3) in batches of an active pharmaceutical ingredient (API), where Form 1 is the predominant polymorph. In order to validate analyses of the polymorphic form, a single batch of API was used as a standard each time the method was used. The level of Form 3 in this standard was observed to increase gradually over time, an effect not immediately apparent because of method variability. In order to determine the cause of this unexpected increase and to reduce method variability, a risk-based statistical investigation was performed to identify potential factors that could be responsible for these effects. Factors identified by the risk assessment were investigated using a series of designed experiments to gain a greater understanding of the method. The increase in the level of Form 3 in the standard was found to correlate primarily with the number of repeat analyses, an effect not previously reported in the SSNMR literature. Differences in data processing (phasing and linewidth) were found to be responsible for the variability in the method. After corrective actions were implemented, the variability was reduced such that the level of Form 3 was within an acceptable range of ±1% w/w in fresh samples of API. Copyright © 2011. Published by Elsevier B.V.
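    Two checks implied by the abstract can be sketched: a least-squares trend of the measured Form 3 level against the number of repeat analyses, and a flag for measurements outside a ±1% w/w window around a target. The data, target value and function names are hypothetical; the authors' risk assessment and designed experiments are not reproduced.

```python
import math
from statistics import fmean


def trend(x, y):
    """Least-squares slope of y on x and the Pearson correlation r."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def out_of_range(levels, target, tolerance=1.0):
    """Indices of measurements farther than `tolerance` (% w/w) from the target level."""
    return [i for i, v in enumerate(levels) if abs(v - target) > tolerance]


# Hypothetical Form 3 levels (% w/w) in the standard after 1..4 repeat analyses.
repeats = [1, 2, 3, 4]
form3 = [2.0, 2.5, 3.0, 3.5]
print(trend(repeats, form3), out_of_range(form3, target=2.0))
```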

  16. An experiment in software reliability: Additional analyses using data from automated replications

    NASA Technical Reports Server (NTRS)

    Dunham, Janet R.; Lauterbach, Linda A.

    1988-01-01

    A study undertaken to collect software error data of laboratory quality, for use in developing credible methods for predicting the reliability of software used in life-critical applications, is summarized. The software error data reported were acquired through automated repetitive run testing of three independent implementations of a launch interceptor condition module of a radar tracking problem. The results are based on 100 test applications to accumulate a sufficient sample size for error rate estimation. The data collected are used to confirm the results of two Boeing studies reported in NASA-CR-165836 Software Reliability: Repetitive Run Experimentation and Modeling, and NASA-CR-172378 Software Reliability: Additional Investigations into Modeling With Replicated Experiments, respectively. That is, the results confirm the log-linear pattern of software error rates and reject the hypothesis of equal error rates per individual fault. This rejection casts doubt on the assumption that the program's failure rate is a constant multiple of the number of residual bugs, an assumption that underlies some current models of software reliability. The data also raise new questions concerning the phenomenon of interacting faults.
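    The two findings can be illustrated with a sketch, under the assumption that each fault's failures are counted over the same number of runs: a Pearson chi-square statistic for the hypothesis of equal failure rates per fault, and the slope of log(rate) against fault rank, which stays roughly constant when rates follow a log-linear pattern. Names and inputs are hypothetical; the estimation procedures of the cited studies are not described here.

```python
import math
from statistics import fmean


def equal_rate_chi_square(failures_per_fault):
    """Pearson chi-square statistic for H0: every fault fails at the same rate
    (k - 1 degrees of freedom); requires a positive mean count."""
    expected = fmean(failures_per_fault)
    return sum((o - expected) ** 2 / expected for o in failures_per_fault)


def log_linear_slope(rates):
    """Least-squares slope of log(rate) against fault rank (1 = most frequent)."""
    ranked = sorted(rates, reverse=True)
    x = list(range(1, len(ranked) + 1))
    y = [math.log(r) for r in ranked]
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Hypothetical per-fault failure counts and rates.
print(equal_rate_chi_square([40, 9, 1]), log_linear_slope([0.1, 0.01, 0.001]))
```

    A large chi-square statistic argues against equal per-fault rates, which is the assumption the abstract says is rejected.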

  17. Software Used to Generate Cancer Statistics - SEER Cancer Statistics

    Cancer.gov

    Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.

  18. 10 CFR 52.158 - Contents of application; additional technical information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Title 10, Energy, Vol. 2: Contents of application; additional technical information... APPROVALS FOR NUCLEAR POWER PLANTS. Manufacturing Licenses. § 52.158 Contents of application; additional technical information. The application must contain: (a)(1) Inspections, tests, analyses, and acceptance...

  19. Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

    PubMed

    Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2017-11-01

    Data for fractional solid waste composition provide the relative magnitudes of individual waste fractions, whose percentages always sum to 100, thereby connecting them intrinsically. Because of this sum constraint, waste composition data are closed data, and their interpretation and analysis require statistical methods other than classical statistics, which are suitable only for unconstrained data such as absolute values. However, the closed character of waste composition data is often ignored in analyses. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of biased, negative proportions. A Pearson's correlation test applied to waste fraction generation (kg mass) indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association, demonstrating that statistical analyses applied to compositional waste fraction data without addressing their closed character can generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing the mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
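    One common way to transform closed data, the centred log-ratio (clr) transform, is sketched below, together with a compositional mean computed in clr space and mapped back to percentages. The abstract does not say which transformation the authors used, so clr is an assumed choice here; it requires strictly positive parts, so zero fractions would need separate treatment. Function names are hypothetical.

```python
import math
from statistics import fmean


def closure(parts):
    """Rescale positive parts so that they sum to 100 (percentages)."""
    total = sum(parts)
    return [100.0 * p / total for p in parts]


def clr(parts):
    """Centred log-ratio transform: log of each part over the geometric mean.
    Parts must be strictly positive; the result sums to zero."""
    logs = [math.log(p) for p in parts]
    g = fmean(logs)
    return [v - g for v in logs]


def clr_inverse(coords):
    """Map clr coordinates back to a composition in percent."""
    return closure([math.exp(c) for c in coords])


def compositional_mean(samples):
    """Centre of a set of compositions: average in clr space, then back-transform."""
    coords = [clr(s) for s in samples]
    mean_coords = [fmean(col) for col in zip(*coords)]
    return clr_inverse(mean_coords)


# Hypothetical two-fraction compositions (% of total mass).
print(compositional_mean([[10, 90], [90, 10]]))
```

    Because every estimate is back-transformed through `closure`, the resulting proportions stay positive and sum to 100, avoiding the negative lower bounds shown in the abstract.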

  20. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as shown in this work, this adjustment is not made evenly across all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. This leads to a new interpretation of the results of the spatial scan statistic, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account for comparison only those most likely clusters of size k found under the null hypothesis? This question is especially important for the correctness of the decision when the p-value computed by the usual inference process is near the alpha significance level. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
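    The modification can be sketched once Monte Carlo replicates under the null hypothesis are available: the usual p-value ranks the observed log-likelihood ratio against all replicates, whereas a size-conditional version compares it only with replicates whose most likely cluster has the same size k. The Poisson log-likelihood ratio follows Kulldorff's usual form; the conditional p-value is an assumed reading of the abstract rather than the authors' exact procedure, and the function names are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and
    e expected cases out of `total` cases; zero unless the zone shows an excess."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def usual_p_value(observed_llr, null_llrs):
    """Monte Carlo p-value against the maximum LLR of every null replicate."""
    exceed = sum(1 for v in null_llrs if v >= observed_llr)
    return (1 + exceed) / (1 + len(null_llrs))


def size_conditional_p_value(observed_llr, observed_size, null_results):
    """null_results: (max LLR, size of most likely cluster) for each null replicate;
    only replicates whose most likely cluster has the observed size are compared."""
    same_size = [v for v, k in null_results if k == observed_size]
    exceed = sum(1 for v in same_size if v >= observed_llr)
    return (1 + exceed) / (1 + len(same_size))
```

    The two p-values can diverge when the null distribution of the maximum LLR depends on cluster size, which is the uneven adjustment the abstract describes.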