Skinner, Carl G; Patel, Manish M; Thomas, Jerry D; Miller, Michael A
2011-01-01
Statistical methods are pervasive in medical research and general medical literature. Understanding general statistical concepts will enhance our ability to critically appraise the current literature and ultimately improve the delivery of patient care. This article intends to provide an overview of the common statistical methods relevant to medicine.
Quality of reporting statistics in two Indian pharmacology journals
Jaykaran; Yadav, Preeti
2011-01-01
Objective: To evaluate the reporting of the statistical methods in articles published in two Indian pharmacology journals. Materials and Methods: All original articles published since 2002 were downloaded from the journals’ (Indian Journal of Pharmacology (IJP) and Indian Journal of Physiology and Pharmacology (IJPP)) website. These articles were evaluated on the basis of appropriateness of descriptive statistics and inferential statistics. Descriptive statistics was evaluated on the basis of reporting of method of description and central tendencies. Inferential statistics was evaluated on the basis of fulfilling of assumption of statistical methods and appropriateness of statistical tests. Values are described as frequencies, percentage, and 95% confidence interval (CI) around the percentages. Results: Inappropriate descriptive statistics was observed in 150 (78.1%, 95% CI 71.7–83.3%) articles. Most common reason for this inappropriate descriptive statistics was use of mean ± SEM at the place of “mean (SD)” or “mean ± SD.” Most common statistical method used was one-way ANOVA (58.4%). Information regarding checking of assumption of statistical test was mentioned in only two articles. Inappropriate statistical test was observed in 61 (31.7%, 95% CI 25.6–38.6%) articles. Most common reason for inappropriate statistical test was the use of two group test for three or more groups. Conclusion: Articles published in two Indian pharmacology journals are not devoid of statistical errors. PMID:21772766
Quality of reporting statistics in two Indian pharmacology journals.
Jaykaran; Yadav, Preeti
2011-04-01
To evaluate the reporting of the statistical methods in articles published in two Indian pharmacology journals. All original articles published since 2002 were downloaded from the journals' (Indian Journal of Pharmacology (IJP) and Indian Journal of Physiology and Pharmacology (IJPP)) website. These articles were evaluated on the basis of appropriateness of descriptive statistics and inferential statistics. Descriptive statistics was evaluated on the basis of reporting of method of description and central tendencies. Inferential statistics was evaluated on the basis of fulfilling of assumption of statistical methods and appropriateness of statistical tests. Values are described as frequencies, percentage, and 95% confidence interval (CI) around the percentages. Inappropriate descriptive statistics was observed in 150 (78.1%, 95% CI 71.7-83.3%) articles. Most common reason for this inappropriate descriptive statistics was use of mean ± SEM at the place of "mean (SD)" or "mean ± SD." Most common statistical method used was one-way ANOVA (58.4%). Information regarding checking of assumption of statistical test was mentioned in only two articles. Inappropriate statistical test was observed in 61 (31.7%, 95% CI 25.6-38.6%) articles. Most common reason for inappropriate statistical test was the use of two group test for three or more groups. Articles published in two Indian pharmacology journals are not devoid of statistical errors.
A Comparison of Two-Group Classification Methods
ERIC Educational Resources Information Center
Holden, Jocelyn E.; Finch, W. Holmes; Kelley, Ken
2011-01-01
The statistical classification of "N" individuals into "G" mutually exclusive groups when the actual group membership is unknown is common in the social and behavioral sciences. The results of such classification methods often have important consequences. Among the most common methods of statistical classification are linear discriminant analysis,…
Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain
Yim, Kyoung Hoon; Han, Kyoung Ah; Park, Soo Young
2010-01-01
Background Statistical analysis is essential in regard to obtaining objective reliability for medical research. However, medical researchers do not have enough statistical knowledge to properly analyze their study data. To help understand and potentially alleviate this problem, we have analyzed the statistical methods and errors of articles published in the Korean Journal of Pain (KJP), with the intention to improve the statistical quality of the journal. Methods All the articles, except case reports and editorials, published from 2004 to 2008 in the KJP were reviewed. The types of applied statistical methods and errors in the articles were evaluated. Results One hundred and thirty-nine original articles were reviewed. Inferential statistics and descriptive statistics were used in 119 papers and 20 papers, respectively. Only 20.9% of the papers were free from statistical errors. The most commonly adopted statistical method was the t-test (21.0%) followed by the chi-square test (15.9%). Errors of omission were encountered 101 times in 70 papers. Among the errors of omission, "no statistics used even though statistical methods were required" was the most common (40.6%). The errors of commission were encountered 165 times in 86 papers, among which "parametric inference for nonparametric data" was the most common (33.9%). Conclusions We found various types of statistical errors in the articles published in the KJP. This suggests that meticulous attention should be given not only in the applying statistical procedures but also in the reviewing process to improve the value of the article. PMID:20552071
Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew
2017-09-01
Over the last decades, Health-related Quality of Life (HRQoL) end-points have become an important outcome of the randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed a total of 33 published RCTs studying the HRQoL methods reported in RCTs since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: their multidimensional and longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time for a period ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.
Predicting recreational water quality advisories: A comparison of statistical methods
Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.
2016-01-01
Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to ”nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.
Pathway analysis with next-generation sequencing data.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao
2015-04-01
Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
2015-03-26
to my reader, Lieutenant Colonel Robert Overstreet, for helping solidify my research, coaching me through the statistical analysis, and positive...61 Descriptive Statistics .............................................................................................................. 61...common-method bias requires careful assessment of potential sources of bias and implementing procedural and statistical control methods. Podsakoff
Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J
2017-12-01
qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
Kline, Joshua C.
2014-01-01
Over the past four decades, various methods have been implemented to measure synchronization of motor-unit firings. In this work, we provide evidence that prior reports of the existence of universal common inputs to all motoneurons and the presence of long-term synchronization are misleading, because they did not use sufficiently rigorous statistical tests to detect synchronization. We developed a statistically based method (SigMax) for computing synchronization and tested it with data from 17,736 motor-unit pairs containing 1,035,225 firing instances from the first dorsal interosseous and vastus lateralis muscles—a data set one order of magnitude greater than that reported in previous studies. Only firing data, obtained from surface electromyographic signal decomposition with >95% accuracy, were used in the study. The data were not subjectively selected in any manner. Because of the size of our data set and the statistical rigor inherent to SigMax, we have confidence that the synchronization values that we calculated provide an improved estimate of physiologically driven synchronization. Compared with three other commonly used techniques, ours revealed three types of discrepancies that result from failing to use sufficient statistical tests necessary to detect synchronization. 1) On average, the z-score method falsely detected synchronization at 16 separate latencies in each motor-unit pair. 2) The cumulative sum method missed one out of every four synchronization identifications found by SigMax. 3) The common input assumption method identified synchronization from 100% of motor-unit pairs studied. SigMax revealed that only 50% of motor-unit pairs actually manifested synchronization. PMID:25210152
Harari, Gil
2014-01-01
Statistic significance, also known as p-value, and CI (Confidence Interval) are common statistics measures and are essential for the statistical analysis of studies in medicine and life sciences. These measures provide complementary information about the statistical probability and conclusions regarding the clinical significance of study findings. This article is intended to describe the methodologies, compare between the methods, assert their suitability for the different needs of study results analysis and to explain situations in which each method should be used.
Lewis, Gregory F.; Furman, Senta A.; McCool, Martha F.; Porges, Stephen W.
2011-01-01
Three frequently used RSA metrics are investigated to document violations of assumptions for parametric analyses, moderation by respiration, influences of nonstationarity, and sensitivity to vagal blockade. Although all metrics are highly correlated, new findings illustrate that the metrics are noticeably different on the above dimensions. Only one method conforms to the assumptions for parametric analyses, is not moderated by respiration, is not influenced by nonstationarity, and reliably generates stronger effect sizes. Moreover, this method is also the most sensitive to vagal blockade. Specific features of this method may provide insights into improving the statistical characteristics of other commonly used RSA metrics. These data provide the evidence to question, based on statistical grounds, published reports using particular metrics of RSA. PMID:22138367
De Luca, Carlo J; Kline, Joshua C
2014-12-01
Over the past four decades, various methods have been implemented to measure synchronization of motor-unit firings. In this work, we provide evidence that prior reports of the existence of universal common inputs to all motoneurons and the presence of long-term synchronization are misleading, because they did not use sufficiently rigorous statistical tests to detect synchronization. We developed a statistically based method (SigMax) for computing synchronization and tested it with data from 17,736 motor-unit pairs containing 1,035,225 firing instances from the first dorsal interosseous and vastus lateralis muscles--a data set one order of magnitude greater than that reported in previous studies. Only firing data, obtained from surface electromyographic signal decomposition with >95% accuracy, were used in the study. The data were not subjectively selected in any manner. Because of the size of our data set and the statistical rigor inherent to SigMax, we have confidence that the synchronization values that we calculated provide an improved estimate of physiologically driven synchronization. Compared with three other commonly used techniques, ours revealed three types of discrepancies that result from failing to use sufficient statistical tests necessary to detect synchronization. 1) On average, the z-score method falsely detected synchronization at 16 separate latencies in each motor-unit pair. 2) The cumulative sum method missed one out of every four synchronization identifications found by SigMax. 3) The common input assumption method identified synchronization from 100% of motor-unit pairs studied. SigMax revealed that only 50% of motor-unit pairs actually manifested synchronization. Copyright © 2014 the American Physiological Society.
Statistics used in current nursing research.
Zellner, Kathleen; Boerst, Connie J; Tabb, Wil
2007-02-01
Undergraduate nursing research courses should emphasize the statistics most commonly used in the nursing literature to strengthen students' and beginning researchers' understanding of them. To determine the most commonly used statistics, we reviewed all quantitative research articles published in 13 nursing journals in 2000. The findings supported Beitz's categorization of kinds of statistics. Ten primary statistics used in 80% of nursing research published in 2000 were identified. We recommend that the appropriate use of those top 10 statistics be emphasized in undergraduate nursing education and that the nursing profession continue to advocate for the use of methods (e.g., power analysis, odds ratio) that may contribute to the advancement of nursing research.
Luo, Li; Zhu, Yun
2012-01-01
Abstract The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
Luo, Li; Zhu, Yun; Xiong, Momiao
2012-06-01
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Jiang, Wei; Yu, Weichuan
2017-02-15
In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html . eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Reporting Statistical Results in Medical Journals
Arifin, Wan Nor; Sarimah, Abdullah; Norsa’adah, Bachok; Najib Majdi, Yaacob; Siti-Azrin, Ab Hamid; Kamarul Imran, Musa; Aniza, Abd Aziz; Naing, Lin
2016-01-01
Statistical editors of the Malaysian Journal of Medical Sciences (MJMS) must go through many submitted manuscripts, focusing on the statistical aspect of the manuscripts. However, the editors notice myriad styles of reporting the statistical results, which are not standardised among the authors. This could be due to the lack of clear written instructions on reporting statistics in the guidelines for authors. The aim of this editorial is to briefly outline reporting methods for several important and common statistical results. It will also address a number of common mistakes made by the authors. The editorial will serve as a guideline for authors aiming to publish in the MJMS as well as in other medical journals. PMID:27904419
1989-03-01
statistical energy analysis , the finite clement method, and the power flow method. Experimental solutions are the most common in the literature. The authors of...to the added weights and inertias of the transducers attached to an experimental structure. Statistical energy analysis (SEA) is a computational method...Analysis and Diagnosis," Journal of Sound and Vibration, Vol. 115, No. 3, pp. 405-422 (1987). 8. Lyon, R.L., Statistical Energy Analysis of Dynamical Systems
Investigation of Attitudinal Differences among Individuals of Different Employment Status
2010-10-28
be included in order to statistically control for common method variance (see Podsakoff , MacKenzie, Lee, & Podsakoff , 2003). Results Hypotheses 1...social identity theory. Social Psychology Quarterly, 58, 255-269. Podsakoff , P. M., MacKenzie, S. B., Lee, J., & Podsakoff , N. P. (2003). Common method
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal
2016-01-01
This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design.
Situating Qualitative Modes of Inquiry within the Discipline of Statistics Education Research
ERIC Educational Resources Information Center
Groth, Randall E.
2010-01-01
Qualitative methods have become common in statistics education research, but questions linger about their role in scholarship. Currently, influential policy documents lend credence to the notion that qualitative methods are inherently inferior to quantitative ones. In this paper, several of the questions about qualitative research raised in recent…
NASA Technical Reports Server (NTRS)
Staubert, R.
1985-01-01
Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.
Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.
Yalch, Matthew M
2016-03-01
Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicates the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved).
Han, Kyunghwa; Jung, Inkyung
2018-05-01
This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the Archives of Plastic Surgery (APS) from 2012 to 2017. We reviewed 388 original articles published in APS between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in APS between 2012 and 2017 used one or more statistical method. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcome. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was predominantly used in the articles that presented statistical analyses. We found that the use of statistical methods in APS has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in APS.
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS)
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M.; Khan, Ajmal
2016-01-01
Objective: This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Methods: Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. Results: A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher’s exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher’s exact test, logistic regression, epidemiological statistics, and non-parametric tests. Conclusion: This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design. PMID:27022365
Peer-Assisted Learning in Research Methods and Statistics
ERIC Educational Resources Information Center
Stone, Anna; Meade, Claire; Watling, Rosamond
2012-01-01
Feedback from students on a Level 1 Research Methods and Statistics module, studied as a core part of a BSc Psychology programme, highlighted demand for additional tutorials to help them to understand basic concepts. Students in their final year of study commonly request work experience to enhance their employability. All students on the Level 1…
Robustness of S1 statistic with Hodges-Lehmann for skewed distributions
NASA Astrophysics Data System (ADS)
Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping
2016-10-01
Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.
Dexter, Franklin; Shafer, Steven L
2017-03-01
Considerable attention has been drawn to poor reproducibility in the biomedical literature. One explanation is inadequate reporting of statistical methods by authors and inadequate assessment of statistical reporting and methods during peer review. In this narrative review, we examine scientific studies of several well-publicized efforts to improve statistical reporting. We also review several retrospective assessments of the impact of these efforts. These studies show that instructions to authors and statistical checklists are not sufficient; no findings suggested that either improves the quality of statistical methods and reporting. Second, even basic statistics, such as power analyses, are frequently missing or incorrectly performed. Third, statistical review is needed for all papers that involve data analysis. A consistent finding in the studies was that nonstatistical reviewers (eg, "scientific reviewers") and journal editors generally poorly assess statistical quality. We finish by discussing our experience with statistical review at Anesthesia & Analgesia from 2006 to 2016.
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
Random forests for classification in ecology
Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.
2007-01-01
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. ?? 2007 by the Ecological Society of America.
Statistical evidence for common ancestry: Application to primates.
Baum, David A; Ané, Cécile; Larget, Bret; Solís-Lemus, Claudia; Ho, Lam Si Tung; Boone, Peggy; Drummond, Chloe P; Bontrager, Martin; Hunter, Steven J; Saucier, William
2016-06-01
Since Darwin, biologists have come to recognize that the theory of descent from common ancestry (CA) is very well supported by diverse lines of evidence. However, while the qualitative evidence is overwhelming, we also need formal methods for quantifying the evidential support for CA over the alternative hypothesis of separate ancestry (SA). In this article, we explore a diversity of statistical methods using data from the primates. We focus on two alternatives to CA, species SA (the separate origin of each named species) and family SA (the separate origin of each family). We implemented statistical tests based on morphological, molecular, and biogeographic data and developed two new methods: one that tests for phylogenetic autocorrelation while correcting for variation due to confounding ecological traits and a method for examining whether fossil taxa have fewer derived differences than living taxa. We overwhelmingly rejected both species and family SA with infinitesimal P values. We compare these results with those from two companion papers, which also found tremendously strong support for the CA of all primates, and discuss future directions and general philosophical issues that pertain to statistical testing of historical hypotheses such as CA. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Gardenier, John S
2012-12-01
This paper recommends how authors of statistical studies can communicate to general audiences fully, clearly, and comfortably. The studies may use statistical methods to explore issues in science, engineering, and society or they may address issues in statistics specifically. In either case, readers without explicit statistical training should have no problem understanding the issues, the methods, or the results at a non-technical level. The arguments for those results should be clear, logical, and persuasive. This paper also provides advice for editors of general journals on selecting high quality statistical articles without the need for exceptional work or expense. Finally, readers are also advised to watch out for some common errors or misuses of statistics that can be detected without a technical statistical background.
Sequi, Marco; Campi, Rita; Clavenna, Antonio; Bonati, Maurizio
2013-03-01
To evaluate the quality of data reporting and statistical methods performed in drug utilization studies in the pediatric population. Drug utilization studies evaluating all drug prescriptions to children and adolescents published between January 1994 and December 2011 were retrieved and analyzed. For each study, information on measures of exposure/consumption, the covariates considered, descriptive and inferential analyses, statistical tests, and methods of data reporting was extracted. An overall quality score was created for each study using a 12-item checklist that took into account the presence of outcome measures, covariates of measures, descriptive measures, statistical tests, and graphical representation. A total of 22 studies were reviewed and analyzed. Of these, 20 studies reported at least one descriptive measure. The mean was the most commonly used measure (18 studies), but only five of these also reported the standard deviation. Statistical analyses were performed in 12 studies, with the chi-square test being the most commonly performed test. Graphs were presented in 14 papers. Sixteen papers reported the number of drug prescriptions and/or packages, and ten reported the prevalence of the drug prescription. The mean quality score was 8 (median 9). Only seven of the 22 studies received a score of ≥10, while four studies received a score of <6. Our findings document that only a few of the studies reviewed applied statistical methods and reported data in a satisfactory manner. We therefore conclude that the methodology of drug utilization studies needs to be improved.
Pattern statistics on Markov chains and sensitivity to parameter estimation
Nuel, Grégory
2006-01-01
Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
Pattern statistics on Markov chains and sensitivity to parameter estimation.
Nuel, Grégory
2006-10-17
In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of sigma, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
The limits of protein sequence comparison?
Pearson, William R; Sierk, Michael L
2010-01-01
Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
Statistics in the pharmacy literature.
Lee, Charlene M; Soin, Herpreet K; Einarson, Thomas R
2004-09-01
Research in statistical methods is essential for maintenance of high quality of the published literature. To update previous reports of the types and frequencies of statistical terms and procedures in research studies of selected professional pharmacy journals. We obtained all research articles published in 2001 in 6 journals: American Journal of Health-System Pharmacy, The Annals of Pharmacotherapy, Canadian Journal of Hospital Pharmacy, Formulary, Hospital Pharmacy, and Journal of the American Pharmaceutical Association. Two independent reviewers identified and recorded descriptive and inferential statistical terms/procedures found in the methods, results, and discussion sections of each article. Results were determined by tallying the total number of times, as well as the percentage, that each statistical term or procedure appeared in the articles. One hundred forty-four articles were included. Ninety-eight percent employed descriptive statistics; of these, 28% used only descriptive statistics. The most common descriptive statistical terms were percentage (90%), mean (74%), standard deviation (58%), and range (46%). Sixty-nine percent of the articles used inferential statistics, the most frequent being chi(2) (33%), Student's t-test (26%), Pearson's correlation coefficient r (18%), ANOVA (14%), and logistic regression (11%). Statistical terms and procedures were found in nearly all of the research articles published in pharmacy journals. Thus, pharmacy education should aim to provide current and future pharmacists with an understanding of the common statistical terms and procedures identified to facilitate the appropriate appraisal and consequential utilization of the information available in research articles.
Testing variance components by two jackknife methods
USDA-ARS?s Scientific Manuscript database
The jacknife method, a resampling technique, has been widely used for statistical tests for years. The pseudo value based jacknife method (defined as pseudo jackknife method) is commonly used to reduce the bias for an estimate; however, sometimes it could result in large variaion for an estmimate a...
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Nowcasting Cloud Fields for U.S. Air Force Special Operations
2017-03-01
application of Bayes’ Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods...reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past...nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning 15. NUMBER OF PAGES
ERIC Educational Resources Information Center
Nicholson, James; Ridgway, Jim
2017-01-01
White and Gorard make important and relevant criticisms of some of the methods commonly used in social science research, but go further by criticising the logical basis for inferential statistical tests. This paper comments briefly on matters we broadly agree on with them and more fully on matters where we disagree. We agree that too little…
Common pitfalls in statistical analysis: Measures of agreement.
Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh
2017-01-01
Agreement between measurements refers to the degree of concordance between two (or more) sets of measurements. Statistical methods to test agreement are used to assess inter-rater variability or to decide whether one technique for measuring a variable can substitute another. In this article, we look at statistical measures of agreement for different types of data and discuss the differences between these and those for assessing correlation.
Secondary Analysis of National Longitudinal Transition Study 2 Data
ERIC Educational Resources Information Center
Hicks, Tyler A.; Knollman, Greg A.
2015-01-01
This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…
A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists
ERIC Educational Resources Information Center
Warne, Russell T.
2014-01-01
Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not…
Two Paradoxes in Linear Regression Analysis
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
2016-01-01
Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Otwombe, Kennedy N.; Petzold, Max; Martinson, Neil; Chirwa, Tobias
2014-01-01
Background Research in the predictors of all-cause mortality in HIV-infected people has widely been reported in literature. Making an informed decision requires understanding the methods used. Objectives We present a review on study designs, statistical methods and their appropriateness in original articles reporting on predictors of all-cause mortality in HIV-infected people between January 2002 and December 2011. Statistical methods were compared between 2002–2006 and 2007–2011. Time-to-event analysis techniques were considered appropriate. Data Sources Pubmed/Medline. Study Eligibility Criteria Original English-language articles were abstracted. Letters to the editor, editorials, reviews, systematic reviews, meta-analysis, case reports and any other ineligible articles were excluded. Results A total of 189 studies were identified (n = 91 in 2002–2006 and n = 98 in 2007–2011) out of which 130 (69%) were prospective and 56 (30%) were retrospective. One hundred and eighty-two (96%) studies described their sample using descriptive statistics while 32 (17%) made comparisons using t-tests. Kaplan-Meier methods for time-to-event analysis were commonly used in the earlier period (n = 69, 76% vs. n = 53, 54%, p = 0.002). Predictors of mortality in the two periods were commonly determined using Cox regression analysis (n = 67, 75% vs. n = 63, 64%, p = 0.12). Only 7 (4%) used advanced survival analysis methods of Cox regression analysis with frailty in which 6 (3%) were used in the later period. Thirty-two (17%) used logistic regression while 8 (4%) used other methods. There were significantly more articles from the first period using appropriate methods compared to the second (n = 80, 88% vs. n = 69, 70%, p-value = 0.003). Conclusion Descriptive statistics and survival analysis techniques remain the most common methods of analysis in publications on predictors of all-cause mortality in HIV-infected cohorts while prospective research designs are favoured. Sophisticated techniques of time-dependent Cox regression and Cox regression with frailty are scarce. This motivates for more training in the use of advanced time-to-event methods. PMID:24498313
A survey of statistics in three UK general practice journal
Rigby, Alan S; Armstrong, Gillian K; Campbell, Michael J; Summerton, Nick
2004-01-01
Background Many medical specialities have reviewed the statistical content of their journals. To our knowledge this has not been done in general practice. Given the main role of a general practitioner as a diagnostician we thought it would be of interest to see whether the statistical methods reported reflect the diagnostic process. Methods Hand search of three UK journals of general practice namely the British Medical Journal (general practice section), British Journal of General Practice and Family Practice over a one-year period (1 January to 31 December 2000). Results A wide variety of statistical techniques were used. The most common methods included t-tests and Chi-squared tests. There were few articles reporting likelihood ratios and other useful diagnostic methods. There was evidence that the journals with the more thorough statistical review process reported a more complex and wider variety of statistical techniques. Conclusions The BMJ had a wider range and greater diversity of statistical methods than the other two journals. However, in all three journals there was a dearth of papers reflecting the diagnostic process. Across all three journals there were relatively few papers describing randomised controlled trials thus recognising the difficulty of implementing this design in general practice. PMID:15596014
Mathur, Sunil; Sadana, Ajit
2015-12-01
We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM) under certain non-normal distributions. The asymptotic distribution of the test statistic, and the p-value function are discussed. The application of proposed method is shown using a real-life data set. © The Author(s) 2011.
A general statistical test for correlations in a finite-length time series.
Hanson, Jeffery A; Yang, Haw
2008-06-07
The statistical properties of the autocorrelation function from a time series composed of independently and identically distributed stochastic variables has been studied. Analytical expressions for the autocorrelation function's variance have been derived. It has been found that two common ways of calculating the autocorrelation, moving-average and Fourier transform, exhibit different uncertainty characteristics. For periodic time series, the Fourier transform method is preferred because it gives smaller uncertainties that are uniform through all time lags. Based on these analytical results, a statistically robust method has been proposed to test the existence of correlations in a time series. The statistical test is verified by computer simulations and an application to single-molecule fluorescence spectroscopy is discussed.
Blind image quality assessment based on aesthetic and statistical quality-aware features
NASA Astrophysics Data System (ADS)
Jenadeleh, Mohsen; Masaeli, Mohammad Masood; Moghaddam, Mohsen Ebrahimi
2017-07-01
The main goal of image quality assessment (IQA) methods is the emulation of human perceptual image quality judgments. Therefore, the correlation between objective scores of these methods with human perceptual scores is considered as their performance metric. Human judgment of the image quality implicitly includes many factors when assessing perceptual image qualities such as aesthetics, semantics, context, and various types of visual distortions. The main idea of this paper is to use a host of features that are commonly employed in image aesthetics assessment in order to improve blind image quality assessment (BIQA) methods accuracy. We propose an approach that enriches the features of BIQA methods by integrating a host of aesthetics image features with the features of natural image statistics derived from multiple domains. The proposed features have been used for augmenting five different state-of-the-art BIQA methods, which use statistical natural scene statistics features. Experiments were performed on seven benchmark image quality databases. The experimental results showed significant improvement of the accuracy of the methods.
Beyond Multiple Regression: Using Commonality Analysis to Better Understand R[superscript 2] Results
ERIC Educational Resources Information Center
Warne, Russell T.
2011-01-01
Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…
Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.
Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I
2018-06-26
The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaption of established statistical techniques, the Kaplan-Meier estimator (K-M), the robust regression on ordered statistic (ROS), and the maximum-likelihood estimator (MLE), to quantify the pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the maximum-likelihood estimator (MLE) is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
Beck, H J; Birch, G F
2013-06-01
Stormwater contaminant loading estimates using event mean concentration (EMC), rainfall/runoff relationship calculations and computer modelling (Model of Urban Stormwater Infrastructure Conceptualisation--MUSIC) demonstrated high variability in common methods of water quality assessment. Predictions of metal, nutrient and total suspended solid loadings for three highly urbanised catchments in Sydney estuary, Australia, varied greatly within and amongst methods tested. EMC and rainfall/runoff relationship calculations produced similar estimates (within 1 SD) in a statistically significant number of trials; however, considerable variability within estimates (∼50 and ∼25 % relative standard deviation, respectively) questions the reliability of these methods. Likewise, upper and lower default inputs in a commonly used loading model (MUSIC) produced an extensive range of loading estimates (3.8-8.3 times above and 2.6-4.1 times below typical default inputs, respectively). Default and calibrated MUSIC simulations produced loading estimates that agreed with EMC and rainfall/runoff calculations in some trials (4-10 from 18); however, they were not frequent enough to statistically infer that these methods produced the same results. Great variance within and amongst mean annual loads estimated by common methods of water quality assessment has important ramifications for water quality managers requiring accurate estimates of the quantities and nature of contaminants requiring treatment.
Podsakoff, Philip M; MacKenzie, Scott B; Lee, Jeong-Yeon; Podsakoff, Nathan P
2003-10-01
Interest in the problem of method biases has a long history in the behavioral sciences. Despite this, a comprehensive summary of the potential sources of method biases and how to control for them does not exist. Therefore, the purpose of this article is to examine the extent to which method biases influence behavioral research results, identify potential sources of method biases, discuss the cognitive processes through which method biases influence responses to measures, evaluate the many different procedural and statistical techniques that can be used to control method biases, and provide recommendations for how to select appropriate procedural and statistical remedies for different types of research settings.
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H.
2001-01-01
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease. PMID:11404819
Nour-Eldein, Hebatallah
2016-01-01
Background: With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles. PMID:27453839
NASA Astrophysics Data System (ADS)
Pernot, Pascal; Savin, Andreas
2018-06-01
Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.
Kumar, Vineet
2011-12-01
The grain size statistics, commonly derived from the grain map of a material sample, are important microstructure characteristics that greatly influence its properties. The grain map for nanomaterials is usually obtained manually by visual inspection of the transmission electron microscope (TEM) micrographs because automated methods do not perform satisfactorily. While the visual inspection method provides reliable results, it is a labor intensive process and is often prone to human errors. In this article, an automated grain mapping method is developed using TEM diffraction patterns. The presented method uses wide angle convergent beam diffraction in the TEM. The automated technique was applied on a platinum thin film sample to obtain the grain map and subsequently derive grain size statistics from it. The grain size statistics obtained with the automated method were found in good agreement with the visual inspection method.
A comparison of linear and nonlinear statistical techniques in performance attribution.
Chan, N H; Genovese, C R
2001-01-01
Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
Williams, Jennifer Stewart
2011-07-01
To show how fractional polynomial methods can usefully replace the practice of arbitrarily categorizing data in epidemiology and health services research. A health service setting is used to illustrate a structured and transparent way of representing non-linear data without arbitrary grouping. When age is a regressor its effects on an outcome will be interpreted differently depending upon the placing of cutpoints or the use of a polynomial transformation. Although it is common practice, categorization comes at a cost. Information is lost, and accuracy and statistical power reduced, leading to spurious statistical interpretation of the data. The fractional polynomial method is widely supported by statistical software programs, and deserves greater attention and use.
Statistics corner: A guide to appropriate use of correlation coefficient in medical research.
Mukaka, M M
2012-09-01
Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. However, misuse of correlation is so common among researchers that some statisticians have wished that the method had never been devised at all. The aim of this article is to provide a guide to appropriate use of correlation in medical research and to highlight some misuse. Examples of the applications of the correlation coefficient have been provided using data from statistical simulations as well as real data. Rule of thumb for interpreting size of a correlation coefficient has been provided.
Allen, Peter J.; Dorozenko, Kate P.; Roberts, Lynne D.
2016-01-01
Quantitative research methods are essential to the development of professional competence in psychology. They are also an area of weakness for many students. In particular, students are known to struggle with the skill of selecting quantitative analytical strategies appropriate for common research questions, hypotheses and data types. To begin understanding this apparent deficit, we presented nine psychology undergraduates (who had all completed at least one quantitative methods course) with brief research vignettes, and asked them to explicate the process they would follow to identify an appropriate statistical technique for each. Thematic analysis revealed that all participants found this task challenging, and even those who had completed several research methods courses struggled to articulate how they would approach the vignettes on more than a very superficial and intuitive level. While some students recognized that there is a systematic decision making process that can be followed, none could describe it clearly or completely. We then presented the same vignettes to 10 psychology academics with particular expertise in conducting research and/or research methods instruction. Predictably, these “experts” were able to describe a far more systematic, comprehensive, flexible, and nuanced approach to statistical decision making, which begins early in the research process, and pays consideration to multiple contextual factors. They were sensitive to the challenges that students experience when making statistical decisions, which they attributed partially to how research methods and statistics are commonly taught. This sensitivity was reflected in their pedagogic practices. When asked to consider the format and features of an aid that could facilitate the statistical decision making process, both groups expressed a preference for an accessible, comprehensive and reputable resource that follows a basic decision tree logic. For the academics in particular, this aid should function as a teaching tool, which engages the user with each choice-point in the decision making process, rather than simply providing an “answer.” Based on these findings, we offer suggestions for tools and strategies that could be deployed in the research methods classroom to facilitate and strengthen students' statistical decision making abilities. PMID:26909064
Allen, Peter J; Dorozenko, Kate P; Roberts, Lynne D
2016-01-01
Quantitative research methods are essential to the development of professional competence in psychology. They are also an area of weakness for many students. In particular, students are known to struggle with the skill of selecting quantitative analytical strategies appropriate for common research questions, hypotheses and data types. To begin understanding this apparent deficit, we presented nine psychology undergraduates (who had all completed at least one quantitative methods course) with brief research vignettes, and asked them to explicate the process they would follow to identify an appropriate statistical technique for each. Thematic analysis revealed that all participants found this task challenging, and even those who had completed several research methods courses struggled to articulate how they would approach the vignettes on more than a very superficial and intuitive level. While some students recognized that there is a systematic decision making process that can be followed, none could describe it clearly or completely. We then presented the same vignettes to 10 psychology academics with particular expertise in conducting research and/or research methods instruction. Predictably, these "experts" were able to describe a far more systematic, comprehensive, flexible, and nuanced approach to statistical decision making, which begins early in the research process, and pays consideration to multiple contextual factors. They were sensitive to the challenges that students experience when making statistical decisions, which they attributed partially to how research methods and statistics are commonly taught. This sensitivity was reflected in their pedagogic practices. When asked to consider the format and features of an aid that could facilitate the statistical decision making process, both groups expressed a preference for an accessible, comprehensive and reputable resource that follows a basic decision tree logic. For the academics in particular, this aid should function as a teaching tool, which engages the user with each choice-point in the decision making process, rather than simply providing an "answer." Based on these findings, we offer suggestions for tools and strategies that could be deployed in the research methods classroom to facilitate and strengthen students' statistical decision making abilities.
Mean template for tensor-based morphometry using deformation tensors.
Leporé, Natasha; Brun, Caroline; Pennec, Xavier; Chou, Yi-Yu; Lopez, Oscar L; Aizenstein, Howard J; Becker, James T; Toga, Arthur W; Thompson, Paul M
2007-01-01
Tensor-based morphometry (TBM) studies anatomical differences between brain images statistically, to identify regions that differ between groups, over time, or correlate with cognitive or clinical measures. Using a nonlinear registration algorithm, all images are mapped to a common space, and statistics are most commonly performed on the Jacobian determinant (local expansion factor) of the deformation fields. In, it was shown that the detection sensitivity of the standard TBM approach could be increased by using the full deformation tensors in a multivariate statistical analysis. Here we set out to improve the common space itself, by choosing the shape that minimizes a natural metric on the deformation tensors from that space to the population of control subjects. This method avoids statistical bias and should ease nonlinear registration of new subjects data to a template that is 'closest' to all subjects' anatomies. As deformation tensors are symmetric positive-definite matrices and do not form a vector space, all computations are performed in the log-Euclidean framework. The control brain B that is already the closest to 'average' is found. A gradient descent algorithm is then used to perform the minimization that iteratively deforms this template and obtains the mean shape. We apply our method to map the profile of anatomical differences in a dataset of 26 HIV/AIDS patients and 14 controls, via a log-Euclidean Hotelling's T2 test on the deformation tensors. These results are compared to the ones found using the 'best' control, B. Statistics on both shapes are evaluated using cumulative distribution functions of the p-values in maps of inter-group differences.
ERIC Educational Resources Information Center
Jackson, Dan
2013-01-01
Statistical inference is problematic in the common situation in meta-analysis where the random effects model is fitted to just a handful of studies. In particular, the asymptotic theory of maximum likelihood provides a poor approximation, and Bayesian methods are sensitive to the prior specification. Hence, less efficient, but easily computed and…
Common pitfalls in statistical analysis: The perils of multiple testing
Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc
2016-01-01
Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.
2013-04-27
This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating the • number of samples required to achieve a specified confidence in characterization and clearance decisions • confidence in making characterization and clearance decisions for a specified number of samples for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that amore » decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phrase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm2) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results 2. estimating the true surface concentration corresponding to a surface sample 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.« less
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Modeling Outcomes with Floor or Ceiling Effects: An Introduction to the Tobit Model
ERIC Educational Resources Information Center
McBee, Matthew
2010-01-01
In gifted education research, it is common for outcome variables to exhibit strong floor or ceiling effects due to insufficient range of measurement of many instruments when used with gifted populations. Common statistical methods (e.g., analysis of variance, linear regression) produce biased estimates when such effects are present. In practice,…
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David
2009-01-01
Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016
Smith, Ben J; Zehle, Katharina; Bauman, Adrian E; Chau, Josephine; Hawkshaw, Barbara; Frost, Steven; Thomas, Margaret
2006-04-01
This study examined the use of quantitative methods in Australian health promotion research in order to identify methodological trends and priorities for strengthening the evidence base for health promotion. Australian health promotion articles were identified by hand searching publications from 1992-2002 in six journals: Health Promotion Journal of Australia, Australian and New Zealand journal of Public Health, Health Promotion International, Health Education Research, Health Education and Behavior and the American Journal of Health Promotion. The study designs and statistical methods used in articles presenting quantitative research were recorded. 591 (57.7%) of the 1,025 articles used quantitative methods. Cross-sectional designs were used in the majority (54.3%) of studies with pre- and post-test (14.6%) and post-test only (9.5%) the next most common designs. Bivariate statistical methods were used in 45.9% of papers, multivariate methods in 27.1% and simple numbers and proportions in 25.4%. Few studies used higher-level statistical techniques. While most studies used quantitative methods, the majority were descriptive in nature. The study designs and statistical methods used provided limited scope for demonstrating intervention effects or understanding the determinants of change.
Nour-Eldein, Hebatallah
2016-01-01
With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles.
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
Perlin, Mark William
2015-01-01
Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI-1 value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI-1) values were examined and compared with corresponding log(LR) values. Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI-1 increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice. PMID:26605124
Robbins, L G
2000-01-01
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi. it/ricerca/dip/bio_evol/sitomlikely/mlikely.h tml. PMID:10628965
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
Tan, Ming T; Liu, Jian-ping; Lao, Lixing
2012-08-01
Recently, proper use of the statistical methods in traditional Chinese medicine (TCM) randomized controlled trials (RCTs) has received increased attention. Statistical inference based on hypothesis testing is the foundation of clinical trials and evidence-based medicine. In this article, the authors described the methodological differences between literature published in Chinese and Western journals in the design and analysis of acupuncture RCTs and the application of basic statistical principles. In China, qualitative analysis method has been widely used in acupuncture and TCM clinical trials, while the between-group quantitative analysis methods on clinical symptom scores are commonly used in the West. The evidence for and against these analytical differences were discussed based on the data of RCTs assessing acupuncture for pain relief. The authors concluded that although both methods have their unique advantages, quantitative analysis should be used as the primary analysis while qualitative analysis can be a secondary criterion for analysis. The purpose of this paper is to inspire further discussion of such special issues in clinical research design and thus contribute to the increased scientific rigor of TCM research.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
A spatial scan statistic for multiple clusters.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2011-10-01
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. While there are multiple clusters coexisting in the study area, they become difficult to detect because of clusters' shadowing effect to each other. The recently proposed sequential method showed its better power for detecting the second weaker cluster, but did not improve the ability of detecting the first stronger cluster which is more important than the second one. We propose a new extension of the spatial scan statistic which could be used to detect multiple clusters. Through constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detecting and evaluating process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In the real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method, which cannot be evaluated to be statistically significant by the standard method due to another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
A spatial scan statistic for nonisotropic two-level risk cluster.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2012-01-30
Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic could model a centralized high-risk kernel in the cluster. Because variations in disease risks are anisotropic owing to different social, economical, or transport factors, the real high-risk kernel will not necessarily take the central place in a whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which could be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision with two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data in Pingdu City, Shandong, China in May 2009, compared with two other methods. In this practical study, the nonisotropic two-level method is the only way to precisely detect a high-risk area in a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.
Gupta, Munish; Kaplan, Heather C
2017-09-01
Quality improvement (QI) is based on measuring performance over time, and variation in data measured over time must be understood to guide change and make optimal improvements. Common cause variation is natural variation owing to factors inherent to any process; special cause variation is unnatural variation owing to external factors. Statistical process control methods, and particularly control charts, are robust tools for understanding data over time and identifying common and special cause variation. This review provides a practical introduction to the use of control charts in health care QI, with a focus on neonatology. Copyright © 2017 Elsevier Inc. All rights reserved.
[Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].
Golder, W
1999-09-01
To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors make increasingly use of statistical experts' opinion and programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.
The assignment of scores procedure for ordinal categorical data.
Chen, Han-Ching; Wang, Nae-Sheng
2014-01-01
Ordinal data are the most frequently encountered type of data in the social sciences. Many statistical methods can be used to process such data. One common method is to assign scores to the data, convert them into interval data, and further perform statistical analysis. There are several authors who have recently developed assigning score methods to assign scores to ordered categorical data. This paper proposes an approach that defines an assigning score system for an ordinal categorical variable based on underlying continuous latent distribution with interpretation by using three case study examples. The results show that the proposed score system is well for skewed ordinal categorical data.
Jiang, Honghua; Ni, Xiao; Huster, William; Heilmann, Cory
2015-01-01
Hypoglycemia has long been recognized as a major barrier to achieving normoglycemia with intensive diabetic therapies. It is a common safety concern for the diabetes patients. Therefore, it is important to apply appropriate statistical methods when analyzing hypoglycemia data. Here, we carried out bootstrap simulations to investigate the performance of the four commonly used statistical models (Poisson, negative binomial, analysis of covariance [ANCOVA], and rank ANCOVA) based on the data from a diabetes clinical trial. Zero-inflated Poisson (ZIP) model and zero-inflated negative binomial (ZINB) model were also evaluated. Simulation results showed that Poisson model inflated type I error, while negative binomial model was overly conservative. However, after adjusting for dispersion, both Poisson and negative binomial models yielded slightly inflated type I errors, which were close to the nominal level and reasonable power. Reasonable control of type I error was associated with ANCOVA model. Rank ANCOVA model was associated with the greatest power and with reasonable control of type I error. Inflated type I error was observed with ZIP and ZINB models.
Sikirzhytskaya, Aliaksandra; Sikirzhytski, Vitali; Lednev, Igor K
2014-01-01
Body fluids are a common and important type of forensic evidence. In particular, the identification of menstrual blood stains is often a key step during the investigation of rape cases. Here, we report on the application of near-infrared Raman microspectroscopy for differentiating menstrual blood from peripheral blood. We observed that the menstrual and peripheral blood samples have similar but distinct Raman spectra. Advanced statistical analysis of the multiple Raman spectra that were automatically (Raman mapping) acquired from the 40 dried blood stains (20 donors for each group) allowed us to build classification model with maximum (100%) sensitivity and specificity. We also demonstrated that despite certain common constituents, menstrual blood can be readily distinguished from vaginal fluid. All of the classification models were verified using cross-validation methods. The proposed method overcomes the problems associated with currently used biochemical methods, which are destructive, time consuming and expensive. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.
Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa
2011-05-26
Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches
Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer
2014-01-01
Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332
Missing data is a common problem in the application of statistical techniques. In principal component analysis (PCA), a technique for dimensionality reduction, incomplete data points are either discarded or imputed using interpolation methods. Such approaches are less valid when ...
Advantages of Social Network Analysis in Educational Research
ERIC Educational Resources Information Center
Ushakov, K. M.; Kukso, K. N.
2015-01-01
Currently one of the main tools for the large scale studies of schools is statistical analysis. Although it is the most common method and it offers greatest opportunities for analysis, there are other quantitative methods for studying schools, such as network analysis. We discuss the potential advantages that network analysis has for educational…
Improved method for reliable HMW-GS identification by RP-HPLC and SDS-PAGE in common wheat cultivars
USDA-ARS?s Scientific Manuscript database
The accurate identification of alleles for high-molecular weight glutenins (HMW-GS) is critical for wheat breeding programs targeting end-use quality. RP-HPLC methods were optimized for separation of HMW-GS, resulting in enhanced resolution of 1By and 1Dx subunits. Statistically significant differe...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less
A close examination of double filtering with fold change and t test in microarray analysis
2009-01-01
Background Many researchers use the double filtering procedure with fold change and t test to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Due to its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods. Results This paper, for the first time to our knowledge, provides theoretical insight on the drawback of the double filtering procedure. We show that fold change assumes all genes to have a common variance while t statistic assumes gene-specific variances. The two statistics are based on contradicting assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that the posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure. Conclusion We demonstrate through hypothesis testing theory, simulation studies and real data examples, that well constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure. PMID:19995439
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, Richard O.
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less
Statistical methods for investigating quiescence and other temporal seismicity patterns
Matthews, M.V.; Reasenberg, P.A.
1988-01-01
We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns:microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. ?? 1988 Birkha??user Verlag.
Randall, Sean M; Ferrante, Anna M; Boyd, James H; Brown, Adrian P; Semmens, James B
2016-08-01
The statistical linkage key (SLK-581) is a common tool for record linkage in Australia, due to its ability to provide some privacy protection. However, newer privacy-preserving approaches may provide greater privacy protection, while allowing high-quality linkage. To evaluate the standard SLK-581, encrypted SLK-581 and a newer privacy-preserving approach using Bloom filters, in terms of both privacy and linkage quality. Linkage quality was compared by conducting linkages on Australian health datasets using these three techniques and examining results. Privacy was compared qualitatively in relation to a series of scenarios where privacy breaches may occur. The Bloom filter technique offered greater privacy protection and linkage quality compared to the SLK-based method commonly used in Australia. The adoption of new privacy-preserving methods would allow both greater confidence in research results, while significantly improving privacy protection. © The Author(s) 2016.
Lindsey, David A.; Tysdal, Russell G.; Taggart, Joseph E.
2002-01-01
The principal purpose of this report is to provide a reference archive for results of a statistical analysis of geochemical data for metasedimentary rocks of Mesoproterozoic age of the Salmon River Mountains and Lemhi Range, central Idaho. Descriptions of geochemical data sets, statistical methods, rationale for interpretations, and references to the literature are provided. Three methods of analysis are used: R-mode factor analysis of major oxide and trace element data for identifying petrochemical processes, analysis of variance for effects of rock type and stratigraphic position on chemical composition, and major-oxide ratio plots for comparison with the chemical composition of common clastic sedimentary rocks.
Statistical Method to Overcome Overfitting Issue in Rational Function Models
NASA Astrophysics Data System (ADS)
Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.
2017-09-01
Rational function models (RFMs) are known as one of the most appealing models which are extensively applied in geometric correction of satellite images and map production. Overfitting is a common issue, in the case of terrain dependent RFMs, that degrades the accuracy of RFMs-derived geospatial products. This issue, resulting from the high number of RFMs' parameters, leads to ill-posedness of the RFMs. To tackle this problem, in this study, a fast and robust statistical approach is proposed and compared to Tikhonov regularization (TR) method, as a frequently-used solution to RFMs' overfitting. In the proposed method, a statistical test, namely, significance test is applied to search for the RFMs' parameters that are resistant against overfitting issue. The performance of the proposed method was evaluated for two real data sets of Cartosat-1 satellite images. The obtained results demonstrate the efficiency of the proposed method in term of the achievable level of accuracy. This technique, indeed, shows an improvement of 50-80% over the TR.
Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F
2010-07-19
A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.
2010-01-01
Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic. PMID:20642827
Poppe, L.J.; Eliason, A.H.; Hastings, M.E.
2004-01-01
Measures that describe and summarize sediment grain-size distributions are important to geologists because of the large amount of information contained in textural data sets. Statistical methods are usually employed to simplify the necessary comparisons among samples and quantify the observed differences. The two statistical methods most commonly used by sedimentologists to describe particle distributions are mathematical moments (Krumbein and Pettijohn, 1938) and inclusive graphics (Folk, 1974). The choice of which of these statistical measures to use is typically governed by the amount of data available (Royse, 1970). If the entire distribution is known, the method of moments may be used; if the next to last accumulated percent is greater than 95, inclusive graphics statistics can be generated. Unfortunately, earlier programs designed to describe sediment grain-size distributions statistically do not run in a Windows environment, do not allow extrapolation of the distribution's tails, or do not generate both moment and graphic statistics (Kane and Hubert, 1963; Collias et al., 1963; Schlee and Webster, 1967; Poppe et al., 2000)1.Owing to analytical limitations, electro-resistance multichannel particle-size analyzers, such as Coulter Counters, commonly truncate the tails of the fine-fraction part of grain-size distributions. These devices do not detect fine clay in the 0.6–0.1 μm range (part of the 11-phi and all of the 12-phi and 13-phi fractions). Although size analyses performed down to 0.6 μm microns are adequate for most freshwater and near shore marine sediments, samples from many deeper water marine environments (e.g. rise and abyssal plain) may contain significant material in the fine clay fraction, and these analyses benefit from extrapolation.The program (GSSTAT) described herein generates statistics to characterize sediment grain-size distributions and can extrapolate the fine-grained end of the particle distribution. It is written in Microsoft Visual Basic 6.0 and provides a window to facilitate program execution. The input for the sediment fractions is weight percentages in whole-phi notation (Krumbein, 1934; Inman, 1952), and the program permits the user to select output in either method of moments or inclusive graphics statistics (Fig. 1). Users select options primarily with mouse-click events, or through interactive dialogue boxes.
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies
Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong
2013-01-01
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
Metamodels for Computer-Based Engineering Design: Survey and Recommendations
NASA Technical Reports Server (NTRS)
Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.
1997-01-01
The use of statistical techniques to build approximations of expensive computer analysis codes pervades much of todays engineering design. These statistical approximations, or metamodels, are used to replace the actual expensive computer analyses, facilitating multidisciplinary, multiobjective optimization and concept exploration. In this paper we review several of these techniques including design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We survey their existing application in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of statistical approximation techniques in given situations and how common pitfalls can be avoided.
Morphometrical study on senile larynx.
Zieliński, R
2001-01-01
The aim of the study was a morphometrical macroscopic evaluation of senile larynges, according to its usefulness in ORL diagnostic and operational methods. Larynx preparations were taken from cadavers of both sexes, of age 65 and over, about 24 hours after death. Clinically important laryngeal diameters were collected using common morphometrical methods. A few body features were also being gathered. Computer statistical methods were used in data assessment, including basic statistics and linear correlations between diameters and between diameters and body features. The data presented in the study may be very helpful in evaluation of diagnostic methods. It may also help in selection of right operational tool' sizes, the most appropriate operational technique choice, preoperative preparations and designing and building virtual and plastic models for physicians' training.
Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta
2017-09-19
Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.
Wiuf, Carsten; Schaumburg-Müller Pallesen, Jonatan; Foldager, Leslie; Grove, Jakob
2016-08-01
In many areas of science it is custom to perform many, potentially millions, of tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).
Cankar, Katarina; Chauvensy-Ancel, Valérie; Fortabat, Marie-Noelle; Gruden, Kristina; Kobilinsky, André; Zel, Jana; Bertheau, Yves
2008-05-15
Detection of nonauthorized genetically modified organisms (GMOs) has always presented an analytical challenge because the complete sequence data needed to detect them are generally unavailable although sequence similarity to known GMOs can be expected. A new approach, differential quantitative polymerase chain reaction (PCR), for detection of nonauthorized GMOs is presented here. This method is based on the presence of several common elements (e.g., promoter, genes of interest) in different GMOs. A statistical model was developed to study the difference between the number of molecules of such a common sequence and the number of molecules identifying the approved GMO (as determined by border-fragment-based PCR) and the donor organism of the common sequence. When this difference differs statistically from zero, the presence of a nonauthorized GMO can be inferred. The interest and scope of such an approach were tested on a case study of different proportions of genetically modified maize events, with the P35S promoter as the Cauliflower Mosaic Virus common sequence. The presence of a nonauthorized GMO was successfully detected in the mixtures analyzed and in the presence of (donor organism of P35S promoter). This method could be easily transposed to other common GMO sequences and other species and is applicable to other detection areas such as microbiology.
Hauber, A Brett; González, Juan Marcos; Groothuis-Oudshoorn, Catharina G M; Prior, Thomas; Marshall, Deborah A; Cunningham, Charles; IJzerman, Maarten J; Bridges, John F P
2016-06-01
Conjoint analysis is a stated-preference survey method that can be used to elicit responses that reveal preferences, priorities, and the relative importance of individual features associated with health care interventions or services. Conjoint analysis methods, particularly discrete choice experiments (DCEs), have been increasingly used to quantify preferences of patients, caregivers, physicians, and other stakeholders. Recent consensus-based guidance on good research practices, including two recent task force reports from the International Society for Pharmacoeconomics and Outcomes Research, has aided in improving the quality of conjoint analyses and DCEs in outcomes research. Nevertheless, uncertainty regarding good research practices for the statistical analysis of data from DCEs persists. There are multiple methods for analyzing DCE data. Understanding the characteristics and appropriate use of different analysis methods is critical to conducting a well-designed DCE study. This report will assist researchers in evaluating and selecting among alternative approaches to conducting statistical analysis of DCE data. We first present a simplistic DCE example and a simple method for using the resulting data. We then present a pedagogical example of a DCE and one of the most common approaches to analyzing data from such a question format-conditional logit. We then describe some common alternative methods for analyzing these data and the strengths and weaknesses of each alternative. We present the ESTIMATE checklist, which includes a list of questions to consider when justifying the choice of analysis method, describing the analysis, and interpreting the results. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
Space biology initiative program definition review. Trade study 4: Design modularity and commonality
NASA Technical Reports Server (NTRS)
Jackson, L. Neal; Crenshaw, John, Sr.; Davidson, William L.; Herbert, Frank J.; Bilodeau, James W.; Stoval, J. Michael; Sutton, Terry
1989-01-01
The relative cost impacts (up or down) of developing Space Biology hardware using design modularity and commonality is studied. Recommendations for how the hardware development should be accomplished to meet optimum design modularity requirements for Life Science investigation hardware will be provided. In addition, the relative cost impacts of implementing commonality of hardware for all Space Biology hardware are defined. Cost analysis and supporting recommendations for levels of modularity and commonality are presented. A mathematical or statistical cost analysis method with the capability to support development of production design modularity and commonality impacts to parametric cost analysis is provided.
47 CFR 54.807 - Interstate access universal service support.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Common Carrier Bureau Report, Statistics of Communications Common Carriers, Table 6.10—Selected Operating Statistics. Interested parties may obtain this report from the U.S. Government Printing Office or by... Bureau Report, Statistics of Communications Common Carriers, Table 6.10—Selected Operating Statistics...
A high-fidelity weather time series generator using the Markov Chain process on a piecewise level
NASA Astrophysics Data System (ADS)
Hersvik, K.; Endrerud, O.-E. V.
2017-12-01
A method is developed for generating a set of unique weather time-series based on an existing weather series. The method allows statistically valid weather variations to take place within repeated simulations of offshore operations. The numerous generated time series need to share the same statistical qualities as the original time series. Statistical qualities here refer mainly to the distribution of weather windows available for work, including durations and frequencies of such weather windows, and seasonal characteristics. The method is based on the Markov chain process. The core new development lies in how the Markov Process is used, specifically by joining small pieces of random length time series together rather than joining individual weather states, each from a single time step, which is a common solution found in the literature. This new Markov model shows favorable characteristics with respect to the requirements set forth and all aspects of the validation performed.
A Comparison of Analytical and Data Preprocessing Methods for Spectral Fingerprinting
LUTHRIA, DEVANAND L.; MUKHOPADHYAY, SUDARSAN; LIN, LONG-ZE; HARNLY, JAMES M.
2013-01-01
Spectral fingerprinting, as a method of discriminating between plant cultivars and growing treatments for a common set of broccoli samples, was compared for six analytical instruments. Spectra were acquired for finely powdered solid samples using Fourier transform infrared (FT-IR) and Fourier transform near-infrared (NIR) spectrometry. Spectra were also acquired for unfractionated aqueous methanol extracts of the powders using molecular absorption in the ultraviolet (UV) and visible (VIS) regions and mass spectrometry with negative (MS−) and positive (MS+) ionization. The spectra were analyzed using nested one-way analysis of variance (ANOVA) and principal component analysis (PCA) to statistically evaluate the quality of discrimination. All six methods showed statistically significant differences between the cultivars and treatments. The significance of the statistical tests was improved by the judicious selection of spectral regions (IR and NIR), masses (MS+ and MS−), and derivatives (IR, NIR, UV, and VIS). PMID:21352644
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
A critique of the usefulness of inferential statistics in applied behavior analysis
Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.
1998-01-01
Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304
Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai
2018-01-01
Background Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure and evaluate the outcomes for missing cases, the ability to measure and evaluate the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Objective Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who present as missing cases at posttreatment (n=138), their likely treatment outcomes, and compare between statistical methods for replacing their missing data. Methods First, common participant and treatment features were tested through binary logistic regression models, evaluating the ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate symptom change that was observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent missing cases outcomes following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Results Missing cases were dominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignored these characteristics can overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptoms estimates that can swing from −32% to 70% from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for missing cases outcomes by treatment adherence rates and baseline symptoms scores resulted in minimal measurement bias (<8%). Conclusions Certain variables can characterize and predict missing cases likelihood and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially worst off treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. PMID:29674311
Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.
Counsell, Alyssa; Harlow, Lisa L
2017-05-01
With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t -tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles presented on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.
Application of random match probability calculations to mixed STR profiles.
Bille, Todd; Bright, Jo-Anne; Buckleton, John
2013-03-01
Mixed DNA profiles are being encountered more frequently as laboratories analyze increasing amounts of touch evidence. If it is determined that an individual could be a possible contributor to the mixture, it is necessary to perform a statistical analysis to allow an assignment of weight to the evidence. Currently, the combined probability of inclusion (CPI) and the likelihood ratio (LR) are the most commonly used methods to perform the statistical analysis. A third method, random match probability (RMP), is available. This article compares the advantages and disadvantages of the CPI and LR methods to the RMP method. We demonstrate that although the LR method is still considered the most powerful of the binary methods, the RMP and LR methods make similar use of the observed data such as peak height, assumed number of contributors, and known contributors where the CPI calculation tends to waste information and be less informative. © 2013 American Academy of Forensic Sciences.
Testing statistical isotropy in cosmic microwave background polarization maps
NASA Astrophysics Data System (ADS)
Rath, Pranati K.; Samal, Pramoda Kumar; Panda, Srikanta; Mishra, Debesh D.; Aluri, Pavan K.
2018-04-01
We apply our symmetry based Power tensor technique to test conformity of PLANCK Polarization maps with statistical isotropy. On a wide range of angular scales (l = 40 - 150), our preliminary analysis detects many statistically anisotropic multipoles in foreground cleaned full sky PLANCK polarization maps viz., COMMANDER and NILC. We also study the effect of residual foregrounds that may still be present in the Galactic plane using both common UPB77 polarization mask, as well as the individual component separation method specific polarization masks. However, some of the statistically anisotropic modes still persist, albeit significantly in NILC map. We further probed the data for any coherent alignments across multipoles in several bins from the chosen multipole range.
Descriptive and inferential statistical methods used in burns research.
Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars
2010-05-01
Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11(22%) were randomised controlled trials, 18(35%) were cohort studies, 11(22%) were case control studies and 11(22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49(96%) articles. Data dispersion was calculated by standard deviation in 30(59%). Standard error of the mean was quoted in 19(37%). The statistical software product was named in 33(65%). Of the 49 articles that used inferential statistics, the tests were named in 47(96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/co-variance (33%), chi(2) test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43(88%) and the exact significance levels were reported in 28(57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals in the fields of biostatistics and epidemiology when using more advanced statistical techniques. Copyright 2009 Elsevier Ltd and ISBI. All rights reserved.
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Confidence Limits for the Indirect Effect: Distribution of the Product and Resampling Methods
ERIC Educational Resources Information Center
MacKinnon, David P.; Lockwood, Chondra M.; Williams, Jason
2004-01-01
The most commonly used method to test an indirect effect is to divide the estimate of the indirect effect by its standard error and compare the resulting z statistic with a critical value from the standard normal distribution. Confidence limits for the indirect effect are also typically based on critical values from the standard normal…
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
Performance Characterization of an Instrument.
ERIC Educational Resources Information Center
Salin, Eric D.
1984-01-01
Describes an experiment designed to teach students to apply the same statistical awareness to instrumentation they commonly apply to classical techniques. Uses propagation of error techniques to pinpoint instrumental limitations and breakdowns and to demonstrate capabilities and limitations of volumetric and gravimetric methods. Provides lists of…
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
Methods for planning a statistical POD study
NASA Astrophysics Data System (ADS)
Koh, Y.-M.; Meeker, W. Q.
2013-01-01
The most common question asked of a statistician is "How large should my sample be?" In NDE applications, the most common questions asked of a statistician are "How many specimens do I need and what should be the distribution of flaw sizes?" Although some useful general guidelines exist (e.g. in MIK-HDBK-1823) it is possible to use statistical tools to provide more definitive guidelines and to allow comparison among different proposed study plans. One can assess the performance of a proposed POD study plan by obtaining computable expressions for estimation precision. This allows for a quick and easy assessment of tradeoffs and comparison of various alternative plans. We use a signal-response dataset obtained from MIK-HDBK-1823 to illustrate the ideas.
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random processes is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used giving rise to widespread acceptance of statistically unconfirmed conclusions.
Conceptual and statistical problems associated with the use of diversity indices in ecology.
Barrantes, Gilbert; Sandoval, Luis
2009-09-01
Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no a single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses could be used instead of diversity indices, such as cluster analyses or multiple regressions. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated to the presence and abundance of the species in a community. In addition, particular hypotheses associated to changes in species richness across localities, or change in abundance of one, or a group of species can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust to standardize all samples to a common size. Even the simplest method as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
Portillo, M C; Gonzalez, J M
2008-08-01
Molecular fingerprints of microbial communities are a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints was analyzed considering the presence of different phylotypes and their relative abundance. A method is proposed by simulating coverage of the analyzed communities as a function of sampling size applying a Cramér-von Mises statistic. Comparisons were performed by a Monte Carlo testing procedure. As an example, this procedure was used to compare several sediment samples from freshwater ponds using a relative quantitative PCR-DGGE profiling technique. The method was able to discriminate among different samples based on their molecular fingerprints, and confirmed the lack of differences between aliquots from a single sample.
Healthy Worker Effect Phenomenon: Revisited with Emphasis on Statistical Methods – A Review
Chowdhury, Ritam; Shah, Divyang; Payal, Abhishek R.
2017-01-01
Known since 1885 but studied systematically only in the past four decades, the healthy worker effect (HWE) is a special form of selection bias common to occupational cohort studies. The phenomenon has been under debate for many years with respect to its impact, conceptual approach (confounding, selection bias, or both), and ways to resolve or account for its effect. The effect is not uniform across age groups, gender, race, and types of occupations and nor is it constant over time. Hence, assessing HWE and accounting for it in statistical analyses is complicated and requires sophisticated methods. Here, we review the HWE, factors affecting it, and methods developed so far to deal with it. PMID:29391741
Multivariate analysis: A statistical approach for computations
NASA Astrophysics Data System (ADS)
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a type of multivariate statistical approach commonly used in, automotive diagnosis, education evaluating clusters in finance etc and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion about factor analysis (FA) in image retrieval method and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult due to the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomaly behaviors in the network include the various attacks on the network like DDOs attacks and network scanning.
NASA Astrophysics Data System (ADS)
Awi; Ahmar, A. S.; Rahman, A.; Minggi, I.; Mulbar, U.; Asdar; Ruslan; Upu, H.; Alimuddin; Hamda; Rosidah; Sutamrin; Tiro, M. A.; Rusli
2018-01-01
This research aims to reveal the profile about the level of creativity and the ability to propose statistical problem of students at Mathematics Education 2014 Batch in the State University of Makassar in terms of their cognitive style. This research uses explorative qualitative method by giving meta-cognitive scaffolding at the time of research. The hypothesis of research is that students who have field independent (FI) cognitive style in statistics problem posing from the provided information already able to propose the statistical problem that can be solved and create new data and the problem is already been included as a high quality statistical problem, while students who have dependent cognitive field (FD) commonly are still limited in statistics problem posing that can be finished and do not load new data and the problem is included as medium quality statistical problem.
Li, Qizhai; Hu, Jiyuan; Ding, Juan; Zheng, Gang
2014-04-01
A classical approach to combine independent test statistics is Fisher's combination of $p$-values, which follows the $\\chi ^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions are true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy of the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotrophic associations are described, where multiple traits are tested for association with a single marker.
Perlin, Mark William
2015-01-01
DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI(-1) value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI(-1)) values were examined and compared with corresponding log(LR) values. The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI(-1) increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice.
Generalized functional linear models for gene-based case-control association studies.
Fan, Ruzong; Wang, Yifan; Mills, James L; Carter, Tonia C; Lobach, Iryna; Wilson, Alexander F; Bailey-Wilson, Joan E; Weeks, Daniel E; Xiong, Momiao
2014-11-01
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. © 2014 WILEY PERIODICALS, INC.
Generalized Functional Linear Models for Gene-based Case-Control Association Studies
Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao
2014-01-01
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683
Yan, Kaihong; Dong, Zhaomin; Liu, Yanju; Naidu, Ravi
2016-04-01
Bioaccessibility to assess potential risks resulting from exposure to Pb-contaminated soils is commonly estimated using various in vitro methods. However, existing in vitro methods yield different results depending on the composition of the extractant as well as the contaminated soils. For this reason, the relationships between the five commonly used in vitro methods, the Relative Bioavailability Leaching Procedure (RBALP), the unified BioAccessibility Research Group Europe (BARGE) method (UBM), the Solubility Bioaccessibility Research Consortium assay (SBRC), a Physiologically Based Extraction Test (PBET), and the in vitro Digestion Model (RIVM) were quantified statistically using 10 soils from long-term Pb-contaminated mining and smelter sites located in Western Australia and South Australia. For all 10 soils, the measured Pb bioaccessibility regarding all in vitro methods varied from 1.9 to 106% for gastric phase, which is higher than that for intestinal phase: 0.2 ∼ 78.6%. The variations in Pb bioaccessibility depend on the in vitro models being used, suggesting that the method chosen for bioaccessibility assessment must be validated against in vivo studies prior to use for predicting risk. Regression studies between RBALP and SRBC, RBALP and RIVM (0.06) (0.06 g of soil in each tube, S:L ratios for gastric phase and intestinal phase are 1:375 and 1:958, respectively) showed that Pb bioaccessibility based on the three methods were comparable. Meanwhile, the slopes between RBALP and UBM, RBALP and RIVM (0.6) (0.6 g soil in each tube, S:L ratios for gastric phase and intestinal phase are 1:37.5 and 1:96, respectively) were 1.21 and 1.02, respectively. The findings presented in this study could help standardize in vitro bioaccessibility measurements and provide a scientific basis for further relating Pb bioavailability and soil properties.
Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M
2015-03-01
Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Ohyanagi, S.; Dileonardo, C.
2013-12-01
As a natural phenomenon earthquake occurrence is difficult to predict. Statistical analysis of earthquake data was performed using candlestick chart and Bollinger Band methods. These statistical methods, commonly used in the financial world to analyze market trends were tested against earthquake data. Earthquakes above Mw 4.0 located on shore of Sanriku (37.75°N ~ 41.00°N, 143.00°E ~ 144.50°E) from February 1973 to May 2013 were selected for analysis. Two specific patterns in earthquake occurrence were recognized through the analysis. One is a spread of candlestick prior to the occurrence of events greater than Mw 6.0. A second pattern shows convergence in the Bollinger Band, which implies a positive or negative change in the trend of earthquakes. Both patterns match general models for the buildup and release of strain through the earthquake cycle, and agree with both the characteristics of the candlestick chart and Bollinger Band analysis. These results show there is a high correlation between patterns in earthquake occurrence and trend analysis by these two statistical methods. The results of this study agree with the appropriateness of the application of these financial analysis methods to the analysis of earthquake occurrence.
This paper describes the application and method performance parameters of a Luminex xMAP™ bead-based, multiplex immunoassay for measuring specific antibody responses in saliva samples (n=5438) to antigens of six common waterborne pathogens (Campylobacter jejuni, Helicobacter pylo...
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
2011-01-01
Background This study aims to identify the statistical software applications most commonly employed for data analysis in health services research (HSR) studies in the U.S. The study also examines the extent to which information describing the specific analytical software utilized is provided in published articles reporting on HSR studies. Methods Data were extracted from a sample of 1,139 articles (including 877 original research articles) published between 2007 and 2009 in three U.S. HSR journals, that were considered to be representative of the field based upon a set of selection criteria. Descriptive analyses were conducted to categorize patterns in statistical software usage in those articles. The data were stratified by calendar year to detect trends in software use over time. Results Only 61.0% of original research articles in prominent U.S. HSR journals identified the particular type of statistical software application used for data analysis. Stata and SAS were overwhelmingly the most commonly used software applications employed (in 46.0% and 42.6% of articles respectively). However, SAS use grew considerably during the study period compared to other applications. Stratification of the data revealed that the type of statistical software used varied considerably by whether authors were from the U.S. or from other countries. Conclusions The findings highlight a need for HSR investigators to identify more consistently the specific analytical software used in their studies. Knowing that information can be important, because different software packages might produce varying results, owing to differences in the software's underlying estimation methods. PMID:21977990
Slotnick, Scott D
2017-07-01
Analysis of functional magnetic resonance imaging (fMRI) data typically involves over one hundred thousand independent statistical tests; therefore, it is necessary to correct for multiple comparisons to control familywise error. In a recent paper, Eklund, Nichols, and Knutsson used resting-state fMRI data to evaluate commonly employed methods to correct for multiple comparisons and reported unacceptable rates of familywise error. Eklund et al.'s analysis was based on the assumption that resting-state fMRI data reflect null data; however, their 'null data' actually reflected default network activity that inflated familywise error. As such, Eklund et al.'s results provide no basis to question the validity of the thousands of published fMRI studies that have corrected for multiple comparisons or the commonly employed methods to correct for multiple comparisons.
Feio, M J; Ferreira, J; Buffagni, A; Erba, S; Dörflinger, G; Ferréol, M; Munné, A; Prat, N; Tziortzis, I; Urbanič, G
2014-04-01
Within the Mediterranean region each country has its own assessment method based on aquatic macroinvertebrates. However, independently of the classification system, quality assessments should be comparable across members of the European Commission, which means, among others, that the boundaries between classes should not deviate significantly. Here we check for comparability between High-Good and Good-Moderate classifications, through the use of a common metric. Additionally, we discuss the influence of the conceptual and statistical approaches used to calculate a common boundary within the Mediterranean countries participating in the Intercalibration Exercise (e.g., using individual national type-boundaries, one value for each common type or an average boundary by country; weighted average, median) in the overall outcome. All methods, except for the IBMWP (the Iberian BMWP) when applied to temporary rivers, were highly correlated (0.82
DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.
Lee, Donghyung; Bigdeli, T Bernard; Williamson, Vernell S; Vladimirov, Vladimir I; Riley, Brien P; Fanous, Ayman H; Bacanu, Silviu-Alin
2015-10-01
To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. dlee4@vcu.edu Supplementary Data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES
Song, Chi; Min, Xiaoyi; Zhang, Heping
2016-01-01
The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239
Xu, Xiaobin; Li, Zhenghui; Li, Guo; Zhou, Zhe
2017-04-21
Estimating the state of a dynamic system via noisy sensor measurement is a common problem in sensor methods and applications. Most state estimation methods assume that measurement noise and state perturbations can be modeled as random variables with known statistical properties. However in some practical applications, engineers can only get the range of noises, instead of the precise statistical distributions. Hence, in the framework of Dempster-Shafer (DS) evidence theory, a novel state estimatation method by fusing dependent evidence generated from state equation, observation equation and the actual observations of the system states considering bounded noises is presented. It can be iteratively implemented to provide state estimation values calculated from fusion results at every time step. Finally, the proposed method is applied to a low-frequency acoustic resonance level gauge to obtain high-accuracy measurement results.
Statistical process control in nursing research.
Polit, Denise F; Chaboyer, Wendy
2012-02-01
In intervention studies in which randomization to groups is not possible, researchers typically use quasi-experimental designs. Time series designs are strong quasi-experimental designs but are seldom used, perhaps because of technical and analytic hurdles. Statistical process control (SPC) is an alternative analytic approach to testing hypotheses about intervention effects using data collected over time. SPC, like traditional statistical methods, is a tool for understanding variation and involves the construction of control charts that distinguish between normal, random fluctuations (common cause variation), and statistically significant special cause variation that can result from an innovation. The purpose of this article is to provide an overview of SPC and to illustrate its use in a study of a nursing practice improvement intervention. Copyright © 2011 Wiley Periodicals, Inc.
Speckle reduction in optical coherence tomography by adaptive total variation method
NASA Astrophysics Data System (ADS)
Wu, Tong; Shi, Yaoyao; Liu, Youwen; He, Chongjun
2015-12-01
An adaptive total variation method based on the combination of speckle statistics and total variation restoration is proposed and developed for reducing speckle noise in optical coherence tomography (OCT) images. The statistical distribution of the speckle noise in OCT image is investigated and measured. With the measured parameters such as the mean value and variance of the speckle noise, the OCT image is restored by the adaptive total variation restoration method. The adaptive total variation restoration algorithm was applied to the OCT images of a volunteer's hand skin, which showed effective speckle noise reduction and image quality improvement. For image quality comparison, the commonly used median filtering method was also applied to the same images to reduce the speckle noise. The measured results demonstrate the superior performance of the adaptive total variation restoration method in terms of image signal-to-noise ratio, equivalent number of looks, contrast-to-noise ratio, and mean square error.
A Psychometric Evaluation of the Digital Logic Concept Inventory
ERIC Educational Resources Information Center
Herman, Geoffrey L.; Zilles, Craig; Loui, Michael C.
2014-01-01
Concept inventories hold tremendous promise for promoting the rigorous evaluation of teaching methods that might remedy common student misconceptions and promote deep learning. The measurements from concept inventories can be trusted only if the concept inventories are evaluated both by expert feedback and statistical scrutiny (psychometric…
Novick, Steven; Shen, Yan; Yang, Harry; Peterson, John; LeBlond, Dave; Altan, Stan
2015-01-01
Dissolution (or in vitro release) studies constitute an important aspect of pharmaceutical drug development. One important use of such studies is for justifying a biowaiver for post-approval changes which requires establishing equivalence between the new and old product. We propose a statistically rigorous modeling approach for this purpose based on the estimation of what we refer to as the F2 parameter, an extension of the commonly used f2 statistic. A Bayesian test procedure is proposed in relation to a set of composite hypotheses that capture the similarity requirement on the absolute mean differences between test and reference dissolution profiles. Several examples are provided to illustrate the application. Results of our simulation study comparing the performance of f2 and the proposed method show that our Bayesian approach is comparable to or in many cases superior to the f2 statistic as a decision rule. Further useful extensions of the method, such as the use of continuous-time dissolution modeling, are considered.
Practical steganalysis of digital images: state of the art
NASA Astrophysics Data System (ADS)
Fridrich, Jessica; Goljan, Miroslav
2002-04-01
Steganography is the art of hiding the very presence of communication by embedding secret messages into innocuous looking cover documents, such as digital images. Detection of steganography, estimation of message length, and its extraction belong to the field of steganalysis. Steganalysis has recently received a great deal of attention both from law enforcement and the media. In our paper, we classify and review current stego-detection algorithms that can be used to trace popular steganographic products. We recognize several qualitatively different approaches to practical steganalysis - visual detection, detection based on first order statistics (histogram analysis), dual statistics methods that use spatial correlations in images and higher-order statistics (RS steganalysis), universal blind detection schemes, and special cases, such as JPEG compatibility steganalysis. We also present some new results regarding our previously proposed detection of LSB embedding using sensitive dual statistics. The recent steganalytic methods indicate that the most common paradigm in image steganography - the bit-replacement or bit substitution - is inherently insecure with safe capacities far smaller than previously thought.
Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja
2017-01-01
Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.
Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M.; Grün, Sonja
2017-01-01
Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis. PMID:28596729
Comparing estimates of climate change impacts from process-based and statistical crop models
NASA Astrophysics Data System (ADS)
Lobell, David B.; Asseng, Senthold
2017-01-01
The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.
Kaur, Harpreet; Raghava, G P S
2002-03-01
beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
Evolving forecasting classifications and applications in health forecasting
Soyiri, Ireneous N; Reidpath, Daniel D
2012-01-01
Health forecasting forewarns the health community about future health situations and disease episodes so that health systems can better allocate resources and manage demand. The tools used for developing and measuring the accuracy and validity of health forecasts commonly are not defined although they are usually adapted forms of statistical procedures. This review identifies previous typologies used in classifying the forecasting methods commonly used in forecasting health conditions or situations. It then discusses the strengths and weaknesses of these methods and presents the choices available for measuring the accuracy of health-forecasting models, including a note on the discrepancies in the modes of validation. PMID:22615533
Profile Of 'Original Articles' Published In 2016 By The Journal Of Ayub Medical College, Pakistan.
Shaikh, Masood Ali
2018-01-01
Journal of Ayub Medical College (JAMC) is the only Medline indexed biomedical journal of Pakistan that is edited and published by a medical college. Assessing the trends of study designs employed, statistical methods used, and statistical analysis software used in the articles of medical journals help understand the sophistication of research published. The objectives of this descriptive study were to assess all original articles published by JAMC in the year 2016. JAMC published 147 original articles in the year 2016. The most commonly used study design was crosssectional studies, with 64 (43.5%) articles reporting its use. Statistical tests involving bivariate analysis were most common and reported by 73 (49.6%) articles. Use of SPSS software was reported by 109 (74.1%) of articles. Most 138 (93.9%) of the original articles published were based on studies conducted in Pakistan. The number and sophistication of analysis reported in JAMC increased from year 2014 to 2016.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Labby, Z.
Physicists are often expected to have a solid grounding in experimental design and statistical analysis, sometimes filling in when biostatisticians or other experts are not available for consultation. Unfortunately, graduate education on these topics is seldom emphasized and few opportunities for continuing education exist. Clinical physicists incorporate new technology and methods into their practice based on published literature. A poor understanding of experimental design and analysis could Result in inappropriate use of new techniques. Clinical physicists also improve current practice through quality initiatives that require sound experimental design and analysis. Academic physicists with a poor understanding of design and analysismore » may produce ambiguous (or misleading) results. This can Result in unnecessary rewrites, publication rejection, and experimental redesign (wasting time, money, and effort). This symposium will provide a practical review of error and uncertainty, common study designs, and statistical tests. Instruction will primarily focus on practical implementation through examples and answer questions such as: where would you typically apply the test/design and where is the test/design typically misapplied (i.e., common pitfalls)? An analysis of error and uncertainty will also be explored using biological studies and associated modeling as a specific use case. Learning Objectives: Understand common experimental testing and clinical trial designs, what questions they can answer, and how to interpret the results Determine where specific statistical tests are appropriate and identify common pitfalls Understand the how uncertainty and error are addressed in biological testing and associated biological modeling.« less
A guide to missing data for the pediatric nephrologist.
Larkins, Nicholas G; Craig, Jonathan C; Teixeira-Pinto, Armando
2018-03-13
Missing data is an important and common source of bias in clinical research. Readers should be alert to and consider the impact of missing data when reading studies. Beyond preventing missing data in the first place, through good study design and conduct, there are different strategies available to handle data containing missing observations. Complete case analysis is often biased unless data are missing completely at random. Better methods of handling missing data include multiple imputation and models using likelihood-based estimation. With advancing computing power and modern statistical software, these methods are within the reach of clinician-researchers under guidance of a biostatistician. As clinicians reading papers, we need to continue to update our understanding of statistical methods, so that we understand the limitations of these techniques and can critically interpret literature.
Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA.
Festing, M F
2001-01-01
In vitro experiments need to be well designed and correctly analysed if they are to achieve their full potential to replace the use of animals in research. An "experiment" is a procedure for collecting scientific data in order to answer a hypothesis, or to provide material for generating new hypotheses, and differs from a survey because the scientist has control over the treatments that can be applied. Most experiments can be classified into one of a few formal designs, the most common being completely randomised, and randomised block designs. These are quite common with in vitro experiments, which are often replicated in time. Some experiments involve a single independent (treatment) variable, while other "factorial" designs simultaneously vary two or more independent variables, such as drug treatment and cell line. Factorial designs often provide additional information at little extra cost. Experiments need to be carefully planned to avoid bias, be powerful yet simple, provide for a valid statistical analysis and, in some cases, have a wide range of applicability. Virtually all experiments need some sort of statistical analysis in order to take account of biological variation among the experimental subjects. Parametric methods using the t test or analysis of variance are usually more powerful than non-parametric methods, provided the underlying assumptions of normality of the residuals and equal variances are approximately valid. The statistical analyses of data from a completely randomised design, and from a randomised-block design are demonstrated in Appendices 1 and 2, and methods of determining sample size are discussed in Appendix 3. Appendix 4 gives a checklist for authors submitting papers to ATLA.
A comparison of Probability Of Detection (POD) data determined using different statistical methods
NASA Astrophysics Data System (ADS)
Fahr, A.; Forsyth, D.; Bullock, M.
1993-12-01
Different statistical methods have been suggested for determining probability of detection (POD) data for nondestructive inspection (NDI) techniques. A comparative assessment of various methods of determining POD was conducted using results of three NDI methods obtained by inspecting actual aircraft engine compressor disks which contained service induced cracks. The study found that the POD and 95 percent confidence curves as a function of crack size as well as the 90/95 percent crack length vary depending on the statistical method used and the type of data. The distribution function as well as the parameter estimation procedure used for determining POD and the confidence bound must be included when referencing information such as the 90/95 percent crack length. The POD curves and confidence bounds determined using the range interval method are very dependent on information that is not from the inspection data. The maximum likelihood estimators (MLE) method does not require such information and the POD results are more reasonable. The log-logistic function appears to model POD of hit/miss data relatively well and is easy to implement. The log-normal distribution using MLE provides more realistic POD results and is the preferred method. Although it is more complicated and slower to calculate, it can be implemented on a common spreadsheet program.
Use of Statistical Analyses in the Ophthalmic Literature
Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.
2014-01-01
Purpose To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design Cross-sectional study Methods All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. In order to understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in retina and glaucoma subspecialties showed a tendency for using more complex analysis when compared to cornea. Conclusions Readers of clinical journals in ophthalmology need to have substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical knowledge in order to critically appraise articles submitted for publication. The results of this study could provide guidance to direct the statistical learning of clinical ophthalmologists, researchers and educators involved in the design of courses for residents and medical students. PMID:24612977
Tips, Tropes, and Trivia: Ideas for Teaching Educational Research.
ERIC Educational Resources Information Center
Stallings, William M.; And Others
The collective experience of more than 50 years has led to the development of approaches that have enhanced student comprehension in the teaching of educational research methods, statistics, and measurement. Tips for teachers include using illustrative problems with one-digit numbers, using common situations and everyday objects to illustrate…
Approaches to Art Therapy for Cancer Inpatients: Research and Practice Considerations
ERIC Educational Resources Information Center
Nainis, Nancy A.
2008-01-01
Common symptoms reported by cancer patients include pain, fatigue, breathlessness, insomnia, lack of appetite, and anxiety. A study conducted by an interdisciplinary research team (Nainis et al., 2006) demonstrated statistically significant reductions in these cancer symptoms with the use of traditional art therapy methods. The study found a…
Item Screening in Graphical Loglinear Rasch Models
ERIC Educational Resources Information Center
Kreiner, Svend; Christensen, Karl Bang
2011-01-01
In behavioural sciences, local dependence and DIF are common, and purification procedures that eliminate items with these weaknesses often result in short scales with poor reliability. Graphical loglinear Rasch models (Kreiner & Christensen, in "Statistical Methods for Quality of Life Studies," ed. by M. Mesbah, F.C. Cole & M.T.…
Joint QTL linkage mapping for multiple-cross mating design sharing one common parent
USDA-ARS?s Scientific Manuscript database
Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present th...
[The role of international comparison in surveying divorce trends].
Cseh-szombathy, L
1982-04-01
In this study, the main survey methods and approaches to analyzing data on divorce in different countries are described. Some problems concerning the international comparability of divorce statistics are examined, and the value of international comparative studies for identifying common factors affecting divorce is stressed. (summary in ENG, RUS)
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Abstract Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Karin, Eyal; Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai
2018-04-19
Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure and evaluate the outcomes for missing cases, the ability to measure and evaluate the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who present as missing cases at posttreatment (n=138), their likely treatment outcomes, and compare between statistical methods for replacing their missing data. First, common participant and treatment features were tested through binary logistic regression models, evaluating the ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate symptom change that was observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent missing cases outcomes following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Missing cases were dominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignored these characteristics can overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptoms estimates that can swing from -32% to 70% from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for missing cases outcomes by treatment adherence rates and baseline symptoms scores resulted in minimal measurement bias (<8%). Certain variables can characterize and predict missing cases likelihood and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially worst off treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. ©Eyal Karin, Blake F Dear, Gillian Z Heller, Monique F Crane, Nickolai Titov. Originally published in JMIR Mental Health (http://mental.jmir.org), 19.04.2018.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sparks, R.B.; Aydogan, B.
In the development of new radiopharmaceuticals, animal studies are typically performed to get a first approximation of the expected radiation dose in humans. This study evaluates the performance of some commonly used data extrapolation techniques to predict residence times in humans using data collected from animals. Residence times were calculated using animal and human data, and distributions of ratios of the animal results to human results were constructed for each extrapolation method. Four methods using animal data to predict human residence times were examined: (1) using no extrapolation, (2) using relative organ mass extrapolation, (3) using physiological time extrapolation, andmore » (4) using a combination of the mass and time methods. The residence time ratios were found to be log normally distributed for the nonextrapolated and extrapolated data sets. The use of relative organ mass extrapolation yielded no statistically significant change in the geometric mean or variance of the residence time ratios as compared to using no extrapolation. Physiologic time extrapolation yielded a statistically significant improvement (p < 0.01, paired t test) in the geometric mean of the residence time ratio from 0.5 to 0.8. Combining mass and time methods did not significantly improve the results of using time extrapolation alone. 63 refs., 4 figs., 3 tabs.« less
Preparing and Presenting Effective Research Posters
Miller, Jane E
2007-01-01
Objectives Posters are a common way to present results of a statistical analysis, program evaluation, or other project at professional conferences. Often, researchers fail to recognize the unique nature of the format, which is a hybrid of a published paper and an oral presentation. This methods note demonstrates how to design research posters to convey study objectives, methods, findings, and implications effectively to varied professional audiences. Methods A review of existing literature on research communication and poster design is used to identify and demonstrate important considerations for poster content and layout. Guidelines on how to write about statistical methods, results, and statistical significance are illustrated with samples of ineffective writing annotated to point out weaknesses, accompanied by concrete examples and explanations of improved presentation. A comparison of the content and format of papers, speeches, and posters is also provided. Findings Each component of a research poster about a quantitative analysis should be adapted to the audience and format, with complex statistical results translated into simplified charts, tables, and bulleted text to convey findings as part of a clear, focused story line. Conclusions Effective research posters should be designed around two or three key findings with accompanying handouts and narrative description to supply additional technical detail and encourage dialog with poster viewers. PMID:17355594
Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi
2017-01-01
High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information CONTACT: sltaylor@ucdavis.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly-used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
Zhou, Xiang
2017-12-01
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z -scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
Separate-channel analysis of two-channel microarrays: recovering inter-spot information.
Smyth, Gordon K; Altman, Naomi S
2013-05-26
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intraspot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
Association analysis of multiple traits by an approach of combining P values.
Chen, Lili; Wang, Yong; Zhou, Yajing
2018-03-01
Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of low minor allele frequency of rare variant, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.
Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C
2015-01-01
Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
Brown, Jeffrey S.; Petronis, Kenneth R.; Bate, Andrew; Zhang, Fang; Dashevsky, Inna; Kulldorff, Martin; Avery, Taliser R.; Davis, Robert L.; Chan, K. Arnold; Andrade, Susan E.; Boudreau, Denise; Gunter, Margaret J.; Herrinton, Lisa; Pawloski, Pamala A.; Raebel, Marsha A.; Roblin, Douglas; Smith, David; Reynolds, Robert
2013-01-01
Background: Drug adverse event (AE) signal detection using the Gamma Poisson Shrinker (GPS) is commonly applied in spontaneous reporting. AE signal detection using large observational health plan databases can expand medication safety surveillance. Methods: Using data from nine health plans, we conducted a pilot study to evaluate the implementation and findings of the GPS approach for two antifungal drugs, terbinafine and itraconazole, and two diabetes drugs, pioglitazone and rosiglitazone. We evaluated 1676 diagnosis codes grouped into 183 different clinical concepts and four levels of granularity. Several signaling thresholds were assessed. GPS results were compared to findings from a companion study using the identical analytic dataset but an alternative statistical method—the tree-based scan statistic (TreeScan). Results: We identified 71 statistical signals across two signaling thresholds and two methods, including closely-related signals of overlapping diagnosis definitions. Initial review found that most signals represented known adverse drug reactions or confounding. About 31% of signals met the highest signaling threshold. Conclusions: The GPS method was successfully applied to observational health plan data in a distributed data environment as a drug safety data mining method. There was substantial concordance between the GPS and TreeScan approaches. Key method implementation decisions relate to defining exposures and outcomes and informed choice of signaling thresholds. PMID:24300404
Zhang, Yun; Baheti, Saurabh; Sun, Zhifu
2018-05-01
High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
The Mantel-Haenszel procedure revisited: models and generalizations.
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.
The Mantel-Haenszel Procedure Revisited: Models and Generalizations
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
Modelling gene expression profiles related to prostate tumor progression using binary states
2013-01-01
Background Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made for differential expression detection and biomarker discovery, few methods have been designed to model the gene expression level to tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods We have modeled an on-off state of gene activation per sample then per stage to select gene expression profiles associated to tumor progression. The selection is guided by statistical significance of profiles based on random permutated datasets. Results We show that our method identifies expected profiles corresponding to oncogenes and tumor suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is not found by other statistical tests commonly used to detect differential expression between tumor stages nor found by other tailored methods. Ontology and pathway analysis concurred with these findings. Conclusions Results suggest that our methodology may be a valuable tool to study tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350
Bolin, Jocelyn Holden; Finch, W Holmes
2014-01-01
Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.
Applied statistics in ecology: common pitfalls and simple solutions
E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick
2013-01-01
The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...
Multivariate analysis of longitudinal rates of change.
Bryan, Matthew; Heagerty, Patrick J
2016-12-10
Longitudinal data allow direct comparison of the change in patient outcomes associated with treatment or exposure. Frequently, several longitudinal measures are collected that either reflect a common underlying health status, or characterize processes that are influenced in a similar way by covariates such as exposure or demographic characteristics. Statistical methods that can combine multivariate response variables into common measures of covariate effects have been proposed in the literature. Current methods for characterizing the relationship between covariates and the rate of change in multivariate outcomes are limited to select models. For example, 'accelerated time' methods have been developed which assume that covariates rescale time in longitudinal models for disease progression. In this manuscript, we detail an alternative multivariate model formulation that directly structures longitudinal rates of change and that permits a common covariate effect across multiple outcomes. We detail maximum likelihood estimation for a multivariate longitudinal mixed model. We show via asymptotic calculations the potential gain in power that may be achieved with a common analysis of multiple outcomes. We apply the proposed methods to the analysis of a trivariate outcome for infant growth and compare rates of change for HIV infected and uninfected infants. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Kerr, Kathleen F.; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G.
2014-01-01
The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients’ risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. PMID:24855282
Pugh, Aaron L.
2014-01-01
Users of streamflow information often require streamflow statistics and basin characteristics at various locations along a stream. The USGS periodically calculates and publishes streamflow statistics and basin characteristics for streamflowgaging stations and partial-record stations, but these data commonly are scattered among many reports that may or may not be readily available to the public. The USGS also provides and periodically updates regional analyses of streamflow statistics that include regression equations and other prediction methods for estimating statistics for ungaged and unregulated streams across the State. Use of these regional predictions for a stream can be complex and often requires the user to determine a number of basin characteristics that may require interpretation. Basin characteristics may include drainage area, classifiers for physical properties, climatic characteristics, and other inputs. Obtaining these input values for gaged and ungaged locations traditionally has been time consuming, subjective, and can lead to inconsistent results.
NASA Astrophysics Data System (ADS)
Santiago-Lona, Cynthia V.; Hernández-Montes, María del Socorro; Mendoza-Santoyo, Fernando; Esquivel-Tejeda, Jesús
2018-02-01
The study and quantification of the tympanic membrane (TM) displacements add important information to advance the knowledge about the hearing process. A comparative statistical analysis between two commonly used demodulation methods employed to recover the optical phase in digital holographic interferometry, namely the fast Fourier transform and phase-shifting interferometry, is presented as applied to study thin tissues such as the TM. The resulting experimental TM surface displacement data are used to contrast both methods through the analysis of variance and F tests. Data are gathered when the TMs are excited with continuous sound stimuli at levels 86, 89 and 93 dB SPL for the frequencies of 800, 1300 and 2500 Hz under the same experimental conditions. The statistical analysis shows repeatability in z-direction displacements with a standard deviation of 0.086, 0.098 and 0.080 μm using the Fourier method, and 0.080, 0.104 and 0.055 μm with the phase-shifting method at a 95% confidence level for all frequencies. The precision and accuracy are evaluated by means of the coefficient of variation; the results with the Fourier method are 0.06143, 0.06125, 0.06154 and 0.06154, 0.06118, 0.06111 with phase-shifting. The relative error between both methods is 7.143, 6.250 and 30.769%. On comparing the measured displacements, the results indicate that there is no statistically significant difference between both methods for frequencies at 800 and 1300 Hz; however, errors and other statistics increase at 2500 Hz.
Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.
Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J
2016-10-03
Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd.. All rights reserved.
Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test
NASA Astrophysics Data System (ADS)
Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta
2002-05-01
The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.
Simplified estimation of age-specific reference intervals for skewed data.
Wright, E M; Royston, P
1997-12-30
Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
Patton, Charles J.; Kryskalla, Jennifer R.
2011-01-01
In addition to operational details and performance benchmarks for these new DA-AtNaR2 nitrate + nitrite assays, this report also provides results of interference studies for common inorganic and organic matrix constituents at 1, 10, and 100 times their median concentrations in surface-water and groundwater samples submitted annually to the NWQL for nitrate + nitrite analyses. Paired t-test and Wilcoxon signed-rank statistical analyses of results determined by CFA-CdR methods and DA-AtNaR2 methods indicate that nitrate concentration differences between population means or sign ranks were either statistically equivalent to zero at the 95 percent confidence level (p ≥ 0.05) or analytically equivalent to zero-that is, when p < 0.05, concentration differences between population means or medians were less than MDLs.
Landslide Susceptibility Statistical Methods: A Critical and Systematic Literature Review
NASA Astrophysics Data System (ADS)
Mihir, Monika; Malamud, Bruce; Rossi, Mauro; Reichenbach, Paola; Ardizzone, Francesca
2014-05-01
Landslide susceptibility assessment, the subject of this systematic review, is aimed at understanding the spatial probability of slope failures under a set of geomorphological and environmental conditions. It is estimated that about 375 landslides that occur globally each year are fatal, with around 4600 people killed per year. Past studies have brought out the increasing cost of landslide damages which primarily can be attributed to human occupation and increased human activities in the vulnerable environments. Many scientists, to evaluate and reduce landslide risk, have made an effort to efficiently map landslide susceptibility using different statistical methods. In this paper, we do a critical and systematic landslide susceptibility literature review, in terms of the different statistical methods used. For each of a broad set of studies reviewed we note: (i) study geography region and areal extent, (ii) landslide types, (iii) inventory type and temporal period covered, (iv) mapping technique (v) thematic variables used (vi) statistical models, (vii) assessment of model skill, (viii) uncertainty assessment methods, (ix) validation methods. We then pulled out broad trends within our review of landslide susceptibility, particularly regarding the statistical methods. We found that the most common statistical methods used in the study of landslide susceptibility include logistic regression, artificial neural network, discriminant analysis and weight of evidence. Although most of the studies we reviewed assessed the model skill, very few assessed model uncertainty. In terms of geographic extent, the largest number of landslide susceptibility zonations were in Turkey, Korea, Spain, Italy and Malaysia. However, there are also many landslides and fatalities in other localities, particularly India, China, Philippines, Nepal and Indonesia, Guatemala, and Pakistan, where there are much fewer landslide susceptibility studies available in the peer-review literature. This raises some concern that existing studies do not always cover all the regions globally that currently experience landslides and landslide fatalities.
ERIC Educational Resources Information Center
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
It has been fifty years since Kirkham and Bartholmew (1954) presented the conceptual framework and derived the mathematical equations that formed the basis of the now commonly employed method of 15N isotope dilution. Although many advances in methodology and analysis have been ma...
Power Analysis in Two-Level Unbalanced Designs
ERIC Educational Resources Information Center
Konstantopoulos, Spyros
2010-01-01
Previous work on statistical power has discussed mainly single-level designs or 2-level balanced designs with random effects. Although balanced experiments are common, in practice balance cannot always be achieved. Work on class size is one example of unbalanced designs. This study provides methods for power analysis in 2-level unbalanced designs…
Modeling Error Distributions of Growth Curve Models through Bayesian Methods
ERIC Educational Resources Information Center
Zhang, Zhiyong
2016-01-01
Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…
Principal Component Analysis: Resources for an Essential Application of Linear Algebra
ERIC Educational Resources Information Center
Pankavich, Stephen; Swanson, Rebecca
2015-01-01
Principal Component Analysis (PCA) is a highly useful topic within an introductory Linear Algebra course, especially since it can be used to incorporate a number of applied projects. This method represents an essential application and extension of the Spectral Theorem and is commonly used within a variety of fields, including statistics,…
Effective Computer-Aided Assessment of Mathematics; Principles, Practice and Results
ERIC Educational Resources Information Center
Greenhow, Martin
2015-01-01
This article outlines some key issues for writing effective computer-aided assessment (CAA) questions in subjects with substantial mathematical or statistical content, especially the importance of control of random parameters and the encoding of wrong methods of solution (mal-rules) commonly used by students. The pros and cons of using CAA and…
Have the Focus and Sophistication of Research in Health Education Changed?
ERIC Educational Resources Information Center
Merrill, Ray M.; Lindsay, Christopher A.; Shields, Eric C.; Stoddard, Julianne
2007-01-01
This study assessed the types of research and the statistical methods used in three representative health education journals from 1994 through 2003. Editorials, commentaries, program/practice notes, and perspectives represent 17.6% of the journals' content. The most common types of articles are cross-sectional studies (27.5%), reviews (23.2%), and…
Electronic contributions to the sigma(p) parameter of the Hammett equation.
Domingo, Luis R; Pérez, Patricia; Contreras, Renato
2003-07-25
A statistical procedure to obtain the intrinsic electronic contributions to the Hammett substituent constant sigma(p) is reported. The method is based on the comparison between the experimental sigma(p) values and the electronic electrophilicity index omega evaluated for a series of 42 functional groups commonly present in organic compounds.
Easy Extraction of Roundworms from Earthworm Hosts.
ERIC Educational Resources Information Center
Eyster, Linda S.; Fried, Bernard
2000-01-01
Describes the inexpensive and safe method of using roundworms in the classroom or laboratories. Because parasitic infections are so common, students should learn about worms. Provides statistics on just how many people have a worm infection in the world. Explains how to study living nematodes, and obtain and use earthworms. (Contains 13…
Estimating annual bole biomass production using uncertainty analysis
Travis J. Woolley; Mark E. Harmon; Kari B. O' Connell
2007-01-01
Two common sampling methodologies coupled with a simple statistical model were evaluated to determine the accuracy and precision of annual bole biomass production (BBP) and inter-annual variability estimates using this type of approach. We performed an uncertainty analysis using Monte Carlo methods in conjunction with radial growth core data from trees in three Douglas...
Variation in passing standards for graduation-level knowledge items at UK medical schools.
Taylor, Celia A; Gurnell, Mark; Melville, Colin R; Kluth, David C; Johnson, Neil; Wass, Val
2017-06-01
Given the absence of a common passing standard for students at UK medical schools, this paper compares independently set standards for common 'one from five' single-best-answer (multiple-choice) items used in graduation-level applied knowledge examinations and explores potential reasons for any differences. A repeated cross-sectional study was conducted. Participating schools were sent a common set of graduation-level items (55 in 2013-2014; 60 in 2014-2015). Items were selected against a blueprint and subjected to a quality review process. Each school employed its own standard-setting process for the common items. The primary outcome was the passing standard for the common items by each medical school set using the Angoff or Ebel methods. Of 31 invited medical schools, 22 participated in 2013-2014 (71%) and 30 (97%) in 2014-2015. Schools used a mean of 49 and 53 common items in 2013-2014 and 2014-2015, respectively, representing around one-third of the items in the examinations in which they were embedded. Data from 19 (61%) and 26 (84%) schools, respectively, met the inclusion criteria for comparison of standards. There were statistically significant differences in the passing standards set by schools in both years (effect sizes (f 2 ): 0.041 in 2013-2014 and 0.218 in 2014-2015; both p < 0.001). The interquartile range of standards was 5.7 percentage points in 2013-2014 and 6.5 percentage points in 2014-2015. There was a positive correlation between the relative standards set by schools in the 2 years (Pearson's r = 0.57, n = 18, p = 0.014). Time allowed per item, method of standard setting and timing of examination in the curriculum did not have a statistically significant impact on standards. Independently set standards for common single-best-answer items used in graduation-level examinations vary across UK medical schools. Further work to examine standard-setting processes in more detail is needed to help explain this variability and develop methods to reduce it. © 2017 John Wiley & Sons Ltd and The Association for the Study of Medical Education.
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies
Liu, Zhonghua; Lin, Xihong
2017-01-01
Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
Multiple phenotype association tests using summary statistics in genome-wide association studies.
Liu, Zhonghua; Lin, Xihong
2018-03-01
We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
Extracting and Converting Quantitative Data into Human Error Probabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe
2007-08-01
This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSFs effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto a HRA method (e.g. SPAR-H) to generate HEPs that can then be use in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using academic literature in assisting HRA analysts in generating sound HEPsmore » and HRA developers in validating current HRA models and formulating new HRA models.« less
Hypothesis testing for band size detection of high-dimensional banded precision matrices.
An, Baiguo; Guo, Jianhua; Liu, Yufeng
2014-06-01
Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.
Tree Colors: Color Schemes for Tree-Structured Data.
Tennekes, Martijn; de Jonge, Edwin
2014-12-01
We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations.
NASA Astrophysics Data System (ADS)
Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa
2018-03-01
In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.
Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence
2013-03-01
Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.
Probability genotype imputation method and integrated weighted lasso for QTL identification.
Demetrashvili, Nino; Van den Heuvel, Edwin R; Wit, Ernst C
2013-12-30
Many QTL studies have two common features: (1) often there is missing marker information, (2) among many markers involved in the biological process only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. Outcomes of the proposed imputation method are probabilities which serve as weights to the second step, namely in weighted lasso. The sparse phenotype inference is employed to select a set of predictive markers for the trait of interest. Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment, containing 69 markers for 165 recombinant inbred lines of a F8 generation. The results confirm previously identified regions, however several new markers are also found. On the basis of the inferred ROC behavior these markers show good potential for being real, especially for the germination trait Gmax. Our imputation method shows higher accuracy in terms of sensitivity and specificity compared to alternative imputation method. Also, the proposed weighted lasso outperforms commonly practiced multiple regression as well as the traditional lasso and adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification.
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu
2017-02-01
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
Study/experimental/research design: much more than statistics.
Knight, Kenneth L
2010-01-01
The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.
Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses
Stephen, Emily P.; Lepage, Kyle Q.; Eden, Uri T.; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S.; Guenther, Frank H.; Kramer, Mark A.
2014-01-01
The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty—both in the functional network edges and the corresponding aggregate measures of network topology—are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here—appropriate for static and dynamic network inference and different statistical measures of coupling—permits the evaluation of confidence in network measures in a variety of settings common to neuroscience. PMID:24678295
Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses.
Stephen, Emily P; Lepage, Kyle Q; Eden, Uri T; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S; Guenther, Frank H; Kramer, Mark A
2014-01-01
The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty-both in the functional network edges and the corresponding aggregate measures of network topology-are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here-appropriate for static and dynamic network inference and different statistical measures of coupling-permits the evaluation of confidence in network measures in a variety of settings common to neuroscience.
Overholser, Brian R; Sowinski, Kevin M
2007-12-01
Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.
Across-cohort QC analyses of GWAS summary statistics from complex traits.
Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M
2016-01-01
Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics F st statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.
Across-cohort QC analyses of GWAS summary statistics from complex traits
Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M
2017-01-01
Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965
Stewart, Sarah; Pearson, Janet; Rome, Keith; Dalbeth, Nicola; Vandal, Alain C
2018-01-01
Statistical techniques currently used in musculoskeletal research often inefficiently account for paired-limb measurements or the relationship between measurements taken from multiple regions within limbs. This study compared three commonly used analysis methods with a mixed-models approach that appropriately accounted for the association between limbs, regions, and trials and that utilised all information available from repeated trials. Four analysis were applied to an existing data set containing plantar pressure data, which was collected for seven masked regions on right and left feet, over three trials, across three participant groups. Methods 1-3 averaged data over trials and analysed right foot data (Method 1), data from a randomly selected foot (Method 2), and averaged right and left foot data (Method 3). Method 4 used all available data in a mixed-effects regression that accounted for repeated measures taken for each foot, foot region and trial. Confidence interval widths for the mean differences between groups for each foot region were used as a criterion for comparison of statistical efficiency. Mean differences in pressure between groups were similar across methods for each foot region, while the confidence interval widths were consistently smaller for Method 4. Method 4 also revealed significant between-group differences that were not detected by Methods 1-3. A mixed effects linear model approach generates improved efficiency and power by producing more precise estimates compared to alternative approaches that discard information in the process of accounting for paired-limb measurements. This approach is recommended in generating more clinically sound and statistically efficient research outputs. Copyright © 2017 Elsevier B.V. All rights reserved.
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
NASA Astrophysics Data System (ADS)
Most, S.; Nowak, W.; Bijeljic, B.
2014-12-01
Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We are investigating after what transport distances can we observe: A statistical dependence between increments, that can be modelled as an order-k Markov process, boils down to order 1. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start. A bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW). Complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
A Universal Trend among Proteomes Indicates an Oily Last Common Ancestor
Mannige, Ranjan V.; Brooks, Charles L.; Shakhnovich, Eugene I.
2012-01-01
Despite progresses in ancestral protein sequence reconstruction, much needs to be unraveled about the nature of the putative last common ancestral proteome that served as the prototype of all extant lifeforms. Here, we present data that indicate a steady decline (oil escape) in proteome hydrophobicity over species evolvedness (node number) evident in 272 diverse proteomes, which indicates a highly hydrophobic (oily) last common ancestor (LCA). This trend, obtained from simple considerations (free from sequence reconstruction methods), was corroborated by regression studies within homologous and orthologous protein clusters as well as phylogenetic estimates of the ancestral oil content. While indicating an inherent irreversibility in molecular evolution, oil escape also serves as a rare and universal reaction-coordinate for evolution (reinforcing Darwin's principle of Common Descent), and may prove important in matters such as (i) explaining the emergence of intrinsically disordered proteins, (ii) developing composition- and speciation-based “global” molecular clocks, and (iii) improving the statistical methods for ancestral sequence reconstruction. PMID:23300421
Hickey, Graeme L; Dunning, Joel; Seifert, Burkhardt; Sodeck, Gottfried; Carr, Matthew J; Burger, Hans Ulrich; Beyersdorf, Friedhelm
2015-08-01
As part of the peer review process for the European Journal of Cardio-Thoracic Surgery (EJCTS) and the Interactive CardioVascular and Thoracic Surgery (ICVTS), a statistician reviews any manuscript that includes a statistical analysis. To facilitate authors considering submitting a manuscript and to make it clearer about the expectations of the statistical reviewers, we present up-to-date guidelines for authors on statistical and data reporting specifically in these journals. The number of statistical methods used in the cardiothoracic literature is vast, as are the ways in which data are presented. Therefore, we narrow the scope of these guidelines to cover the most common applications submitted to the EJCTS and ICVTS, focusing in particular on those that the statistical reviewers most frequently comment on. © The Author 2015. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.
A Comparison of Three Different Methods of Fixation in the Management of Thoracolumbar Fractures.
Panteliadis, Pavlos; Musbahi, Omar; Muthian, Senthil; Goyal, Shivam; Montgomery, Alexander Sheriff; Ranganathan, Arun
2017-01-01
Management of thoracolumbar fractures remains controversial in the literature. The primary aims of this study were to assess different levels of fixation with respect to radiological outcomes in terms of fracture reduction and future loss of correction. This is a single center, retrospective study. Fifty-five patients presenting with thoracolumbar fractures between January 2012 and December 2015 were analyzed in the study. The levels of fixation were divided in 3 groups, 1 vertebra above and 1 below the fracture (1/1), 2 above and 2 below (2/2), and 2 above and 1 below (2/1). The most common mechanism was high fall injury and the most common vertebra L1. Burst fractures were the ones with the highest incidence. The 2/2 fixation achieved the best reduction of the fracture but with no statistical significance. The correction is maintained better by the 2/2 fixation but there is no statistical difference compared to the other fixations. Insertion of screws at the fracture level did not improve outcomes. The data of this study identified a trend towards better radiological outcomes for fracture reduction and maintenance of the correction in the 2/2 fixations. However these results are not statistically significant. Future multicenter prospective clinical trials are needed in order to agree on the ideal management and method of fixation for thoracolumbar fractures.
Durand, Jean-Baptiste; Allard, Alix; Guitton, Baptiste; van de Weg, Eric; Bink, Marco C A M; Costes, Evelyne
2017-01-01
Irregular flowering over years is commonly observed in fruit trees. The early prediction of tree behavior is highly desirable in breeding programmes. This study aims at performing such predictions, combining simplified phenotyping and statistics methods. Sequences of vegetative vs. floral annual shoots (AS) were observed along axes in trees belonging to five apple related full-sib families. Sequences were analyzed using Markovian and linear mixed models including year and site effects. Indices of flowering irregularity, periodicity and synchronicity were estimated, at tree and axis scales. They were used to predict tree behavior and detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of a Biennial Bearing Index (BBI) with an autoregressive coefficient (γ g ) efficiently predicted and classified the genotype behaviors, despite few misclassifications. Four QTLs common to BBIs and γ g and one for synchronicity were highlighted and revealed the complex genetic architecture of the traits. Irregularity resulted from high AS synchronism, whereas regularity resulted from either asynchronous locally alternating or continual regular AS flowering. A relevant and time-saving method, based on a posteriori sampling of axes and statistical indices is proposed, which is efficient to evaluate the tree breeding values for flowering regularity and could be transferred to other species.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Experimental and environmental factors affect spurious detection of ecological thresholds
Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.
2012-01-01
Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.
From fields to objects: A review of geographic boundary analysis
NASA Astrophysics Data System (ADS)
Jacquez, G. M.; Maruca, S.; Fortin, M.-J.
Geographic boundary analysis is a relatively new approach unfamiliar to many spatial analysts. It is best viewed as a technique for defining objects - geographic boundaries - on spatial fields, and for evaluating the statistical significance of characteristics of those boundary objects. This is accomplished using null spatial models representative of the spatial processes expected in the absence of boundary-generating phenomena. Close ties to the object-field dialectic eminently suit boundary analysis to GIS data. The majority of existing spatial methods are field-based in that they describe, estimate, or predict how attributes (variables defining the field) vary through geographic space. Such methods are appropriate for field representations but not object representations. As the object-field paradigm gains currency in geographic information science, appropriate techniques for the statistical analysis of objects are required. The methods reviewed in this paper are a promising foundation. Geographic boundary analysis is clearly a valuable addition to the spatial statistical toolbox. This paper presents the philosophy of, and motivations for geographic boundary analysis. It defines commonly used statistics for quantifying boundaries and their characteristics, as well as simulation procedures for evaluating their significance. We review applications of these techniques, with the objective of making this promising approach accessible to the GIS-spatial analysis community. We also describe the implementation of these methods within geographic boundary analysis software: GEM.
Accounting for measurement error: a critical but often overlooked process.
Harris, Edward F; Smith, Richard N
2009-12-01
Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Owens, W.W.; Sullivan, H.H.
Electroless nicke-plate characteristics are substantially influenced by percent phosphorous concentrations. Available ASTM analytical methods are designed for phosphorous concentrations of less than one percent compared to the 4.0 to 20.0% concentrations common in electroless nickel plate. A variety of analytical adaptations are applied through the industry resulting in poor data continuity. This paper presents a statistical comparison of five analytical methods and recommends accurate and precise procedures for use in percent phosphorous determinations in electroless nickel plate. 2 figures, 1 table.
Empirical methods for assessing meaningful neuropsychological change following epilepsy surgery.
Sawrie, S M; Chelune, G J; Naugle, R I; Lüders, H O
1996-11-01
Traditional methods for assessing the neurocognitive effects of epilepsy surgery are confounded by practice effects, test-retest reliability issues, and regression to the mean. This study employs 2 methods for assessing individual change that allow direct comparison of changes across both individuals and test measures. Fifty-one medically intractable epilepsy patients completed a comprehensive neuropsychological battery twice, approximately 8 months apart, prior to any invasive monitoring or surgical intervention. First, a Reliable Change (RC) index score was computed for each test score to take into account the reliability of that measure, and a cutoff score was empirically derived to establish the limits of statistically reliable change. These indices were subsequently adjusted for expected practice effects. The second approach used a regression technique to establish "change norms" along a common metric that models both expected practice effects and regression to the mean. The RC index scores provide the clinician with a statistical means of determining whether a patient's retest performance is "significantly" changed from baseline. The regression norms for change allow the clinician to evaluate the magnitude of a given patient's change on 1 or more variables along a common metric that takes into account the reliability and stability of each test measure. Case data illustrate how these methods provide an empirically grounded means for evaluating neurocognitive outcomes following medical interventions such as epilepsy surgery.
Results of Propellant Mixing Variable Study Using Precise Pressure-Based Burn Rate Calculations
NASA Technical Reports Server (NTRS)
Stefanski, Philip L.
2014-01-01
A designed experiment was conducted in which three mix processing variables (pre-curative addition mix temperature, pre-curative addition mixing time, and mixer speed) were varied to estimate their effects on within-mix propellant burn rate variability. The chosen discriminator for the experiment was the 2-inch diameter by 4-inch long (2x4) Center-Perforated (CP) ballistic evaluation motor. Motor nozzle throat diameters were sized to produce a common targeted chamber pressure. Initial data analysis did not show a statistically significant effect. Because propellant burn rate must be directly related to chamber pressure, a method was developed that showed statistically significant effects on chamber pressure (either maximum or average) by adjustments to the process settings. Burn rates were calculated from chamber pressures and these were then normalized to a common pressure for comparative purposes. The pressure-based method of burn rate determination showed significant reduction in error when compared to results obtained from the Brooks' modification of the propellant web-bisector burn rate determination method. Analysis of effects using burn rates calculated by the pressure-based method showed a significant correlation of within-mix burn rate dispersion to mixing duration and the quadratic of mixing duration. The findings were confirmed in a series of mixes that examined the effects of mixing time on burn rate variation, which yielded the same results.
Theory of atomic spectral emission intensity
NASA Astrophysics Data System (ADS)
Yngström, Sten
1994-07-01
The theoretical derivation of a new spectral line intensity formula for atomic radiative emission is presented. The theory is based on first principles of quantum physics, electrodynamics, and statistical physics. Quantum rules lead to revision of the conventional principle of local thermal equilibrium of matter and radiation. Study of electrodynamics suggests absence of spectral emission from fractions of the numbers of atoms and ions in a plasma due to radiative inhibition caused by electromagnetic force fields. Statistical probability methods are extended by the statement: A macroscopic physical system develops in the most probable of all conceivable ways consistent with the constraining conditions for the system. The crucial role of statistical physics in transforming quantum logic into common sense logic is stressed. The theory is strongly supported by experimental evidence.
Rodgers, Joseph Lee
2016-01-01
The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2014-01-01
As applications of multilevel modelling in educational research increase, researchers realize that multilevel data collected in many educational settings are often not purely nested. The most common multilevel non-nested data structure is one that involves student mobility in longitudinal studies. This article provides a methodological review of…
ERIC Educational Resources Information Center
Henry, Kimberly L.; Muthen, Bengt
2010-01-01
Latent class analysis (LCA) is a statistical method used to identify subtypes of related cases using a set of categorical or continuous observed variables. Traditional LCA assumes that observations are independent. However, multilevel data structures are common in social and behavioral research and alternative strategies are needed. In this…
An Evaluation of Normal versus Lognormal Distribution in Data Description and Empirical Analysis
ERIC Educational Resources Information Center
Diwakar, Rekha
2017-01-01
Many existing methods of statistical inference and analysis rely heavily on the assumption that the data are normally distributed. However, the normality assumption is not fulfilled when dealing with data which does not contain negative values or are otherwise skewed--a common occurrence in diverse disciplines such as finance, economics, political…
Estimation of body mass index from the metrics of the first metatarsal
NASA Astrophysics Data System (ADS)
Dunn, Tyler E.
Estimation of the biological profile from as many skeletal elements as possible is a necessity in both forensic and bioarchaeological contexts; this includes non-standard aspects of the biological profile, such as body mass index (BMI). BMI is a measure that allows for understanding of the composition of an individual and is traditionally divided into four groups: underweight, normal weight, overweight, and obese. BMI estimation incorporates both estimation of stature and body mass. The estimation of stature from skeletal elements is commonly included into the standard biological profile but the estimation of body mass needs to be further statistically validated to be consistently included. The bones of the foot, specifically the first metatarsal, may have the ability to estimate BMI given an allometric relationship to stature and the mechanical relationship to body mass. There are two commonly used methods for stature estimation, the anatomical method and the regression method. The anatomical method takes into account all of the skeletal elements that contribute to stature while the regression method relies on the allometric relationship between a skeletal element and living stature. A correlation between the metrics of the first metatarsal and living stature has been observed, and proposed as a method for valid stature estimation from the boney foot (Byers et al., 1989). Body mass estimation from skeletal elements relies on two theoretical frameworks: the morphometric and the mechanical approaches. The morphometric approach relies on the size relationship of the individual to body mass; the basic relationship between volume, density, and weight allows for body mass estimation. The body is thought of as a cylinder, and in order to understand the volume of this cylinder the diameter is needed. A commonly used proxy for this in the human body is skeletal bi-iliac breadth from rearticulated pelvic girdle. The mechanical method of body mass estimation relies on the ideas of biomechanical bone remodeling; the elements of the skeleton that are under higher forces, including weight, will remodel to minimize stress. A commonly used metric for the mechanical method of body mass estimation is the diameter of the head of the femur. The foot experiences nearly the entire weight force of the individual at any point in the gait cycle and is subject to the biomechanical remodeling that this force would induce. Therefore, the application of the mechanical framework for body mass estimation could stand true for the elements of the foot. The morphometric and mechanical approaches have been validated against one another on a large, geographically disparate population (Auerbach and Ruff, 2004), but have yet to be validated on a sample of known body mass. DeGroote and Humphrey (2011) test the ability of the first metatarsal to estimate femoral head diameter, body mass, and femoral length. The estimated femoral head diameter from the first metatarsal is used to estimate body mass via the morphometric approach and the femoral length is used to estimate living stature. The authors find that body mass and stature estimation methods from more commonly used skeletal elements compared well with the methods developed from the first metatarsal. This study examines 388 `White' individuals from the William M. Bass donated skeletal collection to test the reliability of the body mass estimates from femoral head diameter and bi-iliac breadth, stature from maximum femoral length, and body mass and stature from the metrics of the first metatarsal. This sample included individuals from all four of the BMI classes. This study finds that all of the skeletal indicators compare well with one another; there is no statistical difference in the stature estimates from the first metatarsal and the maximum length of the femur, and there is no statistical between all three of the body mass estimation methods. When compared to the forensic estimates of stature neither of the tested methods had statistical difference. Conversely, when the body mass estimates are compared to forensic body mass there was a statistical difference and when further investigated the most difference in the body mass estimates was in the extremes of body mass (the underweight and obese categories). These findings indicate that the estimation of stature from both the maximum femoral length and the metrics of the metatarsal are accurate methods. Furthermore, the estimation of body mass is accurate when the individual is in the middle range of the BMI spectrum while these methods for outlying individuals are inaccurate. These findings have implications for the application of stature and body mass estimation in the fields of bioarchaeology, forensic anthropology, and paleoanthropology.
NASA Astrophysics Data System (ADS)
Smid, Marek; Costa, Ana; Pebesma, Edzer; Granell, Carlos; Bhattacharya, Devanjan
2016-04-01
Human kind is currently predominantly urban based, and the majority of ever continuing population growth will take place in urban agglomerations. Urban systems are not only major drivers of climate change, but also the impact hot spots. Furthermore, climate change impacts are commonly managed at city scale. Therefore, assessing climate change impacts on urban systems is a very relevant subject of research. Climate and its impacts on all levels (local, meso and global scale) and also the inter-scale dependencies of those processes should be a subject to detail analysis. While global and regional projections of future climate are currently available, local-scale information is lacking. Hence, statistical downscaling methodologies represent a potentially efficient way to help to close this gap. In general, the methodological reviews of downscaling procedures cover the various methods according to their application (e.g. downscaling for the hydrological modelling). Some of the most recent and comprehensive studies, such as the ESSEM COST Action ES1102 (VALUE), use the concept of Perfect Prog and MOS. Other examples of classification schemes of downscaling techniques consider three main categories: linear methods, weather classifications and weather generators. Downscaling and climate modelling represent a multidisciplinary field, where researchers from various backgrounds intersect their efforts, resulting in specific terminology, which may be somewhat confusing. For instance, the Polynomial Regression (also called the Surface Trend Analysis) is a statistical technique. In the context of the spatial interpolation procedures, it is commonly classified as a deterministic technique, and kriging approaches are classified as stochastic. Furthermore, the terms "statistical" and "stochastic" (frequently used as names of sub-classes in downscaling methodological reviews) are not always considered as synonymous, even though both terms could be seen as identical since they are referring to methods handling input modelling factors as variables with certain probability distributions. In addition, the recent development is going towards multi-step methodologies containing deterministic and stochastic components. This evolution leads to the introduction of new terms like hybrid or semi-stochastic approaches, which makes the efforts to systematically classifying downscaling methods to the previously defined categories even more challenging. This work presents a review of statistical downscaling procedures, which classifies the methods in two steps. In the first step, we describe several techniques that produce a single climatic surface based on observations. The methods are classified into two categories using an approximation to the broadest consensual statistical terms: linear and non-linear methods. The second step covers techniques that use simulations to generate alternative surfaces, which correspond to different realizations of the same processes. Those simulations are essential because there is a limited number of real observational data, and such procedures are crucial for modelling extremes. This work emphasises the link between statistical downscaling methods and the research of climate change impacts at city scale.
Physics of Electronic Materials
NASA Astrophysics Data System (ADS)
Rammer, Jørgen
2017-03-01
1. Quantum mechanics; 2. Quantum tunneling; 3. Standard metal model; 4. Standard conductor model; 5. Electric circuit theory; 6. Quantum wells; 7. Particle in a periodic potential; 8. Bloch currents; 9. Crystalline solids; 10. Semiconductor doping; 11. Transistors; 12. Heterostructures; 13. Mesoscopic physics; 14. Arithmetic, logic and machines; Appendix A. Principles of quantum mechanics; Appendix B. Dirac's delta function; Appendix C. Fourier analysis; Appendix D. Classical mechanics; Appendix E. Wave function properties; Appendix F. Transfer matrix properties; Appendix G. Momentum; Appendix H. Confined particles; Appendix I. Spin and quantum statistics; Appendix J. Statistical mechanics; Appendix K. The Fermi-Dirac distribution; Appendix L. Thermal current fluctuations; Appendix M. Gaussian wave packets; Appendix N. Wave packet dynamics; Appendix O. Screening by symmetry method; Appendix P. Commutation and common eigenfunctions; Appendix Q. Interband coupling; Appendix R. Common crystal structures; Appendix S. Effective mass approximation; Appendix T. Integral doubling formula; Bibliography; Index.
Patounakis, George; Hill, Micah J
2018-06-01
The purpose of the current review is to describe the common pitfalls in design and statistical analysis of reproductive medicine studies. It serves to guide both authors and reviewers toward reducing the incidence of spurious statistical results and erroneous conclusions. The large amount of data gathered in IVF cycles leads to problems with multiplicity, multicollinearity, and over fitting of regression models. Furthermore, the use of the word 'trend' to describe nonsignificant results has increased in recent years. Finally, methods to accurately account for female age in infertility research models are becoming more common and necessary. The pitfalls of study design and analysis reviewed provide a framework for authors and reviewers to approach clinical research in the field of reproductive medicine. By providing a more rigorous approach to study design and analysis, the literature in reproductive medicine will have more reliable conclusions that can stand the test of time.
Estimation of descriptive statistics for multiply censored water quality data
Helsel, Dennis R.; Cohn, Timothy A.
1988-01-01
This paper extends the work of Gilliom and Helsel (1986) on procedures for estimating descriptive statistics of water quality data that contain “less than” observations. Previously, procedures were evaluated when only one detection limit was present. Here we investigate the performance of estimators for data that have multiple detection limits. Probability plotting and maximum likelihood methods perform substantially better than simple substitution procedures now commonly in use. Therefore simple substitution procedures (e.g., substitution of the detection limit) should be avoided. Probability plotting methods are more robust than maximum likelihood methods to misspecification of the parent distribution and their use should be encouraged in the typical situation where the parent distribution is unknown. When utilized correctly, less than values frequently contain nearly as much information for estimating population moments and quantiles as would the same observations had the detection limit been below them.
NASA Astrophysics Data System (ADS)
Abdellatef, Hisham E.
2007-04-01
Picric acid, bromocresol green, bromothymol blue, cobalt thiocyanate and molybdenum(V) thiocyanate have been tested as spectrophotometric reagents for the determination of disopyramide and irbesartan. Reaction conditions have been optimized to obtain coloured comoplexes of higher sensitivity and longer stability. The absorbance of ion-pair complexes formed were found to increases linearity with increases in concentrations of disopyramide and irbesartan which were corroborated by correction coefficient values. The developed methods have been successfully applied for the determination of disopyramide and irbesartan in bulk drugs and pharmaceutical formulations. The common excipients and additives did not interfere in their determination. The results obtained by the proposed methods have been statistically compared by means of student t-test and by the variance ratio F-test. The validity was assessed by applying the standard addition technique. The results were compared statistically with the official or reference methods showing a good agreement with high precision and accuracy.
Statistical universals reveal the structures and functions of human music.
Savage, Patrick E; Brown, Steven; Sakai, Emi; Currie, Thomas E
2015-07-21
Music has been called "the universal language of mankind." Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation.
Statistical universals reveal the structures and functions of human music
Savage, Patrick E.; Brown, Steven; Sakai, Emi; Currie, Thomas E.
2015-01-01
Music has been called “the universal language of mankind.” Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation. PMID:26124105
Kerr, Kathleen F; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G; Parikh, Chirag R
2014-08-07
The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients' risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. Copyright © 2014 by the American Society of Nephrology.
Tolerancing aspheres based on manufacturing statistics
NASA Astrophysics Data System (ADS)
Wickenhagen, S.; Möhl, A.; Fuchs, U.
2017-11-01
A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although, different weightings and distributions are assumed they are all counting on statistics, which usually means several hundreds or thousands of systems for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The huge database of asphericon was used to investigate the correlation between the given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed aiming for a robust optical tolerancing process.
What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum
Hesterberg, Tim C.
2015-01-01
Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. For example, the common combination of nonparametric bootstrapping and bootstrap percentile confidence intervals is less accurate than using t-intervals for small samples, though more accurate for larger samples. My goals in this article are to provide a deeper understanding of bootstrap methods—how they work, when they work or not, and which methods work better—and to highlight pedagogical issues. Supplementary materials for this article are available online. [Received December 2014. Revised August 2015] PMID:27019512
A prospective study of two methods of closing surgical scalp wounds.
Adeolu, A A; Olabanji, J K; Komolafe, E O; Ademuyiwa, A O; Awe, A O; Oladele, A O
2012-02-01
Scalp wounds are commonly closed in two layers, although single layer closure is feasible. This study prospectively compared the two methods of closing scalp wounds. Patients with non-traumatic scalp wounds were allocated to either the single layer closure group or the multilayer closure group. We obtained relevant data from the patients. The primary outcome measures were wound edge related complications, rate of suturing and cost of sutures used for suturing. Thirty-one wounds were in the single layer closure group and 30 were in the multilayer closure group. Age range was 1-80 years. The most common indication for making a scalp incision was subdural hematoma, representing 27.8% of all the indications. The most common surgery was burr hole drainage of subdural hematoma. Polyglactin acid suture was used for the inner layer and polyamide -00- for the final layer in the multilayer closure group. Only the latter suture was used for the single layer closure method. Total cost of suturing per wound in the single layer closure group was N= 100 (0.70USD) and N= 800 (5.30USD) in the multilayer group. The mean rate of closure was 0.39 ± 1.89 mm/sec for single layer closure and 0.23 ± 0.89 mm/sec in multilayer closure. The difference was statistically significant. Wound edge related complication rate was 19.35% in the single layer closure group and 16.67% in the multilayer closure method group. The difference was not statistically significant (z: 0.00, p value: 1.000; Pearson chi-squared (DF = 1)= 0.0075, p = 0.0785). The study shows that closing the scalp in one layer is much faster and more cost effective compared to the multilayer closure method. We did not observe significant difference in the complication rates in the two methods of closure. Long-term outcome, especially cosmetic outcome, remains to be determined in this preliminary study.
Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B
2006-08-01
Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.
Gu, Zhan; Qi, Xiuzhong; Zhai, Xiaofeng; Lang, Qingbo; Lu, Jianying; Ma, Changping; Liu, Long; Yue, Xiaoqiang
2015-01-01
Primary liver cancer (PLC) is one of the most common malignant tumors because of its high incidence and high mortality. Traditional Chinese medicine (TCM) plays an active role in the treatment of PLC. As the most important part in the TCM system, syndrome differentiation based on the clinical manifestations from traditional four diagnostic methods has met great challenges and questions with the lack of statistical validation support. In this study, we provided evidences for TCM syndrome differentiation of PLC using the method of analysis of latent structural model from clinic data, thus providing basis for establishing TCM syndrome criteria. And also we obtain the common syndromes of PLC as well as their typical clinical manifestations, respectively.
Helsel, D.R.
2006-01-01
The most commonly used method in environmental chemistry to deal with values below detection limits is to substitute a fraction of the detection limit for each nondetect. Two decades of research has shown that this fabrication of values produces poor estimates of statistics, and commonly obscures patterns and trends in the data. Papers using substitution may conclude that significant differences, correlations, and regression relationships do not exist, when in fact they do. The reverse may also be true. Fortunately, good alternative methods for dealing with nondetects already exist, and are summarized here with references to original sources. Substituting values for nondetects should be used rarely, and should generally be considered unacceptable in scientific research. There are better ways.
Model Considerations for Memory-based Automatic Music Transcription
NASA Astrophysics Data System (ADS)
Albrecht, Štěpán; Šmídl, Václav
2009-12-01
The problem of automatic music description is considered. The recorded music is modeled as a superposition of known sounds from a library weighted by unknown weights. Similar observation models are commonly used in statistics and machine learning. Many methods for estimation of the weights are available. These methods differ in the assumptions imposed on the weights. In Bayesian paradigm, these assumptions are typically expressed in the form of prior probability density function (pdf) on the weights. In this paper, commonly used assumptions about music signal are summarized and complemented by a new assumption. These assumptions are translated into pdfs and combined into a single prior density using combination of pdfs. Validity of the model is tested in simulation using synthetic data.
2014-01-01
Quantitative imaging biomarkers (QIBs) are being used increasingly in medicine to diagnose and monitor patients’ disease. The computer algorithms that measure QIBs have different technical performance characteristics. In this paper we illustrate the appropriate statistical methods for assessing and comparing the bias, precision, and agreement of computer algorithms. We use data from three studies of pulmonary nodules. The first study is a small phantom study used to illustrate metrics for assessing repeatability. The second study is a large phantom study allowing assessment of four algorithms’ bias and reproducibility for measuring tumor volume and the change in tumor volume. The third study is a small clinical study of patients whose tumors were measured on two occasions. This study allows a direct assessment of six algorithms’ performance for measuring tumor change. With these three examples we compare and contrast study designs and performance metrics, and we illustrate the advantages and limitations of various common statistical methods for QIB studies. PMID:24919828
Multivariate Analysis of Longitudinal Rates of Change
Bryan, Matthew; Heagerty, Patrick J.
2016-01-01
Longitudinal data allow direct comparison of the change in patient outcomes associated with treatment or exposure. Frequently, several longitudinal measures are collected that either reflect a common underlying health status, or characterize processes that are influenced in a similar way by covariates such as exposure or demographic characteristics. Statistical methods that can combine multivariate response variables into common measures of covariate effects have been proposed by Roy and Lin [1]; Proust-Lima, Letenneur and Jacqmin-Gadda [2]; and Gray and Brookmeyer [3] among others. Current methods for characterizing the relationship between covariates and the rate of change in multivariate outcomes are limited to select models. For example, Gray and Brookmeyer [3] introduce an “accelerated time” method which assumes that covariates rescale time in longitudinal models for disease progression. In this manuscript we detail an alternative multivariate model formulation that directly structures longitudinal rates of change, and that permits a common covariate effect across multiple outcomes. We detail maximum likelihood estimation for a multivariate longitudinal mixed model. We show via asymptotic calculations the potential gain in power that may be achieved with a common analysis of multiple outcomes. We apply the proposed methods to the analysis of a trivariate outcome for infant growth and compare rates of change for HIV infected and uninfected infants. PMID:27417129
Abbott, M.; Einerson, J.; Schuster, P.; Susong, D.; Taylor, Howard E.; ,
2004-01-01
Snow sampling and analysis methods which produce accurate and ultra-low measurements of trace elements and common ion concentration in southeastern Idaho snow, were developed. Snow samples were collected over two winters to assess trace elements and common ion concentrations in air pollutant fallout across the southeastern Idaho. The area apportionment of apportionment of fallout concentrations measured at downwind location were investigated using pattern recognition and multivariate statistical technical techniques. Results show a high level of contribution from phosphates processing facilities located outside Pocatello in the southern portion of the Eastern Snake River Plain, and no obvious source area profiles other than at Pocatello.
Freeman, Jenny V; Collier, Steve; Staniforth, David; Smith, Kevin J
2008-01-01
Background Statistics is relevant to students and practitioners in medicine and health sciences and is increasingly taught as part of the medical curriculum. However, it is common for students to dislike and under-perform in statistics. We sought to address these issues by redesigning the way that statistics is taught. Methods The project brought together a statistician, clinician and educational experts to re-conceptualize the syllabus, and focused on developing different methods of delivery. New teaching materials, including videos, animations and contextualized workbooks were designed and produced, placing greater emphasis on applying statistics and interpreting data. Results Two cohorts of students were evaluated, one with old style and one with new style teaching. Both were similar with respect to age, gender and previous level of statistics. Students who were taught using the new approach could better define the key concepts of p-value and confidence interval (p < 0.001 for both). They were more likely to regard statistics as integral to medical practice (p = 0.03), and to expect to use it in their medical career (p = 0.003). There was no significant difference in the numbers who thought that statistics was essential to understand the literature (p = 0.28) and those who felt comfortable with the basics of statistics (p = 0.06). More than half the students in both cohorts felt that they were comfortable with the basics of medical statistics. Conclusion Using a variety of media, and placing emphasis on interpretation can help make teaching, learning and understanding of statistics more people-centred and relevant, resulting in better outcomes for students. PMID:18452599
Testing for significance of phase synchronisation dynamics in the EEG.
Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J
2013-06-01
A number of tests exist to check for statistical significance of phase synchronisation within the Electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed for identifying the statistical significance of phase synchronisation based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
Adams, Dean C
2014-09-01
Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data. The method (K(mult)) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. Overall these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Evaluation of normalization methods in mammalian microRNA-Seq data
Garmire, Lana Xia; Subramaniam, Shankar
2012-01-01
Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Comparing with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
Abdulmajeed, Ahmed; Rabab, Mohamed A; Sliem, Hamdy A; Hebatallah, Nour Eldein
2011-01-01
Introduction Irritable bowel syndrome (IBS) is one of the most common disorders diagnosed by gastroenterologists and a common cause of general practice visits. Although this disease is not life threatening, patients with IBS seem to be seriously affected in their everyday life. The study was designed to explore the pattern of IBS in clinical practice and the impact on the quality of life. Methods This is a case control descriptive study. 117 individuals were included in this study. Rome II criteria were used for the diagnosis of IBS. Impact of IBS on patient's quality of life was determined by irritable bowel syndrome quality of life (IBS-QOL) questionnaire. Results Prevalence of IBS among the study sample was 34.2%. 10% were IBS-Diarrhea, 37.5% were IBS-Constipation and 52.5% were alternators. There is statistical insignificant relationship between IBS (+) and age while it was a significant relation regarding gender (more common among women 80%). There is statistical significance relationship between IBS (+) on one hand and marital status and occupational status on the other hand. Patients with IBS had statistically significant lower scores for all IBS- QOL domains compared with the control group. Conclusion IBS is a prevalent disorder that affects females more than males and it has significant impacts on work, lifestyle and social well-being. PMID:22145053
EEG Sleep Stages Classification Based on Time Domain Features and Structural Graph Similarity.
Diykh, Mohammed; Li, Yan; Wen, Peng
2016-11-01
The electroencephalogram (EEG) signals are commonly used in diagnosing and treating sleep disorders. Many existing methods for sleep stages classification mainly depend on the analysis of EEG signals in time or frequency domain to obtain a high classification accuracy. In this paper, the statistical features in time domain, the structural graph similarity and the K-means (SGSKM) are combined to identify six sleep stages using single channel EEG signals. Firstly, each EEG segment is partitioned into sub-segments. The size of a sub-segment is determined empirically. Secondly, statistical features are extracted, sorted into different sets of features and forwarded to the SGSKM to classify EEG sleep stages. We have also investigated the relationships between sleep stages and the time domain features of the EEG data used in this paper. The experimental results show that the proposed method yields better classification results than other four existing methods and the support vector machine (SVM) classifier. A 95.93% average classification accuracy is achieved by using the proposed method.
Jia, Erik; Chen, Tianlu
2018-01-01
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
Recchia, Gabriel L; Louwerse, Max M
2016-11-01
Computational techniques comparing co-occurrences of city names in texts allow the relative longitudes and latitudes of cities to be estimated algorithmically. However, these techniques have not been applied to estimate the provenance of artifacts with unknown origins. Here, we estimate the geographic origin of artifacts from the Indus Valley Civilization, applying methods commonly used in cognitive science to the Indus script. We show that these methods can accurately predict the relative locations of archeological sites on the basis of artifacts of known provenance, and we further apply these techniques to determine the most probable excavation sites of four sealings of unknown provenance. These findings suggest that inscription statistics reflect historical interactions among locations in the Indus Valley region, and they illustrate how computational methods can help localize inscribed archeological artifacts of unknown origin. The success of this method offers opportunities for the cognitive sciences in general and for computational anthropology specifically. Copyright © 2015 Cognitive Science Society, Inc.
Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan
2017-09-01
In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Choi, Leena; Carroll, Robert J; Beck, Cole; Mosley, Jonathan D; Roden, Dan M; Denny, Joshua C; Van Driest, Sara L
2018-04-18
Phenome-wide association studies (PheWAS) have been used to discover many genotype-phenotype relationships and have the potential to identify therapeutic and adverse drug outcomes using longitudinal data within electronic health records (EHRs). However, the statistical methods for PheWAS applied to longitudinal EHR medication data have not been established. In this study, we developed methods to address two challenges faced with reuse of EHR for this purpose: confounding by indication, and low exposure and event rates. We used Monte Carlo simulation to assess propensity score (PS) methods, focusing on two of the most commonly used methods, PS matching and PS adjustment, to address confounding by indication. We also compared two logistic regression approaches (the default of Wald vs. Firth's penalized maximum likelihood, PML) to address complete separation due to sparse data with low exposure and event rates. PS adjustment resulted in greater power than propensity score matching, while controlling Type I error at 0.05. The PML method provided reasonable p-values, even in cases with complete separation, with well controlled Type I error rates. Using PS adjustment and the PML method, we identify novel latent drug effects in pediatric patients exposed to two common antibiotic drugs, ampicillin and gentamicin. R packages PheWAS and EHR are available at https://github.com/PheWAS/PheWAS and at CRAN (https://www.r-project.org/), respectively. The R script for data processing and the main analysis is available at https://github.com/choileena/EHR. leena.choi@vanderbilt.edu. Supplementary data are available at Bioinformatics online.
Basic statistics (the fundamental concepts).
Lim, Eric
2014-12-01
An appreciation and understanding of statistics is import to all practising clinicians, not simply researchers. This is because mathematics is the fundamental basis to which we base clinical decisions, usually with reference to the benefit in relation to risk. Unless a clinician has a basic understanding of statistics, he or she will never be in a position to question healthcare management decisions that have been handed down from generation to generation, will not be able to conduct research effectively nor evaluate the validity of published evidence (usually making an assumption that most published work is either all good or all bad). This article provides a brief introduction to basic statistical methods and illustrates its use in common clinical scenarios. In addition, pitfalls of incorrect usage have been highlighted. However, it is not meant to be a substitute for formal training or consultation with a qualified and experienced medical statistician prior to starting any research project.
New U.S. Geological Survey Method for the Assessment of Reserve Growth
Klett, Timothy R.; Attanasi, E.D.; Charpentier, Ronald R.; Cook, Troy A.; Freeman, P.A.; Gautier, Donald L.; Le, Phuong A.; Ryder, Robert T.; Schenk, Christopher J.; Tennyson, Marilyn E.; Verma, Mahendra K.
2011-01-01
Reserve growth is defined as the estimated increases in quantities of crude oil, natural gas, and natural gas liquids that have the potential to be added to remaining reserves in discovered accumulations through extension, revision, improved recovery efficiency, and additions of new pools or reservoirs. A new U.S. Geological Survey method was developed to assess the reserve-growth potential of technically recoverable crude oil and natural gas to be added to reserves under proven technology currently in practice within the trend or play, or which reasonably can be extrapolated from geologically similar trends or plays. This method currently is in use to assess potential additions to reserves in discovered fields of the United States. The new approach involves (1) individual analysis of selected large accumulations that contribute most to reserve growth, and (2) conventional statistical modeling of reserve growth in remaining accumulations. This report will focus on the individual accumulation analysis. In the past, the U.S. Geological Survey estimated reserve growth by statistical methods using historical recoverable-quantity data. Those statistical methods were based on growth rates averaged by the number of years since accumulation discovery. Accumulations in mature petroleum provinces with volumetrically significant reserve growth, however, bias statistical models of the data; therefore, accumulations with significant reserve growth are best analyzed separately from those with less significant reserve growth. Large (greater than 500 million barrels) and older (with respect to year of discovery) oil accumulations increase in size at greater rates late in their development history in contrast to more recently discovered accumulations that achieve most growth early in their development history. Such differences greatly affect the statistical methods commonly used to forecast reserve growth. The individual accumulation-analysis method involves estimating the in-place petroleum quantity and its uncertainty, as well as the estimated (forecasted) recoverability and its respective uncertainty. These variables are assigned probabilistic distributions and are combined statistically to provide probabilistic estimates of ultimate recoverable quantities. Cumulative production and remaining reserves are then subtracted from the estimated ultimate recoverable quantities to provide potential reserve growth. In practice, results of the two methods are aggregated to various scales, the highest of which includes an entire country or the world total. The aggregated results are reported along with the statistically appropriate uncertainties.
Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.
Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W
2016-10-01
This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinical important because it increases the possibility to stabilize (electrically) and prevent the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on genetic algorithm (GA). Both full feature set and statistically significant feature subset are optimized by GA respectively. For the statistically significant feature subset, Mann-Whitney U test is used to filter non-statistical significance features that cannot pass the statistical test at 20% significant level. The final stage of our predictor is the classifier that is based on support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minutes HRV segment. This accuracy is comparable to that achieved by existing methods that use 30-minutes HRV segments, most of which achieves accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Evaluation of methods for managing censored results when calculating the geometric mean.
Mikkonen, Hannah G; Clarke, Bradley O; Dasika, Raghava; Wallis, Christian J; Reichman, Suzie M
2018-01-01
Currently, there are conflicting views on the best statistical methods for managing censored environmental data. The method commonly applied by environmental science researchers and professionals is to substitute half the limit of reporting for derivation of summary statistics. This approach has been criticised by some researchers, raising questions around the interpretation of historical scientific data. This study evaluated four complete soil datasets, at three levels of simulated censorship, to test the accuracy of a range of censored data management methods for calculation of the geometric mean. The methods assessed included removal of censored results, substitution of a fixed value (near zero, half the limit of reporting and the limit of reporting), substitution by nearest neighbour imputation, maximum likelihood estimation, regression on order substitution and Kaplan-Meier/survival analysis. This is the first time such a comprehensive range of censored data management methods have been applied to assess the accuracy of calculation of the geometric mean. The results of this study show that, for describing the geometric mean, the simple method of substitution of half the limit of reporting is comparable or more accurate than alternative censored data management methods, including nearest neighbour imputation methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.
Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao
2015-08-01
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.
Carayol, Jérôme; Schellenberg, Gerard D.; Dombroski, Beth; Amiet, Claire; Génin, Bérengère; Fontaine, Karine; Rousseau, Francis; Vazart, Céline; Cohen, David; Frazier, Thomas W.; Hardan, Antonio Y.; Dawson, Geraldine; Rio Frio, Thomas
2014-01-01
Autism spectrum disorders (ASD) are highly heritable complex neurodevelopmental disorders with a 4:1 male: female ratio. Common genetic variation could explain 40–60% of the variance in liability to autism. Because of their small effect, genome-wide association studies (GWASs) have only identified a small number of individual single-nucleotide polymorphisms (SNPs). To increase the power of GWASs in complex disorders, methods like convergent functional genomics (CFG) have emerged to extract true association signals from noise and to identify and prioritize genes from SNPs using a scoring strategy combining statistics and functional genomics. We adapted and applied this approach to analyze data from a GWAS performed on families with multiple children affected with autism from Autism Speaks Autism Genetic Resource Exchange (AGRE). We identified a set of 133 candidate markers that were localized in or close to genes with functional relevance in ASD from a discovery population (545 multiplex families); a gender specific genetic score (GS) based on these common variants explained 1% (P = 0.01 in males) and 5% (P = 8.7 × 10−7 in females) of genetic variance in an independent sample of multiplex families. Overall, our work demonstrates that prioritization of GWAS data based on functional genomics identified common variants associated with autism and provided additional support for a common polygenic background in autism. PMID:24600472
Guo, Ying; Little, Roderick J; McConnell, Daniel S
2012-01-01
Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
Kim, Jiyu; Jung, Inkyung
2017-01-01
Spatial scan statistics with circular or elliptic scanning windows are commonly used for cluster detection in various applications, such as the identification of geographical disease clusters from epidemiological data. It has been pointed out that the method may have difficulty in correctly identifying non-compact, arbitrarily shaped clusters. In this paper, we evaluated the Gini coefficient for detecting irregularly shaped clusters through a simulation study. The Gini coefficient, the use of which in spatial scan statistics was recently proposed, is a criterion measure for optimizing the maximum reported cluster size. Our simulation study results showed that using the Gini coefficient works better than the original spatial scan statistic for identifying irregularly shaped clusters, by reporting an optimized and refined collection of clusters rather than a single larger cluster. We have provided a real data example that seems to support the simulation results. We think that using the Gini coefficient in spatial scan statistics can be helpful for the detection of irregularly shaped clusters. PMID:28129368
Statistical Analysis of Large-Scale Structure of Universe
NASA Astrophysics Data System (ADS)
Tugay, A. V.
While galaxy cluster catalogs were compiled many decades ago, other structural elements of cosmic web are detected at definite level only in the newest works. For example, extragalactic filaments were described by velocity field and SDSS galaxy distribution during the last years. Large-scale structure of the Universe could be also mapped in the future using ATHENA observations in X-rays and SKA in radio band. Until detailed observations are not available for the most volume of Universe, some integral statistical parameters can be used for its description. Such methods as galaxy correlation function, power spectrum, statistical moments and peak statistics are commonly used with this aim. The parameters of power spectrum and other statistics are important for constraining the models of dark matter, dark energy, inflation and brane cosmology. In the present work we describe the growth of large-scale density fluctuations in one- and three-dimensional case with Fourier harmonics of hydrodynamical parameters. In result we get power-law relation for the matter power spectrum.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.
2014-01-01
Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
The Case Survey and Alternative Methods for Research Aggregation
1974-06-01
of FolLow-Through programs were found to predollinate Jin one set of pooled data with common statistical characteristics, aod studies of Montessori ...be mad compatible. l"’we" and other mundane d if f icu ties have a t renendous ca ac i tv to ctonsurW ’ i M- and energy. Montessori ), or they could be
A re-evaluation of a case-control model with contaminated controls for resource selection studies
Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski
2013-01-01
A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...
ERIC Educational Resources Information Center
Lun, G.; Holzer, D.; Tappeiner, G.; Tappeiner, U.
2006-01-01
The calculation of composite indicators and the derivation of respective rankings is a common method used to benchmark countries or regions. However, although the statistical robustness of these rankings is often criticised, they often still spark off heated political debate. Here, we assess the sensitivity of the province ranking published by the…
A review of statistical updating methods for clinical prediction models.
Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew
2018-01-01
A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall in into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom: We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.
Measurements and analysis in imaging for biomedical applications
NASA Astrophysics Data System (ADS)
Hoeller, Timothy L.
2009-02-01
A Total Quality Management (TQM) approach can be used to analyze data from biomedical optical and imaging platforms of tissues. A shift from individuals to teams, partnerships, and total participation are necessary from health care groups for improved prognostics using measurement analysis. Proprietary measurement analysis software is available for calibrated, pixel-to-pixel measurements of angles and distances in digital images. Feature size, count, and color are determinable on an absolute and comparative basis. Although changes in images of histomics are based on complex and numerous factors, the variation of changes in imaging analysis to correlations of time, extent, and progression of illness can be derived. Statistical methods are preferred. Applications of the proprietary measurement software are available for any imaging platform. Quantification of results provides improved categorization of illness towards better health. As health care practitioners try to use quantified measurement data for patient diagnosis, the techniques reported can be used to track and isolate causes better. Comparisons, norms, and trends are available from processing of measurement data which is obtained easily and quickly from Scientific Software and methods. Example results for the class actions of Preventative and Corrective Care in Ophthalmology and Dermatology, respectively, are provided. Improved and quantified diagnosis can lead to better health and lower costs associated with health care. Systems support improvements towards Lean and Six Sigma affecting all branches of biology and medicine. As an example for use of statistics, the major types of variation involving a study of Bone Mineral Density (BMD) are examined. Typically, special causes in medicine relate to illness and activities; whereas, common causes are known to be associated with gender, race, size, and genetic make-up. Such a strategy of Continuous Process Improvement (CPI) involves comparison of patient results to baseline data using F-statistics. Self-parings over time are also useful. Special and common causes are identified apart from aging in applying the statistical methods. In the future, implementation of imaging measurement methods by research staff, doctors, and concerned patient partners result in improved health diagnosis, reporting, and cause determination. The long-term prospects for quantified measurements are better quality in imaging analysis with applications of higher utility for heath care providers.
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
An advanced probabilistic structural analysis method for implicit performance functions
NASA Technical Reports Server (NTRS)
Wu, Y.-T.; Millwater, H. R.; Cruse, T. A.
1989-01-01
In probabilistic structural analysis, the performance or response functions usually are implicitly defined and must be solved by numerical analysis methods such as finite element methods. In such cases, the most commonly used probabilistic analysis tool is the mean-based, second-moment method which provides only the first two statistical moments. This paper presents a generalized advanced mean value (AMV) method which is capable of establishing the distributions to provide additional information for reliability design. The method requires slightly more computations than the second-moment method but is highly efficient relative to the other alternative methods. In particular, the examples show that the AMV method can be used to solve problems involving non-monotonic functions that result in truncated distributions.
Sunspot activity and influenza pandemics: a statistical assessment of the purported association.
Towers, S
2017-10-01
Since 1978, a series of papers in the literature have claimed to find a significant association between sunspot activity and the timing of influenza pandemics. This paper examines these analyses, and attempts to recreate the three most recent statistical analyses by Ertel (1994), Tapping et al. (2001), and Yeung (2006), which all have purported to find a significant relationship between sunspot numbers and pandemic influenza. As will be discussed, each analysis had errors in the data. In addition, in each analysis arbitrary selections or assumptions were also made, and the authors did not assess the robustness of their analyses to changes in those arbitrary assumptions. Varying the arbitrary assumptions to other, equally valid, assumptions negates the claims of significance. Indeed, an arbitrary selection made in one of the analyses appears to have resulted in almost maximal apparent significance; changing it only slightly yields a null result. This analysis applies statistically rigorous methodology to examine the purported sunspot/pandemic link, using more statistically powerful un-binned analysis methods, rather than relying on arbitrarily binned data. The analyses are repeated using both the Wolf and Group sunspot numbers. In all cases, no statistically significant evidence of any association was found. However, while the focus in this particular analysis was on the purported relationship of influenza pandemics to sunspot activity, the faults found in the past analyses are common pitfalls; inattention to analysis reproducibility and robustness assessment are common problems in the sciences, that are unfortunately not noted often enough in review.
Granato, Gregory E.
2014-01-01
The U.S. Geological Survey (USGS) developed the Stochastic Empirical Loading and Dilution Model (SELDM) in cooperation with the Federal Highway Administration (FHWA) to indicate the risk for stormwater concentrations, flows, and loads to be above user-selected water-quality goals and the potential effectiveness of mitigation measures to reduce such risks. SELDM models the potential effect of mitigation measures by using Monte Carlo methods with statistics that approximate the net effects of structural and nonstructural best management practices (BMPs). In this report, structural BMPs are defined as the components of the drainage pathway between the source of runoff and a stormwater discharge location that affect the volume, timing, or quality of runoff. SELDM uses a simple stochastic statistical model of BMP performance to develop planning-level estimates of runoff-event characteristics. This statistical approach can be used to represent a single BMP or an assemblage of BMPs. The SELDM BMP-treatment module has provisions for stochastic modeling of three stormwater treatments: volume reduction, hydrograph extension, and water-quality treatment. In SELDM, these three treatment variables are modeled by using the trapezoidal distribution and the rank correlation with the associated highway-runoff variables. This report describes methods for calculating the trapezoidal-distribution statistics and rank correlation coefficients for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater BMPs and provides the calculated values for these variables. This report also provides robust methods for estimating the minimum irreducible concentration (MIC), which is the lowest expected effluent concentration from a particular BMP site or a class of BMPs. These statistics are different from the statistics commonly used to characterize or compare BMPs. They are designed to provide a stochastic transfer function to approximate the quantity, duration, and quality of BMP effluent given the associated inflow values for a population of storm events. A database application and several spreadsheet tools are included in the digital media accompanying this report for further documentation of methods and for future use. In this study, analyses were done with data extracted from a modified copy of the January 2012 version of International Stormwater Best Management Practices Database, designated herein as the January 2012a version. Statistics for volume reduction, hydrograph extension, and water-quality treatment were developed with selected data. Sufficient data were available to estimate statistics for 5 to 10 BMP categories by using data from 40 to more than 165 monitoring sites. Water-quality treatment statistics were developed for 13 runoff-quality constituents commonly measured in highway and urban runoff studies including turbidity, sediment and solids; nutrients; total metals; organic carbon; and fecal coliforms. The medians of the best-fit statistics for each category were selected to construct generalized cumulative distribution functions for the three treatment variables. For volume reduction and hydrograph extension, interpretation of available data indicates that selection of a Spearman’s rho value that is the average of the median and maximum values for the BMP category may help generate realistic simulation results in SELDM. The median rho value may be selected to help generate realistic simulation results for water-quality treatment variables. MIC statistics were developed for 12 runoff-quality constituents commonly measured in highway and urban runoff studies by using data from 11 BMP categories and more than 167 monitoring sites. Four statistical techniques were applied for estimating MIC values with monitoring data from each site. These techniques produce a range of lower-bound estimates for each site. Four MIC estimators are proposed as alternatives for selecting a value from among the estimates from multiple sites. Correlation analysis indicates that the MIC estimates from multiple sites were weakly correlated with the geometric mean of inflow values, which indicates that there may be a qualitative or semiquantitative link between the inflow quality and the MIC. Correlations probably are weak because the MIC is influenced by the inflow water quality and the capability of each individual BMP site to reduce inflow concentrations.
Armstrong, Ben; Fleming, Lora E.; Elson, Richard; Kovats, Sari; Vardoulakis, Sotiris; Nichols, Gordon L.
2017-01-01
Infectious diseases attributable to unsafe water supply, sanitation and hygiene (e.g. Cholera, Leptospirosis, Giardiasis) remain an important cause of morbidity and mortality, especially in low-income countries. Climate and weather factors are known to affect the transmission and distribution of infectious diseases and statistical and mathematical modelling are continuously developing to investigate the impact of weather and climate on water-associated diseases. There have been little critical analyses of the methodological approaches. Our objective is to review and summarize statistical and modelling methods used to investigate the effects of weather and climate on infectious diseases associated with water, in order to identify limitations and knowledge gaps in developing of new methods. We conducted a systematic review of English-language papers published from 2000 to 2015. Search terms included concepts related to water-associated diseases, weather and climate, statistical, epidemiological and modelling methods. We found 102 full text papers that met our criteria and were included in the analysis. The most commonly used methods were grouped in two clusters: process-based models (PBM) and time series and spatial epidemiology (TS-SE). In general, PBM methods were employed when the bio-physical mechanism of the pathogen under study was relatively well known (e.g. Vibrio cholerae); TS-SE tended to be used when the specific environmental mechanisms were unclear (e.g. Campylobacter). Important data and methodological challenges emerged, with implications for surveillance and control of water-associated infections. The most common limitations comprised: non-inclusion of key factors (e.g. biological mechanism, demographic heterogeneity, human behavior), reporting bias, poor data quality, and collinearity in exposures. Furthermore, the methods often did not distinguish among the multiple sources of time-lags (e.g. patient physiology, reporting bias, healthcare access) between environmental drivers/exposures and disease detection. Key areas of future research include: disentangling the complex effects of weather/climate on each exposure-health outcome pathway (e.g. person-to-person vs environment-to-person), and linking weather data to individual cases longitudinally. PMID:28604791
Lo Iacono, Giovanni; Armstrong, Ben; Fleming, Lora E; Elson, Richard; Kovats, Sari; Vardoulakis, Sotiris; Nichols, Gordon L
2017-06-01
Infectious diseases attributable to unsafe water supply, sanitation and hygiene (e.g. Cholera, Leptospirosis, Giardiasis) remain an important cause of morbidity and mortality, especially in low-income countries. Climate and weather factors are known to affect the transmission and distribution of infectious diseases and statistical and mathematical modelling are continuously developing to investigate the impact of weather and climate on water-associated diseases. There have been little critical analyses of the methodological approaches. Our objective is to review and summarize statistical and modelling methods used to investigate the effects of weather and climate on infectious diseases associated with water, in order to identify limitations and knowledge gaps in developing of new methods. We conducted a systematic review of English-language papers published from 2000 to 2015. Search terms included concepts related to water-associated diseases, weather and climate, statistical, epidemiological and modelling methods. We found 102 full text papers that met our criteria and were included in the analysis. The most commonly used methods were grouped in two clusters: process-based models (PBM) and time series and spatial epidemiology (TS-SE). In general, PBM methods were employed when the bio-physical mechanism of the pathogen under study was relatively well known (e.g. Vibrio cholerae); TS-SE tended to be used when the specific environmental mechanisms were unclear (e.g. Campylobacter). Important data and methodological challenges emerged, with implications for surveillance and control of water-associated infections. The most common limitations comprised: non-inclusion of key factors (e.g. biological mechanism, demographic heterogeneity, human behavior), reporting bias, poor data quality, and collinearity in exposures. Furthermore, the methods often did not distinguish among the multiple sources of time-lags (e.g. patient physiology, reporting bias, healthcare access) between environmental drivers/exposures and disease detection. Key areas of future research include: disentangling the complex effects of weather/climate on each exposure-health outcome pathway (e.g. person-to-person vs environment-to-person), and linking weather data to individual cases longitudinally.
The prior statistics of object colors.
Koenderink, Jan J
2010-02-01
The prior statistics of object colors is of much interest because extensive statistical investigations of reflectance spectra reveal highly non-uniform structure in color space common to several very different databases. This common structure is due to the visual system rather than to the statistics of environmental structure. Analysis involves an investigation of the proper sample space of spectral reflectance factors and of the statistical consequences of the projection of spectral reflectances on the color solid. Even in the case of reflectance statistics that are translationally invariant with respect to the wavelength dimension, the statistics of object colors is highly non-uniform. The qualitative nature of this non-uniformity is due to trichromacy.
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
[How to start a neuroimaging study].
Narumoto, Jin
2012-06-01
In order to help researchers understand how to start a neuroimaging study, several tips are described in this paper. These include 1) Choice of an imaging modality, 2) Statistical method, and 3) Interpretation of the results. 1) There are several imaging modalities available in clinical research. Advantages and disadvantages of each modality are described. 2) Statistical Parametric Mapping, which is the most common statistical software for neuroimaging analysis, is described in terms of parameter setting in normalization and level of significance. 3) In the discussion section, the region which shows a significant difference between patients and normal controls should be discussed in relation to the neurophysiology of the disease, making reference to previous reports from neuroimaging studies in normal controls, lesion studies and animal studies. A typical pattern of discussion is described.
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
Gupta, Mridula; Bisesi, Michael; Lee, Jiyoung
2017-07-01
The survivability of Staphylococcus aureus and spores of Aspergillus niger was compared on 5 common floor materials. Floor materials were inoculated with a known concentration of S aureus and spores of A niger on day 0. Their survivability was measured on days, 2, 7, 14, and 28 by bulk rinsate method and enumerated using culture-based method. The difference in change of S aureus levels was statistically significant for all tested days (P < .001) for all floor materials. Vinyl composition tile (VCT) and porcelain tile (PT) had statistically similar survivability and differed statistically from carpets. On both VCT and PT, positive growth for S aureus occurred by day 2 (1-1.7 log 10 ), declined slightly (0.1 to -0.2 log 10 ) by day 7, and remained positive until day 28. However, S aureus was undetected by day 7 on both carpets. A niger spores were undetected on residential broadloom carpet and rubber-backed commercial carpet after day 2 but survived on VCT, PT, and wood until day 28. Floor materials with hard and smooth surfaces, such as VCT and PT, can allow survival of S aureus and A niger for up to 4 weeks. It may imply that floor materials can play a major role in preserving microbial contaminants in the built environment. Copyright © 2017 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
[Drug use and sexual practices of teenagers in the city of Medellin (Colombia)].
Castaño Pérez, Guillermo A; Arango Tobon, Eduardo; Morales Mesa, Santiago; Rodríguez Bustamante, Alexander; Montoya Montoya, Carolina
2012-01-01
The purpose of this research project was to study the relationship between drug consumption and sexual practices in teenagers in the city of Medellin, Colombia. A transversal studied was designed in order to identify the variables related to having had sexual intercourse under the effects of drugs or alcohol. The sample was made up of 955 teenagers between 14 and 17, who were in 9th, 10th and 11th grades in public and private schools in the city of Medellin. The results show that the prevalence of drug and alcohol influence in sexual intercourse is 43,67%. The most common drugs used for sexual practices are alcohol, marihuana, popper, cocaine and ecstasy. Consuming alcohol or drugs and having sexual practices shows an associated meaningful statistic (p= .001). The more common sexual practices under psychoactive substances are the exploratory ones (caresses and touching) (71%), vaginal penetration (63.67%), oral sex (45.30%) and masturbation (19.59%). Regarding the protection methods during sexual intercourse under drug or alcohol influence 55,9% always use a condom, 37,3% sometimes use it, and 6,8% never do it. This study proves what had been previously established by other research projects that show a high statistic association between drug consumption and sexual practices, but realizes that there is no statistically significant association between sexual practices under the influence of alcohol or drugs and the use or non-use of protective methods, which is the most important finding.
Invited Commentary: The Need for Cognitive Science in Methodology.
Greenland, Sander
2017-09-15
There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecessary and misleading, as in inferences based on whether a P value is "statistically significant"; 2) nullism, the tendency to privilege the hypothesis of no difference or no effect when there is no scientific basis for doing so, as when testing only the null hypothesis; and 3) statistical reification, treating hypothetical data distributions and statistical models as if they reflect known physical laws rather than speculative assumptions for thought experiments. As commonly misused, null-hypothesis significance testing combines these cognitive problems to produce highly distorted interpretation and reporting of study results. Interval estimation has so far proven to be an inadequate solution because it involves dichotomization, an avenue for nullism. Sensitivity and bias analyses have been proposed to address reproducibility problems (Am J Epidemiol. 2017;186(6):646-647); these methods can indeed address reification, but they can also introduce new distortions via misleading specifications for bias parameters. P values can be reframed to lessen distortions by presenting them without reference to a cutoff, providing them for relevant alternatives to the null, and recognizing their dependence on all assumptions used in their computation; they nonetheless require rescaling for measuring evidence. I conclude that methodological development and training should go beyond coverage of mechanistic biases (e.g., confounding, selection bias, measurement error) to cover distortions of conclusions produced by statistical methods and psychosocial forces. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
High-performance parallel computing in the classroom using the public goods game as an example
NASA Astrophysics Data System (ADS)
Perc, Matjaž
2017-07-01
The use of computers in statistical physics is common because the sheer number of equations that describe the behaviour of an entire system particle by particle often makes it impossible to solve them exactly. Monte Carlo methods form a particularly important class of numerical methods for solving problems in statistical physics. Although these methods are simple in principle, their proper use requires a good command of statistical mechanics, as well as considerable computational resources. The aim of this paper is to demonstrate how the usage of widely accessible graphics cards on personal computers can elevate the computing power in Monte Carlo simulations by orders of magnitude, thus allowing live classroom demonstration of phenomena that would otherwise be out of reach. As an example, we use the public goods game on a square lattice where two strategies compete for common resources in a social dilemma situation. We show that the second-order phase transition to an absorbing phase in the system belongs to the directed percolation universality class, and we compare the time needed to arrive at this result by means of the main processor and by means of a suitable graphics card. Parallel computing on graphics processing units has been developed actively during the last decade, to the point where today the learning curve for entry is anything but steep for those familiar with programming. The subject is thus ripe for inclusion in graduate and advanced undergraduate curricula, and we hope that this paper will facilitate this process in the realm of physics education. To that end, we provide a documented source code for an easy reproduction of presented results and for further development of Monte Carlo simulations of similar systems.
Zeng, Ping; Mukherjee, Sayan; Zhou, Xiang
2017-01-01
Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects—the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the “MArginal ePIstasis Test”, or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium. PMID:28746338
NASA Astrophysics Data System (ADS)
Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha
2015-01-01
Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or difference in fiber structure between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm or SAFIRA,1 which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors.2 This software will be freely disseminated to the neuroimaging research community.
Scale invariance in biophysics
NASA Astrophysics Data System (ADS)
Stanley, H. Eugene
2000-06-01
In this general talk, we offer an overview of some problems of interest to biophysicists, medical physicists, and econophysicists. These include DNA sequences, brain plaques in Alzheimer patients, heartbeat intervals, and time series giving price fluctuations in economics. These problems have the common feature that they exhibit features that appear to be scale invariant. Particularly vexing is the problem that some of these scale invariant phenomena are not stationary-their statistical properties vary from one time interval to the next or form one position to the next. We will discuss methods, such as wavelet methods and multifractal methods, to cope with these problems. .
Fleetwood, V A; Gross, K N; Alex, G C; Cortina, C S; Smolevitz, J B; Sarvepalli, S; Bakhsh, S R; Poirier, J; Myers, J A; Singer, M A; Orkin, B A
2017-03-01
Anastomotic leak (AL) increases costs and cancer recurrence. Studies show decreased AL with side-to-side stapled anastomosis (SSA), but none identify risk factors within SSAs. We hypothesized that stapler characteristics and closure technique of the common enterotomy affect AL rates. Retrospective review of bowel SSAs was performed. Data included stapler brand, staple line oversewing, and closure method (handsewn, HC; linear stapler [Barcelona technique], BT; transverse stapler, TX). Primary endpoint was AL. Statistical analysis included Fisher's test and logistic regression. 463 patients were identified, 58.5% BT, 21.2% HC, and 20.3% TX. Covidien staplers comprised 74.9%, Ethicon 18.1%. There were no differences between stapler types (Covidien 5.8%, Ethicon 6.0%). However, AL rates varied by common side closure (BT 3.7% vs. TX 10.6%, p = 0.017), remaining significant on multivariate analysis. Closure method of the common side impacts AL rates. Barcelona technique has fewer leaks than transverse stapled closure. Further prospective evaluation is recommended. Copyright © 2017. Published by Elsevier Inc.
A hybrid method in combining treatment effects from matched and unmatched studies.
Byun, Jinyoung; Lai, Dejian; Luo, Sheng; Risser, Jan; Tung, Betty; Hardy, Robert J
2013-12-10
The most common data structures in the biomedical studies have been matched or unmatched designs. Data structures resulting from a hybrid of the two may create challenges for statistical inferences. The question may arise whether to use parametric or nonparametric methods on the hybrid data structure. The Early Treatment for Retinopathy of Prematurity study was a multicenter clinical trial sponsored by the National Eye Institute. The design produced data requiring a statistical method of a hybrid nature. An infant in this multicenter randomized clinical trial had high-risk prethreshold retinopathy of prematurity that was eligible for treatment in one or both eyes at entry into the trial. During follow-up, recognition visual acuity was accessed for both eyes. Data from both eyes (matched) and from only one eye (unmatched) were eligible to be used in the trial. The new hybrid nonparametric method is a meta-analysis based on combining the Hodges-Lehmann estimates of treatment effects from the Wilcoxon signed rank and rank sum tests. To compare the new method, we used the classic meta-analysis with the t-test method to combine estimates of treatment effects from the paired and two sample t-tests. We used simulations to calculate the empirical size and power of the test statistics, as well as the bias, mean square and confidence interval width of the corresponding estimators. The proposed method provides an effective tool to evaluate data from clinical trials and similar comparative studies. Copyright © 2013 John Wiley & Sons, Ltd.
Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix
2016-03-01
To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters that derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Gábor Hatvani, István; Kern, Zoltán; Leél-Őssy, Szabolcs; Demény, Attila
2018-01-01
Uneven spacing is a common feature of sedimentary paleoclimate records, in many cases causing difficulties in the application of classical statistical and time series methods. Although special statistical tools do exist to assess unevenly spaced data directly, the transformation of such data into a temporally equidistant time series which may then be examined using commonly employed statistical tools remains, however, an unachieved goal. The present paper, therefore, introduces an approach to obtain evenly spaced time series (using cubic spline fitting) from unevenly spaced speleothem records with the application of a spectral guidance to avoid the spectral bias caused by interpolation and retain the original spectral characteristics of the data. The methodology was applied to stable carbon and oxygen isotope records derived from two stalagmites from the Baradla Cave (NE Hungary) dating back to the late 18th century. To show the benefit of the equally spaced records to climate studies, their coherence with climate parameters is explored using wavelet transform coherence and discussed. The obtained equally spaced time series are available at https://doi.org/10.1594/PANGAEA.875917.
Self-Consistency of Rain Event Definitions
NASA Astrophysics Data System (ADS)
Teves, J. B.; Larsen, M.
2014-12-01
A dense optical rain disdrometer array was constructed to study rain variability on spatial scales less than 100 meters with temporal resolution of 1 minute. Approximately two months of data were classified into rain events using methods common in the literature. These methods were unable to identify an array-wide consensus as to the total number of rain events; instruments as little as 2 meters apart with similar data records sometimes identified different rain event totals. Physical considerations suggest that these differing event totals are likely due to instrument sampling fluctuations that are typically not accounted for in rain event studies. Detection of varying numbers of rain events impact many commonly used storm statistics including storm duration distributions and mean rain rate. A summary of the results above and their implications are presented.
Method of analysis of local neuronal circuits in the vertebrate central nervous system.
Reinis, S; Weiss, D S; McGaraughty, S; Tsoukatos, J
1992-06-01
Although a considerable amount of knowledge has been accumulated about the activity of individual nerve cells in the brain, little is known about their mutual interactions at the local level. The method presented in this paper allows the reconstruction of functional relations within a group of neurons as recorded by a single microelectrode. Data are sampled at 10 or 13 kHz. Prominent spikes produced by one or more single cells are selected and sorted by K-means cluster analysis. The activities of single cells are then related to the background firing of neurons in their vicinity. Auto-correlograms of the leading cells, auto-correlograms of the background cells (mass correlograms) and cross-correlograms between these two levels of firing are computed and evaluated. The statistical probability of mutual interactions is determined, and the statistically significant, most common interspike intervals are stored and attributed to real pairs of spikes in the original record. Selected pairs of spikes, characterized by statistically significant intervals between them, are then assembled into a working model of the system. This method has revealed substantial differences between the information processing in the visual cortex, the inferior colliculus, the rostral ventromedial medulla and the ventrobasal complex of the thalamus. Even short 1-s records of the multiple neuronal activity may provide meaningful and statistically significant results.
Confounding in statistical mediation analysis: What it is and how to address it.
Valente, Matthew J; Pelham, William E; Smyth, Heather; MacKinnon, David P
2017-11-01
Psychology researchers are often interested in mechanisms underlying how randomized interventions affect outcomes such as substance use and mental health. Mediation analysis is a common statistical method for investigating psychological mechanisms that has benefited from exciting new methodological improvements over the last 2 decades. One of the most important new developments is methodology for estimating causal mediated effects using the potential outcomes framework for causal inference. Potential outcomes-based methods developed in epidemiology and statistics have important implications for understanding psychological mechanisms. We aim to provide a concise introduction to and illustration of these new methods and emphasize the importance of confounder adjustment. First, we review the traditional regression approach for estimating mediated effects. Second, we describe the potential outcomes framework. Third, we define what a confounder is and how the presence of a confounder can provide misleading evidence regarding mechanisms of interventions. Fourth, we describe experimental designs that can help rule out confounder bias. Fifth, we describe new statistical approaches to adjust for measured confounders of the mediator-outcome relation and sensitivity analyses to probe effects of unmeasured confounders on the mediated effect. All approaches are illustrated with application to a real counseling intervention dataset. Counseling psychologists interested in understanding the causal mechanisms of their interventions can benefit from incorporating the most up-to-date techniques into their mediation analyses. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Cocco, Simona; Leibler, Stanislas; Monasson, Rémi
2009-01-01
Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
NASA Astrophysics Data System (ADS)
Aonishi, Toru; Mimura, Kazushi; Utsunomiya, Shoko; Okada, Masato; Yamamoto, Yoshihisa
2017-10-01
The coherent Ising machine (CIM) has attracted attention as one of the most effective Ising computing architectures for solving large scale optimization problems because of its scalability and high-speed computational ability. However, it is difficult to implement the Ising computation in the CIM because the theories and techniques of classical thermodynamic equilibrium Ising spin systems cannot be directly applied to the CIM. This means we have to adapt these theories and techniques to the CIM. Here we focus on a ferromagnetic model and a finite loading Hopfield model, which are canonical models sharing a common mathematical structure with almost all other Ising models. We derive macroscopic equations to capture nonequilibrium phase transitions in these models. The statistical mechanical methods developed here constitute a basis for constructing evaluation methods for other Ising computation models.
Bruner, L H; Carr, G J; Harbell, J W; Curren, R D
2002-06-01
An approach commonly used to measure new toxicity test method (NTM) performance in validation studies is to divide toxicity results into positive and negative classifications, and the identify true positive (TP), true negative (TN), false positive (FP) and false negative (FN) results. After this step is completed, the contingent probability statistics (CPS), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are calculated. Although these statistics are widely used and often the only statistics used to assess the performance of toxicity test methods, there is little specific guidance in the validation literature on what values for these statistics indicate adequate performance. The purpose of this study was to begin developing data-based answers to this question by characterizing the CPS obtained from an NTM whose data have a completely random association with a reference test method (RTM). Determining the CPS of this worst-case scenario is useful because it provides a lower baseline from which the performance of an NTM can be judged in future validation studies. It also provides an indication of relationships in the CPS that help identify random or near-random relationships in the data. The results from this study of randomly associated tests show that the values obtained for the statistics vary significantly depending on the cut-offs chosen, that high values can be obtained for individual statistics, and that the different measures cannot be considered independently when evaluating the performance of an NTM. When the association between results of an NTM and RTM is random the sum of the complementary pairs of statistics (sensitivity + specificity, NPV + PPV) is approximately 1, and the prevalence (i.e., the proportion of toxic chemicals in the population of chemicals) and PPV are equal. Given that combinations of high sensitivity-low specificity or low specificity-high sensitivity (i.e., the sum of the sensitivity and specificity equal to approximately 1) indicate lack of predictive capacity, an NTM having these performance characteristics should be considered no better for predicting toxicity than by chance alone.
Truly random number generation: an example
NASA Astrophysics Data System (ADS)
Frauchiger, Daniela; Renner, Renato
2013-10-01
Randomness is crucial for a variety of applications, ranging from gambling to computer simulations, and from cryptography to statistics. However, many of the currently used methods for generating randomness do not meet the criteria that are necessary for these applications to work properly and safely. A common problem is that a sequence of numbers may look random but nevertheless not be truly random. In fact, the sequence may pass all standard statistical tests and yet be perfectly predictable. This renders it useless for many applications. For example, in cryptography, the predictability of a "andomly" chosen password is obviously undesirable. Here, we review a recently developed approach to generating true | and hence unpredictable | randomness.
Cycling transport safety quantification
NASA Astrophysics Data System (ADS)
Drbohlav, Jiri; Kocourek, Josef
2018-05-01
Dynamic interest in cycling transport brings the necessity to design safety cycling infrastructure. In las few years, couple of norms with safety elements have been designed and suggested for the cycling infrastructure. But these were not fully examined. The main parameter of suitable and fully functional transport infrastructure is the evaluation of its safety. Common evaluation of transport infrastructure safety is based on accident statistics. These statistics are suitable for motor vehicle transport but unsuitable for the cycling transport. Cycling infrastructure evaluation of safety is suitable for the traffic conflicts monitoring. The results of this method are fast, based on real traffic situations and can be applied on any traffic situations.
NASA Astrophysics Data System (ADS)
Ma, Nan; Jena, Debdeep
2015-03-01
In this work, the consequence of the high band-edge density of states on the carrier statistics and quantum capacitance in transition metal dichalcogenide two-dimensional semiconductor devices is explored. The study questions the validity of commonly used expressions for extracting carrier densities and field-effect mobilities from the transfer characteristics of transistors with such channel materials. By comparison to experimental data, a new method for the accurate extraction of carrier densities and mobilities is outlined. The work thus highlights a fundamental difference between these materials and traditional semiconductors that must be considered in future experimental measurements.
Kuss, O
2015-03-30
Meta-analyses with rare events, especially those that include studies with no event in one ('single-zero') or even both ('double-zero') treatment arms, are still a statistical challenge. In the case of double-zero studies, researchers in general delete these studies or use continuity corrections to avoid them. A number of arguments against both options has been given, and statistical methods that use the information from double-zero studies without using continuity corrections have been proposed. In this paper, we collect them and compare them by simulation. This simulation study tries to mirror real-life situations as completely as possible by deriving true underlying parameters from empirical data on actually performed meta-analyses. It is shown that for each of the commonly encountered effect estimators valid statistical methods are available that use the information from double-zero studies without using continuity corrections. Interestingly, all of them are truly random effects models, and so also the current standard method for very sparse data as recommended from the Cochrane collaboration, the Yusuf-Peto odds ratio, can be improved on. For actual analysis, we recommend to use beta-binomial regression methods to arrive at summary estimates for the odds ratio, the relative risk, or the risk difference. Methods that ignore information from double-zero studies or use continuity corrections should no longer be used. We illustrate the situation with an example where the original analysis ignores 35 double-zero studies, and a superior analysis discovers a clinically relevant advantage of off-pump surgery in coronary artery bypass grafting. Copyright © 2014 John Wiley & Sons, Ltd.
An Embedded Statistical Method for Coupling Molecular Dynamics and Finite Element Analyses
NASA Technical Reports Server (NTRS)
Saether, E.; Glaessgen, E.H.; Yamakov, V.
2008-01-01
The coupling of molecular dynamics (MD) simulations with finite element methods (FEM) yields computationally efficient models that link fundamental material processes at the atomistic level with continuum field responses at higher length scales. The theoretical challenge involves developing a seamless connection along an interface between two inherently different simulation frameworks. Various specialized methods have been developed to solve particular classes of problems. Many of these methods link the kinematics of individual MD atoms with FEM nodes at their common interface, necessarily requiring that the finite element mesh be refined to atomic resolution. Some of these coupling approaches also require simulations to be carried out at 0 K and restrict modeling to two-dimensional material domains due to difficulties in simulating full three-dimensional material processes. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the standard boundary value problem used to define a coupled domain. The method replaces a direct linkage of individual MD atoms and finite element (FE) nodes with a statistical averaging of atomistic displacements in local atomic volumes associated with each FE node in an interface region. The FEM and MD computational systems are effectively independent and communicate only through an iterative update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM). ESCM provides an enhanced coupling methodology that is inherently applicable to three-dimensional domains, avoids discretization of the continuum model to atomic scale resolution, and permits finite temperature states to be applied.
Age-dependent biochemical quantities: an approach for calculating reference intervals.
Bjerner, J
2007-01-01
A parametric method is often preferred when calculating reference intervals for biochemical quantities, as non-parametric methods are less efficient and require more observations/study subjects. Parametric methods are complicated, however, because of three commonly encountered features. First, biochemical quantities seldom display a Gaussian distribution, and there must either be a transformation procedure to obtain such a distribution or a more complex distribution has to be used. Second, biochemical quantities are often dependent on a continuous covariate, exemplified by rising serum concentrations of MUC1 (episialin, CA15.3) with increasing age. Third, outliers often exert substantial influence on parametric estimations and therefore need to be excluded before calculations are made. The International Federation of Clinical Chemistry (IFCC) currently recommends that confidence intervals be calculated for the reference centiles obtained. However, common statistical packages allowing for the adjustment of a continuous covariate do not make this calculation. In the method described in the current study, Tukey's fence is used to eliminate outliers and two-stage transformations (modulus-exponential-normal) in order to render Gaussian distributions. Fractional polynomials are employed to model functions for mean and standard deviations dependent on a covariate, and the model is selected by maximum likelihood. Confidence intervals are calculated for the fitted centiles by combining parameter estimation and sampling uncertainties. Finally, the elimination of outliers was made dependent on covariates by reiteration. Though a good knowledge of statistical theory is needed when performing the analysis, the current method is rewarding because the results are of practical use in patient care.
Identification of altered pathways in breast cancer based on individualized pathway aberrance score.
Shi, Sheng-Hong; Zhang, Wei; Jiang, Jing; Sun, Long
2017-08-01
The objective of the present study was to identify altered pathways in breast cancer based on the individualized pathway aberrance score (iPAS) method combined with the normal reference (nRef). There were 4 steps to identify altered pathways using the iPAS method: Data preprocessing conducted by the robust multi-array average (RMA) algorithm; gene-level statistics based on average Z ; pathway-level statistics according to iPAS; and a significance test dependent on 1 sample Wilcoxon test. The altered pathways were validated by calculating the changed percentage of each pathway in tumor samples and comparing them with pathways from differentially expressed genes (DEGs). A total of 688 altered pathways with P<0.01 were identified, including kinesin (KIF)- and polo-like kinase (PLK)-mediated events. When the percentage of change reached 50%, 310 pathways were involved in the total 688 altered pathways, which may validate the present results. In addition, there were 324 DEGs and 155 common genes between DEGs and pathway genes. DEGs and common genes were enriched in the same 9 significant terms, which also were members of altered pathways. The iPAS method was suitable for identifying altered pathways in breast cancer. Altered pathways (such as KIF and PLK mediated events) were important for understanding breast cancer mechanisms and for the future application of customized therapeutic decisions.
Mean-field approximation for spacing distribution functions in classical systems
NASA Astrophysics Data System (ADS)
González, Diego Luis; Pimpinelli, Alberto; Einstein, T. L.
2012-01-01
We propose a mean-field method to calculate approximately the spacing distribution functions p(n)(s) in one-dimensional classical many-particle systems. We compare our method with two other commonly used methods, the independent interval approximation and the extended Wigner surmise. In our mean-field approach, p(n)(s) is calculated from a set of Langevin equations, which are decoupled by using a mean-field approximation. We find that in spite of its simplicity, the mean-field approximation provides good results in several systems. We offer many examples illustrating that the three previously mentioned methods give a reasonable description of the statistical behavior of the system. The physical interpretation of each method is also discussed.
Im, Jeong-Soo; Choi, Soon Ho; Hong, Duho; Seo, Hwa Jeong; Park, Subin; Hong, Jin Pyo
2011-01-01
This study was conducted to examine differences in proximal risk factors and suicide methods by sex and age in the national suicide mortality data in Korea. Data were collected from the National Police Agency and the National Statistical Office of Korea on suicide completers from 2004 to 2006. The 31,711 suicide case records were used to analyze suicide rates, methods, and proximal risk factors by sex and age. Suicide rate increased with age, especially in men. The most common proximal risk factor for suicide was medical illness in both sexes. The most common proximal risk factor for subjects younger than 30 years was found to be a conflict in relationships with family members, partner, or friends. Medical illness was found to increase in prevalence as a risk factor with age. Hanging/Suffocation was the most common suicide method used by both sexes. The use of drug/pesticide poisoning to suicide increased with age. A fall from height or hanging/suffocation was more popular in the younger age groups. Because proximal risk factors and suicide methods varied with sex and age, different suicide prevention measures are required after consideration of both of these parameters. Copyright © 2011 Elsevier Inc. All rights reserved.
Ilechukwu, GC; Ilechukwu, CGA; Ubesie, AC; Onyire, NB; Emechebe, G; Eze, JC
2014-01-01
Background: Intestinal helminthiasis is associated with malnutrition in children. Aim: The objective of this study was to determine the intensity and effect of the common intestinal helminths on the nutritional status of children in Enugu, Nigeria. Subjects and Methods: A cross-sectional study of 460 children conducted in Enugu metropolis, south-east Nigeria between August and September 2003. Their stools were analyzed at the research laboratory of the Federal Ministry of Health, National Arbovirus and Vector Research Center, Enugu. The intensity of the common intestinal helminths was determined using the standard Kato-Katz method of fresh stool samples. The classification intensity of helminthic infestation was according to the World Health Organization classification. Data were analyzed using Statistical Software for Social Sciences version 11.0 (Chicago IL, USA). P < 0.05 was regarded as statistically significant. Results: 452 of 460 children (98.3%) had normal height for age, weight for age and weight for height Z-scores. Six of the 460 children (1.3% were wasted), 1/460 stunted (0.2%) and 1/460 wasted and stunted (0.2%). 150 out of 460 (32.6%) studied were infected with helminths. There was no significant relationship between the intensity of helminth infection and the nutritional status of the children. Conclusion: Although the prevalence of helminthiasis in children in Enugu was high, intensity of helminthiasis in these children was mainly mild. Hence, majority of them had normal weight and height measurements for age and sex. PMID:25184077
Wang, Feng-Juan; Zhang, Dai; Liu, Zhao-Hui; Wu, Wen-Xiang; Bai, Hui-Hui; Dong, Han-Yu
2016-01-01
Background: Vulvovaginal candidiasis (VVC) was a common infection associated with lifelong harassment of woman's social and sexual life. The purpose of this study was to describe the species distribution and in vitro antifungal susceptibility of Candida species (Candida spp.) isolated from patients with VVC over 8 years. Methods: Species which isolated from patients with VVC in Peking University First Hospital were identified using chromogenic culture media. Susceptibility to common antifungal agents was determined using agar diffusion method based on CLSI M44-A2 document. SPSS software (version 14.0, Inc., Chicago, IL, USA) was used for statistical analysis, involving statistical description and Chi-square test. Results: The most common strains were Candida (C.) albicans, 80.5% (n = 1775) followed by C. glabrata, 18.1% (n = 400). Nystatin exhibited excellent activity against all species (<4% resistant [R]). Resistance to azole drugs varied among different species. C. albicans: clotrimazole (3.1% R) < fluconazole (16.6% R) < itraconazole (51.5% R) < miconazole (54.0% R); C. glabrata: miconazole (25.6% R) < clotrimazole (50.5% R) < itraconazole (61.9% R) < fluconazole (73.3% R); Candida krusei: clotrimazole (0 R) < fluconazole (57.7% R) < miconazole (73.1% R) < itraconazole (83.3% R). The susceptibility of fluconazole was noticeably decreasing among all species in the study period. Conclusions: Nystatin was the optimal choice for the treatment of VVC at present. The species distribution and in vitro antifungal susceptibility of Candida spp. isolated from patients with VVC had changed over time. PMID:27174323
How Prevalent Is Object-Based Attention?
Pilz, Karin S.; Roggeveen, Alexa B.; Creighton, Sarah E.; Bennett, Patrick J.; Sekuler, Allison B.
2012-01-01
Previous research suggests that visual attention can be allocated to locations in space (space-based attention) and to objects (object-based attention). The cueing effects associated with space-based attention tend to be large and are found consistently across experiments. Object-based attention effects, however, are small and found less consistently across experiments. In three experiments we address the possibility that variability in object-based attention effects across studies reflects low incidence of such effects at the level of individual subjects. Experiment 1 measured space-based and object-based cueing effects for horizontal and vertical rectangles in 60 subjects comparing commonly used target detection and discrimination tasks. In Experiment 2 we ran another 120 subjects in a target discrimination task in which rectangle orientation varied between subjects. Using parametric statistical methods, we found object-based effects only for horizontal rectangles. Bootstrapping methods were used to measure effects in individual subjects. Significant space-based cueing effects were found in nearly all subjects in both experiments, across tasks and rectangle orientations. However, only a small number of subjects exhibited significant object-based cueing effects. Experiment 3 measured only object-based attention effects using another common paradigm and again, using bootstrapping, we found only a small number of subjects that exhibited significant object-based cueing effects. Our results show that object-based effects are more prevalent for horizontal rectangles, which is in accordance with the theory that attention may be allocated more easily along the horizontal meridian. The fact that so few individuals exhibit a significant object-based cueing effect presumably is why previous studies of this effect might have yielded inconsistent results. The results from the current study highlight the importance of considering individual subject data in addition to commonly used statistical methods. PMID:22348018
Chen, Jonathan H; Podchiyska, Tanya
2016-01-01
Objective: To answer a “grand challenge” in clinical decision support, the authors produced a recommender system that automatically data-mines inpatient decision support from electronic medical records (EMR), analogous to Netflix or Amazon.com’s product recommender. Materials and Methods: EMR data were extracted from 1 year of hospitalizations (>18K patients with >5.4M structured items including clinical orders, lab results, and diagnosis codes). Association statistics were counted for the ∼1.5K most common items to drive an order recommender. The authors assessed the recommender’s ability to predict hospital admission orders and outcomes based on initial encounter data from separate validation patients. Results: Compared to a reference benchmark of using the overall most common orders, the recommender using temporal relationships improves precision at 10 recommendations from 33% to 38% (P < 10−10) for hospital admission orders. Relative risk-based association methods improve inverse frequency weighted recall from 4% to 16% (P < 10−16). The framework yields a prediction receiver operating characteristic area under curve (c-statistic) of 0.84 for 30 day mortality, 0.84 for 1 week need for ICU life support, 0.80 for 1 week hospital discharge, and 0.68 for 30-day readmission. Discussion: Recommender results quantitatively improve on reference benchmarks and qualitatively appear clinically reasonable. The method assumes that aggregate decision making converges appropriately, but ongoing evaluation is necessary to discern common behaviors from “correct” ones. Conclusions: Collaborative filtering recommender algorithms generate clinical decision support that is predictive of real practice patterns and clinical outcomes. Incorporating temporal relationships improves accuracy. Different evaluation metrics satisfy different goals (predicting likely events vs. “interesting” suggestions). PMID:26198303
Meta-analysis as Statistical and Analytical Method of Journal’s Content Scientific Evaluation
Masic, Izet; Begic, Edin
2015-01-01
Introduction: A meta-analysis is a statistical and analytical method which combines and synthesizes different independent studies and integrates their results into one common result. Goal: Analysis of the journals “Medical Archives”, “Materia Socio Medica” and “Acta Informatica Medica”, which are located in the most eminent indexed databases of the biomedical milieu. Material and methods: The study has retrospective and descriptive character, and included the period of the calendar year 2014. Study included six editions of all three journals (total of 18 journals). Results: In this period was published a total of 291 articles (in the “Medical Archives” 110, “Materia Socio Medica” 97, and in “Acta Informatica Medica” 84). The largest number of articles was original articles. Small numbers have been published as professional, review articles and case reports. Clinical events were most common in the first two journals, while in the journal “Acta Informatica Medica” belonged to the field of medical informatics, as part of pre-clinical medical disciplines. Articles are usually required period of fifty to fifty nine days for review. Articles were received from four continents, mostly from Europe. The authors are most often from the territory of Bosnia and Herzegovina, then Iran, Kosovo and Macedonia. Conclusion: The number of articles published each year is increasing, with greater participation of authors from different continents and abroad. Clinical medical disciplines are the most common, with the broader spectrum of topics and with a growing number of original articles. Greater support of the wider scientific community is needed for further development of all three of the aforementioned journals. PMID:25870484
Statistical issues on the analysis of change in follow-up studies in dental research.
Blance, Andrew; Tu, Yu-Kang; Baelum, Vibeke; Gilthorpe, Mark S
2007-12-01
To provide an overview to the problems in study design and associated analyses of follow-up studies in dental research, particularly addressing three issues: treatment-baselineinteractions; statistical power; and nonrandomization. Our previous work has shown that many studies purport an interacion between change (from baseline) and baseline values, which is often based on inappropriate statistical analyses. A priori power calculations are essential for randomized controlled trials (RCTs), but in the pre-test/post-test RCT design it is not well known to dental researchers that the choice of statistical method affects power, and that power is affected by treatment-baseline interactions. A common (good) practice in the analysis of RCT data is to adjust for baseline outcome values using ancova, thereby increasing statistical power. However, an important requirement for ancova is there to be no interaction between the groups and baseline outcome (i.e. effective randomization); the patient-selection process should not cause differences in mean baseline values across groups. This assumption is often violated for nonrandomized (observational) studies and the use of ancova is thus problematic, potentially giving biased estimates, invoking Lord's paradox and leading to difficulties in the interpretation of results. Baseline interaction issues can be overcome by use of statistical methods; not widely practiced in dental research: Oldham's method and multilevel modelling; the latter is preferred for its greater flexibility to deal with more than one follow-up occasion as well as additional covariates To illustrate these three key issues, hypothetical examples are considered from the fields of periodontology, orthodontics, and oral implantology. Caution needs to be exercised when considering the design and analysis of follow-up studies. ancova is generally inappropriate for nonrandomized studies and causal inferences from observational data should be avoided.
ASCS online fault detection and isolation based on an improved MPCA
NASA Astrophysics Data System (ADS)
Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan
2014-09-01
Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.
A Bootstrap Algorithm for Mixture Models and Interval Data in Inter-Comparisons
2001-07-01
parametric bootstrap. The present algorithm will be applied to a thermometric inter-comparison, where data cannot be assumed to be normally distributed. 2 Data...experimental methods, used in each laboratory) often imply that the statistical assumptions are not satisfied, as for example in several thermometric ...triangular). Indeed, in thermometric experiments these three probabilistic models can represent several common stochastic variabilities for
W. Keith Moser; Zhaofei Fan; Mark H. Hansen; Michael K. Crosby; Shirley X. Fan
2016-01-01
We used non-native invasive plant data from the US Forest Serviceâs Forest Inventory and Analysis (FIA) program, spatial statistical methods, and the space (cover class)-for-time approach to quantify the invasion potential and success ("invasibility") of three major invasive shrubs (multiflora rose, non-native bush honeysuckles, and common buckthorn...
Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N
2018-03-01
Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
Estimating the Diets of Animals Using Stable Isotopes and a Comprehensive Bayesian Mixing Model
Hopkins, John B.; Ferguson, Jake M.
2012-01-01
Using stable isotope mixing models (SIMMs) as a tool to investigate the foraging ecology of animals is gaining popularity among researchers. As a result, statistical methods are rapidly evolving and numerous models have been produced to estimate the diets of animals—each with their benefits and their limitations. Deciding which SIMM to use is contingent on factors such as the consumer of interest, its food sources, sample size, the familiarity a user has with a particular framework for statistical analysis, or the level of inference the researcher desires to make (e.g., population- or individual-level). In this paper, we provide a review of commonly used SIMM models and describe a comprehensive SIMM that includes all features commonly used in SIMM analysis and two new features. We used data collected in Yosemite National Park to demonstrate IsotopeR's ability to estimate dietary parameters. We then examined the importance of each feature in the model and compared our results to inferences from commonly used SIMMs. IsotopeR's user interface (in R) will provide researchers a user-friendly tool for SIMM analysis. The model is also applicable for use in paleontology, archaeology, and forensic studies as well as estimating pollution inputs. PMID:22235246
Epidemiology of musculoskeletal injury in the California film and motion picture industry
Kusnezov, Nicholas A.; Yazdanshenas, Hamed; Garcia, Eddie
2016-01-01
Introduction Musculoskeletal injury exerts a significant burden on US industry. The purpose of this study was to investigate the frequency and characteristics of musculoskeletal injuries in the California (CA) film and motion picture (FMP) industry which may result in unforeseen morbidity and mortality. Methods We reviewed the workers’ compensation (WC) claims database of the Workers’ Compensation Insurance Rating Bureau of California (WCIRB) and employment statistics through the US Bureau of Labor Statistics (BLS). We analyzed the frequency, type, body part affected, and cause of musculoskeletal injuries. Results From 2003 to 2009, there were 3505 WC claims of which 94.4% were musculoskeletal. In the CA FMP industry, the most common injuries were strains (38.4%), sprains (12.2%), and fractures (11.7%). The most common sites of isolated injury were the knee (18.9%), lower back (15.0%), and ankle (8.6%). Isolated musculoskeletal spine injuries represented 19.3% of all injuries. The most common causes of injury were work-directed activity (36.0%) and falls (25.5%). Conclusion We present the first report on the unique profile of musculoskeletal injury claims in the FMP industry. This data provides direction for improvement of workplace safety. PMID:26812757
Zhu, Yun; Fan, Ruzong; Xiong, Momiao
2017-01-01
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistic method referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics. PMID:29040274
Dimension from covariance matrices.
Carroll, T L; Byers, J M
2017-02-01
We describe a method to estimate embedding dimension from a time series. This method includes an estimate of the probability that the dimension estimate is valid. Such validity estimates are not common in algorithms for calculating the properties of dynamical systems. The algorithm described here compares the eigenvalues of covariance matrices created from an embedded signal to the eigenvalues for a covariance matrix of a Gaussian random process with the same dimension and number of points. A statistical test gives the probability that the eigenvalues for the embedded signal did not come from the Gaussian random process.
Williams, Donald R; Carlsson, Rickard; Bürkner, Paul-Christian
2017-10-01
Developmental studies of hormones and behavior often include littermates-rodent siblings that share early-life experiences and genes. Due to between-litter variation (i.e., litter effects), the statistical assumption of independent observations is untenable. In two literatures-natural variation in maternal care and prenatal stress-entire litters are categorized based on maternal behavior or experimental condition. Here, we (1) review both literatures; (2) simulate false positive rates for commonly used statistical methods in each literature; and (3) characterize small sample performance of multilevel models (MLM) and generalized estimating equations (GEE). We found that the assumption of independence was routinely violated (>85%), false positives (α=0.05) exceeded nominal levels (up to 0.70), and power (1-β) rarely surpassed 0.80 (even for optimistic sample and effect sizes). Additionally, we show that MLMs and GEEs have adequate performance for common research designs. We discuss implications for the extant literature, the field of behavioral neuroendocrinology, and provide recommendations. Copyright © 2017 Elsevier Inc. All rights reserved.
The Grammar of Exchange: A Comparative Study of Reciprocal Constructions Across Languages
Majid, Asifa; Evans, Nicholas; Gaby, Alice; Levinson, Stephen C.
2010-01-01
Cultures are built on social exchange. Most languages have dedicated grammatical machinery for expressing this. To demonstrate that statistical methods can also be applied to grammatical meaning, we here ask whether the underlying meanings of these grammatical constructions are based on shared common concepts. To explore this, we designed video stimuli of reciprocated actions (e.g., “giving to each other”) and symmetrical states (e.g., “sitting next to each other”), and with the help of a team of linguists collected responses from 20 languages around the world. Statistical analyses revealed that many languages do, in fact, share a common conceptual core for reciprocal meanings but that this is not a universally expressed concept. The recurrent pattern of conceptual packaging found across languages is compatible with the view that there is a shared non-linguistic understanding of reciprocation. But, nevertheless, there are considerable differences between languages in the exact extensional patterns, highlighting that even in the domain of grammar semantics is highly language-specific. PMID:21713188
Obuchowski, Nancy A; Barnhart, Huiman X; Buckler, Andrew J; Pennello, Gene; Wang, Xiao-Feng; Kalpathy-Cramer, Jayashree; Kim, Hyun J Grace; Reeves, Anthony P
2015-02-01
Quantitative imaging biomarkers are being used increasingly in medicine to diagnose and monitor patients' disease. The computer algorithms that measure quantitative imaging biomarkers have different technical performance characteristics. In this paper we illustrate the appropriate statistical methods for assessing and comparing the bias, precision, and agreement of computer algorithms. We use data from three studies of pulmonary nodules. The first study is a small phantom study used to illustrate metrics for assessing repeatability. The second study is a large phantom study allowing assessment of four algorithms' bias and reproducibility for measuring tumor volume and the change in tumor volume. The third study is a small clinical study of patients whose tumors were measured on two occasions. This study allows a direct assessment of six algorithms' performance for measuring tumor change. With these three examples we compare and contrast study designs and performance metrics, and we illustrate the advantages and limitations of various common statistical methods for quantitative imaging biomarker studies. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
A Method for Modeling Household Occupant Behavior to Simulate Residential Energy Consumption
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brandon J; Starke, Michael R; Abdelaziz, Omar
2014-01-01
This paper presents a statistical method for modeling the behavior of household occupants to estimate residential energy consumption. Using data gathered by the U.S. Census Bureau in the American Time Use Survey (ATUS), actions carried out by survey respondents are categorized into ten distinct activities. These activities are defined to correspond to the major energy consuming loads commonly found within the residential sector. Next, time varying minute resolution Markov chain based statistical models of different occupant types are developed. Using these behavioral models, individual occupants are simulated to show how an occupant interacts with the major residential energy consuming loadsmore » throughout the day. From these simulations, the minimum number of occupants, and consequently the minimum number of multiple occupant households, needing to be simulated to produce a statistically accurate representation of aggregate residential behavior can be determined. Finally, future work will involve the use of these occupant models along side residential load models to produce a high-resolution energy consumption profile and estimate the potential for demand response from residential loads.« less
Quantitative knowledge acquisition for expert systems
NASA Technical Reports Server (NTRS)
Belkin, Brenda L.; Stengel, Robert F.
1991-01-01
A common problem in the design of expert systems is the definition of rules from data obtained in system operation or simulation. While it is relatively easy to collect data and to log the comments of human operators engaged in experiments, generalizing such information to a set of rules has not previously been a direct task. A statistical method is presented for generating rule bases from numerical data, motivated by an example based on aircraft navigation with multiple sensors. The specific objective is to design an expert system that selects a satisfactory suite of measurements from a dissimilar, redundant set, given an arbitrary navigation geometry and possible sensor failures. The systematic development is described of a Navigation Sensor Management (NSM) Expert System from Kalman Filter convariance data. The method invokes two statistical techniques: Analysis of Variance (ANOVA) and the ID3 Algorithm. The ANOVA technique indicates whether variations of problem parameters give statistically different covariance results, and the ID3 algorithms identifies the relationships between the problem parameters using probabilistic knowledge extracted from a simulation example set. Both are detailed.
Duc, Anh Nguyen; Wolbers, Marcel
2017-02-10
Composite endpoints are widely used as primary endpoints of randomized controlled trials across clinical disciplines. A common critique of the conventional analysis of composite endpoints is that all disease events are weighted equally, whereas their clinical relevance may differ substantially. We address this by introducing a framework for the weighted analysis of composite endpoints and interpretable test statistics, which are applicable to both binary and time-to-event data. To cope with the difficulty of selecting an exact set of weights, we propose a method for constructing simultaneous confidence intervals and tests that asymptotically preserve the family-wise type I error in the strong sense across families of weights satisfying flexible inequality or order constraints based on the theory of χ¯2-distributions. We show that the method achieves the nominal simultaneous coverage rate with substantial efficiency gains over Scheffé's procedure in a simulation study and apply it to trials in cardiovascular disease and enteric fever. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
He, Xiulan; Sonnenborg, Torben O.; Jørgensen, Flemming; Jensen, Karsten H.
2017-03-01
Stationarity has traditionally been a requirement of geostatistical simulations. A common way to deal with non-stationarity is to divide the system into stationary sub-regions and subsequently merge the realizations for each region. Recently, the so-called partition approach that has the flexibility to model non-stationary systems directly was developed for multiple-point statistics simulation (MPS). The objective of this study is to apply the MPS partition method with conventional borehole logs and high-resolution airborne electromagnetic (AEM) data, for simulation of a real-world non-stationary geological system characterized by a network of connected buried valleys that incise deeply into layered Miocene sediments (case study in Denmark). The results show that, based on fragmented information of the formation boundaries, the MPS partition method is able to simulate a non-stationary system including valley structures embedded in a layered Miocene sequence in a single run. Besides, statistical information retrieved from the AEM data improved the simulation of the geology significantly, especially for the deep-seated buried valley sediments where borehole information is sparse.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Design of off-statistics axial-flow fans by means of vortex law optimization
NASA Astrophysics Data System (ADS)
Lazari, Andrea; Cattanei, Andrea
2014-12-01
Off-statistics input data sets are common in axial-flow fans design and may easily result in some violation of the requirements of a good aerodynamic blade design. In order to circumvent this problem, in the present paper, a solution to the radial equilibrium equation is found which minimizes the outlet kinetic energy and fulfills the aerodynamic constraints, thus ensuring that the resulting blade has acceptable aerodynamic performance. The presented method is based on the optimization of a three-parameters vortex law and of the meridional channel size. The aerodynamic quantities to be employed as constraints are individuated and their suitable ranges of variation are proposed. The method is validated by means of a design with critical input data values and CFD analysis. Then, by means of systematic computations with different input data sets, some correlations and charts are obtained which are analogous to classic correlations based on statistical investigations on existing machines. Such new correlations help size a fan of given characteristics as well as study the feasibility of a given design.
Boe, Debra Thingstad; Parsons, Helen
2009-01-01
Local public health agencies are challenged to continually improve service delivery, yet they frequently operate with constrained resources. Quality improvement methods and techniques such as statistical process control are commonly used in other industries, and they have recently been proposed as a means of improving service delivery and performance in public health settings. We analyzed a quality improvement project undertaken at a local Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) clinic to reduce waiting times and improve client satisfaction with a walk-in nutrition education service. We used statistical process control techniques to evaluate initial process performance, implement an intervention, and assess process improvements. We found that implementation of these techniques significantly reduced waiting time and improved clients' satisfaction with the WIC service. PMID:19608964
Understanding Statistics - Cancer Statistics
Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Vitamin C and common cold-induced asthma: a systematic review and statistical analysis
2013-01-01
Background Asthma exacerbations are often induced by the common cold, which, in turn, can be alleviated by vitamin C. Objective To investigate whether vitamin C administration influences common cold-induced asthma. Methods Systematic review and statistical analysis of the identified trials. Medline, Scopus and Cochrane Central were searched for studies that give information on the effects of vitamin C on common cold-induced asthma. All clinically relevant outcomes related to asthma were included in this review. The estimates of vitamin C effect and their confidence intervals [CI] were calculated for the included studies. Results Three studies that were relevant for examining the role of vitamin C on common cold-induced asthma were identified. The three studies had a total of 79 participants. Two studies were randomized double-blind placebo-controlled trials. A study in Nigeria on asthmatics whose asthma attacks were precipitated by respiratory infections found that 1 g/day vitamin C decreased the occurrence of asthma attacks by 78% (95% CI: 19% to 94%). A cross-over study in former East-Germany on patients who had infection-related asthma found that 5 g/day vitamin C decreased the proportion of participants who had bronchial hypersensitivity to histamine by 52 percentage points (95% CI: 25 to 71). The third study did not use a placebo. Administration of a single dose of 1 gram of vitamin C to Italian non-asthmatic common cold patients increased the provocative concentration of histamine (PC20) 3.2-fold (95% CI: 2.0 to 5.1), but the vitamin C effect was significantly less when the same participants did not suffer from the common cold. Conclusions The three reviewed studies differed substantially in their methods, settings and outcomes. Each of them found benefits from the administration of vitamin C; either against asthma attacks or against bronchial hypersensitivity, the latter of which is a characteristic of asthma. Given the evidence suggesting that vitamin C alleviates common cold symptoms and the findings of this systematic review, it may be reasonable for asthmatic patients to test vitamin C on an individual basis, if they have exacerbations of asthma caused by respiratory infections. More research on the role of vitamin C on common cold-induced asthma is needed. PMID:24279478
Masud, Mohammad Shahed; Borisyuk, Roman; Stuart, Liz
2017-07-15
This study analyses multiple spike trains (MST) data, defines its functional connectivity and subsequently visualises an accurate diagram of connections. This is a challenging problem. For example, it is difficult to distinguish the common input and the direct functional connection of two spike trains. The new method presented in this paper is based on the traditional pairwise cross-correlation function (CCF) and a new combination of statistical techniques. First, the CCF is used to create the Advanced Correlation Grid (ACG) correlation where both the significant peak of the CCF and the corresponding time delay are used for detailed analysis of connectivity. Second, these two features of functional connectivity are used to classify connections. Finally, the visualization technique is used to represent the topology of functional connections. Examples are presented in the paper to demonstrate the new Advanced Correlation Grid method and to show how it enables discrimination between (i) influence from one spike train to another through an intermediate spike train and (ii) influence from one common spike train to another pair of analysed spike trains. The ACG method enables scientists to automatically distinguish between direct connections from spurious connections such as common source connection and indirect connection whereas existing methods require in-depth analysis to identify such connections. The ACG is a new and effective method for studying functional connectivity of multiple spike trains. This method can identify accurately all the direct connections and can distinguish common source and indirect connections automatically. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
The heritability of the functional connectome is robust to common nonlinear registration methods
NASA Astrophysics Data System (ADS)
Hafzalla, George W.; Prasad, Gautam; Baboyan, Vatche G.; Faskowitz, Joshua; Jahanshad, Neda; McMahon, Katie L.; de Zubicaray, Greig I.; Wright, Margaret J.; Braskie, Meredith N.; Thompson, Paul M.
2016-03-01
Nonlinear registration algorithms are routinely used in brain imaging, to align data for inter-subject and group comparisons, and for voxelwise statistical analyses. To understand how the choice of registration method affects maps of functional brain connectivity in a sample of 611 twins, we evaluated three popular nonlinear registration methods: Advanced Normalization Tools (ANTs), Automatic Registration Toolbox (ART), and FMRIB's Nonlinear Image Registration Tool (FNIRT). Using both structural and functional MRI, we used each of the three methods to align the MNI152 brain template, and 80 regions of interest (ROIs), to each subject's T1-weighted (T1w) anatomical image. We then transformed each subject's ROIs onto the associated resting state functional MRI (rs-fMRI) scans and computed a connectivity network or functional connectome for each subject. Given the different degrees of genetic similarity between pairs of monozygotic (MZ) and same-sex dizygotic (DZ) twins, we used structural equation modeling to estimate the additive genetic influences on the elements of the function networks, or their heritability. The functional connectome and derived statistics were relatively robust to nonlinear registration effects.
Improving stochastic estimates with inference methods: calculating matrix diagonals.
Selig, Marco; Oppermann, Niels; Ensslin, Torsten A
2012-02-01
Estimating the diagonal entries of a matrix, that is not directly accessible but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method. © 2012 American Physical Society
Xu, Anping; Chen, Weidong; Xia, Yong; Zhou, Yu; Ji, Ling
2018-04-07
HbA1c is a widely used biomarker for diabetes mellitus management. Here, we evaluated the accuracy of six methods for determining HbA1c values in Chinese patients with common α- and β-globin chains variants in China. Blood samples from normal subjects and individuals exhibiting hemoglobin variants were analyzed for HbA1c, using Sebia Capillarys 2 Flex Piercing (C2FP), Bio-Rad Variant II Turbo 2.0, Tosoh HLC-723 G8 (ver. 5.24), Arkray ADAMS A1c HA-8180V fast mode, Cobas c501 and Trinity Ultra2 systems. DNA sequencing revealed five common β-globin chain variants and three common α-globin chain variants. The most common variant was Hb E, followed by Hb New York, Hb J-Bangkok, Hb G-Coushatta, Hb Q-Thailand, Hb G-Honolulu, Hb Ube-2 and Hb G-Taipei. Variant II Turbo 2.0, Ultra2 and Cobas c501 showed good agreement with C2FP for most samples with variants. HLC-723 G8 yielded no HbA1c values for Hb J-Bangkok, Hb Q-Thailand and Hb G-Honolulu. Samples with Hb E, Hb G-Coushatta, Hb G-Taipei and Hb Ube-2 produced significant negative biases for HLC-723 G8. HA-8180V showed statistically significant differences for Hb E, Hb G-Coushatta, Hb G-Taipei, Hb Q-Thailand and Hb G-Honolulu. HA-8180V yielded no HbA1c values for Hb J-Bangkok. All methods showed good agreement for samples with Hb New York. Some common hemoglobin variants can interfere with HbA1c determination by the most popular methods in China.
Statistical variability and confidence intervals for planar dose QA pass rates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, Daniel W.; Nelms, Benjamin E.; Attwood, Kristopher
Purpose: The most common metric for comparing measured to calculated dose, such as for pretreatment quality assurance of intensity-modulated photon fields, is a pass rate (%) generated using percent difference (%Diff), distance-to-agreement (DTA), or some combination of the two (e.g., gamma evaluation). For many dosimeters, the grid of analyzed points corresponds to an array with a low areal density of point detectors. In these cases, the pass rates for any given comparison criteria are not absolute but exhibit statistical variability that is a function, in part, on the detector sampling geometry. In this work, the authors analyze the statistics ofmore » various methods commonly used to calculate pass rates and propose methods for establishing confidence intervals for pass rates obtained with low-density arrays. Methods: Dose planes were acquired for 25 prostate and 79 head and neck intensity-modulated fields via diode array and electronic portal imaging device (EPID), and matching calculated dose planes were created via a commercial treatment planning system. Pass rates for each dose plane pair (both centered to the beam central axis) were calculated with several common comparison methods: %Diff/DTA composite analysis and gamma evaluation, using absolute dose comparison with both local and global normalization. Specialized software was designed to selectively sample the measured EPID response (very high data density) down to discrete points to simulate low-density measurements. The software was used to realign the simulated detector grid at many simulated positions with respect to the beam central axis, thereby altering the low-density sampled grid. Simulations were repeated with 100 positional iterations using a 1 detector/cm{sup 2} uniform grid, a 2 detector/cm{sup 2} uniform grid, and similar random detector grids. For each simulation, %/DTA composite pass rates were calculated with various %Diff/DTA criteria and for both local and global %Diff normalization techniques. Results: For the prostate and head/neck cases studied, the pass rates obtained with gamma analysis of high density dose planes were 2%-5% higher than respective %/DTA composite analysis on average (ranging as high as 11%), depending on tolerances and normalization. Meanwhile, the pass rates obtained via local normalization were 2%-12% lower than with global maximum normalization on average (ranging as high as 27%), depending on tolerances and calculation method. Repositioning of simulated low-density sampled grids leads to a distribution of possible pass rates for each measured/calculated dose plane pair. These distributions can be predicted using a binomial distribution in order to establish confidence intervals that depend largely on the sampling density and the observed pass rate (i.e., the degree of difference between measured and calculated dose). These results can be extended to apply to 3D arrays of detectors, as well. Conclusions: Dose plane QA analysis can be greatly affected by choice of calculation metric and user-defined parameters, and so all pass rates should be reported with a complete description of calculation method. Pass rates for low-density arrays are subject to statistical uncertainty (vs. the high-density pass rate), but these sampling errors can be modeled using statistical confidence intervals derived from the sampled pass rate and detector density. Thus, pass rates for low-density array measurements should be accompanied by a confidence interval indicating the uncertainty of each pass rate.« less
Addeh, Abdoljalil; Khormali, Aminollah; Golilarz, Noorbakhsh Amiri
2018-05-04
The control chart patterns are the most commonly used statistical process control (SPC) tools to monitor process changes. When a control chart produces an out-of-control signal, this means that the process has been changed. In this study, a new method based on optimized radial basis function neural network (RBFNN) is proposed for control chart patterns (CCPs) recognition. The proposed method consists of four main modules: feature extraction, feature selection, classification and learning algorithm. In the feature extraction module, shape and statistical features are used. Recently, various shape and statistical features have been presented for the CCPs recognition. In the feature selection module, the association rules (AR) method has been employed to select the best set of the shape and statistical features. In the classifier section, RBFNN is used and finally, in RBFNN, learning algorithm has a high impact on the network performance. Therefore, a new learning algorithm based on the bees algorithm has been used in the learning module. Most studies have considered only six patterns: Normal, Cyclic, Increasing Trend, Decreasing Trend, Upward Shift and Downward Shift. Since three patterns namely Normal, Stratification, and Systematic are very similar to each other and distinguishing them is very difficult, in most studies Stratification and Systematic have not been considered. Regarding to the continuous monitoring and control over the production process and the exact type detection of the problem encountered during the production process, eight patterns have been investigated in this study. The proposed method is tested on a dataset containing 1600 samples (200 samples from each pattern) and the results showed that the proposed method has a very good performance. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complimentary methods that strengthen the overall assessment of spill risks.
Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity.
Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D
2016-08-31
The evaluation and interpretation of forensic DNA mixture evidence faces greater interpretational challenges due to increasingly complex mixture evidence. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. Given the most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE). Exposition and elucidation of this method and a protocol for use is the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol is provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
Harris, Alex H S; Reeder, Rachelle; Hyun, Jenny K
2009-10-01
Journal editors and statistical reviewers are often in the difficult position of catching serious problems in submitted manuscripts after the research is conducted and data have been analyzed. We sought to learn from editors and reviewers of major psychiatry journals what common statistical and design problems they most often find in submitted manuscripts and what they wished to communicate to authors regarding these issues. Our primary goal was to facilitate communication between journal editors/reviewers and researchers/authors and thereby improve the scientific and statistical quality of research and submitted manuscripts. Editors and statistical reviewers of 54 high-impact psychiatry journals were surveyed to learn what statistical or design problems they encounter most often in submitted manuscripts. Respondents completed the survey online. The authors analyzed survey text responses using content analysis procedures to identify major themes related to commonly encountered statistical or research design problems. Editors and reviewers (n=15) who handle manuscripts from 39 different high-impact psychiatry journals responded to the survey. The most commonly cited problems regarded failure to map statistical models onto research questions, improper handling of missing data, not controlling for multiple comparisons, not understanding the difference between equivalence and difference trials, and poor controls in quasi-experimental designs. The scientific quality of psychiatry research and submitted reports could be greatly improved if researchers became sensitive to, or sought consultation on frequently encountered methodological and analytic issues.
Revealing how network structure affects accuracy of link prediction
NASA Astrophysics Data System (ADS)
Yang, Jin-Xuan; Zhang, Xiao-Dong
2017-08-01
Link prediction plays an important role in network reconstruction and network evolution. The network structure affects the accuracy of link prediction, which is an interesting problem. In this paper we use common neighbors and the Gini coefficient to reveal the relation between them, which can provide a good reference for the choice of a suitable link prediction algorithm according to the network structure. Moreover, the statistical analysis reveals correlation between the common neighbors index, Gini coefficient index and other indices to describe the network structure, such as Laplacian eigenvalues, clustering coefficient, degree heterogeneity, and assortativity of network. Furthermore, a new method to predict missing links is proposed. The experimental results show that the proposed algorithm yields better prediction accuracy and robustness to the network structure than existing currently used methods for a variety of real-world networks.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
A Tutorial in Bayesian Potential Outcomes Mediation Analysis.
Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P
2018-01-01
Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
Analyzing and improving surface texture by dual-rotation magnetorheological finishing
NASA Astrophysics Data System (ADS)
Wang, Yuyue; Zhang, Yun; Feng, Zhijing
2016-01-01
The main advantages of magnetorheological finishing (MRF) are its high convergence rate of surface error, the ability of polishing aspheric surfaces and nearly no subsurface damage. However, common MRF produces directional surface texture due to the constant flow direction of the magnetorheological (MR) polishing fluid. This paper studies the mechanism of surface texture formation by texture modeling. Dual-rotation magnetorheological finishing (DRMRF) is presented to suppress directional surface texture after analyzing the results of the texture model for common MRF. The results of the surface texture model for DRMRF and the proposed quantitative method based on mathematical statistics indicate the effective suppression of directional surface texture. An experimental setup is developed and experiments show directional surface texture and no directional surface texture in common MRF and DRMRF, respectively. As a result, the surface roughness of DRMRF is 0.578 nm (root-mean-square value) which is lower than 1.109 nm in common MRF.
Investigation of the prevalence and causes and of legal abortion of teenage married mothers in Iran.
Ghodrati, Fatemeh; Saadatmand, Narges; Gholamzadeh, Saeid; Akbarzadeh, Marzieh
2018-02-05
Background The therapeutic abortion law, in accordance with the fatwa issued by our Muslim jurisprudent approved by the parliament in 2005, has made major developments in dealing with cases of therapeutic abortions. Objective This study aimed at identifying the prevalence and causes of therapeutic abortion requests to the Legal Medicine Organization of Fars province, Shiraz, by pregnant teenager mothers. Methods This study was a retrospective, cross-sectional, descriptive survey. In this study, all documents related to therapeutic abortion requests from the Legal Medicine Organization of Fars province (southern Iran) from 2006 to 2013 were investigated. The total sample size included 1664, out of which 142 were teenagers. Sampling was carried out using Convenience method. Data were analyzed using SPSS statistical software, version 16, descriptive statistics and χ2. Results In this study, 142 mothers were under 20 years of age (8.5%). The prevalence of fetal abortion license requests was 110 (78.6%) and for maternal causes was 30 (21.4%). There was no significant statistical correlation between fetal causes in different years (p = 0.083). The most common causes of fetal abortion request were for thalassemia treatment in 78 cases (79.9%) followed by fetal malformations (20.9%); also, the most common maternal cause was thalassemia in 14 cases (51.9%) and depression in three cases (1.11%), respectively. Conclusion Our results showed that after approval of therapeutic abortion law, requests for therapeutic abortion due to fetal causes are extensively increasing. There is still a need for coordination of judicial, medical and legal authorities for prompt notification.
A comparative analysis of the statistical properties of large mobile phone calling networks.
Li, Ming-Xia; Jiang, Zhi-Qiang; Xie, Wen-Jie; Miccichè, Salvatore; Tumminello, Michele; Zhou, Wei-Xing; Mantegna, Rosario N
2014-05-30
Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide us a valuable proxy for the understanding of human communication patterns embedded in social networks. Mobile phone users call each other forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested and it results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide us an intriguing picture of a representative large social network that might shed new lights on the modelling of large social networks.
Evaluation of peak-picking algorithms for protein mass spectrometry.
Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes
2011-01-01
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
Geodesic Monte Carlo on Embedded Manifolds
Byrne, Simon; Girolami, Mark
2013-01-01
Markov chain Monte Carlo methods explicitly defined on the manifold of probability distributions have recently been established. These methods are constructed from diffusions across the manifold and the solution of the equations describing geodesic flows in the Hamilton–Jacobi representation. This paper takes the differential geometric basis of Markov chain Monte Carlo further by considering methods to simulate from probability distributions that themselves are defined on a manifold, with common examples being classes of distributions describing directional statistics. Proposal mechanisms are developed based on the geodesic flows over the manifolds of support for the distributions, and illustrative examples are provided for the hypersphere and Stiefel manifold of orthonormal matrices. PMID:25309024
ERIC Educational Resources Information Center
Sinharay, Sandip; Dorans, Neil J.
2010-01-01
The Mantel-Haenszel (MH) procedure (Mantel and Haenszel) is a popular method for estimating and testing a common two-factor association parameter in a 2 x 2 x K table. Holland and Holland and Thayer described how to use the procedure to detect differential item functioning (DIF) for tests with dichotomously scored items. Wang, Bradlow, Wainer, and…
Local sensitivity analysis for inverse problems solved by singular value decomposition
Hill, M.C.; Nolan, B.T.
2010-01-01
Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complimentary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA’s Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WCF1 (-0.94), and not its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ because (1) with CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) identifiability requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov-Chain Monte Carlo given common nonlinear processes and the often even more nonlinear models.
Hassan, Wafaa El-Sayed
2008-08-01
Three rapid, simple, reproducible and sensitive extractive colorimetric methods (A--C) for assaying dothiepin hydrochloride (I) and risperidone (II) in bulk sample and in dosage forms were investigated. Methods A and B are based on the formation of an ion pair complexes with methyl orange (A) and orange G (B), whereas method C depends on ternary complex formation between cobalt thiocyanate and the studied drug I or II. The optimum reaction conditions were investigated and it was observed the calibration curves resulting from the measurements of absorbance concentration relations of the extracted complexes were linear over the concentration range 0.1--12 microg ml(-1) for method A, 0.5--11 mug ml(-1) for method B, and 3.2--80 microg ml(-1) for method C with a relative standard deviation (RSD) of 1.17 and 1.28 for drug I and II, respectively. The molar absorptivity, Sandell sensitivity, Ringbom optimum concentration ranges, and detection and quantification limits for all complexes were calculated and evaluated at maximum wavelengths of 423, 498, and 625 nm, using methods A, B, and C, respectively. The interference from excipients commonly present in dosage forms and common degradation products was studied. The proposed methods are highly specific for the determination of drugs I and II, in their dosage forms applying the standard additions technique without any interference from common excipients. The proposed methods have been compared statistically to the reference methods and found to be simple, accurate (t-test) and reproducible (F-value).
Hill, Shannon B; Faradzhev, Nadir S; Powell, Cedric J
2017-12-01
We discuss the problem of quantifying common sources of statistical uncertainties for analyses of trace levels of surface contamination using X-ray photoelectron spectroscopy. We examine the propagation of error for peak-area measurements using common forms of linear and polynomial background subtraction including the correlation of points used to determine both background and peak areas. This correlation has been neglected in previous analyses, but we show that it contributes significantly to the peak-area uncertainty near the detection limit. We introduce the concept of relative background subtraction variance (RBSV) which quantifies the uncertainty introduced by the method of background determination relative to the uncertainty of the background area itself. The uncertainties of the peak area and atomic concentration and of the detection limit are expressed using the RBSV, which separates the contributions from the acquisition parameters, the background-determination method, and the properties of the measured spectrum. These results are then combined to find acquisition strategies that minimize the total measurement time needed to achieve a desired detection limit or atomic-percentage uncertainty for a particular trace element. Minimization of data-acquisition time is important for samples that are sensitive to x-ray dose and also for laboratories that need to optimize throughput.
Reliability of clinical guideline development using mail-only versus in-person expert panels.
Washington, Donna L; Bernstein, Steven J; Kahan, James P; Leape, Lucian L; Kamberg, Caren J; Shekelle, Paul G
2003-12-01
Clinical practice guidelines quickly become outdated. One reason they might not be updated as often as needed is the expense of collecting expert judgment regarding the evidence. The RAND-UCLA Appropriateness Method is one commonly used method for collecting expert opinion. We tested whether a less expensive, mail-only process could substitute for the standard in-person process normally used. We performed a 4-way replication of the appropriateness panel process for coronary revascularization and hysterectomy, conducting 3 panels using the conventional in-person method and 1 panel entirely by mail. All indications were classified as inappropriate or not (to evaluate overuse), and coronary revascularization indications were classified as necessary or not (to evaluate underuse). Kappa statistics were calculated for the comparison in ratings from the 2 methods. Agreement beyond chance between the 2 panel methods ranged from moderate to substantial. The kappa statistic to detect overuse was 0.57 for coronary revascularization and 0.70 for hysterectomy. The kappa statistic to detect coronary revascularization underuse was 0.76. There were no cases in which coronary revascularization was considered inappropriate by 1 method, but necessary or appropriate by the other. Three of 636 (0.5%) hysterectomy cases were categorized as inappropriate by 1 method but appropriate by the other. The reproducibility of the overuse and underuse assessments from the mail-only compared with the conventional in-person conduct of expert panels in this application was similar to the underlying reproducibility of the process. This suggests a potential role for updating guidelines using an expert judgment process conducted entirely through the mail.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L
2014-10-15
Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Design and analysis of multiple diseases genome-wide association studies without controls.
Chen, Zhongxue; Huang, Hanwen; Ng, Hon Keung Tony
2012-11-15
In genome-wide association studies (GWAS), multiple diseases with shared controls is one of the case-control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages such as improving statistical power in detecting associations and reducing the time and cost in the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but without controls. We also propose corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls. Copyright © 2012 Elsevier B.V. All rights reserved.
De Groote, Sandra L.; Blecic, Deborah D.; Martin, Kristin
2013-01-01
Objective: Libraries require efficient and reliable methods to assess journal use. Vendors provide complete counts of articles retrieved from their platforms. However, if a journal is available on multiple platforms, several sets of statistics must be merged. Link-resolver reports merge data from all platforms into one report but only record partial use because users can access library subscriptions from other paths. Citation data are limited to publication use. Vendor, link-resolver, and local citation data were examined to determine correlation. Because link-resolver statistics are easy to obtain, the study library especially wanted to know if they correlate highly with the other measures. Methods: Vendor, link-resolver, and local citation statistics for the study institution were gathered for health sciences journals. Spearman rank-order correlation coefficients were calculated. Results: There was a high positive correlation between all three data sets, with vendor data commonly showing the highest use. However, a small percentage of titles showed anomalous results. Discussion and Conclusions: Link-resolver data correlate well with vendor and citation data, but due to anomalies, low link-resolver data would best be used to suggest titles for further evaluation using vendor data. Citation data may not be needed as it correlates highly with other measures. PMID:23646026
A weighted U-statistic for genetic association analyses of sequencing data.
Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing
2014-12-01
With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
NASA Astrophysics Data System (ADS)
Xu, Liangfei; Reimer, Uwe; Li, Jianqiu; Huang, Haiyan; Hu, Zunyan; Jiang, Hongliang; Janßen, Holger; Ouyang, Minggao; Lehnert, Werner
2018-02-01
City buses using polymer electrolyte membrane (PEM) fuel cells are considered to be the most likely fuel cell vehicles to be commercialized in China. The technical specifications of the fuel cell systems (FCSs) these buses are equipped with will differ based on the powertrain configurations and vehicle control strategies, but can generally be classified into the power-follow and soft-run modes. Each mode imposes different levels of electrochemical stress on the fuel cells. Evaluating the aging behavior of fuel cell stacks under the conditions encountered in fuel cell buses requires new durability test protocols based on statistical results obtained during actual driving tests. In this study, we propose a systematic design method for fuel cell durability test protocols that correspond to the power-follow mode based on three parameters for different fuel cell load ranges. The powertrain configurations and control strategy are described herein, followed by a presentation of the statistical data for the duty cycles of FCSs in one city bus in the demonstration project. Assessment protocols are presented based on the statistical results using mathematical optimization methods, and are compared to existing protocols with respect to common factors, such as time at open circuit voltage and root-mean-square power.
Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger
2017-09-01
The coefficient of determination R 2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R 2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R 2 that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).
Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application
Cantor, Rita M.; Lange, Kenneth; Sinsheimer, Janet S.
2010-01-01
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach. PMID:20074509
An observational method for fast stochastic X-ray polarimetry timing
NASA Astrophysics Data System (ADS)
Ingram, Adam R.; Maccarone, Thomas J.
2017-11-01
The upcoming launch of the first space based X-ray polarimeter in ˜40 yr will provide powerful new diagnostic information to study accreting compact objects. In particular, analysis of rapid variability of the polarization degree and angle will provide the opportunity to probe the relativistic motions of material in the strong gravitational fields close to the compact objects, and enable new methods to measure black hole and neutron star parameters. However, polarization properties are measured in a statistical sense, and a statistically significant polarization detection requires a fairly long exposure, even for the brightest objects. Therefore, the sub-minute time-scales of interest are not accessible using a direct time-resolved analysis of polarization degree and angle. Phase-folding can be used for coherent pulsations, but not for stochastic variability such as quasi-periodic oscillations. Here, we introduce a Fourier method that enables statistically robust detection of stochastic polarization variability for arbitrarily short variability time-scales. Our method is analogous to commonly used spectral-timing techniques. We find that it should be possible in the near future to detect the quasi-periodic swings in polarization angle predicted by Lense-Thirring precession of the inner accretion flow. This is contingent on the mean polarization degree of the source being greater than ˜4-5 per cent, which is consistent with the best current constraints on Cygnus X-1 from the late 1970s.
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
NASA Astrophysics Data System (ADS)
Mousavi Anzehaee, Mohammad; Adib, Ahmad; Heydarzadeh, Kobra
2015-10-01
The manner of microtremor data collection and filtering operation and also the method used for processing have a considerable effect on the accuracy of estimation of dynamic soil parameters. In this paper, running variance method was used to improve the automatic detection of data sections infected by local perturbations. In this method, the microtremor data running variance is computed using a sliding window. Then the obtained signal is used to remove the ranges of data affected by perturbations from the original data. Additionally, to determinate the fundamental frequency of a site, this study has proposed a statistical characteristics-based method. Actually, statistical characteristics, such as the probability density graph and the average and the standard deviation of all the frequencies corresponding to the maximum peaks in the H/ V spectra of all data windows, are used to differentiate the real peaks from the false peaks resulting from perturbations. The methods have been applied to the data recorded for the City of Meybod in central Iran. Experimental results show that the applied methods are able to successfully reduce the effects of extensive local perturbations on microtremor data and eventually to estimate the fundamental frequency more accurately compared to other common methods.
The analysis of morphometric data on rocky mountain wolves and artic wolves using statistical method
NASA Astrophysics Data System (ADS)
Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad
2018-04-01
Morphometrics is a quantitative analysis depending on the shape and size of several specimens. Morphometric quantitative analyses are commonly used to analyse fossil record, shape and size of specimens and others. The aim of the study is to find the differences between rocky mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven variables as independent variables and two dependent variables. Statistical modelling was used in the analysis such was the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed there exist differentiating results between arctic wolves and rocky mountain wolves based on independent factors and gender.
Dynamic principle for ensemble control tools.
Samoletov, A; Vasiev, B
2017-11-28
Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.
Selected Statistics from the Common Core of Data: School Year 2011-12. First Look. NCES 2013-441
ERIC Educational Resources Information Center
Keaton, Patrick
2013-01-01
The Common Core of Data (CCD) is an annual collection of public elementary and secondary education data by the National Center for Education Statistics (NCES) in the Institute of Education Sciences of the U.S. Department of Education. The data presented in this report are selected from the three nonfiscal components of the Common Core of Data…
Bonn, Bernadine A.
2008-01-01
A long-term method detection level (LT-MDL) and laboratory reporting level (LRL) are used by the U.S. Geological Survey?s National Water Quality Laboratory (NWQL) when reporting results from most chemical analyses of water samples. Changing to this method provided data users with additional information about their data and often resulted in more reported values in the low concentration range. Before this method was implemented, many of these values would have been censored. The use of the LT-MDL and LRL presents some challenges for the data user. Interpreting data in the low concentration range increases the need for adequate quality assurance because even small contamination or recovery problems can be relatively large compared to concentrations near the LT-MDL and LRL. In addition, the definition of the LT-MDL, as well as the inclusion of low values, can result in complex data sets with multiple censoring levels and reported values that are less than a censoring level. Improper interpretation or statistical manipulation of low-range results in these data sets can result in bias and incorrect conclusions. This document is designed to help data users use and interpret data reported with the LTMDL/ LRL method. The calculation and application of the LT-MDL and LRL are described. This document shows how to extract statistical information from the LT-MDL and LRL and how to use that information in USGS investigations, such as assessing the quality of field data, interpreting field data, and planning data collection for new projects. A set of 19 detailed examples are included in this document to help data users think about their data and properly interpret lowrange data without introducing bias. Although this document is not meant to be a comprehensive resource of statistical methods, several useful methods of analyzing censored data are demonstrated, including Regression on Order Statistics and Kaplan-Meier Estimation. These two statistical methods handle complex censored data sets without resorting to substitution, thereby avoiding a common source of bias and inaccuracy.
Wingate, Savanna; Sng, Eveleen; Loprinzi, Paul D
2018-01-01
Background: The purpose of this study was to evaluate the extent, if any, that the association between socio-ecological parameters and physical activity may be influenced by common method bias (CMB). Methods: This study took place between February and May of 2017 at a Southeastern University in the United States. A randomized controlled experiment was employed among 119 young adults.Participants were randomized into either group 1 (the group we attempted to minimize CMB)or group 2 (control group). In group 1, CMB was minimized via various procedural remedies,such as separating the measurement of predictor and criterion variables by introducing a time lag (temporal; 2 visits several days apart), creating a cover story (psychological), and approximating measures to have data collected in different media (computer-based vs. paper and pencil) and different locations to control method variance when collecting self-report measures from the same source. Socio-ecological parameters (self-efficacy; friend support; family support)and physical activity were self-reported. Results: Exercise self-efficacy was significantly associated with physical activity. This association (β = 0.74, 95% CI: 0.33-1.1; P = 0.001) was only observed in group 2 (control), but not in group 1 (experimental group) (β = 0.03; 95% CI: -0.57-0.63; P = 0.91). The difference in these coefficients (i.e., β = 0.74 vs. β = 0.03) was statistically significant (P = 0.04). Conclusion: Future research in this field, when feasible, may wish to consider employing procedural and statistical remedies to minimize CMB.
Khalili, Azizeh Farshbaf; Shahnazi, Mahnaz
2010-04-01
Breast cancer is the most common cancer and the most common cause of death in Iranian women aged 35-55 years. Breast cancer screening comprises breast self-examination (BSE), clinical breast examination (CBE) and mammography. The study aimed to examine the performance of screening methods among women referring to health centers of Tabriz, Iran. This was a descriptive-analytical research carried out on 400 women aged 20-50 years. The samples were chosen through random multistage sampling among health centers of Tabriz then active records of women. A questionnaire and observational checklist was used to elicit socio-demographic information and performance of women towards breast cancer screening methods. Descriptive and inferential statistics (chi-square and Fisher's exact test) were used to analyze the data. Only 18.8% of women did breast self-examination, 19.1% had clinical breast examination and 3.3% had mammogram. Statistical test showed a significant relationship between performing BSE and educational level, employment, income, number of children, breastfeeding history, breastfeeding quality and family history of breast cancer. There was a significant correlation between performing CBE and history of breast tumor and also, between performing the mammography and family history of breast cancer and history of breast tumor (P < 0.05). The findings showed that the performance of breast cancer screening methods was not satisfactory. Performance in high risk women was very desirable than others. The presentation of imperative education about breast cancer screening methods through health staff especially in pregnancy, post-partum and even in pre marriage counseling periods seems necessary.
Wingate, Savanna; Sng, Eveleen; Loprinzi, Paul D.
2018-01-01
Background: The purpose of this study was to evaluate the extent, if any, that the association between socio-ecological parameters and physical activity may be influenced by common method bias (CMB). Methods: This study took place between February and May of 2017 at a Southeastern University in the United States. A randomized controlled experiment was employed among 119 young adults.Participants were randomized into either group 1 (the group we attempted to minimize CMB)or group 2 (control group). In group 1, CMB was minimized via various procedural remedies,such as separating the measurement of predictor and criterion variables by introducing a time lag (temporal; 2 visits several days apart), creating a cover story (psychological), and approximating measures to have data collected in different media (computer-based vs. paper and pencil) and different locations to control method variance when collecting self-report measures from the same source. Socio-ecological parameters (self-efficacy; friend support; family support)and physical activity were self-reported. Results: Exercise self-efficacy was significantly associated with physical activity. This association (β = 0.74, 95% CI: 0.33-1.1; P = 0.001) was only observed in group 2 (control), but not in group 1 (experimental group) (β = 0.03; 95% CI: -0.57-0.63; P = 0.91). The difference in these coefficients (i.e., β = 0.74 vs. β = 0.03) was statistically significant (P = 0.04). Conclusion: Future research in this field, when feasible, may wish to consider employing procedural and statistical remedies to minimize CMB. PMID:29423361
Mean-field approximation for spacing distribution functions in classical systems.
González, Diego Luis; Pimpinelli, Alberto; Einstein, T L
2012-01-01
We propose a mean-field method to calculate approximately the spacing distribution functions p((n))(s) in one-dimensional classical many-particle systems. We compare our method with two other commonly used methods, the independent interval approximation and the extended Wigner surmise. In our mean-field approach, p((n))(s) is calculated from a set of Langevin equations, which are decoupled by using a mean-field approximation. We find that in spite of its simplicity, the mean-field approximation provides good results in several systems. We offer many examples illustrating that the three previously mentioned methods give a reasonable description of the statistical behavior of the system. The physical interpretation of each method is also discussed. © 2012 American Physical Society
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
The statistical average of optical properties for alumina particle cluster in aircraft plume
NASA Astrophysics Data System (ADS)
Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin
2018-04-01
We establish a model for lognormal distribution of monomer radius and number of alumina particle clusters in plume. According to the Multi-Sphere T Matrix (MSTM) theory, we provide a method for finding the statistical average of optical properties for alumina particle clusters in plume, analyze the effect of different distributions and different detection wavelengths on the statistical average of optical properties for alumina particle cluster, and compare the statistical average optical properties under the alumina particle cluster model established in this study and those under three simplified alumina particle models. The calculation results show that the monomer number of alumina particle cluster and its size distribution have a considerable effect on its statistical average optical properties. The statistical average of optical properties for alumina particle cluster at common detection wavelengths exhibit obvious differences, whose differences have a great effect on modeling IR and UV radiation properties of plume. Compared with the three simplified models, the alumina particle cluster model herein features both higher extinction and scattering efficiencies. Therefore, we may find that an accurate description of the scattering properties of alumina particles in aircraft plume is of great significance in the study of plume radiation properties.
Alternative Test Methods for Electronic Parts
NASA Technical Reports Server (NTRS)
Plante, Jeannette
2004-01-01
It is common practice within NASA to test electronic parts at the manufacturing lot level to demonstrate, statistically, that parts from the lot tested will not fail in service using generic application conditions. The test methods and the generic application conditions used have been developed over the years through cooperation between NASA, DoD, and industry in order to establish a common set of standard practices. These common practices, found in MIL-STD-883, MIL-STD-750, military part specifications, EEE-INST-002, and other guidelines are preferred because they are considered to be effective and repeatable and their results are usually straightforward to interpret. These practices can sometimes be unavailable to some NASA projects due to special application conditions that must be addressed, such as schedule constraints, cost constraints, logistical constraints, or advances in the technology that make the historical standards an inappropriate choice for establishing part performance and reliability. Alternate methods have begun to emerge and to be used by NASA programs to test parts individually or as part of a system, especially when standard lot tests cannot be applied. Four alternate screening methods will be discussed in this paper: Highly accelerated life test (HALT), forward voltage drop tests for evaluating wire-bond integrity, burn-in options during or after highly accelerated stress test (HAST), and board-level qualification.
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that is frequently described by a long-memory or power-law decay of autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. For quantifying scale invariant parameters of physiological signals several methods have been proposed. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in the high-dimensional pseudo-phase space reveals scale invariant statistics in the simple fashion. The procedure is applied on different stride interval data sets from human gait measurements time series (Physio-Bank data library). Results show that introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with the varying content of noise. The possibility for the method to falsely detect long-range dependence in the artificially generated short range dependence series was investigated. (c) 2009 Elsevier B.V. All rights reserved.
Reentry survivability modeling
NASA Astrophysics Data System (ADS)
Fudge, Michael L.; Maher, Robert L.
1997-10-01
Statistical methods for expressing the impact risk posed to space systems in general [and the International Space Station (ISS) in particular] by other resident space objects have been examined. One of the findings of this investigation is that there are legitimate physical modeling reasons for the common statistical expression of the collision risk. A combination of statistical methods and physical modeling is also used to express the impact risk posed by re-entering space systems to objects of interest (e.g., people and property) on Earth. One of the largest uncertainties in the expressing of this risk is the estimation of survivable material which survives reentry to impact Earth's surface. This point was recently demonstrated in dramatic fashion by the impact of an intact expendable launch vehicle (ELV) upper stage near a private residence in the continental United States. Since approximately half of the missions supporting ISS will utilize ELVs, it is appropriate to examine the methods used to estimate the amount and physical characteristics of ELV debris surviving reentry to impact Earth's surface. This paper examines reentry survivability estimation methodology, including the specific methodology used by Caiman Sciences' 'Survive' model. Comparison between empirical results (observations of objects which have been recovered on Earth after surviving reentry) and Survive estimates are presented for selected upper stage or spacecraft components and a Delta launch vehicle second stage.
Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian
2016-07-22
Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.
Melroy, Christopher T; Dubin, Marc G; Hardy, Stuart M; Senior, Brent A
2006-01-01
The aim of this study was to compare three common methods (transillumination, plain radiographs, and computerized tomography [CT] image guidance) for estimating the position and extent of pneumatization of the frontal sinus in osteoplastic flap surgery. Axial CT scans and 6-ft Caldwell radiographs were performed on 10 cadaver heads. For each head, soft tissue overlying the frontal bone was raised and the anticipated position and extent of the frontal sinus at four points was marked using three common methods. The silhouette of the frontal sinus from the Caldwell plain radiograph was excised and placed in position. Four points at the periphery also were made using information obtained from a passive optically guided image-guided surgery device, and transillumination via a frontal trephination also was used to estimate sinus extent. The true sinus size was measured at each point and compared with experimental values. The use of CT image guidance generated the least difference between measured and actual values (mean = 1.91 mm; SEM = 0.29); this method was found statistically superior to Caldwell (p = 0.040) and transillumination (p = 0.007). Image guidance did not overestimate the size of the sinus (0/36) and was quicker than the Caldwell approach (8.5 versus 11.5 minutes). There was no learning curve appreciated with image guidance. Accurate and precise estimation of the position and extent of the frontal sinus is crucial when performing osteoplastic flap surgery. Use of CT image guidance was statistically superior to Caldwell and transillumination methods and proved to be safe, reproducible, economic, and easy to learn.
An appraisal of statistical procedures used in derivation of reference intervals.
Ichihara, Kiyoshi; Boyd, James C
2010-11-01
When conducting studies to derive reference intervals (RIs), various statistical procedures are commonly applied at each step, from the planning stages to final computation of RIs. Determination of the necessary sample size is an important consideration, and evaluation of at least 400 individuals in each subgroup has been recommended to establish reliable common RIs in multicenter studies. Multiple regression analysis allows identification of the most important factors contributing to variation in test results, while accounting for possible confounding relationships among these factors. Of the various approaches proposed for judging the necessity of partitioning reference values, nested analysis of variance (ANOVA) is the likely method of choice owing to its ability to handle multiple groups and being able to adjust for multiple factors. Box-Cox power transformation often has been used to transform data to a Gaussian distribution for parametric computation of RIs. However, this transformation occasionally fails. Therefore, the non-parametric method based on determination of the 2.5 and 97.5 percentiles following sorting of the data, has been recommended for general use. The performance of the Box-Cox transformation can be improved by introducing an additional parameter representing the origin of transformation. In simulations, the confidence intervals (CIs) of reference limits (RLs) calculated by the parametric method were narrower than those calculated by the non-parametric approach. However, the margin of difference was rather small owing to additional variability in parametrically-determined RLs introduced by estimation of parameters for the Box-Cox transformation. The parametric calculation method may have an advantage over the non-parametric method in allowing identification and exclusion of extreme values during RI computation.
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Informativeness of Diagnostic Marker Values and the Impact of Data Grouping.
Ma, Hua; Bandos, Andriy I; Gur, David
2018-01-01
Assessing performance of diagnostic markers is a necessary step for their use in decision making regarding various conditions of interest in diagnostic medicine and other fields. Globally useful markers could, however, have ranges of values that are " diagnostically non-informative" . This paper demonstrates that the presence of marker values from diagnostically non-informative ranges could lead to a loss in statistical efficiency during nonparametric evaluation and shows that grouping non-informative values provides a natural resolution to this problem. These points are theoretically proven and an extensive simulation study is conducted to illustrate the possible benefits of using grouped marker values in a number of practically reasonable scenarios. The results contradict the common conjecture regarding the detrimental effect of grouped marker values during performance assessments. Specifically, contrary to the common assumption that grouped marker values lead to bias, grouping non-informative values does not introduce bias and could substantially reduce sampling variability. The proven concept that grouped marker values could be statistically beneficial without detrimental consequences implies that in practice, tied values do not always require resolution whereas the use of continuous diagnostic results without addressing diagnostically non-informative ranges could be statistically detrimental. Based on these findings, more efficient methods for evaluating diagnostic markers could be developed.
A statistical approach to root system classification
Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter
2013-01-01
Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for “plant functional type” identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential. PMID:23914200
A statistical approach to root system classification.
Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter
2013-01-01
Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for "plant functional type" identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential.
Sanford, Dominic E; Woolsey, Cheryl A; Hall, Bruce L; Linehan, David C; Hawkins, William G; Fields, Ryan C; Strasberg, Steven M
2014-09-01
NSQIP and the Accordion Severity Grading System have recently been used to develop quantitative methods for measuring the burden of postoperative complications. However, other audit methods such as chart reviews and prospective institutional databases are commonly used to gather postoperative complications. The purpose of this study was to evaluate discordance between different audit methods in pancreatoduodenectomy--a common major surgical procedure. The chief aim was to determine how these different methods could affect quantitative evaluations of postoperative complications. Three common audit methods were compared with NSQIP in 84 patients who underwent pancreatoduodenectomy. The methods were use of a prospective database, a chart review based on discharge summaries only, and a detailed retrospective chart review. The methods were evaluated for discordance with NSQIP and among themselves. Severity grading was performed using the Modified Accordion System. Fifty-three complications were listed by NSQIP and 31 complications were identified that were not listed by NSQIP. There was poor agreement for NSQIP-type complications between NSQIP and the other audit methods for mild and moderate complications (kappa 0.381 to 0.744), but excellent agreement for severe complications (kappa 0.953 to 1.00). Discordance was usually due to variations in definition of the complications in non-NSQIP methods. There was good agreement among non-NSQIP methods for non-NSQIP complications for moderate and severe complications, but not for mild complications. There are important differences in perceived surgical outcomes based on the method of complication retrieval. The non-NSQIP methods used in this study could not be substituted for NSQIP in a quantitative analysis unless that analysis was limited to severe complications. Copyright © 2014 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko
2010-04-23
Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in data as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. We concluded that Cox cure regression reveals facts on who may be cured, and how the treatment and other factors effect on the cured incidence and on the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.
Machine Learning for Detecting Gene-Gene Interactions
McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H.
2011-01-01
Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics. PMID:16722772
MDD diagnosis based on partial-brain functional connection network
NASA Astrophysics Data System (ADS)
Yan, Gaoliang; Hu, Hailong; Zhao, Xiang; Zhang, Lin; Qu, Zehui; Li, Yantao
2018-04-01
Artificial intelligence (AI) is a hotspot in computer science research nowadays. To apply AI technology in all industries has been the developing direction for researchers. Major depressive disorder (MDD) is a common disease of serious mental disorders. The World Health Organization (WHO) reports that MDD is projected to become the second most common cause of death and disability by 2020. At present, the way of MDD diagnosis is single. Applying AI technology to MDD diagnosis and pathophysiological research will speed up the MDD research and improve the efficiency of MDD diagnosis. In this study, we select the higher degree of brain network functional connectivity by statistical methods. And our experiments show that the average accuracy of Logistic Regression (LR) classifier using feature filtering reaches 88.48%. Compared with other classification methods, both the efficiency and accuracy of this method are improved, which will greatly improve the process of MDD diagnose. In these experiments, we also define the brain regions associated with MDD, which plays a vital role in MDD pathophysiological research.
Implementation of direct LSC method for diesel samples on the fuel market.
Krištof, Romana; Hirsch, Marko; Kožar Logar, Jasmina
2014-11-01
The European Union develops common EU policy and strategy on biofuels and sustainable bio-economy through several documents. The encouragement of biofuel's consumption is therefore the obligation of each EU member state. The situation in Slovenian fuel market is presented and compared with other EU countries in the frame of prescribed values from EU directives. Diesel is the most common fuel for transportation needs in Slovenia. The study was therefore performed on diesel. The sampling net was determined in accordance with the fuel consumption statistics of the country. 75 Sampling points were located on different types of roads. The quantity of bio-component in diesel samples was determined by direct LSC method through measurement of C-14 content. The measured values were in the range from 0 up to nearly 6 mass percentage of bio-component in fuel. The method has proved to be appropriate, suitable and effective for studies on the real fuel market. Copyright © 2014 Elsevier Ltd. All rights reserved.
Meta-analysis as Statistical and Analytical Method of Journal's Content Scientific Evaluation.
Masic, Izet; Begic, Edin
2015-02-01
A meta-analysis is a statistical and analytical method which combines and synthesizes different independent studies and integrates their results into one common result. Analysis of the journals "Medical Archives", "Materia Socio Medica" and "Acta Informatica Medica", which are located in the most eminent indexed databases of the biomedical milieu. The study has retrospective and descriptive character, and included the period of the calendar year 2014. Study included six editions of all three journals (total of 18 journals). In this period was published a total of 291 articles (in the "Medical Archives" 110, "Materia Socio Medica" 97, and in "Acta Informatica Medica" 84). The largest number of articles was original articles. Small numbers have been published as professional, review articles and case reports. Clinical events were most common in the first two journals, while in the journal "Acta Informatica Medica" belonged to the field of medical informatics, as part of pre-clinical medical disciplines. Articles are usually required period of fifty to fifty nine days for review. Articles were received from four continents, mostly from Europe. The authors are most often from the territory of Bosnia and Herzegovina, then Iran, Kosovo and Macedonia. The number of articles published each year is increasing, with greater participation of authors from different continents and abroad. Clinical medical disciplines are the most common, with the broader spectrum of topics and with a growing number of original articles. Greater support of the wider scientific community is needed for further development of all three of the aforementioned journals.
Common pitfalls in statistical analysis: Odds versus risk
Ranganathan, Priya; Aggarwal, Rakesh; Pramesh, C. S.
2015-01-01
In biomedical research, we are often interested in quantifying the relationship between an exposure and an outcome. “Odds” and “Risk” are the most common terms which are used as measures of association between variables. In this article, which is the fourth in the series of common pitfalls in statistical analysis, we explain the meaning of risk and odds and the difference between the two. PMID:26623395
Substantial increase in concurrent droughts and heatwaves in the United States
Mazdiyasni, Omid; AghaKouchak, Amir
2015-01-01
A combination of climate events (e.g., low precipitation and high temperatures) may cause a significant impact on the ecosystem and society, although individual events involved may not be severe extremes themselves. Analyzing historical changes in concurrent climate extremes is critical to preparing for and mitigating the negative effects of climatic change and variability. This study focuses on the changes in concurrences of heatwaves and meteorological droughts from 1960 to 2010. Despite an apparent hiatus in rising temperature and no significant trend in droughts, we show a substantial increase in concurrent droughts and heatwaves across most parts of the United States, and a statistically significant shift in the distribution of concurrent extremes. Although commonly used trend analysis methods do not show any trend in concurrent droughts and heatwaves, a unique statistical approach discussed in this study exhibits a statistically significant change in the distribution of the data. PMID:26324927
Substantial increase in concurrent droughts and heatwaves in the United States.
Mazdiyasni, Omid; AghaKouchak, Amir
2015-09-15
A combination of climate events (e.g., low precipitation and high temperatures) may cause a significant impact on the ecosystem and society, although individual events involved may not be severe extremes themselves. Analyzing historical changes in concurrent climate extremes is critical to preparing for and mitigating the negative effects of climatic change and variability. This study focuses on the changes in concurrences of heatwaves and meteorological droughts from 1960 to 2010. Despite an apparent hiatus in rising temperature and no significant trend in droughts, we show a substantial increase in concurrent droughts and heatwaves across most parts of the United States, and a statistically significant shift in the distribution of concurrent extremes. Although commonly used trend analysis methods do not show any trend in concurrent droughts and heatwaves, a unique statistical approach discussed in this study exhibits a statistically significant change in the distribution of the data.
Statistical analysis of QC data and estimation of fuel rod behaviour
NASA Astrophysics Data System (ADS)
Heins, L.; Groβ, H.; Nissen, K.; Wunderlich, F.
1991-02-01
The behaviour of fuel rods while in reactor is influenced by many parameters. As far as fabrication is concerned, fuel pellet diameter and density, and inner cladding diameter are important examples. Statistical analyses of quality control data show a scatter of these parameters within the specified tolerances. At present it is common practice to use a combination of superimposed unfavorable tolerance limits (worst case dataset) in fuel rod design calculations. Distributions are not considered. The results obtained in this way are very conservative but the degree of conservatism is difficult to quantify. Probabilistic calculations based on distributions allow the replacement of the worst case dataset by a dataset leading to results with known, defined conservatism. This is achieved by response surface methods and Monte Carlo calculations on the basis of statistical distributions of the important input parameters. The procedure is illustrated by means of two examples.
A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic
Madsen, Bo Eskerod; Browning, Sharon R.
2009-01-01
Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, both on simulated and Encode data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used. PMID:19214210
Performance metrics for the assessment of satellite data products: an ocean color case study
Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coeffic...
Statistical Approaches Used to Assess the Equity of Access to Food Outlets: A Systematic Review
Lamb, Karen E.; Thornton, Lukar E.; Cerin, Ester; Ball, Kylie
2015-01-01
Background Inequalities in eating behaviours are often linked to the types of food retailers accessible in neighbourhood environments. Numerous studies have aimed to identify if access to healthy and unhealthy food retailers is socioeconomically patterned across neighbourhoods, and thus a potential risk factor for dietary inequalities. Existing reviews have examined differences between methodologies, particularly focussing on neighbourhood and food outlet access measure definitions. However, no review has informatively discussed the suitability of the statistical methodologies employed; a key issue determining the validity of study findings. Our aim was to examine the suitability of statistical approaches adopted in these analyses. Methods Searches were conducted for articles published from 2000–2014. Eligible studies included objective measures of the neighbourhood food environment and neighbourhood-level socio-economic status, with a statistical analysis of the association between food outlet access and socio-economic status. Results Fifty-four papers were included. Outlet accessibility was typically defined as the distance to the nearest outlet from the neighbourhood centroid, or as the number of food outlets within a neighbourhood (or buffer). To assess if these measures were linked to neighbourhood disadvantage, common statistical methods included ANOVA, correlation, and Poisson or negative binomial regression. Although all studies involved spatial data, few considered spatial analysis techniques or spatial autocorrelation. Conclusions With advances in GIS software, sophisticated measures of neighbourhood outlet accessibility can be considered. However, approaches to statistical analysis often appear less sophisticated. Care should be taken to consider assumptions underlying the analysis and the possibility of spatially correlated residuals which could affect the results. PMID:29546115
Using volcano plots and regularized-chi statistics in genetic association studies.
Li, Wentian; Freudenberg, Jan; Suh, Young Ju; Yang, Yaning
2014-02-01
Labor intensive experiments are typically required to identify the causal disease variants from a list of disease associated variants in the genome. For designing such experiments, candidate variants are ranked by their strength of genetic association with the disease. However, the two commonly used measures of genetic association, the odds-ratio (OR) and p-value may rank variants in different order. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, volcano plots are scatter plots of fold-change and t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson's chi-square statistic (or equivalently its square root, chi; or the standardized log(OR)) can be analogously used in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of both OR and chi-square statistic, which we term "regularized-chi". This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines which corresponds to independent cutoffs for OR and chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor-allele-frequencies than chi-square test statistic. As rare variants tend to have stronger functional effects, regularized-chi is better suited to the task of prioritization of candidate genes. Copyright © 2013 Elsevier Ltd. All rights reserved.
Bessey, Cindy; Vanderklift, Mathew A
2014-02-15
Stable isotope analysis (SIA) is a powerful tool in many fields of research that enables quantitative comparisons among studies, if similar methods have been used. The goal of this study was to determine if three different drying methods commonly used to prepare samples for SIA yielded different δ(15)N and δ(13)C values. Muscle subsamples from 10 individuals each of three teleost species were dried using three methods: (i) oven, (ii) food dehydrator, and (iii) freeze-dryer. All subsamples were analysed for δ(15)N and δ(13)C values, and nitrogen and carbon content, using a continuous flow system consisting of a Delta V Plus mass spectrometer and a Flush 1112 elemental analyser via a Conflo IV universal interface. The δ(13)C values were normalized to constant lipid content using the equations proposed by McConnaughey and McRoy. Although statistically significant, the differences in δ(15)N values between the drying methods were small (mean differences ≤0.21‰). The differences in δ(13)C values between the drying methods were not statistically significant, and normalising the δ(13)C values to constant lipid content reduced the mean differences for all treatments to ≤0.65‰. A statistically significant difference of ~2% in C content existed between tissues dried in a food dehydrator and those dried in a freeze-dryer for two fish species. There was no significant effect of fish size on the differences between methods. No substantial effect of drying method was found on the δ(15)N or δ(13)C values of teleost muscle tissue. Copyright © 2013 John Wiley & Sons, Ltd.
Jenkins, Martin
2016-01-01
Objective. In clinical trials of RA, it is common to assess effectiveness using end points based upon dichotomized continuous measures of disease activity, which classify patients as responders or non-responders. Although dichotomization generally loses statistical power, there are good clinical reasons to use these end points; for example, to allow for patients receiving rescue therapy to be assigned as non-responders. We adopt a statistical technique called the augmented binary method to make better use of the information provided by these continuous measures and account for how close patients were to being responders. Methods. We adapted the augmented binary method for use in RA clinical trials. We used a previously published randomized controlled trial (Oral SyK Inhibition in Rheumatoid Arthritis-1) to assess its performance in comparison to a standard method treating patients purely as responders or non-responders. The power and error rate were investigated by sampling from this study. Results. The augmented binary method reached similar conclusions to standard analysis methods but was able to estimate the difference in response rates to a higher degree of precision. Results suggested that CI widths for ACR responder end points could be reduced by at least 15%, which could equate to reducing the sample size of a study by 29% to achieve the same statistical power. For other end points, the gain was even higher. Type I error rates were not inflated. Conclusion. The augmented binary method shows considerable promise for RA trials, making more efficient use of patient data whilst still reporting outcomes in terms of recognized response end points. PMID:27338084
Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano
2011-01-01
The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.
Andersen, Flemming; Watanabe, Hideaki; Bjarkam, Carsten; Danielsen, Erik H; Cumming, Paul
2005-07-15
The analysis of physiological processes in brain by position emission tomography (PET) is facilitated when images are spatially normalized to a standard coordinate system. Thus, PET activation studies of human brain frequently employ the common stereotaxic coordinates of Talairach. We have developed an analogous stereotaxic coordinate system for the brain of the Gottingen miniature pig, based on automatic co-registration of magnetic resonance (MR) images obtained in 22 male pigs. The origin of the pig brain stereotaxic space (0, 0, 0) was arbitrarily placed in the centroid of the pineal gland as identified on the average MRI template. The orthogonal planes were imposed using the line between stereotaxic zero and the optic chiasm. A series of mean MR images in the coronal, sagittal and horizontal planes were generated. To test the utility of the common coordinate system for functional imaging studies of minipig brain, we calculated cerebral blood flow (CBF) maps from normal minipigs and from minipigs with a syndrome of parkisonism induced by 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP)-poisoning. These maps were transformed from the native space into the common stereotaxic space. After global normalization of these maps, an undirected search for differences between the groups was then performed using statistical parametric mapping. Using this method, we detected a statistically significant focal increase in CBF in the left cerebellum of the MPTP-lesioned group. We expect the present approach to be of general use in the statistical parametric mapping of CBF and other physiological parameters in living pig brain.
Investigation of methods for hydroclimatic data homogenization
NASA Astrophysics Data System (ADS)
Steirou, E.; Koutsoyiannis, D.
2012-04-01
We investigate the methods used for the adjustment of inhomogeneities of temperature time series covering the last 100 years. Based on a systematic study of scientific literature, we classify and evaluate the observed inhomogeneities in historical and modern time series, as well as their adjustment methods. It turns out that these methods are mainly statistical, not well justified by experiments and are rarely supported by metadata. In many of the cases studied the proposed corrections are not even statistically significant. From the global database GHCN-Monthly Version 2, we examine all stations containing both raw and adjusted data that satisfy certain criteria of continuity and distribution over the globe. In the United States of America, because of the large number of available stations, stations were chosen after a suitable sampling. In total we analyzed 181 stations globally. For these stations we calculated the differences between the adjusted and non-adjusted linear 100-year trends. It was found that in the two thirds of the cases, the homogenization procedure increased the positive or decreased the negative temperature trends. One of the most common homogenization methods, 'SNHT for single shifts', was applied to synthetic time series with selected statistical characteristics, occasionally with offsets. The method was satisfactory when applied to independent data normally distributed, but not in data with long-term persistence. The above results cast some doubts in the use of homogenization procedures and tend to indicate that the global temperature increase during the last century is between 0.4°C and 0.7°C, where these two values are the estimates derived from raw and adjusted data, respectively.
A functional U-statistic method for association analysis of sequencing data.
Jadhav, Sneha; Tong, Xiaoran; Lu, Qing
2017-11-01
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2017-05-01
Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
NASA Astrophysics Data System (ADS)
Xianliang, Lei; Hongying, Yu
Using the questionnaire, mathematical statistics and entropy measurement methods, the quantitative relationship between the individual characteristics urban residents and their sports consumption motivation are studied. The results show that the most main sports consumption motivation of urban residents is fitness motivation and social motivation. Urban residents of different gender, age, education and income levels are different in regulating psychological motivation, rational consumption motivation and seeking common motivation.
De Micco, Veronica; Ruel, Katia; Joseleau, Jean-Paul; Aronne, Giovanna
2010-08-01
During cell wall formation and degradation, it is possible to detect cellulose microfibrils assembled into thicker and thinner lamellar structures, respectively, following inverse parallel patterns. The aim of this study was to analyse such patterns of microfibril aggregation and cell wall delamination. The thickness of microfibrils and lamellae was measured on digital images of both growing and degrading cell walls viewed by means of transmission electron microscopy. To objectively detect, measure and classify microfibrils and lamellae into thickness classes, a method based on the application of computerized image analysis combined with graphical and statistical methods was developed. The method allowed common classes of microfibrils and lamellae in cell walls to be identified from different origins. During both the formation and degradation of cell walls, a preferential formation of structures with specific thickness was evidenced. The results obtained with the developed method allowed objective analysis of patterns of microfibril aggregation and evidenced a trend of doubling/halving lamellar structures, during cell wall formation/degradation in materials from different origin and which have undergone different treatments.
Statistical Methods for Quantifying the Variability of Solar Wind Transients of All Sizes
NASA Astrophysics Data System (ADS)
Tindale, E.; Chapman, S. C.
2016-12-01
The solar wind is inherently variable across a wide range of timescales, from small-scale turbulent fluctuations to the 11-year periodicity induced by the solar cycle. Each solar cycle is unique, and this change in overall cycle activity is coupled from the Sun to Earth via the solar wind, leading to long-term trends in space weather. Our work [Tindale & Chapman, 2016] applies novel statistical methods to solar wind transients of all sizes, to quantify the variability of the solar wind associated with the solar cycle. We use the same methods to link solar wind observations with those on the Sun and Earth. We use Wind data to construct quantile-quantile (QQ) plots comparing the statistical distributions of multiple commonly used solar wind-magnetosphere coupling parameters between the minima and maxima of solar cycles 23 and 24. We find that in each case the distribution is multicomponent, ranging from small fluctuations to extreme values, with the same functional form at all phases of the solar cycle. The change in PDF is captured by a simple change of variables, which is independent of the PDF model. Using this method we can quantify the quietness of the cycle 24 maximum, identify which variable drives the changing distribution of composite parameters such as ɛ, and we show that the distribution of ɛ is less sensitive to changes in its extreme values than that of its constituents. After demonstrating the QQ method on solar wind data, we extend the analysis to include solar and magnetospheric data spanning the same time period. We focus on GOES X-ray flux and WDC AE index data. Finally, having studied the statistics of transients across the full distribution, we apply the same method to time series of extreme bursts in each variable. Using these statistical tools, we aim to track the solar cycle-driven variability from the Sun through the solar wind and into the Earth's magnetosphere. Tindale, E. and S.C. Chapman (2016), Geophys. Res. Lett., 43(11), doi: 10.1002/2016GL068920.
SPSS and SAS programs for comparing Pearson correlations and OLS regression coefficients.
Weaver, Bruce; Wuensch, Karl L
2013-09-01
Several procedures that use summary data to test hypotheses about Pearson correlations and ordinary least squares regression coefficients have been described in various books and articles. To our knowledge, however, no single resource describes all of the most common tests. Furthermore, many of these tests have not yet been implemented in popular statistical software packages such as SPSS and SAS. In this article, we describe all of the most common tests and provide SPSS and SAS programs to perform them. When they are applicable, our code also computes 100 × (1 - α)% confidence intervals corresponding to the tests. For testing hypotheses about independent regression coefficients, we demonstrate one method that uses summary data and another that uses raw data (i.e., Potthoff analysis). When the raw data are available, the latter method is preferred, because use of summary data entails some loss of precision due to rounding.
Technology Solutions Case Study: Predicting Envelope Leakage in Attached Dwellings
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2013-11-01
The most common method of measuring air leakage is to perform single (or solo) blower door pressurization and/or depressurization test. In detached housing, the single blower door test measures leakage to the outside. In attached housing, however, this “solo” test method measures both air leakage to the outside and air leakage between adjacent units through common surfaces. In an attempt to create a simplified tool for predicting leakage to the outside, Building America team Consortium for Advanced Residential Buildings (CARB) performed a preliminary statistical analysis on blower door test results from 112 attached dwelling units in four apartment complexes. Althoughmore » the subject data set is limited in size and variety, the preliminary analyses suggest significant predictors are present and support the development of a predictive model. Further data collection is underway to create a more robust prediction tool for use across different construction types, climate zones, and unit configurations.« less
Parastomal hernia – current knowledge and treatment
Styliński, Roman; Rudzki, Sławomir
2018-01-01
Intestinal stoma creation is one of the most common surgical procedures. The most common long-term complication following stoma creation is parastomal hernia, which according to some authors is practically unavoidable. Statistical differences of its occurrence are mainly due to patient observation time and evaluation criteria. Consequently, primary prevention methods such as placement of prosthetic mesh and newly developed minimally invasive methods of stoma creation are used. It seems that in the light of evidence-based medicine, the best way to treat parastomal hernia is the one that the surgeon undertaking therapy is the most experienced in and is suited to the individuality of each patient, his condition and comorbidities. As a general rule, reinforcing the abdominal wall with a prosthetic mesh is the treatment of choice, with a low rate of complications and relapses over a long period of time. The current trend is to use lightweight, large pore meshes. PMID:29643952
Iddawela, Devika; Ehambaram, Kiruthiha; Pethiyagoda, Kalyani; Bandara, Lakmalee
2017-01-01
Introduction Human toxocariasis is caused by several species of the nematode Toxocara. Two common clinical syndromes are ocular and visceral larva migrans. Objectives To determine the Toxocara antibody positivity in clinically suspected VLM patients and to describe demographic factors and clinical manifestations of seropositive patients. Methods 522 clinically suspected patients were studied between 1993 and 2014. Relevant data was gathered from referral letters. Serum samples were subjected to Toxocara antigen ELISA. Results Overall, seropositivity was 50.2% (262), of which 109 (40.8%) were positive at high level of Toxocara antibody carriage and 153 (58.4%) were positive at low levels. The seropositives ranged from 3 months to 70 years (mean = 7.8). Younger age group had higher levels of seropositivity and it was statistically significant. Majority of children under 5 years were seropositive (47.7%, n = 125). Seropositivity was common in males (55.3%, n = 145). Clinical manifestations of seropositives include lymphadenopathy (24.1%) skin rash (22.5%), dyspnoea (21.7%), fever (21%), hepatosplenomegaly (9.2%), and abdominal pain (3.8%). 197 (75.2%) seropositive cases had eosinophilia. These symptoms were not statistically significant. Conclusions This study confirms toxocariasis as an important cause of childhood ill health identifying common clinical symptoms recommending preventive measures to limit transmission. PMID:29362672
Examining the dimensions and correlates of workplace stress among Australian veterinarians
2009-01-01
Background Although stress is known to be a common occupational health issue in the veterinary profession, few studies have investigated its broad domains or the internal validity of the survey instrument used for assessment. Methods We analysed data from over 500 veterinarians in Queensland, Australia, who were surveyed during 2006-07. Results The most common causes of stress were reported to be long hours worked per day, not having enough holidays per year, not having enough rest breaks per day, the attitude of customers, lack of recognition from the public and not having enough time per patient. Age, gender and practice type were statistically associated with various aspects of work-related stress. Strong correlations were found between having too many patients per day and not having enough time per patient; between not having enough holidays and long working hours; and also between not enough rest breaks per day and long working hours. Factor analysis revealed four dimensions of stress comprising a mixture of career, professional and practice-related items. The internal validity of our stress questionnaire was shown to be high during statistical analysis. Conclusion Overall, this study suggests that workplace stress is fairly common among Australian veterinarians and represents an issue that occupies several distinct areas within their professional life. PMID:19995450
Structurally Sound Statistics Instruction
ERIC Educational Resources Information Center
Casey, Stephanie A.; Bostic, Jonathan D.
2016-01-01
The Common Core's Standards for Mathematical Practice (SMP) call for all K-grade 12 students to develop expertise in the processes and proficiencies of doing mathematics. However, the Common Core State Standards for Mathematics (CCSSM) (CCSSI 2010) as a whole addresses students' learning of not only mathematics but also statistics. This situation…
Tweedell, Andrew J.; Haynes, Courtney A.
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60–90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity. PMID:28489897
Reconstruction of early phase deformations by integrated magnetic and mesotectonic data evaluation
NASA Astrophysics Data System (ADS)
Sipos, András A.; Márton, Emő; Fodor, László
2018-02-01
Markers of brittle faulting are widely used for recovering past deformation phases. Rocks often have oriented magnetic fabrics, which can be interpreted as connected to ductile deformation before cementation of the sediment. This paper reports a novel statistical procedure for simultaneous evaluation of AMS (Anisotropy of Magnetic Susceptibility) and fault-slip data. The new method analyzes the AMS data, without linearization techniques, so that weak AMS lineation and rotational AMS can be assessed that are beyond the scope of classical methods. This idea is extended to the evaluation of fault-slip data. While the traditional assumptions of stress inversion are not rejected, the method recovers the stress field via statistical hypothesis testing. In addition it provides statistical information needed for the combined evaluation of the AMS and the mesotectonic (0.1 to 10 m) data. In the combined evaluation a statistical test is carried out that helps to decide if the AMS lineation and the mesotectonic markers (in case of repeated deformation of the oldest set of markers) were formed in the same or different deformation phases. If this condition is met, the combined evaluation can improve the precision of the reconstruction. When the two data sets do not have a common solution for the direction of the extension, the deformational origin of the AMS is questionable. In this case the orientation of the stress field responsible for the AMS lineation might be different from that which caused the brittle deformation. Although most of the examples demonstrate the reconstruction of weak deformations in sediments, the new method is readily applicable to investigate the ductile-brittle transition of any rock formation as long as AMS and fault-slip data are available.
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to include binary traits that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and ii) SKAT and EREC have good performance under all tested scenarios but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy by (i) selecting promising genes by faster methods and ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of trait's variability and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
NASA Astrophysics Data System (ADS)
Skrzypek, Grzegorz; Sadler, Rohan; Wiśniewski, Andrzej
2017-04-01
The stable oxygen isotope composition of phosphates (δ18O) extracted from mammalian bone and teeth material is commonly used as a proxy for paleotemperature. Historically, several different analytical and statistical procedures for determining air paleotemperatures from the measured δ18O of phosphates have been applied. This inconsistency in both stable isotope data processing and the application of statistical procedures has led to large and unwanted differences between calculated results. This study presents the uncertainty associated with two of the most commonly used regression methods: least squares inverted fit and transposed fit. We assessed the performance of these methods by designing and applying calculation experiments to multiple real-life data sets, calculating in reverse temperatures, and comparing them with true recorded values. Our calculations clearly show that the mean absolute errors are always substantially higher for the inverted fit (a causal model), with the transposed fit (a predictive model) returning mean values closer to the measured values (Skrzypek et al. 2015). The predictive models always performed better than causal models, with 12-65% lower mean absolute errors. Moreover, the least-squares regression (LSM) model is more appropriate than Reduced Major Axis (RMA) regression for calculating the environmental water stable oxygen isotope composition from phosphate signatures, as well as for calculating air temperature from the δ18O value of environmental water. The transposed fit introduces a lower overall error than the inverted fit for both the δ18O of environmental water and Tair calculations; therefore, the predictive models are more statistically efficient than the causal models in this instance. The direct comparison of paleotemperature results from different laboratories and studies may only be achieved if a single method of calculation is applied. Reference Skrzypek G., Sadler R., Wiśniewski A., 2016. Reassessment of recommendations for processing mammal phosphate δ18O data for paleotemperature reconstruction. Palaeogeography, Palaeoclimatology, Palaeoecology 446, 162-167.
Prevalence of hypothalamo pituitary dysfunction in patients of traumatic brain injury
Hari Kumar, K. V. S.; Swamy, M. N.; Khan, M. A.
2016-01-01
Background: Traumatic brain injury (TBI) is common in young soldiers of armed forces leading to significant morbidity and mortality. We studied the prevalence of hypopituitarism following TBI and its association with trauma severity. Materials and Methods: We conducted a 12-month prospective study of 56 TBI patients for the presence of hormonal dysfunction. Hormonal parameters were estimated during the early phase (0–10 days posttraumatically) and after 6 and 12 months. Dynamic testing was done when required, and the results were analyzed by appropriate statistical methods. Results: Hormonal dysfunction was seen in 39 of the 56 (70%) patients at initial assessment. Persisting pituitary deficiencies are seen in 7 and 8 patients at the end of 6 months and 12 months, respectively. Hypogonadotropic hypogonadism, hypothyroidism, and growth hormone deficiency are the most common diagnoses. Initial severe TBI and plurihormonal involvement predicted the long-term hypopituitarism. Conclusion: Early hypopituitarism was common in severe TBI, but recovers in majority. Evaluation for the occult pituitary dysfunction is required during the rehabilitation of TBI patients. PMID:27867878
Defect detection of castings in radiography images using a robust statistical feature.
Zhao, Xinyue; He, Zaixing; Zhang, Shuyou
2014-01-01
One of the most commonly used optical methods for defect detection is radiographic inspection. Compared with methods that extract defects directly from the radiography image, model-based methods deal with the case of an object with complex structure well. However, detection of small low-contrast defects in nonuniformly illuminated images is still a major challenge for them. In this paper, we present a new method based on the grayscale arranging pairs (GAP) feature to detect casting defects in radiography images automatically. First, a model is built using pixel pairs with a stable intensity relationship based on the GAP feature from previously acquired images. Second, defects can be extracted by comparing the difference of intensity-difference signs between the input image and the model statistically. The robustness of the proposed method to noise and illumination variations has been verified on casting radioscopic images with defects. The experimental results showed that the average computation time of the proposed method in the testing stage is 28 ms per image on a computer with a Pentium Core 2 Duo 3.00 GHz processor. For the comparison, we also evaluated the performance of the proposed method as well as that of the mixture-of-Gaussian-based and crossing line profile methods. The proposed method achieved 2.7% and 2.0% false negative rates in the noise and illumination variation experiments, respectively.
Zhou, Yunyi; Tao, Chenyang; Lu, Wenlian; Feng, Jianfeng
2018-04-20
Functional connectivity is among the most important tools to study brain. The correlation coefficient, between time series of different brain areas, is the most popular method to quantify functional connectivity. Correlation coefficient in practical use assumes the data to be temporally independent. However, the time series data of brain can manifest significant temporal auto-correlation. A widely applicable method is proposed for correcting temporal auto-correlation. We considered two types of time series models: (1) auto-regressive-moving-average model, (2) nonlinear dynamical system model with noisy fluctuations, and derived their respective asymptotic distributions of correlation coefficient. These two types of models are most commonly used in neuroscience studies. We show the respective asymptotic distributions share a unified expression. We have verified the validity of our method, and shown our method exhibited sufficient statistical power for detecting true correlation on numerical experiments. Employing our method on real dataset yields more robust functional network and higher classification accuracy than conventional methods. Our method robustly controls the type I error while maintaining sufficient statistical power for detecting true correlation in numerical experiments, where existing methods measuring association (linear and nonlinear) fail. In this work, we proposed a widely applicable approach for correcting the effect of temporal auto-correlation on functional connectivity. Empirical results favor the use of our method in functional network analysis. Copyright © 2018. Published by Elsevier B.V.
White, Sarah A; van den Broek, Nynke R
2004-05-30
Before introducing a new measurement tool it is necessary to evaluate its performance. Several statistical methods have been developed, or used, to evaluate the reliability and validity of a new assessment method in such circumstances. In this paper we review some commonly used methods. Data from a study that was conducted to evaluate the usefulness of a specific measurement tool (the WHO Colour Scale) is then used to illustrate the application of these methods. The WHO Colour Scale was developed under the auspices of the WHO to provide a simple portable and reliable method of detecting anaemia. This Colour Scale is a discrete interval scale, whereas the actual haemoglobin values it is used to estimate are on a continuous interval scale and can be measured accurately using electrical laboratory equipment. The methods we consider are: linear regression, correlation coefficients, paired t-tests plotting differences against mean values and deriving limits of agreement; kappa and weighted kappa statistics, sensitivity and specificity, an intraclass correlation coefficient and the repeatability coefficient. We note that although the definition and properties of each of these methods is well established inappropriate methods continue to be used in medical literature for assessing reliability and validity, as evidenced in the context of the evaluation of the WHO Colour Scale. Copyright 2004 John Wiley & Sons, Ltd.
Garcia, Rafaela Alvim; Vanelli, Chislene Pereira; Pereira Junior, Olavo Dos Santos; Corrêa, José Otávio do Amaral
2018-06-19
Hydroelectrolytic disorders are common in clinical situations and may be harmful to the patient, especially those involving plasma sodium and potassium dosages. Among the possible methods for the dosages are flame photometry, ion-selective electrode (ISE) and colorimetric enzymatic method. We analyzed 175 samples in the three different methods cited from patients attending the laboratory of the University Hospital of the Federal University of Juiz de Fora. The values obtained were statistically treated using SPSS 19.0 software. The present study aims to evaluate the impact of the use of these different methods in the determination of plasma sodium and potassium. The averages obtained for sodium and potassium measurements by flame photometry were similar (P > .05) to the means obtained for the two electrolytes by ISE. The averages obtained by the colorimetric enzymatic method presented statistical difference in relation to ISE, both for sodium and potassium. In the correlation analysis, both flame photometry and colorimetric enzymatic showed a strong correlation with the ISE method for both dosages. At the first time in the same work sodium and potassium were analyzed by three different methods and the results allowed us to conclude that the methods showed a positive and strong correlation, and can be applied in the clinical routine. © 2018 Wiley Periodicals, Inc.
Computer program for the calculation of grain size statistics by the method of moments
Sawyer, Michael B.
1977-01-01
A computer program is presented for a Hewlett-Packard Model 9830A desk-top calculator (1) which calculates statistics using weight or point count data from a grain-size analysis. The program uses the method of moments in contrast to the more commonly used but less inclusive graphic method of Folk and Ward (1957). The merits of the program are: (1) it is rapid; (2) it can accept data in either grouped or ungrouped format; (3) it allows direct comparison with grain-size data in the literature that have been calculated by the method of moments; (4) it utilizes all of the original data rather than percentiles from the cumulative curve as in the approximation technique used by the graphic method; (5) it is written in the computer language BASIC, which is easily modified and adapted to a wide variety of computers; and (6) when used in the HP-9830A, it does not require punching of data cards. The method of moments should be used only if the entire sample has been measured and the worker defines the measured grain-size range. (1) Use of brand names in this paper does not imply endorsement of these products by the U.S. Geological Survey.
Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall
NASA Astrophysics Data System (ADS)
Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik
2016-02-01
Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.
Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D
2014-03-01
Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.
Steganographic embedding in containers-images
NASA Astrophysics Data System (ADS)
Nikishova, A. V.; Omelchenko, T. A.; Makedonskij, S. A.
2018-05-01
Steganography is one of the approaches to ensuring the protection of information transmitted over the network. But a steganographic method should vary depending on a used container. According to statistics, the most widely used containers are images and the most common image format is JPEG. Authors propose a method of data embedding into a frequency area of images in format JPEG 2000. It is proposed to use the method of Benham-Memon- Yeo-Yeung, in which instead of discrete cosine transform, discrete wavelet transform is used. Two requirements for images are formulated. Structure similarity is chosen to obtain quality assessment of data embedding. Experiments confirm that requirements satisfaction allows achieving high quality assessment of data embedding.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p -value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p -value has been subjected to much speculation, analysis, and criticism. We explore how well the p -value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p -value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p -value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.
Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng
2015-01-01
Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models are comparable.
Tilson, Julie K; Marshall, Katie; Tam, Jodi J; Fetters, Linda
2016-04-22
A primary barrier to the implementation of evidence based practice (EBP) in physical therapy is therapists' limited ability to understand and interpret statistics. Physical therapists demonstrate limited skills and report low self-efficacy for interpreting results of statistical procedures. While standards for physical therapist education include statistics, little empirical evidence is available to inform what should constitute such curricula. The purpose of this study was to conduct a census of the statistical terms and study designs used in physical therapy literature and to use the results to make recommendations for curricular development in physical therapist education. We conducted a bibliometric analysis of 14 peer-reviewed journals associated with the American Physical Therapy Association over 12 months (Oct 2011-Sept 2012). Trained raters recorded every statistical term appearing in identified systematic reviews, primary research reports, and case series and case reports. Investigator-reported study design was also recorded. Terms representing the same statistical test or concept were combined into a single, representative term. Cumulative percentage was used to identify the most common representative statistical terms. Common representative terms were organized into eight categories to inform curricular design. Of 485 articles reviewed, 391 met the inclusion criteria. These 391 articles used 532 different terms which were combined into 321 representative terms; 13.1 (sd = 8.0) terms per article. Eighty-one representative terms constituted 90% of all representative term occurrences. Of the remaining 240 representative terms, 105 (44%) were used in only one article. The most common study design was prospective cohort (32.5%). Physical therapy literature contains a large number of statistical terms and concepts for readers to navigate. However, in the year sampled, 81 representative terms accounted for 90% of all occurrences. These "common representative terms" can be used to inform curricula to promote physical therapists' skills, competency, and confidence in interpreting statistics in their professional literature. We make specific recommendations for curriculum development informed by our findings.
Use of the Box-Cox Transformation in Detecting Changepoints in Daily Precipitation Data Series
NASA Astrophysics Data System (ADS)
Wang, X. L.; Chen, H.; Wu, Y.; Pu, Q.
2009-04-01
This study integrates a Box-Cox power transformation procedure into two statistical tests for detecting changepoints in Gaussian data series, to make the changepoint detection methods applicable to non-Gaussian data series, such as daily precipitation amounts. The detection power aspects of transformed methods in a common trend two-phase regression setting are assessed by Monte Carlo simulations for data of a log-normal or Gamma distribution. The results show that the transformed methods have increased the power of detection, in comparison with the corresponding original (untransformed) methods. The transformed data much better approximate to a Gaussian distribution. As an example of application, the new methods are applied to a series of daily precipitation amounts recorded at a station in Canada, showing satisfactory detection power.
Measuring missing heritability: Inferring the contribution of common variants
Golan, David; Lander, Eric S.; Rosset, Saharon
2014-01-01
Genome-wide association studies (GWASs), also called common variant association studies (CVASs), have uncovered thousands of genetic variants associated with hundreds of diseases. However, the variants that reach statistical significance typically explain only a small fraction of the heritability. One explanation for the “missing heritability” is that there are many additional disease-associated common variants whose effects are too small to detect with current sample sizes. It therefore is useful to have methods to quantify the heritability due to common variation, without having to identify all causal variants. Recent studies applied restricted maximum likelihood (REML) estimation to case–control studies for diseases. Here, we show that REML considerably underestimates the fraction of heritability due to common variation in this setting. The degree of underestimation increases with the rarity of disease, the heritability of the disease, and the size of the sample. Instead, we develop a general framework for heritability estimation, called phenotype correlation–genotype correlation (PCGC) regression, which generalizes the well-known Haseman–Elston regression method. We show that PCGC regression yields unbiased estimates. Applying PCGC regression to six diseases, we estimate the proportion of the phenotypic variance due to common variants to range from 25% to 56% and the proportion of heritability due to common variants from 41% to 68% (mean 60%). These results suggest that common variants may explain at least half the heritability for many diseases. PCGC regression also is readily applicable to other settings, including analyzing extreme-phenotype studies and adjusting for covariates such as sex, age, and population structure. PMID:25422463
Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research
Bakker, Marjan; Wicherts, Jelte M.
2014-01-01
Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606
Eisenberg, Dan T A; Kuzawa, Christopher W; Hayes, M Geoffrey
2015-01-01
Telomere length (TL) is commonly measured using quantitative PCR (qPCR). Although, easier than the southern blot of terminal restriction fragments (TRF) TL measurement method, one drawback of qPCR is that it introduces greater measurement error and thus reduces the statistical power of analyses. To address a potential source of measurement error, we consider the effect of well position on qPCR TL measurements. qPCR TL data from 3,638 people run on a Bio-Rad iCycler iQ are reanalyzed here. To evaluate measurement validity, correspondence with TRF, age, and between mother and offspring are examined. First, we present evidence for systematic variation in qPCR TL measurements in relation to thermocycler well position. Controlling for these well-position effects consistently improves measurement validity and yields estimated improvements in statistical power equivalent to increasing sample sizes by 16%. We additionally evaluated the linearity of the relationships between telomere and single copy gene control amplicons and between qPCR and TRF measures. We find that, unlike some previous reports, our data exhibit linear relationships. We introduce the standard error in percent, a superior method for quantifying measurement error as compared to the commonly used coefficient of variation. Using this measure, we find that excluding samples with high measurement error does not improve measurement validity in our study. Future studies using block-based thermocyclers should consider well position effects. Since additional information can be gleaned from well position corrections, rerunning analyses of previous results with well position correction could serve as an independent test of the validity of these results. © 2015 Wiley Periodicals, Inc.
Common pitfalls in statistical analysis: Clinical versus statistical significance
Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc
2015-01-01
In clinical research, study results, which are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects its impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754
Q methodology: a new way of assessing employee satisfaction.
Chinnis, A S; Summers, D E; Doerr, C; Paulson, D J; Davis, S M
2001-05-01
As yet another nursing shortage faces the country, the issue of the satisfaction of nurses again becomes of critical concern to nursing managers in the interest of staff retention. The authors describe the use of the statistical technique Q methodology to assess the needs of nurses and other medical staff at a level one, tertiary care emergency department in the United States. Using the Q method, the authors were able to identify different, unique viewpoints concerning employee needs among the study population, as well as commonly shared views. This level of detail, not obtainable using more traditional statistical techniques, can aid in the design of more effective strategies aimed at fulfilling the needs of an organization's staff to increase their satisfaction.
Size Matters: What Are the Characteristic Source Areas for Urban Planning Strategies?
Fan, Chao; Myint, Soe W.; Wang, Chenghao
2016-01-01
Urban environmental measurements and observational statistics should reflect the properties generated over an adjacent area of adequate length where homogeneity is usually assumed. The determination of this characteristic source area that gives sufficient representation of the horizontal coverage of a sensing instrument or the fetch of transported quantities is of critical importance to guide the design and implementation of urban landscape planning strategies. In this study, we aim to unify two different methods for estimating source areas, viz. the statistical correlation method commonly used by geographers for landscape fragmentation and the mechanistic footprint model by meteorologists for atmospheric measurements. Good agreement was found in the intercomparison of the estimate of source areas by the two methods, based on 2-m air temperature measurement collected using a network of weather stations. The results can be extended to shed new lights on urban planning strategies, such as the use of urban vegetation for heat mitigation. In general, a sizable patch of landscape is required in order to play an effective role in regulating the local environment, proportional to the height at which stakeholders’ interest is mainly concerned. PMID:27832111
Comparison of four different methods for detection of biofilm formation by uropathogens.
Panda, Pragyan Swagatika; Chaudhary, Uma; Dube, Surya K
2016-01-01
Urinary tract infection (UTI) is one of the most common infectious diseases encountered in clinical practice. Emerging resistance of the uropathogens to the antimicrobial agents due to biofilm formation is a matter of concern while treating symptomatic UTI. However, studies comparing different methods for detection of biofilm by uropathogens are scarce. To compare four different methods for detection of biofilm formation by uropathogens. Prospective observational study conducted in a tertiary care hospital. Totally 300 isolates from urinary samples were analyzed for biofilm formation by four methods, that is, tissue culture plate (TCP) method, tube method (TM), Congo Red Agar (CRA) method and modified CRA (MCRA) method. Chi-square test was applied when two or more set of variables were compared. P < 0.05 considered as statistically significant. Considering TCP to be a gold standard method for our study we calculated other statistical parameters. The rate of biofilm detection was 45.6%, 39.3% and 11% each by TCP, TM, CRA and MCRA methods, respectively. The difference between TCP and only CRA/MCRA was significant, but not that between TCP and TM. There was no difference in the rate of biofilm detection between CRA and MCRA in other isolates, but MCRA is superior to CRA for detection of the staphylococcal biofilm formation. TCP method is the ideal method for detection of bacterial biofilm formation by uropathogens. MCRA method is superior only to CRA for detection of staphylococcal biofilm formation.
On the analysis of studies of choice
Mullins, Eamonn; Agunwamba, Christian C.; Donohoe, Anthony J.
1982-01-01
In a review of 103 sets of data from 23 different studies of choice, Baum (1979) concluded that whereas undermatching was most commonly observed for responses, the time measure generally conformed to the matching relation. A reexamination of the evidence presented by Baum concludes that undermatching is the most commonly observed finding for both measures. Use of the coefficient of determination by both Baum (1979) and de Villiers (1977) for assessing when matching occurs is criticized on statistical grounds. An alternative to the loss-in-predictability criterion used by Baum (1979) is proposed. This alternative statistic has a simple operational meaning and is related to the usual F-ratio test. It can therefore be used as a formal test of the hypothesis that matching occurs. Baum (1979) also suggests that slope values of between .90 and 1.11 can be considered good approximations to matching. It is argued that the establishment of a fixed interval as a criterion for determining when matching occurs, is inappropriate. A confidence interval based on the data from any given experiment is suggested as a more useful method of assessment. PMID:16812271
Tests for a disease-susceptibility locus allowing for an inbreeding coefficient (F).
Song, Kijoung; Elston, Robert C
2003-11-01
We begin by discussing the false positive test results that arise because of cryptic relatedness and population substructure when testing a disease susceptibility locus. We extend and evaluate the Hardy-Weinberg disequilibrium (HWD) method, allowing for an inbreeding coefficient (F) in a similar way that Devlin and Roeder (1999) allowed for inbreeding in a case-control study. Then we compare the HWD measure and the common direct measure of linkage disequilibrium, both when there is no population substructure (F = 0) and when there is population substructure (F not = 0), for a single marker. The HWD test statistic gives rise to false positives caused by population stratification. These false positives can be controlled by adjusting the test statistic for the amount of variance inflation caused by the inbreeding coefficient (F). The power loss for the HWD test that arises when controlling for population structure is much less than that which arises for the common direct measure of linkage disequilibrium. However, in the multiplicative model, the HWD test has virtually no power even when allowing for non-zero F.
Möltgen, C-V; Herdling, T; Reich, G
2013-11-01
This study demonstrates an approach, using science-based calibration (SBC), for direct coating thickness determination on heart-shaped tablets in real-time. Near-Infrared (NIR) spectra were collected during four full industrial pan coating operations. The tablets were coated with a thin hydroxypropyl methylcellulose (HPMC) film up to a film thickness of 28 μm. The application of SBC permits the calibration of the NIR spectral data without using costly determined reference values. This is due to the fact that SBC combines classical methods to estimate the coating signal and statistical methods for the noise estimation. The approach enabled the use of NIR for the measurement of the film thickness increase from around 8 to 28 μm of four independent batches in real-time. The developed model provided a spectroscopic limit of detection for the coating thickness of 0.64 ± 0.03 μm root-mean square (RMS). In the commonly used statistical methods for calibration, such as Partial Least Squares (PLS), sufficiently varying reference values are needed for calibration. For thin non-functional coatings this is a challenge because the quality of the model depends on the accuracy of the selected calibration standards. The obvious and simple approach of SBC eliminates many of the problems associated with the conventional statistical methods and offers an alternative for multivariate calibration. Copyright © 2013 Elsevier B.V. All rights reserved.
Measurement of the local food environment: a comparison of existing data sources.
Bader, Michael D M; Ailshire, Jennifer A; Morenoff, Jeffrey D; House, James S
2010-03-01
Studying the relation between the residential environment and health requires valid, reliable, and cost-effective methods to collect data on residential environments. This 2002 study compared the level of agreement between measures of the presence of neighborhood businesses drawn from 2 common sources of data used for research on the built environment and health: listings of businesses from commercial databases and direct observations of city blocks by raters. Kappa statistics were calculated for 6 types of businesses-drugstores, liquor stores, bars, convenience stores, restaurants, and grocers-located on 1,663 city blocks in Chicago, Illinois. Logistic regressions estimated whether disagreement between measurement methods was systematically correlated with the socioeconomic and demographic characteristics of neighborhoods. Levels of agreement between the 2 sources were relatively high, with significant (P < 0.001) kappa statistics for each business type ranging from 0.32 to 0.70. Most business types were more likely to be reported by direct observations than in the commercial database listings. Disagreement between the 2 sources was not significantly correlated with the socioeconomic and demographic characteristics of neighborhoods. Results suggest that researchers should have reasonable confidence using whichever method (or combination of methods) is most cost-effective and theoretically appropriate for their research design.
Graph-based inductive reasoning.
Boumans, Marcel
2016-10-01
This article discusses methods of inductive inferences that are methods of visualizations designed in such a way that the "eye" can be employed as a reliable tool for judgment. The term "eye" is used as a stand-in for visual cognition and perceptual processing. In this paper "meaningfulness" has a particular meaning, namely accuracy, which is closeness to truth. Accuracy consists of precision and unbiasedness. Precision is dealt with by statistical methods, but for unbiasedness one needs expert judgment. The common view at the beginning of the twentieth century was to make the most efficient use of this kind of judgment by representing the data in shapes and forms in such a way that the "eye" can function as a reliable judge to reduce bias. The need for judgment of the "eye" is even more necessary when the background conditions of the observations are heterogeneous. Statistical procedures require a certain minimal level of homogeneity, but the "eye" does not. The "eye" is an adequate tool for assessing topological similarities when, due to heterogeneity of the data, metric assessment is not possible. In fact, graphical assessments precedes measurement, or to put it more forcefully, the graphic method is a necessary prerequisite for measurement. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sex differences in mechanical allodynia: how can it be preclinically quantified and analyzed?
Nicotra, Lauren; Tuke, Jonathan; Grace, Peter M.; Rolan, Paul E.; Hutchinson, Mark R.
2014-01-01
Translating promising preclinical drug discoveries to successful clinical trials remains a significant hurdle in pain research. Although animal models have significantly contributed to understanding chronic pain pathophysiology, the majority of research has focused on male rodents using testing procedures that produce sex difference data that do not align well with comparable clinical experiences. Additionally, the use of animal pain models presents ongoing ethical challenges demanding continuing refinement of preclinical methods. To this end, this study sought to test a quantitative allodynia assessment technique and associated statistical analysis in a modified graded nerve injury pain model with the aim to further examine sex differences in allodynia. Graded allodynia was established in male and female Sprague Dawley rats by altering the number of sutures placed around the sciatic nerve and quantified by the von Frey test. Linear mixed effects modeling regressed response on each fixed effect (sex, oestrus cycle, pain treatment). On comparison with other common von Frey assessment techniques, utilizing lower threshold filaments than those ordinarily tested, at 1 s intervals, appropriately and successfully investigated female mechanical allodynia, revealing significant sex and oestrus cycle difference across the graded allodynia that other common behavioral methods were unable to detect. Utilizing this different von Frey approach and graded allodynia model, a single suture inflicting less allodynia was sufficient to demonstrate exaggerated female mechanical allodynia throughout the phases of dioestrus and pro-oestrus. Refining the von Frey testing method, statistical analysis technique and the use of a graded model of chronic pain, allowed for examination of the influences on female mechanical nociception that other von Frey methods cannot provide. PMID:24592221
Zhao, Huiying; Nyholt, Dale R; Yang, Yuanhao; Wang, Jihua; Yang, Yuedong
2017-06-14
Genome-wide association studies (GWAS) have successfully identified single variants associated with diseases. To increase the power of GWAS, gene-based and pathway-based tests are commonly employed to detect more risk factors. However, the gene- and pathway-based association tests may be biased towards genes or pathways containing a large number of single-nucleotide polymorphisms (SNPs) with small P-values caused by high linkage disequilibrium (LD) correlations. To address such bias, numerous pathway-based methods have been developed. Here we propose a novel method, DGAT-path, to divide all SNPs assigned to genes in each pathway into LD blocks, and to sum the chi-square statistics of LD blocks for assessing the significance of the pathway by permutation tests. The method was proven robust with the type I error rate >1.6 times lower than other methods. Meanwhile, the method displays a higher power and is not biased by the pathway size. The applications to the GWAS summary statistics for schizophrenia and breast cancer indicate that the detected top pathways contain more genes close to associated SNPs than other methods. As a result, the method identified 17 and 12 significant pathways containing 20 and 21 novel associated genes, respectively for two diseases. The method is available online by http://sparks-lab.org/server/DGAT-path .
On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments
NASA Technical Reports Server (NTRS)
Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.
1997-01-01
Perhaps the most prevalent use of statistics in engineering design is through Taguchi's parameter and robust design -- using orthogonal arrays to compute signal-to-noise ratios in a process of design improvement. In our view, however, there is an equally exciting use of statistics in design that could become just as prevalent: it is the concept of metamodeling whereby statistical models are built to approximate detailed computer analysis codes. Although computers continue to get faster, analysis codes always seem to keep pace so that their computational time remains non-trivial. Through metamodeling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across different domains. In this paper we first review metamodeling techniques that encompass design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We discuss their existing applications in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of metamodeling techniques in given situations and how common pitfalls can be avoided.
Functional annotation of regulatory pathways.
Pandey, Jayesh; Koyutürk, Mehmet; Kim, Yohan; Szpankowski, Wojciech; Subramaniam, Shankar; Grama, Ananth
2007-07-01
Standardized annotations of biomolecules in interaction networks (e.g. Gene Ontology) provide comprehensive understanding of the function of individual molecules. Extending such annotations to pathways is a critical component of functional characterization of cellular signaling at the systems level. We propose a framework for projecting gene regulatory networks onto the space of functional attributes using multigraph models, with the objective of deriving statistically significant pathway annotations. We first demonstrate that annotations of pairwise interactions do not generalize to indirect relationships between processes. Motivated by this result, we formalize the problem of identifying statistically overrepresented pathways of functional attributes. We establish the hardness of this problem by demonstrating the non-monotonicity of common statistical significance measures. We propose a statistical model that emphasizes the modularity of a pathway, evaluating its significance based on the coupling of its building blocks. We complement the statistical model by an efficient algorithm and software, Narada, for computing significant pathways in large regulatory networks. Comprehensive results from our methods applied to the Escherichia coli transcription network demonstrate that our approach is effective in identifying known, as well as novel biological pathway annotations. Narada is implemented in Java and is available at http://www.cs.purdue.edu/homes/jpandey/narada/.
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.
Wang, Dan; Singhasemanon, Nan; Goh, Kean S
2016-11-15
Pesticides are routinely monitored in surface waters and resultant data are analyzed to assess whether their uses will damage aquatic eco-systems. However, the utility of the monitoring data is limited because of the insufficiency in the temporal and spatial sampling coverage and the inability to detect and quantify trace concentrations. This study developed a novel assessment procedure that addresses those limitations by combining 1) statistical methods capable of extracting information from concentrations below changing detection limits, 2) statistical resampling techniques that account for uncertainties rooted in the non-detects and insufficient/irregular sampling coverage, and 3) multiple lines of evidence that improve confidence in the final conclusion. This procedure was demonstrated by an assessment on chlorpyrifos monitoring data in surface waters of California's Central Valley (2005-2013). We detected a significant downward trend in the concentrations, which cannot be observed by commonly-used statistical approaches. We assessed that the aquatic risk was low using a probabilistic method that works with non-detects and has the ability to differentiate indicator groups with varying sensitivity. In addition, we showed that the frequency of exceedance over ambient aquatic life water quality criteria was affected by pesticide use, precipitation and irrigation demand in certain periods anteceding the water sampling events. Copyright © 2016 Elsevier B.V. All rights reserved.
Wiemken, Timothy L; Furmanek, Stephen P; Mattingly, William A; Wright, Marc-Oliver; Persaud, Annuradha K; Guinn, Brian E; Carrico, Ruth M; Arnold, Forest W; Ramirez, Julio A
2018-02-01
Although not all health care-associated infections (HAIs) are preventable, reducing HAIs through targeted intervention is key to a successful infection prevention program. To identify areas in need of targeted intervention, robust statistical methods must be used when analyzing surveillance data. The objective of this study was to compare and contrast statistical process control (SPC) charts with Twitter's anomaly and breakout detection algorithms. SPC and anomaly/breakout detection (ABD) charts were created for vancomycin-resistant Enterococcus, Acinetobacter baumannii, catheter-associated urinary tract infection, and central line-associated bloodstream infection data. Both SPC and ABD charts detected similar data points as anomalous/out of control on most charts. The vancomycin-resistant Enterococcus ABD chart detected an extra anomalous point that appeared to be higher than the same time period in prior years. Using a small subset of the central line-associated bloodstream infection data, the ABD chart was able to detect anomalies where the SPC chart was not. SPC charts and ABD charts both performed well, although ABD charts appeared to work better in the context of seasonal variation and autocorrelation. Because they account for common statistical issues in HAI data, ABD charts may be useful for practitioners for analysis of HAI surveillance data. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
Directions for new developments on statistical design and analysis of small population group trials.
Hilgers, Ralf-Dieter; Roes, Kit; Stallard, Nigel
2016-06-14
Most statistical design and analysis methods for clinical trials have been developed and evaluated where at least several hundreds of patients could be recruited. These methods may not be suitable to evaluate therapies if the sample size is unavoidably small, which is usually termed by small populations. The specific sample size cut off, where the standard methods fail, needs to be investigated. In this paper, the authors present their view on new developments for design and analysis of clinical trials in small population groups, where conventional statistical methods may be inappropriate, e.g., because of lack of power or poor adherence to asymptotic approximations due to sample size restrictions. Following the EMA/CHMP guideline on clinical trials in small populations, we consider directions for new developments in the area of statistical methodology for design and analysis of small population clinical trials. We relate the findings to the research activities of three projects, Asterix, IDeAl, and InSPiRe, which have received funding since 2013 within the FP7-HEALTH-2013-INNOVATION-1 framework of the EU. As not all aspects of the wide research area of small population clinical trials can be addressed, we focus on areas where we feel advances are needed and feasible. The general framework of the EMA/CHMP guideline on small population clinical trials stimulates a number of research areas. These serve as the basis for the three projects, Asterix, IDeAl, and InSPiRe, which use various approaches to develop new statistical methodology for design and analysis of small population clinical trials. Small population clinical trials refer to trials with a limited number of patients. Small populations may result form rare diseases or specific subtypes of more common diseases. New statistical methodology needs to be tailored to these specific situations. The main results from the three projects will constitute a useful toolbox for improved design and analysis of small population clinical trials. They address various challenges presented by the EMA/CHMP guideline as well as recent discussions about extrapolation. There is a need for involvement of the patients' perspective in the planning and conduct of small population clinical trials for a successful therapy evaluation.
Common Scientific and Statistical Errors in Obesity Research
George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.
2015-01-01
We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280
MIDAS: Regionally linear multivariate discriminative statistical mapping.
Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos
2018-07-01
Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. Critically, MIDAS efficiently assesses the statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
40 CFR Appendix C to Part 60 - Determination of Emission Rate Change
Code of Federal Regulations, 2011 CFR
2011-07-01
... emission rate to the atmosphere. The method used is the Student's t test, commonly used to make inferences.... EC01JN92.294 3.5 Calculate the test statistic, t, using Equation 4. EC01JN92.295 4.Results 4.1If E b>,E a... occurred. Table 1 Degrees of freedom (n a=n b−2) t′ (95 percent confidence level) 2 2.920 3 2.353 4 2.132 5...
KECSA-Movable Type Implicit Solvation Model (KMTISM)
2015-01-01
Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12). PMID:25691832
Gamarra, Soledad; Morano, Susana; Dudiuk, Catiana; Mancilla, Estefanía; Nardin, María Elena; de Los Angeles Méndez, Emilce; Garcia-Effron, Guillermo
2014-10-01
Vulvovaginal candidiasis is one of the most common mycosis. However, the information about antifungal susceptibilities of the yeasts causing this infection is scant. We studied 121 yeasts isolated from 118 patients with vulvovaginal candidiasis. The isolates were identified by phenotypic and molecular methods, including four phenotypic methods described to differentiate Candida albicans from C. dubliniensis. Antifungal susceptibility testing was performed according to CLSI documents M27A3 and M27S4 using the drugs available as treatment option in the hospital. Diabetes, any antibacterial and amoxicillin treatment were statistically linked with vulvovaginal candidiasis, while oral contraceptives were not considered a risk factor. Previous azole-based over-the-counter antifungal treatment was statistically associated with non-C.albicans yeasts infections. The most common isolated yeast species was C. albicans (85.2 %) followed by C. glabrata (5 %), Saccharomyces cerevisiae (3.3 %), and C. dubliniensis (2.5 %). Fluconazole- and itraconazole-reduced susceptibility was observed in ten and in only one C. albicans strains, respectively. All the C. glabrata isolates showed low fluconazole MICs. Clotrimazole showed excellent potency against all but seven isolates (three C. glabrata, two S. cerevisiae, one C. albicans and one Picchia anomala). Any of the strains showed nystatin reduced susceptibility. On the other hand, terbinafine was the less potent drug. Antifungal resistance is still a rare phenomenon supporting the use of azole antifungals as empirical treatment of vulvovaginal candidiasis.
Accounting for host cell protein behavior in anion-exchange chromatography.
Swanson, Ryan K; Xu, Ruo; Nettleton, Daniel S; Glatz, Charles E
2016-11-01
Host cell proteins (HCP) are a problematic set of impurities in downstream processing (DSP) as they behave most similarly to the target protein during separation. Approaching DSP with the knowledge of HCP separation behavior would be beneficial for the production of high purity recombinant biologics. Therefore, this work was aimed at characterizing the separation behavior of complex mixtures of HCP during a commonly used method: anion-exchange chromatography (AEX). An additional goal was to evaluate the performance of a statistical methodology, based on the characterization data, as a tool for predicting protein separation behavior. Aqueous two-phase partitioning followed by two-dimensional electrophoresis provided data on the three physicochemical properties most commonly exploited during DSP for each HCP: pI (isoelectric point), molecular weight, and surface hydrophobicity. The protein separation behaviors of two alternative expression host extracts (corn germ and E. coli) were characterized. A multivariate random forest (MVRF) statistical methodology was then applied to the database of characterized proteins creating a tool for predicting the AEX behavior of a mixture of proteins. The accuracy of the MVRF method was determined by calculating a root mean squared error value for each database. This measure never exceeded a value of 0.045 (fraction of protein populating each of the multiple separation fractions) for AEX. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:1453-1463, 2016. © 2016 American Institute of Chemical Engineers.
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
On sufficient statistics of least-squares superposition of vector sets.
Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M
2015-06-01
The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
A fast exact simulation method for a class of Markov jump processes.
Li, Yao; Hu, Lili
2015-11-14
A new method of the stochastic simulation algorithm (SSA), named the Hashing-Leaping method (HLM), for exact simulations of a class of Markov jump processes, is presented in this paper. The HLM has a conditional constant computational cost per event, which is independent of the number of exponential clocks in the Markov process. The main idea of the HLM is to repeatedly implement a hash-table-like bucket sort algorithm for all times of occurrence covered by a time step with length τ. This paper serves as an introduction to this new SSA method. We introduce the method, demonstrate its implementation, analyze its properties, and compare its performance with three other commonly used SSA methods in four examples. Our performance tests and CPU operation statistics show certain advantages of the HLM for large scale problems.
... Doing AMIGAS Stay Informed Cancer Home Uterine Cancer Statistics Language: English (US) Español (Spanish) Recommend on Facebook ... the most commonly diagnosed gynecologic cancer. U.S. Cancer Statistics Data Visualizations Tool The Data Visualizations tool makes ...
Sb2Te3 and Its Superlattices: Optimization by Statistical Design.
Behera, Jitendra K; Zhou, Xilin; Ranjan, Alok; Simpson, Robert E
2018-05-02
The objective of this work is to demonstrate the usefulness of fractional factorial design for optimizing the crystal quality of chalcogenide van der Waals (vdW) crystals. We statistically analyze the growth parameters of highly c axis oriented Sb 2 Te 3 crystals and Sb 2 Te 3 -GeTe phase change vdW heterostructured superlattices. The statistical significance of the growth parameters of temperature, pressure, power, buffer materials, and buffer layer thickness was found by fractional factorial design and response surface analysis. Temperature, pressure, power, and their second-order interactions are the major factors that significantly influence the quality of the crystals. Additionally, using tungsten rather than molybdenum as a buffer layer significantly enhances the crystal quality. Fractional factorial design minimizes the number of experiments that are necessary to find the optimal growth conditions, resulting in an order of magnitude improvement in the crystal quality. We highlight that statistical design of experiment methods, which is more commonly used in product design, should be considered more broadly by those designing and optimizing materials.
Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai
2014-11-10
Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least square estimate and logarithmic transformation with Mantel-Haenszel estimate are recommended as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at their peak productivity can provide crucial information regarding the functioning and productivity of the rangelands. Hyperspectral remote sensing has proved to be valuable for estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to large amount of correlated hyper-spectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squared Support Vector Machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods were assessed using cross validated R2 and RMSE. The best model performance was obtained using LS_SVM and then PLSR both calibrated with first derivative reflectance dataset with R2cv = 0.88 & 0.86 and RMSEcv= 1.15 & 1.07 respectively. The weakest prediction accuracy was appeared when PCR were used (R2cv = 0.31 and RMSEcv= 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
Improved sampling and analysis of images in corneal confocal microscopy.
Schaldemose, E L; Fontain, F I; Karlsson, P; Nyengaard, J R
2017-10-01
Corneal confocal microscopy (CCM) is a noninvasive clinical method to analyse and quantify corneal nerve fibres in vivo. Although the CCM technique is in constant progress, there are methodological limitations in terms of sampling of images and objectivity of the nerve quantification. The aim of this study was to present a randomized sampling method of the CCM images and to develop an adjusted area-dependent image analysis. Furthermore, a manual nerve fibre analysis method was compared to a fully automated method. 23 idiopathic small-fibre neuropathy patients were investigated using CCM. Corneal nerve fibre length density (CNFL) and corneal nerve fibre branch density (CNBD) were determined in both a manual and automatic manner. Differences in CNFL and CNBD between (1) the randomized and the most common sampling method, (2) the adjusted and the unadjusted area and (3) the manual and automated quantification method were investigated. The CNFL values were significantly lower when using the randomized sampling method compared to the most common method (p = 0.01). There was not a statistical significant difference in the CNBD values between the randomized and the most common sampling method (p = 0.85). CNFL and CNBD values were increased when using the adjusted area compared to the standard area. Additionally, the study found a significant increase in the CNFL and CNBD values when using the manual method compared to the automatic method (p ≤ 0.001). The study demonstrated a significant difference in the CNFL values between the randomized and common sampling method indicating the importance of clear guidelines for the image sampling. The increase in CNFL and CNBD values when using the adjusted cornea area is not surprising. The observed increases in both CNFL and CNBD values when using the manual method of nerve quantification compared to the automatic method are consistent with earlier findings. This study underlines the importance of improving the analysis of the CCM images in order to obtain more objective corneal nerve fibre measurements. © 2017 The Authors Journal of Microscopy © 2017 Royal Microscopical Society.
Wang, Yikai; Kang, Jian; Kemmer, Phebe B.; Guo, Ying
2016-01-01
Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package “DensParcorr” can be downloaded from CRAN for implementing the proposed statistical methods. PMID:27242395
Wang, Yikai; Kang, Jian; Kemmer, Phebe B; Guo, Ying
2016-01-01
Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package "DensParcorr" can be downloaded from CRAN for implementing the proposed statistical methods.
Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret
2018-01-01
Hierarchical Bayes models have been used in disease mapping to examine small scale geographic variation. State level geographic variation for less common causes of mortality outcomes have been reported however county level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed based on Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality outcomes such as suicide rates (SRs) at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality outcomes to enable examination of spatio-temporal variations on smaller geographic scales such as counties. This method allows examination of spatiotemporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatiotemporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county level SRs. Model-based estimates of SRs were mapped to explore geographic variation.