Sample records for multiple statistical methods

  1. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  2. Statistical Methods in Integrative Genomics

    PubMed Central

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  3. Common pitfalls in statistical analysis: The perils of multiple testing

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2016-01-01

    Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478

  4. Analysis Code - Data Analysis in 'Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications' (LMSMIPNFA) v. 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R

    R code that performs the analysis of a data set presented in the paper ‘Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications’ by Lewis, J., Zhang, A., Anderson-Cook, C. It provides functions for doing inverse predictions in this setting using several different statistical methods. The data set is a publicly available data set from a historical Plutonium production experiment.

  5. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies.

    PubMed

    Jiang, Wei; Yu, Weichuan

    2017-02-15

    In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html . eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  6. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of low minor allele frequency of rare variant, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.

  7. Multiple testing and power calculations in genetic association studies.

    PubMed

    So, Hon-Cheong; Sham, Pak C

    2011-01-01

    Modern genetic association studies typically involve multiple single-nucleotide polymorphisms (SNPs) and/or multiple genes. With the development of high-throughput genotyping technologies and the reduction in genotyping cost, investigators can now assay up to a million SNPs for direct or indirect association with disease phenotypes. In addition, some studies involve multiple disease or related phenotypes and use multiple methods of statistical analysis. The combination of multiple genetic loci, multiple phenotypes, and multiple methods of evaluating associations between genotype and phenotype means that modern genetic studies often involve the testing of an enormous number of hypotheses. When multiple hypothesis tests are performed in a study, there is a risk of inflation of the type I error rate (i.e., the chance of falsely claiming an association when there is none). Several methods for multiple-testing correction are in popular use, and they all have strengths and weaknesses. Because no single method is universally adopted or always appropriate, it is important to understand the principles, strengths, and weaknesses of the methods so that they can be applied appropriately in practice. In this article, we review the three principle methods for multiple-testing correction and provide guidance for calculating statistical power.

  8. An Illustration to Assist in Comparing and Remembering Several Multiplicity Adjustment Methods

    ERIC Educational Resources Information Center

    Hasler, Mario

    2017-01-01

    There are many well-known or new methods to adjust statistical tests for multiplicity. This article provides an illustration helping lecturers or consultants to remember the differences of three important multiplicity adjustment methods and to explain them to non-statisticians.

  9. Estimating the mass variance in neutron multiplicity counting-A comparison of approaches

    NASA Astrophysics Data System (ADS)

    Dubi, C.; Croft, S.; Favalli, A.; Ocherashvili, A.; Pedersen, B.

    2017-12-01

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α , n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  10. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubi, C.; Croft, S.; Favalli, A.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  11. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE PAGES

    Dubi, C.; Croft, S.; Favalli, A.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  12. Estimating weak ratiometric signals in imaging data. II. Meta-analysis with multiple, dual-channel datasets.

    PubMed

    Sornborger, Andrew; Broder, Josef; Majumder, Anirban; Srinivasamoorthy, Ganesh; Porter, Erika; Reagin, Sean S; Keith, Charles; Lauderdale, James D

    2008-09-01

    Ratiometric fluorescent indicators are used for making quantitative measurements of a variety of physiological variables. Their utility is often limited by noise. This is the second in a series of papers describing statistical methods for denoising ratiometric data with the aim of obtaining improved quantitative estimates of variables of interest. Here, we outline a statistical optimization method that is designed for the analysis of ratiometric imaging data in which multiple measurements have been taken of systems responding to the same stimulation protocol. This method takes advantage of correlated information across multiple datasets for objectively detecting and estimating ratiometric signals. We demonstrate our method by showing results of its application on multiple, ratiometric calcium imaging experiments.

  13. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  14. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.

  15. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.

    PubMed

    Kim, Junghi; Bai, Yun; Pan, Wei

    2015-12-01

    We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods. © 2015 WILEY PERIODICALS, INC.

  16. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. While there are multiple clusters coexisting in the study area, they become difficult to detect because of clusters' shadowing effect to each other. The recently proposed sequential method showed its better power for detecting the second weaker cluster, but did not improve the ability of detecting the first stronger cluster which is more important than the second one. We propose a new extension of the spatial scan statistic which could be used to detect multiple clusters. Through constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detecting and evaluating process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In the real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method, which cannot be evaluated to be statistically significant by the standard method due to another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.

  17. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391

  18. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  19. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  1. Data-optimized source modeling with the Backwards Liouville Test–Kinetic method

    DOE PAGES

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  2. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579

  3. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330

  4. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis

    PubMed Central

    Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876

  5. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data that incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175

  6. Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

    DOE PAGES

    Lewis, John R.; Zhang, Adah; Anderson-Cook, Christine Michaela

    2017-10-29

    Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors ( X) of some underlying causal model producing the observables or responses (Y = g ( X) + error). Here, this paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research inmore » which inverse predictions, along with an assessment of predictive capability, are desired.« less

  7. Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R.; Zhang, Adah; Anderson-Cook, Christine Michaela

    Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors ( X) of some underlying causal model producing the observables or responses (Y = g ( X) + error). Here, this paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research inmore » which inverse predictions, along with an assessment of predictive capability, are desired.« less

  8. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.

  9. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  10. Statistical innovations in diagnostic device evaluation.

    PubMed

    Yu, Tinghui; Li, Qin; Gray, Gerry; Yue, Lilly Q

    2016-01-01

    Due to rapid technological development, innovations in diagnostic devices are proceeding at an extremely fast pace. Accordingly, the needs for adopting innovative statistical methods have emerged in the evaluation of diagnostic devices. Statisticians in the Center for Devices and Radiological Health at the Food and Drug Administration have provided leadership in implementing statistical innovations. The innovations discussed in this article include: the adoption of bootstrap and Jackknife methods, the implementation of appropriate multiple reader multiple case study design, the application of robustness analyses for missing data, and the development of study designs and data analyses for companion diagnostics.

  11. The Research of Multiple Attenuation Based on Feedback Iteration and Independent Component Analysis

    NASA Astrophysics Data System (ADS)

    Xu, X.; Tong, S.; Wang, L.

    2017-12-01

    How to solve the problem of multiple suppression is a difficult problem in seismic data processing. The traditional technology for multiple attenuation is based on the principle of the minimum output energy of the seismic signal, this criterion is based on the second order statistics, and it can't achieve the multiple attenuation when the primaries and multiples are non-orthogonal. In order to solve the above problems, we combine the feedback iteration method based on the wave equation and the improved independent component analysis (ICA) based on high order statistics to suppress the multiple waves. We first use iterative feedback method to predict the free surface multiples of each order. Then, in order to predict multiples from real multiple in amplitude and phase, we design an expanded pseudo multi-channel matching filtering method to get a more accurate matching multiple result. Finally, we present the improved fast ICA algorithm which is based on the maximum non-Gauss criterion of output signal to the matching multiples and get better separation results of the primaries and the multiples. The advantage of our method is that we don't need any priori information to the prediction of the multiples, and can have a better separation result. The method has been applied to several synthetic data generated by finite-difference model technique and the Sigsbee2B model multiple data, the primaries and multiples are non-orthogonal in these models. The experiments show that after three to four iterations, we can get the perfect multiple results. Using our matching method and Fast ICA adaptive multiple subtraction, we can not only effectively preserve the effective wave energy in seismic records, but also can effectively suppress the free surface multiples, especially the multiples related to the middle and deep areas.

  12. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.

  13. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple-even distinct-traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10(-8)) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  14. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis.

    PubMed

    Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.

  15. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  16. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…

  17. Field Penetration in a Rectangular Box Using Numerical Techniques: An Effort to Obtain Statistical Shielding Effectiveness

    NASA Technical Reports Server (NTRS)

    Bunting, Charles F.; Yu, Shih-Pin

    2006-01-01

    This paper emphasizes the application of numerical methods to explore the ideas related to shielding effectiveness from a statistical view. An empty rectangular box is examined using a hybrid modal/moment method. The basic computational method is presented followed by the results for single- and multiple observation points within the over-moded empty structure. The statistics of the field are obtained by using frequency stirring, borrowed from the ideas connected with reverberation chamber techniques, and extends the ideas of shielding effectiveness well into the multiple resonance regions. The study presented in this paper will address the average shielding effectiveness over a broad spatial sample within the enclosure as the frequency is varied.

  18. Multiple Contact Dates and SARS Incubation Periods

    PubMed Central

    2004-01-01

    Many severe acute respiratory syndrome (SARS) patients have multiple possible incubation periods due to multiple contact dates. Multiple contact dates cannot be used in standard statistical analytic techniques, however. I present a simple spreadsheet-based method that uses multiple contact dates to calculate the possible incubation periods of SARS. PMID:15030684

  19. P values in display items are ubiquitous and almost invariably significant: A survey of top science journals

    PubMed Central

    Cristea, Ioana Alina

    2018-01-01

    P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often containing the main results, are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and, respectively, in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%) and rarely (0.7%) articles relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome. PMID:29763472

  20. P values in display items are ubiquitous and almost invariably significant: A survey of top science journals.

    PubMed

    Cristea, Ioana Alina; Ioannidis, John P A

    2018-01-01

    P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often containing the main results, are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and, respectively, in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%) and rarely (0.7%) articles relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.

  1. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  2. Statistical technique for analysing functional connectivity of multiple spike trains.

    PubMed

    Masud, Mohammad Shahed; Borisyuk, Roman

    2011-03-15

    A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains. Copyright © 2011 Elsevier B.V. All rights reserved.

  3. Towards sound epistemological foundations of statistical methods for high-dimensional biology.

    PubMed

    Mehta, Tapan; Tanik, Murat; Allison, David B

    2004-09-01

    A sound epistemological foundation for biological inquiry comes, in part, from application of valid statistical procedures. This tenet is widely appreciated by scientists studying the new realm of high-dimensional biology, or 'omic' research, which involves multiplicity at unprecedented scales. Many papers aimed at the high-dimensional biology community describe the development or application of statistical techniques. The validity of many of these is questionable, and a shared understanding about the epistemological foundations of the statistical methods themselves seems to be lacking. Here we offer a framework in which the epistemological foundation of proposed statistical methods can be evaluated.

  4. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

    PubMed

    Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi

    2017-01-01

    High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information CONTACT: sltaylor@ucdavis.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Statistical methods used in articles published by the Journal of Periodontal and Implant Science.

    PubMed

    Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young

    2014-12-01

    The purposes of this study were to assess the trend of use of statistical methods including parametric and nonparametric methods and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies and thus, concluded that increased use of complex statistical methodology might be preferred by the researchers in the fields of study covered by JPIS.

  6. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    ERIC Educational Resources Information Center

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  7. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  8. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  9. Resampling-based methods in single and multiple testing for equality of covariance/correlation matrices.

    PubMed

    Yang, Yang; DeGruttola, Victor

    2012-06-22

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

  10. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.

    PubMed

    Kwak, Il-Youp; Pan, Wei

    2017-01-01

    To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in R package aSPU, freely and publicly available at: https://cran.r-project.org/web/packages/aSPU/ CONTACT: weip@biostat.umn.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. Using Artificial Neural Networks in Educational Research: Some Comparisons with Linear Statistical Models.

    ERIC Educational Resources Information Center

    Everson, Howard T.; And Others

    This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANNs and AIMS methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…

  12. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    PubMed Central

    Han, Buhm; Kang, Hyun Min; Eskin, Eleazar

    2009-01-01

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255

  13. [Comparison between rapid detection method of enzyme substrate technique and multiple-tube fermentation technique in water coliform bacteria detection].

    PubMed

    Sun, Zong-ke; Wu, Rong; Ding, Pei; Xue, Jin-Rong

    2006-07-01

    To compare between rapid detection method of enzyme substrate technique and multiple-tube fermentation technique in water coliform bacteria detection. Using inoculated and real water samples to compare the equivalence and false positive rate between two methods. Results demonstrate that enzyme substrate technique shows equivalence with multiple-tube fermentation technique (P = 0.059), false positive rate between the two methods has no statistical difference. It is suggested that enzyme substrate technique can be used as a standard method for water microbiological safety evaluation.

  14. Statistical reconstruction for cosmic ray muon tomography.

    PubMed

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm2 per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.

  15. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate with the example of a secondary data analysis study the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiple imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiple imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.

  16. Multiple Kernel Learning with Random Effects for Predicting Longitudinal Outcomes and Data Integration

    PubMed Central

    Chen, Tianle; Zeng, Donglin

    2015-01-01

    Summary Predicting disease risk and progression is one of the main goals in many clinical research studies. Cohort studies on the natural history and etiology of chronic diseases span years and data are collected at multiple visits. Although kernel-based statistical learning methods are proven to be powerful for a wide range of disease prediction problems, these methods are only well studied for independent data but not for longitudinal data. It is thus important to develop time-sensitive prediction rules that make use of the longitudinal nature of the data. In this paper, we develop a novel statistical learning method for longitudinal data by introducing subject-specific short-term and long-term latent effects through a designed kernel to account for within-subject correlation of longitudinal measurements. Since the presence of multiple sources of data is increasingly common, we embed our method in a multiple kernel learning framework and propose a regularized multiple kernel statistical learning with random effects to construct effective nonparametric prediction rules. Our method allows easy integration of various heterogeneous data sources and takes advantage of correlation among longitudinal measures to increase prediction power. We use different kernels for each data source taking advantage of the distinctive feature of each data modality, and then optimally combine data across modalities. We apply the developed methods to two large epidemiological studies, one on Huntington's disease and the other on Alzheimer's Disease (Alzheimer's Disease Neuroimaging Initiative, ADNI) where we explore a unique opportunity to combine imaging and genetic data to study prediction of mild cognitive impairment, and show a substantial gain in performance while accounting for the longitudinal aspect of the data. PMID:26177419

  17. Accounting for Multiple Births in Neonatal and Perinatal Trials: Systematic Review and Case Study

    PubMed Central

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-01-01

    Objectives To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births. To explore the sensitivity of an actual trial to several analytic approaches to multiples. Methods A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The NO CLD trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using non-clustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. Results In the systematic review, most studies did not describe the randomization of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (p<0.01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. Conclusions The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. PMID:19969305

  18. Statistical testing and power analysis for brain-wide association study.

    PubMed

    Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

    2018-04-05

    The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, the multiple correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on the Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis testings using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need of non-parametric permutation to correct for multiple comparison, thus, it can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Design and analysis of multiple diseases genome-wide association studies without controls.

    PubMed

    Chen, Zhongxue; Huang, Hanwen; Ng, Hon Keung Tony

    2012-11-15

    In genome-wide association studies (GWAS), multiple diseases with shared controls is one of the case-control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages such as improving statistical power in detecting associations and reducing the time and cost in the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but without controls. We also propose corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

    PubMed

    Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao

    2015-06-15

    ChIP-seq is a powerful technology to measure the protein binding or histone modification strength in the whole genome scale. Although there are a number of methods available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical method for quantitative comparison of multiple ChIP-seq datasets with the considerations of data from control experiment, signal to noise ratios, biological variations and multiple-factor experimental designs is under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then union them to form a single set of candidate regions. The read counts from IP experiment at the candidate regions are assumed to follow Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through the hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results compared with existing ones. An R software package ChIPComp is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Multiple Chronic Conditions among Adults Aged 45 and Over: Trends Over the Past 10 Years

    MedlinePlus

    ... on Vital and Health Statistics Annual Reports Health Survey Research Methods Conference Reports from the National Medical Care Utilization and Expenditure Survey Clearinghouse on Health Indexes Statistical Notes for Health ...

  2. Beyond Multiple Regression: Using Commonality Analysis to Better Understand R[superscript 2] Results

    ERIC Educational Resources Information Center

    Warne, Russell T.

    2011-01-01

    Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…

  3. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  4. United States Census 2000 Population with Bridged Race Categories. Vital and Health Statistics. Data Evaluation and Methods Research.

    ERIC Educational Resources Information Center

    Ingram, Deborah D.; Parker, Jennifer D.; Schenker, Nathaniel; Weed, James A.; Hamilton, Brady; Arias, Elizabeth; Madans, Jennifer H.

    This report documents the National Center for Health Statistics' (NCHS) methods for bridging the Census 2000 multiple-race resident population to single-race categories and describing bridged race resident population estimates. Data came from the pooled 1997-2000 National Health Interview Surveys. The bridging models included demographic and…

  5. A new statistical method for transfer coefficient calculations in the framework of the general multiple-compartment model of transport for radionuclides in biological systems.

    PubMed

    Garcia, F; Arruda-Neto, J D; Manso, M V; Helene, O M; Vanin, V R; Rodriguez, O; Mesa, J; Likhachev, V P; Filho, J W; Deppman, A; Perez, G; Guzman, F; de Camargo, S P

    1999-10-01

    A new and simple statistical procedure (STATFLUX) for the calculation of transfer coefficients of radionuclide transport to animals and plants is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. By using experimentally available curves of radionuclide concentrations versus time, for each animal compartment (organs), flow parameters were estimated by employing a least-squares procedure, whose consistency is tested. Some numerical results are presented in order to compare the STATFLUX transfer coefficients with those from other works and experimental data.

  6. Multi-template tensor-based morphometry: Application to analysis of Alzheimer's disease

    PubMed Central

    Koikkalainen, Juha; Lötjönen, Jyrki; Thurfjell, Lennart; Rueckert, Daniel; Waldemar, Gunhild; Soininen, Hilkka

    2012-01-01

    In this paper methods for using multiple templates in tensor-based morphometry (TBM) are presented and comparedtothe conventional single-template approach. TBM analysis requires non-rigid registrations which are often subject to registration errors. When using multiple templates and, therefore, multiple registrations, it can be assumed that the registration errors are averaged and eventually compensated. Four different methods are proposed for multi-template TBM. The methods were evaluated using magnetic resonance (MR) images of healthy controls, patients with stable or progressive mild cognitive impairment (MCI), and patients with Alzheimer's disease (AD) from the ADNI database (N=772). The performance of TBM features in classifying images was evaluated both quantitatively and qualitatively. Classification results show that the multi-template methods are statistically significantly better than the single-template method. The overall classification accuracy was 86.0% for the classification of control and AD subjects, and 72.1%for the classification of stable and progressive MCI subjects. The statistical group-level difference maps produced using multi-template TBM were smoother, formed larger continuous regions, and had larger t-values than the maps obtained with single-template TBM. PMID:21419228

  7. Applied statistics in agricultural, biological, and environmental sciences.

    USDA-ARS?s Scientific Manuscript database

    Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...

  8. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models

    PubMed Central

    Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong

    2017-01-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take the advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F -distributions based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696

  9. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models.

    PubMed

    Chiu, Chi-Yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-Ling; Xiong, Momiao; Fan, Ruzong

    2017-02-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take the advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F -distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.

  10. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  11. Statistical Prediction in Proprietary Rehabilitation.

    ERIC Educational Resources Information Center

    Johnson, Kurt L.; And Others

    1987-01-01

    Applied statistical methods to predict case expenditures for low back pain rehabilitation cases in proprietary rehabilitation. Extracted predictor variables from case records of 175 workers compensation claimants with some degree of permanent disability due to back injury. Performed several multiple regression analyses resulting in a formula that…

  12. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types which are not often aware in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features are firstly extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are used to predict the perceived video quality via the efficient linear support vector regression (SVR) model afterwards. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in 3DDCT domain which has the inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.

  13. Statistical hadronization and microcanonical ensemble

    DOE PAGES

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the microcanonical ensemble of the of the ideal hadron-resonance gas including all known states up to a mass of 1. 8 GeV, taking into account quantum statistics. The computing method is a development of a previous one based on a Metropolis Monte Carlo algorithm, with a the grand-canonical limit of the multi-species multiplicity distribution as proposal matrix. The microcanonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy. This algorithm opens the way for event generators based for themore » statistical hadronization model.« less

  14. Use of personalized Dynamic Treatment Regimes (DTRs) and Sequential Multiple Assignment Randomized Trials (SMARTs) in mental health studies

    PubMed Central

    Liu, Ying; ZENG, Donglin; WANG, Yuanjia

    2014-01-01

    Summary Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each point where a clinical decision is made based on each patient’s time-varying characteristics and intermediate outcomes observed at earlier points in time. The complexity, patient heterogeneity, and chronicity of mental disorders call for learning optimal DTRs to dynamically adapt treatment to an individual’s response over time. The Sequential Multiple Assignment Randomized Trial (SMARTs) design allows for estimating causal effects of DTRs. Modern statistical tools have been developed to optimize DTRs based on personalized variables and intermediate outcomes using rich data collected from SMARTs; these statistical methods can also be used to recommend tailoring variables for designing future SMART studies. This paper introduces DTRs and SMARTs using two examples in mental health studies, discusses two machine learning methods for estimating optimal DTR from SMARTs data, and demonstrates the performance of the statistical methods using simulated data. PMID:25642116

  15. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  16. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  17. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  18. A retrospective likelihood approach for efficient integration of multiple omics factors in case-control association studies.

    PubMed

    Balliu, Brunilda; Tsonaka, Roula; Boehringer, Stefan; Houwing-Duistermaat, Jeanine

    2015-03-01

    Integrative omics, the joint analysis of outcome and multiple types of omics data, such as genomics, epigenomics, and transcriptomics data, constitute a promising approach for powerful and biologically relevant association studies. These studies often employ a case-control design, and often include nonomics covariates, such as age and gender, that may modify the underlying omics risk factors. An open question is how to best integrate multiple omics and nonomics information to maximize statistical power in case-control studies that ascertain individuals based on the phenotype. Recent work on integrative omics have used prospective approaches, modeling case-control status conditional on omics, and nonomics risk factors. Compared to univariate approaches, jointly analyzing multiple risk factors with a prospective approach increases power in nonascertained cohorts. However, these prospective approaches often lose power in case-control studies. In this article, we propose a novel statistical method for integrating multiple omics and nonomics factors in case-control association studies. Our method is based on a retrospective likelihood function that models the joint distribution of omics and nonomics factors conditional on case-control status. The new method provides accurate control of Type I error rate and has increased efficiency over prospective approaches in both simulated and real data. © 2015 Wiley Periodicals, Inc.

  19. Data Analysis Techniques for Physical Scientists

    NASA Astrophysics Data System (ADS)

    Pruneau, Claude A.

    2017-10-01

    Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.

  20. Assimilating Flow Data into Complex Multiple-Point Statistical Facies Models Using Pilot Points Method

    NASA Astrophysics Data System (ADS)

    Ma, W.; Jafarpour, B.

    2017-12-01

    We develop a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, their calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information:: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) and its multiple data assimilation variant (ES-MDA) are adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at select locations away from the wells and the latter to ensure consistent facies structure and connectivity where away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  1. Can the Site-Frequency Spectrum Distinguish Exponential Population Growth from Multiple-Merger Coalescents?

    PubMed Central

    Eldon, Bjarki; Birkner, Matthias; Blath, Jochen; Freund, Fabian

    2015-01-01

    The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories. PMID:25575536

  2. Analysing and correcting the differences between multi-source and multi-scale spatial remote sensing observations.

    PubMed

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in same time period, and processed by same algorithms, models or methods. These differences can be mainly quantitatively described from three aspects, i.e. multiple remote sensing observations, crop parameters estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural application with multiple remotely sensed observations from different sources. The new method was constructed on the basis of physical and mathematical properties of multi-source and multi-scale reflectance datasets. Theories of statistics were involved to extract statistical characteristics of multiple surface reflectance datasets, and further quantitatively analyse spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at small spatial scale as the baseline data, theories of Gaussian distribution were selected for multiple surface reflectance datasets correction based on the above obtained physical characteristics and mathematical distribution properties, and their spatial variations. This proposed method was verified by two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China with different degrees of homogeneity of underlying surfaces. Experimental results indicate that differences of surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, which provide database for further multi-source and multi-scale crop growth monitoring and yield prediction, and their corresponding consistency analysis evaluation.

  3. Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

    PubMed Central

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in same time period, and processed by same algorithms, models or methods. These differences can be mainly quantitatively described from three aspects, i.e. multiple remote sensing observations, crop parameters estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural application with multiple remotely sensed observations from different sources. The new method was constructed on the basis of physical and mathematical properties of multi-source and multi-scale reflectance datasets. Theories of statistics were involved to extract statistical characteristics of multiple surface reflectance datasets, and further quantitatively analyse spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at small spatial scale as the baseline data, theories of Gaussian distribution were selected for multiple surface reflectance datasets correction based on the above obtained physical characteristics and mathematical distribution properties, and their spatial variations. This proposed method was verified by two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China with different degrees of homogeneity of underlying surfaces. Experimental results indicate that differences of surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, which provide database for further multi-source and multi-scale crop growth monitoring and yield prediction, and their corresponding consistency analysis evaluation. PMID:25405760

  4. Identifying reprioritization response shift in a stroke caregiver population: a comparison of missing data methods.

    PubMed

    Sajobi, Tolulope T; Lix, Lisa M; Singh, Gurbakhshash; Lowerison, Mark; Engbers, Jordan; Mayo, Nancy E

    2015-03-01

    Response shift (RS) is an important phenomenon that influences the assessment of longitudinal changes in health-related quality of life (HRQOL) studies. Given that RS effects are often small, missing data due to attrition or item non-response can contribute to failure to detect RS effects. Since missing data are often encountered in longitudinal HRQOL data, effective strategies to deal with missing data are important to consider. This study aims to compare different imputation methods on the detection of reprioritization RS in the HRQOL of caregivers of stroke survivors. Data were from a Canadian multi-center longitudinal study of caregivers of stroke survivors over a one-year period. The Stroke Impact Scale physical function score at baseline, with a cutoff of 75, was used to measure patient stroke severity for the reprioritization RS analysis. Mean imputation, likelihood-based expectation-maximization imputation, and multiple imputation methods were compared in test procedures based on changes in relative importance weights to detect RS in SF-36 domains over a 6-month period. Monte Carlo simulation methods were used to compare the statistical powers of relative importance test procedures for detecting RS in incomplete longitudinal data under different missing data mechanisms and imputation methods. Of the 409 caregivers, 15.9 and 31.3 % of them had missing data at baseline and 6 months, respectively. There were no statistically significant changes in relative importance weights on any of the domains when complete-case analysis was adopted. But statistical significant changes were detected on physical functioning and/or vitality domains when mean imputation or EM imputation was adopted. There were also statistically significant changes in relative importance weights for physical functioning, mental health, and vitality domains when multiple imputation method was adopted. Our simulations revealed that relative importance test procedures were least powerful under complete-case analysis method and most powerful when a mean imputation or multiple imputation method was adopted for missing data, regardless of the missing data mechanism and proportion of missing data. Test procedures based on relative importance measures are sensitive to the type and amount of missing data and imputation method. Relative importance test procedures based on mean imputation and multiple imputation are recommended for detecting RS in incomplete data.

  5. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

  6. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses. PMID:21488652

  7. Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

    PubMed Central

    Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

    2015-01-01

    Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855

  8. Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.

    PubMed

    Kim, Yuneung; Lim, Johan; Park, DoHwan

    2015-11-01

    In this paper, we study a nonparametric procedure to test independence of bivariate interval censored data; for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do it, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and disconcordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Analysis and Interpretation of Findings Using Multiple Regression Techniques

    ERIC Educational Resources Information Center

    Hoyt, William T.; Leierer, Stephen; Millington, Michael J.

    2006-01-01

    Multiple regression and correlation (MRC) methods form a flexible family of statistical techniques that can address a wide variety of different types of research questions of interest to rehabilitation professionals. In this article, we review basic concepts and terms, with an emphasis on interpretation of findings relevant to research questions…

  10. Investigating the Stability of Four Methods for Estimating Item Bias.

    ERIC Educational Resources Information Center

    Perlman, Carole L.; And Others

    The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…

  11. A functional U-statistic method for association analysis of sequencing data.

    PubMed

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  12. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.

  13. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600

  14. A spatial scan statistic for compound Poisson data.

    PubMed

    Rosychuk, Rhonda J; Chang, Hsing-Ming

    2013-12-20

    The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.

  15. Detecting anomalies in CMB maps: a new method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com

    2015-10-01

    Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we ''hand-pick'' the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics aremore » linear and quadratic combinations of the a{sub ℓ m}'s with random co-efficients, and they test the null hypothesis that the a{sub ℓ m}'s are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the ''coherence'' of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are thus much more generic than those that have been hitherto considered in literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.« less

  16. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

    PubMed

    Heskes, Tom; Eisinga, Rob; Breitling, Rainer

    2014-11-21

    The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issue. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .

  17. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less

  18. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less

  19. Content and Method in the Teaching of Marketing Research Revisited

    ERIC Educational Resources Information Center

    Wilson, Holt; Neeley, Concha; Niedzwiecki, Kelly

    2009-01-01

    This paper presents the findings from a survey of marketing research faculty. The study finds SPSS is the most used statistical software, that cross tabulation, single, independent, and dependent t-tests, and ANOVA are among the most important statistical tools according to respondents. Bivariate and multiple regression are also considered…

  20. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  1. A new method to address verification bias in studies of clinical screening tests: cervical cancer screening assays as an example.

    PubMed

    Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D

    2014-03-01

    Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. A basket two-part model to analyze medical expenditure on interdependent multiple sectors.

    PubMed

    Sugawara, Shinya; Wu, Tianyi; Yamanishi, Kenji

    2018-05-01

    This study proposes a novel statistical methodology to analyze expenditure on multiple medical sectors using consumer data. Conventionally, medical expenditure has been analyzed by two-part models, which separately consider purchase decision and amount of expenditure. We extend the traditional two-part models by adding the step of basket analysis for dimension reduction. This new step enables us to analyze complicated interdependence between multiple sectors without an identification problem. As an empirical application for the proposed method, we analyze data of 13 medical sectors from the Medical Expenditure Panel Survey. In comparison with the results of previous studies that analyzed the multiple sector independently, our method provides more detailed implications of the impacts of individual socioeconomic status on the composition of joint purchases from multiple medical sectors; our method has a better prediction performance.

  3. A location-based multiple point statistics method: modelling the reservoir with non-stationary characteristics

    NASA Astrophysics Data System (ADS)

    Yin, Yanshu; Feng, Wenjie

    2017-12-01

    In this paper, a location-based multiple point statistics method is developed to model a non-stationary reservoir. The proposed method characterizes the relationship between the sedimentary pattern and the deposit location using the relative central position distance function, which alleviates the requirement that the training image and the simulated grids have the same dimension. The weights in every direction of the distance function can be changed to characterize the reservoir heterogeneity in various directions. The local integral replacements of data events, structured random path, distance tolerance and multi-grid strategy are applied to reproduce the sedimentary patterns and obtain a more realistic result. This method is compared with the traditional Snesim method using a synthesized 3-D training image of Poyang Lake and a reservoir model of Shengli Oilfield in China. The results indicate that the new method can reproduce the non-stationary characteristics better than the traditional method and is more suitable for simulation of delta-front deposits. These results show that the new method is a powerful tool for modelling a reservoir with non-stationary characteristics.

  4. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.

  5. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    PubMed Central

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  6. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    ERIC Educational Resources Information Center

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  7. Utilizing the Zero-One Linear Programming Constraints to Draw Multiple Sets of Matched Samples from a Non-Treatment Population as Control Groups for the Quasi-Experimental Design

    ERIC Educational Resources Information Center

    Li, Yuan H.; Yang, Yu N.; Tompkins, Leroy J.; Modarresi, Shahpar

    2005-01-01

    The statistical technique, "Zero-One Linear Programming," that has successfully been used to create multiple tests with similar characteristics (e.g., item difficulties, test information and test specifications) in the area of educational measurement, was deemed to be a suitable method for creating multiple sets of matched samples to be…

  8. Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

    PubMed Central

    Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

    2011-01-01

    Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957

  9. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoter in human genome play an important role in DNA sequence analysis. Entropy, in Shannon sense, of information theory is a multiple utility in bioinformatic details analysis. The relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context feature and use a set of methods of SD to select the most effective n-mers distinguishing promoter regions from other DNA regions in human genome. Extracted from the total possible combinations of n-mers, we can get four sparse distributions based on promoter and non-promoters training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specially, we combine the advantage of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep feature for promoter recognition. And then we apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. Framework is flexible that it can integrate new feature extraction or new classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper covers the performed evaluation of frequency with which the statistical methods were applied in analyzed works having been published in six selected, national medical journals in the years 1988-1992. For analysis the following journals were chosen, namely: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. Appropriate number of works up to the average in the remaining medical journals was randomly selected from respective volumes of Pol. Tyg. Lek. The studies did not include works wherein the statistical analysis was not implemented, which referred both to national and international publications. That exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographies, reports from scientific congresses, as well as papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish the mode of finding out a suitable sample in respective studies, differentiating two categories: random and target selections. Attention was also paid to the presence of control sample in the individual works. In the analysis attention was also focussed on the existence of sample characteristics, setting up three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). Analysis was accomplished with regard to the rate of employing statistical methods in analyzed works in relevant volumes of six selected, national medical journals for the years 1988-1992, simultaneously determining the number of works, in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as most important methods of mathematical statistics such as parametric tests of significance, analysis of variance (in single and dual classifications). non-parametric tests of significance, correlation and regression. The works, in which use was made of either multiple correlation or multiple regression or else more complex methods of studying the relationship for two or more numbers of variables, were incorporated into the works whose statistical methods were constituted by correlation and regression as well as other methods, e.g. statistical methods being used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables) factor analysis conducted by Jacobi-Hotellng's method, taxonomic methods and others. On the basis of the performed studies it has been established that the frequency of employing statistical methods in the six selected national, medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), and they generally were almost similar to the frequency provided in English language medical journals. On a whole, no significant differences were disclosed in the frequency of applied statistical methods (Tab. 4) as well as in frequency of random tests (Tab. 3) in the analyzed works, appearing in the medical journals in respective years 1988-1992. The most frequently used statistical methods in analyzed works for 1988-1992 were the measures of position 44.2-55.6% and measures of dispersion 32.5-38.5% as well as parametric tests of significance 26.3-33.1% of the works analyzed (Tab. 4). For the purpose of increasing the frequency and reliability of the used statistical methods, the didactics should be widened in the field of biostatistics at medical studies and postgraduation training designed for physicians and scientific-didactic workers.

  11. Integrative Analysis of “-Omics” Data Using Penalty Functions

    PubMed Central

    Zhao, Qing; Shi, Xingjie; Huang, Jian; Liu, Jin; Li, Yang; Ma, Shuangge

    2014-01-01

    In the analysis of omics data, integrative analysis provides an effective way of pooling information across multiple datasets or multiple correlated responses, and can be more effective than single-dataset (response) analysis. Multiple families of integrative analysis methods have been proposed in the literature. The current review focuses on the penalization methods. Special attention is paid to sparse meta-analysis methods that pool summary statistics across datasets, and integrative analysis methods that pool raw data across datasets. We discuss their formulation and rationale. Beyond “standard” penalized selection, we also review contrasted penalization and Laplacian penalization which accommodate finer data structures. The computational aspects, including computational algorithms and tuning parameter selection, are examined. This review concludes with possible limitations and extensions. PMID:25691921

  12. Bayesian demography 250 years after Bayes

    PubMed Central

    Bijak, Jakub; Bryant, John

    2016-01-01

    Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889

  13. Blind image quality assessment based on aesthetic and statistical quality-aware features

    NASA Astrophysics Data System (ADS)

    Jenadeleh, Mohsen; Masaeli, Mohammad Masood; Moghaddam, Mohsen Ebrahimi

    2017-07-01

    The main goal of image quality assessment (IQA) methods is the emulation of human perceptual image quality judgments. Therefore, the correlation between objective scores of these methods with human perceptual scores is considered as their performance metric. Human judgment of the image quality implicitly includes many factors when assessing perceptual image qualities such as aesthetics, semantics, context, and various types of visual distortions. The main idea of this paper is to use a host of features that are commonly employed in image aesthetics assessment in order to improve blind image quality assessment (BIQA) methods accuracy. We propose an approach that enriches the features of BIQA methods by integrating a host of aesthetics image features with the features of natural image statistics derived from multiple domains. The proposed features have been used for augmenting five different state-of-the-art BIQA methods, which use statistical natural scene statistics features. Experiments were performed on seven benchmark image quality databases. The experimental results showed significant improvement of the accuracy of the methods.

  14. Monitoring and identification of spatiotemporal landscape changes in multiple remote sensing images by using a stratified conditional Latin hypercube sampling approach and geostatistical simulation.

    PubMed

    Lin, Yu-Pin; Chu, Hone-Jay; Huang, Yu-Long; Tang, Chia-Hsi; Rouhani, Shahrokh

    2011-06-01

    This study develops a stratified conditional Latin hypercube sampling (scLHS) approach for multiple, remotely sensed, normalized difference vegetation index (NDVI) images. The objective is to sample, monitor, and delineate spatiotemporal landscape changes, including spatial heterogeneity and variability, in a given area. The scLHS approach, which is based on the variance quadtree technique (VQT) and the conditional Latin hypercube sampling (cLHS) method, selects samples in order to delineate landscape changes from multiple NDVI images. The images are then mapped for calibration and validation by using sequential Gaussian simulation (SGS) with the scLHS selected samples. Spatial statistical results indicate that in terms of their statistical distribution, spatial distribution, and spatial variation, the statistics and variograms of the scLHS samples resemble those of multiple NDVI images more closely than those of cLHS and VQT samples. Moreover, the accuracy of simulated NDVI images based on SGS with scLHS samples is significantly better than that of simulated NDVI images based on SGS with cLHS samples and VQT samples, respectively. However, the proposed approach efficiently monitors the spatial characteristics of landscape changes, including the statistics, spatial variability, and heterogeneity of NDVI images. In addition, SGS with the scLHS samples effectively reproduces spatial patterns and landscape changes in multiple NDVI images.

  15. Linking stressors and ecological responses

    USGS Publications Warehouse

    Gentile, J.H.; Solomon, K.R.; Butcher, J.B.; Harrass, M.; Landis, W.G.; Power, M.; Rattner, B.A.; Warren-Hicks, W.J.; Wenger, R.; Foran, Jeffery A.; Ferenc, Susan A.

    1999-01-01

    To characterize risk, it is necessary to quantify the linkages and interactions between chemical, physical and biological stressors and endpoints in the conceptual framework for ecological risk assessment (ERA). This can present challenges in a multiple stressor analysis, and it will not always be possible to develop a quantitative stressor-response profile. This review commences with a conceptual representation of the problem of developing a linkage analysis for multiple stressors and responses. The remainder of the review surveys a variety of mathematical and statistical methods (e.g., ranking methods, matrix models, multivariate dose-response for mixtures, indices, visualization, simulation modeling and decision-oriented methods) for accomplishing the linkage analysis for multiple stressors. Describing the relationships between multiple stressors and ecological effects are critical components of 'effects assessment' in the ecological risk assessment framework.

  16. Evaluation of Cepstrum Algorithm with Impact Seeded Fault Data of Helicopter Oil Cooler Fan Bearings and Machine Fault Simulator Data

    DTIC Science & Technology

    2013-02-01

    of a bearing must be put into practice. There are many potential methods, the most traditional being the use of statistical time-domain features...accelerate degradation to test multiples bearings to gain statistical relevance and extrapolate results to scale for field conditions. Temperature...as time statistics , frequency estimation to improve the fault frequency detection. For future investigations, one can further explore the

  17. A statistically harmonized alignment-classification in image space enables accurate and robust alignment of noisy images in single particle analysis.

    PubMed

    Kawata, Masaaki; Sato, Chikara

    2007-06-01

    In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images from huge number of raw images is a key for high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method, in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. This newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. In every data set, the newly developed method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.

  18. A method of classification for multisource data in remote sensing based on interval-valued probabilities

    NASA Technical Reports Server (NTRS)

    Kim, Hakil; Swain, Philip H.

    1990-01-01

    An axiomatic approach to intervalued (IV) probabilities is presented, where the IV probability is defined by a pair of set-theoretic functions which satisfy some pre-specified axioms. On the basis of this approach representation of statistical evidence and combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for the representation and combination of evidential information, they make the decision process rather complicated. It entails more intelligent strategies for making decisions. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, so called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. Then the method is applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally huge data into smaller and more manageable pieces based on the global statistical correlation information. By a divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method.

  19. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values ( r th ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.

  20. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-04

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. Copyright © 2018 Montesinos-Lopez et al.

  1. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    PubMed Central

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C.; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-01

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. PMID:29097376

  2. Extracting fingerprint of wireless devices based on phase noise and multiple level wavelet decomposition

    NASA Astrophysics Data System (ADS)

    Zhao, Weichen; Sun, Zhuo; Kong, Song

    2016-10-01

    Wireless devices can be identified by the fingerprint extracted from the signal transmitted, which is useful in wireless communication security and other fields. This paper presents a method that extracts fingerprint based on phase noise of signal and multiple level wavelet decomposition. The phase of signal will be extracted first and then decomposed by multiple level wavelet decomposition. The statistic value of each wavelet coefficient vector is utilized for constructing fingerprint. Besides, the relationship between wavelet decomposition level and recognition accuracy is simulated. And advertised decomposition level is revealed as well. Compared with previous methods, our method is simpler and the accuracy of recognition remains high when Signal Noise Ratio (SNR) is low.

  3. HAPRAP: a haplotype-based iterative method for statistical fine mapping using GWAS summary statistics.

    PubMed

    Zheng, Jie; Rodriguez, Santiago; Laurin, Charles; Baird, Denis; Trela-Larsen, Lea; Erzurumluoglu, Mesut A; Zheng, Yi; White, Jon; Giambartolomei, Claudia; Zabaneh, Delilah; Morris, Richard; Kumari, Meena; Casas, Juan P; Hingorani, Aroon D; Evans, David M; Gaunt, Tom R; Day, Ian N M

    2017-01-01

    Fine mapping is a widely used approach for identifying the causal variant(s) at disease-associated loci. Standard methods (e.g. multiple regression) require individual level genotypes. Recent fine mapping methods using summary-level data require the pairwise correlation coefficients ([Formula: see text]) of the variants. However, haplotypes rather than pairwise [Formula: see text], are the true biological representation of linkage disequilibrium (LD) among multiple loci. In this article, we present an empirical iterative method, HAPlotype Regional Association analysis Program (HAPRAP), that enables fine mapping using summary statistics and haplotype information from an individual-level reference panel. Simulations with individual-level genotypes show that the results of HAPRAP and multiple regression are highly consistent. In simulation with summary-level data, we demonstrate that HAPRAP is less sensitive to poor LD estimates. In a parametric simulation using Genetic Investigation of ANthropometric Traits height data, HAPRAP performs well with a small training sample size (N < 2000) while other methods become suboptimal. Moreover, HAPRAP's performance is not affected substantially by single nucleotide polymorphisms (SNPs) with low minor allele frequencies. We applied the method to existing quantitative trait and binary outcome meta-analyses (human height, QTc interval and gallbladder disease); all previous reported association signals were replicated and two additional variants were independently associated with human height. Due to the growing availability of summary level data, the value of HAPRAP is likely to increase markedly for future analyses (e.g. functional prediction and identification of instruments for Mendelian randomization). The HAPRAP package and documentation are available at http://apps.biocompute.org.uk/haprap/ CONTACT: : jie.zheng@bristol.ac.uk or tom.gaunt@bristol.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  5. Random matrices and condensation into multiple states

    NASA Astrophysics Data System (ADS)

    Sadeghi, Sina; Engel, Andreas

    2018-03-01

    In the present work, we employ methods from statistical mechanics of disordered systems to investigate static properties of condensation into multiple states in a general framework. We aim at showing how typical properties of random interaction matrices play a vital role in manifesting the statistics of condensate states. In particular, an analytical expression for the fraction of condensate states in the thermodynamic limit is provided that confirms the result of the mean number of coexisting species in a random tournament game. We also study the interplay between the condensation problem and zero-sum games with correlated random payoff matrices.

  6. A simulation-based evaluation of methods for inferring linear barriers to gene flow

    Treesearch

    Christopher Blair; Dana E. Weigel; Matthew Balazik; Annika T. H. Keeley; Faith M. Walker; Erin Landguth; Sam Cushman; Melanie Murphy; Lisette Waits; Niko Balkenhol

    2012-01-01

    Different analytical techniques used on the same data set may lead to different conclusions about the existence and strength of genetic structure. Therefore, reliable interpretation of the results from different methods depends on the efficacy and reliability of different statistical methods. In this paper, we evaluated the performance of multiple analytical methods to...

  7. Resting-state fMRI data reflects default network activity rather than null data: A defense of commonly employed methods to correct for multiple comparisons.

    PubMed

    Slotnick, Scott D

    2017-07-01

    Analysis of functional magnetic resonance imaging (fMRI) data typically involves over one hundred thousand independent statistical tests; therefore, it is necessary to correct for multiple comparisons to control familywise error. In a recent paper, Eklund, Nichols, and Knutsson used resting-state fMRI data to evaluate commonly employed methods to correct for multiple comparisons and reported unacceptable rates of familywise error. Eklund et al.'s analysis was based on the assumption that resting-state fMRI data reflect null data; however, their 'null data' actually reflected default network activity that inflated familywise error. As such, Eklund et al.'s results provide no basis to question the validity of the thousands of published fMRI studies that have corrected for multiple comparisons or the commonly employed methods to correct for multiple comparisons.

  8. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case–control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771

  9. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.

  10. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    PubMed Central

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  11. Cover/Frequency (CF)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Cover/Frequency (CF) method is used to assess changes in plant species cover and frequency for a macroplot. This method uses multiple quadrats to sample within-plot variation and quantify statistically valid changes in plant species cover, height, and frequency over time. Because it is difficult to estimate cover in quadrats for larger plants, this method...

  12. A Statistical Model for Misreported Binary Outcomes in Clustered RCTs of Education Interventions

    ERIC Educational Resources Information Center

    Schochet, Peter Z.

    2013-01-01

    In randomized control trials (RCTs) of educational interventions, there is a growing literature on impact estimation methods to adjust for missing student outcome data using such methods as multiple imputation, the construction of nonresponse weights, casewise deletion, and maximum likelihood methods (see, for example, Allison, 2002; Graham, 2009;…

  13. Line Intercept (LI)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Line Intercept (LI) method is used to assess changes in plant species cover for a macroplot. This method uses multiple line transects to sample within plot variation and quantify statistically valid changes in plant species cover and height over time. This method is suited for most forest and rangeland communities, but is especially useful for sampling...

  14. Comparing and combining biomarkers as principle surrogates for time-to-event clinical endpoints.

    PubMed

    Gabriel, Erin E; Sachs, Michael C; Gilbert, Peter B

    2015-02-10

    Principal surrogate endpoints are useful as targets for phase I and II trials. In many recent trials, multiple post-randomization biomarkers are measured. However, few statistical methods exist for comparison of or combination of biomarkers as principal surrogates, and none of these methods to our knowledge utilize time-to-event clinical endpoint information. We propose a Weibull model extension of the semi-parametric estimated maximum likelihood method that allows for the inclusion of multiple biomarkers in the same risk model as multivariate candidate principal surrogates. We propose several methods for comparing candidate principal surrogates and evaluating multivariate principal surrogates. These include the time-dependent and surrogate-dependent true and false positive fraction, the time-dependent and the integrated standardized total gain, and the cumulative distribution function of the risk difference. We illustrate the operating characteristics of our proposed methods in simulations and outline how these statistics can be used to evaluate and compare candidate principal surrogates. We use these methods to investigate candidate surrogates in the Diabetes Control and Complications Trial. Copyright © 2014 John Wiley & Sons, Ltd.

  15. DNA melting profiles from a matrix method.

    PubMed

    Poland, Douglas

    2004-02-05

    In this article we give a new method for the calculation of DNA melting profiles. Based on the matrix formulation of the DNA partition function, the method relies for its efficiency on the fact that the required matrices are very sparse, essentially reducing matrix multiplication to vector multiplication and thus making the computer time required to treat a DNA molecule containing N base pairs proportional to N(2). A key ingredient in the method is the result that multiplication by the inverse matrix can also be reduced to vector multiplication. The task of calculating the melting profile for the entire genome is further reduced by treating regions of the molecule between helix-plateaus, thus breaking the molecule up into independent parts that can each be treated individually. The method is easily modified to incorporate changes in the assignment of statistical weights to the different structural features of DNA. We illustrate the method using the genome of Haemophilus influenzae. Copyright 2003 Wiley Periodicals, Inc.

  16. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  17. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  18. Multiple-Point statistics for stochastic modeling of aquifers, where do we stand?

    NASA Astrophysics Data System (ADS)

    Renard, P.; Julien, S.

    2017-12-01

    In the last 20 years, multiple-point statistics have been a focus of much research, successes and disappointments. The aim of this geostatistical approach was to integrate geological information into stochastic models of aquifer heterogeneity to better represent the connectivity of high or low permeability structures in the underground. Many different algorithms (ENESIM, SNESIM, SIMPAT, CCSIM, QUILTING, IMPALA, DEESSE, FILTERSIM, HYPPS, etc.) have been and are still proposed. They are all based on the concept of a training data set from which spatial statistics are derived and used in a further step to generate conditional realizations. Some of these algorithms evaluate the statistics of the spatial patterns for every pixel, other techniques consider the statistics at the scale of a patch or a tile. While the method clearly succeeded in enabling modelers to generate realistic models, several issues are still the topic of debate both from a practical and theoretical point of view, and some issues such as training data set availability are often hindering the application of the method in practical situations. In this talk, the aim is to present a review of the status of these approaches both from a theoretical and practical point of view using several examples at different scales (from pore network to regional aquifer).

  19. Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline

    PubMed Central

    2013-01-01

    Background As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. Results We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS A : DE genes with non-zero effect sizes in all studies, (2) HS B : DE genes with non-zero effect sizes in one or more studies and (3) HS r : DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. Conclusions The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS A , HS B , and HS r ). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author’s publication website. PMID:24359104

  20. Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline.

    PubMed

    Chang, Lun-Ching; Lin, Hui-Min; Sibille, Etienne; Tseng, George C

    2013-12-21

    As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website.

  1. History, rare, and multiple events of mechanical unfolding of repeat proteins

    NASA Astrophysics Data System (ADS)

    Sumbul, Fidan; Marchesi, Arin; Rico, Felix

    2018-03-01

    Mechanical unfolding of proteins consisting of repeat domains is an excellent tool to obtain large statistics. Force spectroscopy experiments using atomic force microscopy on proteins presenting multiple domains have revealed that unfolding forces depend on the number of folded domains (history) and have reported intermediate states and rare events. However, the common use of unspecific attachment approaches to pull the protein of interest holds important limitations to study unfolding history and may lead to discarding rare and multiple probing events due to the presence of unspecific adhesion and uncertainty on the pulling site. Site-specific methods that have recently emerged minimize this uncertainty and would be excellent tools to probe unfolding history and rare events. However, detailed characterization of these approaches is required to identify their advantages and limitations. Here, we characterize a site-specific binding approach based on the ultrastable complex dockerin/cohesin III revealing its advantages and limitations to assess the unfolding history and to investigate rare and multiple events during the unfolding of repeated domains. We show that this approach is more robust, reproducible, and provides larger statistics than conventional unspecific methods. We show that the method is optimal to reveal the history of unfolding from the very first domain and to detect rare events, while being more limited to assess intermediate states. Finally, we quantify the forces required to unfold two molecules pulled in parallel, difficult when using unspecific approaches. The proposed method represents a step forward toward more reproducible measurements to probe protein unfolding history and opens the door to systematic probing of rare and multiple molecule unfolding mechanisms.

  2. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.

  3. Statistical methods for detecting and comparing periodic data and their application to the nycthemeral rhythm of bodily harm: A population based study

    PubMed Central

    2010-01-01

    Background Animals, including humans, exhibit a variety of biological rhythms. This article describes a method for the detection and simultaneous comparison of multiple nycthemeral rhythms. Methods A statistical method for detecting periodic patterns in time-related data via harmonic regression is described. The method is particularly capable of detecting nycthemeral rhythms in medical data. Additionally a method for simultaneously comparing two or more periodic patterns is described, which derives from the analysis of variance (ANOVA). This method statistically confirms or rejects equality of periodic patterns. Mathematical descriptions of the detecting method and the comparing method are displayed. Results Nycthemeral rhythms of incidents of bodily harm in Middle Franconia are analyzed in order to demonstrate both methods. Every day of the week showed a significant nycthemeral rhythm of bodily harm. These seven patterns of the week were compared to each other revealing only two different nycthemeral rhythms, one for Friday and Saturday and one for the other weekdays. PMID:21059197

  4. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.

  5. Generalized Full-Information Item Bifactor Analysis

    ERIC Educational Resources Information Center

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…

  6. Software for the Integration of Multiomics Experiments in Bioconductor.

    PubMed

    Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi

    2017-11-01

    Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 AACR . ©2017 American Association for Cancer Research.

  7. "TNOs are Cool": A survey of the trans-Neptunian region. XIII. Statistical analysis of multiple trans-Neptunian objects observed with Herschel Space Observatory

    NASA Astrophysics Data System (ADS)

    Kovalenko, I. D.; Doressoundiram, A.; Lellouch, E.; Vilenius, E.; Müller, T.; Stansberry, J.

    2017-11-01

    Context. Gravitationally bound multiple systems provide an opportunity to estimate the mean bulk density of the objects, whereas this characteristic is not available for single objects. Being a primitive population of the outer solar system, binary and multiple trans-Neptunian objects (TNOs) provide unique information about bulk density and internal structure, improving our understanding of their formation and evolution. Aims: The goal of this work is to analyse parameters of multiple trans-Neptunian systems, observed with Herschel and Spitzer space telescopes. Particularly, statistical analysis is done for radiometric size and geometric albedo, obtained from photometric observations, and for estimated bulk density. Methods: We use Monte Carlo simulation to estimate the real size distribution of TNOs. For this purpose, we expand the dataset of diameters by adopting the Minor Planet Center database list with available values of the absolute magnitude therein, and the albedo distribution derived from Herschel radiometric measurements. We use the 2-sample Anderson-Darling non-parametric statistical method for testing whether two samples of diameters, for binary and single TNOs, come from the same distribution. Additionally, we use the Spearman's coefficient as a measure of rank correlations between parameters. Uncertainties of estimated parameters together with lack of data are taken into account. Conclusions about correlations between parameters are based on statistical hypothesis testing. Results: We have found that the difference in size distributions of multiple and single TNOs is biased by small objects. The test on correlations between parameters shows that the effective diameter of binary TNOs strongly correlates with heliocentric orbital inclination and with magnitude difference between components of binary system. The correlation between diameter and magnitude difference implies that small and large binaries are formed by different mechanisms. Furthermore, the statistical test indicates, although not significant with the sample size, that a moderately strong correlation exists between diameter and bulk density. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.

  8. Multiple point statistical simulation using uncertain (soft) conditional data

    NASA Astrophysics Data System (ADS)

    Hansen, Thomas Mejer; Vu, Le Thanh; Mosegaard, Klaus; Cordua, Knud Skou

    2018-05-01

    Geostatistical simulation methods have been used to quantify spatial variability of reservoir models since the 80s. In the last two decades, state of the art simulation methods have changed from being based on covariance-based 2-point statistics to multiple-point statistics (MPS), that allow simulation of more realistic Earth-structures. In addition, increasing amounts of geo-information (geophysical, geological, etc.) from multiple sources are being collected. This pose the problem of integration of these different sources of information, such that decisions related to reservoir models can be taken on an as informed base as possible. In principle, though difficult in practice, this can be achieved using computationally expensive Monte Carlo methods. Here we investigate the use of sequential simulation based MPS simulation methods conditional to uncertain (soft) data, as a computational efficient alternative. First, it is demonstrated that current implementations of sequential simulation based on MPS (e.g. SNESIM, ENESIM and Direct Sampling) do not account properly for uncertain conditional information, due to a combination of using only co-located information, and a random simulation path. Then, we suggest two approaches that better account for the available uncertain information. The first make use of a preferential simulation path, where more informed model parameters are visited preferentially to less informed ones. The second approach involves using non co-located uncertain information. For different types of available data, these approaches are demonstrated to produce simulation results similar to those obtained by the general Monte Carlo based approach. These methods allow MPS simulation to condition properly to uncertain (soft) data, and hence provides a computationally attractive approach for integration of information about a reservoir model.

  9. Clinical Trials With Large Numbers of Variables: Important Advantages of Canonical Analysis.

    PubMed

    Cleophas, Ton J

    2016-01-01

    Canonical analysis assesses the combined effects of a set of predictor variables on a set of outcome variables, but it is little used in clinical trials despite the omnipresence of multiple variables. The aim of this study was to assess the performance of canonical analysis as compared with traditional multivariate methods using multivariate analysis of covariance (MANCOVA). As an example, a simulated data file with 12 gene expression levels and 4 drug efficacy scores was used. The correlation coefficient between the 12 predictor and 4 outcome variables was 0.87 (P = 0.0001) meaning that 76% of the variability in the outcome variables was explained by the 12 covariates. Repeated testing after the removal of 5 unimportant predictor and 1 outcome variable produced virtually the same overall result. The MANCOVA identified identical unimportant variables, but it was unable to provide overall statistics. (1) Canonical analysis is remarkable, because it can handle many more variables than traditional multivariate methods such as MANCOVA can. (2) At the same time, it accounts for the relative importance of the separate variables, their interactions and differences in units. (3) Canonical analysis provides overall statistics of the effects of sets of variables, whereas traditional multivariate methods only provide the statistics of the separate variables. (4) Unlike other methods for combining the effects of multiple variables such as factor analysis/partial least squares, canonical analysis is scientifically entirely rigorous. (5) Limitations include that it is less flexible than factor analysis/partial least squares, because only 2 sets of variables are used and because multiple solutions instead of one is offered. We do hope that this article will stimulate clinical investigators to start using this remarkable method.

  10. Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps.

    PubMed

    Garud, Nandita R; Rosenberg, Noah A

    2015-06-01

    Soft selective sweeps represent an important form of adaptation in which multiple haplotypes bearing adaptive alleles rise to high frequency. Most statistical methods for detecting selective sweeps from genetic polymorphism data, however, have focused on identifying hard selective sweeps in which a favored allele appears on a single haplotypic background; these methods might be underpowered to detect soft sweeps. Among exceptions is the set of haplotype homozygosity statistics introduced for the detection of soft sweeps by Garud et al. (2015). These statistics, examining frequencies of multiple haplotypes in relation to each other, include H12, a statistic designed to identify both hard and soft selective sweeps, and H2/H1, a statistic that conditional on high H12 values seeks to distinguish between hard and soft sweeps. A challenge in the use of H2/H1 is that its range depends on the associated value of H12, so that equal H2/H1 values might provide different levels of support for a soft sweep model at different values of H12. Here, we enhance the H12 and H2/H1 haplotype homozygosity statistics for selective sweep detection by deriving the upper bound on H2/H1 as a function of H12, thereby generating a statistic that normalizes H2/H1 to lie between 0 and 1. Through a reanalysis of resequencing data from inbred lines of Drosophila, we show that the enhanced statistic both strengthens interpretations obtained with the unnormalized statistic and leads to empirical insights that are less readily apparent without the normalization. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Novel image encryption algorithm based on multiple-parameter discrete fractional random transform

    NASA Astrophysics Data System (ADS)

    Zhou, Nanrun; Dong, Taiji; Wu, Jianhua

    2010-08-01

    A new method of digital image encryption is presented by utilizing a new multiple-parameter discrete fractional random transform. Image encryption and decryption are performed based on the index additivity and multiple parameters of the multiple-parameter fractional random transform. The plaintext and ciphertext are respectively in the spatial domain and in the fractional domain determined by the encryption keys. The proposed algorithm can resist statistic analyses effectively. The computer simulation results show that the proposed encryption algorithm is sensitive to the multiple keys, and that it has considerable robustness, noise immunity and security.

  12. Best (but oft-forgotten) practices: the multiple problems of multiplicity-whether and how to correct for many statistical tests.

    PubMed

    Streiner, David L

    2015-10-01

    Testing many null hypotheses in a single study results in an increased probability of detecting a significant finding just by chance (the problem of multiplicity). Debates have raged over many years with regard to whether to correct for multiplicity and, if so, how it should be done. This article first discusses how multiple tests lead to an inflation of the α level, then explores the following different contexts in which multiplicity arises: testing for baseline differences in various types of studies, having >1 outcome variable, conducting statistical tests that produce >1 P value, taking multiple "peeks" at the data, and unplanned, post hoc analyses (i.e., "data dredging," "fishing expeditions," or "P-hacking"). It then discusses some of the methods that have been proposed for correcting for multiplicity, including single-step procedures (e.g., Bonferroni); multistep procedures, such as those of Holm, Hochberg, and Šidák; false discovery rate control; and resampling approaches. Note that these various approaches describe different aspects and are not necessarily mutually exclusive. For example, resampling methods could be used to control the false discovery rate or the family-wise error rate (as defined later in this article). However, the use of one of these approaches presupposes that we should correct for multiplicity, which is not universally accepted, and the article presents the arguments for and against such "correction." The final section brings together these threads and presents suggestions with regard to when it makes sense to apply the corrections and how to do so. © 2015 American Society for Nutrition.

  13. Modelling multiple sources of dissemination bias in meta-analysis.

    PubMed

    Bowden, Jack; Jackson, Dan; Thompson, Simon G

    2010-03-30

    Asymmetry in the funnel plot for a meta-analysis suggests the presence of dissemination bias. This may be caused by publication bias through the decisions of journal editors, by selective reporting of research results by authors or by a combination of both. Typically, study results that are statistically significant or have larger estimated effect sizes are more likely to appear in the published literature, hence giving a biased picture of the evidence-base. Previous statistical approaches for addressing dissemination bias have assumed only a single selection mechanism. Here we consider a more realistic scenario in which multiple dissemination processes, involving both the publishing authors and journals, are operating. In practical applications, the methods can be used to provide sensitivity analyses for the potential effects of multiple dissemination biases operating in meta-analysis.

  14. Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

    NASA Astrophysics Data System (ADS)

    Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

    2018-01-01

    Heavy rainfall can cause disaster, therefore need a forecast to predict rainfall intensity. Main factor that cause flooding is there is a high rainfall intensity and it makes the river become overcapacity. This will cause flooding around the area. Rainfall factor is a dynamic factor, so rainfall is very interesting to be studied. In order to support the rainfall forecasting, there are methods that can be used from Artificial Intelligence (AI) to statistic. In this research, we used Adaline for AI method and Regression for statistic method. The more accurate forecast result shows the method that used is good for forecasting the rainfall. Through those methods, we expected which is the best method for rainfall forecasting here.

  15. Statistical inference from multiple iTRAQ experiments without using common reference standards.

    PubMed

    Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo

    2013-02-01

    Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.

  16. The Beginner's Guide to the Bootstrap Method of Resampling.

    ERIC Educational Resources Information Center

    Lane, Ginny G.

    The bootstrap method of resampling can be useful in estimating the replicability of study results. The bootstrap procedure creates a mock population from a given sample of data from which multiple samples are then drawn. The method extends the usefulness of the jackknife procedure as it allows for computation of a given statistic across a maximal…

  17. Extreme Quantile Estimation in Binary Response Models

    DTIC Science & Technology

    1990-03-01

    in Cancer Research," Biometria , VoL 66, pp. 307-316. Hsi, B.P. [1969], ’The Multiple Sample Up-and-Down Method in Bioassay," Journal of the American...New Method of Estimation," Biometria , VoL 53, pp. 439-454. Wetherill, G.B. [1976], Sequential Methods in Statistics, London: Chapman and Hall. Wu, C.FJ

  18. A Mixed-Methods Study Investigating the Relationship between Media Multitasking Orientation and Grade Point Average

    ERIC Educational Resources Information Center

    Lee, Jennifer

    2012-01-01

    The intent of this study was to examine the relationship between media multitasking orientation and grade point average. The study utilized a mixed-methods approach to investigate the research questions. In the quantitative section of the study, the primary method of statistical analyses was multiple regression. The independent variables for the…

  19. Quantitative Comparison of Three Standardization Methods Using a One-Way ANOVA for Multiple Mean Comparisons

    ERIC Educational Resources Information Center

    Barrows, Russell D.

    2007-01-01

    A one-way ANOVA experiment is performed to determine whether or not the three standardization methods are statistically different in determining the concentration of the three paraffin analytes. The laboratory exercise asks students to combine the three methods in a single analytical procedure of their own design to determine the concentration of…

  20. A New Sample Size Formula for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…

  1. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits

    PubMed Central

    Zeng, Ping; Mukherjee, Sayan; Zhou, Xiang

    2017-01-01

    Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects—the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the “MArginal ePIstasis Test”, or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium. PMID:28746338

  2. PCTO-SIM: Multiple-point geostatistical modeling using parallel conditional texture optimization

    NASA Astrophysics Data System (ADS)

    Pourfard, Mohammadreza; Abdollahifard, Mohammad J.; Faez, Karim; Motamedi, Sayed Ahmad; Hosseinian, Tahmineh

    2017-05-01

    Multiple-point Geostatistics is a well-known general statistical framework by which complex geological phenomena have been modeled efficiently. Pixel-based and patch-based are two major categories of these methods. In this paper, the optimization-based category is used which has a dual concept in texture synthesis as texture optimization. Our extended version of texture optimization uses the energy concept to model geological phenomena. While honoring the hard point, the minimization of our proposed cost function forces simulation grid pixels to be as similar as possible to training images. Our algorithm has a self-enrichment capability and creates a richer training database from a sparser one through mixing the information of all surrounding patches of the simulation nodes. Therefore, it preserves pattern continuity in both continuous and categorical variables very well. It also shows a fuzzy result in its every realization similar to the expected result of multi realizations of other statistical models. While the main core of most previous Multiple-point Geostatistics methods is sequential, the parallel main core of our algorithm enabled it to use GPU efficiently to reduce the CPU time. One new validation method for MPS has also been proposed in this paper.

  3. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of trangene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.

  4. Density (DE)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Density (DE) method is used to assess changes in plant species density and height for a macroplot. This method uses multiple quadrats and belt transects (transects having a width) to sample within plot variation and quantify statistically valid changes in plant species density and height over time. Herbaceous plant species are sampled with quadrats while...

  5. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.

  6. LP-search and its use in analysis of the accuracy of control systems with acoustical models

    NASA Technical Reports Server (NTRS)

    Sergeyev, V. I.; Sobol, I. M.; Statnikov, R. B.; Statnikov, I. N.

    1973-01-01

    The LP-search is proposed as an analog of the Monte Carlo method for finding values in nonlinear statistical systems. It is concluded that: To attain the required accuracy in solution to the problem of control for a statistical system in the LP-search, a considerably smaller number of tests is required than in the Monte Carlo method. The LP-search allows the possibility of multiple repetitions of tests under identical conditions and observability of the output variables of the system.

  7. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  8. ASCS online fault detection and isolation based on an improved MPCA

    NASA Astrophysics Data System (ADS)

    Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

    2014-09-01

    Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.

  9. Fast mean and variance computation of the diffuse sound transmission through finite-sized thick and layered wall and floor systems

    NASA Astrophysics Data System (ADS)

    Decraene, Carolina; Dijckmans, Arne; Reynders, Edwin P. B.

    2018-05-01

    A method is developed for computing the mean and variance of the diffuse field sound transmission loss of finite-sized layered wall and floor systems that consist of solid, fluid and/or poroelastic layers. This is achieved by coupling a transfer matrix model of the wall or floor to statistical energy analysis subsystem models of the adjacent room volumes. The modal behavior of the wall is approximately accounted for by projecting the wall displacement onto a set of sinusoidal lateral basis functions. This hybrid modal transfer matrix-statistical energy analysis method is validated on multiple wall systems: a thin steel plate, a polymethyl methacrylate panel, a thick brick wall, a sandwich panel, a double-leaf wall with poro-elastic material in the cavity, and a double glazing. The predictions are compared with experimental data and with results obtained using alternative prediction methods such as the transfer matrix method with spatial windowing, the hybrid wave based-transfer matrix method, and the hybrid finite element-statistical energy analysis method. These comparisons confirm the prediction accuracy of the proposed method and the computational efficiency against the conventional hybrid finite element-statistical energy analysis method.

  10. Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.

    PubMed

    Mutimbu, Lawrence; Robles-Kelly, Antonio

    2016-08-31

    This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically datadriven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data driven approach. As a result, our method delivers a pixelwise illuminant colour estimate being devoid of libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.

  11. The HCUP SID Imputation Project: Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data.

    PubMed

    Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe

    2018-06-01

    To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.

  12. Statistical sensor fusion analysis of near-IR polarimetric and thermal imagery for the detection of minelike targets

    NASA Astrophysics Data System (ADS)

    Weisenseel, Robert A.; Karl, William C.; Castanon, David A.; DiMarzio, Charles A.

    1999-02-01

    We present an analysis of statistical model based data-level fusion for near-IR polarimetric and thermal data, particularly for the detection of mines and mine-like targets. Typical detection-level data fusion methods, approaches that fuse detections from individual sensors rather than fusing at the level of the raw data, do not account rationally for the relative reliability of different sensors, nor the redundancy often inherent in multiple sensors. Representative examples of such detection-level techniques include logical AND/OR operations on detections from individual sensors and majority vote methods. In this work, we exploit a statistical data model for the detection of mines and mine-like targets to compare and fuse multiple sensor channels. Our purpose is to quantify the amount of knowledge that each polarimetric or thermal channel supplies to the detection process. With this information, we can make reasonable decisions about the usefulness of each channel. We can use this information to improve the detection process, or we can use it to reduce the number of required channels.

  13. Modelling a real-world buried valley system with vertical non-stationarity using multiple-point statistics

    NASA Astrophysics Data System (ADS)

    He, Xiulan; Sonnenborg, Torben O.; Jørgensen, Flemming; Jensen, Karsten H.

    2017-03-01

    Stationarity has traditionally been a requirement of geostatistical simulations. A common way to deal with non-stationarity is to divide the system into stationary sub-regions and subsequently merge the realizations for each region. Recently, the so-called partition approach that has the flexibility to model non-stationary systems directly was developed for multiple-point statistics simulation (MPS). The objective of this study is to apply the MPS partition method with conventional borehole logs and high-resolution airborne electromagnetic (AEM) data, for simulation of a real-world non-stationary geological system characterized by a network of connected buried valleys that incise deeply into layered Miocene sediments (case study in Denmark). The results show that, based on fragmented information of the formation boundaries, the MPS partition method is able to simulate a non-stationary system including valley structures embedded in a layered Miocene sequence in a single run. Besides, statistical information retrieved from the AEM data improved the simulation of the geology significantly, especially for the deep-seated buried valley sediments where borehole information is sparse.

  14. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Treesearch

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  15. Fault diagnosis of sensor networked structures with multiple faults using a virtual beam based approach

    NASA Astrophysics Data System (ADS)

    Wang, H.; Jing, X. J.

    2017-07-01

    This paper presents a virtual beam based approach suitable for conducting diagnosis of multiple faults in complex structures with limited prior knowledge of the faults involved. The "virtual beam", a recently-proposed concept for fault detection in complex structures, is applied, which consists of a chain of sensors representing a vibration energy transmission path embedded in the complex structure. Statistical tests and adaptive threshold are particularly adopted for fault detection due to limited prior knowledge of normal operational conditions and fault conditions. To isolate the multiple faults within a specific structure or substructure of a more complex one, a 'biased running' strategy is developed and embedded within the bacterial-based optimization method to construct effective virtual beams and thus to improve the accuracy of localization. The proposed method is easy and efficient to implement for multiple fault localization with limited prior knowledge of normal conditions and faults. With extensive experimental results, it is validated that the proposed method can localize both single fault and multiple faults more effectively than the classical trust index subtract on negative add on positive (TI-SNAP) method.

  16. A data-driven multiplicative fault diagnosis approach for automation processes.

    PubMed

    Hao, Haiyang; Zhang, Kai; Ding, Steven X; Chen, Zhiwen; Lei, Yaguo

    2014-09-01

    This paper presents a new data-driven method for diagnosing multiplicative key performance degradation in automation processes. Different from the well-established additive fault diagnosis approaches, the proposed method aims at identifying those low-level components which increase the variability of process variables and cause performance degradation. Based on process data, features of multiplicative fault are extracted. To identify the root cause, the impact of fault on each process variable is evaluated in the sense of contribution to performance degradation. Then, a numerical example is used to illustrate the functionalities of the method and Monte-Carlo simulation is performed to demonstrate the effectiveness from the statistical viewpoint. Finally, to show the practical applicability, a case study on the Tennessee Eastman process is presented. Copyright © 2013. Published by Elsevier Ltd.

  17. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of the degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatically power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e. edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain was increasing in the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  18. Analyzing the Validity of the Adult-Adolescent Parenting Inventory for Low-Income Populations

    ERIC Educational Resources Information Center

    Lawson, Michael A.; Alameda-Lawson, Tania; Byrnes, Edward

    2017-01-01

    Objectives: The purpose of this study was to examine the construct and predictive validity of the Adult-Adolescent Parenting Inventory (AAPI-2). Methods: The validity of the AAPI-2 was evaluated using multiple statistical methods, including exploratory factor analysis, confirmatory factor analysis, and latent class analysis. These analyses were…

  19. A generalization of random matrix theory and its application to statistical physics.

    PubMed

    Wang, Duan; Zhang, Xin; Horvatic, Davor; Podobnik, Boris; Eugene Stanley, H

    2017-02-01

    To study the statistical structure of crosscorrelations in empirical data, we generalize random matrix theory and propose a new method of cross-correlation analysis, known as autoregressive random matrix theory (ARRMT). ARRMT takes into account the influence of auto-correlations in the study of cross-correlations in multiple time series. We first analytically and numerically determine how auto-correlations affect the eigenvalue distribution of the correlation matrix. Then we introduce ARRMT with a detailed procedure of how to implement the method. Finally, we illustrate the method using two examples taken from inflation rates for air pressure data for 95 US cities.

  20. Post-processing method for wind speed ensemble forecast using wind speed and direction

    NASA Astrophysics Data System (ADS)

    Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

    2017-04-01

    Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly due to that it is generally not straightforward how to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85 % of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as predictor compared to only using wind speed. On average the improvements were about 5 %, but mainly for moderate to strong wind situations. For weak wind speeds adding wind direction had more or less neutral impact.

  1. A spreadsheet template compatible with Microsoft Excel and iWork Numbers that returns the simultaneous confidence intervals for all pairwise differences between multiple sample means.

    PubMed

    Brown, Angus M

    2010-04-01

    The objective of the method described in this paper is to develop a spreadsheet template for the purpose of comparing multiple sample means. An initial analysis of variance (ANOVA) test on the data returns F--the test statistic. If F is larger than the critical F value drawn from the F distribution at the appropriate degrees of freedom, convention dictates rejection of the null hypothesis and allows subsequent multiple comparison testing to determine where the inequalities between the sample means lie. A variety of multiple comparison methods are described that return the 95% confidence intervals for differences between means using an inclusive pairwise comparison of the sample means. 2009 Elsevier Ireland Ltd. All rights reserved.

  2. Predicting recreational water quality advisories: A comparison of statistical methods

    USGS Publications Warehouse

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to ”nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.

  3. Statistical characterization of multiple-reaction monitoring mass spectrometry (MRM-MS) assays for quantitative proteomics

    PubMed Central

    2012-01-01

    Multiple reaction monitoring mass spectrometry (MRM-MS) with stable isotope dilution (SID) is increasingly becoming a widely accepted assay for the quantification of proteins and peptides. These assays have shown great promise in relatively high throughput verification of candidate biomarkers. While the use of MRM-MS assays is well established in the small molecule realm, their introduction and use in proteomics is relatively recent. As such, statistical and computational methods for the analysis of MRM-MS data from proteins and peptides are still being developed. Based on our extensive experience with analyzing a wide range of SID-MRM-MS data, we set forth a methodology for analysis that encompasses significant aspects ranging from data quality assessment, assay characterization including calibration curves, limits of detection (LOD) and quantification (LOQ), and measurement of intra- and interlaboratory precision. We draw upon publicly available seminal datasets to illustrate our methods and algorithms. PMID:23176545

  4. Statistical characterization of multiple-reaction monitoring mass spectrometry (MRM-MS) assays for quantitative proteomics.

    PubMed

    Mani, D R; Abbatiello, Susan E; Carr, Steven A

    2012-01-01

    Multiple reaction monitoring mass spectrometry (MRM-MS) with stable isotope dilution (SID) is increasingly becoming a widely accepted assay for the quantification of proteins and peptides. These assays have shown great promise in relatively high throughput verification of candidate biomarkers. While the use of MRM-MS assays is well established in the small molecule realm, their introduction and use in proteomics is relatively recent. As such, statistical and computational methods for the analysis of MRM-MS data from proteins and peptides are still being developed. Based on our extensive experience with analyzing a wide range of SID-MRM-MS data, we set forth a methodology for analysis that encompasses significant aspects ranging from data quality assessment, assay characterization including calibration curves, limits of detection (LOD) and quantification (LOQ), and measurement of intra- and interlaboratory precision. We draw upon publicly available seminal datasets to illustrate our methods and algorithms.

  5. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    NASA Astrophysics Data System (ADS)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.

  6. Testing biological liquid samples using modified m-line spectroscopy method

    NASA Astrophysics Data System (ADS)

    Augusciuk, Elzbieta; Rybiński, Grzegorz

    2005-09-01

    Non-chemical method of detection of sugar concentration in biological (animal and plant source) liquids has been investigated. Simplified set was build to show the easy way of carrying out the survey and to make easy to gather multiple measurements for error detecting and statistics. Method is suggested as easy and cheap alternative for chemical methods of measuring sugar concentration, but needing a lot effort to be made precise.

  7. How multiplicity determines entropy and the derivation of the maximum entropy principle for complex systems.

    PubMed

    Hanel, Rudolf; Thurner, Stefan; Gell-Mann, Murray

    2014-05-13

    The maximum entropy principle (MEP) is a method for obtaining the most likely distribution functions of observables from statistical systems by maximizing entropy under constraints. The MEP has found hundreds of applications in ergodic and Markovian systems in statistical mechanics, information theory, and statistics. For several decades there has been an ongoing controversy over whether the notion of the maximum entropy principle can be extended in a meaningful way to nonextensive, nonergodic, and complex statistical systems and processes. In this paper we start by reviewing how Boltzmann-Gibbs-Shannon entropy is related to multiplicities of independent random processes. We then show how the relaxation of independence naturally leads to the most general entropies that are compatible with the first three Shannon-Khinchin axioms, the (c,d)-entropies. We demonstrate that the MEP is a perfectly consistent concept for nonergodic and complex statistical systems if their relative entropy can be factored into a generalized multiplicity and a constraint term. The problem of finding such a factorization reduces to finding an appropriate representation of relative entropy in a linear basis. In a particular example we show that path-dependent random processes with memory naturally require specific generalized entropies. The example is to our knowledge the first exact derivation of a generalized entropy from the microscopic properties of a path-dependent random process.

  8. An extended sequential goodness-of-fit multiple testing method for discrete data.

    PubMed

    Castro-Conde, Irene; Döhler, Sebastian; de Uña-Álvarez, Jacobo

    2017-10-01

    The sequential goodness-of-fit (SGoF) multiple testing method has recently been proposed as an alternative to the familywise error rate- and the false discovery rate-controlling procedures in high-dimensional problems. For discrete data, the SGoF method may be very conservative. In this paper, we introduce an alternative SGoF-type procedure that takes into account the discreteness of the test statistics. Like the original SGoF, our new method provides weak control of the false discovery rate/familywise error rate but attains false discovery rate levels closer to the desired nominal level, and thus it is more powerful. We study the performance of this method in a simulation study and illustrate its application to a real pharmacovigilance data set.

  9. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.

  10. Examination of Anxiety Sensitivity and Distress Tolerance as Transdiagnostic Mechanisms Linking Multiple Anxiety Pathologies to Alcohol Use Problems in Adolescents

    PubMed Central

    Wolitzky-Taylor, Kate; Guillot, Casey R.; Pang, Raina D.; Kirkpatrick, Matthew G.; Zvolensky, Michael J.; Buckner, Julia D.; Leventhal, Adam M.

    2015-01-01

    Background Multiple forms of anxiety psychopathology are associated with alcohol use problems in adolescents. Yet, the mechanisms underlying this association are unclear. Anxiety sensitivity (AS) and distress tolerance (DT) represent 2 distinct, conceptually relevant transdiagnostic constructs implicated in multiple manifestations of anxiety that may also underlie alcohol use problems and thereby explain why people with anxiety are more likely to have alcohol problems. Methods The current cross-sectional study examined whether AS and DT accounted for (i.e., statistically mediated) the relationship between manifest indicators of the 3 common anxiety phenotypes (generalized anxiety, social anxiety, and panic disorders) and alcohol problems in a sample of 534 high school students (14 to 15 years old). Results Multiple manifestations of anxiety were associated with greater alcohol use problems. AS statistically mediated multiple anxiety–alcohol associations, but DT did not. Conclusions These findings provide preliminary evidence suggesting AS may be an important transdiagnostic target for alcohol prevention programs for those in early adolescence that experience elevated anxiety symptoms. PMID:25706521

  11. The optimal hormonal replacement modality selection for multiple organ procurement from brain-dead organ donors

    PubMed Central

    Mi, Zhibao; Novitzky, Dimitri; Collins, Joseph F; Cooper, David KC

    2015-01-01

    The management of brain-dead organ donors is complex. The use of inotropic agents and replacement of depleted hormones (hormonal replacement therapy) is crucial for successful multiple organ procurement, yet the optimal hormonal replacement has not been identified, and the statistical adjustment to determine the best selection is not trivial. Traditional pair-wise comparisons between every pair of treatments, and multiple comparisons to all (MCA), are statistically conservative. Hsu’s multiple comparisons with the best (MCB) – adapted from the Dunnett’s multiple comparisons with control (MCC) – has been used for selecting the best treatment based on continuous variables. We selected the best hormonal replacement modality for successful multiple organ procurement using a two-step approach. First, we estimated the predicted margins by constructing generalized linear models (GLM) or generalized linear mixed models (GLMM), and then we applied the multiple comparison methods to identify the best hormonal replacement modality given that the testing of hormonal replacement modalities is independent. Based on 10-year data from the United Network for Organ Sharing (UNOS), among 16 hormonal replacement modalities, and using the 95% simultaneous confidence intervals, we found that the combination of thyroid hormone, a corticosteroid, antidiuretic hormone, and insulin was the best modality for multiple organ procurement for transplantation. PMID:25565890

  12. A method for classification of multisource data using interval-valued probabilities and its application to HIRIS data

    NASA Technical Reports Server (NTRS)

    Kim, H.; Swain, P. H.

    1991-01-01

    A method of classifying multisource data in remote sensing is presented. The proposed method considers each data source as an information source providing a body of evidence, represents statistical evidence by interval-valued probabilities, and uses Dempster's rule to integrate information based on multiple data source. The method is applied to the problems of ground-cover classification of multispectral data combined with digital terrain data such as elevation, slope, and aspect. Then this method is applied to simulated 201-band High Resolution Imaging Spectrometer (HIRIS) data by dividing the dimensionally huge data source into smaller and more manageable pieces based on the global statistical correlation information. It produces higher classification accuracy than the Maximum Likelihood (ML) classification method when the Hughes phenomenon is apparent.

  13. Evaluating methods of correcting for multiple comparisons implemented in SPM12 in social neuroscience fMRI studies: an example from moral psychology.

    PubMed

    Han, Hyemin; Glenn, Andrea L

    2018-06-01

    In fMRI research, the goal of correcting for multiple comparisons is to identify areas of activity that reflect true effects, and thus would be expected to replicate in future studies. Finding an appropriate balance between trying to minimize false positives (Type I error) while not being too stringent and omitting true effects (Type II error) can be challenging. Furthermore, the advantages and disadvantages of these types of errors may differ for different areas of study. In many areas of social neuroscience that involve complex processes and considerable individual differences, such as the study of moral judgment, effects are typically smaller and statistical power weaker, leading to the suggestion that less stringent corrections that allow for more sensitivity may be beneficial and also result in more false positives. Using moral judgment fMRI data, we evaluated four commonly used methods for multiple comparison correction implemented in Statistical Parametric Mapping 12 by examining which method produced the most precise overlap with results from a meta-analysis of relevant studies and with results from nonparametric permutation analyses. We found that voxelwise thresholding with familywise error correction based on Random Field Theory provides a more precise overlap (i.e., without omitting too few regions or encompassing too many additional regions) than either clusterwise thresholding, Bonferroni correction, or false discovery rate correction methods.

  14. Semiquantitative determination of mesophilic, aerobic microorganisms in cocoa products using the Soleris NF-TVC method.

    PubMed

    Montei, Carolyn; McDougal, Susan; Mozola, Mark; Rice, Jennifer

    2014-01-01

    The Soleris Non-fermenting Total Viable Count method was previously validated for a wide variety of food products, including cocoa powder. A matrix extension study was conducted to validate the method for use with cocoa butter and cocoa liquor. Test samples included naturally contaminated cocoa liquor and cocoa butter inoculated with natural microbial flora derived from cocoa liquor. A probability of detection statistical model was used to compare Soleris results at multiple test thresholds (dilutions) with aerobic plate counts determined using the AOAC Official Method 966.23 dilution plating method. Results of the two methods were not statistically different at any dilution level in any of the three trials conducted. The Soleris method offers the advantage of results within 24 h, compared to the 48 h required by standard dilution plating methods.

  15. Pilot points method for conditioning multiple-point statistical facies simulation on flow data

    NASA Astrophysics Data System (ADS)

    Ma, Wei; Jafarpour, Behnam

    2018-05-01

    We propose a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, their calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and then are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) is adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at selected locations away from the wells and the latter to ensure consistent facies structure and connectivity where away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  16. Robust statistical reconstruction for charged particle tomography

    DOEpatents

    Schultz, Larry Joe; Klimenko, Alexei Vasilievich; Fraser, Andrew Mcleod; Morris, Christopher; Orum, John Christopher; Borozdin, Konstantin N; Sossong, Michael James; Hengartner, Nicolas W

    2013-10-08

    Systems and methods for charged particle detection including statistical reconstruction of object volume scattering density profiles from charged particle tomographic data to determine the probability distribution of charged particle scattering using a statistical multiple scattering model and determine a substantially maximum likelihood estimate of object volume scattering density using expectation maximization (ML/EM) algorithm to reconstruct the object volume scattering density. The presence of and/or type of object occupying the volume of interest can be identified from the reconstructed volume scattering density profile. The charged particle tomographic data can be cosmic ray muon tomographic data from a muon tracker for scanning packages, containers, vehicles or cargo. The method can be implemented using a computer program which is executable on a computer.

  17. Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

    PubMed

    Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian

    2016-07-22

    Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.

  18. Evaluating Composite Sampling Methods of Bacillus Spores at Low Concentrations

    PubMed Central

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.; Hutchison, Janine R.

    2016-01-01

    Restoring all facility operations after the 2001 Amerithrax attacks took years to complete, highlighting the need to reduce remediation time. Some of the most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite (SM-SPC): a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite: a single cellulose sponge samples multiple coupons with multiple passes across each coupon (SM-MPC); and 3) multi-medium post-sample composite (MM-MPC): a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted dry wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p< 0.0001) and coupon material (p = 0.0006). Recovery efficiency (RE) was higher overall using the MM-MPC method compared to the SM-SPC and SM-MPC methods. RE with the MM-MPC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, dry wall, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces. PMID:27736999

  19. Evaluating Composite Sampling Methods of Bacillus Spores at Low Concentrations.

    PubMed

    Hess, Becky M; Amidan, Brett G; Anderson, Kevin K; Hutchison, Janine R

    2016-01-01

    Restoring all facility operations after the 2001 Amerithrax attacks took years to complete, highlighting the need to reduce remediation time. Some of the most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite (SM-SPC): a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite: a single cellulose sponge samples multiple coupons with multiple passes across each coupon (SM-MPC); and 3) multi-medium post-sample composite (MM-MPC): a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted dry wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p< 0.0001) and coupon material (p = 0.0006). Recovery efficiency (RE) was higher overall using the MM-MPC method compared to the SM-SPC and SM-MPC methods. RE with the MM-MPC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, dry wall, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces.

  20. Evaluating the decision accuracy and speed of clinical data visualizations.

    PubMed

    Pieczkiewicz, David S; Finkelstein, Stanley M

    2010-01-01

    Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These tools can assess and compare decision accuracy and latency (time). These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.

  1. An evaluation of talker localization based on direction of arrival estimation and statistical sound source identification

    NASA Astrophysics Data System (ADS)

    Nishiura, Takanobu; Nakamura, Satoshi

    2002-11-01

    It is very important to capture distant-talking speech for a hands-free speech interface with high quality. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization algorithms in multiple sound source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization algorithm consisting of two algorithms. One is DOA (direction of arrival) estimation algorithm for multiple sound source localization based on CSP (cross-power spectrum phase) coefficient addition method. The other is statistical sound source identification algorithm based on GMM (Gaussian mixture model) for localizing the target talker position among localized multiple sound sources. In this paper, we particularly focus on the talker localization performance based on the combination of these two algorithms with a microphone array. We conducted evaluation experiments in real noisy reverberant environments. As a result, we confirmed that multiple sound signals can be identified accurately between ''speech'' or ''non-speech'' by the proposed algorithm. [Work supported by ATR, and MEXT of Japan.

  2. What influences the choice of assessment methods in health technology assessments? Statistical analysis of international health technology assessments from 1989 to 2002.

    PubMed

    Draborg, Eva; Andersen, Christian Kronborg

    2006-01-01

    Health technology assessment (HTA) has been used as input in decision making worldwide for more than 25 years. However, no uniform definition of HTA or agreement on assessment methods exists, leaving open the question of what influences the choice of assessment methods in HTAs. The objective of this study is to analyze statistically a possible relationship between methods of assessment used in practical HTAs, type of assessed technology, type of assessors, and year of publication. A sample of 433 HTAs published by eleven leading institutions or agencies in nine countries was reviewed and analyzed by multiple logistic regression. The study shows that outsourcing of HTA reports to external partners is associated with a higher likelihood of using assessment methods, such as meta-analysis, surveys, economic evaluations, and randomized controlled trials; and with a lower likelihood of using assessment methods, such as literature reviews and "other methods". The year of publication was statistically related to the inclusion of economic evaluations and shows a decreasing likelihood during the year span. The type of assessed technology was related to economic evaluations with a decreasing likelihood, to surveys, and to "other methods" with a decreasing likelihood when pharmaceuticals were the assessed type of technology. During the period from 1989 to 2002, no major developments in assessment methods used in practical HTAs were shown statistically in a sample of 433 HTAs worldwide. Outsourcing to external assessors has a statistically significant influence on choice of assessment methods.

  3. Statistical significance of combinatorial regulations

    PubMed Central

    Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji; Sese, Jun

    2013-01-01

    More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit, whereas the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may contribute to uncover pathways regulated in a coordinated fashion and find hidden associations in heterogeneous data. PMID:23882073

  4. The Effects of Statistical Multiplicity of Infection on Virus Quantification and Infectivity Assays.

    PubMed

    Mistry, Bhaven A; D'Orsogna, Maria R; Chou, Tom

    2018-06-19

    Many biological assays are employed in virology to quantify parameters of interest. Two such classes of assays, virus quantification assays (VQAs) and infectivity assays (IAs), aim to estimate the number of viruses present in a solution and the ability of a viral strain to successfully infect a host cell, respectively. VQAs operate at extremely dilute concentrations, and results can be subject to stochastic variability in virus-cell interactions. At the other extreme, high viral-particle concentrations are used in IAs, resulting in large numbers of viruses infecting each cell, enough for measurable change in total transcription activity. Furthermore, host cells can be infected at any concentration regime by multiple particles, resulting in a statistical multiplicity of infection and yielding potentially significant variability in the assay signal and parameter estimates. We develop probabilistic models for statistical multiplicity of infection at low and high viral-particle-concentration limits and apply them to the plaque (VQA), endpoint dilution (VQA), and luciferase reporter (IA) assays. A web-based tool implementing our models and analysis is also developed and presented. We test our proposed new methods for inferring experimental parameters from data using numerical simulations and show improvement on existing procedures in all limits. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  5. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.

  6. Analysis of methods to estimate spring flows in a karst aquifer

    USGS Publications Warehouse

    Sepulveda, N.

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer. ?? 2008 National Ground Water Association.

  7. Analysis of methods to estimate spring flows in a karst aquifer.

    PubMed

    Sepúlveda, Nicasio

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less

  9. Multivariate analysis: greater insights into complex systems

    USDA-ARS?s Scientific Manuscript database

    Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...

  10. The effect of ion-exchange purification on the determination of plutonium at the New Brunswick Laboratory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mitchell, W.G.; Spaletto, M.I.; Lewis, K.

    The method of plutonium (Pu) determination at the Brunswick Laboratory (NBL) consists of a combination of ion-exchange purification followed by controlled-potential coulometric analysis (IE/CPC). The present report's purpose is to quantify any detectable Pu loss occurring in the ion-exchange (IE) purification step which would cause a negative bias in the NBL method for Pu analysis. The magnitude of any such loss would be contained within the reproducibility (0.05%) of the IE/CPC method which utilizes a state-of-the-art autocoulometer developed at NBL. When the NBL IE/CPC method is used for Pu analysis, any loss in ion-exchange purification (<0.05%) is confounded with themore » repeatability of the ion-exchange and the precision of the CPC analysis technique (<0.05%). Consequently, to detect a bias in the IE/CPC method due to the IE alone using the IE/CPC method itself requires that many randomized analyses on a single material be performed over time and that statistical analysis of the data be performed. The initial approach described in this report to quantify any IE loss was an independent method, Isotope Dilution Mass Spectrometry; however, the number of analyses performed was insufficient to assign a statistically significant value to the IE loss (<0.02% of 10 mg samples of Pu). The second method used for quantifying any IE loss of Pu was multiple ion exchanges of the same Pu aliquant; the small number of analyses possible per individual IE together with the column-to-column variability over multiple ion exchanges prevented statistical detection of any loss of <0.05%. 12 refs.« less

  11. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    PubMed Central

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201

  12. REANALYSIS OF F-STATISTIC GRAVITATIONAL-WAVE SEARCHES WITH THE HIGHER CRITICISM STATISTIC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, M. F.; Melatos, A.; Delaigle, A.

    2013-04-01

    We propose a new method of gravitational-wave detection using a modified form of higher criticism, a statistical technique introduced by Donoho and Jin. Higher criticism is designed to detect a group of sparse, weak sources, none of which are strong enough to be reliably estimated or detected individually. We apply higher criticism as a second-pass method to synthetic F-statistic and C-statistic data for a monochromatic periodic source in a binary system and quantify the improvement relative to the first-pass methods. We find that higher criticism on C-statistic data is more sensitive by {approx}6% than the C-statistic alone under optimal conditionsmore » (i.e., binary orbit known exactly) and the relative advantage increases as the error in the orbital parameters increases. Higher criticism is robust even when the source is not monochromatic (e.g., phase-wandering in an accreting system). Applying higher criticism to a phase-wandering source over multiple time intervals gives a {approx}> 30% increase in detectability with few assumptions about the frequency evolution. By contrast, in all-sky searches for unknown periodic sources, which are dominated by the brightest source, second-pass higher criticism does not provide any benefits over a first-pass search.« less

  13. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2018-01-01

    Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical…

  14. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515

  15. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.

  16. [The problem of small "n" and big "P" in neuropsycho-pharmacology, or how to keep the rate of false discoveries under control].

    PubMed

    Petschner, Péter; Bagdy, György; Tóthfalusi, Laszló

    2015-03-01

    One of the characteristics of many methods used in neuropsychopharmacology is that a large number of parameters (P) are measured in relatively few subjects (n). Functional magnetic resonance imaging, electroencephalography (EEG) and genomic studies are typical examples. For example one microarray chip can contain thousands of probes. Therefore, in studies using microarray chips, P may be several thousand-fold larger than n. Statistical analysis of such studies is a challenging task and they are refereed to in the statistical literature such as the small "n" big "P" problem. The problem has many facets including the controversies associated with multiple hypothesis testing. A typical scenario in this context is, when two or more groups are compared by the individual attributes. If the increased classification error due to the multiple testing is neglected, then several highly significant differences will be discovered. But in reality, some of these significant differences are coincidental, not reproducible findings. Several methods were proposed to solve this problem. In this review we discuss two of the proposed solutions, algorithms to compare sets and statistical hypothesis tests controlling the false discovery rate.

  17. Compositional data analysis for physical activity, sedentary time and sleep research.

    PubMed

    Dumuid, Dorothea; Stanford, Tyman E; Martin-Fernández, Josep-Antoni; Pedišić, Željko; Maher, Carol A; Lewis, Lucy K; Hron, Karel; Katzmarzyk, Peter T; Chaput, Jean-Philippe; Fogelholm, Mikael; Hu, Gang; Lambert, Estelle V; Maia, José; Sarmiento, Olga L; Standage, Martyn; Barreira, Tiago V; Broyles, Stephanie T; Tudor-Locke, Catrine; Tremblay, Mark S; Olds, Timothy

    2017-01-01

    The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.

  18. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Optimal allocation of testing resources for statistical simulations

    NASA Astrophysics Data System (ADS)

    Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick

    2015-07-01

    Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data of the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The methodology proposed determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses multivariate t-distribution and Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable in the output function and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.

  20. A SAS macro for testing differences among three or more independent groups using Kruskal-Wallis and Nemenyi tests.

    PubMed

    Liu, Yuewei; Chen, Weihong

    2012-02-01

    As a nonparametric method, the Kruskal-Wallis test is widely used to compare three or more independent groups when an ordinal or interval level of data is available, especially when the assumptions of analysis of variance (ANOVA) are not met. If the Kruskal-Wallis statistic is statistically significant, Nemenyi test is an alternative method for further pairwise multiple comparisons to locate the source of significance. Unfortunately, most popular statistical packages do not integrate the Nemenyi test, which is not easy to be calculated by hand. We described the theory and applications of the Kruskal-Wallis and Nemenyi tests, and presented a flexible SAS macro to implement the two tests. The SAS macro was demonstrated by two examples from our cohort study in occupational epidemiology. It provides a useful tool for SAS users to test the differences among three or more independent groups using a nonparametric method.

  1. Rare Variant Association Test with Multiple Phenotypes

    PubMed Central

    Lee, Selyeong; Won, Sungho; Kim, Young Jin; Kim, Yongkang; Kim, Bong-Jo; Park, Taesung

    2016-01-01

    Although genome-wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiply correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multi-variant analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used Sequence Kernel Association Test (SKAT) for a single phenotype. We applied MAAUSS to Whole Exome Sequencing (WES) data from a Korean population of 1,058 subjects, to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases, had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability. PMID:28039885

  2. Dynamic Quantitative Trait Locus Analysis of Plant Phenomic Data.

    PubMed

    Li, Zitong; Sillanpää, Mikko J

    2015-12-01

    Advanced platforms have recently become available for automatic and systematic quantification of plant growth and development. These new techniques can efficiently produce multiple measurements of phenotypes over time, and introduce time as an extra dimension to quantitative trait locus (QTL) studies. Functional mapping utilizes a class of statistical models for identifying QTLs associated with the growth characteristics of interest. A major benefit of functional mapping is that it integrates information over multiple timepoints, and therefore could increase the statistical power for QTL detection. We review the current development of computationally efficient functional mapping methods which provide invaluable tools for analyzing large-scale timecourse data that are readily available in our post-genome era. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. A simple test of association for contingency tables with multiple column responses.

    PubMed

    Decady, Y J; Thomas, D R

    2000-09-01

    Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.

  4. MODELING A MIXTURE: PBPK/PD APPROACHES FOR PREDICTING CHEMICAL INTERACTIONS.

    EPA Science Inventory

    Since environmental chemical exposures generally involve multiple chemicals, there are both regulatory and scientific drivers to develop methods to predict outcomes of these exposures. Even using efficient statistical and experimental designs, it is not possible to test in vivo a...

  5. Strength and life criteria for corrugated fiberboard by three methods

    Treesearch

    Thomas J. Urbanik

    1997-01-01

    The conventional test method for determining the stacking life of corrugated containers at a fixed load level does not adequately predict a safe load when storage time is fixed. This study introduced multiple load levels and related the probability of time at failure to load. A statistical analysis of logarithm-of-time failure data varying with load level predicts the...

  6. Influence of Family Communication Structure and Vanity Trait on Consumption Behavior: A Case Study of Adolescent Students in Taiwan

    ERIC Educational Resources Information Center

    Chang, Wei-Lung; Liu, Hsiang-Te; Lin, Tai-An; Wen, Yung-Sung

    2008-01-01

    The purpose of this research was to study the relationship between family communication structure, vanity trait, and related consumption behavior. The study used an empirical method with adolescent students from the northern part of Taiwan as the subjects. Multiple statistical methods and the SEM model were used for testing the hypotheses. The…

  7. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  8. A model of distributed phase aberration for deblurring phase estimated from scattering.

    PubMed

    Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C

    2010-01-01

    Correction of aberration in ultrasound imaging uses the response of a point reflector or its equivalent to characterize the aberration. Because a point reflector is usually unavailable, its equivalent is obtained using statistical methods, such as processing reflections from multiple focal regions in a random medium. However, the validity of methods that use reflections from multiple points is limited to isoplanatic patches for which the aberration is essentially the same. In this study, aberration is modeled by an offset phase screen to relax the isoplanatic restriction. Methods are developed to determine the depth and phase of the screen and to use the model for compensation of aberration as the beam is steered. Use of the model to enhance the performance of the noted statistical estimation procedure is also described. Experimental results obtained with tissue-mimicking phantoms that implement different models and produce different amounts of aberration are presented to show the efficacy of these methods. The improvement in b-scan resolution realized with the model is illustrated. The results show that the isoplanatic patch assumption for estimation of aberration can be relaxed and that propagation-path characteristics and aberration estimation are closely related.

  9. Does rational selection of training and test sets improve the outcome of QSAR modeling?

    PubMed

    Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander

    2012-10-22

    Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models are comparable.

  10. Methods for meta-analysis of multiple traits using GWAS summary statistics.

    PubMed

    Ray, Debashree; Boehnke, Michael

    2018-03-01

    Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggest that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits. © 2017 WILEY PERIODICALS, INC.

  11. Probabilistic inversion with graph cuts: Application to the Boise Hydrogeophysical Research Site

    NASA Astrophysics Data System (ADS)

    Pirot, Guillaume; Linde, Niklas; Mariethoz, Grégoire; Bradford, John H.

    2017-02-01

    Inversion methods that build on multiple-point statistics tools offer the possibility to obtain model realizations that are not only in agreement with field data, but also with conceptual geological models that are represented by training images. A recent inversion approach based on patch-based geostatistical resimulation using graph cuts outperforms state-of-the-art multiple-point statistics methods when applied to synthetic inversion examples featuring continuous and discontinuous property fields. Applications of multiple-point statistics tools to field data are challenging due to inevitable discrepancies between actual subsurface structure and the assumptions made in deriving the training image. We introduce several amendments to the original graph cut inversion algorithm and present a first-ever field application by addressing porosity estimation at the Boise Hydrogeophysical Research Site, Boise, Idaho. We consider both a classical multi-Gaussian and an outcrop-based prior model (training image) that are in agreement with available porosity data. When conditioning to available crosshole ground-penetrating radar data using Markov chain Monte Carlo, we find that the posterior realizations honor overall both the characteristics of the prior models and the geophysical data. The porosity field is inverted jointly with the measurement error and the petrophysical parameters that link dielectric permittivity to porosity. Even though the multi-Gaussian prior model leads to posterior realizations with higher likelihoods, the outcrop-based prior model shows better convergence. In addition, it offers geologically more realistic posterior realizations and it better preserves the full porosity range of the prior.

  12. Improvements in cognition, quality of life, and physical performance with clinical Pilates in multiple sclerosis: a randomized controlled trial

    PubMed Central

    Küçük, Fadime; Kara, Bilge; Poyraz, Esra Çoşkuner; İdiman, Egemen

    2016-01-01

    [Purpose] The aim of this study was to determine the effects of clinical Pilates in multiple sclerosis patients. [Subjects and Methods] Twenty multiple sclerosis patients were enrolled in this study. The participants were divided into two groups as the clinical Pilates and control groups. Cognition (Multiple Sclerosis Functional Composite), balance (Berg Balance Scale), physical performance (timed performance tests, Timed up and go test), tiredness (Modified Fatigue Impact scale), depression (Beck Depression Inventory), and quality of life (Multiple Sclerosis International Quality of Life Questionnaire) were measured before and after treatment in all participants. [Results] There were statistically significant differences in balance, timed performance, tiredness and Multiple Sclerosis Functional Composite tests between before and after treatment in the clinical Pilates group. We also found significant differences in timed performance tests, the Timed up and go test and the Multiple Sclerosis Functional Composite between before and after treatment in the control group. According to the difference analyses, there were significant differences in Multiple Sclerosis Functional Composite and Multiple Sclerosis International Quality of Life Questionnaire scores between the two groups in favor of the clinical Pilates group. There were statistically significant clinical differences in favor of the clinical Pilates group in comparison of measurements between the groups. Clinical Pilates improved cognitive functions and quality of life compared with traditional exercise. [Conclusion] In Multiple Sclerosis treatment, clinical Pilates should be used as a holistic approach by physical therapists. PMID:27134355

  13. Improvements in cognition, quality of life, and physical performance with clinical Pilates in multiple sclerosis: a randomized controlled trial.

    PubMed

    Küçük, Fadime; Kara, Bilge; Poyraz, Esra Çoşkuner; İdiman, Egemen

    2016-03-01

    [Purpose] The aim of this study was to determine the effects of clinical Pilates in multiple sclerosis patients. [Subjects and Methods] Twenty multiple sclerosis patients were enrolled in this study. The participants were divided into two groups as the clinical Pilates and control groups. Cognition (Multiple Sclerosis Functional Composite), balance (Berg Balance Scale), physical performance (timed performance tests, Timed up and go test), tiredness (Modified Fatigue Impact scale), depression (Beck Depression Inventory), and quality of life (Multiple Sclerosis International Quality of Life Questionnaire) were measured before and after treatment in all participants. [Results] There were statistically significant differences in balance, timed performance, tiredness and Multiple Sclerosis Functional Composite tests between before and after treatment in the clinical Pilates group. We also found significant differences in timed performance tests, the Timed up and go test and the Multiple Sclerosis Functional Composite between before and after treatment in the control group. According to the difference analyses, there were significant differences in Multiple Sclerosis Functional Composite and Multiple Sclerosis International Quality of Life Questionnaire scores between the two groups in favor of the clinical Pilates group. There were statistically significant clinical differences in favor of the clinical Pilates group in comparison of measurements between the groups. Clinical Pilates improved cognitive functions and quality of life compared with traditional exercise. [Conclusion] In Multiple Sclerosis treatment, clinical Pilates should be used as a holistic approach by physical therapists.

  14. The Use of Meta-Analytic Statistical Significance Testing

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  15. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call “Kernel Distance” statistic, took approximately half the time to compute than the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  16. Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.

    PubMed

    Counsell, Alyssa; Harlow, Lisa L

    2017-05-01

    With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t -tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles presented on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.

  17. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, and, we examine the limiting distribution of the GSMA statistics under the order statistic formulation, and quantify the relevance of the pairwise correlations of the GSMA statistics across different bins on this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930

  18. Dermatoglyphic features in patients with multiple sclerosis

    PubMed Central

    Sabanciogullari, Vedat; Cevik, Seyda; Karacan, Kezban; Bolayir, Ertugrul; Cimen, Mehmet

    2014-01-01

    Objective: To examine dermatoglyphic features to clarify implicated genetic predisposition in the etiology of multiple sclerosis (MS). Methods: The study was conducted between January and December 2013 in the Departments of Anatomy, and Neurology, Cumhuriyet University School of Medicine, Sivas, Turkey. The dermatoglyphic data of 61 patients, and a control group consisting of 62 healthy adults obtained with a digital scanner were transferred to a computer environment. The ImageJ program was used, and atd, dat, adt angles, a-b ridge count, sample types of all fingers, and ridge counts were calculated. Results: In both hands of the patients with MS, the a-b ridge count and ridge counts in all fingers increased, and the differences in these values were statistically significant. There was also a statistically significant increase in the dat angle in both hands of the MS patients. On the contrary, there was no statistically significant difference between the groups in terms of dermal ridge samples, and the most frequent sample in both groups was the ulnar loop. Conclusions: Aberrations in the distribution of dermatoglyphic samples support the genetic predisposition in MS etiology. Multiple sclerosis susceptible individuals may be determined by analyzing dermatoglyphic samples. PMID:25274586

  19. Transitioning to multiple imputation : a new method to impute missing blood alcohol concentration (BAC) values in FARS

    DOT National Transportation Integrated Search

    2002-01-01

    The National Center for Statistics and Analysis (NCSA) of the National Highway Traffic Safety : Administration (NHTSA) has undertaken several approaches to remedy the problem of missing blood alcohol : test results in the Fatality Analysis Reporting ...

  20. THE STATISTICAL ANALYSIS OF DISCRETE AND CONTINUOUS OUTCOMES USING DESIRABILITY FUNCTIONS.

    EPA Science Inventory

    Multiple types of outcomes are sometimes measured on each animal in toxicology dose-response experiments. In this paper we introduce a method of deriving a composite score for a dose-response experiment that combines information from discrete and continuous outcomes through the ...

  1. Analyzing the effect of selected control policy measures and sociodemographic factors on alcoholic beverage consumption in Europe within the AMPHORA project: statistical methods.

    PubMed

    Baccini, Michela; Carreras, Giulia

    2014-10-01

    This paper describes the methods used to investigate variations in total alcoholic beverage consumption as related to selected control intervention policies and other socioeconomic factors (unplanned factors) within 12 European countries involved in the AMPHORA project. The analysis presented several critical points: presence of missing values, strong correlation among the unplanned factors, long-term waves or trends in both the time series of alcohol consumption and the time series of the main explanatory variables. These difficulties were addressed by implementing a multiple imputation procedure for filling in missing values, then specifying for each country a multiple regression model which accounted for time trend, policy measures and a limited set of unplanned factors, selected in advance on the basis of sociological and statistical considerations are addressed. This approach allowed estimating the "net" effect of the selected control policies on alcohol consumption, but not the association between each unplanned factor and the outcome.

  2. Multiresolution multiscale active mask segmentation of fluorescence microscope images

    NASA Astrophysics Data System (ADS)

    Srinivasa, Gowri; Fickus, Matthew; Kovačević, Jelena

    2009-08-01

    We propose an active mask segmentation framework that combines the advantages of statistical modeling, smoothing, speed and flexibility offered by the traditional methods of region-growing, multiscale, multiresolution and active contours respectively. At the crux of this framework is a paradigm shift from evolving contours in the continuous domain to evolving multiple masks in the discrete domain. Thus, the active mask framework is particularly suited to segment digital images. We demonstrate the use of the framework in practice through the segmentation of punctate patterns in fluorescence microscope images. Experiments reveal that statistical modeling helps the multiple masks converge from a random initial configuration to a meaningful one. This obviates the need for an involved initialization procedure germane to most of the traditional methods used to segment fluorescence microscope images. While we provide the mathematical details of the functions used to segment fluorescence microscope images, this is only an instantiation of the active mask framework. We suggest some other instantiations of the framework to segment different types of images.

  3. Data Science in the Research Domain Criteria Era: Relevance of Machine Learning to the Study of Stress Pathology, Recovery, and Resilience

    PubMed Central

    Galatzer-Levy, Isaac R.; Ruggles, Kelly; Chen, Zhe

    2017-01-01

    Diverse environmental and biological systems interact to influence individual differences in response to environmental stress. Understanding the nature of these complex relationships can enhance the development of methods to: (1) identify risk, (2) classify individuals as healthy or ill, (3) understand mechanisms of change, and (4) develop effective treatments. The Research Domain Criteria (RDoC) initiative provides a theoretical framework to understand health and illness as the product of multiple inter-related systems but does not provide a framework to characterize or statistically evaluate such complex relationships. Characterizing and statistically evaluating models that integrate multiple levels (e.g. synapses, genes, environmental factors) as they relate to outcomes that a free from prior diagnostic benchmarks represents a challenge requiring new computational tools that are capable to capture complex relationships and identify clinically relevant populations. In the current review, we will summarize machine learning methods that can achieve these goals. PMID:29527592

  4. A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

    PubMed

    Luczak, Brian B; James, Benjamin T; Girgis, Hani Z

    2017-12-06

    Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.

  5. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2016-01-01

    In education research and in many other fields, researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple…

  6. An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research.

    PubMed

    Buu, Anne; Williams, L Keoki; Yang, James J

    2018-03-01

    We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than the one of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests.

  7. Statistical Algorithms Accounting for Background Density in the Detection of UXO Target Areas at DoD Munitions Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Wilson, John E.; Hathaway, J.

    2008-02-12

    Statistically defensible methods are presented for developing geophysical detector sampling plans and analyzing data for munitions response sites where unexploded ordnance (UXO) may exist. Detection methods for identifying areas of elevated anomaly density from background density are shown. Additionally, methods are described which aid in the choice of transect pattern and spacing to assure with degree of confidence that a target area (TA) of specific size, shape, and anomaly density will be identified using the detection methods. Methods for evaluating the sensitivity of designs to variation in certain parameters are also discussed. Methods presented have been incorporated into the Visualmore » Sample Plan (VSP) software (free at http://dqo.pnl.gov/vsp) and demonstrated at multiple sites in the United States. Application examples from actual transect designs and surveys from the previous two years are demonstrated.« less

  8. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value.This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    NASA Technical Reports Server (NTRS)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.

  10. Global Topology Optimisation

    DTIC Science & Technology

    2016-10-31

    statistical physics. Sec. IV includes several examples of the application of the stochastic method, including matching of a shape to a fixed design, and...an important part of any future application of this method. Second, re-initialization of the level set can lead to small but significant movements of...of engineering design problems [6, 17]. However, many of the relevant applications involve non-convex optimisation problems with multiple locally

  11. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no a single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses could be used instead of diversity indices, such as cluster analyses or multiple regressions. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated to the presence and abundance of the species in a community. In addition, particular hypotheses associated to changes in species richness across localities, or change in abundance of one, or a group of species can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust to standardize all samples to a common size. Even the simplest method as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.

  12. Method for simulating atmospheric turbulence phase effects for multiple time slices and anisoplanatic conditions.

    PubMed

    Roggemann, M C; Welsh, B M; Montera, D; Rhoadarmer, T A

    1995-07-10

    Simulating the effects of atmospheric turbulence on optical imaging systems is an important aspect of understanding the performance of these systems. Simulations are particularly important for understanding the statistics of some adaptive-optics system performance measures, such as the mean and variance of the compensated optical transfer function, and for understanding the statistics of estimators used to reconstruct intensity distributions from turbulence-corrupted image measurements. Current methods of simulating the performance of these systems typically make use of random phase screens placed in the system pupil. Methods exist for making random draws of phase screens that have the correct spatial statistics. However, simulating temporal effects and anisoplanatism requires one or more phase screens at different distances from the aperture, possibly moving with different velocities. We describe and demonstrate a method for creating random draws of phase screens with the correct space-time statistics for a bitrary turbulence and wind-velocity profiles, which can be placed in the telescope pupil in simulations. Results are provided for both the von Kármán and the Kolmogorov turbulence spectra. We also show how to simulate anisoplanatic effects with this technique.

  13. Statistical analysis of secondary particle distributions in relativistic nucleus-nucleus collisions

    NASA Technical Reports Server (NTRS)

    Mcguire, Stephen C.

    1987-01-01

    The use is described of several statistical techniques to characterize structure in the angular distributions of secondary particles from nucleus-nucleus collisions in the energy range 24 to 61 GeV/nucleon. The objective of this work was to determine whether there are correlations between emitted particle intensity and angle that may be used to support the existence of the quark gluon plasma. The techniques include chi-square null hypothesis tests, the method of discrete Fourier transform analysis, and fluctuation analysis. We have also used the method of composite unit vectors to test for azimuthal asymmetry in a data set of 63 JACEE-3 events. Each method is presented in a manner that provides the reader with some practical detail regarding its application. Of those events with relatively high statistics, Fe approaches 0 at 55 GeV/nucleon was found to possess an azimuthal distribution with a highly non-random structure. No evidence of non-statistical fluctuations was found in the pseudo-rapidity distributions of the events studied. It is seen that the most effective application of these methods relies upon the availability of many events or single events that possess very high multiplicities.

  14. Statistical identification of stimulus-activated network nodes in multi-neuron voltage-sensitive dye optical recordings.

    PubMed

    Fathiazar, Elham; Anemuller, Jorn; Kretzberg, Jutta

    2016-08-01

    Voltage-Sensitive Dye (VSD) imaging is an optical imaging method that allows measuring the graded voltage changes of multiple neurons simultaneously. In neuroscience, this method is used to reveal networks of neurons involved in certain tasks. However, the recorded relative dye fluorescence changes are usually low and signals are superimposed by noise and artifacts. Therefore, establishing a reliable method to identify which cells are activated by specific stimulus conditions is the first step to identify functional networks. In this paper, we present a statistical method to identify stimulus-activated network nodes as cells, whose activities during sensory network stimulation differ significantly from the un-stimulated control condition. This method is demonstrated based on voltage-sensitive dye recordings from up to 100 neurons in a ganglion of the medicinal leech responding to tactile skin stimulation. Without relying on any prior physiological knowledge, the network nodes identified by our statistical analysis were found to match well with published cell types involved in tactile stimulus processing and to be consistent across stimulus conditions and preparations.

  15. Comparisons of survival predictions using survival risk ratios based on International Classification of Diseases, Ninth Revision and Abbreviated Injury Scale trauma diagnosis codes.

    PubMed

    Clarke, John R; Ragone, Andrew V; Greenwald, Lloyd

    2005-09-01

    We conducted a comparison of methods for predicting survival using survival risk ratios (SRRs), including new comparisons based on International Classification of Diseases, Ninth Revision (ICD-9) versus Abbreviated Injury Scale (AIS) six-digit codes. From the Pennsylvania trauma center's registry, all direct trauma admissions were collected through June 22, 1999. Patients with no comorbid medical diagnoses and both ICD-9 and AIS injury codes were used for comparisons based on a single set of data. SRRs for ICD-9 and then for AIS diagnostic codes were each calculated two ways: from the survival rate of patients with each diagnosis and when each diagnosis was an isolated diagnosis. Probabilities of survival for the cohort were calculated using each set of SRRs by the multiplicative ICISS method and, where appropriate, the minimum SRR method. These prediction sets were then internally validated against actual survival by the Hosmer-Lemeshow goodness-of-fit statistic. The 41,364 patients had 1,224 different ICD-9 injury diagnoses in 32,261 combinations and 1,263 corresponding AIS injury diagnoses in 31,755 combinations, ranging from 1 to 27 injuries per patient. All conventional ICD-9-based combinations of SRRs and methods had better Hosmer-Lemeshow goodness-of-fit statistic fits than their AIS-based counterparts. The minimum SRR method produced better calibration than the multiplicative methods, presumably because it did not magnify inaccuracies in the SRRs that might occur with multiplication. Predictions of survival based on anatomic injury alone can be performed using ICD-9 codes, with no advantage from extra coding of AIS diagnoses. Predictions based on the single worst SRR were closer to actual outcomes than those based on multiplying SRRs.

  16. Application of Scan Statistics to Detect Suicide Clusters in Australia

    PubMed Central

    Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane

    2013-01-01

    Background Suicide clustering occurs when multiple suicide incidents take place in a small area or/and within a short period of time. In spite of the multi-national research attention and particular efforts in preparing guidelines for tackling suicide clusters, the broader picture of epidemiology of suicide clustering remains unclear. This study aimed to develop techniques in using scan statistics to detect clusters, with the detection of suicide clusters in Australia as example. Methods and Findings Scan statistics was applied to detect clusters among suicides occurring between 2004 and 2008. Manipulation of parameter settings and change of area for scan statistics were performed to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both national and state cluster detections, 8 (53.3%) were only detected by the state cluster detection, and 3 (20%) were only detected by the national cluster detection. Conclusions These findings illustrate that the majority of spatial-temporal clusters of suicide were located in the inland northern areas, with socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to the contrast of the underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings may yield the maximum benefits for exploring clusters. PMID:23342098

  17. The role of multiple-scale modelling of epilepsy in seizure forecasting

    PubMed Central

    Kuhlmann, Levin; Grayden, David B.; Wendling, Fabrice; Schiff, Steven J.

    2014-01-01

    Over the past three decades, a number of seizure prediction, or forecasting, methods have been developed. Although major achievements were accomplished regarding the statistical evaluation of proposed algorithms, it is recognized that further progress is still necessary for clinical application in patients. The lack of physiological motivation can partly explain this limitation. Therefore, a natural question is raised: can computational models of epilepsy be used to improve these methods? Here we review the literature on the multiple-scale neural modelling of epilepsy and the use of such models to infer physiological changes underlying epilepsy and epileptic seizures. We argue how these methods can be applied to advance the state-of-the-art in seizure forecasting. PMID:26035674

  18. Transmit Designs for the MIMO Broadcast Channel With Statistical CSI

    NASA Astrophysics Data System (ADS)

    Wu, Yongpeng; Jin, Shi; Gao, Xiqi; McKay, Matthew R.; Xiao, Chengshan

    2014-09-01

    We investigate the multiple-input multiple-output broadcast channel with statistical channel state information available at the transmitter. The so-called linear assignment operation is employed, and necessary conditions are derived for the optimal transmit design under general fading conditions. Based on this, we introduce an iterative algorithm to maximize the linear assignment weighted sum-rate by applying a gradient descent method. To reduce complexity, we derive an upper bound of the linear assignment achievable rate of each receiver, from which a simplified closed-form expression for a near-optimal linear assignment matrix is derived. This reveals an interesting construction analogous to that of dirty-paper coding. In light of this, a low complexity transmission scheme is provided. Numerical examples illustrate the significant performance of the proposed low complexity scheme.

  19. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the powerful predictive abilities of the optimization model appraised by leave-one-out cross-validation showed that the optimization model with the correlation coefficient (R) of 0.994 7 and the cross-validated correlation coefficient (Rcv) of 0.994 0 possessed the best statistical quality. Furthermore, when the 114 PASHs compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed our models possesses almost equal statistical quality, the very similar regression coefficients and the good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.

  20. Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

    PubMed

    Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

    2013-11-01

    To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011:0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.

  1. Borrowing of strength and study weights in multivariate and network meta-analysis.

    PubMed

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2017-12-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of 'borrowing of strength'. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis).

  2. Borrowing of strength and study weights in multivariate and network meta-analysis

    PubMed Central

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2016-01-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254

  3. Multiple signal classification algorithm for super-resolution fluorescence microscopy

    PubMed Central

    Agarwal, Krishna; Macháň, Radek

    2016-01-01

    Single-molecule localization techniques are restricted by long acquisition and computational times, or the need of special fluorophores or biologically toxic photochemical environments. Here we propose a statistical super-resolution technique of wide-field fluorescence microscopy we call the multiple signal classification algorithm which has several advantages. It provides resolution down to at least 50 nm, requires fewer frames and lower excitation power and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the timescale of the recording. The multiple signal classification algorithm shows comparable or better performance in comparison with single-molecule localization techniques and four contemporary statistical super-resolution methods for experiments of in vitro actin filaments and other independently acquired experimental data sets. We also demonstrate super-resolution at timescales of 245 ms (using 49 frames acquired at 200 frames per second) in samples of live-cell microtubules and live-cell actin filaments imaged without imaging buffers. PMID:27934858

  4. Tensile Properties of Dyneema SK76 Single Fibers at Multiple Loading Rates Using a Direct Gripping Method

    DTIC Science & Technology

    2014-06-01

    lower density compared with aramid fibers such as Kevlar and Twaron. Numerical modeling is used to design more effective fiber-based composite armor...in measuring fibers and doing experiments. vi INTENTIONALLY LEFT BLANK. 1 1. Introduction Aramid fibers such as Kevlar (DuPont) and Twaron...methyl methacrylate blocks. The efficacy of this method to grip Kevlar fibers has been rigorously studied using a variety of statistical methods at

  5. A systematic review of the quality of statistical methods employed for analysing quality of life data in cancer randomised controlled trials.

    PubMed

    Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew

    2017-09-01

    Over the last decades, Health-related Quality of Life (HRQoL) end-points have become an important outcome of the randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed a total of 33 published RCTs studying the HRQoL methods reported in RCTs since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: their multidimensional and longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time for a period ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.

  6. Statistical methods for quantitative mass spectrometry proteomic experiments with labeling.

    PubMed

    Oberg, Ann L; Mahoney, Douglas W

    2012-01-01

    Mass Spectrometry utilizing labeling allows multiple specimens to be subjected to mass spectrometry simultaneously. As a result, between-experiment variability is reduced. Here we describe use of fundamental concepts of statistical experimental design in the labeling framework in order to minimize variability and avoid biases. We demonstrate how to export data in the format that is most efficient for statistical analysis. We demonstrate how to assess the need for normalization, perform normalization, and check whether it worked. We describe how to build a model explaining the observed values and test for differential protein abundance along with descriptive statistics and measures of reliability of the findings. Concepts are illustrated through the use of three case studies utilizing the iTRAQ 4-plex labeling protocol.

  7. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent

    USDA-ARS?s Scientific Manuscript database

    Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present th...

  8. Statistical considerations in the analysis of data from replicated bioassays

    USDA-ARS?s Scientific Manuscript database

    Multiple-dose bioassay is generally the preferred method for characterizing virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having substantial effect on LD90/95s (doses of considerable interest in pest management)...

  9. Influences of environment and disturbance on forest patterns in coastal Oregon watersheds.

    Treesearch

    Michael C. Wimberly; Thomas A. Spies

    2001-01-01

    Modern ecology often emphasizes the distinction between traditional theories of stable, environmentally structured communities and a new paradigm of disturbance driven, nonequilibrium dynamics. However, multiple hypotheses for observed vegetation patterns have seldom been explicitly tested. We used multivariate statistics and variation partitioning methods to assess...

  10. PARTIAL LEAST SQUARE ANALYSES FOR ASSOCIATION OF LANDSCAPE METRICS WITH WATER BIOLOGICAL AND CHEMICAL PROPERTIES IN THE SAVANNAH RIVER BASIN

    EPA Science Inventory

    Surface water quality is related to conditions in the surrounding geophysical environment, including soils, landcover, and anthropogenic activities. A number of statistical methods may be used to analyze and explore relationships among variables. Single-, multiple- and multivaria...

  11. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    PubMed

    Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio

    2016-10-03

    In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches i.e., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches i.e., confidence ellipses and convex hulls, to visualize and assess the uncertainty due to missing values were also described. We showed how the areas of ellipses and convex hulls increased with the number of missing individuals. A free and easy-to-use code was proposed to implement the MI-MFA method in the R statistical environment. We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true configuration even when many individuals were missing in several data tables. This method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.

  12. Light-transmittance predictions under multiple-light-scattering conditions. I. Direct problem: hybrid-method approximation.

    PubMed

    Czerwiński, M; Mroczka, J; Girasole, T; Gouesbet, G; Gréhan, G

    2001-03-20

    Our aim is to present a method of predicting light transmittances through dense three-dimensional layered media. A hybrid method is introduced as a combination of the four-flux method with coefficients predicted from a Monte Carlo statistical model to take into account the actual three-dimensional geometry of the problem under study. We present the principles of the hybrid method, some exemplifying results of numerical simulations, and their comparison with results obtained from Bouguer-Lambert-Beer law and from Monte Carlo simulations.

  13. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  14. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Missing CD4+ cell response in randomized clinical trials of maraviroc and dolutegravir.

    PubMed

    Cuffe, Robert; Barnett, Carly; Granier, Catherine; Machida, Mitsuaki; Wang, Cunshan; Roger, James

    2015-10-01

    Missing data can compromise inferences from clinical trials, yet the topic has received little attention in the clinical trial community. Shortcomings in commonly used methods used to analyze studies with missing data (complete case, last- or baseline-observation carried forward) have been highlighted in a recent Food and Drug Administration-sponsored report. This report recommends how to mitigate the issues associated with missing data. We present an example of the proposed concepts using data from recent clinical trials. CD4+ cell count data from the previously reported SINGLE and MOTIVATE studies of dolutegravir and maraviroc were analyzed using a variety of statistical methods to explore the impact of missing data. Four methodologies were used: complete case analysis, simple imputation, mixed models for repeated measures, and multiple imputation. We compared the sensitivity of conclusions to the volume of missing data and to the assumptions underpinning each method. Rates of missing data were greater in the MOTIVATE studies (35%-68% premature withdrawal) than in SINGLE (12%-20%). The sensitivity of results to assumptions about missing data was related to volume of missing data. Estimates of treatment differences by various analysis methods ranged across a 61 cells/mm3 window in MOTIVATE and a 22 cells/mm3 window in SINGLE. Where missing data are anticipated, analyses require robust statistical and clinical debate of the necessary but unverifiable underlying statistical assumptions. Multiple imputation makes these assumptions transparent, can accommodate a broad range of scenarios, and is a natural analysis for clinical trials in HIV with missing data.

  16. Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures.

    PubMed

    Vexler, Albert; Yu, Jihnhee; Zhao, Yang; Hutson, Alan D; Gurevich, Gregory

    2017-01-01

    Many statistical studies report p-values for inferential purposes. In several scenarios, the stochastic aspect of p-values is neglected, which may contribute to drawing wrong conclusions in real data experiments. The stochastic nature of p-values makes their use to examine the performance of given testing procedures or associations between investigated factors to be difficult. We turn our focus on the modern statistical literature to address the expected p-value (EPV) as a measure of the performance of decision-making rules. During the course of our study, we prove that the EPV can be considered in the context of receiver operating characteristic (ROC) curve analysis, a well-established biostatistical methodology. The ROC-based framework provides a new and efficient methodology for investigating and constructing statistical decision-making procedures, including: (1) evaluation and visualization of properties of the testing mechanisms, considering, e.g. partial EPVs; (2) developing optimal tests via the minimization of EPVs; (3) creation of novel methods for optimally combining multiple test statistics. We demonstrate that the proposed EPV-based approach allows us to maximize the integrated power of testing algorithms with respect to various significance levels. In an application, we use the proposed method to construct the optimal test and analyze a myocardial infarction disease dataset. We outline the usefulness of the "EPV/ROC" technique for evaluating different decision-making procedures, their constructions and properties with an eye towards practical applications.

  17. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox RF LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  18. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox RF LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT) which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  19. Bayesian estimation of the transmissivity spatial structure from pumping test data

    NASA Astrophysics Data System (ADS)

    Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier

    2017-06-01

    Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping tests data. It is also shown that the prior parameter distribution has a significant influence on the estimation procedure, given the non-uniqueness of the estimation procedure. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.

  20. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

    The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.

  1. Comparing Visual and Statistical Analysis of Multiple Baseline Design Graphs.

    PubMed

    Wolfe, Katie; Dickenson, Tammiee S; Miller, Bridget; McGrath, Kathleen V

    2018-04-01

    A growing number of statistical analyses are being developed for single-case research. One important factor in evaluating these methods is the extent to which each corresponds to visual analysis. Few studies have compared statistical and visual analysis, and information about more recently developed statistics is scarce. Therefore, our purpose was to evaluate the agreement between visual analysis and four statistical analyses: improvement rate difference (IRD); Tau-U; Hedges, Pustejovsky, Shadish (HPS) effect size; and between-case standardized mean difference (BC-SMD). Results indicate that IRD and BC-SMD had the strongest overall agreement with visual analysis. Although Tau-U had strong agreement with visual analysis on raw values, it had poorer agreement when those values were dichotomized to represent the presence or absence of a functional relation. Overall, visual analysis appeared to be more conservative than statistical analysis, but further research is needed to evaluate the nature of these disagreements.

  2. Identifying when tagged fishes have been consumed by piscivorous predators: application of multivariate mixture models to movement parameters of telemetered fishes

    USGS Publications Warehouse

    Romine, Jason G.; Perry, Russell W.; Johnston, Samuel V.; Fitzer, Christopher W.; Pagliughi, Stephen W.; Blake, Aaron R.

    2013-01-01

    Mixture models proved valuable as a means to differentiate between salmonid smolts and predators that consumed salmonid smolts. However, successful application of this method requires that telemetered fishes and their predators exhibit measurable differences in movement behavior. Our approach is flexible, allows inclusion of multiple track statistics and improves upon rule-based manual classification methods.

  3. Detecting and removing multiplicative spatial bias in high-throughput screening technologies.

    PubMed

    Caraus, Iurie; Mazoure, Bogdan; Nadon, Robert; Makarenkov, Vladimir

    2017-10-15

    Considerable attention has been paid recently to improve data quality in high-throughput screening (HTS) and high-content screening (HCS) technologies widely used in drug development and chemical toxicity research. However, several environmentally- and procedurally-induced spatial biases in experimental HTS and HCS screens decrease measurement accuracy, leading to increased numbers of false positives and false negatives in hit selection. Although effective bias correction methods and software have been developed over the past decades, almost all of these tools have been designed to reduce the effect of additive bias only. Here, we address the case of multiplicative spatial bias. We introduce three new statistical methods meant to reduce multiplicative spatial bias in screening technologies. We assess the performance of the methods with synthetic and real data affected by multiplicative spatial bias, including comparisons with current bias correction methods. We also describe a wider data correction protocol that integrates methods for removing both assay and plate-specific spatial biases, which can be either additive or multiplicative. The methods for removing multiplicative spatial bias and the data correction protocol are effective in detecting and cleaning experimental data generated by screening technologies. As our protocol is of a general nature, it can be used by researchers analyzing current or next-generation high-throughput screens. The AssayCorrector program, implemented in R, is available on CRAN. makarenkov.vladimir@uqam.ca. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. Prediction system of hydroponic plant growth and development using algorithm Fuzzy Mamdani method

    NASA Astrophysics Data System (ADS)

    Sudana, I. Made; Purnawirawan, Okta; Arief, Ulfa Mediaty

    2017-03-01

    Hydroponics is a method of farming without soil. One of the Hydroponic plants is Watercress (Nasturtium Officinale). The development and growth process of hydroponic Watercress was influenced by levels of nutrients, acidity and temperature. The independent variables can be used as input variable system to predict the value level of plants growth and development. The prediction system is using Fuzzy Algorithm Mamdani method. This system was built to implement the function of Fuzzy Inference System (Fuzzy Inference System/FIS) as a part of the Fuzzy Logic Toolbox (FLT) by using MATLAB R2007b. FIS is a computing system that works on the principle of fuzzy reasoning which is similar to humans' reasoning. Basically FIS consists of four units which are fuzzification unit, fuzzy logic reasoning unit, base knowledge unit and defuzzification unit. In addition to know the effect of independent variables on the plants growth and development that can be visualized with the function diagram of FIS output surface that is shaped three-dimensional, and statistical tests based on the data from the prediction system using multiple linear regression method, which includes multiple linear regression analysis, T test, F test, the coefficient of determination and donations predictor that are calculated using SPSS (Statistical Product and Service Solutions) software applications.

  5. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    PubMed

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, p<0.001). This method also showed statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (30.7% versus 67.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse. Published by Elsevier B.V.

  6. The potential for increased power from combining P-values testing the same hypothesis.

    PubMed

    Ganju, Jitendra; Julie Ma, Guoguang

    2017-02-01

    The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect but we do not know which one is the most powerful. Rather than relying on a single p-value, combining p-values from prespecified multiple test statistics can be used for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations when the model includes treatment by covariate interaction.

  7. Statistical Model Selection for TID Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, R.; Gorelick, J. L.; McClure, S.

    2010-01-01

    Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting of similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just for qualification decisions, but also for quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in ·the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.

  8. Survival analysis in hematologic malignancies: recommendations for clinicians

    PubMed Central

    Delgado, Julio; Pereira, Arturo; Villamor, Neus; López-Guillermo, Armando; Rozman, Ciril

    2014-01-01

    The widespread availability of statistical packages has undoubtedly helped hematologists worldwide in the analysis of their data, but has also led to the inappropriate use of statistical methods. In this article, we review some basic concepts of survival analysis and also make recommendations about how and when to perform each particular test using SPSS, Stata and R. In particular, we describe a simple way of defining cut-off points for continuous variables and the appropriate and inappropriate uses of the Kaplan-Meier method and Cox proportional hazard regression models. We also provide practical advice on how to check the proportional hazards assumption and briefly review the role of relative survival and multiple imputation. PMID:25176982

  9. Fine-mapping additive and dominant SNP effects using group-LASSO and Fractional Resample Model Averaging

    PubMed Central

    Sabourin, Jeremy; Nobel, Andrew B.; Valdar, William

    2014-01-01

    Genomewide association studies sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple SNPs simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights; it estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing a SNP prioritization that best identifies underlying true signals, we show that: our method easily outperforms a single marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and, when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation. PMID:25417853

  10. A guide to missing data for the pediatric nephrologist.

    PubMed

    Larkins, Nicholas G; Craig, Jonathan C; Teixeira-Pinto, Armando

    2018-03-13

    Missing data is an important and common source of bias in clinical research. Readers should be alert to and consider the impact of missing data when reading studies. Beyond preventing missing data in the first place, through good study design and conduct, there are different strategies available to handle data containing missing observations. Complete case analysis is often biased unless data are missing completely at random. Better methods of handling missing data include multiple imputation and models using likelihood-based estimation. With advancing computing power and modern statistical software, these methods are within the reach of clinician-researchers under guidance of a biostatistician. As clinicians reading papers, we need to continue to update our understanding of statistical methods, so that we understand the limitations of these techniques and can critically interpret literature.

  11. Accelerating the weighted histogram analysis method by direct inversion in the iterative subspace.

    PubMed

    Zhang, Cheng; Lai, Chun-Liang; Pettitt, B Montgomery

    The weighted histogram analysis method (WHAM) for free energy calculations is a valuable tool to produce free energy differences with the minimal errors. Given multiple simulations, WHAM obtains from the distribution overlaps the optimal statistical estimator of the density of states, from which the free energy differences can be computed. The WHAM equations are often solved by an iterative procedure. In this work, we use a well-known linear algebra algorithm which allows for more rapid convergence to the solution. We find that the computational complexity of the iterative solution to WHAM and the closely-related multiple Bennett acceptance ratio (MBAR) method can be improved by using the method of direct inversion in the iterative subspace. We give examples from a lattice model, a simple liquid and an aqueous protein solution.

  12. Statistical detection of EEG synchrony using empirical bayesian inference.

    PubMed

    Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven

    2015-01-01

    There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application locFDR on experiment data detected more significant discoveries than our previously proposed methods whereas the standard FDR method failed to detect any significant discoveries.

  13. An appraisal of statistical procedures used in derivation of reference intervals.

    PubMed

    Ichihara, Kiyoshi; Boyd, James C

    2010-11-01

    When conducting studies to derive reference intervals (RIs), various statistical procedures are commonly applied at each step, from the planning stages to final computation of RIs. Determination of the necessary sample size is an important consideration, and evaluation of at least 400 individuals in each subgroup has been recommended to establish reliable common RIs in multicenter studies. Multiple regression analysis allows identification of the most important factors contributing to variation in test results, while accounting for possible confounding relationships among these factors. Of the various approaches proposed for judging the necessity of partitioning reference values, nested analysis of variance (ANOVA) is the likely method of choice owing to its ability to handle multiple groups and being able to adjust for multiple factors. Box-Cox power transformation often has been used to transform data to a Gaussian distribution for parametric computation of RIs. However, this transformation occasionally fails. Therefore, the non-parametric method based on determination of the 2.5 and 97.5 percentiles following sorting of the data, has been recommended for general use. The performance of the Box-Cox transformation can be improved by introducing an additional parameter representing the origin of transformation. In simulations, the confidence intervals (CIs) of reference limits (RLs) calculated by the parametric method were narrower than those calculated by the non-parametric approach. However, the margin of difference was rather small owing to additional variability in parametrically-determined RLs introduced by estimation of parameters for the Box-Cox transformation. The parametric calculation method may have an advantage over the non-parametric method in allowing identification and exclusion of extreme values during RI computation.

  14. Detecting disease-predisposing variants: the haplotype method.

    PubMed Central

    Valdes, A M; Thomson, G

    1997-01-01

    For many HLA-associated diseases, multiple alleles-- and, in some cases, multiple loci--have been suggested as the causative agents. The haplotype method for identifying disease-predisposing amino acids in a genetic region is a stratification analysis. We show that, for each haplotype combination containing all the amino acid sites involved in the disease process, the relative frequencies of amino acid variants at sites not involved in disease but in linkage disequilibrium with the disease-predisposing sites are expected to be the same in patients and controls. The haplotype method is robust to mode of inheritance and penetrance of the disease and can be used to determine unequivocally whether all amino acid sites involved in the disease have not been identified. Using a resampling technique, we developed a statistical test that takes account of the nonindependence of the sites sampled. Further, when multiple sites in the genetic region are involved in disease, the test statistic gives a closer fit to the null expectation when some--compared with none--of the true predisposing factors are included in the haplotype analysis. Although the haplotype method cannot distinguish between very highly correlated sites in one population, ethnic comparisons may help identify the true predisposing factors. The haplotype method was applied to insulin-dependent diabetes mellitus (IDDM) HLA class II DQA1-DQB1 data from Caucasian, African, and Japanese populations. Our results indicate that the combination DQA1#52 (Arg predisposing) DQB1#57 (Asp protective), which has been proposed as an important IDDM agent, does not include all the predisposing elements. With rheumatoid arthritis HLA class II DRB1 data, the results were consistent with the shared-epitope hypothesis. PMID:9042931

  15. Using regression equations built from summary data in the psychological assessment of the individual case: extension to multiple regression.

    PubMed

    Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J

    2012-12-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.

  16. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145

  17. A multiple-point spatially weighted k-NN method for object-based classification

    NASA Astrophysics Data System (ADS)

    Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.

    2016-10-01

    Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.

  18. Comparison of two stand-alone CADe systems at multiple operating points

    NASA Astrophysics Data System (ADS)

    Sahiner, Berkman; Chen, Weijie; Pezeshk, Aria; Petrick, Nicholas

    2015-03-01

    Computer-aided detection (CADe) systems are typically designed to work at a given operating point: The device displays a mark if and only if the level of suspiciousness of a region of interest is above a fixed threshold. To compare the standalone performances of two systems, one approach is to select the parameters of the systems to yield a target false-positive rate that defines the operating point, and to compare the sensitivities at that operating point. Increasingly, CADe developers offer multiple operating points, which necessitates the comparison of two CADe systems involving multiple comparisons. To control the Type I error, multiple-comparison correction is needed for keeping the family-wise error rate (FWER) less than a given alpha-level. The sensitivities of a single modality at different operating points are correlated. In addition, the sensitivities of the two modalities at the same or different operating points are also likely to be correlated. It has been shown in the literature that when test statistics are correlated, well-known methods for controlling the FWER are conservative. In this study, we compared the FWER and power of three methods, namely the Bonferroni, step-up, and adjusted step-up methods in comparing the sensitivities of two CADe systems at multiple operating points, where the adjusted step-up method uses the estimated correlations. Our results indicate that the adjusted step-up method has a substantial advantage over other the two methods both in terms of the FWER and power.

  19. Analysis of correlation between pediatric asthma exacerbation and exposure to pollutant mixtures with association rule mining.

    PubMed

    Toti, Giulia; Vilalta, Ricardo; Lindner, Peggy; Lefer, Barry; Macias, Charles; Price, Daniel

    2016-11-01

    Traditional studies on effects of outdoor pollution on asthma have been criticized for questionable statistical validity and inefficacy in exploring the effects of multiple air pollutants, alone and in combination. Association rule mining (ARM), a method easily interpretable and suitable for the analysis of the effects of multiple exposures, could be of use, but the traditional interest metrics of support and confidence need to be substituted with metrics that focus on risk variations caused by different exposures. We present an ARM-based methodology that produces rules associated with relevant odds ratios and limits the number of final rules even at very low support levels (0.5%), thanks to post-pruning criteria that limit rule redundancy and control for statistical significance. The methodology has been applied to a case-crossover study to explore the effects of multiple air pollutants on risk of asthma in pediatric subjects. We identified 27 rules with interesting odds ratio among more than 10,000 having the required support. The only rule including only one chemical is exposure to ozone on the previous day of the reported asthma attack (OR=1.14). 26 combinatory rules highlight the limitations of air quality policies based on single pollutant thresholds and suggest that exposure to mixtures of chemicals is more harmful, with odds ratio as high as 1.54 (associated with the combination day0 SO 2 , day0 NO, day0 NO 2 , day1 PM). The proposed method can be used to analyze risk variations caused by single and multiple exposures. The method is reliable and requires fewer assumptions on the data than parametric approaches. Rules including more than one pollutant highlight interactions that deserve further investigation, while helping to limit the search field. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Calculating stage duration statistics in multistage diseases.

    PubMed

    Komarova, Natalia L; Thalhauser, Craig J

    2011-01-01

    Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions, however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.

  1. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  2. Research in Computational Astrobiology

    NASA Technical Reports Server (NTRS)

    Chaban, Galina; Colombano, Silvano; Scargle, Jeff; New, Michael H.; Pohorille, Andrew; Wilson, Michael A.

    2003-01-01

    We report on several projects in the field of computational astrobiology, which is devoted to advancing our understanding of the origin, evolution and distribution of life in the Universe using theoretical and computational tools. Research projects included modifying existing computer simulation codes to use efficient, multiple time step algorithms, statistical methods for analysis of astrophysical data via optimal partitioning methods, electronic structure calculations on water-nuclei acid complexes, incorporation of structural information into genomic sequence analysis methods and calculations of shock-induced formation of polycylic aromatic hydrocarbon compounds.

  3. The High Prevalence of the Varicella Zoster Virus in Patients With Relapsing-Remitting Multiple Sclerosis: A Case-Control Study in the North of Iran

    PubMed Central

    Najafi, Saeideh; Ghane, Masood; Yousefzadeh-Chabok, Shahrokh; Amiri, Mehdi

    2016-01-01

    Background Multiple sclerosis (MS) is the most common neurological autoimmune disease, characterized by multifocal areas of inflammatory demyelination within the central nervous system. It has been hypothesized that the stimulation of the immune system by viral infections is the leading cause of MS among susceptible individuals. Objectives The aim of this study was to investigate the prevalence of the varicella zoster virus (VZV) in patients with relapsing-remitting multiple sclerosis. Patients and Methods Plasma and peripheral blood mononuclear cells (PBMCs) collected from MS patients (n = 82) and controls (n = 89) were screened for the presence of anti-VZV antibodies and VZV DNA by the ELISA and PCR methods. DNA was extracted from all samples, and VZV infection was examined by the PCR technique. Statistical analysis was used to investigate the frequency of the virus in MS patients and a healthy control group. Results Of all the MS patients, 78 (95.1%) and 21 (25.6%) were positive for anti-VZV and VZV DNA, respectively. Statistical analysis of the PCR results showed a significant correlation between the abundance of VZV and MS disease (P < 0.001). However, there was no significant correlation between the abundance of anti-VZV antibodies and MS disease by the ELISA method. Conclusions These results support the hypothesis that VZV may contribute to MS in establishing a systemic infection process and inducing an immune response. PMID:27226879

  4. Identifying Causal Variants at Loci with Multiple Signals of Association

    PubMed Central

    Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar

    2014-01-01

    Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20–50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/. PMID:25104515

  5. Identifying causal variants at loci with multiple signals of association.

    PubMed

    Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar

    2014-10-01

    Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/. Copyright © 2014 by the Genetics Society of America.

  6. A Residual Mass Ballistic Testing Method to Compare Armor Materials or Components (Residual Mass Ballistic Testing Method)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benjamin Langhorst; Thomas M Lillo; Henry S Chu

    2014-05-01

    A statistics based ballistic test method is presented for use when comparing multiple groups of test articles of unknown relative ballistic perforation resistance. The method is intended to be more efficient than many traditional methods for research and development testing. To establish the validity of the method, it is employed in this study to compare test groups of known relative ballistic performance. Multiple groups of test articles were perforated using consistent projectiles and impact conditions. Test groups were made of rolled homogeneous armor (RHA) plates and differed in thickness. After perforation, each residual projectile was captured behind the target andmore » its mass was measured. The residual masses measured for each test group were analyzed to provide ballistic performance rankings with associated confidence levels. When compared to traditional V50 methods, the residual mass (RM) method was found to require fewer test events and be more tolerant of variations in impact conditions.« less

  7. Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

    PubMed

    Fu, Liya; Wang, You-Gan

    2011-02-15

    Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.

  8. Ab initio multiple cloning simulations of pyrrole photodissociation: TKER spectra and velocity map imaging

    DOE PAGES

    Makhov, Dmitry V.; Saita, Kenichiro; Martinez, Todd J.; ...

    2014-12-11

    In this study, we report a detailed computational simulation of the photodissociation of pyrrole using the ab initio Multiple Cloning (AIMC) method implemented within MOLPRO. The efficiency of the AIMC implementation, employing train basis sets, linear approximation for matrix elements, and Ehrenfest configuration cloning, allows us to accumulate significant statistics. We calculate and analyze the total kinetic energy release (TKER) spectrum and Velocity Map Imaging (VMI) of pyrrole and compare the results directly with experimental measurements. Both the TKER spectrum and the structure of the velocity map image (VMI) are well reproduced. Previously, it has been assumed that the isotropicmore » component of the VMI arises from long time statistical dissociation. Instead, our simulations suggest that ultrafast dynamics contributes significantly to both low and high energy portions of the TKER spectrum.« less

  9. Ab initio multiple cloning simulations of pyrrole photodissociation: TKER spectra and velocity map imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Makhov, Dmitry V.; Saita, Kenichiro; Martinez, Todd J.

    In this study, we report a detailed computational simulation of the photodissociation of pyrrole using the ab initio Multiple Cloning (AIMC) method implemented within MOLPRO. The efficiency of the AIMC implementation, employing train basis sets, linear approximation for matrix elements, and Ehrenfest configuration cloning, allows us to accumulate significant statistics. We calculate and analyze the total kinetic energy release (TKER) spectrum and Velocity Map Imaging (VMI) of pyrrole and compare the results directly with experimental measurements. Both the TKER spectrum and the structure of the velocity map image (VMI) are well reproduced. Previously, it has been assumed that the isotropicmore » component of the VMI arises from long time statistical dissociation. Instead, our simulations suggest that ultrafast dynamics contributes significantly to both low and high energy portions of the TKER spectrum.« less

  10. Statistical Estimation of Heterogeneities: A New Frontier in Well Testing

    NASA Astrophysics Data System (ADS)

    Neuman, S. P.; Guadagnini, A.; Illman, W. A.; Riva, M.; Vesselinov, V. V.

    2001-12-01

    Well-testing methods have traditionally relied on analytical solutions of groundwater flow equations in relatively simple domains, consisting of one or at most a few units having uniform hydraulic properties. Recently, attention has been shifting toward methods and solutions that would allow one to characterize subsurface heterogeneities in greater detail. On one hand, geostatistical inverse methods are being used to assess the spatial variability of parameters, such as permeability and porosity, on the basis of multiple cross-hole pressure interference tests. On the other hand, analytical solutions are being developed to describe the mean and variance (first and second statistical moments) of flow to a well in a randomly heterogeneous medium. Geostatistical inverse interpretation of cross-hole tests yields a smoothed but detailed "tomographic" image of how parameters actually vary in three-dimensional space, together with corresponding measures of estimation uncertainty. Moment solutions may soon allow one to interpret well tests in terms of statistical parameters such as the mean and variance of log permeability, its spatial autocorrelation and statistical anisotropy. The idea of geostatistical cross-hole tomography is illustrated through pneumatic injection tests conducted in unsaturated fractured tuff at the Apache Leap Research Site near Superior, Arizona. The idea of using moment equations to interpret well-tests statistically is illustrated through a recently developed three-dimensional solution for steady state flow to a well in a bounded, randomly heterogeneous, statistically anisotropic aquifer.

  11. Online and offline tools for head movement compensation in MEG.

    PubMed

    Stolk, Arjen; Todorovic, Ana; Schoffelen, Jan-Mathijs; Oostenveld, Robert

    2013-03-01

    Magnetoencephalography (MEG) is measured above the head, which makes it sensitive to variations of the head position with respect to the sensors. Head movements blur the topography of the neuronal sources of the MEG signal, increase localization errors, and reduce statistical sensitivity. Here we describe two novel and readily applicable methods that compensate for the detrimental effects of head motion on the statistical sensitivity of MEG experiments. First, we introduce an online procedure that continuously monitors head position. Second, we describe an offline analysis method that takes into account the head position time-series. We quantify the performance of these methods in the context of three different experimental settings, involving somatosensory, visual and auditory stimuli, assessing both individual and group-level statistics. The online head localization procedure allowed for optimal repositioning of the subjects over multiple sessions, resulting in a 28% reduction of the variance in dipole position and an improvement of up to 15% in statistical sensitivity. Offline incorporation of the head position time-series into the general linear model resulted in improvements of group-level statistical sensitivity between 15% and 29%. These tools can substantially reduce the influence of head movement within and between sessions, increasing the sensitivity of many cognitive neuroscience experiments. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Projecting future precipitation and temperature at sites with diverse climate through multiple statistical downscaling schemes

    NASA Astrophysics Data System (ADS)

    Vallam, P.; Qin, X. S.

    2017-10-01

    Anthropogenic-driven climate change would affect the global ecosystem and is becoming a world-wide concern. Numerous studies have been undertaken to determine the future trends of meteorological variables at different scales. Despite these studies, there remains significant uncertainty in the prediction of future climates. To examine the uncertainty arising from using different schemes to downscale the meteorological variables for the future horizons, projections from different statistical downscaling schemes were examined. These schemes included statistical downscaling method (SDSM), change factor incorporated with LARS-WG, and bias corrected disaggregation (BCD) method. Global circulation models (GCMs) based on CMIP3 (HadCM3) and CMIP5 (CanESM2) were utilized to perturb the changes in the future climate. Five study sites (i.e., Alice Springs, Edmonton, Frankfurt, Miami, and Singapore) with diverse climatic conditions were chosen for examining the spatial variability of applying various statistical downscaling schemes. The study results indicated that the regions experiencing heavy precipitation intensities were most likely to demonstrate the divergence between the predictions from various statistical downscaling methods. Also, the variance computed in projecting the weather extremes indicated the uncertainty derived from selection of downscaling tools and climate models. This study could help gain an improved understanding about the features of different downscaling approaches and the overall downscaling uncertainty.

  13. Identifying Node Role in Social Network Based on Multiple Indicators

    PubMed Central

    Huang, Shaobin; Lv, Tianyang; Zhang, Xizhe; Yang, Yange; Zheng, Weimin; Wen, Chao

    2014-01-01

    It is a classic topic of social network analysis to evaluate the importance of nodes and identify the node that takes on the role of core or bridge in a network. Because a single indicator is not sufficient to analyze multiple characteristics of a node, it is a natural solution to apply multiple indicators that should be selected carefully. An intuitive idea is to select some indicators with weak correlations to efficiently assess different characteristics of a node. However, this paper shows that it is much better to select the indicators with strong correlations. Because indicator correlation is based on the statistical analysis of a large number of nodes, the particularity of an important node will be outlined if its indicator relationship doesn't comply with the statistical correlation. Therefore, the paper selects the multiple indicators including degree, ego-betweenness centrality and eigenvector centrality to evaluate the importance and the role of a node. The importance of a node is equal to the normalized sum of its three indicators. A candidate for core or bridge is selected from the great degree nodes or the nodes with great ego-betweenness centrality respectively. Then, the role of a candidate is determined according to the difference between its indicators' relationship with the statistical correlation of the overall network. Based on 18 real networks and 3 kinds of model networks, the experimental results show that the proposed methods perform quite well in evaluating the importance of nodes and in identifying the node role. PMID:25089823

  14. The impact of loss to follow-up on hypothesis tests of the treatment effect for several statistical methods in substance abuse clinical trials.

    PubMed

    Hedden, Sarra L; Woolson, Robert F; Carter, Rickey E; Palesch, Yuko; Upadhyaya, Himanshu P; Malcolm, Robert J

    2009-07-01

    "Loss to follow-up" can be substantial in substance abuse clinical trials. When extensive losses to follow-up occur, one must cautiously analyze and interpret the findings of a research study. Aims of this project were to introduce the types of missing data mechanisms and describe several methods for analyzing data with loss to follow-up. Furthermore, a simulation study compared Type I error and power of several methods when missing data amount and mechanism varies. Methods compared were the following: Last observation carried forward (LOCF), multiple imputation (MI), modified stratified summary statistics (SSS), and mixed effects models. Results demonstrated nominal Type I error for all methods; power was high for all methods except LOCF. Mixed effect model, modified SSS, and MI are generally recommended for use; however, many methods require that the data are missing at random or missing completely at random (i.e., "ignorable"). If the missing data are presumed to be nonignorable, a sensitivity analysis is recommended.

  15. Protein Sectors: Statistical Coupling Analysis versus Conservation

    PubMed Central

    Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

    2015-01-01

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535

  16. Detector-Response Correction of Two-Dimensional γ -Ray Spectra from Neutron Capture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rusev, G.; Jandel, M.; Arnold, C. W.

    2015-05-28

    The neutron-capture reaction produces a large variety of γ-ray cascades with different γ-ray multiplicities. A measured spectral distribution of these cascades for each γ-ray multiplicity is of importance to applications and studies of γ-ray statistical properties. The DANCE array, a 4π ball of 160 BaF 2 detectors, is an ideal tool for measurement of neutron-capture γ-rays. The high granularity of DANCE enables measurements of high-multiplicity γ-ray cascades. The measured two-dimensional spectra (γ-ray energy, γ-ray multiplicity) have to be corrected for the DANCE detector response in order to compare them with predictions of the statistical model or use them in applications.more » The detector-response correction problem becomes more difficult for a 4π detection system than for a single detector. A trial and error approach and an iterative decomposition of γ-ray multiplets, have been successfully applied to the detector-response correction. As a result, applications of the decomposition methods are discussed for two-dimensional γ-ray spectra measured at DANCE from γ-ray sources and from the 10B(n, γ) and 113Cd(n, γ) reactions.« less

  17. A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

    PubMed

    Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

    2017-12-01

    qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.

  18. Free-energy landscapes from adaptively biased methods: Application to quantum systems

    NASA Astrophysics Data System (ADS)

    Calvo, F.

    2010-10-01

    Several parallel adaptive biasing methods are applied to the calculation of free-energy pathways along reaction coordinates, choosing as a difficult example the double-funnel landscape of the 38-atom Lennard-Jones cluster. In the case of classical statistics, the Wang-Landau and adaptively biased molecular-dynamics (ABMD) methods are both found efficient if multiple walkers and replication and deletion schemes are used. An extension of the ABMD technique to quantum systems, implemented through the path-integral MD framework, is presented and tested on Ne38 against the quantum superposition method.

  19. Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

    NASA Astrophysics Data System (ADS)

    Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

    2004-05-01

    A robust version of Lee local statistic filter able to effectively suppress the mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteris-tics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.

  20. Measuring the scale dependence of intrinsic alignments using multiple shear estimates

    NASA Astrophysics Data System (ADS)

    Leonard, C. Danielle; Mandelbaum, Rachel

    2018-06-01

    We present a new method for measuring the scale dependence of the intrinsic alignment (IA) contamination to the galaxy-galaxy lensing signal, which takes advantage of multiple shear estimation methods applied to the same source galaxy sample. By exploiting the resulting correlation of both shape noise and cosmic variance, our method can provide an increase in the signal-to-noise of the measured IA signal as compared to methods which rely on the difference of the lensing signal from multiple photometric redshift bins. For a galaxy-galaxy lensing measurement which uses LSST sources and DESI lenses, the signal-to-noise on the IA signal from our method is predicted to improve by a factor of ˜2 relative to the method of Blazek et al. (2012), for pairs of shear estimates which yield substantially different measured IA amplitudes and highly correlated shape noise terms. We show that statistical error necessarily dominates the measurement of intrinsic alignments using our method. We also consider a physically motivated extension of the Blazek et al. (2012) method which assumes that all nearby galaxy pairs, rather than only excess pairs, are subject to IA. In this case, the signal-to-noise of the method of Blazek et al. (2012) is improved.

  1. Efficient species-level monitoring at the landscape scale

    Treesearch

    Barry R. Noon; Larissa L. Bailey; Thomas D. Sisk; Kevin S. McKelvey

    2012-01-01

    Monitoring the population trends of multiple animal species at a landscape scale is prohibitively expensive. However, advances in survey design, statistical methods, and the ability to estimate species presence on the basis of detection­nondetection data have greatly increased the feasibility of species-level monitoring. For example, recent advances in monitoring make...

  2. Applying Item Response Theory Methods to Examine the Impact of Different Response Formats

    ERIC Educational Resources Information Center

    Hohensinn, Christine; Kubinger, Klaus D.

    2011-01-01

    In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…

  3. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Abstract Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  4. The influence of the interactions between anthropogenic activities and multiple ecological factors on land surface temperatures of urban forests

    NASA Astrophysics Data System (ADS)

    Ren, Y.

    2017-12-01

    Context Land surface temperatures (LSTs) spatio-temporal distribution pattern of urban forests are influenced by many ecological factors; the identification of interaction between these factors can improve simulations and predictions of spatial patterns of urban cold islands. This quantitative research requires an integrated method that combines multiple sources data with spatial statistical analysis. Objectives The purpose of this study was to clarify urban forest LST influence interaction between anthropogenic activities and multiple ecological factors using cluster analysis of hot and cold spots and Geogdetector model. We introduced the hypothesis that anthropogenic activity interacts with certain ecological factors, and their combination influences urban forests LST. We also assumed that spatio-temporal distributions of urban forest LST should be similar to those of ecological factors and can be represented quantitatively. Methods We used Jinjiang as a representative city in China as a case study. Population density was employed to represent anthropogenic activity. We built up a multi-source data (forest inventory, digital elevation models (DEM), population, and remote sensing imagery) on a unified urban scale to support urban forest LST influence interaction research. Through a combination of spatial statistical analysis results, multi-source spatial data, and Geogdetector model, the interaction mechanisms of urban forest LST were revealed. Results Although different ecological factors have different influences on forest LST, in two periods with different hot spots and cold spots, the patch area and dominant tree species were the main factors contributing to LST clustering in urban forests. The interaction between anthropogenic activity and multiple ecological factors increased LST in urban forest stands, linearly and nonlinearly. Strong interactions between elevation and dominant species were generally observed and were prevalent in either hot or cold spots areas in different years. Conclusions In conclusion, a combination of spatial statistics and GeogDetector models should be effective for quantitatively evaluating interactive relationships among ecological factors, anthropogenic activity and LST.

  5. On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method

    PubMed Central

    Roux, Benoît; Weare, Jonathan

    2013-01-01

    An issue of general interest in computer simulations is to incorporate information from experiments into a structural model. An important caveat in pursuing this goal is to avoid corrupting the resulting model with spurious and arbitrary biases. While the problem of biasing thermodynamic ensembles can be formulated rigorously using the maximum entropy method introduced by Jaynes, the approach can be cumbersome in practical applications with the need to determine multiple unknown coefficients iteratively. A popular alternative strategy to incorporate the information from experiments is to rely on restrained-ensemble molecular dynamics simulations. However, the fundamental validity of this computational strategy remains in question. Here, it is demonstrated that the statistical distribution produced by restrained-ensemble simulations is formally consistent with the maximum entropy method of Jaynes. This clarifies the underlying conditions under which restrained-ensemble simulations will yield results that are consistent with the maximum entropy method. PMID:23464140

  6. Analysis of in vitro fertilization data with multiple outcomes using discrete time-to-event analysis

    PubMed Central

    Maity, Arnab; Williams, Paige; Ryan, Louise; Missmer, Stacey; Coull, Brent; Hauser, Russ

    2014-01-01

    In vitro fertilization (IVF) is an increasingly common method of assisted reproductive technology. Because of the careful observation and followup required as part of the procedure, IVF studies provide an ideal opportunity to identify and assess clinical and demographic factors along with environmental exposures that may impact successful reproduction. A major challenge in analyzing data from IVF studies is handling the complexity and multiplicity of outcome, resulting from both multiple opportunities for pregnancy loss within a single IVF cycle in addition to multiple IVF cycles. To date, most evaluations of IVF studies do not make use of full data due to its complex structure. In this paper, we develop statistical methodology for analysis of IVF data with multiple cycles and possibly multiple failure types observed for each individual. We develop a general analysis framework based on a generalized linear modeling formulation that allows implementation of various types of models including shared frailty models, failure specific frailty models, and transitional models, using standard software. We apply our methodology to data from an IVF study conducted at the Brigham and Women’s Hospital, Massachusetts. We also summarize the performance of our proposed methods based on a simulation study. PMID:24317880

  7. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.

  8. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  9. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Evaluating Composite Sampling Methods of Bacillus spores at Low Concentrations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.

    Restoring facility operations after the 2001 Amerithrax attacks took over three months to complete, highlighting the need to reduce remediation time. The most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite: a single cellulose sponge samples multiple coupons; 2) single medium multi-pass composite: a single cellulose sponge is used to sample multiple coupons; and 3) multi-medium post-samplemore » composite: a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155CFU/cm2, respectively). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p-value < 0.0001) and coupon material (p-value = 0.0008). Recovery efficiency (RE) was higher overall using the post-sample composite (PSC) method compared to single medium composite from both clean and grime coated materials. RE with the PSC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, painted wall board, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but significantly lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces.« less

  11. A procedure of multiple period searching in unequally spaced time-series with the Lomb-Scargle method

    NASA Technical Reports Server (NTRS)

    Van Dongen, H. P.; Olofsen, E.; VanHartevelt, J. H.; Kruyt, E. W.; Dinges, D. F. (Principal Investigator)

    1999-01-01

    Periodogram analysis of unequally spaced time-series, as part of many biological rhythm investigations, is complicated. The mathematical framework is scattered over the literature, and the interpretation of results is often debatable. In this paper, we show that the Lomb-Scargle method is the appropriate tool for periodogram analysis of unequally spaced data. A unique procedure of multiple period searching is derived, facilitating the assessment of the various rhythms that may be present in a time-series. All relevant mathematical and statistical aspects are considered in detail, and much attention is given to the correct interpretation of results. The use of the procedure is illustrated by examples, and problems that may be encountered are discussed. It is argued that, when following the procedure of multiple period searching, we can even benefit from the unequal spacing of a time-series in biological rhythm research.

  12. Accelerated Enveloping Distribution Sampling: Enabling Sampling of Multiple End States while Preserving Local Energy Minima.

    PubMed

    Perthold, Jan Walther; Oostenbrink, Chris

    2018-05-17

    Enveloping distribution sampling (EDS) is an efficient approach to calculate multiple free-energy differences from a single molecular dynamics (MD) simulation. However, the construction of an appropriate reference-state Hamiltonian that samples all states efficiently is not straightforward. We propose a novel approach for the construction of the EDS reference-state Hamiltonian, related to a previously described procedure to smoothen energy landscapes. In contrast to previously suggested EDS approaches, our reference-state Hamiltonian preserves local energy minima of the combined end-states. Moreover, we propose an intuitive, robust and efficient parameter optimization scheme to tune EDS Hamiltonian parameters. We demonstrate the proposed method with established and novel test systems and conclude that our approach allows for the automated calculation of multiple free-energy differences from a single simulation. Accelerated EDS promises to be a robust and user-friendly method to compute free-energy differences based on solid statistical mechanics.

  13. Multiple Point Statistics algorithm based on direct sampling and multi-resolution images

    NASA Astrophysics Data System (ADS)

    Julien, S.; Renard, P.; Chugunova, T.

    2017-12-01

    Multiple Point Statistics (MPS) has become popular for more than one decade in Earth Sciences, because these methods allow to generate random fields reproducing highly complex spatial features given in a conceptual model, the training image, while classical geostatistics techniques based on bi-point statistics (covariance or variogram) fail to generate realistic models. Among MPS methods, the direct sampling consists in borrowing patterns from the training image to populate a simulation grid. This latter is sequentially filled by visiting each of these nodes in a random order, and then the patterns, whose the number of nodes is fixed, become narrower during the simulation process, as the simulation grid is more densely informed. Hence, large scale structures are caught in the beginning of the simulation and small scale ones in the end. However, MPS may mix spatial characteristics distinguishable at different scales in the training image, and then loose the spatial arrangement of different structures. To overcome this limitation, we propose to perform MPS simulation using a decomposition of the training image in a set of images at multiple resolutions. Applying a Gaussian kernel onto the training image (convolution) results in a lower resolution image, and iterating this process, a pyramid of images depicting fewer details at each level is built, as it can be done in image processing for example to lighten the space storage of a photography. The direct sampling is then employed to simulate the lowest resolution level, and then to simulate each level, up to the finest resolution, conditioned to the level one rank coarser. This scheme helps reproduce the spatial structures at any scale of the training image and then generate more realistic models. We illustrate the method with aerial photographies (satellite images) and natural textures. Indeed, these kinds of images often display typical structures at different scales and are well-suited for MPS simulation techniques.

  14. Fuzzy-logic based strategy for validation of multiplex methods: example with qualitative GMO assays.

    PubMed

    Bellocchi, Gianni; Bertholet, Vincent; Hamels, Sandrine; Moens, W; Remacle, José; Van den Eede, Guy

    2010-02-01

    This paper illustrates the advantages that a fuzzy-based aggregation method could bring into the validation of a multiplex method for GMO detection (DualChip GMO kit, Eppendorf). Guidelines for validation of chemical, bio-chemical, pharmaceutical and genetic methods have been developed and ad hoc validation statistics are available and routinely used, for in-house and inter-laboratory testing, and decision-making. Fuzzy logic allows summarising the information obtained by independent validation statistics into one synthetic indicator of overall method performance. The microarray technology, introduced for simultaneous identification of multiple GMOs, poses specific validation issues (patterns of performance for a variety of GMOs at different concentrations). A fuzzy-based indicator for overall evaluation is illustrated in this paper, and applied to validation data for different genetically modified elements. Remarks were drawn on the analytical results. The fuzzy-logic based rules were shown to be applicable to improve interpretation of results and facilitate overall evaluation of the multiplex method.

  15. STATISTICAL METHODOLOGY FOR THE SIMULTANEOUS ANALYSIS OF MULTIPLE TYPES OF OUTCOMES IN NONLINEAR THRESHOLD MODELS.

    EPA Science Inventory

    Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates it may result in improved inference. When both disc...

  16. The need and approach for characterization - U.S. air force perspectives on materials state awareness

    NASA Astrophysics Data System (ADS)

    Aldrin, John C.; Lindgren, Eric A.

    2018-04-01

    This paper expands on the objective and motivation for NDE-based characterization and includes a discussion of the current approach using model-assisted inversion being pursued within the Air Force Research Laboratory (AFRL). This includes a discussion of the multiple model-based methods that can be used, including physics-based models, deep machine learning, and heuristic approaches. The benefits and drawbacks of each method is reviewed and the potential to integrate multiple methods is discussed. Initial successes are included to highlight the ability to obtain quantitative values of damage. Additional steps remaining to realize this capability with statistical metrics of accuracy are discussed, and how these results can be used to enable probabilistic life management are addressed. The outcome of this initiative will realize the long-term desired capability of NDE methods to provide quantitative characterization to accelerate certification of new materials and enhance life management of engineered systems.

  17. Methods for improved forewarning of critical events across multiple data channels

    DOEpatents

    Hively, Lee M [Philadelphia, TN

    2007-04-24

    This disclosed invention concerns improvements in forewarning of critical events via phase-space dissimilarity analysis of data from mechanical devices, electrical devices, biomedical data, and other physical processes. First, a single channel of process-indicative data is selected that can be used in place of multiple data channels without sacrificing consistent forewarning of critical events. Second, the method discards data of inadequate quality via statistical analysis of the raw data, because the analysis of poor quality data always yields inferior results. Third, two separate filtering operations are used in sequence to remove both high-frequency and low-frequency artifacts using a zero-phase quadratic filter. Fourth, the method constructs phase-space dissimilarity measures (PSDM) by combining of multi-channel time-serial data into a multi-channel time-delay phase-space reconstruction. Fifth, the method uses a composite measure of dissimilarity (C.sub.i) to provide a forewarning of failure and an indicator of failure onset.

  18. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the the median, the maximum and in general, the ith order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.

  19. Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices

    NASA Astrophysics Data System (ADS)

    Passemier, Damien; McKay, Matthew R.; Chen, Yang

    2015-07-01

    Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce an correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.

  20. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    PubMed

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

  1. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  2. Hybrid cooperative spectrum sharing for cognitive radio networks: A contract-based approach

    NASA Astrophysics Data System (ADS)

    Zhang, Songwei; Mu, Xiaomin; Wang, Ning; Zhang, Dalong; Han, Gangtao

    2018-06-01

    In order to improve the spectral efficiency, a contract-based hybrid cooperative spectrum sharing approach is proposed in this paper, in which multiple primary users (PUs) and multiple secondary users (SUs) share the primary channels in a hybrid manner. Specifically, the SUs switch their transmission mode between underlay and overlay based on the second-order statistics of the primary links. The average transmission rates of PUs and SUs are analyzed for the two transmission modes, and an optimization problem is formulated to maximize the utility of PUs under the constraint that the utility of SUs is nonnegative, which is further solved by a contract-based approach in global statistical channel statistical information (S-CSI) scenarios and local S-CSI scenarios, individually. Numerical results show that the average transmission rate of the PUs is significantly improved by using the proposed method in both of the two scenarios, and in the meantime, the SUs can achieve a good average rate, especially while the SUs have the same number of the PUs in the local S-CSI scenarios.

  3. VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

    PubMed

    Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

    2016-01-01

    In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.

  4. Statistical framework for the utilization of simultaneous pupil plane and focal plane telemetry for exoplanet imaging. I. Accounting for aberrations in multiple planes.

    PubMed

    Frazin, Richard A

    2016-04-01

    A new generation of telescopes with mirror diameters of 20 m or more, called extremely large telescopes (ELTs), has the potential to provide unprecedented imaging and spectroscopy of exoplanetary systems, if the difficulties in achieving the extremely high dynamic range required to differentiate the planetary signal from the star can be overcome to a sufficient degree. Fully utilizing the potential of ELTs for exoplanet imaging will likely require simultaneous and self-consistent determination of both the planetary image and the unknown aberrations in multiple planes of the optical system, using statistical inference based on the wavefront sensor and science camera data streams. This approach promises to overcome the most important systematic errors inherent in the various schemes based on differential imaging, such as angular differential imaging and spectral differential imaging. This paper is the first in a series on this subject, in which a formalism is established for the exoplanet imaging problem, setting the stage for the statistical inference methods to follow in the future. Every effort has been made to be rigorous and complete, so that validity of approximations to be made later can be assessed. Here, the polarimetric image is expressed in terms of aberrations in the various planes of a polarizing telescope with an adaptive optics system. Further, it is shown that current methods that utilize focal plane sensing to correct the speckle field, e.g., electric field conjugation, rely on the tacit assumption that aberrations on multiple optical surfaces can be represented as aberration on a single optical surface, ultimately limiting their potential effectiveness for ground-based astronomy.

  5. Registration and Fusion of Multiple Source Remotely Sensed Image Data

    NASA Technical Reports Server (NTRS)

    LeMoigne, Jacqueline

    2004-01-01

    Earth and Space Science often involve the comparison, fusion, and integration of multiple types of remotely sensed data at various temporal, radiometric, and spatial resolutions. Results of this integration may be utilized for global change analysis, global coverage of an area at multiple resolutions, map updating or validation of new instruments, as well as integration of data provided by multiple instruments carried on multiple platforms, e.g. in spacecraft constellations or fleets of planetary rovers. Our focus is on developing methods to perform fast, accurate and automatic image registration and fusion. General methods for automatic image registration are being reviewed and evaluated. Various choices for feature extraction, feature matching and similarity measurements are being compared, including wavelet-based algorithms, mutual information and statistically robust techniques. Our work also involves studies related to image fusion and investigates dimension reduction and co-kriging for application-dependent fusion. All methods are being tested using several multi-sensor datasets, acquired at EOS Core Sites, and including multiple sensors such as IKONOS, Landsat-7/ETM+, EO1/ALI and Hyperion, MODIS, and SeaWIFS instruments. Issues related to the coregistration of data from the same platform (i.e., AIRS and MODIS from Aqua) or from several platforms of the A-train (i.e., MLS, HIRDLS, OMI from Aura with AIRS and MODIS from Terra and Aqua) will also be considered.

  6. Multiple imputation of missing fMRI data in whole brain analysis

    PubMed Central

    Vaden, Kenneth I.; Gebregziabher, Mulugeta; Kuchinsky, Stefanie E.; Eckert, Mark A.

    2012-01-01

    Whole brain fMRI analyses rarely include the entire brain because of missing data that result from data acquisition limits and susceptibility artifact, in particular. This missing data problem is typically addressed by omitting voxels from analysis, which may exclude brain regions that are of theoretical interest and increase the potential for Type II error at cortical boundaries or Type I error when spatial thresholds are used to establish significance. Imputation could significantly expand statistical map coverage, increase power, and enhance interpretations of fMRI results. We examined multiple imputation for group level analyses of missing fMRI data using methods that leverage the spatial information in fMRI datasets for both real and simulated data. Available case analysis, neighbor replacement, and regression based imputation approaches were compared in a general linear model framework to determine the extent to which these methods quantitatively (effect size) and qualitatively (spatial coverage) increased the sensitivity of group analyses. In both real and simulated data analysis, multiple imputation provided 1) variance that was most similar to estimates for voxels with no missing data, 2) fewer false positive errors in comparison to mean replacement, and 3) fewer false negative errors in comparison to available case analysis. Compared to the standard analysis approach of omitting voxels with missing data, imputation methods increased brain coverage in this study by 35% (from 33,323 to 45,071 voxels). In addition, multiple imputation increased the size of significant clusters by 58% and number of significant clusters across statistical thresholds, compared to the standard voxel omission approach. While neighbor replacement produced similar results, we recommend multiple imputation because it uses an informed sampling distribution to deal with missing data across subjects that can include neighbor values and other predictors. Multiple imputation is anticipated to be particularly useful for 1) large fMRI data sets with inconsistent missing voxels across subjects and 2) addressing the problem of increased artifact at ultra-high field, which significantly limit the extent of whole brain coverage and interpretations of results. PMID:22500925

  7. Prediction of crime occurrence from multi-modal data using deep learning

    PubMed Central

    Kang, Hyeon-Woo

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models. PMID:28437486

  8. Prediction of crime occurrence from multi-modal data using deep learning.

    PubMed

    Kang, Hyeon-Woo; Kang, Hang-Bong

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models.

  9. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computationally simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909

  10. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    PubMed

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received a remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for a single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  11. Potential Values of Incorporating a Multiple-Choice Question Construction in Physics Experimentation Instruction

    NASA Astrophysics Data System (ADS)

    Yu, Fu-Yun; Liu, Yu-Hsin

    2005-09-01

    The potential value of a multiple-choice question-construction instructional strategy for the support of students’ learning of physics experiments was examined in the study. Forty-two university freshmen participated in the study for a whole semester. A constant comparison method adopted to categorize students’ qualitative data indicated that the influences of multiple-choice question construction were evident in several significant ways (promoting constructive and productive studying habits; reflecting and previewing course-related materials; increasing in-group communication and interaction; breaking passive learning style and habits, etc.), which, worked together, not only enhanced students’ comprehension and retention of the obtained knowledge, but also helped distil a sense of empowerment and learning community within the participants. Analysis with one-group t-tests, using 3 as the expected mean, on quantitative data further found that students’ satisfaction toward past learning experience, and perceptions toward this strategy’s potentials for promoting learning were statistically significant at the 0.0005 level, while learning anxiety was not statistically significant. Suggestions for incorporating question-generation activities within classroom and topics for future studies were rendered.

  12. Predicting Flood Hazards in Systems with Multiple Flooding Mechanisms

    NASA Astrophysics Data System (ADS)

    Luke, A.; Schubert, J.; Cheng, L.; AghaKouchak, A.; Sanders, B. F.

    2014-12-01

    Delineating flood zones in systems that are susceptible to flooding from a single mechanism (riverine flooding) is a relatively well defined procedure with specific guidance from agencies such as FEMA and USACE. However, there is little guidance in delineating flood zones in systems that are susceptible to flooding from multiple mechanisms such as storm surge, waves, tidal influence, and riverine flooding. In this study, a new flood mapping method which accounts for multiple extremes occurring simultaneously is developed and exemplified. The study site in which the method is employed is the Tijuana River Estuary (TRE) located in Southern California adjacent to the U.S./Mexico border. TRE is an intertidal coastal estuary that receives freshwater flows from the Tijuana River. Extreme discharge from the Tijuana River is the primary driver of flooding within TRE, however tide level and storm surge also play a significant role in flooding extent and depth. A comparison between measured flows at the Tijuana River and ocean levels revealed a correlation between extreme discharge and ocean height. Using a novel statistical method based upon extreme value theory, ocean heights were predicted conditioned up extreme discharge occurring within the Tijuana River. This statistical technique could also be applied to other systems in which different factors are identified as the primary drivers of flooding, such as significant wave height conditioned upon tide level, for example. Using the predicted ocean levels conditioned upon varying return levels of discharge as forcing parameters for the 2D hydraulic model BreZo, the 100, 50, 20, and 10 year floodplains were delineated. The results will then be compared to floodplains delineated using the standard methods recommended by FEMA for riverine zones with a downstream ocean boundary.

  13. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    PubMed

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.

  14. An omnibus test for family-based association studies with multiple SNPs and multiple phenotypes.

    PubMed

    Lasky-Su, Jessica; Murphy, Amy; McQueen, Matthew B; Weiss, Scott; Lange, Christoph

    2010-06-01

    We propose an omnibus family-based association test (MFBAT) that can be applied to multiple markers and multiple phenotypes and that has only one degree of freedom. The proposed test statistic extends current FBAT methodology to incorporate multiple markers as well as multiple phenotypes. Using simulation studies, power estimates for the proposed methodology are compared with the standard methodologies. On the basis of these simulations, we find that MFBAT substantially outperforms other methods, including haplotypic approaches and doing multiple tests with single single-nucleotide polymorphisms (SNPs) and single phenotypes. The practical relevance of the approach is illustrated by an application to asthma in which SNP/phenotype combinations are identified and reach overall significance that would not have been identified using other approaches. This methodology is directly applicable to cases in which there are multiple SNPs, such as candidate gene studies, cases in which there are multiple phenotypes, such as expression data, and cases in which there are multiple phenotypes and genotypes, such as genome-wide association studies that incorporate expression profiles as phenotypes. This program is available in the PBAT analysis package.

  15. Mass univariate analysis of event-related brain potentials/fields I: a critical tutorial review.

    PubMed

    Groppe, David M; Urbach, Thomas P; Kutas, Marta

    2011-12-01

    Event-related potentials (ERPs) and magnetic fields (ERFs) are typically analyzed via ANOVAs on mean activity in a priori windows. Advances in computing power and statistics have produced an alternative, mass univariate analyses consisting of thousands of statistical tests and powerful corrections for multiple comparisons. Such analyses are most useful when one has little a priori knowledge of effect locations or latencies, and for delineating effect boundaries. Mass univariate analyses complement and, at times, obviate traditional analyses. Here we review this approach as applied to ERP/ERF data and four methods for multiple comparison correction: strong control of the familywise error rate (FWER) via permutation tests, weak control of FWER via cluster-based permutation tests, false discovery rate control, and control of the generalized FWER. We end with recommendations for their use and introduce free MATLAB software for their implementation. Copyright © 2011 Society for Psychophysiological Research.

  16. Predicting perceptual quality of images in realistic scenario using deep filter banks

    NASA Astrophysics Data System (ADS)

    Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang

    2018-03-01

    Classical image perceptual quality assessment models usually resort to natural scene statistic methods, which are based on an assumption that certain reliable statistical regularities hold on undistorted images and will be corrupted by introduced distortions. However, these models usually fail to accurately predict degradation severity of images in realistic scenarios since complex, multiple, and interactive authentic distortions usually appear on them. We propose a quality prediction model based on convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation and finally a linear support vector regression model is trained to map image representation into images' subjective perceptual quality scores. The experimental results on benchmark databases present the effectiveness and generalizability of the proposed model.

  17. Statistical analysis of long-term monitoring data for persistent organic pollutants in the atmosphere at 20 monitoring stations broadly indicates declining concentrations.

    PubMed

    Kong, Deguo; MacLeod, Matthew; Hung, Hayley; Cousins, Ian T

    2014-11-04

    During recent decades concentrations of persistent organic pollutants (POPs) in the atmosphere have been monitored at multiple stations worldwide. We used three statistical methods to analyze a total of 748 time series of selected POPs in the atmosphere to determine if there are statistically significant reductions in levels of POPs that have had control actions enacted to restrict or eliminate manufacture, use and emissions. Significant decreasing trends were identified in 560 (75%) of the 748 time series collected from the Arctic, North America, and Europe, indicating that the atmospheric concentrations of these POPs are generally decreasing, consistent with the overall effectiveness of emission control actions. Statistically significant trends in synthetic time series could be reliably identified with the improved Mann-Kendall (iMK) test and the digital filtration (DF) technique in time series longer than 5 years. The temporal trends of new (or emerging) POPs in the atmosphere are often unclear because time series are too short. A statistical detrending method based on the iMK test was not able to identify abrupt changes in the rates of decline of atmospheric POP concentrations encoded into synthetic time series.

  18. Overview of the SAMSI year-long program on Statistical, Mathematical and Computational Methods for Astronomy

    NASA Astrophysics Data System (ADS)

    Jogesh Babu, G.

    2017-01-01

    A year-long research (Aug 2016- May 2017) program on `Statistical, Mathematical and Computational Methods for Astronomy (ASTRO)’ is well under way at Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation research institute in Research Triangle Park, NC. This program has brought together astronomers, computer scientists, applied mathematicians and statisticians. The main aims of this program are: to foster cross-disciplinary activities; to accelerate the adoption of modern statistical and mathematical tools into modern astronomy; and to develop new tools needed for important astronomical research problems. The program provides multiple avenues for cross-disciplinary interactions, including several workshops, long-term visitors, and regular teleconferences, so participants can continue collaborations, even if they can only spend limited time in residence at SAMSI. The main program is organized around five working groups:i) Uncertainty Quantification and Astrophysical Emulationii) Synoptic Time Domain Surveysiii) Multivariate and Irregularly Sampled Time Seriesiv) Astrophysical Populationsv) Statistics, computation, and modeling in cosmology.A brief description of each of the work under way by these groups will be given. Overlaps among various working groups will also be highlighted. How the wider astronomy community can both participate and benefit from the activities, will be briefly mentioned.

  19. Method and system for knowledge discovery using non-linear statistical analysis and a 1st and 2nd tier computer program

    DOEpatents

    Hively, Lee M [Philadelphia, TN

    2011-07-12

    The invention relates to a method and apparatus for simultaneously processing different sources of test data into informational data and then processing different categories of informational data into knowledge-based data. The knowledge-based data can then be communicated between nodes in a system of multiple computers according to rules for a type of complex, hierarchical computer system modeled on a human brain.

  20. Multiple-Solution Problems in a Statistics Classroom: An Example

    ERIC Educational Resources Information Center

    Chu, Chi Wing; Chan, Kevin L. T.; Chan, Wai-Sum; Kwong, Koon-Shing

    2017-01-01

    The mathematics education literature shows that encouraging students to develop multiple solutions for given problems has a positive effect on students' understanding and creativity. In this paper, we present an example of multiple-solution problems in statistics involving a set of non-traditional dice. In particular, we consider the exact…

  1. Allelic-based gene-gene interaction associated with quantitative traits.

    PubMed

    Jung, Jeesun; Sun, Bin; Kwon, Deukwoo; Koller, Daniel L; Foroud, Tatiana M

    2009-05-01

    Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene-gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene-gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium-sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na, K, 2Cl cotransporter gene (CLC12A1) that contributes to variation in diastolic blood pressure.

  2. Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

    PubMed Central

    2013-01-01

    Background In statistical modeling, finding the most favorable coding for an exploratory quantitative variable involves many tests. This process involves multiple testing problems and requires the correction of the significance level. Methods For each coding, a test on the nullity of the coefficient associated with the new coded variable is computed. The selected coding corresponds to that associated with the largest statistical test (or equivalently the smallest pvalue). In the context of the Generalized Linear Model, Liquet and Commenges (Stat Probability Lett,71:33–38,2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels and, by definition those that involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect between an explanatory and a dependent variable. Results The simulations we ran in this study showed good performances of the proposed methods. These methods were illustrated using the data from a study of the relationship between cholesterol and dementia. Conclusion The algorithms were implemented using R, and the associated CPMCGLM R package is available on the CRAN. PMID:23758852

  3. Cross-Group Equivalence of Interest and Motivation Items in PISA 2012 Turkey Sample

    ERIC Educational Resources Information Center

    Ardic, Elif Ozlem; Gelbal, Selahattin

    2017-01-01

    Purpose: The aim of this study was to examine measurement invariance of the interest and motivation related items contained in the PISA 2012 student survey with regard to gender school type and statistical regions and to identify the items that show differential item functioning (DIF) across groups. Research Methods: Multiple-group confirmatory…

  4. SAR Speckle Noise Reduction Using Wiener Filter

    NASA Technical Reports Server (NTRS)

    Joo, T. H.; Held, D. N.

    1983-01-01

    Synthetic aperture radar (SAR) images are degraded by speckle. A multiplicative speckle noise model for SAR images is presented. Using this model, a Wiener filter is derived by minimizing the mean-squared error using the known speckle statistics. Implementation of the Wiener filter is discussed and experimental results are presented. Finally, possible improvements to this method are explored.

  5. Statistics for Time-Series Spatial Data: Applying Survival Analysis to Study Land-Use Change

    ERIC Educational Resources Information Center

    Wang, Ninghua Nathan

    2013-01-01

    Traditional spatial analysis and data mining methods fall short of extracting temporal information from data. This inability makes their use difficult to study changes and the associated mechanisms of many geographic phenomena of interest, for example, land-use. On the other hand, the growing availability of land-change data over multiple time…

  6. On the Optimality of Answer-Copying Indices: Theory and Practice

    ERIC Educational Resources Information Center

    Romero, Mauricio; Riascos, Álvaro; Jara, Diego

    2015-01-01

    Multiple-choice exams are frequently used as an efficient and objective method to assess learning, but they are more vulnerable to answer copying than tests based on open questions. Several statistical tests (known as indices in the literature) have been proposed to detect cheating; however, to the best of our knowledge, they all lack mathematical…

  7. The Abduction of Children by Strangers and Nonfamily Members: Estimating the Incidence Using Multiple Methods.

    ERIC Educational Resources Information Center

    Finkelhor, David; And Others

    1992-01-01

    Used a national survey of households with children, a national survey of police records, and an analysis of FBI homicide data to estimate the incidence of nonfamily abductions of children. Offers a definition of abduction, analyzes problems in compiling abduction statistics, and discusses public policy on prevention and response. (RJM)

  8. Survivability Versus Time

    NASA Technical Reports Server (NTRS)

    Joyner, James J., Sr.

    2014-01-01

    Develop Survivability vs Time Model as a decision-evaluation tool to assess various emergency egress methods used at Launch Complex 39B (LC 39B) and in the Vehicle Assembly Building (VAB) on NASAs Kennedy Space Center. For each hazard scenario, develop probability distributions to address statistical uncertainty resulting in survivability plots over time and composite survivability plots encompassing multiple hazard scenarios.

  9. A Statistical Method for Reducing Sidelobe Clutter for the Ku-Band Precipitation Radar on Board the GPM Core Observatory

    NASA Technical Reports Server (NTRS)

    Kubota, Takuji; Iguchi, Toshio; Kojima, Masahiro; Liao, Liang; Masaki, Takeshi; Hanado, Hiroshi; Meneghini, Robert; Oki, Riko

    2016-01-01

    A statistical method to reduce the sidelobe clutter of the Ku-band precipitation radar (KuPR) of the Dual-Frequency Precipitation Radar (DPR) on board the Global Precipitation Measurement (GPM) Core Observatory is described and evaluated using DPR observations. The KuPR sidelobe clutter was much more severe than that of the Precipitation Radar on board the Tropical Rainfall Measuring Mission (TRMM), and it has caused the misidentification of precipitation. The statistical method to reduce sidelobe clutter was constructed by subtracting the estimated sidelobe power, based upon a multiple regression model with explanatory variables of the normalized radar cross section (NRCS) of surface, from the received power of the echo. The saturation of the NRCS at near-nadir angles, resulting from strong surface scattering, was considered in the calculation of the regression coefficients.The method was implemented in the KuPR algorithm and applied to KuPR-observed data. It was found that the received power from sidelobe clutter over the ocean was largely reduced by using the developed method, although some of the received power from the sidelobe clutter still remained. From the statistical results of the evaluations, it was shown that the number of KuPR precipitation events in the clutter region, after the method was applied, was comparable to that in the clutter-free region. This confirms the reasonable performance of the method in removing sidelobe clutter. For further improving the effectiveness of the method, it is necessary to improve the consideration of the NRCS saturation, which will be explored in future work.

  10. Avoid lost discoveries, because of violations of standard assumptions, by using modern robust statistical methods.

    PubMed

    Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence

    2013-03-01

    Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Diagramming the Never Ending Story: Student-generated diagrammatic stories integrate and retain science concepts improving science literacy

    NASA Astrophysics Data System (ADS)

    Pillsbury, Ralph T.

    This research examined an instructional strategy called Diagramming the Never Ending Story: A method called diagramming was taught to sixth grade students via an outdoor science inquiry ecology unit. Students generated diagrams of the new ecology concepts they encountered, creating explanatory 'captions' for their newly drawn diagrams while connecting them in a memorable story. The diagramming process culminates in 20-30 meter-long murals called the Never Ending Story: Months of science instruction are constructed as pictorial scrolls, making sense of all new science concepts they encounter. This method was taught at a North Carolina "Public" Charter School, Children's Community School, to measure its efficacy in helping students comprehend scientific concepts and retain them thereby increasing science literacy. There were four demographically similar classes of 20 students each. Two 'treatment' classes, randomly chosen from the four classes, generated their own Never Ending Stories after being taught the diagramming method. A Solomon Four-Group Design was employed: Two Classes (one control, one treatment) were administered pre- and post; two classes received post tests only. The tests were comprised of multiple choice, fill-in and extended response (open-ended) sections. Multiple choice and fill-in test data were not statistically significant whereas extended response test data confirm that treatment classes made statistically significant gains.

  12. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    NASA Astrophysics Data System (ADS)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty. Further demonstrations consist of a 2D four facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of solving multifacies, nonstationarity, and 3D simulations based on 2D TIs.

  13. Study/experimental/research design: much more than statistics.

    PubMed

    Knight, Kenneth L

    2010-01-01

    The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.

  14. Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.

  15. What does the multiple mini interview have to offer over the panel interview?

    PubMed Central

    Pau, Allan; Chen, Yu Sui; Lee, Verna Kar Mun; Sow, Chew Fei; Alwis, Ranjit De

    2016-01-01

    Introduction This paper compares the panel interview (PI) performance with the multiple mini interview (MMI) performance and indication of behavioural concerns of a sample of medical school applicants. The acceptability of the MMI was also assessed. Materials and methods All applicants shortlisted for a PI were invited to an MMI. Applicants attended a 30-min PI with two faculty interviewers followed by an MMI consisting of ten 8-min stations. Applicants were assessed on their performance at each MMI station by one faculty. The interviewer also indicated if they perceived the applicant to be a concern. Finally, applicants completed an acceptability questionnaire. Results From the analysis of 133 (75.1%) completed MMI scoresheets, the MMI scores correlated statistically significantly with the PI scores (r=0.438, p=0.001). Both were not statistically associated with sex, age, race, or pre-university academic ability to any significance. Applicants assessed as a concern at two or more stations performed statistically significantly less well at the MMI when compared with those who were assessed as a concern at one station or none at all. However, there was no association with PI performance. Acceptability scores were generally high, and comparison of mean scores for each of the acceptability questionnaire items did not show statistically significant differences between sex and race categories. Conclusions Although PI and MMI performances are correlated, the MMI may have the added advantage of more objectively generating multiple impressions of the applicant's interpersonal skill, thoughtfulness, and general demeanour. Results of the present study indicated that the MMI is acceptable in a multicultural context. PMID:26873337

  16. What does the multiple mini interview have to offer over the panel interview?

    PubMed

    Pau, Allan; Chen, Yu Sui; Lee, Verna Kar Mun; Sow, Chew Fei; Alwis, Ranjit De

    2016-01-01

    Introduction This paper compares the panel interview (PI) performance with the multiple mini interview (MMI) performance and indication of behavioural concerns of a sample of medical school applicants. The acceptability of the MMI was also assessed. Materials and methods All applicants shortlisted for a PI were invited to an MMI. Applicants attended a 30-min PI with two faculty interviewers followed by an MMI consisting of ten 8-min stations. Applicants were assessed on their performance at each MMI station by one faculty. The interviewer also indicated if they perceived the applicant to be a concern. Finally, applicants completed an acceptability questionnaire. Results From the analysis of 133 (75.1%) completed MMI scoresheets, the MMI scores correlated statistically significantly with the PI scores (r=0.438, p=0.001). Both were not statistically associated with sex, age, race, or pre-university academic ability to any significance. Applicants assessed as a concern at two or more stations performed statistically significantly less well at the MMI when compared with those who were assessed as a concern at one station or none at all. However, there was no association with PI performance. Acceptability scores were generally high, and comparison of mean scores for each of the acceptability questionnaire items did not show statistically significant differences between sex and race categories. Conclusions Although PI and MMI performances are correlated, the MMI may have the added advantage of more objectively generating multiple impressions of the applicant's interpersonal skill, thoughtfulness, and general demeanour. Results of the present study indicated that the MMI is acceptable in a multicultural context.

  17. MixGF: spectral probabilities for mixture spectra from more than one peptide.

    PubMed

    Wang, Jian; Bourne, Philip E; Bandeira, Nuno

    2014-12-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by a ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  18. MixGF: Spectral Probabilities for Mixture Spectra from more than One Peptide*

    PubMed Central

    Wang, Jian; Bourne, Philip E.; Bandeira, Nuno

    2014-01-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by a ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. PMID:25225354

  19. Comparing data collected by computerized and written surveys for adolescence health research.

    PubMed

    Wu, Ying; Newfield, Susan A

    2007-01-01

    This study assessed whether data-collection formats, computerized versus paper-and-pencil, affect response patterns and descriptive statistics for adolescent health assessment surveys. Youth were assessed as part of a health risk reduction program. Baseline data from 1131 youth were analyzed. Participants completed the questionnaire either by computer (n = 390) or by paper-and-pencil (n = 741). The rate of returned surveys meeting inclusion requirements was 90.6% and did not differ by methods. However, the computerized method resulted in significantly less incompleteness but more identical responses. Multiple regression indicated that the survey methods did not contribute to problematic responses. The two survey methods yielded similar scale internal reliability and descriptive statistics for behavioral and psychological outcomes, although the computerized method elicited higher reports of some risk items such as carrying a knife, beating up a person, selling drugs, and delivering drugs. Overall, the survey method did not produce a significant difference in outcomes. This provides support for program personnel selecting survey methods based on study goals with confidence that the method of administration will not have a significant impact on the outcome.

  20. A twelve-year profile of students' SAT scores, GPAs, and MCAT scores from a small university's premedical program.

    PubMed

    Montague, J R; Frei, J K

    1993-04-01

    To determine whether significant correlations existed among quantitative and qualitative predictors of students' academic success and quantitative outcomes of such success over a 12-year period in a small university's premedical program. A database was assembled from information on the 199 graduates who earned BS degrees in biology from Barry University's School of Natural and Health Sciences from 1980 through 1991. The quantitative variables were year of BS degree, total score on the Scholastic Aptitude Test (SAT), various measures of undergraduate grade-point averages (GPAs), and total score on the Medical College Admission Test (MCAT); and the qualitative variables were minority (54% of the students) or majority status and transfer (about one-third of the students) or nontransfer status. The statistical methods were multiple analysis of variance and stepwise multiple regression. Statistically significant positive correlations were found among SAT total scores, final GPAs, biology GPAs versus nonbiology GPAs, and MCAT total scores. These correlations held for transfer versus nontransfer students and for minority versus majority students. Over the 12-year period there were significant fluctuations in mean MCAT scores. The students' SAT scores and GPAs proved to be statistically reliable predictors of MCAT scores, but the minority or majority status and the transfer or nontransfer status of the students were statistically insignificant.

  1. Defining window-boundaries for genomic analyses using smoothing spline techniques

    DOE PAGES

    Beissinger, Timothy M.; Rosa, Guilherme J.M.; Kaeppler, Shawn M.; ...

    2015-04-17

    High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the datamore » and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.« less

  2. Integrating the ACR Appropriateness Criteria Into the Radiology Clerkship: Comparison of Didactic Format and Group-Based Learning.

    PubMed

    Stein, Marjorie W; Frank, Susan J; Roberts, Jeffrey H; Finkelstein, Malka; Heo, Moonseong

    2016-05-01

    The aim of this study was to determine whether group-based or didactic teaching is more effective to teach ACR Appropriateness Criteria to medical students. An identical pretest, posttest, and delayed multiple-choice test was used to evaluate the efficacy of the two teaching methods. Descriptive statistics comparing test scores were obtained. On the posttest, the didactic group gained 12.5 points (P < .0001), and the group-based learning students gained 16.3 points (P < .0001). On the delayed test, the didactic group gained 14.4 points (P < .0001), and the group-based learning students gained 11.8 points (P < .001). The gains in scores on both tests were statistically significant for both groups. However, the differences in scores were not statistically significant comparing the two educational methods. Compared with didactic lectures, group-based learning is more enjoyable, time efficient, and equally efficacious. The choice of educational method can be individualized for each institution on the basis of group size, time constraints, and faculty availability. Copyright © 2016 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  3. A method for automatic feature points extraction of human vertebrae three-dimensional model

    NASA Astrophysics Data System (ADS)

    Wu, Zhen; Wu, Junsheng

    2017-05-01

    A method for automatic extraction of the feature points of the human vertebrae three-dimensional model is presented. Firstly, the statistical model of vertebrae feature points is established based on the results of manual vertebrae feature points extraction. Then anatomical axial analysis of the vertebrae model is performed according to the physiological and morphological characteristics of the vertebrae. Using the axial information obtained from the analysis, a projection relationship between the statistical model and the vertebrae model to be extracted is established. According to the projection relationship, the statistical model is matched with the vertebrae model to get the estimated position of the feature point. Finally, by analyzing the curvature in the spherical neighborhood with the estimated position of feature points, the final position of the feature points is obtained. According to the benchmark result on multiple test models, the mean relative errors of feature point positions are less than 5.98%. At more than half of the positions, the error rate is less than 3% and the minimum mean relative error is 0.19%, which verifies the effectiveness of the method.

  4. Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.

    PubMed

    Verde, Pablo E

    2010-12-30

    In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity and they can be regarded as so far removed from the classical domain of meta-analysis, that they can provide a rather severe test of classical statistical methods. Recently, bivariate random effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach allows, also, to perform substantial model checking, model diagnostic and model selection. Statistical computations are implemented in the public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.

  5. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  6. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    NASA Astrophysics Data System (ADS)

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui

    2016-03-01

    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  7. A multi-time-step noise reduction method for measuring velocity statistics from particle tracking velocimetry

    NASA Astrophysics Data System (ADS)

    Machicoane, Nathanaël; López-Caballero, Miguel; Bourgoin, Mickael; Aliseda, Alberto; Volk, Romain

    2017-10-01

    We present a method to improve the accuracy of velocity measurements for fluid flow or particles immersed in it, based on a multi-time-step approach that allows for cancellation of noise in the velocity measurements. Improved velocity statistics, a critical element in turbulent flow measurements, can be computed from the combination of the velocity moments computed using standard particle tracking velocimetry (PTV) or particle image velocimetry (PIV) techniques for data sets that have been collected over different values of time intervals between images. This method produces Eulerian velocity fields and Lagrangian velocity statistics with much lower noise levels compared to standard PIV or PTV measurements, without the need of filtering and/or windowing. Particle displacement between two frames is computed for multiple different time-step values between frames in a canonical experiment of homogeneous isotropic turbulence. The second order velocity structure function of the flow is computed with the new method and compared to results from traditional measurement techniques in the literature. Increased accuracy is also demonstrated by comparing the dissipation rate of turbulent kinetic energy measured from this function against previously validated measurements.

  8. Analysis of data collected from right and left limbs: Accounting for dependence and improving statistical efficiency in musculoskeletal research.

    PubMed

    Stewart, Sarah; Pearson, Janet; Rome, Keith; Dalbeth, Nicola; Vandal, Alain C

    2018-01-01

    Statistical techniques currently used in musculoskeletal research often inefficiently account for paired-limb measurements or the relationship between measurements taken from multiple regions within limbs. This study compared three commonly used analysis methods with a mixed-models approach that appropriately accounted for the association between limbs, regions, and trials and that utilised all information available from repeated trials. Four analysis were applied to an existing data set containing plantar pressure data, which was collected for seven masked regions on right and left feet, over three trials, across three participant groups. Methods 1-3 averaged data over trials and analysed right foot data (Method 1), data from a randomly selected foot (Method 2), and averaged right and left foot data (Method 3). Method 4 used all available data in a mixed-effects regression that accounted for repeated measures taken for each foot, foot region and trial. Confidence interval widths for the mean differences between groups for each foot region were used as a criterion for comparison of statistical efficiency. Mean differences in pressure between groups were similar across methods for each foot region, while the confidence interval widths were consistently smaller for Method 4. Method 4 also revealed significant between-group differences that were not detected by Methods 1-3. A mixed effects linear model approach generates improved efficiency and power by producing more precise estimates compared to alternative approaches that discard information in the process of accounting for paired-limb measurements. This approach is recommended in generating more clinically sound and statistically efficient research outputs. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data

    PubMed Central

    Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.

    2015-01-01

    Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977

  10. Visualization and statistical comparisons of microbial communities using R packages on Phylochip data.

    PubMed

    Holmes, Susan; Alekseyenko, Alexander; Timme, Alden; Nelson, Tyrrell; Pasricha, Pankaj Jay; Spormann, Alfred

    2011-01-01

    This article explains the statistical and computational methodology used to analyze species abundances collected using the LNBL Phylochip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance in correcting for multiple testing we use Family Wise Error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen we use the hypergeometric tests familiar for testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.

  11. Scanning probe recognition microscopy investigation of tissue scaffold properties

    PubMed Central

    Fan, Yuan; Chen, Qian; Ayres, Virginia M; Baczewski, Andrew D; Udpa, Lalita; Kumar, Shiva

    2007-01-01

    Scanning probe recognition microscopy is a new scanning probe microscopy technique which enables selective scanning along individual nanofibers within a tissue scaffold. Statistically significant data for multiple properties can be collected by repetitively fine-scanning an identical region of interest. The results of a scanning probe recognition microscopy investigation of the surface roughness and elasticity of a series of tissue scaffolds are presented. Deconvolution and statistical methods were developed and used for data accuracy along curved nanofiber surfaces. Nanofiber features were also independently analyzed using transmission electron microscopy, with results that supported the scanning probe recognition microscopy-based analysis. PMID:18203431

  12. Scanning probe recognition microscopy investigation of tissue scaffold properties.

    PubMed

    Fan, Yuan; Chen, Qian; Ayres, Virginia M; Baczewski, Andrew D; Udpa, Lalita; Kumar, Shiva

    2007-01-01

    Scanning probe recognition microscopy is a new scanning probe microscopy technique which enables selective scanning along individual nanofibers within a tissue scaffold. Statistically significant data for multiple properties can be collected by repetitively fine-scanning an identical region of interest. The results of a scanning probe recognition microscopy investigation of the surface roughness and elasticity of a series of tissue scaffolds are presented. Deconvolution and statistical methods were developed and used for data accuracy along curved nanofiber surfaces. Nanofiber features were also independently analyzed using transmission electron microscopy, with results that supported the scanning probe recognition microscopy-based analysis.

  13. Multi-view 3D echocardiography compounding based on feature consistency

    NASA Astrophysics Data System (ADS)

    Yao, Cheng; Simpson, John M.; Schaeffter, Tobias; Penney, Graeme P.

    2011-09-01

    Echocardiography (echo) is a widely available method to obtain images of the heart; however, echo can suffer due to the presence of artefacts, high noise and a restricted field of view. One method to overcome these limitations is to use multiple images, using the 'best' parts from each image to produce a higher quality 'compounded' image. This paper describes our compounding algorithm which specifically aims to reduce the effect of echo artefacts as well as improving the signal-to-noise ratio, contrast and extending the field of view. Our method weights image information based on a local feature coherence/consistency between all the overlapping images. Validation has been carried out using phantom, volunteer and patient datasets consisting of up to ten multi-view 3D images. Multiple sets of phantom images were acquired, some directly from the phantom surface, and others by imaging through hard and soft tissue mimicking material to degrade the image quality. Our compounding method is compared to the original, uncompounded echocardiography images, and to two basic statistical compounding methods (mean and maximum). Results show that our method is able to take a set of ten images, degraded by soft and hard tissue artefacts, and produce a compounded image of equivalent quality to images acquired directly from the phantom. Our method on phantom, volunteer and patient data achieves almost the same signal-to-noise improvement as the mean method, while simultaneously almost achieving the same contrast improvement as the maximum method. We show a statistically significant improvement in image quality by using an increased number of images (ten compared to five), and visual inspection studies by three clinicians showed very strong preference for our compounded volumes in terms of overall high image quality, large field of view, high endocardial border definition and low cavity noise.

  14. Identification of differentially expressed genes and false discovery rate in microarray studies.

    PubMed

    Gusnanto, Arief; Calza, Stefano; Pawitan, Yudi

    2007-04-01

    To highlight the development in microarray data analysis for the identification of differentially expressed genes, particularly via control of false discovery rate. The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.

  15. Accounting for multiple births in neonatal and perinatal trials: systematic review and case study.

    PubMed

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-02-01

    To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births and to explore the sensitivity of an actual trial to several analytic approaches to multiples. A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The Nitric Oxide to Prevent Chronic Lung Disease (NO CLD) trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using nonclustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. In the systematic review, most studies did not describe the random assignment of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (P < .01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. Copyright 2010 Mosby, Inc. All rights reserved.

  16. Using string invariants for prediction searching for optimal parameters

    NASA Astrophysics Data System (ADS)

    Bundzel, Marek; Kasanický, Tomáš; Pinčák, Richard

    2016-02-01

    We have developed a novel prediction method based on string invariants. The method does not require learning but a small set of parameters must be set to achieve optimal performance. We have implemented an evolutionary algorithm for the parametric optimization. We have tested the performance of the method on artificial and real world data and compared the performance to statistical methods and to a number of artificial intelligence methods. We have used data and the results of a prediction competition as a benchmark. The results show that the method performs well in single step prediction but the method's performance for multiple step prediction needs to be improved. The method works well for a wide range of parameters.

  17. Automated optimal coordination of multiple-DOF neuromuscular actions in feedforward neuroprostheses.

    PubMed

    Lujan, J Luis; Crago, Patrick E

    2009-01-01

    This paper describes a new method for designing feedforward controllers for multiple-muscle, multiple-DOF, motor system neural prostheses. The design process is based on experimental measurement of the forward input/output properties of the neuromechanical system and numerical optimization of stimulation patterns to meet muscle coactivation criteria, thus resolving the muscle redundancy (i.e., overcontrol) and the coupled DOF problems inherent in neuromechanical systems. We designed feedforward controllers to control the isometric forces at the tip of the thumb in two directions during stimulation of three thumb muscles as a model system. We tested the method experimentally in ten able-bodied individuals and one patient with spinal cord injury. Good control of isometric force in both DOFs was observed, with rms errors less than 10% of the force range in seven experiments and statistically significant correlations between the actual and target forces in all ten experiments. Systematic bias and slope errors were observed in a few experiments, likely due to the neuromuscular fatigue. Overall, the tests demonstrated the ability of a general design approach to satisfy both control and coactivation criteria in multiple-muscle, multiple-axis neuromechanical systems, which is applicable to a wide range of neuromechanical systems and stimulation electrodes.

  18. A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics

    PubMed Central

    Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.

    2011-01-01

    Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, as this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT tag Confidence (STAC). STAC additionally provides a Uniqueness Probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command line and a Windows graphical application. PMID:21692516

  19. Estimation of descriptive statistics for multiply censored water quality data

    USGS Publications Warehouse

    Helsel, Dennis R.; Cohn, Timothy A.

    1988-01-01

    This paper extends the work of Gilliom and Helsel (1986) on procedures for estimating descriptive statistics of water quality data that contain “less than” observations. Previously, procedures were evaluated when only one detection limit was present. Here we investigate the performance of estimators for data that have multiple detection limits. Probability plotting and maximum likelihood methods perform substantially better than simple substitution procedures now commonly in use. Therefore simple substitution procedures (e.g., substitution of the detection limit) should be avoided. Probability plotting methods are more robust than maximum likelihood methods to misspecification of the parent distribution and their use should be encouraged in the typical situation where the parent distribution is unknown. When utilized correctly, less than values frequently contain nearly as much information for estimating population moments and quantiles as would the same observations had the detection limit been below them.

  20. Multiple Statistical Models Based Analysis of Causative Factors and Loess Landslides in Tianshui City, China

    NASA Astrophysics Data System (ADS)

    Su, Xing; Meng, Xingmin; Ye, Weilin; Wu, Weijiang; Liu, Xingrong; Wei, Wanhong

    2018-03-01

    Tianshui City is one of the mountainous cities that are threatened by severe geo-hazards in Gansu Province, China. Statistical probability models have been widely used in analyzing and evaluating geo-hazards such as landslide. In this research, three approaches (Certainty Factor Method, Weight of Evidence Method and Information Quantity Method) were adopted to quantitively analyze the relationship between the causative factors and the landslides, respectively. The source data used in this study are including the SRTM DEM and local geological maps in the scale of 1:200,000. 12 causative factors (i.e., altitude, slope, aspect, curvature, plan curvature, profile curvature, roughness, relief amplitude, and distance to rivers, distance to faults, distance to roads, and the stratum lithology) were selected to do correlation analysis after thorough investigation of geological conditions and historical landslides. The results indicate that the outcomes of the three models are fairly consistent.

  1. Multiscale Structure of UXO Site Characterization: Spatial Estimation and Uncertainty Quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ostrouchov, George; Doll, William E.; Beard, Les P.

    2009-01-01

    Unexploded ordnance (UXO) site characterization must consider both how the contamination is generated and how we observe that contamination. Within the generation and observation processes, dependence structures can be exploited at multiple scales. We describe a conceptual site characterization process, the dependence structures available at several scales, and consider their statistical estimation aspects. It is evident that most of the statistical methods that are needed to address the estimation problems are known but their application-specific implementation may not be available. We demonstrate estimation at one scale and propose a representation for site contamination intensity that takes full account of uncertainty,more » is flexible enough to answer regulatory requirements, and is a practical tool for managing detailed spatial site characterization and remediation. The representation is based on point process spatial estimation methods that require modern computational resources for practical application. These methods have provisions for including prior and covariate information.« less

  2. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Novel spectrophotometric methods for simultaneous determination of timolol and dorzolamide in their binary mixture.

    PubMed

    Lotfy, Hayam Mahmoud; Hegazy, Maha A; Rezk, Mamdouh R; Omran, Yasmin Rostom

    2014-05-21

    Two smart and novel spectrophotometric methods namely; absorbance subtraction (AS) and amplitude modulation (AM) were developed and validated for the determination of a binary mixture of timolol maleate (TIM) and dorzolamide hydrochloride (DOR) in presence of benzalkonium chloride without prior separation, using unified regression equation. Additionally, simple, specific, accurate and precise spectrophotometric methods manipulating ratio spectra were developed and validated for simultaneous determination of the binary mixture namely; simultaneous ratio subtraction (SRS), ratio difference (RD), ratio subtraction (RS) coupled with extended ratio subtraction (EXRS), constant multiplication method (CM) and mean centering of ratio spectra (MCR). The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of a reported spectrophotometric method. The statistical comparison showed that there is no significant difference between the proposed methods and the reported one regarding both accuracy and precision. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Creating ensembles of decision trees through sampling

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2005-08-30

    A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.

  5. MC3: Multi-core Markov-chain Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan

    2016-10-01

    MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.

  6. Kernel canonical-correlation Granger causality for multiple time series

    NASA Astrophysics Data System (ADS)

    Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu

    2011-04-01

    Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.

  7. Family-Based Rare Variant Association Analysis: A Fast and Efficient Method of Multivariate Phenotype Association Analysis.

    PubMed

    Wang, Longfei; Lee, Sungyoung; Gim, Jungsoo; Qiao, Dandi; Cho, Michael; Elston, Robert C; Silverman, Edwin K; Won, Sungho

    2016-09-01

    Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows. © 2016 WILEY PERIODICALS, INC.

  8. Using the Flipped Classroom to Bridge the Gap to Generation Y

    PubMed Central

    Gillispie, Veronica

    2016-01-01

    Background: The flipped classroom is a student-centered approach to learning that increases active learning for the student compared to traditional classroom-based instruction. In the flipped classroom model, students are first exposed to the learning material through didactics outside of the classroom, usually in the form of written material, voice-over lectures, or videos. During the formal teaching time, an instructor facilitates student-driven discussion of the material via case scenarios, allowing for complex problem solving, peer interaction, and a deep understanding of the concepts. A successful flipped classroom should have three goals: (1) allow the students to become critical thinkers, (2) fully engage students and instructors, and (3) stimulate the development of a deep understanding of the material. The flipped classroom model includes teaching and learning methods that can appeal to all four generations in the academic environment. Methods: During the 2015 academic year, we implemented the flipped classroom in the obstetrics and gynecology clerkship for the Ochsner Clinical School in New Orleans, LA. Voice-over presentations of the lectures that had been given to students in prior years were recorded and made available to the students through an online classroom. Weekly problem-based learning sessions matched to the subjects of the traditional lectures were held, and the faculty who had previously presented the information in the traditional lecture format facilitated the problem-based learning sessions. The knowledge base of students was evaluated at the end of the rotation via a multiple-choice question examination and the Objective Structured Clinical Examination (OSCE) as had been done in previous years. We compared demographic information and examination scores for traditional teaching and flipped classroom groups of students. The traditional teaching group consisted of students from Rotation 2 and Rotation 3 of the 2014 academic year who received traditional classroom-based instruction. The flipped classroom group consisted of students from Rotation 2 and Rotation 3 of the 2015 academic year who received formal didactics via voice-over presentation and had the weekly problem-based learning sessions. Results: When comparing the students taught by traditional methods to those taught in the flipped classroom model, we saw a statistically significant increase in test scores on the multiple-choice question examination in both the obstetrics and gynecology sections in Rotation 2. While the average score for the flipped classroom group increased in Rotation 3 on the obstetrics section of the multiple-choice question examination, the difference was not statistically significant. Unexpectedly, the average score on the gynecology portion of the multiple-choice question examination decreased among the flipped classroom group compared to the traditional teaching group, and this decrease was statistically significant. For both the obstetrics and the gynecology portions of the OSCE, we saw statistically significant increases in the scores for the flipped classroom group in both Rotation 2 and Rotation 3 compared to the traditional teaching group. With the exception of the gynecology portion of the multiple-choice question examination in Rotation 3, we saw improvement in scores after the implementation of the flipped classroom. Conclusion: The flipped classroom is a feasible and useful alternative to the traditional classroom. It is a method that embraces Generation Y's need for active learning in a group setting while maintaining a traditional classroom method for introducing the information. Active learning increases student engagement and can lead to improved retention of material as demonstrated on standard examinations. PMID:27046401

  9. Whole-Range Assessment: A Simple Method for Analysing Allelopathic Dose-Response Data

    PubMed Central

    An, Min; Pratley, J. E.; Haig, T.; Liu, D.L.

    2005-01-01

    Based on the typical biological responses of an organism to allelochemicals (hormesis), concepts of whole-range assessment and inhibition index were developed for improved analysis of allelopathic data. Examples of their application are presented using data drawn from the literature. The method is concise and comprehensive, and makes data grouping and multiple comparisons simple, logical, and possible. It improves data interpretation, enhances research outcomes, and is a statistically efficient summary of the plant response profiles. PMID:19330165

  10. Reconstruction Error and Principal Component Based Anomaly Detection in Hyperspectral Imagery

    DTIC Science & Technology

    2014-03-27

    2003), and (Jackson D. A., 1993). In 1933, Hotelling ( Hotelling , 1933), who coined the term ‘principal components,’ surmised that there was a...goodness of fit and multivariate quality control with the statistic Qi = (Xi(1×p) − X̂i(1×p) )(Xi(1×p) − X̂i(1×p) ) T (20) where, under the...sparsely targeted scenes through SNR or other methods. 5) Customize sorting and histogram construction methods in Multiple PCA to avoid redundancy

  11. Developing image processing meta-algorithms with data mining of multiple metrics.

    PubMed

    Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.

  12. Statistical Selection of Biological Models for Genome-Wide Association Analyses.

    PubMed

    Bi, Wenjian; Kang, Guolian; Pounds, Stanley B

    2018-05-24

    Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.

  13. A Comparison of Forecast Error Generators for Modeling Wind and Load Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Ning; Diao, Ruisheng; Hafen, Ryan P.

    2013-07-25

    This paper presents four algorithms to generate random forecast error time series. The performance of four algorithms is compared. The error time series are used to create real-time (RT), hour-ahead (HA), and day-ahead (DA) wind and load forecast time series that statistically match historically observed forecasting data sets used in power grid operation to study the net load balancing need in variable generation integration studies. The four algorithms are truncated-normal distribution models, state-space based Markov models, seasonal autoregressive moving average (ARMA) models, and a stochastic-optimization based approach. The comparison is made using historical DA load forecast and actual load valuesmore » to generate new sets of DA forecasts with similar stoical forecast error characteristics (i.e., mean, standard deviation, autocorrelation, and cross-correlation). The results show that all methods generate satisfactory results. One method may preserve one or two required statistical characteristics better the other methods, but may not preserve other statistical characteristics as well compared with the other methods. Because the wind and load forecast error generators are used in wind integration studies to produce wind and load forecasts time series for stochastic planning processes, it is sometimes critical to use multiple methods to generate the error time series to obtain a statistically robust result. Therefore, this paper discusses and compares the capabilities of each algorithm to preserve the characteristics of the historical forecast data sets.« less

  14. Evaluating image reconstruction methods for tumor detection performance in whole-body PET oncology imaging

    NASA Astrophysics Data System (ADS)

    Lartizien, Carole; Kinahan, Paul E.; Comtat, Claude; Lin, Michael; Swensson, Richard G.; Trebossen, Regine; Bendriem, Bernard

    2000-04-01

    This work presents initial results from observer detection performance studies using the same volume visualization software tools that are used in clinical PET oncology imaging. Research into the FORE+OSEM and FORE+AWOSEM statistical image reconstruction methods tailored to whole- body 3D PET oncology imaging have indicated potential improvements in image SNR compared to currently used analytic reconstruction methods (FBP). To assess the resulting impact of these reconstruction methods on the performance of human observers in detecting and localizing tumors, we use a non- Monte Carlo technique to generate multiple statistically accurate realizations of 3D whole-body PET data, based on an extended MCAT phantom and with clinically realistic levels of statistical noise. For each realization, we add a fixed number of randomly located 1 cm diam. lesions whose contrast is varied among pre-calibrated values so that the range of true positive fractions is well sampled. The observer is told the number of tumors and, similar to the AFROC method, asked to localize all of them. The true positive fraction for the three algorithms (FBP, FORE+OSEM, FORE+AWOSEM) as a function of lesion contrast is calculated, although other protocols could be compared. A confidence level for each tumor is also recorded for incorporation into later AFROC analysis.

  15. The multiple decrement life table: a unifying framework for cause-of-death analysis in ecology.

    PubMed

    Carey, James R

    1989-01-01

    The multiple decrement life table is used widely in the human actuarial literature and provides statistical expressions for mortality in three different forms: i) the life table from all causes-of-death combined; ii) the life table disaggregated into selected cause-of-death categories; and iii) the life table with particular causes and combinations of causes eliminated. The purpose of this paper is to introduce the multiple decrement life table to the ecological literature by applying the methods to published death-by-cause information on Rhagoletis pomonella. Interrelations between the current approach and conventional tools used in basic and applied ecology are discussed including the conventional life table, Key Factor Analysis and Abbott's Correction used in toxicological bioassay.

  16. Fast alignment-free sequence comparison using spaced-word frequencies.

    PubMed

    Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard

    2014-07-15

    Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Our program is freely available at http://spaced.gobics.de/. © The Author 2014. Published by Oxford University Press.

  17. SU-E-T-79: Comparison of Doses Received by the Hippocampus in Patients Treated with Single Vs Multiple Isocenter Based Stereotactic Radiation Therapy to the Brain for Multiple Brain Metastases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Algan, O; Giem, J; Young, J

    Purpose: To investigate the doses received by the hippocampus and normal brain tissue during a course of stereotactic radiotherapy utilizing a single isocenter (SI) versus multiple isocenter (MI) in patients with multiple intracranial metastases. Methods: Seven patients imaged with MRI including SPGR sequence and diagnosed with 2–3 brain metastases were included in this retrospective study. Two sets of stereotactic IMRT treatment plans, (MI vs SI), were generated. The hippocampus was contoured on SPGR sequences and doses received by the hippocampus and whole brain were calculated. The prescribed dose was 25Gy in 5 fractions. The two groups were compared using t-testmore » analysis. Results: There were 17 lesions in 7 patients. The median tumor, right hippocampus, left hippocampus and brain volumes were: 3.37cc, 2.56cc, 3.28cc, and 1417cc respectively. In comparing the two treatment plans, there was no difference in the PTV coverage except in the tail of the DVH curve. All tumors had V95 > 99.5%. The only statistically significant parameter was the V100 (72% vs 45%, p=0.002, favoring MI). All other evaluated parameters including the V95 and V98 did not reveal any statistically significant differences. None of the evaluated dosimetric parameters for the hippocampus (V100, V80, V60, V40, V20, V10, D100, D90, D70, D50, D30, D10) revealed any statistically significant differences (all p-values > 0.31) between MI and SI plans. The total brain dose was slightly higher in the SI plans, especially in the lower dose regions, although this difference was not statistically significant. Utilizing brain-sub-PTV volumes did not change these results. Conclusion: The use of SI treatment planning for patients with up to 3 brain metastases produces similar PTV coverage and similar normal tissue doses to the hippocampus and the brain compared to MI plans. SI treatment planning should be considered in patients with multiple brain metastases undergoing stereotactic treatment.« less

  18. Health Service Access across Racial/Ethnic Groups of Children in the Child Welfare System

    ERIC Educational Resources Information Center

    Wells, Rebecca; Hillemeier, Marianne M.; Bai, Yu; Belue, Rhonda

    2009-01-01

    Objective: This study examined health service access among children of different racial/ethnic groups in the child welfare system in an attempt to identify and explain disparities. Methods: Data were from the National Survey of Child and Adolescent Well-Being (NSCAW). N for descriptive statistics = 2,505. N for multiple regression model = 537.…

  19. GMHDIF: A Computer Program for Detecting DIF in Dichotomous and Polytomous Items Using Generalized Mantel-Haenszel Statistics

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.

    2011-01-01

    Mantel-Haenszel (MH) methods constitute one of the most popular nonparametric differential item functioning (DIF) detection procedures. GMHDIF has been developed to provide an easy-to-use program for conducting DIF analyses. Some of the advantages of this program are that (a) it performs two-stage DIF analyses in multiple groups simultaneously;…

  20. Individual Change and the Timing and Onset of Important Life Events: Methods, Models, and Assumptions

    ERIC Educational Resources Information Center

    Grimm, Kevin; Marcoulides, Katerina

    2016-01-01

    Researchers are often interested in studying how the timing of a specific event affects concurrent and future development. When faced with such research questions there are multiple statistical models to consider and those models are the focus of this paper as well as their theoretical underpinnings and assumptions regarding the nature of the…

  1. A novel bi-level meta-analysis approach: applied to biological pathway analysis.

    PubMed

    Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin

    2016-02-01

    The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenge in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors to correctly identify pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. sorin@wayne.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Statistics in biomedical laboratory and clinical science: applications, issues and pitfalls.

    PubMed

    Ludbrook, John

    2008-01-01

    This review is directed at biomedical scientists who want to gain a better understanding of statistics: what tests to use, when, and why. In my view, even during the planning stage of a study it is very important to seek the advice of a qualified biostatistician. When designing and analyzing a study, it is important to construct and test global hypotheses, rather than to make multiple tests on the data. If the latter cannot be avoided, it is essential to control the risk of making false-positive inferences by applying multiple comparison procedures. For comparing two means or two proportions, it is best to use exact permutation tests rather then the better known, classical, ones. For comparing many means, analysis of variance, often of a complex type, is the most powerful approach. The correlation coefficient should never be used to compare the performances of two methods of measurement, or two measures, because it does not detect bias. Instead the Altman-Bland method of differences or least-products linear regression analysis should be preferred. Finally, the educational value to investigators of interaction with a biostatistician, before, during and after a study, cannot be overemphasized. (c) 2007 S. Karger AG, Basel.

  3. Viewpoint: observations on scaled average bioequivalence.

    PubMed

    Patterson, Scott D; Jones, Byron

    2012-01-01

    The two one-sided test procedure (TOST) has been used for average bioequivalence testing since 1992 and is required when marketing new formulations of an approved drug. TOST is known to require comparatively large numbers of subjects to demonstrate bioequivalence for highly variable drugs, defined as those drugs having intra-subject coefficients of variation greater than 30%. However, TOST has been shown to protect public health when multiple generic formulations enter the marketplace following patent expiration. Recently, scaled average bioequivalence (SABE) has been proposed as an alternative statistical analysis procedure for such products by multiple regulatory agencies. SABE testing requires that a three-period partial replicate cross-over or full replicate cross-over design be used. Following a brief summary of SABE analysis methods applied to existing data, we will consider three statistical ramifications of the proposed additional decision rules and the potential impact of implementation of scaled average bioequivalence in the marketplace using simulation. It is found that a constraint being applied is biased, that bias may also result from the common problem of missing data and that the SABE methods allow for much greater changes in exposure when generic-generic switching occurs in the marketplace. Copyright © 2011 John Wiley & Sons, Ltd.

  4. Meteorological Contribution to Variability in Particulate Matter Concentrations

    NASA Astrophysics Data System (ADS)

    Woods, H. L.; Spak, S. N.; Holloway, T.

    2006-12-01

    Local concentrations of fine particulate matter (PM) are driven by a number of processes, including emissions of aerosols and gaseous precursors, atmospheric chemistry, and meteorology at local, regional, and global scales. We apply statistical downscaling methods, typically used for regional climate analysis, to estimate the contribution of regional scale meteorology to PM mass concentration variability at a range of sites in the Upper Midwestern U.S. Multiple years of daily PM10 and PM2.5 data, reported by the U.S. Environmental Protection Agency (EPA), are correlated with large-scale meteorology over the region from the National Centers for Environmental Prediction (NCEP) reanalysis data. We use two statistical downscaling methods (multiple linear regression, MLR, and analog) to identify which processes have the greatest impact on aerosol concentration variability. Empirical Orthogonal Functions of the NCEP meteorological data are correlated with PM timeseries at measurement sites. We examine which meteorological variables exert the greatest influence on PM variability, and which sites exhibit the greatest response to regional meteorology. To evaluate model performance, measurement data are withheld for limited periods, and compared with model results. Preliminary results suggest that regional meteorological processes account over 50% of aerosol concentration variability at study sites.

  5. A model-based approach to wildland fire reconstruction using sediment charcoal records

    USGS Publications Warehouse

    Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.

    2017-01-01

    Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.

  6. The Effects of Clinically Relevant Multiple-Choice Items on the Statistical Discrimination of Physician Clinical Competence.

    ERIC Educational Resources Information Center

    Downing, Steven M.; Maatsch, Jack L.

    To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…

  7. Research education: findings of a study of teaching-learning research using multiple analytical perspectives.

    PubMed

    Vandermause, Roxanne; Barbosa-Leiker, Celestina; Fritz, Roschelle

    2014-12-01

    This multimethod, qualitative study provides results for educators of nursing doctoral students to consider. Combining the expertise of an empirical analytical researcher (who uses statistical methods) and an interpretive phenomenological researcher (who uses hermeneutic methods), a course was designed that would place doctoral students in the midst of multiparadigmatic discussions while learning fundamental research methods. Field notes and iterative analytical discussions led to patterns and themes that highlight the value of this innovative pedagogical application. Using content analysis and interpretive phenomenological approaches, together with one of the students, data were analyzed from field notes recorded in real time over the period the course was offered. This article describes the course and the study analysis, and offers the pedagogical experience as transformative. A link to a sample syllabus is included in the article. The results encourage nurse educators of doctoral nursing students to focus educational practice on multiple methodological perspectives. Copyright 2014, SLACK Incorporated.

  8. Emotion recognition based on multiple order features using fractional Fourier transform

    NASA Astrophysics Data System (ADS)

    Ren, Bo; Liu, Deyin; Qi, Lin

    2017-07-01

    In order to deal with the insufficiency of recently algorithms based on Two Dimensions Fractional Fourier Transform (2D-FrFT), this paper proposes a multiple order features based method for emotion recognition. Most existing methods utilize the feature of single order or a couple of orders of 2D-FrFT. However, different orders of 2D-FrFT have different contributions on the feature extraction of emotion recognition. Combination of these features can enhance the performance of an emotion recognition system. The proposed approach obtains numerous features that extracted in different orders of 2D-FrFT in the directions of x-axis and y-axis, and uses the statistical magnitudes as the final feature vectors for recognition. The Support Vector Machine (SVM) is utilized for the classification and RML Emotion database and Cohn-Kanade (CK) database are used for the experiment. The experimental results demonstrate the effectiveness of the proposed method.

  9. System and method for progressive band selection for hyperspectral images

    NASA Technical Reports Server (NTRS)

    Fisher, Kevin (Inventor)

    2013-01-01

    Disclosed herein are systems, methods, and non-transitory computer-readable storage media for progressive band selection for hyperspectral images. A system having module configured to control a processor to practice the method calculates a virtual dimensionality of a hyperspectral image having multiple bands to determine a quantity Q of how many bands are needed for a threshold level of information, ranks each band based on a statistical measure, selects Q bands from the multiple bands to generate a subset of bands based on the virtual dimensionality, and generates a reduced image based on the subset of bands. This approach can create reduced datasets of full hyperspectral images tailored for individual applications. The system uses a metric specific to a target application to rank the image bands, and then selects the most useful bands. The number of bands selected can be specified manually or calculated from the hyperspectral image's virtual dimensionality.

  10. Interspecies scaling and prediction of human clearance: comparison of small- and macro-molecule drugs

    PubMed Central

    Huh, Yeamin; Smith, David E.; Feng, Meihau Rose

    2014-01-01

    Human clearance prediction for small- and macro-molecule drugs was evaluated and compared using various scaling methods and statistical analysis.Human clearance is generally well predicted using single or multiple species simple allometry for macro- and small-molecule drugs excreted renally.The prediction error is higher for hepatically eliminated small-molecules using single or multiple species simple allometry scaling, and it appears that the prediction error is mainly associated with drugs with low hepatic extraction ratio (Eh). The error in human clearance prediction for hepatically eliminated small-molecules was reduced using scaling methods with a correction of maximum life span (MLP) or brain weight (BRW).Human clearance of both small- and macro-molecule drugs is well predicted using the monkey liver blood flow method. Predictions using liver blood flow from other species did not work as well, especially for the small-molecule drugs. PMID:21892879

  11. Detecting a Weak Association by Testing its Multiple Perturbations: a Data Mining Approach

    NASA Astrophysics Data System (ADS)

    Lo, Min-Tzu; Lee, Wen-Chung

    2014-05-01

    Many risk factors/interventions in epidemiologic/biomedical studies are of minuscule effects. To detect such weak associations, one needs a study with a very large sample size (the number of subjects, n). The n of a study can be increased but unfortunately only to an extent. Here, we propose a novel method which hinges on increasing sample size in a different direction-the total number of variables (p). We construct a p-based `multiple perturbation test', and conduct power calculations and computer simulations to show that it can achieve a very high power to detect weak associations when p can be made very large. As a demonstration, we apply the method to analyze a genome-wide association study on age-related macular degeneration and identify two novel genetic variants that are significantly associated with the disease. The p-based method may set a stage for a new paradigm of statistical tests.

  12. Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer

    PubMed Central

    Peterson, Thomas A; Nehrt, Nathan L; Park, DoHwan

    2012-01-01

    Background and objective With recent breakthroughs in high-throughput sequencing, identifying deleterious mutations is one of the key challenges for personalized medicine. At the gene and protein level, it has proven difficult to determine the impact of previously unknown variants. A statistical method has been developed to assess the significance of disease mutation clusters on protein domains by incorporating domain functional annotations to assist in the functional characterization of novel variants. Methods Disease mutations aggregated from multiple databases were mapped to domains, and were classified as either cancer- or non-cancer-related. The statistical method for identifying significantly disease-associated domain positions was applied to both sets of mutations and to randomly generated mutation sets for comparison. To leverage the known function of protein domain regions, the method optionally distributes significant scores to associated functional feature positions. Results Most disease mutations are localized within protein domains and display a tendency to cluster at individual domain positions. The method identified significant disease mutation hotspots in both the cancer and non-cancer datasets. The domain significance scores (DS-scores) for cancer form a bimodal distribution with hotspots in oncogenes forming a second peak at higher DS-scores than non-cancer, and hotspots in tumor suppressors have scores more similar to non-cancers. In addition, on an independent mutation benchmarking set, the DS-score method identified mutations known to alter protein function with very high precision. Conclusion By aggregating mutations with known disease association at the domain level, the method was able to discover domain positions enriched with multiple occurrences of deleterious mutations while incorporating relevant functional annotations. The method can be incorporated into translational bioinformatics tools to characterize rare and novel variants within large-scale sequencing studies. PMID:22319177

  13. [Study of Cervical Exfoliated Cell's DNA Quantitative Analysis Based on Multi-Spectral Imaging Technology].

    PubMed

    Wu, Zheng; Zeng, Li-bo; Wu, Qiong-shui

    2016-02-01

    The conventional cervical cancer screening methods mainly include TBS (the bethesda system) classification method and cellular DNA quantitative analysis, however, by using multiple staining method in one cell slide, which is staining the cytoplasm with Papanicolaou reagent and the nucleus with Feulgen reagent, the study of achieving both two methods in the cervical cancer screening at the same time is still blank. Because the difficulty of this multiple staining method is that the absorbance of the non-DNA material may interfere with the absorbance of DNA, so that this paper has set up a multi-spectral imaging system, and established an absorbance unmixing model by using multiple linear regression method based on absorbance's linear superposition character, and successfully stripped out the absorbance of DNA to run the DNA quantitative analysis, and achieved the perfect combination of those two kinds of conventional screening method. Through a series of experiment we have proved that between the absorbance of DNA which is calculated by the absorbance unmixxing model and the absorbance of DNA which is measured there is no significant difference in statistics when the test level is 1%, also the result of actual application has shown that there is no intersection between the confidence interval of the DNA index of the tetraploid cells which are screened by using this paper's analysis method when the confidence level is 99% and the DNA index's judging interval of cancer cells, so that the accuracy and feasibility of the quantitative DNA analysis with multiple staining method expounded by this paper have been verified, therefore this analytical method has a broad application prospect and considerable market potential in early diagnosis of cervical cancer and other cancers.

  14. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38% As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  15. A Simple Illustration for the Need of Multiple Comparison Procedures

    ERIC Educational Resources Information Center

    Carter, Rickey E.

    2010-01-01

    Statistical adjustments to accommodate multiple comparisons are routinely covered in introductory statistical courses. The fundamental rationale for such adjustments, however, may not be readily understood. This article presents a simple illustration to help remedy this.

  16. An empirical Bayes method for updating inferences in analysis of quantitative trait loci using information from related genome scans.

    PubMed

    Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B

    2006-08-01

    Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.

  17. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  18. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations.

    PubMed

    Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel

    2016-12-01

    The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.

  19. SU-F-T-386: Analysis of Three QA Methods for Predicting Dose Deviation Pass Percentage for Lung SBRT VMAT Plans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hardin, M; To, D; Giaddui, T

    2016-06-15

    Purpose: To investigate the significance of using pinpoint ionization chambers (IC) and RadCalc (RC) in determining the quality of lung SBRT VMAT plans with low dose deviation pass percentage (DDPP) as reported by ScandiDos Delta4 (D4). To quantify the relationship between DDPP and point dose deviations determined by IC (ICDD), RadCalc (RCDD), and median dose deviation reported by D4 (D4DD). Methods: Point dose deviations and D4 DDPP were compiled for 45 SBRT VMAT plans. Eighteen patients were treated on Varian Truebeam linear accelerators (linacs); the remaining 27 were treated on Elekta Synergy linacs with Agility collimators. A one-way analysis ofmore » variance (ANOVA) was performed to determine if there were any statistically significant differences between D4DD, ICDD, and RCDD. Tukey’s test was used to determine which pair of means was statistically different from each other. Multiple regression analysis was performed to determine if D4DD, ICDD, or RCDD are statistically significant predictors of DDPP. Results: Median DDPP, D4DD, ICDD, and RCDD were 80.5% (47.6%–99.2%), −0.3% (−2.0%–1.6%), 0.2% (−7.5%–6.3%), and 2.9% (−4.0%–19.7%), respectively. The ANOVA showed a statistically significant difference between D4DD, ICDD, and RCDD for a 95% confidence interval (p < 0.001). Tukey’s test revealed a statistically significant difference between two pairs of groups, RCDD-D4DD and RCDD-ICDD (p < 0.001), but no difference between ICDD-D4DD (p = 0.485). Multiple regression analysis revealed that ICDD (p = 0.04) and D4DD (p = 0.03) are statistically significant predictors of DDPP with an adjusted r{sup 2} of 0.115. Conclusion: This study shows ICDD predicts trends in D4 DDPP; however this trend is highly variable as shown by our low r{sup 2}. This work suggests that ICDD can be used as a method to verify DDPP in delivery of lung SBRT VMAT plans. RCDD may not validate low DDPP discovered in D4 QA for small field SBRT treatments.« less

  20. Single, double or multiple-injection techniques for non-ultrasound guided axillary brachial plexus block in adults undergoing surgery of the lower arm.

    PubMed

    Chin, Ki Jinn; Alakkad, Husni; Cubillos, Javier E

    2013-08-08

    Regional anaesthesia comprising axillary block of the brachial plexus is a common anaesthetic technique for distal upper limb surgery. This is an update of a review first published in 2006 and updated in 2011. To compare the relative effects (benefits and harms) of three injection techniques (single, double and multiple) of axillary block of the brachial plexus for distal upper extremity surgery. We considered these effects primarily in terms of anaesthetic effectiveness; the complication rate (neurological and vascular); and pain and discomfort caused by performance of the block. We searched the Cochrane Central Register of Controlled Trials (CENTRAL) (The Cochrane Library), MEDLINE, EMBASE and reference lists of trials. We contacted trial authors. The date of the last search was March 2013 (updated from March 2011). We included randomized controlled trials that compared double with single-injection techniques, multiple with single-injection techniques, or multiple with double-injection techniques for axillary block in adults undergoing surgery of the distal upper limb. We excluded trials using ultrasound-guided techniques. Independent study selection, risk of bias assessment and data extraction were performed by at least two investigators. We undertook meta-analysis. The 21 included trials involved a total of 2148 participants who received regional anaesthesia for hand, wrist, forearm or elbow surgery. Risk of bias assessment indicated that trial design and conduct were generally adequate; the most common areas of weakness were in blinding and allocation concealment.Eight trials comparing double versus single injections showed a statistically significant decrease in primary anaesthesia failure (risk ratio (RR 0.51), 95% confidence interval (CI) 0.30 to 0.85). Subgroup analysis by method of nerve location showed that the effect size was greater when neurostimulation was used rather than the transarterial technique.Eight trials comparing multiple with single injections showed a statistically significant decrease in primary anaesthesia failure (RR 0.25, 95% CI 0.14 to 0.44) and of incomplete motor block (RR 0.61, 95% CI 0.39 to 0.96) in the multiple injection group.Eleven trials comparing multiple with double injections showed a statistically significant decrease in primary anaesthesia failure (RR 0.28, 95% CI 0.20 to 0.40) and of incomplete motor block (RR 0.55, 95% CI 0.36 to 0.85) in the multiple injection group.Tourniquet pain was significantly reduced with multiple injections compared with double injections (RR 0.53, 95% CI 0.33 to 0.84). Otherwise there were no statistically significant differences between groups in any of the three comparisons on secondary analgesia failure, complications and patient discomfort. The time for block performance was significantly shorter for single and double injections compared with multiple injections. This review provides evidence that multiple-injection techniques using nerve stimulation for axillary plexus block produce more effective anaesthesia than either double or single-injection techniques. However, there was insufficient evidence for a significant difference in other outcomes, including safety.

  1. Missing data and multiple imputation in clinical epidemiological research.

    PubMed

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.

  2. Missing data and multiple imputation in clinical epidemiological research

    PubMed Central

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data. PMID:28352203

  3. Rank Dynamics of Word Usage at Multiple Scales

    NASA Astrophysics Data System (ADS)

    Morales, José A.; Colman, Ewan; Sánchez, Sergio; Sánchez-Puig, Fernanda; Pineda, Carlos; Iñiguez, Gerardo; Cocho, Germinal; Flores, Jorge; Gershenson, Carlos

    2018-05-01

    The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore whether word use is similar across languages, and if so, whether these generic features appear at different scales of language structure. Here we use the Google Books N-grams dataset to analyze the temporal evolution of word usage in several languages. We apply measures proposed recently to study rank dynamics, such as the diversity of N-grams in a given rank, the probability that an N-gram changes rank between successive time intervals, the rank entropy, and the rank complexity. Using different methods, results show that there are generic properties for different languages at different scales, such as a core of words necessary to minimally understand a language. We also propose a null model to explore the relevance of linguistic structure across multiple scales, concluding that N-gram statistics cannot be reduced to word statistics. We expect our results to be useful in improving text prediction algorithms, as well as in shedding light on the large-scale features of language use, beyond linguistic and cultural differences across human populations.

  4. Considerations in the statistical analysis of clinical trials in periodontitis.

    PubMed

    Imrey, P B

    1986-05-01

    Adult periodontitis has been described as a chronic infectious process exhibiting sporadic, acute exacerbations which cause quantal, localized losses of dental attachment. Many analytic problems of periodontal trials are similar to those of other chronic diseases. However, the episodic, localized, infrequent, and relatively unpredictable behavior of exacerbations, coupled with measurement error difficulties, cause some specific problems. Considerable controversy exists as to the proper selection and treatment of multiple site data from the same patient for group comparisons for epidemiologic or therapeutic evaluative purposes. This paper comments, with varying degrees of emphasis, on several issues pertinent to the analysis of periodontal trials. Considerable attention is given to the ways in which measurement variability may distort analytic results. Statistical treatments of multiple site data for descriptive summaries are distinguished from treatments for formal statistical inference to validate therapeutic effects. Evidence suggesting that sites behave independently is contested. For inferential analyses directed at therapeutic or preventive effects, analytic models based on site independence are deemed unsatisfactory. Methods of summarization that may yield more powerful analyses than all-site mean scores, while retaining appropriate treatment of inter-site associations, are suggested. Brief comments and opinions on an assortment of other issues in clinical trial analysis are preferred.

  5. LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA

    PubMed Central

    Salter-Townshend, Michael; McCormick, Tyler H.

    2018-01-01

    Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090–1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)]. PMID:29721127

  6. LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA.

    PubMed

    Salter-Townshend, Michael; McCormick, Tyler H

    2017-09-01

    Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090-1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)].

  7. Diffusion-weighted magnetic resonance imaging in the characterization of testicular germ cell neoplasms: Effect of ROI methods on apparent diffusion coefficient values and interobserver variability.

    PubMed

    Tsili, Athina C; Ntorkou, Alexandra; Astrakas, Loukas; Xydis, Vasilis; Tsampalas, Stavros; Sofikitis, Nikolaos; Argyropoulou, Maria I

    2017-04-01

    To evaluate the difference in apparent diffusion coefficient (ADC) measurements at diffusion-weighted (DW) magnetic resonance imaging of differently shaped regions-of-interest (ROIs) in testicular germ cell neoplasms (TGCNS), the diagnostic ability of differently shaped ROIs in differentiating seminomas from nonseminomatous germ cell neoplasms (NSGCNs) and the interobserver variability. Thirty-three TGCNs were retrospectively evaluated. Patients underwent MR examinations, including DWI on a 1.5-T MR system. Two observers measured mean tumor ADCs using four distinct ROI methods: round, square, freehand and multiple small, round ROIs. The interclass correlation coefficient was analyzed to assess interobserver variability. Statistical analysis was used to compare mean ADC measurements among observers, methods and histologic types. All ROI methods showed excellent interobserver agreement, with excellent correlation (P<0.001). Multiple, small ROIs provided the lower mean ADC in TGCNs. Seminomas had lower mean ADC compared to NSGCNs for each ROI method (P<0.001). Round ROI proved the most accurate method in characterizing TGCNS. Interobserver variability in ADC measurement is excellent, irrespective of the ROI shape. Multiple, small round ROIs and round ROI proved the more accurate methods for ADC measurement in the characterization of TGCNs and in the differentiation between seminomas and NSGCNs, respectively. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed Central

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-01-01

    Background Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Results Expression of sex-chromosome genes were used as true internal biological controls to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. Conclusion In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects. PMID:12962547

  9. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-09-08

    Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Expression of sex-chromosome genes were used as true internal biological controls to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects.

  10. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.

  11. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that the using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF and are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children, however, the techniques can be generalized for use with measures across various disciplines of research.

  12. A Synergy Cropland of China by Fusing Multiple Existing Maps and Statistics.

    PubMed

    Lu, Miao; Wu, Wenbin; You, Liangzhi; Chen, Di; Zhang, Li; Yang, Peng; Tang, Huajun

    2017-07-12

    Accurate information on cropland extent is critical for scientific research and resource management. Several cropland products from remotely sensed datasets are available. Nevertheless, significant inconsistency exists among these products and the cropland areas estimated from these products differ considerably from statistics. In this study, we propose a hierarchical optimization synergy approach (HOSA) to develop a hybrid cropland map of China, circa 2010, by fusing five existing cropland products, i.e., GlobeLand30, Climate Change Initiative Land Cover (CCI-LC), GlobCover 2009, MODIS Collection 5 (MODIS C5), and MODIS Cropland, and sub-national statistics of cropland area. HOSA simplifies the widely used method of score assignment into two steps, including determination of optimal agreement level and identification of the best product combination. The accuracy assessment indicates that the synergy map has higher accuracy of spatial locations and better consistency with statistics than the five existing datasets individually. This suggests that the synergy approach can improve the accuracy of cropland mapping and enhance consistency with statistics.

  13. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    PubMed

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorow-Smirnow test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of multivariate statistical method as a multiple logistic regression in osteoporosis, which maybe influenced by many variables, is better than univariate statistical evaluation.

  14. [The overall assessment of psychological well - being of patients with multiple sclerosis after the application of physical therapy. Part 2].

    PubMed

    Kubsik-Gidlewska, Anna; Klimkiewicz, Robert; Klimkiewicz, Paulina; Janczewska, Katarzyna; Jankowska, Agnieszka; Nowakowski, Tomasz; Woldańska-Okońska, Marta

    2017-01-01

    Multiple sclerosis is a chronic demyelinating disease of the central nervous system, which results a progressive disability. The disease reduces the quality of life of patients, changes the general health perceptions, and also limits performing social roles because of emotional problems. Evaluation of the impact of the methods of rehabilitation to improve the mental health of patients with multiple sclerosis, and also to change individual parameters included in the overall assessment of mental health. The study was conducted in 2010-2014 at the Department of Physical Medicine and Rehabilitation in Lodz. The study included 120 patients with multiple sclerosis. Patients were classified into 4 test groups: in the first was used the laser, in the second - laser and magnetostimulation, in the third - kinesiotherapy, and in the fourth - magnetostimulation. The tests were carried out three times. To evaluate the quality of life was used Quality of Life Questionnaire (MSQOL-54), analyzed the overall assessment of mental health. The improvement in a range of parameters, an overall assessment of the quality of mental health has allowed to get a better overall psychological well-being. ,There was oserved a statistically significant difference at the level of p<0.001 between groups in 4/5 investigated parameters, statistically significant differences weren't obserwed at the evaluation of cognitive functions. The greatest improvement was observed in Group II and Group IV. In the examination it was confirmed an effectiveness of physical treatment, such a the laser radiation and magnetostimulation. Synergism of both methods in their biological activity, allows for evoke of hysteresis fenomenon, resulting in the maintenance of the treatment effects after cessation of rehabilitation. Applying the classical kinesiotherapy only doesn't allow to get long-term effects.

  15. Analysis of Gene Expression Profiles of Soft Tissue Sarcoma Using a Combination of Knowledge-Based Filtering with Integration of Multiple Statistics

    PubMed Central

    Doi, Ayano; Ichinohe, Risa; Ikuyo, Yoriko; Takahashi, Teruyoshi; Marui, Shigetaka; Yasuhara, Koji; Nakamura, Tetsuro; Sugita, Shintaro; Sakamoto, Hiromi; Yoshida, Teruhiko; Hasegawa, Tadashi

    2014-01-01

    The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (logrank p = 1.84×10−6 and adjusted p value 2.99×10−3 after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation. PMID:25188299

  16. Multi-modal data fusion using source separation: Two effective models based on ICA and IVA and their properties

    PubMed Central

    Adali, Tülay; Levin-Schwartz, Yuri; Calhoun, Vince D.

    2015-01-01

    Fusion of information from multiple sets of data in order to extract a set of features that are most useful and relevant for the given task is inherent to many problems we deal with today. Since, usually, very little is known about the actual interaction among the datasets, it is highly desirable to minimize the underlying assumptions. This has been the main reason for the growing importance of data-driven methods, and in particular of independent component analysis (ICA) as it provides useful decompositions with a simple generative model and using only the assumption of statistical independence. A recent extension of ICA, independent vector analysis (IVA) generalizes ICA to multiple datasets by exploiting the statistical dependence across the datasets, and hence, as we discuss in this paper, provides an attractive solution to fusion of data from multiple datasets along with ICA. In this paper, we focus on two multivariate solutions for multi-modal data fusion that let multiple modalities fully interact for the estimation of underlying features that jointly report on all modalities. One solution is the Joint ICA model that has found wide application in medical imaging, and the second one is the the Transposed IVA model introduced here as a generalization of an approach based on multi-set canonical correlation analysis. In the discussion, we emphasize the role of diversity in the decompositions achieved by these two models, present their properties and implementation details to enable the user make informed decisions on the selection of a model along with its associated parameters. Discussions are supported by simulation results to help highlight the main issues in the implementation of these methods. PMID:26525830

  17. Time-dependent summary receiver operating characteristics for meta-analysis of prognostic studies.

    PubMed

    Hattori, Satoshi; Zhou, Xiao-Hua

    2016-11-20

    Prognostic studies are widely conducted to examine whether biomarkers are associated with patient's prognoses and play important roles in medical decisions. Because findings from one prognostic study may be very limited, meta-analyses may be useful to obtain sound evidence. However, prognostic studies are often analyzed by relying on a study-specific cut-off value, which can lead to difficulty in applying the standard meta-analysis techniques. In this paper, we propose two methods to estimate a time-dependent version of the summary receiver operating characteristics curve for meta-analyses of prognostic studies with a right-censored time-to-event outcome. We introduce a bivariate normal model for the pair of time-dependent sensitivity and specificity and propose a method to form inferences based on summary statistics reported in published papers. This method provides a valid inference asymptotically. In addition, we consider a bivariate binomial model. To draw inferences from this bivariate binomial model, we introduce a multiple imputation method. The multiple imputation is found to be approximately proper multiple imputation, and thus the standard Rubin's variance formula is justified from a Bayesian view point. Our simulation study and application to a real dataset revealed that both methods work well with a moderate or large number of studies and the bivariate binomial model coupled with the multiple imputation outperforms the bivariate normal model with a small number of studies. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  18. High-resolution modeling of thermal thresholds and environmental influences on coral bleaching for local and regional reef management.

    PubMed

    Kumagai, Naoki H; Yamano, Hiroya

    2018-01-01

    Coral reefs are one of the world's most threatened ecosystems, with global and local stressors contributing to their decline. Excessive sea-surface temperatures (SSTs) can cause coral bleaching, resulting in coral death and decreases in coral cover. A SST threshold of 1 °C over the climatological maximum is widely used to predict coral bleaching. In this study, we refined thermal indices predicting coral bleaching at high-spatial resolution (1 km) by statistically optimizing thermal thresholds, as well as considering other environmental influences on bleaching such as ultraviolet (UV) radiation, water turbidity, and cooling effects. We used a coral bleaching dataset derived from the web-based monitoring system Sango Map Project, at scales appropriate for the local and regional conservation of Japanese coral reefs. We recorded coral bleaching events in the years 2004-2016 in Japan. We revealed the influence of multiple factors on the ability to predict coral bleaching, including selection of thermal indices, statistical optimization of thermal thresholds, quantification of multiple environmental influences, and use of multiple modeling methods (generalized linear models and random forests). After optimization, differences in predictive ability among thermal indices were negligible. Thermal index, UV radiation, water turbidity, and cooling effects were important predictors of the occurrence of coral bleaching. Predictions based on the best model revealed that coral reefs in Japan have experienced recent and widespread bleaching. A practical method to reduce bleaching frequency by screening UV radiation was also demonstrated in this paper.

  19. A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

    PubMed Central

    Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles

    2012-01-01

    Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provide model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072

  20. High-resolution modeling of thermal thresholds and environmental influences on coral bleaching for local and regional reef management

    PubMed Central

    Yamano, Hiroya

    2018-01-01

    Coral reefs are one of the world’s most threatened ecosystems, with global and local stressors contributing to their decline. Excessive sea-surface temperatures (SSTs) can cause coral bleaching, resulting in coral death and decreases in coral cover. A SST threshold of 1 °C over the climatological maximum is widely used to predict coral bleaching. In this study, we refined thermal indices predicting coral bleaching at high-spatial resolution (1 km) by statistically optimizing thermal thresholds, as well as considering other environmental influences on bleaching such as ultraviolet (UV) radiation, water turbidity, and cooling effects. We used a coral bleaching dataset derived from the web-based monitoring system Sango Map Project, at scales appropriate for the local and regional conservation of Japanese coral reefs. We recorded coral bleaching events in the years 2004–2016 in Japan. We revealed the influence of multiple factors on the ability to predict coral bleaching, including selection of thermal indices, statistical optimization of thermal thresholds, quantification of multiple environmental influences, and use of multiple modeling methods (generalized linear models and random forests). After optimization, differences in predictive ability among thermal indices were negligible. Thermal index, UV radiation, water turbidity, and cooling effects were important predictors of the occurrence of coral bleaching. Predictions based on the best model revealed that coral reefs in Japan have experienced recent and widespread bleaching. A practical method to reduce bleaching frequency by screening UV radiation was also demonstrated in this paper. PMID:29473007

  1. Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

    PubMed

    Orsi, Rebecca

    2017-02-01

    Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Synthetic Minority Oversampling Technique and Fractal Dimension for Identifying Multiple Sclerosis

    NASA Astrophysics Data System (ADS)

    Zhang, Yu-Dong; Zhang, Yin; Phillips, Preetha; Dong, Zhengchao; Wang, Shuihua

    Multiple sclerosis (MS) is a severe brain disease. Early detection can provide timely treatment. Fractal dimension can provide statistical index of pattern changes with scale at a given brain image. In this study, our team used susceptibility weighted imaging technique to obtain 676 MS slices and 880 healthy slices. We used synthetic minority oversampling technique to process the unbalanced dataset. Then, we used Canny edge detector to extract distinguishing edges. The Minkowski-Bouligand dimension was a fractal dimension estimation method and used to extract features from edges. Single hidden layer neural network was used as the classifier. Finally, we proposed a three-segment representation biogeography-based optimization to train the classifier. Our method achieved a sensitivity of 97.78±1.29%, a specificity of 97.82±1.60% and an accuracy of 97.80±1.40%. The proposed method is superior to seven state-of-the-art methods in terms of sensitivity and accuracy.

  3. Statistics of dislocation pinning at localized obstacles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dutta, A.; Bhattacharya, M., E-mail: mishreyee@vecc.gov.in; Barat, P.

    2014-10-14

    Pinning of dislocations at nanosized obstacles like precipitates, voids, and bubbles is a crucial mechanism in the context of phenomena like hardening and creep. The interaction between such an obstacle and a dislocation is often studied at fundamental level by means of analytical tools, atomistic simulations, and finite element methods. Nevertheless, the information extracted from such studies cannot be utilized to its maximum extent on account of insufficient information about the underlying statistics of this process comprising a large number of dislocations and obstacles in a system. Here, we propose a new statistical approach, where the statistics of pinning ofmore » dislocations by idealized spherical obstacles is explored by taking into account the generalized size-distribution of the obstacles along with the dislocation density within a three-dimensional framework. Starting with a minimal set of material parameters, the framework employs the method of geometrical statistics with a few simple assumptions compatible with the real physical scenario. The application of this approach, in combination with the knowledge of fundamental dislocation-obstacle interactions, has successfully been demonstrated for dislocation pinning at nanovoids in neutron irradiated type 316-stainless steel in regard to the non-conservative motion of dislocations. An interesting phenomenon of transition from rare pinning to multiple pinning regimes with increasing irradiation temperature is revealed.« less

  4. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.

  5. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

  6. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.

  7. Visualization of Spatio-Temporal Relations in Movement Event Using Multi-View

    NASA Astrophysics Data System (ADS)

    Zheng, K.; Gu, D.; Fang, F.; Wang, Y.; Liu, H.; Zhao, W.; Zhang, M.; Li, Q.

    2017-09-01

    Spatio-temporal relations among movement events extracted from temporally varying trajectory data can provide useful information about the evolution of individual or collective movers, as well as their interactions with their spatial and temporal contexts. However, the pure statistical tools commonly used by analysts pose many difficulties, due to the large number of attributes embedded in multi-scale and multi-semantic trajectory data. The need for models that operate at multiple scales to search for relations at different locations within time and space, as well as intuitively interpret what these relations mean, also presents challenges. Since analysts do not know where or when these relevant spatio-temporal relations might emerge, these models must compute statistical summaries of multiple attributes at different granularities. In this paper, we propose a multi-view approach to visualize the spatio-temporal relations among movement events. We describe a method for visualizing movement events and spatio-temporal relations that uses multiple displays. A visual interface is presented, and the user can interactively select or filter spatial and temporal extents to guide the knowledge discovery process. We also demonstrate how this approach can help analysts to derive and explain the spatio-temporal relations of movement events from taxi trajectory data.

  8. Detecting concerted demographic response across community assemblages using hierarchical approximate Bayesian computation.

    PubMed

    Chan, Yvonne L; Schanzenbach, David; Hickerson, Michael J

    2014-09-01

    Methods that integrate population-level sampling from multiple taxa into a single community-level analysis are an essential addition to the comparative phylogeographic toolkit. Detecting how species within communities have demographically tracked each other in space and time is important for understanding the effects of future climate and landscape changes and the resulting acceleration of extinctions, biological invasions, and potential surges in adaptive evolution. Here, we present a statistical framework for such an analysis based on hierarchical approximate Bayesian computation (hABC) with the goal of detecting concerted demographic histories across an ecological assemblage. Our method combines population genetic data sets from multiple taxa into a single analysis to estimate: 1) the proportion of a community sample that demographically expanded in a temporally clustered pulse and 2) when the pulse occurred. To validate the accuracy and utility of this new approach, we use simulation cross-validation experiments and subsequently analyze an empirical data set of 32 avian populations from Australia that are hypothesized to have expanded from smaller refugia populations in the late Pleistocene. The method can accommodate data set heterogeneity such as variability in effective population size, mutation rates, and sample sizes across species and exploits the statistical strength from the simultaneous analysis of multiple species. This hABC framework used in a multitaxa demographic context can increase our understanding of the impact of historical climate change by determining what proportion of the community responded in concert or independently and can be used with a wide variety of comparative phylogeographic data sets as biota-wide DNA barcoding data sets accumulate. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. In vitro biotransformation rates in fish liver S9: effect of dosing techniques.

    PubMed

    Lee, Yung-Shan; Lee, Danny H Y; Delafoulhouze, Maximilien; Otton, S Victoria; Moore, Margo M; Kennedy, Chris J; Gobas, Frank A P C

    2014-08-01

    In vitro biotransformation assays are currently being explored to improve estimates of bioconcentration factors of potentially bioaccumulative organic chemicals in fish. The present study compares thin-film and solvent-delivery dosing techniques as well as single versus multiple chemical dosing for measuring biotransformation rates of selected polycyclic aromatic hydrocarbons in rainbow trout (Oncorhynchus mykiss) liver S9. The findings show that biotransformation rates of very hydrophobic substances can be accurately measured in thin-film sorbent-dosing assays from concentration-time profiles in the incubation medium but not from those in the sorbent phase because of low chemical film-to-incubation-medium mass-transfer rates at the incubation temperature of 13.5 °C required for trout liver assays. Biotransformation rates determined by thin-film dosing were greater than those determined by solvent-delivery dosing for chrysene (octanol-water partition coefficient [KOW ] =10(5.60) ) and benzo[a]pyrene (KOW  =10(6.04) ), whereas there were no statistical differences in pyrene (KOW  =10(5.18) ) biotransformation rates between the 2 methods. In sorbent delivery-based assays, simultaneous multiple-chemical dosing produced biotransformation rates that were not statistically different from those measured in single-chemical dosing experiments for pyrene and benzo[a]pyrene but not for chrysene. In solvent-delivery experiments, multiple-chemical dosing produced biotransformation rates that were much smaller than those in single-chemical dosing experiments for all test chemicals. While thin-film sorbent-phase and solvent delivery-based dosing methods are both suitable methods for measuring biotransformation rates of substances of intermediate hydrophobicity, thin-film sorbent-phase dosing may be more suitable for superhydrophobic chemicals. © 2014 SETAC.

  10. Difficult Decisions: A Qualitative Exploration of the Statistical Decision Making Process from the Perspectives of Psychology Students and Academics

    PubMed Central

    Allen, Peter J.; Dorozenko, Kate P.; Roberts, Lynne D.

    2016-01-01

    Quantitative research methods are essential to the development of professional competence in psychology. They are also an area of weakness for many students. In particular, students are known to struggle with the skill of selecting quantitative analytical strategies appropriate for common research questions, hypotheses and data types. To begin understanding this apparent deficit, we presented nine psychology undergraduates (who had all completed at least one quantitative methods course) with brief research vignettes, and asked them to explicate the process they would follow to identify an appropriate statistical technique for each. Thematic analysis revealed that all participants found this task challenging, and even those who had completed several research methods courses struggled to articulate how they would approach the vignettes on more than a very superficial and intuitive level. While some students recognized that there is a systematic decision making process that can be followed, none could describe it clearly or completely. We then presented the same vignettes to 10 psychology academics with particular expertise in conducting research and/or research methods instruction. Predictably, these “experts” were able to describe a far more systematic, comprehensive, flexible, and nuanced approach to statistical decision making, which begins early in the research process, and pays consideration to multiple contextual factors. They were sensitive to the challenges that students experience when making statistical decisions, which they attributed partially to how research methods and statistics are commonly taught. This sensitivity was reflected in their pedagogic practices. When asked to consider the format and features of an aid that could facilitate the statistical decision making process, both groups expressed a preference for an accessible, comprehensive and reputable resource that follows a basic decision tree logic. For the academics in particular, this aid should function as a teaching tool, which engages the user with each choice-point in the decision making process, rather than simply providing an “answer.” Based on these findings, we offer suggestions for tools and strategies that could be deployed in the research methods classroom to facilitate and strengthen students' statistical decision making abilities. PMID:26909064

  11. Difficult Decisions: A Qualitative Exploration of the Statistical Decision Making Process from the Perspectives of Psychology Students and Academics.

    PubMed

    Allen, Peter J; Dorozenko, Kate P; Roberts, Lynne D

    2016-01-01

    Quantitative research methods are essential to the development of professional competence in psychology. They are also an area of weakness for many students. In particular, students are known to struggle with the skill of selecting quantitative analytical strategies appropriate for common research questions, hypotheses and data types. To begin understanding this apparent deficit, we presented nine psychology undergraduates (who had all completed at least one quantitative methods course) with brief research vignettes, and asked them to explicate the process they would follow to identify an appropriate statistical technique for each. Thematic analysis revealed that all participants found this task challenging, and even those who had completed several research methods courses struggled to articulate how they would approach the vignettes on more than a very superficial and intuitive level. While some students recognized that there is a systematic decision making process that can be followed, none could describe it clearly or completely. We then presented the same vignettes to 10 psychology academics with particular expertise in conducting research and/or research methods instruction. Predictably, these "experts" were able to describe a far more systematic, comprehensive, flexible, and nuanced approach to statistical decision making, which begins early in the research process, and pays consideration to multiple contextual factors. They were sensitive to the challenges that students experience when making statistical decisions, which they attributed partially to how research methods and statistics are commonly taught. This sensitivity was reflected in their pedagogic practices. When asked to consider the format and features of an aid that could facilitate the statistical decision making process, both groups expressed a preference for an accessible, comprehensive and reputable resource that follows a basic decision tree logic. For the academics in particular, this aid should function as a teaching tool, which engages the user with each choice-point in the decision making process, rather than simply providing an "answer." Based on these findings, we offer suggestions for tools and strategies that could be deployed in the research methods classroom to facilitate and strengthen students' statistical decision making abilities.

  12. Comparison of Dissolution Similarity Assessment Methods for Products with Large Variations: f2 Statistics and Model-Independent Multivariate Confidence Region Procedure for Dissolution Profiles of Multiple Oral Products.

    PubMed

    Yoshida, Hiroyuki; Shibata, Hiroko; Izutsu, Ken-Ichi; Goda, Yukihiro

    2017-01-01

    The current Japanese Ministry of Health Labour and Welfare (MHLW)'s Guideline for Bioequivalence Studies of Generic Products uses averaged dissolution rates for the assessment of dissolution similarity between test and reference formulations. This study clarifies how the application of model-independent multivariate confidence region procedure (Method B), described in the European Medical Agency and U.S. Food and Drug Administration guidelines, affects similarity outcomes obtained empirically from dissolution profiles with large variations in individual dissolution rates. Sixty-one datasets of dissolution profiles for immediate release, oral generic, and corresponding innovator products that showed large variation in individual dissolution rates in generic products were assessed on their similarity by using the f 2 statistics defined in the MHLW guidelines (MHLW f 2 method) and two different Method B procedures, including a bootstrap method applied with f 2 statistics (BS method) and a multivariate analysis method using the Mahalanobis distance (MV method). The MHLW f 2 and BS methods provided similar dissolution similarities between reference and generic products. Although a small difference in the similarity assessment may be due to the decrease in the lower confidence interval for expected f 2 values derived from the large variation in individual dissolution rates, the MV method provided results different from those obtained through MHLW f 2 and BS methods. Analysis of actual dissolution data for products with large individual variations would provide valuable information towards an enhanced understanding of these methods and their possible incorporation in the MHLW guidelines.

  13. Multiple commodities in statistical microeconomics: Model and market

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. The case of multiple commodities is studied and a parsimonious generalization of the single commodity model is made for the multiple commodities case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and which is independent to the mainstream formulation of microeconomics.

  14. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.

  15. Air Quality Forecasting through Different Statistical and Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Mishra, D.; Goyal, P.

    2014-12-01

    Urban air pollution forecasting has emerged as an acute problem in recent years because there are sever environmental degradation due to increase in harmful air pollutants in the ambient atmosphere. In this study, there are different types of statistical as well as artificial intelligence techniques are used for forecasting and analysis of air pollution over Delhi urban area. These techniques are principle component analysis (PCA), multiple linear regression (MLR) and artificial neural network (ANN) and the forecasting are observed in good agreement with the observed concentrations through Central Pollution Control Board (CPCB) at different locations in Delhi. But such methods suffers from disadvantages like they provide limited accuracy as they are unable to predict the extreme points i.e. the pollution maximum and minimum cut-offs cannot be determined using such approach. Also, such methods are inefficient approach for better output forecasting. But with the advancement in technology and research, an alternative to the above traditional methods has been proposed i.e. the coupling of statistical techniques with artificial Intelligence (AI) can be used for forecasting purposes. The coupling of PCA, ANN and fuzzy logic is used for forecasting of air pollutant over Delhi urban area. The statistical measures e.g., correlation coefficient (R), normalized mean square error (NMSE), fractional bias (FB) and index of agreement (IOA) of the proposed model are observed in better agreement with the all other models. Hence, the coupling of statistical and artificial intelligence can be use for the forecasting of air pollutant over urban area.

  16. Accounting for multiple climate components when estimating climate change exposure and velocity

    USGS Publications Warehouse

    Nadeau, Christopher P.; Fuller, Angela K.

    2015-01-01

    The effect of anthropogenic climate change on organisms will likely be related to climate change exposure and velocity at local and regional scales. However, common methods to estimate climate change exposure and velocity ignore important components of climate that are known to affect the ecology and evolution of organisms.We develop a novel index of climate change (climate overlap) that simultaneously estimates changes in the means, variation and correlation between multiple weather variables. Specifically, we estimate the overlap between multivariate normal probability distributions representing historical and current or projected future climates. We provide methods for estimating the statistical significance of climate overlap values and methods to estimate velocity using climate overlap.We show that climates have changed significantly across 80% of the continental United States in the last 32 years and that much of this change is due to changes in the variation and correlation between weather variables (two statistics that are rarely incorporated into climate change studies). We also show that projected future temperatures are predicted to be locally novel (<1·5% overlap) across most of the global land surface and that exposure is likely to be highest in areas with low historical climate variation. Last, we show that accounting for changes in the variation and correlation between multiple weather variables can dramatically affect velocity estimates; mean velocity estimates in the continental United States were between 3·1 and 19·0 km yr−1when estimated using climate overlap compared to 1·4 km yr−1 when estimated using traditional methods.Our results suggest that accounting for changes in the means, variation and correlation between multiple weather variables can dramatically affect estimates of climate change exposure and velocity. These climate components are known to affect the ecology and evolution of organisms, but are ignored by most measures of climate change. We conclude with a set of future directions and recommend future work to determine which measures of climate change exposure and velocity are most related to biological responses to climate change.

  17. The effect of repeated firings on the color change of dental ceramics using different glazing methods

    PubMed Central

    Yılmaz, Kerem; Ozturk, Caner

    2014-01-01

    PURPOSE Surface color is one of the main criteria to obtain an ideal esthetic. Many factors such as the type of the material, surface specifications, number of firings, firing temperature and thickness of the porcelain are all important to provide an unchanged surface color in dental ceramics. The aim of this study was to evaluate the color changes in dental ceramics according to the material type and glazing methods, during the multiple firings. MATERIALS AND METHODS Three different types of dental ceramics (IPS Classical metal ceramic, Empress Esthetic and Empress 2 ceramics) were used in the study. Porcelains were evaluated under five main groups according to glaze and natural glaze methods. Color changes (ΔE) and changes in color parameters (ΔL, Δa, Δb) were determined using colorimeter during the control, the first, third, fifth, and seventh firings. The statistical analysis of the results was performed using ANOVA and Tukey test. RESULTS The color changes which occurred upon material-method-firing interaction were statistically significant (P<.05). ΔE, ΔL, Δa and Δb values also demonstrated a negative trend. The MC-G group was less affected in terms of color changes compared to other groups. In all-ceramic specimens, the surface color was significantly affected by multiple firings. CONCLUSION Firing detrimentally affected the structure of the porcelain surface and hence caused fading of the color and prominence of yellow and red characters. Compressible all-ceramics were remarkably affected by repeated firings due to their crystalline structure. PMID:25551001

  18. A statistical analysis of the association between tropical cyclone intensity change and tornado frequency

    NASA Astrophysics Data System (ADS)

    Moore, Todd W.

    2016-07-01

    Tropical cyclones often produce tornadoes that have the potential to compound the injury and fatality counts and the economic losses associated with tropical cyclones. These tornadoes do not occur uniformly through time or across space. Multiple statistical methods were used in this study to analyze the association between tropical cyclone intensity change and tornado frequency. Results indicate that there is an association between the two and that tropical cyclones tend to produce more tornadoes when they are weakening, but the association is weak. Tropical cyclones can also produce a substantial number of tornadoes when they are relatively stable or strengthening.

  19. A first principles calculation and statistical mechanics modeling of defects in Al-H system

    NASA Astrophysics Data System (ADS)

    Ji, Min; Wang, Cai-Zhuang; Ho, Kai-Ming

    2007-03-01

    The behavior of defects and hydrogen in Al was investigated by first principles calculations and statistical mechanics modeling. The formation energy of different defects in Al+H system such as Al vacancy, H in institution and multiple H in Al vacancy were calculated by first principles method. Defect concentration in thermodynamical equilibrium was studied by total free energy calculation including configuration entropy and defect-defect interaction from low concentration limit to hydride limit. In our grand canonical ensemble model, hydrogen chemical potential under different environment plays an important role in determing the defect concentration and properties in Al-H system.

  20. Application of radioisotope XRF and thermoluminescence (TL) dating in investigation of pottery from Tell AL-Kasra archaeological site, Syria.

    PubMed

    Abboud, R; Issa, H; Abed-Allah, Y D; Bakraji, E H

    2015-11-01

    Statistical analysis based on chemical composition, using radioisotope X-ray fluorescence, have been applied on 39 ancient pottery fragments coming from the excavation at Tell Al-Kasra archaeological site, Syria. Three groups were defined by applying Cluster and Factor analysis statistical methods. Thermoluminescence (TL) dating was investigated on three sherds taken from the bathroom (hammam) on the site. Multiple aliquot additive dose (MAAD) was used to estimate the paleodose value, and the gamma spectrometry was used to estimate the dose rate. The average age was found to be 715±36 year. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. A MULTIPLE TESTING OF THE ABC METHOD AND THE DEVELOPMENT OF A SECOND-GENERATION MODEL. PART II, TEST RESULTS AND AN ANALYSIS OF RECALL RATIO.

    ERIC Educational Resources Information Center

    ALTMANN, BERTHOLD

    AFTER A BRIEF SUMMARY OF THE TEST PROGRAM (DESCRIBED MORE FULLY IN LI 000 318), THE STATISTICAL RESULTS TABULATED AS OVERALL "ABC (APPROACH BY CONCEPT)-RELEVANCE RATIOS" AND "ABC-RECALL FIGURES" ARE PRESENTED AND REVIEWED. AN ABSTRACT MODEL DEVELOPED IN ACCORDANCE WITH MAX WEBER'S "IDEALTYPUS" ("DIE OBJEKTIVITAET…

  2. Evaluating Equating Results in the Non-Equivalent Groups with Anchor Test Design Using Equipercentile and Equity Criteria

    ERIC Educational Resources Information Center

    Duong, Minh Quang

    2011-01-01

    Testing programs often use multiple test forms of the same test to control item exposure and to ensure test security. Although test forms are constructed to be as similar as possible, they often differ. Test equating techniques are those statistical methods used to adjust scores obtained on different test forms of the same test so that they are…

  3. Why Can't They Keep the Book Longer and Do We Really Need to Charge Fines? Assessing Circulation Policies at the Harold B. Lee Library: A Case Study

    ERIC Educational Resources Information Center

    Wilson, Duane

    2014-01-01

    In response to a charge from the library administration, the Circulation Committee of the Harold B. Lee Library at Brigham Young University designed and implemented a thorough assessment of circulation policies. Using multiple assessment methods including surveys, focus groups, and statistical analysis, the committee determined that the…

  4. Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics

    PubMed Central

    Cunha, Alexandre; Toga, A. W.; Parker, D. Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation. PMID:24653748

  5. Interpreting carnivore scent-station surveys

    USGS Publications Warehouse

    Sargeant, G.A.; Johnson, D.H.; Berg, W.E.

    1998-01-01

    The scent-station survey method has been widely used to estimate trends in carnivore abundance. However, statistical properties of scent-station data are poorly understood, and the relation between scent-station indices and carnivore abundance has not been adequately evaluated. We assessed properties of scent-station indices by analyzing data collected in Minnesota during 1986-03. Visits to stations separated by <2 km were correlated for all species because individual carnivores sometimes visited several stations in succession. Thus, visits to stations had an intractable statistical distribution. Dichotomizing results for lines of 10 stations (0 or 21 visits) produced binomially distributed data that were robust to multiple visits by individuals. We abandoned 2-way comparisons among years in favor of tests for population trend, which are less susceptible to bias, and analyzed results separately for biogeographic sections of Minnesota because trends differed among sections. Before drawing inferences about carnivore population trends, we reevaluated published validation experiments. Results implicated low statistical power and confounding as possible explanations for equivocal or conflicting results of validation efforts. Long-term trends in visitation rates probably reflect real changes in populations, but poor spatial and temporal resolution, susceptibility to confounding, and low statistical power limit the usefulness of this survey method.

  6. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values

    PubMed Central

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762

  7. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values.

    PubMed

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π 0 , has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π 0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π 0 by incorporating the distribution pattern of the observed p -values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p -values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 - λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.

  8. Evaluation of graphical and statistical representation of analytical signals of spectrophotometric methods

    NASA Astrophysics Data System (ADS)

    Lotfy, Hayam Mahmoud; Fayez, Yasmin Mohammed; Tawakkol, Shereen Mostafa; Fahmy, Nesma Mahmoud; Shehata, Mostafa Abd El-Atty

    2017-09-01

    Simultaneous determination of miconazole (MIC), mometasone furaoate (MF), and gentamicin (GEN) in their pharmaceutical combination. Gentamicin determination is based on derivatization with of o-phthalaldehyde reagent (OPA) without any interference of other cited drugs, while the spectra of MIC and MF are resolved using both successive and progressive resolution techniques. The first derivative spectrum of MF is measured using constant multiplication or spectrum subtraction, while its recovered zero order spectrum is obtained using derivative transformation. Beside the application of constant value method. Zero order spectrum of MIC is obtained by derivative transformation after getting its first derivative spectrum by derivative subtraction method. The novel method namely, differential amplitude modulation is used to get the concentration of MF and MIC, while the novel graphical method namely, concentration value is used to get the concentration of MIC, MF, and GEN. Accuracy and precision testing of the developed methods show good results. Specificity of the methods is ensured and is successfully applied for the analysis of pharmaceutical formulation of the three drugs in combination. ICH guidelines are used for validation of the proposed methods. Statistical data are calculated, and the results are satisfactory revealing no significant difference regarding accuracy and precision.

  9. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950

  10. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    Abstract The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  11. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  12. Simultaneous reconstruction of multiple depth images without off-focus points in integral imaging using a graphics processing unit.

    PubMed

    Yi, Faliu; Lee, Jieun; Moon, Inkyu

    2014-05-01

    The reconstruction of multiple depth images with a ray back-propagation algorithm in three-dimensional (3D) computational integral imaging is computationally burdensome. Further, a reconstructed depth image consists of a focus and an off-focus area. Focus areas are 3D points on the surface of an object that are located at the reconstructed depth, while off-focus areas include 3D points in free-space that do not belong to any object surface in 3D space. Generally, without being removed, the presence of an off-focus area would adversely affect the high-level analysis of a 3D object, including its classification, recognition, and tracking. Here, we use a graphics processing unit (GPU) that supports parallel processing with multiple processors to simultaneously reconstruct multiple depth images using a lookup table containing the shifted values along the x and y directions for each elemental image in a given depth range. Moreover, each 3D point on a depth image can be measured by analyzing its statistical variance with its corresponding samples, which are captured by the two-dimensional (2D) elemental images. These statistical variances can be used to classify depth image pixels as either focus or off-focus points. At this stage, the measurement of focus and off-focus points in multiple depth images is also implemented in parallel on a GPU. Our proposed method is conducted based on the assumption that there is no occlusion of the 3D object during the capture stage of the integral imaging process. Experimental results have demonstrated that this method is capable of removing off-focus points in the reconstructed depth image. The results also showed that using a GPU to remove the off-focus points could greatly improve the overall computational speed compared with using a CPU.

  13. Evaluating the Use of Existing Data Sources, Probabilistic Linkage, and Multiple Imputation to Build Population-based Injury Databases Across Phases of Trauma Care

    PubMed Central

    Newgard, Craig; Malveau, Susan; Staudenmayer, Kristan; Wang, N. Ewen; Hsia, Renee Y.; Mann, N. Clay; Holmes, James F.; Kuppermann, Nathan; Haukoos, Jason S.; Bulger, Eileen M.; Dai, Mengtao; Cook, Lawrence J.

    2012-01-01

    Objectives The objective was to evaluate the process of using existing data sources, probabilistic linkage, and multiple imputation to create large population-based injury databases matched to outcomes. Methods This was a retrospective cohort study of injured children and adults transported by 94 emergency medical systems (EMS) agencies to 122 hospitals in seven regions of the western United States over a 36-month period (2006 to 2008). All injured patients evaluated by EMS personnel within specific geographic catchment areas were included, regardless of field disposition or outcome. The authors performed probabilistic linkage of EMS records to four hospital and postdischarge data sources (emergency department [ED] data, patient discharge data, trauma registries, and vital statistics files) and then handled missing values using multiple imputation. The authors compare and evaluate matched records, match rates (proportion of matches among eligible patients), and injury outcomes within and across sites. Results There were 381,719 injured patients evaluated by EMS personnel in the seven regions. Among transported patients, match rates ranged from 14.9% to 87.5% and were directly affected by the availability of hospital data sources and proportion of missing values for key linkage variables. For vital statistics records (1-year mortality), estimated match rates ranged from 88.0% to 98.7%. Use of multiple imputation (compared to complete case analysis) reduced bias for injury outcomes, although sample size, percentage missing, type of variable, and combined-site versus single-site imputation models all affected the resulting estimates and variance. Conclusions This project demonstrates the feasibility and describes the process of constructing population-based injury databases across multiple phases of care using existing data sources and commonly available analytic methods. Attention to key linkage variables and decisions for handling missing values can be used to increase match rates between data sources, minimize bias, and preserve sampling design. PMID:22506952

  14. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  15. Mortality table construction

    NASA Astrophysics Data System (ADS)

    Sutawanir

    2015-12-01

    Mortality tables play important role in actuarial studies such as life annuities, premium determination, premium reserve, valuation pension plan, pension funding. Some known mortality tables are CSO mortality table, Indonesian Mortality Table, Bowers mortality table, Japan Mortality table. For actuary applications some tables are constructed with different environment such as single decrement, double decrement, and multiple decrement. There exist two approaches in mortality table construction : mathematics approach and statistical approach. Distribution model and estimation theory are the statistical concepts that are used in mortality table construction. This article aims to discuss the statistical approach in mortality table construction. The distributional assumptions are uniform death distribution (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameter. Moment estimation methods are easier to manipulate compared to maximum likelihood estimation (mle). However, the complete mortality data are not used in moment estimation method. Maximum likelihood exploited all available information in mortality estimation. Some mle equations are complicated and solved using numerical methods. The article focus on single decrement estimation using moment and maximum likelihood estimation. Some extension to double decrement will introduced. Simple dataset will be used to illustrated the mortality estimation, and mortality table.

  16. Incremental dynamical downscaling for probabilistic analysis based on multiple GCM projections

    NASA Astrophysics Data System (ADS)

    Wakazuki, Y.

    2015-12-01

    A dynamical downscaling method for probabilistic regional scale climate change projections was developed to cover an uncertainty of multiple general circulation model (GCM) climate simulations. The climatological increments (future minus present climate states) estimated by GCM simulation results were statistically analyzed using the singular vector decomposition. Both positive and negative perturbations from the ensemble mean with the magnitudes of their standard deviations were extracted and were added to the ensemble mean of the climatological increments. The analyzed multiple modal increments were utilized to create multiple modal lateral boundary conditions for the future climate regional climate model (RCM) simulations by adding to an objective analysis data. This data handling is regarded to be an advanced method of the pseudo-global-warming (PGW) method previously developed by Kimura and Kitoh (2007). The incremental handling for GCM simulations realized approximated probabilistic climate change projections with the smaller number of RCM simulations. Three values of a climatological variable simulated by RCMs for a mode were used to estimate the response to the perturbation of the mode. For the probabilistic analysis, climatological variables of RCMs were assumed to show linear response to the multiple modal perturbations, although the non-linearity was seen for local scale rainfall. Probability of temperature was able to be estimated within two modes perturbation simulations, where the number of RCM simulations for the future climate is five. On the other hand, local scale rainfalls needed four modes simulations, where the number of the RCM simulations is nine. The probabilistic method is expected to be used for regional scale climate change impact assessment in the future.

  17. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.

  18. Testing Nelder-Mead based repulsion algorithms for multiple roots of nonlinear systems via a two-level factorial design of experiments.

    PubMed

    Ramadas, Gisela C V; Rocha, Ana Maria A C; Fernandes, Edite M G P

    2015-01-01

    This paper addresses the challenging task of computing multiple roots of a system of nonlinear equations. A repulsion algorithm that invokes the Nelder-Mead (N-M) local search method and uses a penalty-type merit function based on the error function, known as 'erf', is presented. In the N-M algorithm context, different strategies are proposed to enhance the quality of the solutions and improve the overall efficiency. The main goal of this paper is to use a two-level factorial design of experiments to analyze the statistical significance of the observed differences in selected performance criteria produced when testing different strategies in the N-M based repulsion algorithm. The main goal of this paper is to use a two-level factorial design of experiments to analyze the statistical significance of the observed differences in selected performance criteria produced when testing different strategies in the N-M based repulsion algorithm.

  19. Statistical Analysis of Microarray Data with Replicated Spots: A Case Study with Synechococcus WH8102

    PubMed Central

    Thomas, E. V.; Phillippy, K. H.; Brahamsha, B.; Haaland, D. M.; Timlin, J. A.; Elbourne, L. D. H.; Palenik, B.; Paulsen, I. T.

    2009-01-01

    Until recently microarray experiments often involved relatively few arrays with only a single representation of each gene on each array. A complete genome microarray with multiple spots per gene (spread out spatially across the array) was developed in order to compare the gene expression of a marine cyanobacterium and a knockout mutant strain in a defined artificial seawater medium. Statistical methods were developed for analysis in the special situation of this case study where there is gene replication within an array and where relatively few arrays are used, which can be the case with current array technology. Due in part to the replication within an array, it was possible to detect very small changes in the levels of expression between the wild type and mutant strains. One interesting biological outcome of this experiment is the indication of the extent to which the phosphorus regulatory system of this cyanobacterium affects the expression of multiple genes beyond those strictly involved in phosphorus acquisition. PMID:19404483

  20. Statistical Analysis of Microarray Data with Replicated Spots: A Case Study with Synechococcus WH8102

    DOE PAGES

    Thomas, E. V.; Phillippy, K. H.; Brahamsha, B.; ...

    2009-01-01

    Until recently microarray experiments often involved relatively few arrays with only a single representation of each gene on each array. A complete genome microarray with multiple spots per gene (spread out spatially across the array) was developed in order to compare the gene expression of a marine cyanobacterium and a knockout mutant strain in a defined artificial seawater medium. Statistical methods were developed for analysis in the special situation of this case study where there is gene replication within an array and where relatively few arrays are used, which can be the case with current array technology. Due in partmore » to the replication within an array, it was possible to detect very small changes in the levels of expression between the wild type and mutant strains. One interesting biological outcome of this experiment is the indication of the extent to which the phosphorus regulatory system of this cyanobacterium affects the expression of multiple genes beyond those strictly involved in phosphorus acquisition.« less

Top