Sample records for robust statistical tests

  1. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  2. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555

  3. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the need for suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  4. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
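
    The wild bootstrap step summarized above can be illustrated in a generic regression setting. The sketch below is a hedged, minimal example (synthetic data; not the authors' heteroscedastic surface-based pipeline): residuals from a null fit are multiplied by random ±1 (Rademacher) weights to generate bootstrap samples, and the observed t-statistic is compared with its bootstrap distribution.

    ```python
    # Hedged sketch of a wild bootstrap for one regression coefficient
    # (synthetic data; illustrative only, not the study's imaging pipeline).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 80
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    y = 0.5 + 0.0 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(size=n) * (1 + np.abs(X[:, 2]))

    def t_stat(X, y, j=1):
        """OLS t-statistic for coefficient j (classical variance estimate)."""
        bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ bhat
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
        return bhat[j] / se

    t_obs = t_stat(X, y)

    # Null model: drop the coefficient under test, keep its fitted values and residuals.
    X0 = np.delete(X, 1, axis=1)
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    fitted0, resid0 = X0 @ b0, y - X0 @ b0

    B, exceed = 2000, 0
    for _ in range(B):
        w = rng.choice([-1.0, 1.0], size=n)          # Rademacher weights preserve heteroscedasticity
        t_star = t_stat(X, fitted0 + w * resid0)     # wild-bootstrap resample under H0
        exceed += abs(t_star) >= abs(t_obs)
    print("wild-bootstrap p-value:", (exceed + 1) / (B + 1))
    ```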

  5. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
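
    For reference, the mean-scaled and mean-and-variance adjusted statistics discussed here take the following standard forms from the robust SEM literature (general formulas, not results specific to this study); T_ML is the normal-theory likelihood-ratio statistic, d the model degrees of freedom, Γ̂ the estimated asymptotic covariance matrix of the sample moments, and Û the usual residual weight matrix.

    ```latex
    % Mean-scaled (Satorra-Bentler) statistic, referred to a chi-square with d df:
    T_{SB} = \frac{T_{ML}}{\hat c}, \qquad \hat c = \frac{\operatorname{tr}(\hat U \hat\Gamma)}{d}
    % Mean-and-variance (Satterthwaite-type) adjusted statistic, referred to a
    % chi-square with \hat d degrees of freedom:
    T_{MV} = \frac{\hat d\, T_{ML}}{\operatorname{tr}(\hat U \hat\Gamma)}, \qquad
    \hat d = \frac{\bigl[\operatorname{tr}(\hat U \hat\Gamma)\bigr]^{2}}{\operatorname{tr}\bigl[(\hat U \hat\Gamma)^{2}\bigr]}
    ```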

  6. Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

    NASA Astrophysics Data System (ADS)

    Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

    2016-12-01

    In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency-domain induced polarization method that transmits a pseudo-random m-sequence as the source current, where the m-sequence is a broadband signal. The potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem in SSIP data processing. If the ordinary mean stack and digital filters cannot reduce the impulse noise effectively, its impact remains in the complex resistivity spectrum and affects the interpretation of profile anomalies. We applied a robust statistical method to SSIP data processing. Robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. A robust M-estimate is used to stack the data of all periods. A robust smoothing filter is used to suppress the residual noise after stacking. For the robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by tests on simulated data so as to suppress the influence of outliers. We demonstrate the benefits of the robust SSIP processing using examples of SSIP data recorded at a test site beside a mine in Gansu province, China.

  7. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to avoid misleading inferences, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
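
    The RT class of robust normality tests introduced in this contribution is not part of standard libraries; the sketch below only illustrates the generic workflow the contribution addresses (classical normality tests applied to OLS residuals, with synthetic heavy-tailed errors as an assumption).

    ```python
    # Sketch: classical normality checks on regression residuals (the robust
    # RT-class tests from the abstract are not implemented here).
    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=200)
    y = 2.0 + 0.5 * x + rng.standard_t(df=3, size=200)   # heavy-tailed error terms

    X = sm.add_constant(x)
    resid = sm.OLS(y, X).fit().resid

    sw_stat, sw_p = stats.shapiro(resid)          # Shapiro-Wilk
    jb_stat, jb_p = stats.jarque_bera(resid)      # Jarque-Bera (skewness/kurtosis based)
    print(f"Shapiro-Wilk p = {sw_p:.4f}, Jarque-Bera p = {jb_p:.4f}")
    ```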

  8. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
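
    As a generic, hedged illustration of robust regression of the kind advocated here (not the authors' RPBI pipeline or their analytic test), a Huber M-estimator can be fit with statsmodels and compared with ordinary least squares on data containing a few outlying subjects:

    ```python
    # Sketch: Huber M-estimation versus OLS on data with a few gross outliers.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 120
    x = rng.normal(size=n)
    y = 0.4 * x + rng.normal(scale=0.5, size=n)
    y[:5] += 8.0                                   # a handful of outlying subjects

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()
    rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # outliers are down-weighted

    print("OLS slope:", round(ols.params[1], 3))
    print("Huber-RLM slope:", round(rlm.params[1], 3))
    ```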

  9. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is small. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
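
    A minimal sketch of the idea for a simple two-group comparison (hypothetical candidate statistics and synthetic data; the survival-trial setting and logrank test of the article are not reproduced): compute several pre-specified test statistics, take the minimum p-value, and calibrate that minimum against its own permutation distribution.

    ```python
    # Hedged sketch: minimum p-value over several pre-specified two-sample tests,
    # calibrated by permutation of the treatment labels.
    import numpy as np
    from scipy import stats

    def min_p(x, y):
        """Smallest p-value over three candidate test statistics."""
        return min(stats.ttest_ind(x, y).pvalue,
                   stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
                   stats.ks_2samp(x, y).pvalue)

    rng = np.random.default_rng(3)
    treat = rng.lognormal(mean=0.4, size=40)
    ctrl = rng.lognormal(mean=0.0, size=40)

    obs = min_p(treat, ctrl)
    pooled = np.concatenate([treat, ctrl])

    B, count = 2000, 0
    for _ in range(B):
        perm = rng.permutation(pooled)               # relabel under the null
        count += min_p(perm[:40], perm[40:]) <= obs
    print(f"min-p = {obs:.4f}, permutation p-value = {(count + 1) / (B + 1):.4f}")
    ```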

  10. Proficiency Testing for Determination of Water Content in Toluene of Chemical Reagents by iteration robust statistic technique

    NASA Astrophysics Data System (ADS)

    Wang, Hao; Wang, Qunwei; He, Ming

    2018-05-01

    In order to investigate and improve the level of detection technology for water content in liquid chemical reagents in domestic laboratories, proficiency testing provider PT0031 (CNAS) organized a proficiency testing program for water content in toluene; 48 laboratories from 18 provinces/cities/municipalities took part in the PT. This paper introduces the implementation process of the proficiency testing for determination of water content in toluene, including sample preparation, homogeneity and stability testing, and the statistical treatment and analysis of the results by the iterative robust statistic technique; it also summarizes and analyzes the results obtained with the different test standards widely used in the laboratories, and puts forward technological suggestions for improving the quality of water content testing. Satisfactory results were obtained by 43 laboratories, amounting to 89.6% of the total participating laboratories.
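
    The iterative robust statistic technique used in proficiency testing is typically an Algorithm A-type iteration; the sketch below is a generic illustration with hypothetical lab results (the constants follow the commonly cited ISO 13528 form and are an assumption here, not necessarily the provider's exact procedure), ending with participant z-scores.

    ```python
    # Sketch of an Algorithm A-style robust mean/standard deviation iteration
    # (generic illustration; constants are the commonly cited ones, assumed here).
    import numpy as np

    def robust_mean_sd(x, tol=1e-6, max_iter=100):
        x = np.asarray(x, dtype=float)
        x_star = np.median(x)
        s_star = 1.483 * np.median(np.abs(x - x_star))
        for _ in range(max_iter):
            delta = 1.5 * s_star
            w = np.clip(x, x_star - delta, x_star + delta)   # winsorize extreme values
            new_x, new_s = w.mean(), 1.134 * w.std(ddof=1)
            if abs(new_x - x_star) < tol and abs(new_s - s_star) < tol:
                break
            x_star, s_star = new_x, new_s
        return x_star, s_star

    # Hypothetical water-content results (% by mass) from participating labs:
    results = np.array([0.0210, 0.0205, 0.0212, 0.0198, 0.0207, 0.0350, 0.0209, 0.0204])
    assigned, sigma_pt = robust_mean_sd(results)
    z_scores = (results - assigned) / sigma_pt          # |z| <= 2 is usually "satisfactory"
    print(assigned, sigma_pt, np.round(z_scores, 2))
    ```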

  11. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a commonly used parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under non-normal and heteroscedastic settings. When the assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator, and the default scale estimator with the variance of the Hodges-Lehmann estimator and with MADn, to produce two different test statistics for comparing groups. A bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The proposed procedures show improvement over the original statistic, especially under extremely skewed distributions.
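
    A hedged sketch of the building blocks substituted into the modified statistics (the one-sample Hodges-Lehmann estimator and MADn); this is not the S1 test itself, and the sample values are illustrative.

    ```python
    # Building blocks of the modified S1 statistics: Hodges-Lehmann estimator
    # (median of pairwise Walsh averages) and the scale estimator MADn.
    import numpy as np

    def hodges_lehmann(x):
        x = np.asarray(x, dtype=float)
        i, j = np.triu_indices(len(x))              # pairs with i <= j (includes i == j)
        return np.median((x[i] + x[j]) / 2.0)       # median of Walsh averages

    def madn(x):
        x = np.asarray(x, dtype=float)
        return 1.4826 * np.median(np.abs(x - np.median(x)))   # consistent for normal scale

    sample = np.array([2.1, 2.4, 2.2, 2.0, 9.5, 2.3, 2.2])    # one gross outlier
    print("Hodges-Lehmann:", hodges_lehmann(sample))
    print("MADn:", round(madn(sample), 3))
    print("mean (for contrast):", round(sample.mean(), 3))
    ```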

  12. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.

  13. On the Computation of the RMSEA and CFI from the Mean-And-Variance Corrected Test Statistic with Nonnormal Data in SEM.

    PubMed

    Savalei, Victoria

    2018-01-01

    A new type of nonnormality correction to the RMSEA has recently been developed, which has several advantages over existing corrections. In particular, the new correction adjusts the sample estimate of the RMSEA for the inflation due to nonnormality, while leaving its population value unchanged, so that established cutoff criteria can still be used to judge the degree of approximate fit. A confidence interval (CI) for the new robust RMSEA based on the mean-corrected ("Satorra-Bentler") test statistic has also been proposed. Follow-up work has provided the same type of nonnormality correction for the CFI (Brosseau-Liard & Savalei, 2014). These developments have recently been implemented in lavaan. This note has three goals: a) to show how to compute the new robust RMSEA and CFI from the mean-and-variance corrected test statistic; b) to offer a new CI for the robust RMSEA based on the mean-and-variance corrected test statistic; and c) to caution that the logic of the new nonnormality corrections to RMSEA and CFI is most appropriate for the maximum likelihood (ML) estimator, and cannot easily be generalized to the most commonly used categorical data estimators.

  14. Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM

    ERIC Educational Resources Information Center

    Tong, Xiaoxiao; Bentler, Peter M.

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…

  15. The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups

    ERIC Educational Resources Information Center

    Pero-Cebollero, Maribel; Guardia-Olmos, Joan

    2013-01-01

    In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…

  16. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  17. Hypothesis Testing, "p" Values, Confidence Intervals, Measures of Effect Size, and Bayesian Methods in Light of Modern Robust Techniques

    ERIC Educational Resources Information Center

    Wilcox, Rand R.; Serang, Sarfaraz

    2017-01-01

    The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…

  18. Robust Mean and Covariance Structure Analysis through Iteratively Reweighted Least Squares.

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Bentler, Peter M.

    2000-01-01

    Adapts robust schemes to mean and covariance structures, providing an iteratively reweighted least squares approach to robust structural equation modeling. Each case is weighted according to its distance, based on first and second order moments. Test statistics and standard error estimators are given. (SLD)

  19. SU-F-T-187: Quantifying Normal Tissue Sparing with 4D Robust Optimization of Intensity Modulated Proton Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newpower, M; Ge, S; Mohan, R

    Purpose: To report an approach to quantify the normal tissue sparing for 4D robustly-optimized versus PTV-optimized IMPT plans. Methods: We generated two sets of 90 DVHs from a patient’s 10-phase 4D CT set; one by conventional PTV-based optimization done in the Eclipse treatment planning system, and the other by an in-house robust optimization algorithm. The 90 DVHs were created for the following scenarios in each of the ten phases of the 4DCT: ± 5mm shift along x, y, z; ± 3.5% range uncertainty and a nominal scenario. A Matlab function written by Gay and Niemierko was modified to calculate EUD for each DVH for the following structures: esophagus, heart, ipsilateral lung and spinal cord. An F-test determined whether or not the variances of each structure’s DVHs were statistically different. Then a t-test determined if the average EUDs for each optimization algorithm were statistically significantly different. Results: T-test results showed each structure had a statistically significant difference in average EUD when comparing robust optimization versus PTV-based optimization. Under robust optimization all structures except the spinal cord received lower EUDs than PTV-based optimization. Using robust optimization the average EUDs decreased 1.45% for the esophagus, 1.54% for the heart and 5.45% for the ipsilateral lung. The average EUD to the spinal cord increased 24.86% but was still well below tolerance. Conclusion: This work has helped quantify a qualitative relationship noted earlier in our work: that robust optimization leads to plans with greater normal tissue sparing compared to PTV-based optimization. Except in the case of the spinal cord all structures received a lower EUD under robust optimization and these results are statistically significant. While the average EUD to the spinal cord increased to 25.06 Gy under robust optimization it is still well under the TD50 value of 66.5 Gy from Emami et al. Supported in part by the NCI U19 CA021239.
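
    The EUD referred to above is typically the generalized EUD of Niemierko, computed from a differential DVH as in the hedged sketch below; the exponent a and the DVH values are illustrative assumptions, not data from the study.

    ```python
    # Sketch: generalized equivalent uniform dose (gEUD) from a differential DVH,
    #   gEUD = ( sum_i v_i * D_i**a ) ** (1/a),
    # where v_i are fractional volumes, D_i bin doses, and 'a' is organ-specific.
    import numpy as np

    def geud(dose_bins_gy, frac_volumes, a):
        v = np.asarray(frac_volumes, dtype=float)
        v = v / v.sum()                              # normalize to fractional volume
        d = np.asarray(dose_bins_gy, dtype=float)
        return np.sum(v * d**a) ** (1.0 / a)

    # Hypothetical serial-organ-like DVH (illustrative numbers only):
    dose = [5, 10, 15, 20, 25, 30]
    vol = [0.30, 0.25, 0.20, 0.15, 0.07, 0.03]
    print(f"gEUD = {geud(dose, vol, a=8):.1f} Gy")   # large 'a' for serial-type organs
    ```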

  20. Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data

    ERIC Educational Resources Information Center

    Warachan, Boonyasit

    2011-01-01

    The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…

  1. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  2. New heterogeneous test statistics for the unbalanced fixed-effect nested design.

    PubMed

    Guo, Jiin-Huarng; Billard, L; Luh, Wei-Ming

    2011-05-01

    When the underlying variances are unknown and/or unequal, using the conventional F test is problematic in the two-factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander-Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed-effect two-stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation. ©2010 The British Psychological Society.

  3. Robust Detection of Examinees with Aberrant Answer Changes

    ERIC Educational Resources Information Center

    Belov, Dmitry I.

    2015-01-01

    The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data has an uncertainty caused by technological or human factors. Therefore, existing statistics (e.g., number of wrong-to-right ACs) used to detect examinees…

  4. Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

    NASA Astrophysics Data System (ADS)

    Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

    2004-05-01

    A robust version of Lee local statistic filter able to effectively suppress the mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteristics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.

  5. Application of Statistics in Engineering Technology Programs

    ERIC Educational Resources Information Center

    Zhan, Wei; Fink, Rainer; Fang, Alex

    2010-01-01

    Statistics is a critical tool for robustness analysis, measurement system error analysis, test data analysis, probabilistic risk assessment, and many other fields in the engineering world. Traditionally, however, statistics is not extensively used in undergraduate engineering technology (ET) programs, resulting in a major disconnect from industry…

  6. GWAR: robust analysis and meta-analysis of genome-wide association studies.

    PubMed

    Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

    2017-05-15

    In the context of genome-wide association studies (GWAS), there is a variety of statistical techniques in order to conduct the analysis, but, in most cases, the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. pbagos@compgen.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
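
    A hedged sketch of two ingredients mentioned above, the Cochran-Armitage trend test (in one common asymptotic form, without continuity correction) and a MAX-type statistic over the three classical genetic models; the genotype counts are hypothetical, and the p-value of the MAX statistic would in practice come from its joint null distribution or permutation, as implemented in GWAR.

    ```python
    # Sketch: Cochran-Armitage trend test and a MAX-type statistic over
    # recessive/additive/dominant score sets (illustrative counts only).
    import numpy as np
    from scipy import stats

    def catt_z(cases, controls, scores):
        """Trend-test Z for genotype counts in cases and controls."""
        r = np.asarray(cases, float)      # cases per genotype
        s = np.asarray(controls, float)   # controls per genotype
        n = r + s
        N, R = n.sum(), r.sum()
        w = np.asarray(scores, float)
        u = np.sum(w * (r - n * R / N))
        var = (R / N) * (1 - R / N) * (np.sum(n * w**2) - np.sum(n * w) ** 2 / N)
        return u / np.sqrt(var)

    cases, controls = [30, 60, 110], [60, 70, 70]       # hypothetical genotype counts
    z_add = catt_z(cases, controls, [0, 0.5, 1])        # additive scores
    print("additive CATT p =", round(2 * stats.norm.sf(abs(z_add)), 4))

    # MAX statistic: largest |Z| over the three classical score sets
    z_all = [catt_z(cases, controls, s) for s in ([0, 0, 1], [0, 0.5, 1], [0, 1, 1])]
    print("MAX =", round(max(abs(z) for z in z_all), 3))
    ```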

  7. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM) under certain non-normal distributions. The asymptotic distribution of the test statistic, and the p-value function are discussed. The application of proposed method is shown using a real-life data set. © The Author(s) 2011.

  8. A Robust Alternative to the Normal Distribution.

    DTIC Science & Technology

    1982-07-07

    Indexing fragments only (no abstract recoverable): Department of Statistics, Stanford University, Stanford, California; technical report. Cited reference: Bhattacharya, S. K. (1966). A Modified Bessel Function Model in Life Testing. Metrika 10, 133-144.

  9. Robust Proton Pencil Beam Scanning Treatment Planning for Rectal Cancer Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanco Kiely, Janid Patricia, E-mail: jkiely@sas.upenn.edu; White, Benjamin M.

    2016-05-01

    Purpose: To investigate, in a treatment plan design and robustness study, whether proton pencil beam scanning (PBS) has the potential to offer advantages, relative to interfraction uncertainties, over photon volumetric modulated arc therapy (VMAT) in a locally advanced rectal cancer patient population. Methods and Materials: Ten patients received a planning CT scan, followed by an average of 4 weekly offline verification CT scans, which were rigidly co-registered to the planning CT. Clinical PBS plans were generated on the planning CT, using a single-field uniform-dose technique with single-posterior and parallel-opposed (LAT) field geometries. The VMAT plans were generated on the planning CT using two 6-MV, 220° coplanar arcs. Clinical plans were forward-calculated on verification CTs to assess robustness relative to anatomic changes. Setup errors were assessed by forward-calculating clinical plans with a ±5-mm (left–right, anterior–posterior, superior–inferior) isocenter shift on the planning CT. Differences in clinical target volume and organ at risk dose–volume histogram (DVH) indicators between plans were tested for significance using an appropriate Wilcoxon test (P<.05). Results: Dosimetrically, PBS plans were statistically different from VMAT plans, showing greater organ at risk sparing. However, the bladder was statistically identical among LAT and VMAT plans. The clinical target volume coverage was statistically identical among all plans. The robustness test found that all DVH indicators for PBS and VMAT plans were robust, except the LAT's genitalia (V5, V35). The verification CT plans showed that all DVH indicators were robust. Conclusions: Pencil beam scanning plans were found to be as robust as VMAT plans relative to interfractional changes during treatment when posterior beam angles and appropriate range margins are used. Pencil beam scanning dosimetric gains in the bowel (V15, V20) over VMAT suggest that using PBS to treat rectal cancer may reduce radiation treatment–related toxicity.

  10. Robustness of statistical tests for multiplicative terms in the additive main effects and multiplicative interaction model for cultivar trials.

    PubMed

    Piepho, H P

    1995-03-01

    The additive main effects multiplicative interaction model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available, all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of several tests (Gollob, FGH1, FGH2, FR) to departures from these assumptions. It is concluded that, because of its better robustness, the FR test is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.

  11. Horsetail matching: a flexible approach to optimization under uncertainty

    NASA Astrophysics Data System (ADS)

    Cook, L. W.; Jarrett, J. P.

    2018-04-01

    It is important to design engineering systems to be robust with respect to uncertainties in the design process. Often, this is done by considering statistical moments, but over-reliance on statistical moments when formulating a robust optimization can produce designs that are stochastically dominated by other feasible designs. This article instead proposes a formulation for optimization under uncertainty that minimizes the difference between a design's cumulative distribution function and a target. A standard target is proposed that produces stochastically non-dominated designs, but the formulation also offers enough flexibility to recover existing approaches for robust optimization. A numerical implementation is developed that employs kernels to give a differentiable objective function. The method is applied to algebraic test problems and a robust transonic airfoil design problem where it is compared to multi-objective, weighted-sum and density matching approaches to robust optimization; several advantages over these existing methods are demonstrated.

  12. Effect of non-normality on test statistics for one-way independent groups designs.

    PubMed

    Cribbie, Robert A; Fiksenbaum, Lisa; Keselman, H J; Wilcox, Rand R

    2012-02-01

    The data obtained from one-way independent groups designs is typically non-normal in form and rarely equally variable across treatment populations (i.e., population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e., the analysis of variance F test) typically provides invalid results (e.g., too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e., trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal. © 2011 The British Psychological Society.
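
    For two groups, the robust-estimator route described here (trimmed means with Winsorized variances in a Welch-type test) is readily available; the sketch below shows 20% trimming with SciPy on synthetic skewed data, a simplified illustration rather than the multi-group parametric-bootstrap comparison of the article.

    ```python
    # Sketch: Yuen-Welch two-sample test with 20% trimmed means (SciPy), plus the
    # robust estimators themselves for inspection.
    import numpy as np
    from scipy import stats
    from scipy.stats.mstats import winsorize

    rng = np.random.default_rng(4)
    g1 = rng.exponential(scale=1.0, size=30)         # skewed, heteroscedastic data
    g2 = rng.exponential(scale=1.6, size=25)

    tm1 = stats.trim_mean(g1, proportiontocut=0.2)                       # 20% trimmed mean
    wv1 = np.var(np.asarray(winsorize(g1, limits=(0.2, 0.2))), ddof=1)   # Winsorized variance
    print(f"group 1: trimmed mean = {tm1:.3f}, Winsorized variance = {wv1:.3f}")

    # trim=0.2 gives the Yuen-Welch test (Welch test on trimmed/Winsorized estimates)
    res = stats.ttest_ind(g1, g2, equal_var=False, trim=0.2)
    print(f"Yuen-Welch: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
    ```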

  13. Robustness of Multiple Objective Decision Analysis Preference Functions

    DTIC Science & Technology

    2002-06-01

    Indexing fragments only: notation excerpts (p, p′ — the probability of some event; p_i, q_i — the probability of event i; Π — an aggregation of proportional data used in calculating a test statistic) and scattered text noting that statistical tests of the significance of the terms are conducted in a multivariate framework rather than the ROSA univariate approach, that the residual error is ê = y − ŷ (eq. 45), and that the coefficient provides a ready indicator of the contribution of the associated variable.

  14. The Cross-Correlation and Reshuffling Tests in Discerning Induced Seismicity

    NASA Astrophysics Data System (ADS)

    Schultz, Ryan; Telesca, Luciano

    2018-05-01

    In recent years, cases of newly emergent induced clusters have increased seismic hazard and risk in locations with social, environmental, and economic consequence. Thus, the need for a quantitative and robust means to discern induced seismicity has become a critical concern. This paper reviews a Matlab-based algorithm designed to quantify the statistical confidence between two time-series datasets. Similar to prior approaches, our method utilizes the cross-correlation to delineate the strength and lag of correlated signals. In addition, use of surrogate reshuffling tests allows for the dynamic testing against statistical confidence intervals of anticipated spurious correlations. We demonstrate the robust nature of our algorithm in a suite of synthetic tests to determine the limits of accurate signal detection in the presence of noise and sub-sampling. Overall, this routine has considerable merit in terms of delineating the strength of correlated signals, including the discernment of induced seismicity from natural seismicity.
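
    The core of such an algorithm, a lagged cross-correlation judged against reshuffled surrogates, can be sketched as follows (a generic illustration with synthetic series, not the authors' Matlab code):

    ```python
    # Sketch: peak lagged cross-correlation between two series, with a null
    # threshold built from randomly reshuffled surrogates of one series.
    import numpy as np

    def xcorr_peak(a, b, max_lag):
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        n = len(a)
        vals = [np.mean(a[max(0, -k):n - max(0, k)] * b[max(0, k):n - max(0, -k)])
                for k in range(-max_lag, max_lag + 1)]
        return max(np.abs(vals))

    rng = np.random.default_rng(5)
    inj = rng.poisson(3, size=200).astype(float)                 # e.g. monthly operations proxy
    quakes = np.roll(inj, 2) + rng.normal(scale=1.0, size=200)   # lagged response plus noise

    obs = xcorr_peak(inj, quakes, max_lag=6)
    null = np.array([xcorr_peak(rng.permutation(inj), quakes, max_lag=6)
                     for _ in range(1000)])                      # reshuffled surrogates
    print(f"peak |xcorr| = {obs:.3f}, 95% surrogate threshold = {np.quantile(null, 0.95):.3f}")
    ```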

  15. kruX: matrix-based non-parametric eQTL discovery.

    PubMed

    Qi, Jianlong; Asl, Hassan Foroughi; Björkegren, Johan; Michoel, Tom

    2014-01-14

    The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.
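
    The matrix trick behind kruX can be illustrated for a single marker tested against many expression traits at once (a hedged sketch: no tie correction, synthetic data, and the released kruX additionally handles all marker-trait combinations):

    ```python
    # Sketch: Kruskal-Wallis statistics for one genotype vector against many traits
    # at once, using a single matrix multiplication for the per-group rank sums.
    import numpy as np
    from scipy.stats import rankdata, chi2

    rng = np.random.default_rng(6)
    n_samples, n_traits = 102, 5000
    genotypes = rng.integers(0, 3, size=n_samples)           # 0/1/2 genotype groups
    Y = rng.normal(size=(n_samples, n_traits))               # expression traits

    R = np.apply_along_axis(rankdata, 0, Y)                  # ranks within each trait
    G = np.eye(3)[genotypes]                                 # one-hot group indicator (n x 3)
    n_g = G.sum(axis=0)                                      # group sizes

    S = G.T @ R                                              # 3 x n_traits rank sums
    N = n_samples
    H = 12.0 / (N * (N + 1)) * np.sum(S**2 / n_g[:, None], axis=0) - 3 * (N + 1)
    pvals = chi2.sf(H, df=2)                                 # df = number of groups - 1
    print("smallest p-value:", pvals.min())
    ```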

  16. Robust Statistical Detection of Power-Law Cross-Correlation.

    PubMed

    Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert

    2016-06-02

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.

  17. Robust Statistical Detection of Power-Law Cross-Correlation

    PubMed Central

    Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

    2016-01-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630

  18. New Approaches to Robust Confidence Intervals for Location: A Simulation Study.

    DTIC Science & Technology

    1984-06-01

    Indexing fragments only: ... obtain a denominator for the test statistic; those statistics based on location estimates derived from Hampel's redescending influence function or ...; an influence function for a test is defined in terms of the behavior of its P-values when the data are sampled from a model distribution modified by point ...; the proposal could be used for interval estimation as well as hypothesis testing, and once an influence function has been defined, the extension is immediate.

  19. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  20. Resampling-based methods in single and multiple testing for equality of covariance/correlation matrices.

    PubMed

    Yang, Yang; DeGruttola, Victor

    2012-06-22

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

  1. Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine.

    PubMed

    Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L; Balleteros, Francisco

    2016-12-07

    Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and planning of storage capacity. Since wind speed and direction is of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bound by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets.
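
    A hedged sketch of the binning-and-censoring step described here (the bin width, the MAD-based outlier rule and the synthetic observations are assumptions, not the authors' exact procedure):

    ```python
    # Sketch: bin wind-speed/power observations, censor outliers within each bin by a
    # robust MAD rule, and summarize the surviving observations per bin.
    import numpy as np

    rng = np.random.default_rng(7)
    wind = rng.uniform(3, 15, size=3000)                       # wind speed [m/s]
    power = np.clip((wind / 12) ** 3, 0, 1) * 2000             # idealized power [kW]
    power += rng.normal(scale=40, size=wind.size)
    power[rng.choice(wind.size, 60, replace=False)] *= 0.2     # curtailment-like outliers

    edges = np.arange(3, 15.5, 0.5)                            # 0.5 m/s bins
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = power[(wind >= lo) & (wind < hi)]
        if p.size < 10:
            continue
        med = np.median(p)
        madn = 1.4826 * np.median(np.abs(p - med))
        keep = p[np.abs(p - med) <= 3 * madn]                  # censor robustly flagged outliers
        print(f"{lo:4.1f}-{hi:4.1f} m/s: n={keep.size:4d}, mean power={keep.mean():7.1f} kW")
    ```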

  2. Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine

    PubMed Central

    Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L.; Balleteros, Francisco

    2016-01-01

    Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and planning of storage capacity. Since wind speed and direction is of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bound by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets. PMID:27941604

  3. Cosmic shear measurements with Dark Energy Survey Science Verification data

    DOE PAGES

    Becker, M. R.

    2016-07-06

    Here, we present measurements of weak gravitational lensing cosmic shear two-point statistics using Dark Energy Survey Science Verification data. We demonstrate that our results are robust to the choice of shear measurement pipeline, either ngmix or im3shape, and robust to the choice of two-point statistic, including both real and Fourier-space statistics. Our results pass a suite of null tests including tests for B-mode contamination and direct tests for any dependence of the two-point functions on a set of 16 observing conditions and galaxy properties, such as seeing, airmass, galaxy color, galaxy magnitude, etc. We use a large suite of simulations to compute the covariance matrix of the cosmic shear measurements and assign statistical significance to our null tests. We find that our covariance matrix is consistent with the halo model prediction, indicating that it has the appropriate level of halo sample variance. We also compare the same jackknife procedure applied to the data and the simulations in order to search for additional sources of noise not captured by the simulations. We find no statistically significant extra sources of noise in the data. The overall detection significance with tomography for our highest source density catalog is 9.7σ. Cosmological constraints from the measurements in this work are presented in a companion paper.

  4. [Application of Stata software to test heterogeneity in meta-analysis method].

    PubMed

    Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

    2008-07-01

    To introduce the application of Stata software to heterogeneity testing in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were the Q-test and the I² statistic attached to the fixed-effect-model forest plot, the H statistic, and the Galbraith plot. The existence of heterogeneity among studies could be detected by the Q-test and the H statistic, and the degree of heterogeneity could be detected by the I² statistic. The outliers that were the sources of the heterogeneity could be spotted in the Galbraith plot. Heterogeneity testing in meta-analysis can be completed simply and quickly by the four methods in Stata. Among the four methods, the H and I² statistics are more robust, and the outliers responsible for the heterogeneity can be clearly seen in the Galbraith plot.
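
    The heterogeneity quantities referred to here are simple functions of the study-level effects and variances; a small generic worked example (Python rather than Stata, with hypothetical effect sizes) is:

    ```python
    # Sketch: Cochran's Q, H and I^2 heterogeneity statistics for a fixed-effect
    # meta-analysis (generic formulas; illustrative effect sizes and variances).
    import numpy as np
    from scipy.stats import chi2

    effects = np.array([0.30, 0.10, 0.55, 0.42, -0.05])      # hypothetical study effects
    variances = np.array([0.02, 0.03, 0.05, 0.04, 0.06])

    w = 1.0 / variances
    pooled = np.sum(w * effects) / np.sum(w)                 # fixed-effect pooled estimate
    Q = np.sum(w * (effects - pooled) ** 2)
    df = len(effects) - 1
    H = np.sqrt(Q / df)
    I2 = max(0.0, (Q - df) / Q) * 100                        # % of variation due to heterogeneity
    print(f"Q = {Q:.2f} (p = {chi2.sf(Q, df):.3f}), H = {H:.2f}, I^2 = {I2:.1f}%")
    ```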

  5. Robust hypothesis tests for detecting statistical evidence of two-dimensional and three-dimensional interactions in single-molecule measurements

    NASA Astrophysics Data System (ADS)

    Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.

    2014-05-01

    Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).

  6. ROBUST: an interactive FORTRAN-77 package for exploratory data analysis using parametric, ROBUST and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.

    ROBUST calculates 53 statistics, plus significance levels for 6 hypothesis tests, on each of up to 52 variables. These together allow the following properties of the data distribution for each variable to be examined in detail: (1) Location. Three means (arithmetic, geometric, harmonic) are calculated, together with the midrange and 19 high-performance robust L-, M-, and W-estimates of location (combined, adaptive, trimmed estimates, etc.). (2) Scale. The standard deviation is calculated along with the H-spread/2 (≈ semi-interquartile range), the mean and median absolute deviations from both mean and median, and a biweight scale estimator. The 23 location and 6 scale estimators programmed cover all possible degrees of robustness. (3) Normality. Distributions are tested against the null hypothesis that they are normal, using the 3rd (√b1) and 4th (b2) moments, Geary's ratio (mean deviation/standard deviation), Filliben's probability plot correlation coefficient, and a more robust test based on the biweight scale estimator. These statistics collectively are sensitive to most usual departures from normality. (4) Presence of outliers. The maximum and minimum values are assessed individually or jointly using Grubbs' maximum Studentized residuals, Harvey's and Dixon's criteria, and the Studentized range. For a single input variable, outliers can be either winsorized or eliminated, and all estimates recalculated iteratively as desired. The following data transformations can also be applied: linear, log10, generalized Box-Cox power (including log, reciprocal, and square root), exponentiation, and standardization. For more than one variable, all results are tabulated in a single run of ROBUST. Further options are incorporated to assess ratios (of two variables) as well as discrete variables, and to handle missing data. Cumulative S-plots (for assessing normality graphically) can also be generated. The mutual consistency or inconsistency of all these measures helps to detect errors in data as well as to assess the data distributions themselves.

  7. Adaptive transmission disequilibrium test for family trio design.

    PubMed

    Yuan, Min; Tian, Xin; Zheng, Gang; Yang, Yaning

    2009-01-01

    The transmission disequilibrium test (TDT) is a standard method to detect association using the family trio design. It is optimal for an additive genetic model. Other TDT-type tests optimal for recessive and dominant models have also been developed. Association tests using family data, including the TDT-type statistics, have been unified into a class of more comprehensive and flexible family-based association tests (FBAT). TDT-type tests have high efficiency when the genetic model is known or correctly specified, but may lose power if the model is mis-specified. Hence tests that are robust to genetic model mis-specification yet efficient are preferred. The constrained likelihood ratio test (CLRT) and MAX-type tests have been shown to be efficiency robust. In this paper we propose a new efficiency robust procedure, referred to as adaptive TDT (aTDT). It uses the Hardy-Weinberg disequilibrium coefficient to identify the potential genetic model underlying the data and then applies the TDT-type test (or FBAT for general applications) corresponding to the selected model. Simulation demonstrates that aTDT is efficiency robust to model mis-specifications and generally outperforms the MAX test and CLRT in terms of power. We also show that aTDT has power close to that of the optimal TDT-type test based on a single genetic model, while being much more robust. Applications to real and simulated data from the Genetic Analysis Workshop (GAW) illustrate the use of our adaptive TDT.

  8. A general statistical test for correlations in a finite-length time series.

    PubMed

    Hanson, Jeffery A; Yang, Haw

    2008-06-07

    The statistical properties of the autocorrelation function from a time series composed of independently and identically distributed stochastic variables have been studied. Analytical expressions for the autocorrelation function's variance have been derived. It has been found that two common ways of calculating the autocorrelation, moving-average and Fourier transform, exhibit different uncertainty characteristics. For periodic time series, the Fourier transform method is preferred because it gives smaller uncertainties that are uniform across all time lags. Based on these analytical results, a statistically robust method has been proposed to test the existence of correlations in a time series. The statistical test is verified by computer simulations, and an application to single-molecule fluorescence spectroscopy is discussed.
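
    A minimal Python sketch of the Fourier-transform route to the autocorrelation function and a simple check for correlations, assuming the usual large-sample 1/N variance for the autocorrelation of an i.i.d. series (the paper derives more detailed expressions); the data are simulated.

```python
import numpy as np

def autocorrelation_fft(x):
    """Sample autocorrelation computed via the Fourier transform (Wiener-Khinchin)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x, n=2 * n)) ** 2   # zero-pad to avoid circular wrap-around
    acov = np.fft.irfft(spectrum)[:n] / n
    return acov / acov[0]                             # normalize so lag 0 equals 1

rng = np.random.default_rng(0)
x = rng.normal(size=2000)                             # i.i.d. series: no true correlation
acf = autocorrelation_fft(x)
threshold = 1.96 / np.sqrt(len(x))                    # approximate 95% band for an i.i.d. series
print("lags exceeding the band:", np.sum(np.abs(acf[1:50]) > threshold))
```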

  9. A robust TDT-type association test under informative parental missingness.

    PubMed

    Chen, J H; Cheng, K F

    2011-02-10

    Many family-based association tests rely on the random transmission of alleles from parents to offspring. Among them, the transmission/disequilibrium test (TDT) may be considered to be the most popular statistical test. The TDT statistic and its variations were proposed to evaluate nonrandom transmission of alleles from parents to the diseased children. However, in family studies, parental genotypes may be missing due to parental death, loss, divorce, or other reasons. Under some missingness conditions, nonrandom transmission of alleles may still occur even when the gene and disease are not associated. As a consequence, the usual TDT-type tests would produce excessive false positive conclusions in association studies. In this paper, we propose a novel TDT-type association test which is not only simple in computation but also robust to the joint effect of population stratification and informative parental missingness. Our test is model-free and allows for different mechanisms of parental missingness across subpopulations. We use a simulation study to compare the performance of the new test with TDT and point out the advantage of the new method. Copyright © 2010 John Wiley & Sons, Ltd.

  10. Robust Statistics and Regularization for Feature Extraction and UXO Discrimination

    DTIC Science & Technology

    2011-07-01

    July 11, 2011 ... real data we find that this technique has an improved probability of finding all ordnance in a test data set, relative to previously ... many sites. Tests on larger data sets should still be carried out. In previous work we considered a bootstrapping approach to selecting the operating ... Marginalizing over x we obtain the probability that the ith order statistic in the test data belongs to the T class: P(T | x_(i)) = ∫_{-∞}^{∞} P(T | x) p(x ...

  11. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R² of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R² coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
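
    The Nash-Sutcliffe efficiency preferred above for monthly results has a simple closed form: one minus the ratio of the squared prediction errors to the squared deviations of the observations from their mean. A short Python sketch with illustrative values:

```python
import numpy as np

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((observed - predicted) ** 2) / np.sum((observed - observed.mean()) ** 2)

obs = np.array([3.1, 2.4, 5.6, 4.2, 3.9])     # illustrative monthly runoff depths
pred = np.array([2.8, 2.9, 5.1, 4.6, 3.5])
print(f"NSE = {nash_sutcliffe(obs, pred):.2f}")
```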

  12. kruX: matrix-based non-parametric eQTL discovery

    PubMed Central

    2014-01-01

    Background The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. Results We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com. PMID:24423115
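
    kruX itself vectorizes the computation with matrix multiplications; the following Python sketch shows the underlying per-marker test it accelerates, a Kruskal-Wallis test of an expression trait across genotype groups, on simulated data (names and values are illustrative, not from the paper).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=100)              # marker coded as 0/1/2 copies of an allele
expression = rng.normal(size=100) + 0.5 * genotypes   # trait with a genotype-dependent shift

# Kruskal-Wallis test of the expression trait across genotype groups (what kruX vectorizes)
groups = [expression[genotypes == g] for g in np.unique(genotypes)]
h_stat, p_value = stats.kruskal(*groups)
print(f"H = {h_stat:.2f}, p = {p_value:.3g}")
```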

  13. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  14. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ² distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ² distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  15. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well-known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.

  16. Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics

    ERIC Educational Resources Information Center

    Schweig, Jonathan

    2014-01-01

    Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…

  17. The chi-square test of independence.

    PubMed

    McHugh, Mary L

    2013-01-01

    The Chi-square statistic is a non-parametric (distribution free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and of multiple group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others. The Chi-square is a significance statistic, and should be followed with a strength statistic. Cramer's V is the most common strength test used to test the data when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two group and multiple group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramer's V to produce relatively low correlation measures, even for highly significant results.
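
    A brief Python sketch of the workflow described above, computing the Chi-square test of independence and following it with Cramer's V as the strength statistic; the contingency table is illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative 2x3 contingency table: rows = groups, columns = nominal outcome categories
table = np.array([[30, 15, 5],
                  [20, 25, 15]])

chi2, p, dof, expected = stats.chi2_contingency(table)

# Cramer's V as a follow-up strength statistic for a significant Chi-square result
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, Cramer's V = {cramers_v:.2f}")
```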

  18. Use of Robust z in Detecting Unstable Items in Item Response Theory Models

    ERIC Educational Resources Information Center

    Huynh, Huynh; Meyer, Patrick

    2010-01-01

    The first part of this paper describes the use of the robust z[subscript R] statistic to link test forms using the Rasch (or one-parameter logistic) model. The procedure is then extended to the two-parameter and three-parameter logistic and two-parameter partial credit (2PPC) models. A real set of data was used to illustrate the extension. The…

  19. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
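
    Under the global null hypothesis, and assuming independent studies, the rth ordered p-value among K uniform p-values follows a Beta(r, K - r + 1) distribution, which gives a direct way to evaluate rOP; the Python sketch below uses illustrative p-values (the published method additionally estimates r and handles one-sided corrections).

```python
import numpy as np
from scipy import stats

def rop_pvalue(pvalues, r):
    """p-value of the r-th ordered p-value (rOP) across K independent studies.

    Under the global null, the r-th order statistic of K uniform p-values
    follows a Beta(r, K - r + 1) distribution.
    """
    pvalues = np.sort(np.asarray(pvalues, dtype=float))
    k = len(pvalues)
    return stats.beta.cdf(pvalues[r - 1], r, k - r + 1)

# One gene measured in 5 studies; r = 3 targets "differentially expressed in the majority"
print(f"rOP p-value: {rop_pvalue([0.001, 0.004, 0.02, 0.30, 0.75], r=3):.4f}")
```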

  20. An entropy-based nonparametric test for the validation of surrogate endpoints.

    PubMed

    Miao, Xiaopeng; Wang, Yong-Cheng; Gangopadhyay, Ashis

    2012-06-30

    We present a nonparametric test to validate surrogate endpoints based on measure of divergence and random permutation. This test is a proposal to directly verify the Prentice statistical definition of surrogacy. The test does not impose distributional assumptions on the endpoints, and it is robust to model misspecification. Our simulation study shows that the proposed nonparametric test outperforms the practical test of the Prentice criterion in terms of both robustness of size and power. We also evaluate the performance of three leading methods that attempt to quantify the effect of surrogate endpoints. The proposed method is applied to validate magnetic resonance imaging lesions as the surrogate endpoint for clinical relapses in a multiple sclerosis trial. Copyright © 2012 John Wiley & Sons, Ltd.

  1. Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey Hilton

    2006-01-01

    This paper describes a novel approach based on the use of coherence functions and statistical theory for sensor validation in a harsh environment. Using aligned and unaligned coherence functions together with statistical theory, one can test for sensor degradation, total sensor failure, or changes in the signal. This advanced diagnostic approach and the novel data processing methodology discussed provide a single number that conveys this information. This number, as calculated with standard statistical procedures for comparing the means of two distributions, is compared with results obtained using Yuen's robust statistical method for constructing confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor using spectrum analysis methods on aligned and unaligned time histories has verified the effectiveness of the proposed method. All the procedures produce good results, which demonstrates the robustness of the technique.

  2. Food Choice Questionnaire (FCQ) revisited. Suggestions for the development of an enhanced general food motivation model.

    PubMed

    Fotopoulos, Christos; Krystallis, Athanasios; Vassallo, Marco; Pagiaslis, Anastasios

    2009-02-01

    Recognising the need for a more statistically robust instrument to investigate general food selection determinants, the research validates and confirms the Food Choice Questionnaire's (FCQ) factorial design, develops a more robust ad hoc FCQ version, and tests its ability to discriminate between consumer segments in terms of the importance they assign to the FCQ motivational factors. The original FCQ appears to represent a comprehensive and reliable research instrument. However, the empirical data do not support the robustness of its 9-factorial design. On the other hand, segmentation results at the subpopulation level based on the enhanced FCQ version bring about an optimistic message for the FCQ's ability to predict food selection behaviour. The paper concludes that some of the basic components of the original FCQ can be used as a basis for a new general food motivation typology. The development of such a new instrument, with fewer, higher-abstraction FCQ-based dimensions and fewer items per dimension, is a step in the right direction; however, such a step should be theory-driven, and rigorous statistical testing across and within populations would be necessary.

  3. Statistical Considerations in Choosing a Test Reliability Coefficient. ACT Research Report Series, 2012 (10)

    ERIC Educational Resources Information Center

    Woodruff, David; Wu, Yi-Fang

    2012-01-01

    The purpose of this paper is to illustrate alpha's robustness and usefulness, using actual and simulated educational test data. The sampling properties of alpha are compared with the sampling properties of several other reliability coefficients: Guttman's lambda[subscript 2], lambda[subscript 4], and lambda[subscript 6]; test-retest reliability;…

  4. Statistics based sampling for controller and estimator design

    NASA Astrophysics Data System (ADS)

    Tenne, Dirk

    The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold, addressing nonlinear estimation, target tracking, and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three-dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three-dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controllers that incorporate knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance measure of the plant, consisting of a combination of the mean and variance, over the domain of uncertainty. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistically robust controller design is applied to a concurrent feed-forward/feedback controller structure for a high-speed, low-tension tape drive.

  5. Statistically Assessing Time-Averaged and Paleosecular Variation Field Models Against Paleomagnetic Directional Data Sets. Can Likely non-Zonal Features be Detected in a Robust way ?

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.

    2007-12-01

    We recently introduced a method to rigorously test the statistical compatibility of combined time-averaged (TAF) and paleosecular variation (PSV) field models against any lava flow paleomagnetic database (Khokhlov et al., 2001, 2006). Applying this method to test (TAF+PSV) models against synthetic data produced from those models shows that the method is very efficient at discriminating models, and very sensitive, provided data errors are properly taken into account. This prompted us to test a variety of published combined (TAF+PSV) models against a test Brunhes stable polarity data set extracted from the Quidelleur et al. (1994) database. Not surprisingly, ignoring data errors leads all models to be rejected. But taking data errors into account leads to the stimulating conclusion that at least one (TAF+PSV) model appears to be compatible with the selected data set, this model being purely axisymmetric. This result shows that in practice also, and with the databases currently available, the method can discriminate various candidate models and decide which actually best fits a given data set. But it also shows that likely non-zonal signatures of non-homogeneous boundary conditions imposed by the mantle are difficult to identify as statistically robust from paleomagnetic directional data sets. In the present paper, we will discuss the possibility that such signatures could eventually be identified as robust with the help of more recent data sets (such as the one put together under the collaborative "TAFI" effort, see e.g. Johnson et al. abstract #GP21A-0013, AGU Fall Meeting, 2005) or by taking additional information into account (such as the possible coincidence of non-zonal time-averaged field patterns with analogous patterns in the modern field).

  6. Consequences of Assumption Violations Revisited: A Quantitative Review of Alternatives to the One-Way Analysis of Variance "F" Test.

    ERIC Educational Resources Information Center

    Lix, Lisa M.; And Others

    1996-01-01

    Meta-analytic techniques were used to summarize the statistical robustness literature on Type I error properties of alternatives to the one-way analysis of variance "F" test. The James (1951) and Welch (1951) tests performed best under violations of the variance homogeneity assumption, although their use is not always appropriate. (SLD)

  7. Robust control for fractional variable-order chaotic systems with non-singular kernel

    NASA Astrophysics Data System (ADS)

    Zuñiga-Aguilar, C. J.; Gómez-Aguilar, J. F.; Escobar-Jiménez, R. F.; Romero-Ugalde, H. M.

    2018-01-01

    This paper investigates chaos control for a class of variable-order fractional chaotic systems using a robust control strategy. The variable-order fractional models of the non-autonomous biological system, the King Cobra chaotic system, the Halvorsen attractor and the Burke-Shaw system have been derived using the fractional-order derivative with a Mittag-Leffler kernel in the Liouville-Caputo sense. The fractional differential equations and the control law were solved using the Adams-Bashforth-Moulton algorithm. To test the stability and efficiency of the control, different statistical indicators were introduced. Finally, simulation results demonstrate the effectiveness of the proposed robust control.

  8. Statistical Analysis of CFD Solutions from the Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Hemsch, Michael J.

    2002-01-01

    A simple, graphical framework is presented for robust statistical evaluation of results obtained from N-Version testing of a series of RANS CFD codes. The solutions were obtained by a variety of code developers and users for the June 2001 Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration used for the computational tests is the DLR-F4 wing-body combination previously tested in several European wind tunnels and for which a previous N-Version test had been conducted. The statistical framework is used to evaluate code results for (1) a single cruise design point, (2) drag polars and (3) drag rise. The paper concludes with a discussion of the meaning of the results, especially with respect to predictability, validation, and reporting of solutions.

  9. [Statistical validity of the Mexican Food Security Scale and the Latin American and Caribbean Food Security Scale].

    PubMed

    Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo

    2014-01-01

    This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, conformed by independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; sensitivity analysis through mean differences' hypothesis test. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack of access to food indicator, included in multidimensional poverty measurement in Mexico, is calculated with EMSA.
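
    One of the checks listed above, Cronbach's alpha, has a simple closed form based on the per-item variances and the variance of the total score; a Python sketch with an illustrative item-response matrix (not the EMSA/ELCSA data).

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()        # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)          # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Illustrative binary responses to 4 scale items from 6 respondents
scores = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [0, 1, 0, 0],
                   [1, 1, 1, 1]])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```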

  10. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver poor statistical power for detecting genuine associations as well as a high false positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in the populations under study, and that confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results We implemented this novel method, together with several popular methods from the GWAS literature, to re-analyze recently published Parkinson's disease (PD) case–control samples. The real-data analysis and computer simulations show that, compared to its rivals, the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemming from non-random sampling and genetic structure. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, whereas only 6 SNPs in two of these regions had previously been detected by the trend-test-based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidate genes, FGF20 and PARK8, without incurring false positive risk. Conclusions We developed a novel likelihood-based method that provides adequate estimation of LD and other population model parameters using case and control samples, allows easy integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful GWAS analyses. On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most widely used in the GWAS literature. PMID:23394771

  11. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  12. A novel modification of the Turing test for artificial intelligence and robotics in healthcare.

    PubMed

    Ashrafian, Hutan; Darzi, Ara; Athanasiou, Thanos

    2015-03-01

    The increasing demands of delivering higher quality global healthcare have resulted in a corresponding expansion in the development of computer-based and robotic healthcare tools that rely on artificially intelligent technologies. The Turing test was designed to assess artificial intelligence (AI) in computer technology. It remains an important qualitative tool for testing the next generation of medical diagnostics and medical robotics. Development of quantifiable diagnostic accuracy meta-analytical evaluative techniques for the Turing test paradigm. Modification of the Turing test to offer quantifiable diagnostic precision and statistical effect-size robustness in the assessment of AI for computer-based and robotic healthcare technologies. Modification of the Turing test to offer robust diagnostic scores for AI can contribute to enhancing and refining the next generation of digital diagnostic technologies and healthcare robotics. Copyright © 2014 John Wiley & Sons, Ltd.

  13. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391

  14. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  15. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    ... principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved ... Semi-Supervised Clustering Methodology ... Robust Hypothesis Testing ... -dimensional Euclidean space allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed ...

  16. The Performance of Methods to Test Upper-Level Mediation in the Presence of Nonnormal Data

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Stapleton, Laura M.

    2008-01-01

    A Monte Carlo study compared the statistical performance of standard and robust multilevel mediation analysis methods to test indirect effects for a cluster randomized experimental design under various departures from normality. The performance of these methods was examined for an upper-level mediation process, where the indirect effect is a fixed…

  17. An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

    PubMed Central

    2012-01-01

    Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When small clustering effect presented in DCE data, results remained robust across statistical models. However, results varied when larger clustering effect presented. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526

  18. Approximate sample size formulas for the two-sample trimmed mean test with unequal variances.

    PubMed

    Luh, Wei-Ming; Guo, Jiin-Huarng

    2007-05-01

    Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified alpha and the power (1-beta), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
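
    For reference, Yuen's two-sample statistic itself (not the sample size formulas derived in the paper) can be sketched as follows in Python, using 20% trimming, trimmed means, and winsorized variances with Welch-type degrees of freedom; the data are simulated heavy-tailed samples.

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's two-sample test on trimmed means with winsorized variances."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))

    def pieces(a):
        n = len(a)
        g = int(np.floor(trim * n))                    # number trimmed from each tail
        h = n - 2 * g                                  # effective (trimmed) sample size
        w = a.copy()
        w[:g], w[n - g:] = a[g], a[n - g - 1]          # winsorize the tails
        d = (n - 1) * w.var(ddof=1) / (h * (h - 1))    # squared standard error component
        return stats.trim_mean(a, trim), d, h

    mx, dx, hx = pieces(x)
    my, dy, hy = pieces(y)
    t = (mx - my) / np.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx ** 2 / (hx - 1) + dy ** 2 / (hy - 1))   # Welch-type df
    return t, 2 * stats.t.sf(abs(t), df)

rng = np.random.default_rng(2)
t, p = yuen_test(rng.standard_t(3, 40), rng.standard_t(3, 50) + 0.8)
print(f"t = {t:.2f}, p = {p:.4f}")
```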

  19. Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

    PubMed Central

    Song, Rui; Kosorok, Michael R.; Cai, Jianwen

    2009-01-01

    Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107

  20. Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.

    2002-01-01

    An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.

  1. Data-Division-Specific Robustness and Power of Randomization Tests for ABAB Designs

    ERIC Educational Resources Information Center

    Manolov, Rumen; Solanas, Antonio; Bulte, Isis; Onghena, Patrick

    2010-01-01

    This study deals with the statistical properties of a randomization test applied to an ABAB design in cases where the desirable random assignment of the points of change in phase is not possible. To obtain information about each possible data division, the authors carried out a conditional Monte Carlo simulation with 100,000 samples for each…

  2. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation

    PubMed Central

    Ferguson, John; Wheeler, William; Fu, YiPing; Prokunina-Olsson, Ludmila; Zhao, Hongyu; Sampson, Joshua

    2013-01-01

    With recent advances in sequencing, genotyping arrays, and imputation, GWAS now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that can test for an association between a group of genetic variants and a phenotype. GSS are a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as MAF, the proportion associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, but details are provided as to when other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave). PMID:23092956

  3. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
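
    The absolute signal-to-noise ratio metric highlighted above is straightforward to compute; a Python sketch ranking genes by |SNR| between two phenotype classes on simulated expression data (the benchmark study compares 16 such metrics within Gene Set Enrichment Analysis).

```python
import numpy as np

def signal_to_noise(expr_a, expr_b):
    """Per-gene signal-to-noise ratio between two sample classes (genes in rows)."""
    mean_a, mean_b = expr_a.mean(axis=1), expr_b.mean(axis=1)
    sd_a, sd_b = expr_a.std(axis=1, ddof=1), expr_b.std(axis=1, ddof=1)
    return (mean_a - mean_b) / (sd_a + sd_b)

rng = np.random.default_rng(3)
genes, n_a, n_b = 1000, 12, 15
class_a = rng.normal(size=(genes, n_a))
class_b = rng.normal(size=(genes, n_b))
ranking = np.argsort(-np.abs(signal_to_noise(class_a, class_b)))   # rank genes by |SNR|
print("top-ranked genes:", ranking[:5])
```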

  4. Standard and goodness-of-fit parameter estimation methods for the three-parameter lognormal distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, V.E.

    1982-01-01

    A class of goodness-of-fit estimators is found to provide a useful alternative in certain situations to the standard maximum likelihood method, which has some undesirable estimation characteristics when estimating from the three-parameter lognormal distribution. The class of goodness-of-fit tests considered includes the Shapiro-Wilk and Filliben tests, which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted order statistic estimators are compared to the standard procedures in Monte Carlo simulations. Robustness of the procedures is examined and example data sets are analyzed.

  5. Statistical inference involving binomial and negative binomial parameters.

    PubMed

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  6. GPS baseline configuration design based on robustness analysis

    NASA Astrophysics Data System (ADS)

    Yetkin, M.; Berber, M.

    2012-11-01

    The robustness analysis results obtained from a Global Positioning System (GPS) network are dramatically influenced by the configuration of the observed baselines. The selection of optimal GPS baselines may allow for a cost effective survey campaign and a sufficiently robust network. Furthermore, using the approach described in this paper, the required number of sessions, the baselines to be observed, and the significance levels for statistical testing and robustness analysis can be determined even before the GPS campaign starts. In this study, we propose a robustness criterion for the optimal design of geodetic networks, and present a very simple and efficient algorithm based on this criterion for the selection of optimal GPS baselines. We also show the relationship between the number of sessions and the non-centrality parameter. Finally, a numerical example is given to verify the efficacy of the proposed approach.

  7. Climate Verification Using Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A robust method previously used to detect observed intra- to multi-decadal (IMD) climate regimes was adapted to test whether climate models could reproduce IMD variations in U.S. surface temperatures during 1919-2008. This procedure, called the running Mann Whitney Z (MWZ) method, samples data ranki...
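
    A rough Python sketch of the running-window idea, assuming each candidate window is compared against the remaining observations with a Mann-Whitney test and converted to a Z score by the usual normal approximation (the published procedure's windowing and significance handling may differ); the data and window length are illustrative.

```python
import numpy as np
from scipy import stats

def running_mwz(series, window):
    """Mann-Whitney Z comparing each sliding window against the remaining observations."""
    series = np.asarray(series, dtype=float)
    z_scores = []
    for start in range(len(series) - window + 1):
        inside = series[start:start + window]
        outside = np.concatenate([series[:start], series[start + window:]])
        u, _ = stats.mannwhitneyu(inside, outside, alternative="two-sided")
        mu = len(inside) * len(outside) / 2.0                         # mean of U under H0
        sigma = np.sqrt(mu * (len(inside) + len(outside) + 1) / 6.0)  # SD of U under H0 (no ties)
        z_scores.append((u - mu) / sigma)
    return np.array(z_scores)

rng = np.random.default_rng(4)
temps = np.concatenate([rng.normal(0, 1, 45), rng.normal(1.0, 1, 45)])  # shifted climate regime
z = running_mwz(temps, window=15)
print("max |Z|:", np.abs(z).max().round(2))
```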

  8. A Robust Semi-Parametric Test for Detecting Trait-Dependent Diversification.

    PubMed

    Rabosky, Daniel L; Huang, Huateng

    2016-03-01

    Rates of species diversification vary widely across the tree of life and there is considerable interest in identifying organismal traits that correlate with rates of speciation and extinction. However, it has been challenging to develop methodological frameworks for testing hypotheses about trait-dependent diversification that are robust to phylogenetic pseudoreplication and to directionally biased rates of character change. We describe a semi-parametric test for trait-dependent diversification that explicitly requires replicated associations between character states and diversification rates to detect effects. To use the method, diversification rates are reconstructed across a phylogenetic tree with no consideration of character states. A test statistic is then computed to measure the association between species-level traits and the corresponding diversification rate estimates at the tips of the tree. The empirical value of the test statistic is compared to a null distribution that is generated by structured permutations of evolutionary rates across the phylogeny. The test is applicable to binary discrete characters as well as continuous-valued traits and can accommodate extremely sparse sampling of character states at the tips of the tree. We apply the test to several empirical data sets and demonstrate that the method has acceptable Type I error rates. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.
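
    A compact sketch of the statistical-pattern-recognition step: a Mahalanobis-distance damage index of a new feature vector relative to a baseline (undamaged) feature cloud. The feature dimensions and data below are illustrative rather than the paper's subspace-identification features.

```python
import numpy as np

def mahalanobis_index(baseline_features, new_feature):
    """Mahalanobis distance of a new feature vector from the baseline feature cloud."""
    mean = baseline_features.mean(axis=0)
    cov = np.cov(baseline_features, rowvar=False)
    diff = new_feature - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(5)
baseline = rng.normal(size=(200, 6))                   # features from the undamaged state
healthy = rng.normal(size=6)                           # new measurement, undamaged
damaged = rng.normal(size=6) + np.array([0, 0, 2.5, 0, 0, 0])   # shifted feature -> damage
print(f"healthy index: {mahalanobis_index(baseline, healthy):.2f}")
print(f"damaged index: {mahalanobis_index(baseline, damaged):.2f}")
```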

  10. Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.

    2001-01-01

    This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
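
    The first- and second-order moment matching described above follows the standard approximation for independent, normally distributed inputs; a minimal sketch, with `grad` and `hess_diag` standing in for the sensitivity derivatives the CFD code would supply, is:

    ```python
    import numpy as np

    def propagate_moments(f, mu, sigma, grad, hess_diag=None):
        """mu, sigma: input means and standard deviations (arrays);
        grad: df/dx_i evaluated at mu; hess_diag: d2f/dx_i^2 at mu (optional)."""
        mu, sigma, grad = map(np.asarray, (mu, sigma, grad))
        var_f = np.sum((grad * sigma) ** 2)           # first-order output variance
        mean_f = f(mu)                                # first-order output mean
        if hess_diag is not None:                     # second-order mean correction
            mean_f += 0.5 * np.sum(np.asarray(hess_diag) * sigma ** 2)
        return mean_f, var_f
    ```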

  11. Using permutation tests to enhance causal inference in interrupted time series analysis.

    PubMed

    Linden, Ariel

    2018-06-01

    Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robustness check based on permutation tests to further improve causal inference. We evaluate the effect of California's Proposition 99 for reducing cigarette sales by iteratively casting each nontreated state into the role of "treated," creating a comparable control group using the ITSAMATCH package in Stata, and then evaluating treatment effects using ITSA regression. If statistically significant "treatment effects" are estimated for pseudotreated states, then any significant changes in the outcome of the actual treatment unit (California) cannot be attributed to the intervention. We perform these analyses setting the cutpoint significance level to P > .40 for identifying balanced matches (the highest threshold possible for which controls could still be found for California) and use the difference in differences of trends as the treatment effect estimator. Only California attained a statistically significant treatment effect, strengthening confidence in the conclusion that Proposition 99 reduced cigarette sales. The proposed permutation testing framework provides an additional robustness check to either support or refute a treatment effect identified for the true treated unit in ITSA. Given its value and ease of implementation, this framework should be considered a standard robustness test in all multiple group interrupted time series analyses. © 2018 John Wiley & Sons, Ltd.
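
    The placebo-style logic of the robustness check can be sketched as follows (the study uses the ITSAMATCH and ITSA commands in Stata; the function `estimate_its_effect` below is hypothetical and stands in for one matched ITSA regression):

    ```python
    # Conceptual sketch of the placebo-style robustness check, not the Stata implementation.
    def placebo_check(units, treated_unit, data, estimate_its_effect, alpha=0.05):
        """Re-run the ITS analysis treating each control unit as if it were treated.

        estimate_its_effect(unit, data) is assumed to return the
        difference-in-differences-of-trends estimate and its p-value for one unit.
        """
        significant_placebos = []
        for unit in units:
            if unit == treated_unit:
                continue
            effect, p_value = estimate_its_effect(unit, data)
            if p_value < alpha:
                significant_placebos.append((unit, effect, p_value))
        # A causal interpretation is strengthened only if the actual treated unit
        # shows a significant effect while the placebo units do not.
        return significant_placebos
    ```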

  12. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  13. Vector autoregressive models: A Gini approach

    NASA Astrophysics Data System (ADS)

    Mussard, Stéphane; Ndiaye, Oumar Hamady

    2018-02-01

    In this paper, it is proven that the usual VAR models may be estimated in the Gini sense, that is, on an ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. Inference about the estimators is carried out under the ℓ1 norm. Also, impulse response functions and Gini decompositions of forecast errors are introduced. Finally, Granger's causality tests are properly derived based on U-statistics.

  14. A generalized association test based on U statistics.

    PubMed

    Wei, Changshuai; Lu, Qing

    2017-07-01

    Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be a univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. We here proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian Kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzheimer's disease neuroimaging initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. We developed a C++ package for analysis of WGS data using GSU. The source code can be downloaded at https://github.com/changshuaiwei/gsu. Contact: weichangshuai@gmail.com or qlu@epi.msu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
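
    A schematic version of a similarity-based statistic with a Laplacian kernel, in the spirit of GSU but not the authors' exact definition, might look like this:

    ```python
    import numpy as np

    def laplacian_kernel(X, scale=1.0):
        # pairwise exp(-||x_i - x_j||_1 / scale); X is (n_samples, n_features)
        d = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
        return np.exp(-d / scale)

    def similarity_u_statistic(genotypes, phenotypes, scale=1.0):
        """Cross-product of pairwise genotype and phenotype similarities,
        averaged over distinct pairs (both inputs are 2-D arrays)."""
        n = genotypes.shape[0]
        G = laplacian_kernel(genotypes, scale)
        P = laplacian_kernel(phenotypes, scale)
        mask = ~np.eye(n, dtype=bool)          # U-statistic: exclude i == j pairs
        return (G[mask] * P[mask]).sum() / (n * (n - 1))
    ```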

  15. The Sperm Chromatin Structure Assay (SCSA(®)) and other sperm DNA fragmentation tests for evaluation of sperm nuclear DNA integrity as related to fertility.

    PubMed

    Evenson, Donald P

    2016-06-01

    Thirty-five years ago the pioneering paper in Science (240:1131) on the relationship between sperm DNA integrity and pregnancy outcome was featured as the cover issue showing a fluorescence photomicrograph of red and green stained sperm. The flow cytometry data showed a very significant difference in sperm DNA integrity between fertile and subfertile bulls and men. This study utilized heat (100°C, 5min) to denature DNA at sites of DNA strand breaks followed by staining with acridine orange (AO) and measurements of 5000 individual sperm of green double strand (ds) DNA and red single strand (ss) DNA fluorescence. Later, the heat protocol was changed to a low pH protocol to denature the DNA at sites of strand breaks; the heat and acid procedures produced the same results. SCSA data are very advantageously dual parameter with 1024 channels (degrees) of both red and green fluorescence. Hundreds of publications on the use of the SCSA test in animals and humans have validated the SCSA as a highly useful test for determining male breeding soundness. The SCSA test is a rapid, non-biased flow cytometer machine measurement providing robust statistical data with exceptional precision and repeatability. Many genotoxic experiments showed excellent dose response data with very low coefficient of variation that further validated the SCSA as being a highly powerful assay for sperm DNA integrity. Twelve years following the introduction of the SCSA test, the terminal deoxynucleotidyl transferase-mediated fluorescein-dUTP nick end labelling (TUNEL) test (1993) for sperm was introduced as the only other flow cytometric assay for sperm DNA fragmentation. However, the TUNEL test can also be done by light microscopy with much less statistical robustness. The COMET (1998) and Sperm Chromatin Dispersion (SCD; HALO) (2003) tests were introduced as light microscope tests that don't require a flow cytometer. Since these tests measure only 50-200 sperm per sample, they suffer from the lack of the statistical robustness of flow cytometric measurements. Only the SCSA test has an exact standardization of a fixed protocol. The many variations of the other tests make it very difficult to compare data and thresholds for risk of male factor infertility. Data from these four sperm DNA fragmentation tests plus the light microscope acridine orange test (AOT) are correlated to various degrees. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  16. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation.

    PubMed

    Kuan, Pei Fen; Chiang, Derek Y

    2012-09-01

    DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms including tiling arrays and next generation sequencing, and experimental protocols are available for profiling DNA methylation. Similar to other tiling array data, DNA methylation data shares the characteristics of inherent correlation structure among nearby probes. However, unlike gene expression or protein DNA binding data, the varying CpG density which gives rise to CpG island, shore and shelf definition provides exogenous information in detecting differential methylation. This article aims to introduce a robust testing and probe ranking procedure based on a nonhomogeneous hidden Markov model that incorporates the above-mentioned features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, Journal of the Royal Statistical Society: Series B (Statistical Methodology)71, 393-424) and propose modeling the nonnull using a nonparametric symmetric distribution in two-sided hypothesis testing. We show that this model improves probe ranking and is robust to model misspecification based on extensive simulation studies. We further illustrate that our proposed framework achieves good operating characteristics as compared to commonly used methods in real DNA methylation data that aims to detect differential methylation sites. © 2012, The International Biometric Society.

  17. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies.

    PubMed

    Chen, Zhongxue; Ng, Hon Keung Tony; Li, Jing; Liu, Qingzhong; Huang, Hanwen

    2017-04-01

    In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed and only a few significantly associated single-nucleotide polymorphisms from the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that combines the information of single-nucleotide polymorphisms on the X chromosome from both males and females in an efficient way. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only makes the assumption that the risk allele is the same for both females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association studies data.

  18. Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.

    PubMed

    Li, Johnson Ching-Hong

    2016-12-01

    In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs - the unscaled robust d (d_r*; Hogarty & Kromrey, 2001), scaled robust d (d_r; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r_pb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (A_w; Ruscio, Psychological Methods, 13, 19-30, 2008) - may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A_w and d_r were generally robust to these violations, and A_w slightly outperformed d_r. Implications for the use of A_w and d_r in real-world research are discussed.
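
    For reference, two of these effect sizes are straightforward to compute; the sketch below gives pooled-SD Cohen's d and a probability-of-superiority estimate (an A-type index), using standard formulas rather than any article-specific variant:

    ```python
    import numpy as np

    def cohens_d(x, y):
        """Pooled-standard-deviation standardized mean difference."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
        return (x.mean() - y.mean()) / np.sqrt(pooled_var)

    def prob_superiority(x, y):
        """A = P(X > Y) + 0.5 * P(X == Y), estimated over all (x, y) pairs."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        greater = (x[:, None] > y[None, :]).sum()
        ties = (x[:, None] == y[None, :]).sum()
        return (greater + 0.5 * ties) / (len(x) * len(y))
    ```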

  19. Contour plot assessment of existing meta-analyses confirms robust association of statin use and acute kidney injury risk.

    PubMed

    Chevance, Aurélie; Schuster, Tibor; Steele, Russell; Ternès, Nils; Platt, Robert W

    2015-10-01

    Robustness of an existing meta-analysis can justify decisions on whether to conduct an additional study addressing the same research question. We illustrate the graphical assessment of the potential impact of an additional study on an existing meta-analysis using published data on statin use and the risk of acute kidney injury. A previously proposed graphical augmentation approach is used to assess the sensitivity of the current test and heterogeneity statistics extracted from existing meta-analysis data. In addition, we extended the graphical augmentation approach to assess potential changes in the pooled effect estimate after updating a current meta-analysis and applied the three graphical contour definitions to data from meta-analyses on statin use and acute kidney injury risk. In the considered example data, the pooled effect estimates and heterogeneity indices proved to be considerably robust to the addition of a future study. Moreover, for some previously inconclusive meta-analyses, a study update might yield a statistically significant increase in kidney injury risk associated with higher statin exposure. The illustrated contour approach should become a standard tool for the assessment of the robustness of meta-analyses. It can guide decisions on whether to conduct additional studies addressing a relevant research question. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

    2017-09-01

    In this paper, a Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited to improve marginal ice-water classification. Pixel-level ice concentration is presented for comparison of the CRF-based methods. Furthermore, to identify an effective statistical distribution model to integrate into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on two scenes around Prydz Bay and Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method resolves the sea ice edge well in the Marginal Ice Zone (MIZ) and shows a robust distinction between ice and water.

  1. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  2. Generalizations and Extensions of the Probability of Superiority Effect Size Estimator

    ERIC Educational Resources Information Center

    Ruscio, John; Gera, Benjamin Lee

    2013-01-01

    Researchers are strongly encouraged to accompany the results of statistical tests with appropriate estimates of effect size. For 2-group comparisons, a probability-based effect size estimator ("A") has many appealing properties (e.g., it is easy to understand, robust to violations of parametric assumptions, insensitive to outliers). We review…

  3. Robust Optimum Invariant Tests for Random MANOVA Models.

    DTIC Science & Technology

    1986-10-01

    are assumed to be independent normal with zero mean and dispersions σ² and σ₁², respectively. Roy and Gnanadesikan (1959) considered the problem of...Part II: The multivariate case. Ann. Math. Statist. 31, 939-968. [7] Roy, S.N. and Gnanadesikan, R. (1959). Some contributions to ANOVA in one or more

  4. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.

    PubMed

    Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W

    2017-02-01

    Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR)-phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. Parenchymal texture analysis in digital mammography: robust texture feature identification and equivalence across devices.

    PubMed

    Keller, Brad M; Oustimov, Andrew; Wang, Yan; Chen, Jinbo; Acciavatti, Raymond J; Zheng, Yuanjie; Ray, Shonket; Gee, James C; Maidment, Andrew D A; Kontos, Despina

    2015-04-01

    An analytical framework is presented for evaluating the equivalence of parenchymal texture features across different full-field digital mammography (FFDM) systems using a physical breast phantom. Phantom images (FOR PROCESSING) are acquired from three FFDM systems using their automated exposure control setting. A panel of texture features, including gray-level histogram, co-occurrence, run length, and structural descriptors, are extracted. To identify features that are robust across imaging systems, a series of equivalence tests are performed on the feature distributions, in which the extent of their intersystem variation is compared to their intrasystem variation via the Hodges-Lehmann test statistic. Overall, histogram and structural features tend to be most robust across all systems, and certain features, such as edge enhancement, tend to be more robust to intergenerational differences between detectors of a single vendor than to intervendor differences. Texture features extracted from larger regions of interest (i.e., [Formula: see text]) and with a larger offset length (i.e., [Formula: see text]), when applicable, also appear to be more robust across imaging systems. This framework and observations from our experiments may benefit applications utilizing mammographic texture analysis on images acquired in multivendor settings, such as in multicenter studies of computer-aided detection and breast cancer risk assessment.

  6. A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach

    PubMed Central

    Kuan, Pei Fen; Huang, Bo

    2013-01-01

    This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968
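
    The weighted Z-test pooling underlying this approach is simple to state; a minimal sketch (with illustrative weights, e.g. square roots of the subset sample sizes) is:

    ```python
    import numpy as np
    from scipy.stats import norm

    def weighted_z_pool(p1, p2, w1, w2):
        """Pool two one-sided p-values (e.g. from the matched and unmatched
        subsets of a partially matched design) via a weighted Z-test."""
        z1, z2 = norm.isf(p1), norm.isf(p2)           # p-values -> Z-scores
        z_combined = (w1 * z1 + w2 * z2) / np.sqrt(w1**2 + w2**2)
        return norm.sf(z_combined)                     # pooled one-sided p-value

    # Example: p = 0.04 from 30 matched pairs, p = 0.10 from 20 unmatched samples
    pooled_p = weighted_z_pool(0.04, 0.10, np.sqrt(30), np.sqrt(20))
    print(f"pooled p-value = {pooled_p:.4f}")
    ```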

  7. Navy Littoral Combat Ship (LCS)/Frigate Program: Background and Issues for Congress

    DTIC Science & Technology

    2015-09-23

    Defense Daily, June 2, 2014: 4-5; Michael Fabey, “Robust Air Defense Not Needed In New Frigates, Studies Show,” Aerospace Daily & Defense Report...the frigate, according to an Aug. 7 notice posted to the Federal Business Opportunities website. Technical Risk and Issues Relating to Program...the Pentagon’s top test and evaluation officer. “Recent developmental testing provides no statistical evidence that the system is demonstrating

  8. Uniting statistical and individual-based approaches for animal movement modelling.

    PubMed

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.

  9. Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling

    PubMed Central

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047

  10. A Robust Method of Measuring Other-Race and Other-Ethnicity Effects: The Cambridge Face Memory Test Format

    PubMed Central

    McKone, Elinor; Stokes, Sacha; Liu, Jia; Cohan, Sarah; Fiorentini, Chiara; Pidcock, Madeleine; Yovel, Galit; Broughton, Mary; Pelleg, Michel

    2012-01-01

    Other-race and other-ethnicity effects on face memory have remained a topic of consistent research interest over several decades, across fields including face perception, social psychology, and forensic psychology (eyewitness testimony). Here we demonstrate that the Cambridge Face Memory Test format provides a robust method for measuring these effects. Testing the Cambridge Face Memory Test original version (CFMT-original; European-ancestry faces from Boston USA) and a new Cambridge Face Memory Test Chinese (CFMT-Chinese), with European and Asian observers, we report a race-of-face by race-of-observer interaction that was highly significant despite modest sample size and despite observers who had quite high exposure to the other race. We attribute this to high statistical power arising from the very high internal reliability of the tasks. This power also allows us to demonstrate a much smaller within-race other ethnicity effect, based on differences in European physiognomy between Boston faces/observers and Australian faces/observers (using the CFMT-Australian). PMID:23118912

  11. Comparative psychology and the grand challenge of drug discovery in psychiatry and neurodegeneration.

    PubMed

    Brunner, Dani; Balcı, Fuat; Ludvig, Elliot A

    2012-02-01

    Drug discovery for brain disorders is undergoing a period of upheaval. Faced with an empty drug pipeline and numerous failures of potential new drugs in clinical trials, many large pharmaceutical companies have been shrinking or even closing down their research divisions that focus on central nervous system (CNS) disorders. In this paper, we argue that many of the difficulties facing CNS drug discovery stem from a lack of robustness in pre-clinical (i.e., non-human animal) testing. There are two main sources for this lack of robustness. First, there is the lack of replicability of many results from the pre-clinical stage, which we argue is driven by a combination of publication bias and inappropriate selection of statistical and experimental designs. Second, there is the frequent failure to translate results in non-human animals to parallel results in humans in the clinic. This limitation can only be overcome by developing new behavioral tests for non-human animals that have predictive, construct, and etiological validity. Here, we present these translational difficulties as a "grand challenge" to researchers from comparative cognition, who are well positioned to provide new methods for testing behavior and cognition in non-human animals. These new experimental protocols will need to be both statistically robust and target behavioral and cognitive processes that allow for better connection with human CNS disorders. Our hope is that this downturn in industrial research may represent an opportunity to develop new protocols that will re-kindle the search for more effective and safer drugs for CNS disorders. Copyright © 2011 Elsevier B.V. All rights reserved.

  12. The other half of the story: effect size analysis in quantitative research.

    PubMed

    Maher, Jessica Middlemis; Markey, Jonathan C; Ebert-May, Diane

    2013-01-01

    Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.

  13. Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis under Conditions of Multivariate Nonnormality.

    ERIC Educational Resources Information Center

    Fouladi, Rachel T.

    2000-01-01

    Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…

  14. How to Use Value-Added Analysis to Improve Student Learning: A Field Guide for School and District Leaders

    ERIC Educational Resources Information Center

    Kennedy, Kate; Peters, Mary; Thomas, Mike

    2012-01-01

    Value-added analysis is the most robust, statistically significant method available for helping educators quantify student progress over time. This powerful tool also reveals tangible strategies for improving instruction. Built around the work of Battelle for Kids, this book provides a field-tested continuous improvement model for using…

  15. Passing the Test: Ecological Regression Analysis in the Los Angeles County Case and Beyond.

    ERIC Educational Resources Information Center

    Lichtman, Allan J.

    1991-01-01

    Statistical analysis of racially polarized voting prepared for the Garza v County of Los Angeles (California) (1990) voting rights case is reviewed to demonstrate that ecological regression is a flexible, robust technique that illuminates the reality of ethnic voting, and superior to the neighborhood model supported by the defendants. (SLD)

  16. Super-delta: a new differential gene expression analysis procedure with robust data normalization.

    PubMed

    Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing

    2017-12-21

    Normalization is an important data preparation step in gene expression analyses, designed to remove various systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of the global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived based on asymptotic theory for hypothesis testing that suitably pairs with the proposed robust normalization. We first compared super-delta with four commonly used normalization methods: global, median-IQR, quantile, and cyclic loess normalization in simulation studies. Super-delta was shown to have better statistical power with tighter control of type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap of the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations on the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than those identified by other methods. As a new pipeline, super-delta provides new insights into the area of differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems.

  17. Fundamentals of Counting Statistics in Digital PCR: I Just Measured Two Target Copies-What Does It Mean?

    PubMed

    Tzonev, Svilen

    2018-01-01

    Current commercially available digital PCR (dPCR) systems and assays are capable of detecting individual target molecules with considerable reliability. As tests are developed and validated for use on clinical samples, the need to understand and develop robust statistical analysis routines increases. This chapter covers the fundamental processes and limitations of detecting and reporting on single molecule detection. We cover the basics of quantification of targets and sources of imprecision. We describe the basic test concepts: sensitivity, specificity, limit of blank, limit of detection, and limit of quantification in the context of dPCR. We provide basic guidelines how to determine those, how to choose and interpret the operating point, and what factors may influence overall test performance in practice.
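
    The quantification step in digital PCR rests on a Poisson correction for partitions containing more than one target copy; a generic illustration (not tied to any vendor's software) is:

    ```python
    import math

    def copies_per_partition(n_positive, n_total):
        """Estimate the mean number of target copies per partition from the
        fraction of negative partitions: lambda = -ln(P(negative))."""
        p_negative = (n_total - n_positive) / n_total
        return -math.log(p_negative)

    def concentration(n_positive, n_total, partition_volume_ul):
        """Target concentration in copies per microliter."""
        lam = copies_per_partition(n_positive, n_total)
        return lam / partition_volume_ul

    # e.g. 2 positive droplets out of 20,000 partitions of 0.00085 uL each
    print(concentration(2, 20_000, 0.00085))
    ```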

  18. Enhanced echolocation via robust statistics and super-resolution of sonar images

    NASA Astrophysics Data System (ADS)

    Kim, Kio

    Echolocation is a process in which an animal uses acoustic signals to exchange information with its environment. In a recent study, Neretti et al. have shown that the use of robust statistics can significantly improve the resiliency of echolocation against noise and enhance its accuracy by suppressing the development of sidelobes in the processing of an echo signal. In this research, the use of robust statistics is extended to problems in underwater exploration. The dissertation consists of two parts. Part I describes how robust statistics can enhance the identification of target objects, which in this case are cylindrical containers filled with four different liquids. Particularly, this work employs a variation of an existing robust estimator called an L-estimator, which was first suggested by Koenker and Bassett. As pointed out by Au et al., a 'highlight interval' is an important feature, and it is closely related to many other important features that are known to be crucial for dolphin echolocation. A varied L-estimator described in this text is used to enhance the detection of highlight intervals, which eventually leads to a successful classification of echo signals. Part II extends the problem to two dimensions. Thanks to the advances in material and computer technology, various sonar imaging modalities are available on the market. By registering acoustic images from such video sequences, one can extract more information on the region of interest. Computer vision and image processing allowed application of robust statistics to the acoustic images produced by forward looking sonar systems, such as Dual-frequency Identification Sonar and ProViewer. The first use of robust statistics for sonar image enhancement in this text is in image registration. Random Sample Consensus (RANSAC) is widely used for image registration. The registration algorithm using RANSAC is optimized for sonar image registration, and the performance is studied. The second use of robust statistics is in fusing the images. It is shown that the maximum a posteriori fusion method can be formulated in a Kalman filter-like manner, and also that the resulting expression is identical to a W-estimator with a specific weight function.

  19. Statistical Learning in a Natural Language by 8-Month-Old Infants

    PubMed Central

    Pelucchi, Bruna; Hay, Jessica F.; Saffran, Jenny R.

    2013-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants’ ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition. PMID:19489896

  20. Statistical learning in a natural language by 8-month-old infants.

    PubMed

    Pelucchi, Bruna; Hay, Jessica F; Saffran, Jenny R

    2009-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants' ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition.

  1. Low power and type II errors in recent ophthalmology research.

    PubMed

    Khan, Zainab; Milko, Jordan; Iqbal, Munir; Masri, Moness; Almeida, David R P

    2016-10-01

    To investigate the power of unpaired t tests in prospective, randomized controlled trials when these tests failed to detect a statistically significant difference and to determine the frequency of type II errors. Systematic review and meta-analysis. We examined all prospective, randomized controlled trials published between 2010 and 2012 in 4 major ophthalmology journals (Archives of Ophthalmology, British Journal of Ophthalmology, Ophthalmology, and American Journal of Ophthalmology). Studies that used unpaired t tests were included. Power was calculated using the number of subjects in each group, standard deviations, and α = 0.05. The difference between control and experimental means was set to be (1) 20% and (2) 50% of the absolute value of the control's initial conditions. Power and Precision version 4.0 software was used to carry out calculations. Finally, the proportion of articles with type II errors was calculated. β = 0.3 was set as the largest acceptable value for the probability of type II errors. In total, 280 articles were screened. Final analysis included 50 prospective, randomized controlled trials using unpaired t tests. The median power of tests to detect a 50% difference between means was 0.9 and was the same for all 4 journals regardless of the statistical significance of the test. The median power of tests to detect a 20% difference between means ranged from 0.26 to 0.9 for the 4 journals. The median power of these tests to detect a 50% and 20% difference between means was 0.9 and 0.5 for tests that did not achieve statistical significance. A total of 14% and 57% of articles with negative unpaired t tests contained results with β > 0.3 when power was calculated for differences between means of 50% and 20%, respectively. A large proportion of studies demonstrated high probabilities of type II errors when detecting small differences between means. The power to detect small differences between means varies across journals. It is, therefore, worthwhile for authors to mention the minimum clinically important difference for individual studies. Journals can consider publishing statistical guidelines for authors to use. Day-to-day clinical decisions rely heavily on the evidence base formed by the plethora of studies available to clinicians. Prospective, randomized controlled clinical trials are highly regarded as robust study designs and are used to make important clinical decisions that directly affect patient care. The quality of study designs and statistical methods in major clinical journals is improving over time (1), and researchers and journals are becoming more attentive to the statistical methodologies incorporated by studies. The results of well-designed ophthalmic studies with robust methodologies, therefore, have the ability to modify the ways in which diseases are managed. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.
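
    A post-hoc power calculation of the kind described above can be reproduced with standard tools; the sketch below uses made-up numbers and assumes the statsmodels power module:

    ```python
    from statsmodels.stats.power import TTestIndPower

    # Illustrative values only: detectable difference set to 20% or 50% of the
    # control mean, alpha = 0.05, equal group sizes.
    control_mean, control_sd = 10.0, 4.0
    n_per_group = 25

    for fraction in (0.20, 0.50):
        effect_size = (fraction * control_mean) / control_sd   # Cohen's d
        power = TTestIndPower().power(effect_size=effect_size,
                                      nobs1=n_per_group,
                                      alpha=0.05, ratio=1.0)
        print(f"difference = {fraction:.0%} of control mean -> power = {power:.2f}")
    ```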

  2. Measuring continuous baseline covariate imbalances in clinical trial data

    PubMed Central

    Ciolino, Jody D.; Martin, Renee’ H.; Zhao, Wenle; Hill, Michael D.; Jauch, Edward C.; Palesch, Yuko Y.

    2014-01-01

    This paper presents and compares several methods of measuring continuous baseline covariate imbalance in clinical trial data. Simulations illustrate that though the t-test is an inappropriate method of assessing continuous baseline covariate imbalance, the test statistic itself is a robust measure in capturing imbalance in continuous covariate distributions. Guidelines to assess effects of imbalance on bias, type I error rate, and power for hypothesis test for treatment effect on continuous outcomes are presented, and the benefit of covariate-adjusted analysis (ANCOVA) is also illustrated. PMID:21865270
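
    Using the two-sample t statistic itself (rather than its p-value) as a descriptive imbalance measure can be illustrated in a few lines (simulated data, for illustration only):

    ```python
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    covariate_arm_a = rng.normal(50, 10, size=100)   # baseline covariate, arm A
    covariate_arm_b = rng.normal(52, 10, size=100)   # baseline covariate, arm B

    t_stat, _ = ttest_ind(covariate_arm_a, covariate_arm_b)
    print(f"imbalance measure |t| = {abs(t_stat):.2f}")   # larger |t| = more imbalance
    ```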

  3. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    PubMed Central

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-01-01

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

  5. [Quality of clinical studies published in the RBGO over one decade (1999-2009): methodological and ethical aspects and statistical procedures].

    PubMed

    de Sá, Joceline Cássia Ferezini; Marini, Gabriela; Gelaleti, Rafael Bottaro; da Silva, João Batista; de Azevedo, George Gantas; Rudge, Marilza Vieira Cunha

    2013-11-01

    To evaluate the evolution of the methodological and statistical design of publications in the Brazilian Journal of Gynecology and Obstetrics (RBGO) since resolution 196/96. A review of 133 articles published in 1999 (65) and 2009 (68) was performed by two independent reviewers with training in clinical epidemiology and methodology of scientific research. We included all original clinical articles, case and series reports and excluded editorials, letters to the editor, systematic reviews, experimental studies, opinion articles, as well as abstracts of theses and dissertations. Characteristics related to the methodological quality of the studies were analyzed in each article using a checklist that evaluated two criteria: methodological aspects and statistical procedures. We used descriptive statistics and the χ2 test for comparison of the two years. There was a difference between 1999 and 2009 regarding study and statistical design, with more accurate procedures and the use of more robust tests in 2009. In RBGO, we observed an evolution in the methods of published articles and a more in-depth use of the statistical analyses, with more sophisticated tests such as regression and multilevel analyses, which are essential techniques for the knowledge and planning of health interventions, leading to fewer interpretation errors.

  6. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
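
    A heavily simplified sketch of the general idea, robust local regression that down-weights outlier pixels inside a window, is given below (an iteratively reweighted Huber-type fit; not the paper's adaptive-window estimator):

    ```python
    import numpy as np

    def huber_weights(residuals, c=1.345):
        """Huber weights: 1 for small residuals, c/|r| for large ones."""
        scale = np.median(np.abs(residuals)) / 0.6745 + 1e-12   # robust MAD scale
        r = residuals / scale
        return np.where(np.abs(r) <= c, 1.0, c / np.abs(r))

    def robust_local_fit(patch, n_iter=5):
        """Fit intensity = a + b*x + c*y over a square patch by iteratively
        reweighted least squares; return the fitted value at the patch centre."""
        h, w = patch.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
        z = patch.ravel().astype(float)
        wts = np.ones_like(z)
        for _ in range(n_iter):
            sw = np.sqrt(wts)
            coef, *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
            wts = huber_weights(z - A @ coef)
        cx, cy = w // 2, h // 2
        return coef[0] + coef[1] * cx + coef[2] * cy
    ```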

  7. Association between DAOA gene polymorphisms and the risk of schizophrenia, bipolar disorder and depressive disorder.

    PubMed

    Tan, Jinjing; Lin, Yu; Su, Li; Yan, Yan; Chen, Qing; Jiang, Haiyun; Wei, Qiugui; Gu, Lian

    2014-06-03

    Schizophrenia (SCZ), bipolar disorder (BD) and depressive disorder (DD) are common psychiatric disorders, which show common genetic vulnerability. Previous gene-disease association studies have reported correlations between d-amino acid oxidase activator (DAOA) gene polymorphisms and the three psychiatric disorders. However, the findings were contradictory. A meta-analysis was therefore conducted to provide more robust investigations into DAOA polymorphisms and the risk of SCZ, BD and DD. This meta-analysis included 46 published studies up to July 2013, comprising 17,515 cases and 25,189 controls. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to evaluate the association between three specific DAOA SNPs and SCZ, BD and DD. Publication bias was tested by Begg's test and funnel plot, and heterogeneity was assessed by Cochran's chi-square-based Q statistic and the inconsistency index (I²). Moreover, the robustness of the findings was estimated by cumulative meta-analysis. DAOA genetic polymorphisms (M15, M18 and M23) were not found to confer a statistically significant increased risk of SCZ, BD or DD in the overall sample, or in Caucasians and Asians following subgroup analysis. The current study indicated that M15, M18 and M23 might not be risk factors for SCZ, BD or DD. However, further studies are required to provide robust evidence to estimate the association between DAOA polymorphisms and psychiatric disorders. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    USGS Publications Warehouse

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  9. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    NASA Astrophysics Data System (ADS)

    Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

    2013-08-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
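
    A hedged sketch of the classical single-low-outlier screen that the generalized test builds on (the multiple-outlier generalization itself is more involved; the critical value k_n must be taken from published tables or approximations):

    ```python
    import numpy as np

    def low_outlier_threshold(flows, k_n):
        """Grubbs-Beck-style low-outlier threshold for annual peak flows (> 0),
        computed on log10-transformed flows; k_n is the sample-size-dependent
        critical value supplied by the analyst."""
        log_q = np.log10(np.asarray(flows, dtype=float))
        return 10 ** (log_q.mean() - k_n * log_q.std(ddof=1))

    def flag_low_outliers(flows, k_n):
        """Return candidate potentially influential low flows and the threshold."""
        thr = low_outlier_threshold(flows, k_n)
        return [q for q in flows if q < thr], thr
    ```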

  10. Template protection and its implementation in 3D face recognition systems

    NASA Astrophysics Data System (ADS)

    Zhou, Xuebing

    2007-04-01

    As biometric recognition systems are widely applied in various application areas, security and privacy risks have recently attracted the attention of the biometric community. Template protection techniques prevent stored reference data from revealing private biometric information and enhance the security of biometric systems against attacks such as identity theft and cross matching. This paper concentrates on a template protection algorithm that merges methods from cryptography, error correction coding and biometrics. The key component of the algorithm is to convert biometric templates into binary vectors. It is shown that the binary vectors should be robust, uniformly distributed, statistically independent and collision-free so that authentication performance can be optimized and information leakage can be avoided. Depending on the statistical character of the biometric template, different approaches for transforming biometric templates into compact binary vectors are presented. The proposed methods are integrated into a 3D face recognition system and tested on the 3D facial images of the FRGC database. It is shown that the resulting binary vectors provide an authentication performance that is similar to the original 3D face templates. A high security level is achieved with reasonable false acceptance and false rejection rates of the system, based on an efficient statistical analysis. The algorithm estimates the statistical character of biometric templates from a number of biometric samples in the enrollment database. For the FRGC 3D face database, our tests show only a small difference in robustness and discriminative power between the classification results obtained under the assumption of uniformly distributed templates and those obtained under the assumption of Gaussian distributed templates.

  11. Design of experiments enhanced statistical process control for wind tunnel check standard testing

    NASA Astrophysics Data System (ADS)

    Phillips, Ben D.

    The current wind tunnel check standard testing program at NASA Langley Research Center is focused on increasing data quality, uncertainty quantification and overall control and improvement of wind tunnel measurement processes. The statistical process control (SPC) methodology employed in the check standard testing program allows for the tracking of variations in measurements over time as well as an overall assessment of facility health. While the SPC approach can and does provide researchers with valuable information, it has certain limitations in the areas of process improvement and uncertainty quantification. It is thought that, by utilizing design of experiments methodology in conjunction with the current SPC practices, one can characterize uncertainties more efficiently and robustly and develop enhanced process improvement procedures. In this research, methodologies were developed to generate regression models for wind tunnel calibration coefficients, balance force coefficients and wind tunnel flow angularities. The coefficients of these regression models were then tracked in statistical process control charts, giving a higher level of understanding of the processes. The methodology outlined is sufficiently generic that this research can be applied to any wind tunnel check standard testing program.

  12. Frequentist Model Averaging in Structural Equation Modelling.

    PubMed

    Jin, Shaobo; Ankargren, Sebastian

    2018-06-04

    Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

  13. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed with a squared correlation coefficient (r(2)) of 0.891 and a cross-validated r(2) (r(2)cv) of 0.825. The developed model revealed that electronic, shape, size, geometry, substitution information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2)pred = 0.849) as well as with Tropsha's tests for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for external-set molecules. The model was statistically robust and had good predictive power, and it can be successfully utilized for screening new molecules.
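
    A minimal sketch of how such figures of merit can be computed for a multiple-linear-regression QSAR model is given below (Python, scikit-learn); the descriptor matrices and activities are random placeholders, and scoring the external set with r2_score is only one common convention for r(2)pred, not necessarily the exact formula used in the study.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import LeaveOneOut, cross_val_predict
      from sklearn.metrics import r2_score

      def qsar_mlr_statistics(X_train, y_train, X_test, y_test):
          # Fit an MLR QSAR model and report fit r2, leave-one-out q2, and
          # external-set r2_pred. X_*: descriptor matrices; y_*: activities.
          model = LinearRegression().fit(X_train, y_train)
          r2 = r2_score(y_train, model.predict(X_train))
          loo_pred = cross_val_predict(LinearRegression(), X_train, y_train, cv=LeaveOneOut())
          q2 = r2_score(y_train, loo_pred)
          r2_pred = r2_score(y_test, model.predict(X_test))
          return r2, q2, r2_pred

      # toy usage with random descriptors standing in for real QSAR descriptors
      rng = np.random.default_rng(0)
      X = rng.normal(size=(30, 4))
      y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.2, size=30)
      r2, q2, r2_pred = qsar_mlr_statistics(X[:24], y[:24], X[24:], y[24:])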

  14. Characterizing the Joint Effect of Diverse Test-Statistic Correlation Structures and Effect Size on False Discovery Rates in a Multiple-Comparison Study of Many Outcome Measures

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Ploutz-Snyder, Robert; Fiedler, James

    2011-01-01

    In their 2009 Annals of Statistics paper, Gavrilov, Benjamini, and Sarkar report the results of a simulation assessing the robustness of their adaptive step-down procedure (GBS) for controlling the false discovery rate (FDR) when normally distributed test statistics are serially correlated. In this study we extend the investigation to the case of multiple comparisons involving correlated non-central t-statistics, in particular when several treatments or time periods are being compared to a control in a repeated-measures design with many dependent outcome measures. In addition, we consider several dependence structures other than serial correlation and illustrate how the FDR depends on the interaction between effect size and the type of correlation structure as indexed by Foerstner's distance metric from the identity matrix. The relationship between the correlation matrix R of the original dependent variables and the correlation matrix of the associated t-statistics is also studied. In general, the latter depends not only on R, but also on sample size and the signed effect sizes for the multiple comparisons.
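
    To make the setting concrete, the short Python simulation below generates exchangeably correlated outcome measures, computes two-sample t-statistics against a control group, applies Benjamini-Hochberg adjustment, and reports the empirical false discovery proportion; it is an editorial illustration of the general phenomenon, not the GBS procedure or the paper's simulation design, and every parameter value is hypothetical.

      import numpy as np
      from scipy import stats
      from statsmodels.stats.multitest import multipletests

      rng = np.random.default_rng(6)
      m, n, rho, effect, n_alt = 200, 20, 0.5, 0.8, 20
      cov = (1 - rho) * np.eye(m) + rho * np.ones((m, m))   # exchangeable correlation
      is_alt = np.arange(m) < n_alt                          # first n_alt outcomes carry true effects
      mean_trt = np.where(is_alt, effect, 0.0)

      fdp = []
      for _ in range(200):
          ctrl = rng.multivariate_normal(np.zeros(m), cov, size=n)
          trt = rng.multivariate_normal(mean_trt, cov, size=n)
          _, p = stats.ttest_ind(trt, ctrl, axis=0)
          reject = multipletests(p, alpha=0.05, method="fdr_bh")[0]
          fdp.append((reject & ~is_alt).sum() / max(reject.sum(), 1))

      print("empirical FDR under correlation:", np.mean(fdp))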

  15. Robust kernel representation with statistical local features for face recognition.

    PubMed

    Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

    2013-06-01

    Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.

  16. Robust joint score tests in the application of DNA methylation data analysis.

    PubMed

    Li, Xuan; Fu, Yuejiao; Wang, Xiaogang; Qiu, Weiliang

    2018-05-18

    Recently, differential variability has been shown to be valuable in evaluating the association of DNA methylation with the risks of complex human diseases. Statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential methylation level. Anh and Wang (2013) proposed a joint score test (AW) to detect differential methylation and differential variability simultaneously. However, the AW method seems to be quite conservative and has not been fully compared with existing joint tests. We proposed three improved joint score tests, namely iAW.Lev, iAW.BF, and iAW.TM, and have made extensive comparisons with the joint likelihood ratio test (jointLRT), the Kolmogorov-Smirnov (KS) test, and the AW test. Systematic simulation studies showed that: 1) the three improved tests performed better (i.e., larger power while keeping nominal Type I error rates) than the other three tests for data with outliers and different variances between cases and controls; 2) for data from normal distributions, the three improved tests had slightly lower power than jointLRT and AW. The analyses of two Illumina HumanMethylation27 data sets GSE37020 and GSE20080 and one Illumina Infinium MethylationEPIC data set GSE107080 demonstrated that the three improved tests had higher true validation rates than jointLRT, KS, and AW. The three proposed joint score tests are robust against violation of the normality assumption and the presence of outlying observations in comparison with the other three existing tests. Among the three proposed tests, iAW.BF seems to be the most robust and effective one across all simulated scenarios and in the real data analyses.
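
    For orientation only, the Python sketch below pairs a Welch t-test for the methylation level with a Brown-Forsythe (median-centred Levene) test for the variability and pools the two p-values with Fisher's method; it is a generic analogue of a joint mean-and-variance test, not the iAW statistics proposed in the paper, and Fisher's combination treats the two component p-values as approximately independent.

      import numpy as np
      from scipy import stats

      def joint_mean_variance_test(cases, controls):
          # Illustrative joint test for differential methylation level and
          # differential variability at one CpG site (not the iAW statistics).
          _, p_mean = stats.ttest_ind(cases, controls, equal_var=False)   # Welch t-test
          _, p_var = stats.levene(cases, controls, center='median')        # Brown-Forsythe
          _, p_joint = stats.combine_pvalues([p_mean, p_var], method='fisher')
          return p_mean, p_var, p_joint

      # toy usage on simulated M-values for a single CpG site
      rng = np.random.default_rng(1)
      p_m, p_v, p_j = joint_mean_variance_test(rng.normal(0.3, 1.5, 40), rng.normal(0.0, 1.0, 40))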

  17. The longevity of statistical learning: When infant memory decays, isolated words come to the rescue.

    PubMed

    Karaman, Ferhat; Hay, Jessica F

    2018-02-01

    Research over the past 2 decades has demonstrated that infants are equipped with remarkable computational abilities that allow them to find words in continuous speech. Infants can encode information about the transitional probability (TP) between syllables to segment words from artificial and natural languages. As previous research has tested infants immediately after familiarization, infants' ability to retain sequential statistics beyond the immediate familiarization context remains unknown. Here, we examine infants' memory for statistically defined words 10 min after familiarization with an Italian corpus. Eight-month-old English-learning infants were familiarized with Italian sentences that contained 4 embedded target words-2 words had high internal TP (HTP, TP = 1.0) and 2 had low TP (LTP, TP = .33)-and were tested on their ability to discriminate HTP from LTP words using the Headturn Preference Procedure. When tested after a 10-min delay, infants failed to discriminate HTP from LTP words, suggesting that memory for statistical information likely decays over even short delays (Experiment 1). Experiments 2-4 were designed to test whether experience with isolated words selectively reinforces memory for statistically defined (i.e., HTP) words. When 8-month-olds were given additional experience with isolated tokens of both HTP and LTP words immediately after familiarization, they looked significantly longer on HTP than LTP test trials 10 min later. Although initial representations of statistically defined words may be fragile, our results suggest that experience with isolated words may reinforce the output of statistical learning by helping infants create more robust memories for words with strong versus weak co-occurrence statistics. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  18. Gender discrimination and prediction on the basis of facial metric information.

    PubMed

    Fellous, J M

    1997-07-01

    Horizontal and vertical facial measurements are statistically independent. Discriminant analysis shows that five such normalized distances explain over 95% of the gender differences of "training" samples and predict the gender of 90% of novel test faces exhibiting various facial expressions. The robustness of the method and its results are assessed. It is argued that these distances (termed fiducial) are compatible with those found experimentally by psychophysical and neurophysiological studies. Consequently, partial explanations for the effects observed in these experiments can be found in the intrinsic statistical nature of the facial stimuli used.

  19. Statistical Control Paradigm for Aerospace Structures Under Impulsive Disturbances

    DTIC Science & Technology

    2006-08-03

    Results indicate that the existing attitude control system with an innovative and robust statistical controller design shows significant promise for use in attitude hold mode operation. Three thrusters are used to control the attitude of the satellite.

  20. An Analytical Investigation of the Robustness and Power of ANCOVA with the Presence of Heterogeneous Regression Slopes.

    ERIC Educational Resources Information Center

    Hollingsworth, Holly H.

    This study shows that the test statistic for Analysis of Covariance (ANCOVA) has a noncentral F-distribution with noncentrality parameter equal to zero if and only if the regression planes are homogeneous and/or the vector of overall covariate means is the null vector. The effect of heterogeneous regression slope parameters is to either increase…
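
    As a practical companion to this result, the hedged Python/statsmodels sketch below checks the homogeneity-of-slopes assumption by comparing the standard ANCOVA model against a model with a group-by-covariate interaction on simulated data; the variable names and data are hypothetical, and the noncentral-F derivation itself is not reproduced.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n = 60
      df = pd.DataFrame({
          "group": np.repeat(["control", "treatment"], n // 2),
          "covariate": rng.normal(size=n),
      })
      # simulate an outcome whose slope on the covariate differs between groups
      slope = np.where(df["group"] == "treatment", 1.5, 0.5)
      df["y"] = 2.0 + slope * df["covariate"] + rng.normal(scale=1.0, size=n)

      ancova = smf.ols("y ~ C(group) + covariate", data=df).fit()
      full = smf.ols("y ~ C(group) * covariate", data=df).fit()
      # F-test of the interaction: a significant result indicates heterogeneous
      # slopes, in which case the usual ANCOVA group test can be misleading
      print(sm.stats.anova_lm(ancova, full))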

  1. Variability and robustness of scatterers in HRR/ISAR ground target data and its influence on the ATR performance

    NASA Astrophysics Data System (ADS)

    Schumacher, R.; Schimpf, H.; Schiller, J.

    2011-06-01

    The most challenging problem of Automatic Target Recognition (ATR) is the extraction of robust and independent target features which describe the target unambiguously. These features have to be robust and invariant in different senses: in time, between aspect views (azimuth and elevation angle), between target motion (translation and rotation) and between different target variants. Especially for ground moving targets in military applications, irregular target motion is typical, so that a strong variation of the backscattered radar signal with azimuth and elevation angle makes the extraction of stable and robust features most difficult. For ATR based on High Range Resolution (HRR) profiles and / or Inverse Synthetic Aperture Radar (ISAR) images it is crucial that the reference dataset consists of stable and robust features, which will depend, among other factors, on the target aspect and depression angle. Here it is important to find an adequate data grid for efficient data coverage in the reference dataset for ATR. In this paper the variability of the backscattered radar signals of target scattering centers is analyzed for different HRR profiles and ISAR images from measured turntable datasets of ground targets under controlled conditions. In particular, the dependency of the features on the elevation angle is analyzed with regard to the ATR of large strip SAR data covering a wide range of depression angles, using available (I)SAR datasets as reference. In this work the robustness of these scattering centers is analyzed by extracting their amplitude, phase and position. To this end, turntable measurements under controlled conditions were performed on an artificial military reference object called STANDCAM. Measures referring to variability, similarity, robustness and separability regarding the scattering centers are defined. The dependency of the scattering behaviour with respect to azimuth and elevation variations is analyzed. Additionally, generic types of features (geometrical, statistical), which can be derived especially from (I)SAR images, are applied to the ATR task. Subsequently, the dependence of individual feature values, as well as of the feature statistics, on aspect (i.e. azimuth and elevation) is presented. The Kolmogorov-Smirnov distance is used to show how the feature statistics are influenced by varying elevation angles. Finally, confusion matrices are computed for the STANDCAM target at all eleven elevation angles. This helps to assess the robustness of ATR performance under the influence of aspect angle deviations between training set and test set.

  2. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  3. The MAX Statistic is Less Powerful for Genome Wide Association Studies Under Most Alternative Hypotheses.

    PubMed

    Shifflett, Benjamin; Huang, Rong; Edland, Steven D

    2017-01-01

    Genotypic association studies are prone to inflated type I error rates if multiple hypothesis testing is performed, e.g., sequentially testing for recessive, multiplicative, and dominant risk. Alternatives to multiple hypothesis testing include the model independent genotypic χ2 test, the efficiency robust MAX statistic, which corrects for multiple comparisons but with some loss of power, or a single Armitage test for multiplicative trend, which has optimal power when the multiplicative model holds but with some loss of power when dominant or recessive models underlie the genetic association. We used Monte Carlo simulations to describe the relative performance of these three approaches under a range of scenarios. All three approaches maintained their nominal type I error rates. The genotypic χ2 and MAX statistics were more powerful when testing a strictly recessive genetic effect or when testing a dominant effect when the allele frequency was high. The Armitage test for multiplicative trend was most powerful for the broad range of scenarios where heterozygote risk is intermediate between recessive and dominant risk. Moreover, all tests had limited power to detect recessive genetic risk unless the sample size was large, and conversely all tests were relatively well powered to detect dominant risk. Taken together, these results suggest the general utility of the multiplicative trend test when the underlying genetic model is unknown.
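
    For readers unfamiliar with these statistics, the Python sketch below computes the Cochran-Armitage trend Z under additive, dominant, and recessive genotype codings and takes the largest absolute value as the MAX statistic; the genotype counts are invented, and a proper p-value for MAX would additionally have to account for the correlation among the three component tests (e.g., by permutation or a trivariate normal approximation), which is not shown.

      import numpy as np
      from scipy import stats

      def armitage_trend_z(case_counts, control_counts, scores):
          # Cochran-Armitage trend Z for genotype counts ordered (aa, Aa, AA).
          r = np.asarray(case_counts, float)      # cases per genotype
          s = np.asarray(control_counts, float)   # controls per genotype
          x = np.asarray(scores, float)
          n = r + s
          N, R = n.sum(), r.sum()
          p = R / N
          T = np.sum(x * (r - p * n))
          var_T = p * (1 - p) * (np.sum(n * x ** 2) - np.sum(n * x) ** 2 / N)
          return T / np.sqrt(var_T)

      cases, controls = [30, 90, 80], [60, 100, 40]           # hypothetical counts
      codings = {"additive": [0, 1, 2], "dominant": [0, 1, 1], "recessive": [0, 0, 1]}
      z = {name: armitage_trend_z(cases, controls, x) for name, x in codings.items()}
      p_trend = 2 * stats.norm.sf(abs(z["additive"]))   # single multiplicative-trend test
      max_stat = max(abs(v) for v in z.values())        # MAX statistic (p-value needs a
                                                        # correction for the correlated tests)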

  4. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

    PubMed

    Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

    2018-05-01

    Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.

  5. Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

    NASA Astrophysics Data System (ADS)

    Milroy, Daniel J.; Baker, Allison H.; Hammerling, Dorit M.; Jessup, Elizabeth R.

    2018-02-01

    The Community Earth System Model Ensemble Consistency Test (CESM-ECT) suite was developed as an alternative to requiring bitwise identical output for quality assurance. This objective test provides a statistical measurement of consistency between an accepted ensemble created by small initial temperature perturbations and a test set of CESM simulations. In this work, we extend the CESM-ECT suite with an inexpensive and robust test for ensemble consistency that is applied to Community Atmospheric Model (CAM) output after only nine model time steps. We demonstrate that adequate ensemble variability is achieved with instantaneous variable values at the ninth step, despite rapid perturbation growth and heterogeneous variable spread. We refer to this new test as the Ultra-Fast CAM Ensemble Consistency Test (UF-CAM-ECT) and demonstrate its effectiveness in practice, including its ability to detect small-scale events and its applicability to the Community Land Model (CLM). The new ultra-fast test facilitates CESM development, porting, and optimization efforts, particularly when used to complement information from the original CESM-ECT suite of tools.

  6. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    PubMed

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal subjects and cardiac patients. The large, open-access database made available in the Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected at an Indian hospital using our in-house smartphone-based digital stethoscope, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.

  7. A Review of Some Aspects of Robust Inference for Time Series.

    DTIC Science & Technology

    1984-09-01

    A Review of Some Aspects of Robust Inference for Time Series, R. D. Martin, Technical Report, September 1984, Department of Statistics, University of Washington, Seattle. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data.

  8. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein are obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external to the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.

  9. Biostatistical analysis of quantitative immunofluorescence microscopy images.

    PubMed

    Giles, C; Albrecht, M A; Lam, V; Takechi, R; Mamo, J C

    2016-12-01

    Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution fitting procedure and compare several statistical tests, outlining their potential advantages/disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporates the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burden through experimental optimization. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.

  10. How allele frequency and study design affect association test statistics with misrepresentation errors.

    PubMed

    Escott-Price, Valentina; Ghodsi, Mansoureh; Schmidt, Karl Michael

    2014-04-01

    We evaluate the effect of genotyping errors on the type-I error of a general association test based on genotypes, showing that, in the presence of errors in the case and control samples, the test statistic asymptotically follows a scaled non-central χ2 distribution. We give explicit formulae for the scaling factor and non-centrality parameter for the symmetric allele-based genotyping error model and for additive and recessive disease models. They show how genotyping errors can lead to a significantly higher false-positive rate, growing with sample size, compared with the nominal significance levels. The strength of this effect depends very strongly on the population distribution of the genotype, with a pronounced effect in the case of rare alleles, and a great robustness against error in the case of large minor allele frequency. We also show how these results can be used to correct p-values.

  11. Hypothesis testing of scientific Monte Carlo calculations.

    PubMed

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.
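
    As a minimal illustration of the idea, the Python sketch below applies a two-sided z-test to check that a Monte Carlo estimate of a quantity with a known exact value is consistent with that value within its statistical error; it is an editorial example rather than the paper's full framework, and for correlated samples (e.g., Markov-chain output) the standard error would additionally need binning or autocorrelation corrections.

      import numpy as np
      from scipy import stats

      def mc_consistency_test(samples, exact_value, alpha=0.01):
          # Two-sided z-test that the Monte Carlo estimate agrees with a known
          # exact reference value within its statistical error (i.i.d. samples assumed).
          samples = np.asarray(samples, dtype=float)
          est = samples.mean()
          stderr = samples.std(ddof=1) / np.sqrt(len(samples))
          z = (est - exact_value) / stderr
          p = 2 * stats.norm.sf(abs(z))
          return p >= alpha, est, stderr, p

      # toy usage: estimate of the mean of a uniform(0, 1) variable (exact value 0.5)
      rng = np.random.default_rng(3)
      passed, est, err, p = mc_consistency_test(rng.uniform(size=100_000), 0.5)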

  12. Hypothesis testing of scientific Monte Carlo calculations

    NASA Astrophysics Data System (ADS)

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.

  13. Robustly detecting differential expression in RNA sequencing data using observation weights

    PubMed Central

    Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

    2014-01-01

    A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412

  14. Suitability and setup of next-generation sequencing-based method for taxonomic characterization of aquatic microbial biofilm.

    PubMed

    Bakal, Tomas; Janata, Jiri; Sabova, Lenka; Grabic, Roman; Zlabek, Vladimir; Najmanova, Lucie

    2018-06-16

    A robust and widely applicable method for sampling of aquatic microbial biofilm and further sample processing is presented. The method is based on next-generation sequencing of V4-V5 variable regions of 16S rRNA gene and further statistical analysis of sequencing data, which could be useful not only to investigate taxonomic composition of biofilm bacterial consortia but also to assess aquatic ecosystem health. Five artificial materials commonly used for biofilm growth (glass, stainless steel, aluminum, polypropylene, polyethylene) were tested to determine the one giving most robust and reproducible results. The effect of used sampler material on total microbial composition was not statistically significant; however, the non-plastic materials (glass, metal) gave more stable outputs without irregularities among sample parallels. The bias of the method is assessed with respect to the employment of a non-quantitative step (PCR amplification) to obtain quantitative results (relative abundance of identified taxa). This aspect is often overlooked in ecological and medical studies. We document that sequencing of a mixture of three merged primary PCR reactions for each sample and further evaluation of median values from three technical replicates for each sample enables to overcome this bias and gives robust and repeatable results well distinguishing among sampling localities and seasons.

  15. 3D Face Modeling Using the Multi-Deformable Method

    PubMed Central

    Hwang, Jinkyu; Yu, Sunjin; Kim, Joongrock; Lee, Sangyoun

    2012-01-01

    In this paper, we focus on the problem of the accuracy performance of 3D face modeling techniques using corresponding features in multiple views, which is quite sensitive to feature extraction errors. To solve the problem, we adopt a statistical model-based 3D face modeling approach in a mirror system consisting of two mirrors and a camera. The overall procedure of our 3D facial modeling method has two primary steps: 3D facial shape estimation using a multiple 3D face deformable model and texture mapping using seamless cloning that is a type of gradient-domain blending. To evaluate our method's performance, we generate 3D faces of 30 individuals and then carry out two tests: accuracy test and robustness test. Our method shows not only highly accurate 3D face shape results when compared with the ground truth, but also robustness to feature extraction errors. Moreover, 3D face rendering results intuitively show that our method is more robust to feature extraction errors than other 3D face modeling methods. An additional contribution of our method is that a wide range of face textures can be acquired by the mirror system. By using this texture map, we generate realistic 3D face for individuals at the end of the paper. PMID:23201976

  16. Fitting a three-parameter lognormal distribution with applications to hydrogeochemical data from the National Uranium Resource Evaluation Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, V.E.

    1979-10-01

    The standard maximum likelihood and moment estimation procedures are shown to have some undesirable characteristics for estimating the parameters in a three-parameter lognormal distribution. A class of goodness-of-fit estimators is found which provides a useful alternative to the standard methods. The class of goodness-of-fit tests considered include the Shapiro-Wilk and Shapiro-Francia tests which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted-order statistic estimators are compared to the standard procedures in Monte Carlo simulations. Bias and robustness of the procedures are examined and example data sets analyzed including geochemical data from the National Uranium Resource Evaluation Program.
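
    One way to realize the goodness-of-fit estimation idea is sketched below in Python/SciPy: the threshold (location) parameter is chosen to maximize the Shapiro-Wilk W statistic of the log-shifted data, after which the remaining two parameters follow from the shifted log sample; this is a hedged illustration of the weighted-order-statistic approach, not necessarily the exact estimator studied in the report, and the simulated data are hypothetical.

      import numpy as np
      from scipy import stats

      def fit_lognormal3_by_gof(x, n_grid=200):
          # Fit a three-parameter lognormal by picking the threshold tau that
          # maximizes the Shapiro-Wilk W of log(x - tau); mu and sigma are then
          # the mean and standard deviation of the shifted log data.
          x = np.sort(np.asarray(x, dtype=float))
          span = x[-1] - x[0]
          taus = np.linspace(x[0] - span, x[0] - 1e-9 * span, n_grid)
          best = max(taus, key=lambda t: stats.shapiro(np.log(x - t)).statistic)
          logs = np.log(x - best)
          return best, logs.mean(), logs.std(ddof=1)

      # toy usage: recover parameters from simulated data with threshold 5
      rng = np.random.default_rng(4)
      sample = 5.0 + rng.lognormal(mean=1.0, sigma=0.4, size=300)
      tau_hat, mu_hat, sigma_hat = fit_lognormal3_by_gof(sample)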

  17. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    PubMed

    Xu, Maoqi; Chen, Liang

    2018-01-01

    The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex disease. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soffientini, Chiara Dolores, E-mail: chiaradolores.soffientini@polimi.it; Baselli, Giuseppe; De Bernardi, Elisabetta

    Purpose: Quantitative 18F-fluorodeoxyglucose positron emission tomography is limited by the uncertainty in lesion delineation due to poor SNR, low resolution, and partial volume effects, subsequently impacting oncological assessment, treatment planning, and follow-up. The present work develops and validates a segmentation algorithm based on statistical clustering. The introduction of constraints based on background features and contiguity priors is expected to improve robustness vs clinical image characteristics such as lesion dimension, noise, and contrast level. Methods: An eight-class Gaussian mixture model (GMM) clustering algorithm was modified by constraining the mean and variance parameters of four background classes according to the previous analysis of a lesion-free background volume of interest (background modeling). Hence, expectation maximization operated only on the four classes dedicated to lesion detection. To favor the segmentation of connected objects, a further variant was introduced by inserting priors relevant to the classification of neighbors. The algorithm was applied to simulated datasets and acquired phantom data. Feasibility and robustness toward initialization were assessed on a clinical dataset manually contoured by two expert clinicians. Comparisons were performed with respect to a standard eight-class GMM algorithm and to four different state-of-the-art methods in terms of volume error (VE), Dice index, classification error (CE), and Hausdorff distance (HD). Results: The proposed GMM segmentation with background modeling outperformed standard GMM and all the other tested methods. Medians of accuracy indexes were VE <3%, Dice >0.88, CE <0.25, and HD <1.2 in simulations; VE <23%, Dice >0.74, CE <0.43, and HD <1.77 in phantom data. Robustness toward image statistic changes (±15%) was shown by the low index changes: <26% for VE, <17% for Dice, and <15% for CE. Finally, robustness toward the user-dependent volume initialization was demonstrated. The inclusion of the spatial prior improved segmentation accuracy only for lesions surrounded by heterogeneous background: in the relevant simulation subset, the median VE significantly decreased from 13% to 7%. Results on clinical data were found in accordance with simulations, with absolute VE <7%, Dice >0.85, CE <0.30, and HD <0.81. Conclusions: The sole introduction of constraints based on background modeling outperformed standard GMM and the other tested algorithms. Insertion of a spatial prior improved the accuracy for realistic cases of objects in heterogeneous backgrounds. Moreover, robustness against initialization supports the applicability in a clinical setting. In conclusion, application-driven constraints can generally improve the capabilities of GMM and statistical clustering algorithms.

  19. Adaptive time-stepping Monte Carlo integration of Coulomb collisions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarkimaki, Konsta; Hirvijoki, E.; Terava, J.

    Here, we report an accessible and robust tool for evaluating the effects of Coulomb collisions on a test particle in a plasma that obeys Maxwell–Jüttner statistics. The implementation is based on the Beliaev–Budker collision integral which allows both the test particle and the background plasma to be relativistic. The integration method supports adaptive time stepping, which is shown to greatly improve the computational efficiency. The Monte Carlo method is implemented for both the three-dimensional particle momentum space and the five-dimensional guiding center phase space.

  20. Adaptive time-stepping Monte Carlo integration of Coulomb collisions

    DOE PAGES

    Sarkimaki, Konsta; Hirvijoki, E.; Terava, J.

    2017-10-12

    Here, we report an accessible and robust tool for evaluating the effects of Coulomb collisions on a test particle in a plasma that obeys Maxwell–Jüttner statistics. The implementation is based on the Beliaev–Budker collision integral which allows both the test particle and the background plasma to be relativistic. The integration method supports adaptive time stepping, which is shown to greatly improve the computational efficiency. The Monte Carlo method is implemented for both the three-dimensional particle momentum space and the five-dimensional guiding center phase space.

  1. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network

    PubMed Central

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-01

    A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. The ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and to remove the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of the ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006

  2. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network.

    PubMed

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-08

    A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. The ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and to remove the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of the ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method.

  3. GTest: a software tool for graphical assessment of empirical distributions' Gaussianity.

    PubMed

    Barca, E; Bruno, E; Bruno, D E; Passarella, G

    2016-03-01

    In the present paper, the novel software GTest is introduced, designed for testing the normality of a user-specified empirical distribution. It has been implemented with two unusual characteristics: the first is the user option of selecting four different versions of the normality test, each of them suited to a specific dataset or goal, and the second is the inferential paradigm that informs the output of such tests, which is basically graphical and intrinsically self-explanatory. The concept of inference-by-eye is an emerging inferential approach which will find successful application in the near future due to the growing need to widen the audience of users of statistical methods to people with informal statistical skills. For instance, the latest European regulation concerning environmental issues introduced strict protocols for data handling (data quality assurance, outlier detection, etc.) and information exchange (areal statistics, trend detection, etc.) between regional and central environmental agencies. Therefore, more and more frequently, laboratory and field technicians will be requested to utilize complex software applications for subjecting data coming from monitoring, surveying or laboratory activities to specific statistical analyses. Unfortunately, inferential statistics, which actually influence the decisional processes for the correct management of environmental resources, are often implemented in a way that expresses their outcomes in numerical form with brief comments in strict statistical jargon (degrees of freedom, level of significance, accepted/rejected H0, etc.). The interpretation of such outcomes is therefore often difficult for people with limited statistical knowledge. In this framework, the paradigm of visual inference can help fill this gap, providing outcomes in self-explanatory graphical form with a brief comment in common language. The difficulties experienced by colleagues, and their request for an effective tool for addressing those difficulties, motivated us to adopt the inference-by-eye paradigm and to implement an easy-to-use, quick and reliable statistical tool. GTest visualizes its outcomes as a modified version of the Q-Q plot. The application has been developed in Visual Basic for Applications (VBA) within MS Excel 2010, which proved to have all the required characteristics of robustness and reliability. GTest provides true graphical normality tests which are as reliable as any statistical quantitative approach but much easier to understand. The Q-Q plots have been integrated with the outlining of an acceptance region around the representation of the theoretical distribution, defined in accordance with the alpha level of significance and the data sample size. The test decision rule is the following: if the empirical scatterplot falls completely within the acceptance region, then it can be concluded that the empirical distribution fits the theoretical one at the given alpha level. A comprehensive case study has been carried out with simulated and real-world data in order to check the robustness and reliability of the software.

  4. Towards improved behavioural testing in aquatic toxicology: Acclimation and observation times are important factors when designing behavioural tests with fish.

    PubMed

    Melvin, Steven D; Petit, Marie A; Duvignacq, Marion C; Sumpter, John P

    2017-08-01

    The quality and reproducibility of science has recently come under scrutiny, with criticisms spanning disciplines. In aquatic toxicology, behavioural tests are currently an area of controversy since inconsistent findings have been highlighted and attributed to poor quality science. The problem likely relates to limitations to our understanding of basic behavioural patterns, which can influence our ability to design statistically robust experiments yielding ecologically relevant data. The present study takes a first step towards understanding baseline behaviours in fish, including how basic choices in experimental design might influence behavioural outcomes and interpretations in aquatic toxicology. Specifically, we explored how fish acclimate to behavioural arenas and how different lengths of observation time impact estimates of basic swimming parameters (i.e., average, maximum and angular velocity). We performed a semi-quantitative literature review to place our findings in the context of the published literature describing behavioural tests with fish. Our results demonstrate that fish fundamentally change their swimming behaviour over time, and that acclimation and observational timeframes may therefore have implications for influencing both the ecological relevance and statistical robustness of behavioural toxicity tests. Our review identified 165 studies describing behavioural responses in fish exposed to various stressors, and revealed that the majority of publications documenting fish behavioural responses report extremely brief acclimation times and observational durations, which helps explain inconsistencies identified across studies. We recommend that researchers applying behavioural tests with fish, and other species, apply a similar framework to better understand baseline behaviours and the implications of design choices for influencing study outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have been used extensively in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all the information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on the environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in the abundance of one or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust for standardizing all samples to a common size. Even the simplest method of reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
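
    To make the rarefaction recommendation concrete, the Python/SciPy sketch below computes the Shannon-Wiener index and Hurlbert's expected rarefied richness for two hypothetical communities standardized to a common sample size; the abundance vectors are invented for illustration.

      import numpy as np
      from scipy.special import gammaln

      def shannon_index(counts):
          # Shannon-Wiener diversity H' from species abundance counts
          p = np.asarray(counts, dtype=float)
          p = p[p > 0] / p.sum()
          return float(-(p * np.log(p)).sum())

      def rarefied_richness(counts, n):
          # Expected number of species in a random subsample of n individuals
          # (Hurlbert's rarefaction), using log binomial coefficients for stability
          counts = np.asarray(counts, dtype=int)
          N = counts.sum()
          def log_comb(a, b):
              return gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1)
          prob_absent = np.exp(log_comb(N - counts, n) - log_comb(N, n))
          prob_absent[N - counts < n] = 0.0    # such species are certain to appear
          return float((1.0 - prob_absent).sum())

      community_a = [50, 30, 10, 5, 3, 2]                      # hypothetical abundances
      community_b = [200, 150, 90, 40, 10, 5, 3, 2, 1, 1]
      n_common = min(sum(community_a), sum(community_b))
      h_a, h_b = shannon_index(community_a), shannon_index(community_b)
      s_a, s_b = rarefied_richness(community_a, n_common), rarefied_richness(community_b, n_common)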

  6. Generalized Hurst exponent estimates differentiate EEG signals of healthy and epileptic patients

    NASA Astrophysics Data System (ADS)

    Lahmiri, Salim

    2018-01-01

    The aim of our current study is to check whether the multifractal patterns of the electroencephalographic (EEG) signals of normal and epileptic patients are statistically similar or different. In this regard, the generalized Hurst exponent (GHE) method is used for robust estimation of the multifractality in each type of EEG signal, and three powerful statistical tests are performed to check for differences between the GHEs estimated from healthy control subjects and those estimated from epileptic patients. The obtained results show that multifractality is present in both types of EEG signals. In particular, the degree of fractality was found to be more pronounced in short variations of normal EEG signals than in short variations of EEG signals with seizure-free intervals. In contrast, it is more pronounced in long variations of EEG signals with seizure-free intervals than in normal EEG signals. Importantly, both parametric and nonparametric statistical tests show strong evidence that the estimated GHEs of normal EEG signals are statistically significantly different from those of signals with seizure-free intervals. Therefore, GHEs can be efficiently used to distinguish between healthy subjects and patients suffering from epilepsy.
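
    For reference, the Python sketch below estimates the generalized Hurst exponent H(q) from the scaling of the q-th order structure function S_q(tau) ~ tau^(q*H(q)); the lag range and moment order follow common practice rather than the study's exact settings, and the random-walk example is purely illustrative.

      import numpy as np

      def generalized_hurst(x, q=2, taus=range(1, 20)):
          # Estimate H(q) of a 1-D signal from the scaling of the q-th order
          # structure function S_q(tau) = <|x(t+tau) - x(t)|^q> ~ tau^(q*H(q)).
          x = np.asarray(x, dtype=float)
          taus = np.asarray(list(taus))
          s_q = np.array([np.mean(np.abs(x[t:] - x[:-t]) ** q) for t in taus])
          slope, _ = np.polyfit(np.log(taus), np.log(s_q), 1)
          return slope / q

      # toy usage: H(2) of a random walk should be close to 0.5
      rng = np.random.default_rng(5)
      h2 = generalized_hurst(np.cumsum(rng.normal(size=5000)), q=2)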

  7. VHSIC/VHSIC-Like Reliability Prediction Modeling

    DTIC Science & Technology

    1989-10-01

    Prediction would require knowledge of event statistics as well as device robustness. Additionally, although this is primarily a theoretical, bottom... Degradation in Section 5.3. P = power; PDIP = plastic DIP; P(f) = probability of failure due to EOS or ESD; P(f|c) = probability of failure given contact from an... The results of those stresses are tabulated by device stress, part number, power dissipation, manufacturer, test type, part description, junction temperature, and package type.

  8. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  9. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods for analysing multiple traits, most are suitable mainly for detecting common variants associated with multiple traits. Because of the low minor allele frequencies of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA), originally developed for a single trait, to test association between multiple traits and rare variants in a given region. For each rare variant in the region, we use a reverse regression model to test its association with the multiple traits and obtain the P value of the single-variant test. We then take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effect among causal variants.
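
    A minimal sketch of the combination step only: per-variant p-values (assumed to come from single-variant reverse-regression tests) are combined as a weighted sum of −log(p) over variants passing a truncation threshold. The weights and fixed threshold are placeholders for the paper's adaptive choice, and significance would be assessed by permutation in practice.

        import numpy as np

        def weighted_log_p_combination(pvals, weights=None, threshold=1.0):
            # Region-level statistic: weighted sum of -log(p) over variants with
            # p <= threshold (truncation and weights are illustrative assumptions).
            pvals = np.asarray(pvals, dtype=float)
            weights = np.ones_like(pvals) if weights is None else np.asarray(weights, float)
            keep = pvals <= threshold
            return float(np.sum(weights[keep] * -np.log(pvals[keep])))

        # Hypothetical per-variant p-values for one region.
        p_region = [0.002, 0.31, 0.048, 0.77, 0.11]
        print(weighted_log_p_combination(p_region, threshold=0.2))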

  10. Data-adaptive test statistics for microarray data.

    PubMed

    Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J

    2005-09-01

    An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. An implementation is available by request to the corresponding author.

  11. Statistical Method to Overcome Overfitting Issue in Rational Function Models

    NASA Astrophysics Data System (ADS)

    Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.

    2017-09-01

    Rational function models (RFMs) are among the most appealing models and are extensively applied in the geometric correction of satellite images and in map production. Overfitting is a common issue for terrain-dependent RFMs that degrades the accuracy of RFM-derived geospatial products. This issue, resulting from the large number of RFM parameters, leads to ill-posedness of the RFMs. To tackle this problem, a fast and robust statistical approach is proposed in this study and compared to the Tikhonov regularization (TR) method, a frequently used remedy for RFM overfitting. In the proposed method, a statistical significance test is applied to identify the RFM parameters that are resistant to overfitting. The performance of the proposed method was evaluated on two real data sets of Cartosat-1 satellite images. The results demonstrate the efficiency of the proposed method in terms of the achievable level of accuracy, showing an improvement of 50-80% over TR.

  12. Appraisal of within- and between-laboratory reproducibility of non-radioisotopic local lymph node assay using flow cytometry, LLNA:BrdU-FCM: comparison of OECD TG429 performance standard and statistical evaluation.

    PubMed

    Yang, Hyeri; Na, Jihye; Jang, Won-Hee; Jung, Mi-Sook; Jeon, Jun-Young; Heo, Yong; Yeo, Kyung-Wook; Jo, Ji-Hoon; Lim, Kyung-Min; Bae, SeungJin

    2015-05-05

    The mouse local lymph node assay (LLNA, OECD TG429) is an alternative test replacing conventional guinea pig tests (OECD TG406) for skin sensitization testing, but its use of a radioisotopic agent, (3)H-thymidine, deters its active dissemination. A new non-radioisotopic LLNA, LLNA:BrdU-FCM, employs a non-radioisotopic analog, 5-bromo-2'-deoxyuridine (BrdU), and flow cytometry. For an analogous method, the OECD TG429 performance standard (PS) advises that two reference compounds be tested repeatedly and that the ECt (threshold) values obtained fall within acceptable ranges to prove within- and between-laboratory reproducibility. However, these criteria are somewhat arbitrary and the sample size for ECt is less than 5, raising concerns about insufficient reliability. Here, we explored various statistical methods to evaluate the reproducibility of LLNA:BrdU-FCM using the stimulation index (SI), the raw data for ECt calculation, produced by 3 laboratories. Descriptive statistics along with graphical representations of SI were presented. For inferential statistics, parametric and non-parametric methods were applied to test the reproducibility of the SI of a concurrent positive control, and the robustness of the results was investigated. Descriptive statistics and graphical representation of SI alone could illustrate the within- and between-laboratory reproducibility. Inferential statistics employing parametric and nonparametric methods drew similar conclusions. While all labs passed the within- and between-laboratory reproducibility criteria given by the OECD TG429 PS based on ECt values, statistical evaluation based on SI values showed that only two labs achieved within-laboratory reproducibility. For those two labs that satisfied within-laboratory reproducibility, between-laboratory reproducibility could also be attained based on inferential as well as descriptive statistics. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  13. A reliability study on brain activation during active and passive arm movements supported by an MRI-compatible robot.

    PubMed

    Estévez, Natalia; Yu, Ningbo; Brügger, Mike; Villiger, Michael; Hepp-Reymond, Marie-Claude; Riener, Robert; Kollias, Spyros

    2014-11-01

    In neurorehabilitation, longitudinal assessment of arm movement related brain function in patients with motor disability is challenging due to variability in task performance. MRI-compatible robots monitor and control task performance, yielding more reliable evaluation of brain function over time. The main goals of the present study were first to define the brain network activated while performing active and passive elbow movements with an MRI-compatible arm robot (MaRIA) in healthy subjects, and second to test the reproducibility of this activation over time. For the fMRI analysis two models were compared. In model 1 movement onset and duration were included, whereas in model 2 force and range of motion were added to the analysis. Reliability of brain activation was tested with several statistical approaches applied on individual and group activation maps and on summary statistics. The activated network included mainly the primary motor cortex, primary and secondary somatosensory cortex, superior and inferior parietal cortex, medial and lateral premotor regions, and subcortical structures. Reliability analyses revealed robust activation for active movements with both fMRI models and all the statistical methods used. Imposed passive movements also elicited mainly robust brain activation for individual and group activation maps, and reliability was improved by including additional force and range of motion using model 2. These findings demonstrate that the use of robotic devices, such as MaRIA, can be useful to reliably assess arm movement related brain activation in longitudinal studies and may contribute in studies evaluating therapies and brain plasticity following injury in the nervous system.

  14. A system for learning statistical motion patterns.

    PubMed

    Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

    2006-09-01

    Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.

  15. A generalized Kruskal-Wallis test incorporating group uncertainty with application to genetic association studies.

    PubMed

    Acar, Elif F; Sun, Lei

    2013-06-01

    Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.
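
    A plausible sketch of a probability-weighted rank-sum statistic with soft group membership, referred to an asymptotic chi-square distribution with k − 1 degrees of freedom; the exact form of the published statistic may differ, and the genotype posterior probabilities below are simulated placeholders.

        import numpy as np
        from scipy import stats

        def weighted_kruskal_wallis(y, group_probs):
            # y           : (N,) trait values
            # group_probs : (N, k) membership probabilities, rows summing to 1
            # Kruskal-Wallis-type statistic with probability-weighted rank sums.
            y = np.asarray(y, dtype=float)
            P = np.asarray(group_probs, dtype=float)
            N, k = P.shape
            r = stats.rankdata(y)                  # mid-ranks (ties handled)
            n_eff = P.sum(axis=0)                  # expected group sizes
            R = P.T @ r                            # probability-weighted rank sums
            H = 12.0 / (N * (N + 1)) * np.sum(R**2 / n_eff) - 3.0 * (N + 1)
            return H, stats.chi2.sf(H, df=k - 1)

        rng = np.random.default_rng(3)
        y = rng.normal(size=60)
        probs = rng.dirichlet(np.ones(3), size=60)  # hypothetical genotype posteriors
        print(weighted_kruskal_wallis(y, probs))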

  16. Statistical learning of movement.

    PubMed

    Ongchoco, Joan Danielle Khonghun; Uddenberg, Stefan; Chun, Marvin M

    2016-12-01

    The environment is dynamic, but objects move in predictable and characteristic ways, whether they are a dancer in motion, or a bee buzzing around in flight. Sequences of movement are comprised of simpler motion trajectory elements chained together. But how do we know where one trajectory element ends and another begins, much like we parse words from continuous streams of speech? As a novel test of statistical learning, we explored the ability to parse continuous movement sequences into simpler element trajectories. Across four experiments, we showed that people can robustly parse such sequences from a continuous stream of trajectories under increasingly stringent tests of segmentation ability and statistical learning. Observers viewed a single dot as it moved along simple sequences of paths, and were later able to discriminate these sequences from novel and partial ones shown at test. Observers demonstrated this ability when there were potentially helpful trajectory-segmentation cues such as a common origin for all movements (Experiment 1); when the dot's motions were entirely continuous and unconstrained (Experiment 2); when sequences were tested against partial sequences as a more stringent test of statistical learning (Experiment 3); and finally, even when the element trajectories were in fact pairs of trajectories, so that abrupt directional changes in the dot's motion could no longer signal inter-trajectory boundaries (Experiment 4). These results suggest that observers can automatically extract regularities in movement - an ability that may underpin our capacity to learn more complex biological motions, as in sport or dance.

  17. Optimizing the design of a reproduction toxicity test with the pond snail Lymnaea stagnalis.

    PubMed

    Charles, Sandrine; Ducrot, Virginie; Azam, Didier; Benstead, Rachel; Brettschneider, Denise; De Schamphelaere, Karel; Filipe Goncalves, Sandra; Green, John W; Holbech, Henrik; Hutchinson, Thomas H; Faber, Daniel; Laranjeiro, Filipe; Matthiessen, Peter; Norrgren, Leif; Oehlmann, Jörg; Reategui-Zirena, Evelyn; Seeland-Fremer, Anne; Teigeler, Matthias; Thome, Jean-Pierre; Tobor Kaplon, Marysia; Weltje, Lennart; Lagadic, Laurent

    2016-11-01

    This paper presents the results from two ring-tests addressing the feasibility, robustness and reproducibility of a reproduction toxicity test with the freshwater gastropod Lymnaea stagnalis (RENILYS strain). Sixteen laboratories (from inexperienced to expert laboratories in mollusc testing) from nine countries participated in these ring-tests. Survival and reproduction were evaluated in L. stagnalis exposed to cadmium, tributyltin, prochloraz and trenbolone according to an OECD draft Test Guideline. In total, 49 datasets were analysed to assess the practicability of the proposed experimental protocol, and to estimate the between-laboratory reproducibility of toxicity endpoint values. The statistical analysis of count data (number of clutches or eggs per individual-day) leading to ECx estimation was specifically developed and automated through a free web-interface. Based on a complementary statistical analysis, the optimal test duration was established and the most sensitive and cost-effective reproduction toxicity endpoint was identified, to be used as the core endpoint. This validation process and the resulting optimized protocol were used to consolidate the OECD Test Guideline for the evaluation of reproductive effects of chemicals in L. stagnalis. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. Measuring the Sensitivity of Single-locus “Neutrality Tests” Using a Direct Perturbation Approach

    PubMed Central

    Garrigan, Daniel; Lewontin, Richard; Wakeley, John

    2010-01-01

    A large number of statistical tests have been proposed to detect natural selection based on a sample of variation at a single genetic locus. These tests measure the deviation of the allelic frequency distribution observed within populations from the distribution expected under a set of assumptions that includes both neutral evolution and equilibrium population demography. The present study considers a new way to assess the statistical properties of these tests of selection, by their behavior in response to direct perturbations of the steady-state allelic frequency distribution, unconstrained by any particular nonequilibrium demographic scenario. Results from Monte Carlo computer simulations indicate that most tests of selection are more sensitive to perturbations of the allele frequency distribution that increase the variance in allele frequencies than to perturbations that decrease the variance. Simulations also demonstrate that it requires, on average, 4N generations (N is the diploid effective population size) for tests of selection to relax to their theoretical, steady-state distributions following different perturbations of the allele frequency distribution to its extremes. This relatively long relaxation time highlights the fact that these tests are not robust to violations of the other assumptions of the null model besides neutrality. Lastly, genetic variation arising under an example of a regularly cycling demographic scenario is simulated. Tests of selection performed on this last set of simulated data confirm the confounding nature of these tests for the inference of natural selection, under a demographic scenario that likely holds for many species. The utility of using empirical, genomic distributions of test statistics, instead of the theoretical steady-state distribution, is discussed as an alternative for improving the statistical inference of natural selection. PMID:19744997

  19. FRAIL Questionnaire Screening Tool and Short-Term Outcomes in Geriatric Fracture Patients.

    PubMed

    Gleason, Lauren Jan; Benton, Emily A; Alvarez-Nebreda, M Loreto; Weaver, Michael J; Harris, Mitchel B; Javedan, Houman

    2017-12-01

    There are limited screening tools to predict adverse postoperative outcomes for the geriatric surgical fracture population. Frailty is increasingly recognized as a risk assessment to capture complexity. The goal of this study was to use a short screening tool, the FRAIL scale, to categorize the level of frailty of older adults admitted with a fracture and to determine the association of each frailty category with postoperative and 30-day outcomes. Retrospective cohort study. Level 1 trauma center. A total of 175 consecutive patients over age 70 years admitted to co-managed orthopedic trauma and geriatrics services. The FRAIL scale (short 5-question assessment of fatigue, resistance, aerobic capacity, illnesses, and loss of weight) classified the patients into 3 categories: robust (score = 0), prefrail (score = 1-2), and frail (score = 3-5). Postoperative outcome variables collected were postoperative complications, unplanned intensive care unit admission, length of stay (LOS), discharge disposition, and orthopedic follow-up after surgery. Thirty-day outcomes measured were 30-day readmission and 30-day mortality. Analysis of variance (1-way) and Kruskal-Wallis tests were used to compare continuous variables across the 3 FRAIL categories. Fisher exact tests were used to compare categorical variables. Multiple regression analysis, adjusted by age, sex, and Charlson index, was conducted to study the association between frailty category and outcomes. The FRAIL scale categorized the patients into 3 groups: robust (n = 29), prefrail (n = 73), and frail (n = 73). There were statistically significant differences between groups in terms of age, comorbidity, dementia, functional dependency, polypharmacy, and rate of institutionalization, all higher in the frailest patients. Hip fracture was the most frequent fracture, and it was more frequent as the frailty of the patient increased (48%, 61%, and 75% in robust, prefrail, and frail groups, respectively). The American Society of Anesthesiologists preoperative risk significantly correlated with the frailty of the patient (American Society of Anesthesiologists score 3-4: 41%, 82% and 86%, in robust, prefrail, and frail groups, P < .001). After adjustment by age, sex, and comorbidity, there was a statistically significant association between frailty and both LOS and the development of any complication after surgery (LOS: 4.2, 5.0, and 7.1 days, P = .002; any complication: 3.4%, 26%, and 39.7%, P = .03; in robust, prefrail, and frail groups). There were also significant differences in discharge disposition (31% of robust vs 4.1% frail, P = .008) and follow-up completion (97% of robust vs 69% of frail patients). Differences in time to surgery, unplanned intensive care unit admission, and 30-day readmission and mortality, although showing a trend, did not reach statistical significance. Frailty, measured by the FRAIL scale, was associated with increased LOS, complications after surgery, and discharge to a rehabilitation facility in geriatric fracture patients. The FRAIL scale is a promising short screen to stratify and help operationalize the perioperative care of older surgical patients. Copyright © 2017 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.

  20. A robust statistical estimation (RoSE) algorithm jointly recovers the 3D location and intensity of single molecules accurately and precisely

    NASA Astrophysics Data System (ADS)

    Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.

    2018-02-01

    In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.

  1. Alignments of parity even/odd-only multipoles in CMB

    NASA Astrophysics Data System (ADS)

    Aluri, Pavan K.; Ralston, John P.; Weltman, Amanda

    2017-12-01

    We compare the statistics of parity even and odd multipoles of the cosmic microwave background (CMB) sky from Planck full mission temperature measurements. An excess power in odd multipoles compared to even multipoles has previously been found on large angular scales. Motivated by this apparent parity asymmetry, we evaluate directional statistics associated with even compared to odd multipoles, along with their significances. Primary tools are the Power tensor and Alignment tensor statistics. We limit our analysis to the first 60 multipoles i.e. l = [2, 61]. We find no evidence for statistically unusual alignments of even parity multipoles. More than one independent statistic finds evidence for alignments of anisotropy axes of odd multipoles, with a significance equivalent to ∼2σ or more. The robustness of alignment axes is tested by making Galactic cuts and varying the multipole range. Very interestingly, the region spanned by the (a)symmetry axes is found to broadly contain other parity (a)symmetry axes previously observed in the literature.

  2. Efficient and robust quantum random number generation by photon number detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Applegate, M. J.; Thomas, O.

    2015-08-17

    We present an efficient and robust quantum random number generator based upon high-rate room temperature photon number detection. We employ an electric field-modulated silicon avalanche photodiode, a type of device particularly suited to high-rate photon number detection with excellent photon number resolution to detect, without an applied dead-time, up to 4 photons from the optical pulses emitted by a laser. By both measuring and modeling the response of the detector to the incident photons, we are able to determine the illumination conditions that achieve an optimal bit rate that we show is robust against variation in the photon flux. We extract random bits from the detected photon numbers with an efficiency of 99% corresponding to 1.97 bits per detected photon number yielding a bit rate of 143 Mbit/s, and verify that the extracted bits pass stringent statistical tests for randomness. Our scheme is highly scalable and has the potential of multi-Gbit/s bit rates.

  3. Robust short-term memory without synaptic learning.

    PubMed

    Johnson, Samuel; Marro, J; Torres, Joaquín J

    2013-01-01

    Short-term memory in the brain cannot in general be explained the way long-term memory can--as a gradual modification of synaptic weights--since it takes place too quickly. Theories based on some form of cellular bistability, however, do not seem able to account for the fact that noisy neurons can collectively store information in a robust manner. We show how a sufficiently clustered network of simple model neurons can be instantly induced into metastable states capable of retaining information for a short time (a few seconds). The mechanism is robust to different network topologies and kinds of neural model. This could constitute a viable means available to the brain for sensory and/or short-term memory with no need of synaptic learning. Relevant phenomena described by neurobiology and psychology, such as local synchronization of synaptic inputs and power-law statistics of forgetting avalanches, emerge naturally from this mechanism, and we suggest possible experiments to test its viability in more biological settings.

  4. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided the data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we propose a sign-sum statistic that counts the signs of the t-statistics over all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of the gene ranking. We compare the sign-sum statistic with the t-statistic through a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance: it yields a more reproducible ranking than the t-statistic, effectively excludes heterogeneity-affected (hetero-type) genes in simulated data sets, and performs well on the real data sets in terms of ranking reproducibility.
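
    A sketch of the sign-counting idea for a single gene: the two-sample t-statistic is recomputed on random subsets of each group and its signs are summed, so genes whose direction of change flips across subsets score near zero. Random subsets are used here for tractability instead of all possible subsets, which is an assumption relative to the paper.

        import numpy as np
        from scipy import stats

        def sign_sum(x, y, n_splits=200, seed=0):
            # Average sign of the Welch t-statistic over random half-samples of
            # each group; values near +1 or -1 indicate a consistent direction.
            rng = np.random.default_rng(seed)
            x, y = np.asarray(x, float), np.asarray(y, float)
            total = 0.0
            for _ in range(n_splits):
                xs = rng.choice(x, size=max(2, len(x) // 2), replace=False)
                ys = rng.choice(y, size=max(2, len(y) // 2), replace=False)
                t, _ = stats.ttest_ind(xs, ys, equal_var=False)
                total += np.sign(t)
            return total / n_splits

        rng = np.random.default_rng(1)
        healthy = rng.normal(0.0, 1.0, size=20)
        disease = rng.normal(0.8, 1.0, size=20)   # consistent shift -> score near -1 or +1
        print(sign_sum(healthy, disease))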

  5. Testing for Granger Causality in the Frequency Domain: A Phase Resampling Method.

    PubMed

    Liu, Siwei; Molenaar, Peter

    2016-01-01

    This article introduces phase resampling, an existing but rarely used surrogate data method for making statistical inferences of Granger causality in frequency domain time series analysis. Granger causality testing is essential for establishing causal relations among variables in multivariate dynamic processes. However, testing for Granger causality in the frequency domain is challenging due to the nonlinear relation between frequency domain measures (e.g., partial directed coherence, generalized partial directed coherence) and time domain data. Through a simulation study, we demonstrate that phase resampling is a general and robust method for making statistical inferences even with short time series. With Gaussian data, phase resampling yields satisfactory type I and type II error rates in all but one condition we examine: when a small effect size is combined with an insufficient number of data points. Violations of normality lead to slightly higher error rates but are mostly within acceptable ranges. We illustrate the utility of phase resampling with two empirical examples involving multivariate electroencephalography (EEG) and skin conductance data.
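
    A minimal sketch of phase resampling: Fourier amplitudes (and hence the power spectrum) are kept while phases are randomized, and a null distribution for a chosen statistic is built from the surrogates. The statistic below is only a placeholder to show the mechanics, not a frequency-domain Granger-causality measure such as partial directed coherence.

        import numpy as np

        def phase_surrogate(x, rng):
            # One phase-randomized surrogate of a real-valued series: keep the
            # Fourier amplitudes, replace the phases with uniform random phases.
            x = np.asarray(x, dtype=float)
            n = len(x)
            spec = np.fft.rfft(x)
            phases = rng.uniform(0, 2 * np.pi, size=len(spec))
            phases[0] = 0.0              # keep the zero-frequency component real
            if n % 2 == 0:
                phases[-1] = 0.0         # keep the Nyquist component real
            return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=n)

        # Surrogate test mechanics with a placeholder statistic (lag-1 autocorrelation).
        rng = np.random.default_rng(0)
        x = np.sin(np.linspace(0, 20 * np.pi, 512)) + rng.normal(scale=0.5, size=512)
        stat_obs = abs(np.corrcoef(x[:-1], x[1:])[0, 1])
        null = [abs(np.corrcoef(s[:-1], s[1:])[0, 1])
                for s in (phase_surrogate(x, rng) for _ in range(199))]
        p_value = (1 + sum(v >= stat_obs for v in null)) / 200
        print(p_value)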

  6. Use of Bayesian Methods to Analyze and Visualize Content Uniformity Capability Versus United States Pharmacopeia and ASTM Standards.

    PubMed

    Hofer, Jeffrey D; Rauk, Adam P

    2017-02-01

    The purpose of this work was to develop a straightforward and robust approach to analyze and summarize the ability of content uniformity data to meet different criteria. A robust Bayesian statistical analysis methodology is presented which provides a concise and easily interpretable visual summary of the content uniformity analysis results. The visualization displays individual batch analysis results and shows whether there is high confidence that different content uniformity criteria could be met a high percentage of the time in the future. The 3 tests assessed are as follows: (a) United States Pharmacopeia Uniformity of Dosage Units <905>, (b) a specific ASTM E2810 Sampling Plan 1 criterion to potentially be used for routine release testing, and (c) another specific ASTM E2810 Sampling Plan 2 criterion to potentially be used for process validation. The approach shown here could readily be used to create similar result summaries for other potential criteria. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  7. Argon-oxygen atmospheric pressure plasma treatment on carbon fiber reinforced polymer for improved bonding

    NASA Astrophysics Data System (ADS)

    Chartosias, Marios

    Acceptance of Carbon Fiber Reinforced Polymer (CFRP) structures requires a robust surface preparation method with improved process controls capable of ensuring high bond quality. Surface preparation in a production clean room environment prior to applying adhesive for bonding would minimize the risk of contamination and reduce cost. Plasma treatment is a robust surface preparation process capable of being applied in a production clean room environment with process parameters that are easily controlled and documented. Repeatable and consistent processing is enabled through the development of a process parameter window utilizing techniques such as Design of Experiments (DOE) tailored to specific adhesive and substrate bonding applications. Insight from the respective plasma treatment Original Equipment Manufacturers (OEMs) and screening tests distinguished critical process factors from non-factors and set the associated factor levels prior to execution of the DOE. Results from mode I Double Cantilever Beam (DCB) testing per the ASTM D 5528 [1] standard and DOE statistical analysis software are used to produce a regression model and determine appropriate optimum settings for each factor.

  8. Performance evaluation of the Champagne source reconstruction algorithm on simulated and real M/EEG data.

    PubMed

    Owen, Julia P; Wipf, David P; Attias, Hagai T; Sekihara, Kensuke; Nagarajan, Srikantan S

    2012-03-01

    In this paper, we present an extensive performance evaluation of a novel source localization algorithm, Champagne. It is derived in an empirical Bayesian framework that yields sparse solutions to the inverse problem. It is robust to correlated sources and learns the statistics of non-stimulus-evoked activity to suppress the effect of noise and interfering brain activity. We tested Champagne on both simulated and real M/EEG data. The source locations used for the simulated data were chosen to test the performance on challenging source configurations. In simulations, we found that Champagne outperforms the benchmark algorithms in terms of both the accuracy of the source localizations and the correct estimation of source time courses. We also demonstrate that Champagne is more robust to correlated brain activity present in real MEG data and is able to resolve many distinct and functionally relevant brain areas with real MEG and EEG data. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Robust, Adaptive Radar Detection and Estimation

    DTIC Science & Technology

    2015-07-21

    … Since the cost function is not convex in R, we apply a change of variables: let X = σ²R⁻¹ and S′ = (1/σ²)S. Then the revised cost function … We apply this inverse covariance matrix in computing the SINR as well as the estimator variance. Rank-constrained maximum likelihood: … even as almost all available training samples are corrupted. Probability of detection vs. SNR: we apply three test statistics, the normalized matched …

  10. An Intercompany Perspective on Biopharmaceutical Drug Product Robustness Studies.

    PubMed

    Morar-Mitrica, Sorina; Adams, Monica L; Crotts, George; Wurth, Christine; Ihnat, Peter M; Tabish, Tanvir; Antochshuk, Valentyn; DiLuzio, Willow; Dix, Daniel B; Fernandez, Jason E; Gupta, Kapil; Fleming, Michael S; He, Bing; Kranz, James K; Liu, Dingjiang; Narasimhan, Chakravarthy; Routhier, Eric; Taylor, Katherine D; Truong, Nobel; Stokes, Elaine S E

    2018-02-01

    The Biophorum Development Group (BPDG) is an industry-wide consortium enabling networking and sharing of best practices for the development of biopharmaceuticals. To gain a better understanding of current industry approaches for establishing biopharmaceutical drug product (DP) robustness, the BPDG-Formulation Point Share group conducted an intercompany collaboration exercise, which included a bench-marking survey and extensive group discussions around the scope, design, and execution of robustness studies. The results of this industry collaboration revealed several key common themes: (1) overall DP robustness is defined by both the formulation and the manufacturing process robustness; (2) robustness integrates the principles of quality by design (QbD); (3) DP robustness is an important factor in setting critical quality attribute control strategies and commercial specifications; (4) most companies employ robustness studies, along with prior knowledge, risk assessments, and statistics, to develop the DP design space; (5) studies are tailored to commercial development needs and the practices of each company. Three case studies further illustrate how a robustness study design for a biopharmaceutical DP balances experimental complexity, statistical power, scientific understanding, and risk assessment to provide the desired product and process knowledge. The BPDG-Formulation Point Share discusses identified industry challenges with regard to biopharmaceutical DP robustness and presents some recommendations for best practices. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  11. Development of a Comprehensive Digital Avionics Curriculum for the Aeronautical Engineer

    DTIC Science & Technology

    2006-03-01

    … able to analyze and design aircraft and missile guidance and control systems, including feedback stabilization schemes and stochastic processes, using … Uncertainty modeling for robust control; robust closed-loop stability and performance; robust H-infinity control; robustness checks using mu-analysis … Controlled feedback (reduces noise); statistical group response (reduces pressure toward conformity) … when used as a tool to study a complex problem …

  12. Robust LOD scores for variance component-based linkage analysis.

    PubMed

    Blangero, J; Williams, J T; Almasy, L

    2000-01-01

    The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.

  13. Experimental design and statistical methods for improved hit detection in high-throughput screening.

    PubMed

    Malo, Nathalie; Hanley, James A; Carlile, Graeme; Liu, Jing; Pelletier, Jerry; Thomas, David; Nadon, Robert

    2010-09-01

    Identification of active compounds in high-throughput screening (HTS) contexts can be substantially improved by applying classical experimental design and statistical inference principles to all phases of HTS studies. The authors present both experimental and simulated data to illustrate how true-positive rates can be maximized without increasing false-positive rates by the following analytical process. First, the use of robust data preprocessing methods reduces unwanted variation by removing row, column, and plate biases. Second, replicate measurements allow estimation of the magnitude of the remaining random error and the use of formal statistical models to benchmark putative hits relative to what is expected by chance. Receiver Operating Characteristic (ROC) analyses revealed superior power for data preprocessed by a trimmed-mean polish method combined with the RVM t-test, particularly for small- to moderate-sized biological hits.
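
    A sketch of the bias-removal step using a Tukey-style two-way polish of a plate of readings; the study's preprocessing uses a trimmed-mean polish, so the median centering below is a stand-in, and the plate layout and drift are simulated.

        import numpy as np

        def polish(plate, center=np.median, n_iter=10):
            # Iteratively remove row and column effects estimated with a robust
            # center; the residuals are the bias-corrected values.
            resid = np.array(plate, dtype=float)
            for _ in range(n_iter):
                resid -= center(resid, axis=1, keepdims=True)   # row effects
                resid -= center(resid, axis=0, keepdims=True)   # column effects
            return resid

        rng = np.random.default_rng(2)
        plate = rng.normal(size=(16, 24)) + np.linspace(0, 1, 24)  # artificial column drift
        corrected = polish(plate)
        print(np.abs(np.median(corrected, axis=0)).max())  # column biases now near zero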

  14. Linear models: permutation methods

    USGS Publications Warehouse

    Cade, B.S.; Everitt, B.S.; Howell, D.C.

    2005-01-01

    Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well known that estimates of the mean in a linear model are extremely sensitive to even a single outlying value of the dependent variable, compared with estimates of the median [7, 19]. Traditionally, linear modeling has focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution of responses, providing a more complete statistical picture that has relevance to many biological questions [6]...

  15. A Statistics-Based Cracking Criterion of Resin-Bonded Silica Sand for Casting Process Simulation

    NASA Astrophysics Data System (ADS)

    Wang, Huimin; Lu, Yan; Ripplinger, Keith; Detwiler, Duane; Luo, Alan A.

    2017-02-01

    Cracking of sand molds/cores can result in many casting defects such as veining. A robust cracking criterion is needed in casting process simulation for predicting/controlling such defects. A cracking probability map, relating to fracture stress and effective volume, was proposed for resin-bonded silica sand based on Weibull statistics. Three-point bending test results of sand samples were used to generate the cracking map and set up a safety line for cracking criterion. Tensile test results confirmed the accuracy of the safety line for cracking prediction. A laboratory casting experiment was designed and carried out to predict cracking of a cup mold during aluminum casting. The stress-strain behavior and the effective volume of the cup molds were calculated using a finite element analysis code ProCAST®. Furthermore, an energy dispersive spectroscopy fractographic examination of the sand samples confirmed the binder cracking in resin-bonded silica sand.
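
    A sketch of the Weibull step implied above: the modulus and characteristic strength are estimated from bend-test fracture stresses using median-rank probabilities, and a weakest-link failure probability is scaled by effective volume. The fracture stresses, reference volume, and loading values below are hypothetical.

        import numpy as np

        def fit_weibull(fracture_stresses):
            # Estimate the Weibull modulus m and characteristic strength sigma0 from
            # a linear fit of ln(-ln(1-F)) against ln(sigma), with median-rank F.
            s = np.sort(np.asarray(fracture_stresses, dtype=float))
            n = len(s)
            F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
            m, intercept = np.polyfit(np.log(s), np.log(-np.log(1.0 - F)), 1)
            return m, np.exp(-intercept / m)

        def failure_probability(stress, volume, m, sigma0, v_ref=1.0):
            # Weakest-link cracking probability at a given stress and effective
            # volume, scaled from a reference specimen volume (v_ref is assumed).
            return 1.0 - np.exp(-(volume / v_ref) * (stress / sigma0) ** m)

        # Hypothetical three-point-bend fracture stresses (MPa) of a resin-bonded sand.
        stresses = [1.9, 2.1, 2.2, 2.4, 2.5, 2.6, 2.8, 3.0, 3.1, 3.4]
        m, sigma0 = fit_weibull(stresses)
        print(m, sigma0, failure_probability(2.0, volume=5.0, m=m, sigma0=sigma0))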

  16. A novel methodology for building robust design rules by using design based metrology (DBM)

    NASA Astrophysics Data System (ADS)

    Lee, Myeongdong; Choi, Seiryung; Choi, Jinwoo; Kim, Jeahyun; Sung, Hyunju; Yeo, Hyunyoung; Shim, Myoungseob; Jin, Gyoyoung; Chung, Eunseung; Roh, Yonghan

    2013-03-01

    This paper addresses a methodology for building robust design rules by using design based metrology (DBM). The conventional method for building design rules has relied on a simulation tool and a simple pattern spider mask. At the early stage of a device, the estimates from the simulation tool are poor, and the evaluation of the simple pattern spider mask is rather subjective because it depends on the experiential judgment of an engineer. In this work, we designed a huge number of pattern situations including various 1D and 2D design structures. To overcome the difficulty of inspecting many types of patterns, we introduced the Design Based Metrology (DBM) of Nano Geometry Research, Inc., with which these massive pattern sets could be inspected quickly. We also carried out quantitative analysis of PWQ silicon data to estimate process variability. Our methodology demonstrates high speed and accuracy for building design rules: all of the test patterns were inspected within a few hours, and the mass silicon data were handled by statistical processing rather than personal judgment. From the results, robust design rules are successfully verified and extracted. Finally, we found that our methodology is appropriate for building robust design rules.

  17. Algorithm for computing descriptive statistics for very large data sets and the exa-scale era

    NASA Astrophysics Data System (ADS)

    Beekman, Izaak

    2017-11-01

    An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor the statistical convergence and estimate the error/residual in the quantity, which is also useful for uncertainty quantification. Today, data may be generated at an overwhelming rate by numerical simulations and proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation, with provably favorable properties. The mean, variance, covariances, and arbitrary higher-order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
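
    The converging-delta formulation itself is not reproduced here; the sketch below shows the kind of single-pass (Welford-style) update that such formulas generalize, together with a simple convergence monitor based on the standard error of the mean. The data stream, tolerance, and stopping rule are illustrative assumptions.

        import numpy as np

        class OnlineStats:
            # One-pass (streaming) mean and variance via Welford's update.
            def __init__(self):
                self.n = 0
                self.mean = 0.0
                self.m2 = 0.0   # sum of squared deviations from the running mean

            def update(self, x):
                self.n += 1
                delta = x - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (x - self.mean)

            @property
            def variance(self):
                return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

            @property
            def sem(self):
                # Standard error of the mean, used here as a convergence monitor.
                return (self.variance / self.n) ** 0.5 if self.n > 1 else float("inf")

        # Stream values and stop once the mean estimate has converged to the tolerance.
        rng = np.random.default_rng(0)
        acc = OnlineStats()
        for value in rng.normal(10.0, 2.0, size=200_000):
            acc.update(value)
            if acc.n > 30 and acc.sem < 0.01:
                break
        print(acc.n, acc.mean, acc.variance)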

  18. Interlaboratory trial for the measurement of total cobalt in equine urine and plasma by ICP-MS.

    PubMed

    Popot, Marie-Agnes; Ho, Emmie N M; Stojiljkovic, Natali; Bagilet, Florian; Remy, Pierre; Maciejewski, Pascal; Loup, Benoit; Chan, George H M; Hargrave, Sabine; Arthur, Rick M; Russo, Charlie; White, James; Hincks, Pamela; Pearce, Clive; Ganio, George; Zahra, Paul; Batty, David; Jarrett, Mark; Brooks, Lydia; Prescott, Lise-Anne; Bailly-Chouriberry, Ludovic; Bonnaire, Yves; Wan, Terence S M

    2017-09-01

    Cobalt is an essential mineral micronutrient and is regularly present in equine nutritional and feed supplements. Therefore, cobalt is naturally present at low concentrations in biological samples. The administration of cobalt chloride is considered to be blood doping and is thus prohibited. To control the misuse of cobalt, it was mandatory to establish an international threshold for cobalt in plasma and/or in urine. To achieve this goal, an international collaboration, consisting of an interlaboratory comparison between 5 laboratories for the urine study and 8 laboratories for the plasma study, has been undertaken. Quantification of cobalt in the biological samples was performed by inductively coupled plasma-mass spectrometry (ICP-MS). Ring tests were based on the analysis of 5 urine samples supplemented at concentrations ranging from 5 up to 500 ng/mL and 5 plasma samples spiked at concentrations ranging from 0.5 up to 25 ng/mL. The results obtained from the different laboratories were collected, compiled, and compared to assess the reproducibility and robustness of cobalt quantification measurements. The statistical approach for the ring test for total cobalt in urine was based on the determination of percentage deviations from the calculated means, while robust statistics based on the calculated median were applied to the ring test for total cobalt in plasma. The inter-laboratory comparisons in urine and in plasma were successful, with 97.6% of the urine samples and 97.5% of the plasma samples giving satisfactory results. Threshold values for cobalt in plasma and urine were established from data obtained only by laboratories involved in the ring test. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  20. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jha, Sumit Kumar; Pullum, Laura L; Ramanathan, Arvind

    Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.

  1. A Simple and Robust Statistical Test for Detecting the Presence of Recombination

    PubMed Central

    Bruen, Trevor C.; Philippe, Hervé; Bryant, David

    2006-01-01

    Recombination is a powerful evolutionary force that merges historically distinct genotypes. But the extent of recombination within many organisms is unknown, and even determining its presence within a set of homologous sequences is a difficult question. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test can discriminate effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. A number of other tests, Max χ2, NSS, a coalescent-based likelihood permutation test (from LDHat), and correlation of linkage disequilibrium (both r2 and |D′|) with distance, all tend to underestimate the presence of recombination under strong population growth. Moreover, both Max χ2 and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can be used to detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches to distinguish recurrent mutation from recombination in a wide variety of circumstances. PMID:16489234

  2. Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification

    PubMed Central

    Millner, Alexander J.; Lee, Michael D.; Nock, Matthew K.

    2015-01-01

    Suicide is a leading cause of death worldwide. Although research has made strides in better defining suicidal behaviors, there has been less focus on accurate measurement. Currently, the widespread use of self-report, single-item questions to assess suicide ideation, plans and attempts may contribute to measurement problems and misclassification. We examined the validity of single-item measurement and the potential for statistical errors. Over 1,500 participants completed an online survey containing single-item questions regarding a history of suicidal behaviors, followed by questions with more precise language, multiple response options and narrative responses to examine the validity of single-item questions. We also conducted simulations to test whether common statistical tests are robust against the degree of misclassification produced by the use of single-items. We found that 11.3% of participants that endorsed a single-item suicide attempt measure engaged in behavior that would not meet the standard definition of a suicide attempt. Similarly, 8.8% of those who endorsed a single-item measure of suicide ideation endorsed thoughts that would not meet standard definitions of suicide ideation. Statistical simulations revealed that this level of misclassification substantially decreases statistical power and increases the likelihood of false conclusions from statistical tests. Providing a wider range of response options for each item reduced the misclassification rate by approximately half. Overall, the use of single-item, self-report questions to assess the presence of suicidal behaviors leads to misclassification, increasing the likelihood of statistical decision errors. Improving the measurement of suicidal behaviors is critical to increase understanding and prevention of suicide. PMID:26496707

  3. Design of robust flow processing networks with time-programmed responses

    NASA Astrophysics Data System (ADS)

    Kaluza, P.; Mikhailov, A. S.

    2012-04-01

    Can artificially designed networks reach the levels of robustness against local damage which are comparable with those of the biochemical networks of a living cell? We consider a simple model where the flow applied to an input node propagates through the network and arrives at different times to the output nodes, thus generating a pattern of coordinated responses. By using evolutionary optimization algorithms, functional networks - with required time-programmed responses - were constructed. Then, continuing the evolution, such networks were additionally optimized for robustness against deletion of individual nodes or links. In this manner, large ensembles of functional networks with different kinds of robustness were obtained, making statistical investigations and comparison of their structural properties possible. We have found that, generally, different architectures are needed for various kinds of robustness. The differences are statistically revealed, for example, in the Laplacian spectra of the respective graphs. On the other hand, motif distributions of robust networks do not differ from those of the merely functional networks; they are found to belong to the first Alon superfamily, the same as that of the gene transcription networks of single-cell organisms.

  4. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  5. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: II. Experimental validation under varying temperature

    NASA Astrophysics Data System (ADS)

    Lin, Y. Q.; Ren, W. X.; Fang, S. E.

    2011-11-01

    Although most vibration-based damage detection methods can be verified satisfactorily on analytical or numerical structures, many of them encounter problems when applied to real-world structures under varying environments. Damage detection methods that directly extract damage features from periodically sampled dynamic time-history response measurements are desirable, but relevant research and field verification are still lacking. In this second part of a two-part paper, the robustness and performance of the statistics-based damage index proposed in the first part, which uses the forward innovation model obtained by stochastic subspace identification of a vibrating structure, are investigated on two prestressed reinforced concrete (RC) beams tested in the laboratory and a full-scale RC arch bridge tested in the field under varying environments. Experimental verification focuses on temperature effects. It is demonstrated that the proposed statistics-based damage index is insensitive to temperature variations but sensitive to structural deterioration or state alteration. This makes it possible to detect structural damage in real-scale structures experiencing ambient excitations and varying environmental conditions.

  6. Comparison of optimization strategy and similarity metric in atlas-to-subject registration using statistical deformation model

    NASA Astrophysics Data System (ADS)

    Otake, Y.; Murphy, R. J.; Grupp, R. B.; Sato, Y.; Taylor, R. H.; Armand, M.

    2015-03-01

    A robust atlas-to-subject registration using a statistical deformation model (SDM) is presented. The SDM uses statistics of voxel-wise displacement learned from pre-computed deformation vectors of a training dataset. This allows an atlas instance to be directly translated into an intensity volume and compared with a patient's intensity volume. Rigid and nonrigid transformation parameters were simultaneously optimized via the Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES), with image similarity used as the objective function. The algorithm was tested on CT volumes of the pelvis from 55 female subjects. A performance comparison of the CMA-ES and Nelder-Mead downhill simplex optimization algorithms with the mutual information and normalized cross correlation similarity metrics was conducted. Simulation studies using synthetic subjects were performed, as well as leave-one-out cross validation studies. Both studies suggested that mutual information and CMA-ES achieved the best performance. The leave-one-out test demonstrated 4.13 mm error with respect to the true displacement field, and 26,102 function evaluations in 180 seconds, on average.

  7. Robustness of radiomic breast features of benign lesions and luminal A cancers across MR magnet strengths

    NASA Astrophysics Data System (ADS)

    Whitney, Heather M.; Drukker, Karen; Edwards, Alexandra; Papaioannou, John; Giger, Maryellen L.

    2018-02-01

    Radiomics features extracted from breast lesion images have shown potential in diagnosis and prognosis of breast cancer. As clinical institutions transition from 1.5 T to 3.0 T magnetic resonance imaging (MRI), it is helpful to identify robust features across these field strengths. In this study, dynamic contrast-enhanced MR images were acquired retrospectively under IRB/HIPAA compliance, yielding 738 cases: 241 and 124 benign lesions imaged at 1.5 T and 3.0 T, and 231 and 142 luminal A cancers imaged at 1.5 T and 3.0 T, respectively. Lesions were segmented using a fuzzy C-means method. Extracted radiomic values for each group of lesions by cancer status and field strength of acquisition were compared using a Kolmogorov-Smirnov test for the null hypothesis that the two groups being compared came from the same distribution, with p-values corrected for multiple comparisons by the Holm-Bonferroni method. Two shape features, one texture feature, and three enhancement variance kinetics features were found to be potentially robust. All potentially robust features had areas under the receiver operating characteristic curve (AUC) statistically greater than 0.5 in the task of distinguishing between lesion types (range of means 0.57-0.78). The significant difference in voxel size between field strengths of acquisition limits the ability to affirm more features as robust or not robust according to field strength alone, and inhomogeneities in static field strength and radiofrequency field could also have affected the assessment of kinetic curve features as robust or not. Vendor-specific image scaling could have also been a factor. These findings will contribute to the development of radiomic signatures that use features identified as robust across field strengths.
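
    A minimal sketch of this kind of robustness check, assuming hypothetical feature dictionaries rather than the study's actual data: each radiomic feature is compared across field strengths with a two-sample Kolmogorov-Smirnov test, and the p-values are adjusted with the Holm-Bonferroni method.

```python
import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

def robust_features(features_15t, features_30t, alpha=0.05):
    """features_*: dict mapping feature name -> 1-D array of per-lesion values."""
    names = sorted(features_15t)
    pvals = np.array([ks_2samp(features_15t[n], features_30t[n]).pvalue
                      for n in names])
    reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="holm")
    # Features whose distributions are NOT significantly different are
    # candidates for being "robust" across field strengths.
    robust = [n for n, r in zip(names, reject) if not r]
    return robust, dict(zip(names, p_adj))
```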

  8. Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test

    NASA Astrophysics Data System (ADS)

    Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta

    2002-05-01

    The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.
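
    A small illustrative simulation (not the paper's GRB analysis) of why the nominal chi-squared reference can fail on the boundary: when testing mu = 0 against the constrained alternative mu >= 0 with known variance, the LRT statistic follows a 50:50 mixture of a point mass at zero and a chi-squared distribution with one degree of freedom, so the nominal chi-squared p-value is off by roughly a factor of two in this case.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, sims = 30, 20000
x = rng.normal(0.0, 1.0, size=(sims, n))      # data generated under the null
z = x.mean(axis=1) * np.sqrt(n)               # standardized sample mean
lrt = np.maximum(z, 0.0) ** 2                 # LRT with the boundary constraint mu >= 0

crit = chi2.ppf(0.95, df=1)                   # nominal 5% chi^2_1 cut-off
print("empirical size at nominal 5%:", np.mean(lrt > crit))                # ~0.025
print("size using the mixture null: ", np.mean(lrt > chi2.ppf(0.90, df=1)))  # ~0.05
```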

  9. Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

    PubMed

    Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

    2015-10-01

    Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibodies (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analyses at the click of a button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic, but no genotypic, data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable for modeling imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, in which the mean of the conditional distribution serves as the imputed genotype. Simulations were run by varying sibship size, the size of the phenotypic correlations among siblings, imputation accuracy, and the minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to sibships of size two or greater requires modeling the familial covariance matrix, we asked whether model misspecification affects power. Finally, the results obtained via simulation were empirically verified in two datasets, one with continuous phenotype data (height) and one with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and dosage approaches are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification of the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that including imputed sibling genotypes in an association analysis does not always result in a larger test statistic. The actual test statistic may drop in value when effect sizes are small: if the power benefit is small, i.e., the shift in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As genetic effects are typically hypothesized to be small, in practice the decision on whether family-based imputation should be used as a means to increase power should be informed by prior power calculations and by consideration of the background correlation.

  11. LCC-Demons: a robust and accurate symmetric diffeomorphic registration algorithm.

    PubMed

    Lorenzi, M; Ayache, N; Frisoni, G B; Pennec, X

    2013-11-01

    Non-linear registration is a key instrument for computational anatomy to study the morphology of organs and tissues. However, in order to be an effective instrument for clinical practice, registration algorithms must be computationally efficient, accurate and, most importantly, robust to the multiple biases affecting medical images. In this work we propose a fast and robust registration framework based on the log-Demons diffeomorphic registration algorithm. The transformation is parameterized by stationary velocity fields (SVFs), and the similarity metric implements a symmetric local correlation coefficient (LCC). Moreover, we show how the SVF setting provides a stable and consistent numerical scheme for the computation of the Jacobian determinant and the flux of the deformation across the boundaries of a given region. Thus, it provides a robust evaluation of spatial changes. We tested the LCC-Demons in the inter-subject registration setting, by comparing with state-of-the-art registration algorithms on publicly available datasets, and in the intra-subject longitudinal registration problem, for statistically powered measurements of longitudinal atrophy in Alzheimer's disease. Experimental results show that LCC-Demons is a generic, flexible, efficient and robust algorithm for the accurate non-linear registration of images, which can find several applications in the field of medical imaging. Without any additional optimization, it solves intra- and inter-subject registration problems equally well, and compares favorably to state-of-the-art methods. Copyright © 2013 Elsevier Inc. All rights reserved.

  12. An Improved Rank Correlation Effect Size Statistic for Single-Case Designs: Baseline Corrected Tau.

    PubMed

    Tarlow, Kevin R

    2017-07-01

    Measuring treatment effects when an individual's pretreatment performance is improving poses a challenge for single-case experimental designs. It may be difficult to determine whether improvement is due to the treatment or due to the preexisting baseline trend. Tau-U is a popular single-case effect size statistic that purports to control for baseline trend. However, despite its strengths, Tau-U has substantial limitations: Its values are inflated and not bound between -1 and +1, it cannot be visually graphed, and its relatively weak method of trend control leads to unacceptable levels of Type I error wherein ineffective treatments appear effective. An improved effect size statistic based on rank correlation and robust regression, Baseline Corrected Tau, is proposed and field-tested with both published and simulated single-case time series. A web-based calculator for Baseline Corrected Tau is also introduced for use by single-case investigators.
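
    A rough sketch of the general idea (this is not the published implementation, its exact decision rules, or the web calculator): test the baseline for trend, remove a Theil-Sen baseline trend if one is present, and then compute Kendall's tau between a dummy-coded phase variable and the corrected scores.

```python
import numpy as np
from scipy.stats import theilslopes, kendalltau

def baseline_corrected_tau(baseline, treatment, trend_alpha=0.05):
    y = np.concatenate([baseline, treatment]).astype(float)
    t = np.arange(len(y))
    phase = np.concatenate([np.zeros(len(baseline)), np.ones(len(treatment))])

    # Step 1: is there a baseline trend worth correcting?
    _, p_trend = kendalltau(np.arange(len(baseline)), baseline)
    if p_trend < trend_alpha:
        slope, intercept, _, _ = theilslopes(baseline, np.arange(len(baseline)))
        y = y - (intercept + slope * t)      # Step 2: detrend all phases

    # Step 3: effect size = rank correlation of phase with the (corrected) scores
    tau, p = kendalltau(phase, y)
    return tau, p
```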

  13. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

    NASA Astrophysics Data System (ADS)

    Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

    2010-07-01

    Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.

  14. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is to find a suitable method that takes a sequence of scores corresponding to properties of positions in virus genomes and finds outlying regions of low scores. Surprisingly, suitable statistical methods that avoid complex models and strong assumptions are lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models, so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally, we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
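
    A generic illustration of this type of approach (not the authors' published statistic): find the contiguous region with the lowest total mean-centred score and attach a simple permutation p-value; the function names and permutation scheme are placeholders.

```python
import numpy as np

def lowest_scoring_region(scores):
    """Return the half-open interval with the most negative mean-centred sum."""
    x = np.asarray(scores, dtype=float) - np.mean(scores)
    best_sum, best = 0.0, (0, 0)
    running, start = 0.0, 0
    for i, v in enumerate(x):            # Kadane's algorithm, minimum-sum variant
        running += v
        if running > 0.0:
            running, start = 0.0, i + 1
        elif running < best_sum:
            best_sum, best = running, (start, i + 1)
    return best, best_sum

def permutation_pvalue(scores, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    _, observed = lowest_scoring_region(scores)
    null = [lowest_scoring_region(rng.permutation(scores))[1] for _ in range(n_perm)]
    return (1 + sum(s <= observed for s in null)) / (n_perm + 1)
```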

  15. Statistical segmentation of multidimensional brain datasets

    NASA Astrophysics Data System (ADS)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes some of the problems involved in multidimensional clustering techniques, such as partial volume effects (PVE), processing speed, and the difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region-growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining voxels, which are assumed to be mixtures of Gaussians. These voxels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of the full covariance matrix (instead of only its diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of the multi-Gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold standard. Our results were more robust and closer to the gold standard.
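
    A minimal sketch of the second stage under assumed variable names (not the authors' code): a three-class Gaussian mixture with full covariance matrices fitted by expectation maximization, as provided by scikit-learn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_brain_voxels(t1_values, t2_values):
    """t1_values, t2_values: 1-D intensity arrays for the non-background voxels."""
    X = np.column_stack([t1_values, t2_values])          # one feature vector per voxel
    gmm = GaussianMixture(n_components=3, covariance_type="full",
                          n_init=5, random_state=0).fit(X)
    labels = gmm.predict(X)   # component labels; map to CSF/GM/WM afterwards, e.g. by mean intensity
    return labels, gmm.means_, gmm.covariances_
```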

  16. Three plot correlation-based small infrared target detection in dense sun-glint environment for infrared search and track

    NASA Astrophysics Data System (ADS)

    Kim, Sungho; Choi, Byungin; Kim, Jieun; Kwon, Soon; Kim, Kyung-Tae

    2012-05-01

    This paper presents a separate spatio-temporal filter-based small infrared target detection method to address the sea-based infrared search and track (IRST) problem in a dense sun-glint environment. Detecting small infrared targets such as sea-skimming missiles or small asymmetric ships is critical for national defense. On the sea surface, sun-glint clutter degrades detection performance. Furthermore, if true targets must be detected using only three images from a low-frame-rate camera, the problem becomes more difficult. We propose a novel three-plot correlation filter and a statistics-based clutter reduction method to achieve a robust small-target detection rate in a dense sun-glint environment. We validate the detection performance of the proposed method on real infrared test sequences that include synthetic targets.

  17. Commercialisation of Biomarker Tests for Mental Illnesses: Advances and Obstacles.

    PubMed

    Chan, Man K; Cooper, Jason D; Bahn, Sabine

    2015-12-01

    Substantial strides have been made in the field of biomarker research for mental illnesses over the past few decades. However, no US FDA-cleared blood-based biomarker tests have been translated into routine clinical practice. Here, we review the challenges associated with commercialisation of research findings and discuss how these challenges can impede scientific impact and progress. Overall evidence indicates that a lack of research funding and poor reproducibility of findings were the most important obstacles to commercialization of biomarker tests. Fraud, pre-analytical and analytical limitations, and inappropriate statistical analysis are major contributors to poor reproducibility. Increasingly, these issues are acknowledged and actions are being taken to improve data validity, raising the hope that robust biomarker tests will become available in the foreseeable future. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Neural Correlates of Morphology Acquisition through a Statistical Learning Paradigm.

    PubMed

    Sandoval, Michelle; Patterson, Dianne; Dai, Huanping; Vance, Christopher J; Plante, Elena

    2017-01-01

    The neural basis of statistical learning as it occurs over time was explored with stimuli drawn from a natural language (Russian nouns). The input reflected the "rules" for marking categories of gendered nouns, without making participants explicitly aware of the nature of what they were to learn. Participants were scanned while listening to a series of gender-marked nouns during four sequential scans, and were tested for their learning immediately after each scan. Although participants were not told the nature of the learning task, they exhibited learning after their initial exposure to the stimuli. Independent component analysis of the brain data revealed five task-related sub-networks. Unlike prior statistical learning studies of word segmentation, this morphological learning task robustly activated the inferior frontal gyrus during the learning period. This region was represented in multiple independent components, suggesting it functions as a network hub for this type of learning. Moreover, the results suggest that subnetworks activated by statistical learning are driven by the nature of the input, rather than reflecting a general statistical learning system.

  19. Neural Correlates of Morphology Acquisition through a Statistical Learning Paradigm

    PubMed Central

    Sandoval, Michelle; Patterson, Dianne; Dai, Huanping; Vance, Christopher J.; Plante, Elena

    2017-01-01

    The neural basis of statistical learning as it occurs over time was explored with stimuli drawn from a natural language (Russian nouns). The input reflected the “rules” for marking categories of gendered nouns, without making participants explicitly aware of the nature of what they were to learn. Participants were scanned while listening to a series of gender-marked nouns during four sequential scans, and were tested for their learning immediately after each scan. Although participants were not told the nature of the learning task, they exhibited learning after their initial exposure to the stimuli. Independent component analysis of the brain data revealed five task-related sub-networks. Unlike prior statistical learning studies of word segmentation, this morphological learning task robustly activated the inferior frontal gyrus during the learning period. This region was represented in multiple independent components, suggesting it functions as a network hub for this type of learning. Moreover, the results suggest that subnetworks activated by statistical learning are driven by the nature of the input, rather than reflecting a general statistical learning system. PMID:28798703

  20. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
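
    A minimal sketch of an order-statistic threshold detector with assumed parameter values (the flight implementation is not reproduced here): the threshold is a scaled order statistic of the analysis window, so a few strong interferers cannot inflate it the way they would a mean level.

```python
import numpy as np

def os_threshold_detect(spectrum, k_frac=0.75, scale=3.0):
    """Flag spectral bins exceeding a threshold set from an order statistic.

    spectrum : 1-D array of power estimates in the analysis window
    k_frac   : which order statistic to use (0.75 -> the 75th-percentile sample)
    scale    : multiplicative threshold factor above the order statistic
    """
    ordered = np.sort(spectrum)
    k = int(k_frac * (len(ordered) - 1))
    threshold = scale * ordered[k]        # robust to a few large outliers,
    detections = np.nonzero(spectrum > threshold)[0]   # unlike a mean-level threshold
    return detections, threshold
```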

  1. Employing Sensitivity Derivatives for Robust Optimization under Uncertainty in CFD

    NASA Technical Reports Server (NTRS)

    Newman, Perry A.; Putko, Michele M.; Taylor, Arthur C., III

    2004-01-01

    A robust optimization is demonstrated on a two-dimensional inviscid airfoil problem in subsonic flow. Given uncertainties in statistically independent, random, normally distributed flow parameters (input variables), an approximate first-order statistical moment method is employed to represent the Computational Fluid Dynamics (CFD) code outputs as expected values with variances. These output quantities are used to form the objective function and constraints. The constraints are cast in probabilistic terms; that is, the probability that a constraint is satisfied is greater than or equal to some desired target probability. Gradient-based robust optimization of this stochastic problem is accomplished through use of both first and second-order sensitivity derivatives. For each robust optimization, the effect of increasing both input standard deviations and target probability of constraint satisfaction are demonstrated. This method provides a means for incorporating uncertainty when considering small deviations from input mean values.
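
    A minimal sketch of the first-order statistical moment bookkeeping with hypothetical symbols (dfdb stands for the sensitivity derivatives supplied by the CFD code): output variance is propagated from the input variances through the squared sensitivities, and a probabilistic constraint is approximated as mean plus k standard deviations.

```python
import numpy as np
from scipy.stats import norm

def first_order_moments(f_mean, dfdb, sigma_b):
    """f_mean: output at mean inputs; dfdb: df/db_i; sigma_b: input std devs."""
    var_f = np.sum((np.asarray(dfdb, float) * np.asarray(sigma_b, float)) ** 2)
    return f_mean, np.sqrt(var_f)

def probabilistic_constraint(g_mean, dgdb, sigma_b, limit, target_prob=0.95):
    """P(g <= limit) >= target_prob, approximated as mean + k*sigma <= limit."""
    _, sigma_g = first_order_moments(g_mean, dgdb, sigma_b)
    k = norm.ppf(target_prob)
    return g_mean + k * sigma_g <= limit
```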

  2. A Monocular Vision Measurement System of Three-Degree-of-Freedom Air-Bearing Test-Bed Based on FCCSP

    NASA Astrophysics Data System (ADS)

    Gao, Zhanyu; Gu, Yingying; Lv, Yaoyu; Xu, Zhenbang; Wu, Qingwen

    2018-06-01

    A monocular vision-based pose measurement system is provided for real-time measurement of a three-degree-of-freedom (3-DOF) air-bearing test-bed. Firstly, a circular plane cooperative target is designed. An image of a target fixed on the test-bed is then acquired. Blob analysis-based image processing is used to detect the object circles on the target. A fast algorithm (FCCSP) based on pixel statistics is proposed to extract the centers of object circles. Finally, pose measurements can be obtained when combined with the centers and the coordinate transformation relation. Experiments show that the proposed method is fast, accurate, and robust enough to satisfy the requirement of the pose measurement.

  3. Evaluating optimal therapy robustness by virtual expansion of a sample population, with a case study in cancer immunotherapy

    PubMed Central

    Barish, Syndi; Ochs, Michael F.; Sontag, Eduardo D.; Gevertz, Jana L.

    2017-01-01

    Cancer is a highly heterogeneous disease, exhibiting spatial and temporal variations that pose challenges for designing robust therapies. Here, we propose the VEPART (Virtual Expansion of Populations for Analyzing Robustness of Therapies) technique as a platform that integrates experimental data, mathematical modeling, and statistical analyses for identifying robust optimal treatment protocols. VEPART begins with time course experimental data for a sample population, and a mathematical model fit to aggregate data from that sample population. Using nonparametric statistics, the sample population is amplified and used to create a large number of virtual populations. At the final step of VEPART, robustness is assessed by identifying and analyzing the optimal therapy (perhaps restricted to a set of clinically realizable protocols) across each virtual population. As proof of concept, we have applied the VEPART method to study the robustness of treatment response in a mouse model of melanoma subject to treatment with immunostimulatory oncolytic viruses and dendritic cell vaccines. Our analysis (i) showed that every scheduling variant of the experimentally used treatment protocol is fragile (nonrobust) and (ii) discovered an alternative region of dosing space (lower oncolytic virus dose, higher dendritic cell dose) for which a robust optimal protocol exists. PMID:28716945

  4. Future Tense and Economic Decisions: Controlling for Cultural Evolution

    PubMed Central

    Roberts, Seán G.; Winters, James; Chen, Keith

    2015-01-01

    A previous study by Chen demonstrates a correlation between languages that grammatically mark future events and their speakers' propensity to save, even after controlling for numerous economic and demographic factors. The implication is that languages which grammatically distinguish the present and the future may bias their speakers to distinguish them psychologically, leading to less future-oriented decision making. However, Chen's original analysis assumed languages are independent. This neglects the fact that languages are related, causing correlations to appear stronger than is warranted (Galton's problem). In this paper, we test the robustness of Chen's correlations to corrections for the geographic and historical relatedness of languages. While the question seems simple, the answer is complex. In general, the statistical correlation between the two variables is weaker when controlling for relatedness. When applying the strictest tests for relatedness, and when data is not aggregated across individuals, the correlation is not significant. However, the correlation did remain reasonably robust under a number of tests. We argue that any claims of synchronic patterns between cultural variables should be tested for spurious correlations, with the kinds of approaches used in this paper. However, experiments or case-studies would be more fruitful avenues for future research on this specific topic, rather than further large-scale cross-cultural correlational studies. PMID:26186527

  5. Future Tense and Economic Decisions: Controlling for Cultural Evolution.

    PubMed

    Roberts, Seán G; Winters, James; Chen, Keith

    2015-01-01

    A previous study by Chen demonstrates a correlation between languages that grammatically mark future events and their speakers' propensity to save, even after controlling for numerous economic and demographic factors. The implication is that languages which grammatically distinguish the present and the future may bias their speakers to distinguish them psychologically, leading to less future-oriented decision making. However, Chen's original analysis assumed languages are independent. This neglects the fact that languages are related, causing correlations to appear stronger than is warranted (Galton's problem). In this paper, we test the robustness of Chen's correlations to corrections for the geographic and historical relatedness of languages. While the question seems simple, the answer is complex. In general, the statistical correlation between the two variables is weaker when controlling for relatedness. When applying the strictest tests for relatedness, and when data is not aggregated across individuals, the correlation is not significant. However, the correlation did remain reasonably robust under a number of tests. We argue that any claims of synchronic patterns between cultural variables should be tested for spurious correlations, with the kinds of approaches used in this paper. However, experiments or case-studies would be more fruitful avenues for future research on this specific topic, rather than further large-scale cross-cultural correlational studies.

  6. Quick Overview Scout 2008 Version 1.0

    EPA Science Inventory

    The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical software.

  7. Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision

    PubMed Central

    Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson

    2014-01-01

    The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339

  8. False-positive results in pharmacoepidemiology and pharmacovigilance.

    PubMed

    Bezin, Julien; Bosco-Levy, Pauline; Pariente, Antoine

    2017-09-01

    False-positive results constitute an important issue in scientific research. In the domain of drug evaluation, they affect all phases of drug development and assessment, from very early preclinical studies to late post-marketing evaluations. The core concern associated with false-positive results is the lack of replicability. Aside from fraud or misconduct, false positives are often viewed from the statistical angle, which considers them the price to pay for Type I error in statistical testing and for its inflation in the context of multiple testing. When considering this problem in the context of pharmacoepidemiology and pharmacovigilance, however, both of which evaluate drugs in observational settings, the information brought by statistical testing and its significance should only be considered additional to the estimates provided and their confidence intervals, in a context where differences must above all be clinically meaningful and the results must appear robust to the biases likely to have affected the studies. In the following article, we consequently illustrate these biases and their consequences in generating false-positive results, through widely disputed studies of associations between drug use and health outcomes. Copyright © 2017 Société française de pharmacologie et de thérapeutique. Published by Elsevier Masson SAS. All rights reserved.

  9. Stochastic Integration H∞ Filter for Rapid Transfer Alignment of INS.

    PubMed

    Zhou, Dapeng; Guo, Lei

    2017-11-18

    The performance of an inertial navigation system (INS) operated on a moving base greatly depends on the accuracy of rapid transfer alignment (RTA). However, in practice, the coexistence of large initial attitude errors and uncertain observation noise statistics poses a great challenge for the estimation accuracy of misalignment angles. This study aims to develop a novel robust nonlinear filter, namely the stochastic integration H∞ filter (SIH∞F), for improving both the accuracy and robustness of RTA. In this new nonlinear H∞ filter, the stochastic spherical-radial integration rule is incorporated with the framework of the derivative-free H∞ filter for the first time, and the resulting SIH∞F simultaneously attenuates the negative effect in estimations caused by significant nonlinearity and large uncertainty. Comparisons between the SIH∞F and previously well-known methodologies are carried out by means of numerical simulation and a van test. The results demonstrate that the newly-proposed method outperforms the cubature H∞ filter. Moreover, the SIH∞F inherits the benefit of the traditional stochastic integration filter, but with more robustness in the presence of uncertainty.

  10. Robust biological parametric mapping: an improved technique for multimodal brain image analysis

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

    2011-03-01

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
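
    A hedged sketch of voxelwise robust regression in the same spirit (Huber M-estimation via statsmodels, not the released software package); the array shapes and column layout are assumptions.

```python
import numpy as np
import statsmodels.api as sm

def robust_voxelwise_fit(y_images, x_images, covariates):
    """y_images, x_images: (n_subjects, n_voxels) arrays; covariates: (n_subjects, p)."""
    n_vox = y_images.shape[1]
    betas, tvals = np.zeros(n_vox), np.zeros(n_vox)
    for v in range(n_vox):
        X = sm.add_constant(np.column_stack([x_images[:, v], covariates]))
        fit = sm.RLM(y_images[:, v], X, M=sm.robust.norms.HuberT()).fit()
        betas[v], tvals[v] = fit.params[1], fit.tvalues[1]   # the imaging regressor
    return betas, tvals
```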

  11. State estimation and prediction using clustered particle filters.

    PubMed

    Lee, Yoonsang; Majda, Andrew J

    2016-12-20

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.

  12. State estimation and prediction using clustered particle filters

    PubMed Central

    Lee, Yoonsang; Majda, Andrew J.

    2016-01-01

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors. PMID:27930332

  13. Field significance of performance measures in the context of regional climate model evaluation. Part 2: precipitation

    NASA Astrophysics Data System (ADS)

    Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker

    2018-04-01

    A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as 'field' or 'global' significance. The block length for the local resampling tests is precisely determined to adequately account for the time series structure. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is applied, as an example, to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11° grid resolution. Daily precipitation climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. While the downscaled precipitation distributions are statistically indistinguishable from the observed ones in most regions in summer, the biases of some distribution characteristics are significant over large areas in winter. WRF-NOAH generates appropriate stationary fine-scale climate features in the daily precipitation field over regions of complex topography in both seasons and appropriate transient fine-scale features almost everywhere in summer. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the clear additional value of dynamical downscaling over global climate simulations. The evaluation methodology has a broad spectrum of applicability as it is distribution-free, robust to spatial dependence, and accounts for time series structure.

  14. Robust tissue-air volume segmentation of MR images based on the statistics of phase and magnitude: Its applications in the display of susceptibility-weighted imaging of the brain.

    PubMed

    Du, Yiping P; Jin, Zhaoyang

    2009-10-01

    To develop a robust algorithm for tissue-air segmentation in magnetic resonance imaging (MRI) using the statistics of phase and magnitude of the images. A multivariate measure based on the statistics of phase and magnitude was constructed for tissue-air volume segmentation. The standard deviation of first-order phase difference and the standard deviation of magnitude were calculated in a 3 x 3 x 3 kernel in the image domain. To improve differentiation accuracy, the uniformity of phase distribution in the kernel was also calculated and linear background phase introduced by field inhomogeneity was corrected. The effectiveness of the proposed volume segmentation technique was compared to a conventional approach that uses the magnitude data alone. The proposed algorithm was shown to be more effective and robust in volume segmentation in both synthetic phantom and susceptibility-weighted images of human brain. Using our proposed volume segmentation method, veins in the peripheral regions of the brain were well depicted in the minimum-intensity projection of the susceptibility-weighted images. Using the additional statistics of phase, tissue-air volume segmentation can be substantially improved compared to that using the statistics of magnitude data alone. (c) 2009 Wiley-Liss, Inc.
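
    A minimal sketch (not the authors' code) of the local statistics involved: standard deviations of the first-order phase difference and of the magnitude, computed in a 3x3x3 neighbourhood with a box filter and combined into a simple mask; the thresholds and the exact combination rule are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std(volume, size=3):
    """Local standard deviation in a size^3 box around each voxel."""
    mean = uniform_filter(volume, size)
    mean_sq = uniform_filter(volume ** 2, size)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def tissue_air_mask(magnitude, phase, mag_thresh, phase_thresh):
    dphase = np.diff(phase, axis=0, append=phase[-1:])   # first-order phase difference
    std_phase = local_std(dphase)
    # Tissue: coherent phase (low std of the phase difference) and non-noise magnitude.
    return (std_phase < phase_thresh) & (magnitude > mag_thresh)
```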

  15. Influence of musical training on understanding voiced and whispered speech in noise.

    PubMed

    Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

    2014-01-01

    This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.

  16. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in the performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum, and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real-world data and standard public domain data sets corroborate these findings.
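
    A small numerical sketch of order-statistic combining (illustrative only, not the paper's analytical derivations): per-class scores from several classifiers are sorted and fused by the median, the maximum, or a trimmed mean of the ordered outputs.

```python
import numpy as np

def combine(scores, rule="median", trim=1):
    """scores: (n_classifiers, n_classes) array of posterior estimates."""
    ordered = np.sort(scores, axis=0)                   # order statistics per class
    if rule == "median":
        combined = np.median(scores, axis=0)
    elif rule == "max":
        combined = ordered[-1]
    elif rule == "trim":                                # drop `trim` lowest and highest
        combined = ordered[trim:len(scores) - trim].mean(axis=0)
    else:
        raise ValueError("unknown combining rule")
    return np.argmax(combined), combined                # predicted class and fused scores
```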

  17. Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)

    PubMed Central

    Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

    2016-01-01

    We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497

  18. Robust and efficient method for matching features in omnidirectional images

    NASA Astrophysics Data System (ADS)

    Zhu, Qinyi; Zhang, Zhijiang; Zeng, Dan

    2018-04-01

    Binary descriptors have been widely used in many real-time applications due to their efficiency. These descriptors are commonly designed for perspective images but perform poorly on omnidirectional images, which are severely distorted. To address this issue, this paper proposes tangent plane BRIEF (TPBRIEF) and adapted log polar grid-based motion statistics (ALPGMS). TPBRIEF projects keypoints to a unit sphere and applies the fixed test set of the BRIEF descriptor on the tangent plane of the unit sphere. The fixed test set is then backprojected onto the original distorted images to construct the distortion-invariant descriptor. TPBRIEF directly enables keypoint detection and feature description on the original distorted images, whereas other approaches correct the distortion through image resampling, which introduces artifacts and adds time cost. With ALPGMS, omnidirectional images are divided into circular arches named adapted log polar grids. Whether a match is true or false is then determined by simply thresholding the match numbers in the grid pair where the two matched points are located. Experiments show that TPBRIEF greatly improves the feature matching accuracy and ALPGMS robustly removes wrong matches. Our proposed method outperforms the state-of-the-art methods.

  19. A robust bayesian estimate of the concordance correlation coefficient.

    PubMed

    Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir

    2015-01-01

    A need for assessment of agreement arises in many situations including statistical biomarker qualification or assay or method validation. Concordance correlation coefficient (CCC) is one of the most popular scaled indices reported in evaluation of agreement. Robust methods for CCC estimation currently present an important statistical challenge. Here, we propose a novel Bayesian method of robust estimation of CCC based on multivariate Student's t-distribution and compare it with its alternatives. Furthermore, we extend the method to practically relevant settings, enabling incorporation of confounding covariates and replications. The superiority of the new approach is demonstrated using simulation as well as real datasets from biomarker application in electroencephalography (EEG). This biomarker is relevant in neuroscience for development of treatments for insomnia.
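
    For reference, a plain NumPy sketch of the classical sample estimator of the concordance correlation coefficient that the robust Bayesian t-distribution version proposed here generalizes.

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population (1/n) variances
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```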

  20. Design of a factorial experiment with randomization restrictions to assess medical device performance on vascular tissue.

    PubMed

    Diestelkamp, Wiebke S; Krane, Carissa M; Pinnell, Margaret F

    2011-05-20

    Energy-based surgical scalpels are designed to efficiently transect and seal blood vessels using thermal energy to promote protein denaturation and coagulation. Assessment and design improvement of ultrasonic scalpel performance relies on both in vivo and ex vivo testing. The objective of this work was to design and implement a robust, experimental test matrix with randomization restrictions and predictive statistical power, which allowed for identification of those experimental variables that may affect the quality of the seal obtained ex vivo. The design of the experiment included three factors: temperature (two levels); the type of solution used to perfuse the artery during transection (three types); and artery type (two types) resulting in a total of twelve possible treatment combinations. Burst pressures of porcine carotid and renal arteries sealed ex vivo were assigned as the response variable. The experimental test matrix was designed and carried out as a split-plot experiment in order to assess the contributions of several variables and their interactions while accounting for randomization restrictions present in the experimental setup. The statistical software package SAS was utilized and PROC MIXED was used to account for the randomization restrictions in the split-plot design. The combination of temperature, solution, and vessel type had a statistically significant impact on seal quality. The design and implementation of a split-plot experimental test-matrix provided a mechanism for addressing the existing technical randomization restrictions of ex vivo ultrasonic scalpel performance testing, while preserving the ability to examine the potential effects of independent factors or variables. This method for generating the experimental design and the statistical analyses of the resulting data are adaptable to a wide variety of experimental problems involving large-scale tissue-based studies of medical or experimental device efficacy and performance.

  1. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    PubMed Central

    Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-01-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj - xi)/(tj - ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences. PMID:27668140

  2. MIDAS robust trend estimator for accurate GPS station velocities without step detection.

    PubMed

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj - xi)/(tj - ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
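
    A simplified sketch of the MIDAS idea (the published algorithm additionally handles data gaps in a specific way and produces its own uncertainty estimate): take the median of slopes from data pairs separated by about one year, trim outliers using a MAD-based scale, and re-take the median. The pairing tolerance below is an assumption.

```python
import numpy as np

def midas_like_trend(t_years, x, pair_tol=0.01):
    """Robust trend (units of x per year) from pairs separated by ~1 year."""
    t, x = np.asarray(t_years, float), np.asarray(x, float)
    slopes = []
    for i in range(len(t)):
        j = np.argmin(np.abs(t - (t[i] + 1.0)))          # partner ~1 year later
        if j != i and abs(t[j] - t[i] - 1.0) < pair_tol:
            slopes.append((x[j] - x[i]) / (t[j] - t[i]))
    if not slopes:
        return np.nan
    slopes = np.asarray(slopes)
    med = np.median(slopes)
    mad = 1.4826 * np.median(np.abs(slopes - med))       # robust scale estimate
    kept = slopes[np.abs(slopes - med) < 2.0 * mad]      # trim one-sided outliers
    return np.median(kept) if kept.size else med
```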

  3. Synthetic CO, H2 and H I surveys of the second galactic quadrant, and the properties of molecular gas

    NASA Astrophysics Data System (ADS)

    Duarte-Cabral, A.; Acreman, D. M.; Dobbs, C. L.; Mottram, J. C.; Gibson, S. J.; Brunt, C. M.; Douglas, K. A.

    2015-03-01

    We present CO, H2, H I and HISA (H I self-absorption) distributions from a set of simulations of grand design spirals including stellar feedback, self-gravity, heating and cooling. We replicate the emission of the second galactic quadrant by placing the observer inside the modelled galaxies and post-process the simulations using a radiative transfer code, so as to create synthetic observations. We compare the synthetic data cubes to observations of the second quadrant of the Milky Way to test the ability of the current models to reproduce the basic chemistry of the Galactic interstellar medium (ISM), as well as to test how sensitive such galaxy models are to different recipes of chemistry and/or feedback. We find that models which include feedback and self-gravity can reproduce the production of CO with respect to H2 as observed in our Galaxy, as well as the distribution of the material perpendicular to the Galactic plane. While changes in the chemistry/feedback recipes do not have a huge impact on the statistical properties of the chemistry in the simulated galaxies, we find that the inclusion of both feedback and self-gravity are crucial ingredients, as our test without feedback failed to reproduce all of the observables. Finally, even though the transition from H2 to CO seems to be robust, we find that all models seem to underproduce molecular gas, and have a lower molecular to atomic gas fraction than is observed. Nevertheless, our fiducial model with feedback and self-gravity has shown to be robust in reproducing the statistical properties of the basic molecular gas components of the ISM in our Galaxy.

  4. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    NASA Astrophysics Data System (ADS)

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj-xi)/(tj-ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
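
    The abstract describes the MIDAS recipe only at a high level: slopes from pairs about one year apart, a trimmed re-median, and a robust uncertainty. A minimal Python sketch of that idea follows; the pairing tolerance, the 2-MAD trimming rule, and the uncertainty scaling are illustrative assumptions, not the authors' reference implementation (which treats gaps and autocorrelation more carefully).

```python
import numpy as np

def midas_like_trend(t, x, pair_span=1.0, tol=0.01):
    """Median-of-slopes trend in the spirit of MIDAS (illustrative sketch).

    t : observation times in years; x : coordinate values (e.g. in mm).
    Slopes are formed only from pairs separated by roughly `pair_span`
    years, which suppresses seasonal signals and most step discontinuities.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    slopes = []
    for i, ti in enumerate(t):
        j = int(np.argmin(np.abs(t - (ti + pair_span))))  # partner ~1 year later
        if j != i and abs(t[j] - ti - pair_span) < tol:
            slopes.append((x[j] - x[i]) / (t[j] - t[i]))
    slopes = np.asarray(slopes)

    v1 = np.median(slopes)                              # first-pass median
    mad = 1.4826 * np.median(np.abs(slopes - v1))       # robust scale estimate
    kept = slopes if mad == 0 else slopes[np.abs(slopes - v1) < 2.0 * mad]
    v2 = np.median(kept)                                # median after trimming

    # illustrative robust standard error of the median (no autocorrelation term)
    sigma = 1.4826 * np.median(np.abs(kept - v2))
    return v2, 1.2533 * sigma / np.sqrt(len(kept))

# synthetic daily series: 2 mm/yr trend + annual cycle + a 5 mm step at year 3
rng = np.random.default_rng(0)
t = np.arange(0.0, 6.0, 1.0 / 365.25)
x = 2.0 * t + 1.5 * np.sin(2 * np.pi * t) + 5.0 * (t > 3.0) + rng.normal(0.0, 1.0, t.size)
print(midas_like_trend(t, x))  # trend stays close to 2 mm/yr despite the step
```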

  5. Identification of robust statistical downscaling methods based on a comprehensive suite of performance metrics for South Korea

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Cannon, A. J.

    2015-12-01

    Climate models are a key tool for investigating the impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch in spatial resolution between GCMs and regional applications, particularly for regions characterized by complex terrain such as the Korean peninsula. A downscaling procedure is therefore essential for assessing the regional impacts of climate change. Numerous statistical downscaling methods have been used, mainly because of their computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. Using a split-sampling scheme, all methods are calibrated with observational station data for the 19 years from 1973 to 1991 and tested on the recent 19 years from 1992 to 2010. To assess the skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure the ability to reproduce temporal correlation, distributions, spatial correlation, and extreme events. In addition, we employ the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR, and all methods lead to large improvements in all performance metrics. When TOPSIS is applied to the seasonal performance metrics, MACA is identified as the most reliable and robust method for all variables and seasons. Note that this result is derived from CFSR output, which is regarded as near-perfect climate data in climate studies; the ranking may therefore change when various GCMs are downscaled and evaluated. Nevertheless, it may help end-users (i.e., modelers or water resources managers) understand and select downscaling methods better suited to the priorities of their regional applications.
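
    TOPSIS, used here to rank the downscaling methods across the suite of performance metrics, is a standard multi-criteria ranking technique. A small sketch follows; the skill matrix, weights, and benefit/cost flags are hypothetical placeholders rather than the study's actual metrics.

```python
import numpy as np

def topsis_rank(scores, weights=None, benefit=None):
    """Closeness of each alternative (row) to the ideal solution.

    scores  : (n_methods, n_metrics) performance-metric matrix.
    weights : criterion weights (default: equal).
    benefit : boolean mask, True where larger values are better.
    """
    scores = np.asarray(scores, dtype=float)
    n, m = scores.shape
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    b = np.ones(m, dtype=bool) if benefit is None else np.asarray(benefit, dtype=bool)

    v = w * scores / np.linalg.norm(scores, axis=0)      # normalize, then weight
    ideal = np.where(b, v.max(axis=0), v.min(axis=0))    # best value per criterion
    anti = np.where(b, v.min(axis=0), v.max(axis=0))     # worst value per criterion
    d_plus = np.linalg.norm(v - ideal, axis=1)
    d_minus = np.linalg.norm(v - anti, axis=1)
    return d_minus / (d_plus + d_minus)                  # higher = better

# hypothetical skill matrix: 4 methods x 3 metrics (correlation, RMSE, extremes)
skill = np.array([[0.82, 0.10, 0.75],    # BCSD (values are made up)
                  [0.79, 0.12, 0.70],    # BCCA
                  [0.88, 0.08, 0.80],    # MACA
                  [0.84, 0.11, 0.74]])   # BCCI
closeness = topsis_rank(skill, benefit=[True, False, True])  # RMSE: smaller is better
print(np.argsort(closeness)[::-1])  # method indices ranked best to worst
```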

  6. An adaptive two-stage dose-response design method for establishing proof of concept.

    PubMed

    Franchetti, Yoko; Anderson, Stewart J; Sampson, Allan R

    2013-01-01

    We propose an adaptive two-stage dose-response design where a prespecified adaptation rule is used to add and/or drop treatment arms between the stages. We extend the multiple comparison procedures-modeling (MCP-Mod) approach into a two-stage design. In each stage, we use the same set of candidate dose-response models and test for a dose-response relationship or proof of concept (PoC) via model-associated statistics. The stage-wise test results are then combined to establish "global" PoC using a conditional error function. Our simulation studies showed good and more robust power for our design method compared to conventional fixed designs.
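
    The design combines stage-wise PoC test results into a global test. As an illustration of how stage-wise evidence can be pooled, the sketch below uses the weighted inverse-normal combination function, a common device in adaptive designs; the authors' specific conditional error function may differ, and the p-values shown are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1=0.5):
    """Combine two stage-wise one-sided p-values into a single p-value
    using the weighted inverse-normal combination function (w1 + w2 = 1)."""
    w2 = 1.0 - w1
    z = np.sqrt(w1) * norm.isf(p1) + np.sqrt(w2) * norm.isf(p2)
    return norm.sf(z)

# hypothetical stage-wise p-values from the model-associated PoC tests
print(inverse_normal_combination(0.04, 0.08))  # combined evidence for "global" PoC
```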

  7. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

    PubMed

    Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

    2016-01-01

    Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.

  8. Improving Electronic Sensor Reliability by Robust Outlier Screening

    PubMed Central

    Moreno-Lizaranzu, Manuel J.; Cuesta, Federico

    2013-01-01

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682

  9. Improving electronic sensor reliability by robust outlier screening.

    PubMed

    Moreno-Lizaranzu, Manuel J; Cuesta, Federico

    2013-10-09

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.

  10. Accelerated Enveloping Distribution Sampling: Enabling Sampling of Multiple End States while Preserving Local Energy Minima.

    PubMed

    Perthold, Jan Walther; Oostenbrink, Chris

    2018-05-17

    Enveloping distribution sampling (EDS) is an efficient approach to calculate multiple free-energy differences from a single molecular dynamics (MD) simulation. However, the construction of an appropriate reference-state Hamiltonian that samples all states efficiently is not straightforward. We propose a novel approach for the construction of the EDS reference-state Hamiltonian, related to a previously described procedure to smoothen energy landscapes. In contrast to previously suggested EDS approaches, our reference-state Hamiltonian preserves local energy minima of the combined end-states. Moreover, we propose an intuitive, robust and efficient parameter optimization scheme to tune EDS Hamiltonian parameters. We demonstrate the proposed method with established and novel test systems and conclude that our approach allows for the automated calculation of multiple free-energy differences from a single simulation. Accelerated EDS promises to be a robust and user-friendly method to compute free-energy differences based on solid statistical mechanics.

  11. Nowcasting and Forecasting the Monthly Food Stamps Data in the US Using Online Search Data

    PubMed Central

    Fantazzini, Dean

    2014-01-01

    We propose the use of Google online search data for nowcasting and forecasting the number of food stamps recipients. We perform a large out-of-sample forecasting exercise with almost 3000 competing models with forecast horizons up to 2 years ahead, and we show that models including Google search data statistically outperform the competing models at all considered horizons. These results hold also with several robustness checks, considering alternative keywords, a falsification test, different out-of-samples, directional accuracy and forecasts at the state-level. PMID:25369315

  12. A Simple Simulation Technique for Nonnormal Data with Prespecified Skewness, Kurtosis, and Covariance Matrix.

    PubMed

    Foldnes, Njål; Olsson, Ulf Henning

    2016-01-01

    We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.
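
    A structural sketch of the independent-generator idea is given below: independent standardized nonnormal generators are mixed by a factor of the target covariance matrix, so the covariance is reproduced exactly while the copula is non-Gaussian. The gamma generators used here are arbitrary; the published IG method additionally solves for the generators' skewness and kurtosis so that the marginal moments match prespecified targets, a step omitted in this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def ig_style_sample(n, sigma, shape=4.0):
    """Draw n observations whose covariance equals `sigma` but whose copula
    is non-Gaussian, by mixing independent standardized gamma generators
    with a Cholesky factor A of sigma (so cov(gA') = A I A' = sigma).
    The moment-matching step of the published IG method is omitted here."""
    sigma = np.asarray(sigma, dtype=float)
    A = np.linalg.cholesky(sigma)
    k = sigma.shape[0]
    g = (rng.gamma(shape, 1.0, size=(n, k)) - shape) / np.sqrt(shape)  # mean 0, var 1
    return g @ A.T

x = ig_style_sample(10_000, sigma=[[1.0, 0.5], [0.5, 1.0]])
print(np.cov(x, rowvar=False).round(2))  # close to the target covariance
```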

  13. Scout 2008 Version 1.0 User Guide

    EPA Science Inventory

    The Scout 2008 version 1.0 software package provides a wide variety of classical and robust statistical methods that are not typically available in other commercial software packages. A major part of Scout deals with classical, robust, and resistant univariate and multivariate ou...

  14. Social and economic sustainability of urban systems: comparative analysis of metropolitan statistical areas in Ohio, USA

    EPA Science Inventory

    This article presents a general and versatile methodology for assessing sustainability with Fisher Information as a function of dynamic changes in urban systems. Using robust statistical methods, six Metropolitan Statistical Areas (MSAs) in Ohio were evaluated to comparatively as...

  15. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

    PubMed Central

    Lartillot, Nicolas; Brinkmann, Henner; Philippe, Hervé

    2007-01-01

    Background Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions. Methods We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation. Results Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences. Conclusion The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees. PMID:17288577

  16. Robust tracking control of a magnetically suspended rigid body

    NASA Technical Reports Server (NTRS)

    Lim, Kyong B.; Cox, David E.

    1994-01-01

    This study is an application of H-infinity and micro-synthesis for designing robust tracking controllers for the Large Angle Magnetic Suspension Test Facility. The modeling, design, analysis, simulation, and testing of a control law that guarantees tracking performance under external disturbances and model uncertainties is investigated. The type of uncertainties considered and the tracking performance metric used is discussed. This study demonstrates the tradeoff between tracking performance at low frequencies and robustness at high frequencies. Two sets of controllers were designed and tested. The first set emphasized performance over robustness, while the second set traded off performance for robustness. Comparisons of simulation and test results are also included. Current simulation and experimental results indicate that reasonably good robust tracking performance can be attained for this system using multivariable robust control approach.

  17. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  18. Avoid lost discoveries, because of violations of standard assumptions, by using modern robust statistical methods.

    PubMed

    Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence

    2013-03-01

    Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Learning moment-based fast local binary descriptor

    NASA Astrophysics Data System (ADS)

    Bellarbi, Abdelkader; Zenati, Nadia; Otmane, Samir; Belghit, Hayet

    2017-03-01

    Recently, binary descriptors have attracted significant attention due to their speed and low memory consumption; however, using intensity differences to calculate the binary descriptive vector is not efficient enough. We propose an approach to binary description called POLAR_MOBIL, in which we perform binary tests between geometrical and statistical information using moments in the patch instead of the classical intensity binary test. In addition, we introduce a learning technique used to select an optimized set of binary tests with low correlation and high variance. This approach offers high distinctiveness against affine transformations and appearance changes. An extensive evaluation on well-known benchmark datasets reveals the robustness and the effectiveness of the proposed descriptor, as well as its good performance in terms of low computation complexity when compared with state-of-the-art real-time local descriptors.
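
    To illustrate the general idea of replacing raw intensity comparisons with comparisons of patch statistics, the sketch below emits one bit per test by comparing a raw geometric moment computed over two sub-regions of the patch. The region layout and moment order are hypothetical; in the published method the set of binary tests is learned to achieve low correlation and high variance.

```python
import numpy as np

def region_moment(region, order=(1, 1)):
    """Raw geometric moment m_pq of an image region."""
    p, q = order
    y, x = np.mgrid[0:region.shape[0], 0:region.shape[1]]
    return float(np.sum((x ** p) * (y ** q) * region))

def moment_binary_descriptor(patch, pairs, order=(1, 1)):
    """One bit per test: set when the moment of region A exceeds region B.
    `pairs` is a list of ((rowsA, colsA), (rowsB, colsB)) slice pairs."""
    bits = [1 if region_moment(patch[ra, ca], order) > region_moment(patch[rb, cb], order)
            else 0
            for (ra, ca), (rb, cb) in pairs]
    return np.packbits(bits)

# hypothetical fixed layout of two tests on a 32x32 patch
rng = np.random.default_rng(0)
patch = rng.random((32, 32))
pairs = [((slice(0, 16), slice(0, 16)), (slice(16, 32), slice(16, 32))),
         ((slice(0, 16), slice(16, 32)), (slice(16, 32), slice(0, 16)))]
print(moment_binary_descriptor(patch, pairs))
```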

  20. Robustness of movement models: can models bridge the gap between temporal scales of data sets and behavioural processes?

    PubMed

    Schlägel, Ulrike E; Lewis, Mark A

    2016-12-01

    Discrete-time random walks and their extensions are common tools for analyzing animal movement data. In these analyses, the resolution of temporal discretization is a critical feature. Ideally, a model both mirrors the relevant temporal scale of the biological process of interest and matches the data sampling rate. Challenges arise when the resolution of the data is too coarse due to technological constraints, or when we wish to extrapolate results or compare results obtained from data with different resolutions. Drawing loosely on the concept of robustness in statistics, we propose a rigorous mathematical framework for studying movement models' robustness against changes in temporal resolution. In this framework, we define varying levels of robustness as formal model properties, focusing on random walk models with a spatially explicit component. With the new framework, we can investigate whether models can validly be applied to data across varying temporal resolutions and how we can account for these different resolutions in statistical inference results. We apply the new framework to movement-based resource selection models, demonstrating both analytical and numerical calculations, as well as a Monte Carlo simulation approach. While exact robustness is rare, the concept of approximate robustness provides a promising new direction for analyzing movement models.

  1. Robust Statistics: What They Are, and Why They Are So Important

    ERIC Educational Resources Information Center

    Corlu, Sencer M.

    2009-01-01

    The problem with "classical" statistics all invoking the mean is that these estimates are notoriously influenced by atypical scores (outliers), partly because the mean itself is differentially influenced by outliers. In theory, "modern" statistics may generate more replicable characterizations of data, because at least in some…

  2. Fulfilling the law of a single independent variable and improving the result of mathematical educational research

    NASA Astrophysics Data System (ADS)

    Pardimin, H.; Arcana, N.

    2018-01-01

    Much research in mathematics education applies the quasi-experimental method with statistical analysis based on the t-test. A weakness of quasi-experiments is that it is difficult to fulfil "the law of a single independent variable"; a weakness of the t-test is that the generalizability of the conclusions obtained is limited. This research aimed to find ways to reduce the weaknesses of the quasi-experimental method and to improve the generalizability of research results. The method applied was non-interactive qualitative research of the concept-analysis type, analysing the concepts of statistics, educational research methods, and research reports. The result is a way to overcome the weaknesses of quasi-experiments and the t-test: applying a combination of factorial design and balanced design, which the authors refer to as a Factorial-Balanced Design. The advantages of this design are: (1) it almost fulfils "the law of a single independent variable", so there is no need to test the similarity of academic ability, and (2) the sample sizes of the experimental and control groups become larger and equal, making the design robust to violations of the assumptions of the ANOVA test.

  3. Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

    PubMed Central

    Hart, Corey B.; Giszter, Simon F.

    2013-01-01

    We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data show that the statistic is robust under most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG shows that data from these preparations are clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low level execution primitives comprising pulsed SS in both frog and rat, and both episodic and rhythmic behaviors. PMID:23675341

  4. Robust detection-isolation-accommodation for sensor failures

    NASA Technical Reports Server (NTRS)

    Weiss, J. L.; Pattipati, K. R.; Willsky, A. S.; Eterno, J. S.; Crawford, J. T.

    1985-01-01

    The results of a one-year study to: (1) develop a theory for Robust Failure Detection and Identification (FDI) in the presence of model uncertainty, (2) develop a design methodology which utilizes the robust FDI theory, (3) apply the methodology to a sensor FDI problem for the F-100 jet engine, and (4) demonstrate the application of the theory to the evaluation of alternative FDI schemes are presented. Theoretical results in statistical discrimination are used to evaluate the robustness of residual signals (or parity relations) in terms of their usefulness for FDI. Furthermore, optimally robust parity relations are derived through the optimization of robustness metrics. The result is viewed as decentralization of the FDI process. A general structure for decentralized FDI is proposed and robustness metrics are used for determining various parameters of the algorithm.

  5. Biochemical methane potential (BMP) tests: Reducing test time by early parameter estimation.

    PubMed

    Da Silva, C; Astals, S; Peces, M; Campos, J L; Guerrero, L

    2018-01-01

    The biochemical methane potential (BMP) test is a key analytical technique to assess the implementation and optimisation of anaerobic biotechnologies. However, this technique is characterised by long testing times (from 20 to >100 days), which is not suitable for waste utilities, consulting companies or plant operators whose decision-making processes cannot be put on hold for such a long time. This study develops a statistically robust mathematical strategy using sensitivity functions for early prediction of BMP first-order model parameters, i.e. methane yield (B0) and kinetic rate constant (k). The minimum testing time for early parameter estimation showed a potential correlation with the k value, where (i) slowly biodegradable substrates (k ≤ 0.1 d⁻¹) have minimum testing times of ≥15 days, (ii) moderately biodegradable substrates (0.1
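
    The first-order model referred to here is B(t) = B0 (1 - exp(-k t)). A minimal sketch of fitting B0 and k to an early, truncated BMP series with scipy follows; the data points are synthetic placeholders, and the sketch does not reproduce the authors' sensitivity-function criterion for choosing the minimum testing time.

```python
import numpy as np
from scipy.optimize import curve_fit

def first_order_bmp(t, b0, k):
    """Cumulative methane yield of the first-order BMP model."""
    return b0 * (1.0 - np.exp(-k * t))

# hypothetical early-time BMP measurements (days, mL CH4 per g VS)
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 14.0])
b_obs = np.array([0.0, 60.0, 110.0, 150.0, 215.0, 260.0, 305.0, 340.0])

# estimate B0 and k from the truncated series; the cited study additionally
# uses sensitivity functions to decide when the early estimates are reliable
(b0_hat, k_hat), _ = curve_fit(first_order_bmp, t_obs, b_obs, p0=(400.0, 0.1))
print(f"B0 ~ {b0_hat:.0f} mL/gVS, k ~ {k_hat:.2f} 1/d")
```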

  6. Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.

    PubMed

    Gold, David L; Coombes, Kevin R; Wang, Jing; Mallick, Bani

    2007-03-01

    Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that if understood could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer if an interesting collection of genes is 'enriched' for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine if conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to the conventional approach which assumes independence. We argue that the independence assumption is not detrimental.
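
    The conventional per-class test referred to here is typically a hypergeometric (one-sided Fisher) test applied to each GO class in isolation, which is where the implicit independence assumption enters. A minimal sketch with hypothetical counts:

```python
from scipy.stats import hypergeom

def enrichment_pvalue(n_universe, n_class, n_selected, n_overlap):
    """P(X >= n_overlap) for the overlap between a selected gene list and a
    GO class, under the usual hypergeometric null. Applied class by class,
    this implicitly treats the classes as independent."""
    return hypergeom.sf(n_overlap - 1, n_universe, n_class, n_selected)

# hypothetical counts: 12000 annotated genes, a 150-gene class, 300 selected, 12 shared
print(enrichment_pvalue(12000, 150, 300, 12))
```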

  7. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T2 statistic is combined with a shrinkage approach, as an alternative estimator of the covariance matrix, to detect significant gene sets. The shrinkage covariance matrix overcomes the singularity problem by converting the unbiased estimator into an improved, biased estimator of the covariance matrix. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
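
    A sketch of the two ingredients described, shrinking the sample covariance toward a diagonal target so it stays invertible when genes far outnumber samples, and replacing means with trimmed means inside a two-sample Hotelling-type statistic, is given below. The fixed shrinkage intensity and trimming fraction are illustrative choices, not the estimators developed in the study.

```python
import numpy as np
from scipy.stats import trim_mean

def shrink_cov(x, alpha=0.2):
    """Convex combination of the sample covariance and its diagonal, which
    is positive definite (hence invertible) even when p >> n."""
    s = np.cov(x, rowvar=False)
    return (1.0 - alpha) * s + alpha * np.diag(np.diag(s))

def robust_t2(x1, x2, alpha=0.2, trim=0.1):
    """Two-sample Hotelling-type statistic using trimmed means and a pooled
    shrinkage covariance estimate."""
    n1, n2 = len(x1), len(x2)
    d = trim_mean(x1, trim, axis=0) - trim_mean(x2, trim, axis=0)
    pooled = ((n1 - 1) * shrink_cov(x1, alpha) +
              (n2 - 1) * shrink_cov(x2, alpha)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(pooled, d)

rng = np.random.default_rng(0)
group1 = rng.normal(0.0, 1.0, size=(15, 50))   # 15 samples, 50 genes in the set
group2 = rng.normal(0.3, 1.0, size=(15, 50))   # shifted expression
print(robust_t2(group1, group2))
```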

  8. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059

  9. A Statistical Method to Distinguish Functional Brain Networks

    PubMed Central

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on a search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method was also shown to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001). PMID:28261045

  10. A Statistical Method to Distinguish Functional Brain Networks.

    PubMed

    Fujita, André; Vidal, Maciel C; Takahashi, Daniel Y

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on a search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method was also shown to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001).

  11. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  12. Accurate and robust genomic prediction of celiac disease using statistical learning.

    PubMed

    Abraham, Gad; Tye-Din, Jason A; Bhalala, Oneil G; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2014-02-01

    Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87-0.89) and in independent replication across cohorts (AUC of 0.86-0.9), despite differences in ethnicity. The models explained 30-35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.

  13. Manufacturing Execution Systems: Examples of Performance Indicator and Operational Robustness Tools.

    PubMed

    Gendre, Yannick; Waridel, Gérard; Guyon, Myrtille; Demuth, Jean-François; Guelpa, Hervé; Humbert, Thierry

    Manufacturing Execution Systems (MES) are computerized systems used to measure production performance in terms of productivity, yield, and quality. The first part describes performance indicators, overall equipment effectiveness (OEE), process robustness tools, and statistical process control. The second part details some tools that help operators maintain process robustness and control by preventing deviations from target control charts. The MES was developed by Syngenta together with CIMO for automation.

  14. Testing the applicability of a benthic foraminiferal-based transfer function for the reconstruction of paleowater depth changes in Rhodes (Greece) during the early Pleistocene.

    PubMed

    Milker, Yvonne; Weinkauf, Manuel F G; Titschack, Jürgen; Freiwald, Andre; Krüger, Stefan; Jorissen, Frans J; Schmiedl, Gerhard

    2017-01-01

    We present paleo-water depth reconstructions for the Pefka E section deposited on the island of Rhodes (Greece) during the early Pleistocene. For these reconstructions, a transfer function (TF) using modern benthic foraminifera surface samples from the Adriatic and Western Mediterranean Seas has been developed. The TF model gives an overall predictive accuracy of ~50 m over a water depth range of ~1200 m. Two separate TF models for shallower and deeper water depth ranges indicate a good predictive accuracy of 9 m for shallower water depths (0-200 m) but far less accuracy of 130 m for deeper water depths (200-1200 m) due to uneven sampling along the water depth gradient. To test the robustness of the TF, we randomly selected modern samples to develop random TFs, showing that the model is robust for water depths between 20 and 850 m while greater water depths are underestimated. We applied the TF to the Pefka E fossil data set. The goodness-of-fit statistics showed that most fossil samples have a poor to extremely poor fit to water depth. We interpret this as a consequence of a lack of modern analogues for the fossil samples and removed all samples with extremely poor fit. To test the robustness and significance of the reconstructions, we compared them to reconstructions from an alternative TF model based on the modern analogue technique and applied the randomization TF test. We found our estimates to be robust and significant at the 95% confidence level, but we also observed that our estimates are strongly overprinted by orbital, precession-driven changes in paleo-productivity and corrected our estimates by filtering out the precession-related component. We compared our corrected record to reconstructions based on a modified plankton/benthos (P/B) ratio, excluding infaunal species, and to stable oxygen isotope data from the same section, as well as to paleo-water depth estimates for the Lindos Bay Formation of other sediment sections of Rhodes. These comparisons indicate that our orbital-corrected reconstructions are reasonable and reflect major tectonic movements of Rhodes during the early Pleistocene.

  15. Testing the applicability of a benthic foraminiferal-based transfer function for the reconstruction of paleowater depth changes in Rhodes (Greece) during the early Pleistocene

    PubMed Central

    Weinkauf, Manuel F. G.; Titschack, Jürgen; Freiwald, Andre; Krüger, Stefan; Jorissen, Frans J.; Schmiedl, Gerhard

    2017-01-01

    We present paleo-water depth reconstructions for the Pefka E section deposited on the island of Rhodes (Greece) during the early Pleistocene. For these reconstructions, a transfer function (TF) using modern benthic foraminifera surface samples from the Adriatic and Western Mediterranean Seas has been developed. The TF model gives an overall predictive accuracy of ~50 m over a water depth range of ~1200 m. Two separate TF models for shallower and deeper water depth ranges indicate a good predictive accuracy of 9 m for shallower water depths (0–200 m) but far less accuracy of 130 m for deeper water depths (200–1200 m) due to uneven sampling along the water depth gradient. To test the robustness of the TF, we randomly selected modern samples to develop random TFs, showing that the model is robust for water depths between 20 and 850 m while greater water depths are underestimated. We applied the TF to the Pefka E fossil data set. The goodness-of-fit statistics showed that most fossil samples have a poor to extremely poor fit to water depth. We interpret this as a consequence of a lack of modern analogues for the fossil samples and removed all samples with extremely poor fit. To test the robustness and significance of the reconstructions, we compared them to reconstructions from an alternative TF model based on the modern analogue technique and applied the randomization TF test. We found our estimates to be robust and significant at the 95% confidence level, but we also observed that our estimates are strongly overprinted by orbital, precession-driven changes in paleo-productivity and corrected our estimates by filtering out the precession-related component. We compared our corrected record to reconstructions based on a modified plankton/benthos (P/B) ratio, excluding infaunal species, and to stable oxygen isotope data from the same section, as well as to paleo-water depth estimates for the Lindos Bay Formation of other sediment sections of Rhodes. These comparisons indicate that our orbital-corrected reconstructions are reasonable and reflect major tectonic movements of Rhodes during the early Pleistocene. PMID:29166653

  16. Practical robustness measures in multivariable control system analysis. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Lehtomaki, N. A.

    1981-01-01

    The robustness of the stability of multivariable linear time-invariant feedback control systems with respect to model uncertainty is considered using frequency domain criteria. Available robustness tests are unified under a common framework based on the nature and structure of model errors. These results are derived using a multivariable version of Nyquist's stability theorem, in which the minimum singular value of the return difference transfer matrix is shown to be the multivariable generalization of the distance to the critical point on a single-input, single-output Nyquist diagram. Using the return difference transfer matrix, a very general robustness theorem is presented from which all of the robustness tests dealing with specific model errors may be derived. The robustness tests that explicitly utilize model error structure are able to guarantee feedback system stability in the face of model errors of larger magnitude than those robustness tests that do not. The robustness of linear quadratic Gaussian control systems is analyzed.
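
    The robustness measure described, the minimum singular value of the return difference matrix I + L(jω) evaluated over frequency, can be computed numerically; a small sketch for a hypothetical 2x2 loop transfer matrix follows.

```python
import numpy as np

def min_return_difference_sv(loop_tf, omegas):
    """Minimum over frequency of the smallest singular value of the return
    difference matrix I + L(jw): the multivariable analogue of the distance
    from the Nyquist locus to the critical point."""
    worst = np.inf
    for w in omegas:
        L = loop_tf(1j * w)
        sv = np.linalg.svd(np.eye(L.shape[0]) + L, compute_uv=False)
        worst = min(worst, sv[-1])
    return worst

# hypothetical 2x2 loop transfer matrix L(s) = 5/(s+1) * coupling
coupling = np.array([[1.0, 0.2], [0.1, 1.0]])
loop = lambda s: 5.0 / (s + 1.0) * coupling
print(min_return_difference_sv(loop, np.logspace(-2, 3, 500)))  # larger = more robust
```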

  17. Analysis of covariance as a remedy for demographic mismatch of research subject groups: some sobering simulations.

    PubMed

    Adams, K M; Brown, G G; Grant, I

    1985-08-01

    Analysis of Covariance (ANCOVA) is often used in neuropsychological studies to effect ex-post-facto adjustment of performance variables amongst groups of subjects mismatched on some relevant demographic variable. This paper reviews some of the statistical assumptions underlying this usage. In an attempt to illustrate the complexities of this statistical technique, three sham studies using actual patient data are presented. These staged simulations have varying relationships between group test performance differences and levels of covariate discrepancy. The results were robust and consistent in their nature, and were held to support the wisdom of previous cautions by statisticians concerning the employment of ANCOVA to justify comparisons between incomparable groups. ANCOVA should not be used in neuropsychological research to equate groups unequal on variables such as age and education or to exert statistical control whose objective is to eliminate consideration of the covariate as an explanation for results. Finally, the report advocates by example the use of simulation to further our understanding of neuropsychological variables.

  18. Statistical detection of systematic election irregularities

    PubMed Central

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-01-01

    Democratic societies are built around the principle of free and fair elections, and that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons. PMID:23010929

  19. Enhancing Dairy Manufacturing through customer feedback: A statistical approach

    NASA Astrophysics Data System (ADS)

    Vineesh, D.; Anbuudayasankar, S. P.; Narassima, M. S.

    2018-02-01

    Dairy products have become an inevitable part of the habitual diet. This study investigates consumer satisfaction with dairy products so as to provide manufacturers with useful inputs for enriching the quality of the products delivered. The study involved consumers of dairy products from various demographic backgrounds across South India. The questionnaire focused on quality aspects of dairy products as well as the service provided. A customer satisfaction model was developed from the factors identified, with robust hypotheses governing the use of the product. The developed model proved to be statistically significant, as it passed the required statistical tests for reliability, construct validity and interdependency between the constructs. Some major concerns detected were regarding the fat content, taste and odour of packaged milk. A minor proportion of respondents (15.64%) were unsatisfied with the quality of service provided, another issue to be addressed to eliminate dissatisfaction among consumers.

  20. Statistical detection of systematic election irregularities.

    PubMed

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-10-09

    Democratic societies are built around the principle of free and fair elections, and that each citizen's vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons.
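
    A minimal sketch of the kind of diagnostic described, comparing the sample kurtosis of unit-level results before and after aggregation, is shown below; the simulated vote shares and the simple fixed-size grouping are illustrative stand-ins for real polling-unit data and official aggregation levels.

```python
import numpy as np
from scipy.stats import kurtosis

def kurtosis_by_aggregation(unit_shares, group_size):
    """Excess kurtosis of a result distribution at the polling-unit level and
    after averaging units into groups of `group_size` (a crude stand-in for
    official aggregation levels)."""
    unit_shares = np.asarray(unit_shares, dtype=float)
    n = (len(unit_shares) // group_size) * group_size
    grouped = unit_shares[:n].reshape(-1, group_size).mean(axis=1)
    return kurtosis(unit_shares), kurtosis(grouped)

rng = np.random.default_rng(0)
clean = rng.normal(0.55, 0.08, 95_000).clip(0.0, 1.0)        # unremarkable election
stuffed = np.concatenate([clean, np.full(5_000, 0.99)])      # plus ballot-stuffed units
rng.shuffle(stuffed)
print(kurtosis_by_aggregation(clean, 50))
print(kurtosis_by_aggregation(stuffed, 50))                  # markedly heavier tails
```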

  1. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) were characterized by the molecular electronegativity-distance vector (MEDV). Linear relationships between the gas chromatographic retention index and the MEDV were established with a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR) and appraisal of the predictive ability by leave-one-out cross-validation showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in a 2:1 ratio, the statistical analysis showed that our models possess almost equal statistical quality, very similar regression coefficients and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
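
    The reported R and Rcv can be reproduced generically with ordinary least squares and leave-one-out cross-validation; a sketch with synthetic stand-ins for the MEDV descriptors follows (the real model also involves stepwise variable selection, which is not shown).

```python
import numpy as np

def calibration_and_loo_r(X, y):
    """Calibration correlation R and leave-one-out cross-validated Rcv for an
    ordinary least-squares MLR model."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = np.corrcoef(X1 @ beta, y)[0, 1]

    loo_pred = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        loo_pred[i] = X1[i] @ b
    return r, np.corrcoef(loo_pred, y)[0, 1]

# synthetic stand-in: 60 compounds described by 6 MEDV-like descriptors
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 6))
y = 250.0 + X @ np.array([12.0, 8.0, -5.0, 3.0, 0.5, 1.0]) + rng.normal(0.0, 2.0, 60)
print(calibration_and_loo_r(X, y))  # (R, Rcv)
```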

  2. Dark Energy Survey Year 1 Results: Multi-Probe Methodology and Simulated Likelihood Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krause, E.; et al.

    We present the methodology for and detail the implementation of the Dark Energy Survey (DES) 3x2pt DES Year 1 (Y1) analysis, which combines configuration-space two-point statistics from three different cosmological probes: cosmic shear, galaxy-galaxy lensing, and galaxy clustering, using data from the first year of DES observations. We have developed two independent modeling pipelines and describe the code validation process. We derive expressions for analytical real-space multi-probe covariances, and describe their validation with numerical simulations. We stress-test the inference pipelines in simulated likelihood analyses that vary 6-7 cosmology parameters plus 20 nuisance parameters and precisely resemble the analysis to be presented in the DES 3x2pt analysis paper, using a variety of simulated input data vectors with varying assumptions. We find that any disagreement between pipelines leads to changes in assigned likelihood Δχ² ≤ 0.045 with respect to the statistical error of the DES Y1 data vector. We also find that angular binning and survey mask do not impact our analytic covariance at a significant level. We determine lower bounds on scales used for analysis of galaxy clustering (8 Mpc h⁻¹) and galaxy-galaxy lensing (12 Mpc h⁻¹) such that the impact of modeling uncertainties in the non-linear regime is well below statistical errors, and show that our analysis choices are robust against a variety of systematics. These tests demonstrate that we have a robust analysis pipeline that yields unbiased cosmological parameter inferences for the flagship 3x2pt DES Y1 analysis. We emphasize that the level of independent code development and subsequent code comparison as demonstrated in this paper is necessary to produce credible constraints from increasingly complex multi-probe analyses of current data.

  3. Diagnostic methods for atmospheric inversions of long-lived greenhouse gases

    NASA Astrophysics Data System (ADS)

    Michalak, Anna M.; Randazzo, Nina A.; Chevallier, Frédéric

    2017-06-01

    The ability to predict the trajectory of climate change requires a clear understanding of the emissions and uptake (i.e., surface fluxes) of long-lived greenhouse gases (GHGs). Furthermore, the development of climate policies is driving a need to constrain the budgets of anthropogenic GHG emissions. Inverse problems that couple atmospheric observations of GHG concentrations with an atmospheric chemistry and transport model have increasingly been used to gain insights into surface fluxes. Given the inherent technical challenges associated with their solution, it is imperative that objective approaches exist for the evaluation of such inverse problems. Because direct observation of fluxes at compatible spatiotemporal scales is rarely possible, diagnostics tools must rely on indirect measures. Here we review diagnostics that have been implemented in recent studies and discuss their use in informing adjustments to model setup. We group the diagnostics along a continuum starting with those that are most closely related to the scientific question being targeted, and ending with those most closely tied to the statistical and computational setup of the inversion. We thus begin with diagnostics based on assessments against independent information (e.g., unused atmospheric observations, large-scale scientific constraints), followed by statistical diagnostics of inversion results, diagnostics based on sensitivity tests, and analyses of robustness (e.g., tests focusing on the chemistry and transport model, the atmospheric observations, or the statistical and computational framework), and close with the use of synthetic data experiments (i.e., observing system simulation experiments, OSSEs). We find that existing diagnostics provide a crucial toolbox for evaluating and improving flux estimates but, not surprisingly, cannot overcome the fundamental challenges associated with limited atmospheric observations or the lack of direct flux measurements at compatible scales. As atmospheric inversions are increasingly expected to contribute to national reporting of GHG emissions, the need for developing and implementing robust and transparent evaluation approaches will only grow.

  4. Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps.

    PubMed

    Jacobs, Guy S; Sluckin, Tim J; Kivisild, Toomas

    2016-08-01

    During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's ZnS and ωmax) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's ZnS offers high power, but is outperformed by a novel statistic that we test, which we call Zα. We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics (ωmax, Kelly's ZnS, and Zα) are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While ωmax replicates most candidates when recombination map data are not available, the ZnS and Zα statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available. Copyright © 2016 by the Genetics Society of America.
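
    Kelly's ZnS is simply the average squared allelic correlation (r²) over all pairs of polymorphic sites in a window, so a minimal sketch is easy to write down. The function below assumes a 0/1 haplotype matrix and is illustrative only; it does not implement the paper's ωmax or Zα statistics, sliding windows, or the genetic-map correction discussed above.

```python
import numpy as np
from itertools import combinations

def kelly_zns(haplotypes):
    """Kelly's ZnS: mean pairwise r^2 over all pairs of polymorphic sites.

    `haplotypes` is an (n_haplotypes, n_sites) array of 0/1 alleles.
    A minimal sketch; a real scan would work in sliding windows and
    control for expected LD using a recombination map.
    """
    n_hap, n_sites = haplotypes.shape
    r2 = []
    for i, j in combinations(range(n_sites), 2):
        a, b = haplotypes[:, i], haplotypes[:, j]
        pa, pb = a.mean(), b.mean()
        if pa in (0.0, 1.0) or pb in (0.0, 1.0):
            continue  # skip monomorphic sites
        d = (a * b).mean() - pa * pb               # linkage disequilibrium D
        r2.append(d**2 / (pa * (1 - pa) * pb * (1 - pb)))
    return np.mean(r2) if r2 else np.nan

# toy example: 10 haplotypes, 5 sites
rng = np.random.default_rng(0)
haps = rng.integers(0, 2, size=(10, 5))
print(kelly_zns(haps))
```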

  5. Change Detection in Rough Time Series

    DTIC Science & Technology

    2014-09-01

    Business Statistics : An Inferential Approach, Dellen: San Francisco. [18] Winston, W. (1997) Operations Research Applications and Algorithms, Duxbury...distribution that can present significant challenges to conventional statistical tracking techniques. To address this problem the proposed method...applies hybrid fuzzy statistical techniques to series granules instead of to individual measures. Three examples demonstrated the robust nature of the

  6. Adaptive and robust statistical methods for processing near-field scanning microwave microscopy images.

    PubMed

    Coakley, K J; Imtiaz, A; Wallis, T M; Weber, J C; Berweger, S; Kabos, P

    2015-03-01

    Near-field scanning microwave microscopy offers great potential to facilitate characterization, development and modeling of materials. By acquiring microwave images at multiple frequencies and amplitudes (along with the other modalities) one can study material and device physics at different lateral and depth scales. Images are typically noisy and contaminated by artifacts that can vary from scan line to scan line and planar-like trends due to sample tilt errors. Here, we level images based on an estimate of a smooth 2-d trend determined with a robust implementation of a local regression method. In this robust approach, features and outliers which are not due to the trend are automatically downweighted. We denoise images with the Adaptive Weights Smoothing method. This method smooths out additive noise while preserving edge-like features in images. We demonstrate the feasibility of our methods on topography images and microwave |S11| images. For one challenging test case, we demonstrate that our method outperforms alternative methods from the scanning probe microscopy data analysis software package Gwyddion. Our methods should be useful for massive image data sets where manual selection of landmarks or image subsets by a user is impractical. Published by Elsevier B.V.
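
    The leveling step described above amounts to estimating a smooth trend with a robust fit so that features and outliers do not bias it. The sketch below is a simplified analogue, not the authors' local-regression implementation: it removes only a planar tilt, using Huber-weighted (robust) least squares from statsmodels on simulated data.

```python
import numpy as np
import statsmodels.api as sm

def level_image_robust(img):
    """Remove a planar trend from a scan image using robust regression.

    A minimal analogue of robust leveling (not the authors' local-regression
    implementation): fit z = a + b*x + c*y with Huber-weighted least squares
    so that sharp features and outliers are downweighted, then subtract.
    """
    ny, nx = img.shape
    y, x = np.mgrid[0:ny, 0:nx]
    X = sm.add_constant(np.column_stack([x.ravel(), y.ravel()]))
    fit = sm.RLM(img.ravel(), X, M=sm.robust.norms.HuberT()).fit()
    trend = fit.predict(X).reshape(ny, nx)
    return img - trend

# toy example: tilted plane plus noise and one bright defect
rng = np.random.default_rng(1)
img = 0.05 * np.arange(64)[None, :] + 0.02 * np.arange(64)[:, None]
img += rng.normal(0, 0.01, (64, 64))
img[30:33, 30:33] += 5.0          # outlier feature that should not bias the trend
print(level_image_robust(img).std())
```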

  7. Robustness of Type I Error and Power in Set Correlation Analysis of Contingency Tables.

    ERIC Educational Resources Information Center

    Cohen, Jacob; Nee, John C. M.

    1990-01-01

    The analysis of contingency tables via set correlation allows the assessment of subhypotheses involving contrast functions of the categories of the nominal scales. The robustness of such methods with regard to Type I error and statistical power was studied via a Monte Carlo experiment. (TJH)

  8. Robustness of meta-analyses in finding gene × environment interactions

    PubMed Central

    Shi, Gang; Nehorai, Arye

    2017-01-01

    Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796

  9. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

    PubMed Central

    Agami, Sarit

    2017-01-01

    Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301

  10. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies

    PubMed Central

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-01-01

    For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651
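
    The burden test cited above as a special case of this framework is straightforward to sketch: collapse the rare variants in the set into a single per-subject allele count and test that count in a regression model. The code below is a minimal illustration of that special case only, on simulated data; it is not the paper's unified score statistic or its heterogeneity test.

```python
import numpy as np
import statsmodels.api as sm

def burden_test(genotypes, phenotype, covariates=None):
    """Simple burden test for a set of rare variants.

    genotypes: (n_subjects, n_variants) minor-allele counts (0/1/2)
    phenotype: length-n binary outcome
    Returns the p-value for the aggregated burden score in a logistic model.
    """
    burden = genotypes.sum(axis=1)            # aggregate rare-allele count
    X = burden[:, None] if covariates is None else np.column_stack([burden, covariates])
    X = sm.add_constant(X)
    fit = sm.Logit(phenotype, X).fit(disp=0)
    return fit.pvalues[1]                      # p-value for the burden term

# toy example: 500 subjects, 30 rare variants, burden raises disease risk
rng = np.random.default_rng(2)
G = rng.binomial(1, 0.02, size=(500, 30))
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.4 * G.sum(axis=1)))))
print(burden_test(G, y))
```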

  11. Technology Solutions Case Study: Predicting Envelope Leakage in Attached Dwellings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    2013-11-01

    The most common method of measuring air leakage is to perform a single (or solo) blower door pressurization and/or depressurization test. In detached housing, the single blower door test measures leakage to the outside. In attached housing, however, this “solo” test method measures both air leakage to the outside and air leakage between adjacent units through common surfaces. In an attempt to create a simplified tool for predicting leakage to the outside, Building America team Consortium for Advanced Residential Buildings (CARB) performed a preliminary statistical analysis on blower door test results from 112 attached dwelling units in four apartment complexes. Although the subject data set is limited in size and variety, the preliminary analyses suggest significant predictors are present and support the development of a predictive model. Further data collection is underway to create a more robust prediction tool for use across different construction types, climate zones, and unit configurations.

  12. Initial evaluation of discrete orthogonal basis reconstruction of ECT images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moody, E.B.; Donohue, K.D.

    1996-12-31

    Discrete orthogonal basis restoration (DOBR) is a linear, non-iterative, and robust method for solving inverse problems for systems characterized by shift-variant transfer functions. This simulation study evaluates the feasibility of using DOBR for reconstructing emission computed tomographic (ECT) images. The imaging system model uses typical SPECT parameters and incorporates the effects of attenuation, spatially-variant PSF, and Poisson noise in the projection process. Sample reconstructions and statistical error analyses for a class of digital phantoms compare the DOBR performance for Hartley and Walsh basis functions. Test results confirm that DOBR with either basis set produces images with good statistical properties. No problems were encountered with reconstruction instability. The flexibility of the DOBR method and its consistent performance warrant further investigation of DOBR as a means of ECT image reconstruction.

  13. The case for increasing the statistical power of eddy covariance ecosystem studies: why, where and how?

    PubMed

    Hill, Timothy; Chocholek, Melanie; Clement, Robert

    2017-06-01

    Eddy covariance (EC) continues to provide invaluable insights into the dynamics of Earth's surface processes. However, despite its many strengths, spatial replication of EC at the ecosystem scale is rare. High equipment costs are likely to be partially responsible. This contributes to the low sampling, and even lower replication, of ecoregions in Africa, Oceania (excluding Australia) and South America. The level of replication matters as it directly affects statistical power. While the ergodicity of turbulence and temporal replication allow an EC tower to provide statistically robust flux estimates for its footprint, these principles do not extend to larger ecosystem scales. Despite the challenge of spatially replicating EC, it is clearly of interest to be able to use EC to provide statistically robust flux estimates for larger areas. We ask: How much spatial replication of EC is required for statistical confidence in our flux estimates of an ecosystem? We provide the reader with tools to estimate the number of EC towers needed to achieve a given statistical power. We show that for a typical ecosystem, around four EC towers are needed to have 95% statistical confidence that the annual flux of an ecosystem is nonzero. Furthermore, if the true flux is small relative to instrument noise and spatial variability, the number of towers needed can rise dramatically. We discuss approaches for improving statistical power and describe one solution: an inexpensive EC system that could help by making spatial replication more affordable. However, we note that diverting limited resources from other key measurements in order to allow spatial replication may not be optimal, and a balance needs to be struck. While individual EC towers are well suited to providing fluxes from the flux footprint, we emphasize that spatial replication is essential for statistically robust fluxes if a wider ecosystem is being studied. © 2016 The Authors Global Change Biology Published by John Wiley & Sons Ltd.
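
    The tower-count question above is essentially a one-sample power calculation: given the between-tower spread of annual flux estimates (instrument noise plus spatial variability), how many towers are needed before a t-test can distinguish the ecosystem-mean flux from zero? The sketch below uses illustrative numbers, not the paper's data, and a standard noncentral-t power calculation.

```python
import numpy as np
from scipy import stats

def towers_needed(mean_flux, sd_between_towers, alpha=0.05, power=0.95):
    """Smallest number of EC towers for which a two-sided one-sample t-test
    on the tower annual fluxes detects a nonzero ecosystem flux with the
    requested power. sd_between_towers lumps instrument noise and spatial
    variability; all inputs here are illustrative assumptions.
    """
    for n in range(2, 1000):
        se = sd_between_towers / np.sqrt(n)
        df = n - 1
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        ncp = mean_flux / se                    # noncentrality under the alternative
        achieved = (1 - stats.nct.cdf(t_crit, df, ncp)
                    + stats.nct.cdf(-t_crit, df, ncp))
        if achieved >= power:
            return n
    return None

# e.g. an annual flux of 150 gC m-2 yr-1 with a between-tower SD of 80
print(towers_needed(150.0, 80.0))
```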

  14. Fault-tolerant Greenberger-Horne-Zeilinger paradox based on non-Abelian anyons.

    PubMed

    Deng, Dong-Ling; Wu, Chunfeng; Chen, Jing-Ling; Oh, C H

    2010-08-06

    We propose a scheme to test the Greenberger-Horne-Zeilinger paradox based on braidings of non-Abelian anyons, which are exotic quasiparticle excitations of topological states of matter. Because topological ordered states are robust against local perturbations, this scheme is in some sense "fault-tolerant" and might close the detection inefficiency loophole problem in previous experimental tests of the Greenberger-Horne-Zeilinger paradox. In turn, the construction of the Greenberger-Horne-Zeilinger paradox reveals the nonlocal property of non-Abelian anyons. Our results indicate that the non-Abelian fractional statistics is a pure quantum effect and cannot be described by local realistic theories. Finally, we present a possible experimental implementation of the scheme based on the anyonic interferometry technologies.

  15. Optimizing ELISAs for precision and robustness using laboratory automation and statistical design of experiments.

    PubMed

    Joelsson, Daniel; Moravec, Phil; Troutman, Matthew; Pigeon, Joseph; DePhillips, Pete

    2008-08-20

    Transferring manual ELISAs to automated platforms requires optimizing the assays for each particular robotic platform. These optimization experiments are often time consuming and difficult to perform using a traditional one-factor-at-a-time strategy. In this manuscript we describe the development of an automated process using statistical design of experiments (DOE) to quickly optimize immunoassays for precision and robustness on the Tecan EVO liquid handler. By using fractional factorials and a split-plot design, five incubation time variables and four reagent concentration variables can be optimized in a short period of time.
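
    The efficiency of the screening designs mentioned above comes from running only a carefully chosen fraction of all factor-level combinations. As a minimal illustration of the idea (not the authors' actual split-plot layout for the Tecan EVO), the sketch below builds a 2^(5-1) half-fraction in coded -1/+1 levels by aliasing the fifth factor to the product of the other four.

```python
import itertools
import numpy as np

def half_fraction(n_factors):
    """2^(k-1) fractional factorial: full factorial in the first k-1 factors,
    with the last factor set to their product (defining relation I = ABC...K).
    Levels are coded -1/+1. A minimal sketch of the screening-design idea.
    """
    base = np.array(list(itertools.product([-1, 1], repeat=n_factors - 1)))
    last = base.prod(axis=1, keepdims=True)
    return np.hstack([base, last])

design = half_fraction(5)       # 16 runs instead of 32 for 5 factors
print(design.shape)             # (16, 5)
print(design[:4])
```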

  16. Reliability and validity of a nutrition and physical activity environmental self-assessment for child care

    PubMed Central

    Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S

    2007-01-01

    Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
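
    As a small illustration of the agreement statistic used above, the snippet below computes a linearly weighted kappa and simple percent agreement for one hypothetical ordinal item rated by a director and a staff member (the ratings are invented, not NAP SACC data).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings on one ordinal self-assessment item, scored 1-4.
director = np.array([1, 2, 2, 3, 4, 4, 3, 2, 1, 3])
staff    = np.array([1, 2, 3, 3, 4, 3, 3, 2, 2, 3])

# Linearly weighted kappa penalizes larger disagreements more heavily,
# which suits ordinal response categories.
kappa = cohen_kappa_score(director, staff, weights="linear")
agreement = np.mean(director == staff)   # simple percent agreement
print(f"weighted kappa = {kappa:.2f}, percent agreement = {agreement:.0%}")
```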

  17. [Application of robustness test for assessment of the measurement uncertainty at the end of development phase of a chromatographic method for quantification of water-soluble vitamins].

    PubMed

    Ihssane, B; Bouchafra, H; El Karbane, M; Azougagh, M; Saffaj, T

    2016-05-01

    We propose in this work an efficient way to evaluate the measurement uncertainty at the end of the development step of an analytical method, since this assessment provides an indication of the performance of the optimization process. The uncertainty is estimated through a robustness test, applying a Plackett-Burman design to investigate six parameters influencing the simultaneous chromatographic assay of five water-soluble vitamins. The estimated effect of varying each parameter is translated into a standard uncertainty value at each concentration level. The relative uncertainty values obtained do not exceed the acceptance limit of 5%, showing that the development of the procedure was carried out well. In addition, a statistical comparison of the standard uncertainties obtained after the development stage with those of the validation step indicates that the estimated uncertainties are equivalent. The results clearly show the performance and capacity of the chromatographic method to simultaneously assay the five vitamins and its suitability for routine application. Copyright © 2015 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.

  18. Sensitivity Analyses of the Change in FVC in a Phase 3 Trial of Pirfenidone for Idiopathic Pulmonary Fibrosis.

    PubMed

    Lederer, David J; Bradford, Williamson Z; Fagan, Elizabeth A; Glaspole, Ian; Glassberg, Marilyn K; Glasscock, Kenneth F; Kardatzke, David; King, Talmadge E; Lancaster, Lisa H; Nathan, Steven D; Pereira, Carlos A; Sahn, Steven A; Swigris, Jeffrey J; Noble, Paul W

    2015-07-01

    FVC outcomes in clinical trials on idiopathic pulmonary fibrosis (IPF) can be substantially influenced by the analytic methodology and the handling of missing data. We conducted a series of sensitivity analyses to assess the robustness of the statistical finding and the stability of the estimate of the magnitude of treatment effect on the primary end point of FVC change in a phase 3 trial evaluating pirfenidone in adults with IPF. Source data included all 555 study participants randomized to treatment with pirfenidone or placebo in the Assessment of Pirfenidone to Confirm Efficacy and Safety in Idiopathic Pulmonary Fibrosis (ASCEND) study. Sensitivity analyses were conducted to assess whether alternative statistical tests and methods for handling missing data influenced the observed magnitude of treatment effect on the primary end point of change from baseline to week 52 in FVC. The distribution of FVC change at week 52 was systematically different between the two treatment groups and favored pirfenidone in each analysis. The method used to impute missing data due to death had a marked effect on the magnitude of change in FVC in both treatment groups; however, the magnitude of treatment benefit was generally consistent on a relative basis, with an approximate 50% reduction in FVC decline observed in the pirfenidone group in each analysis. Our results confirm the robustness of the statistical finding on the primary end point of change in FVC in the ASCEND trial and corroborate the estimated magnitude of the pirfenidone treatment effect in patients with IPF. ClinicalTrials.gov; No.: NCT01366209; URL: www.clinicaltrials.gov.

  19. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, or "outliers", in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measures: the proportion of detected outliers, and the masking and swamping rates.

  20. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulation into rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods, and allows some limitations of previous developments to be overcome through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
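
    The idea of replications can be sketched directly: refit the rare-event logistic regression many times on resampled data and keep only the predictors that are selected consistently. The code below is a schematic version of that idea on simulated data (retaining all events and redrawing the non-events in each replication); it is not the authors' exact procedure for landslide data.

```python
import numpy as np
import statsmodels.api as sm

def replicated_logit_selection(X, y, n_rep=200, alpha=0.05, seed=0):
    """Monte Carlo replications of a rare-event logistic regression.

    Each replication keeps all (rare) events, draws a fresh random subsample
    of non-events, refits the model, and records which predictors are
    significant. Stable predictors have high selection frequency.
    """
    rng = np.random.default_rng(seed)
    events = np.flatnonzero(y == 1)
    nonevents = np.flatnonzero(y == 0)
    hits = np.zeros(X.shape[1])
    for _ in range(n_rep):
        sub = rng.choice(nonevents, size=5 * len(events), replace=False)
        idx = np.concatenate([events, sub])
        fit = sm.Logit(y[idx], sm.add_constant(X[idx])).fit(disp=0)
        hits += (fit.pvalues[1:] < alpha)
    return hits / n_rep            # selection frequency per predictor

# toy example: 3 candidate controlling factors, only the first matters
rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 3))
p = 1 / (1 + np.exp(-(-4.5 + 1.2 * X[:, 0])))   # rare events (~1% baseline)
y = rng.binomial(1, p)
print(replicated_logit_selection(X, y))
```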

  1. The effect of shunt surgery on neuropsychological performance in normal pressure hydrocephalus: a systematic review and meta-analysis.

    PubMed

    Peterson, Katie A; Savulich, George; Jackson, Dan; Killikelly, Clare; Pickard, John D; Sahakian, Barbara J

    2016-08-01

    We conducted a systematic review of the literature and used meta-analytic techniques to evaluate the impact of shunt surgery on neuropsychological performance in patients with normal pressure hydrocephalus (NPH). Twenty-three studies with 1059 patients were identified for review using PubMed, Web of Science, Google scholar and manual searching. Inclusion criteria were prospective, within-subject investigations of cognitive outcome using neuropsychological assessment before and after shunt surgery in patients with NPH. There were statistically significant effects of shunt surgery on cognition (Mini-Mental State Examination; MMSE), learning and memory (Rey Auditory Verbal Learning Test; RAVLT, total and delayed subtests), executive function (backwards digit span, phonemic verbal fluency, trail making test B) and psychomotor speed (trail making test A) all in the direction of improvement following shunt surgery, but with considerable heterogeneity across all measures. A more detailed examination of the data suggested robust evidence for improved MMSE, RAVLT total, RAVLT delayed, phonemic verbal fluency and trail making test A only. Meta-regressions revealed no statistically significant effect of age, sex or follow-up interval on improvement in the MMSE. Our results suggest that shunt surgery is most sensitive for improving global cognition, learning and memory and psychomotor speed in patients with NPH.

  2. An Alternative Flight Software Trigger Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions Using Inaccurate or Scarce Information

    NASA Technical Reports Server (NTRS)

    Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
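
    The core of the approach described above is a pre-flight-trained classifier that maps noisy measurements to a deploy/no-deploy decision. The sketch below is purely illustrative (made-up dynamics, noise levels, and deployment box, not Orion flight software): it trains a logistic regression on simulated noisy velocity and acceleration features, labeled by whether the true altitude is inside a hypothetical deployment box.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated training data: the "truth" label says whether the vehicle is
# inside a hypothetical safe deployment box; the features are noisy proxies.
rng = np.random.default_rng(4)
n = 20000
true_alt = rng.uniform(3000, 12000, n)                      # meters (assumed range)
velocity = 40 + 0.01 * true_alt + rng.normal(0, 15, n)      # noisy measurement
accel    = 9.0 + 0.0004 * true_alt + rng.normal(0, 1.5, n)  # noisy measurement
label = (true_alt < 7500).astype(int)                       # inside deployment box

X = np.column_stack([velocity, accel])
clf = LogisticRegression().fit(X, label)                    # trained on the ground, pre-flight

# In flight, the classifier would run on the current measurement vector and
# fire the trigger when the predicted probability crosses a tuned threshold.
p_deploy = clf.predict_proba(X[:5])[:, 1]
print(np.round(p_deploy, 3))
```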

  3. An Alternative Flight Software Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions using Inaccurate or Scarce Information

    NASA Technical Reports Server (NTRS)

    Smith, Kelly; Gay, Robert; Stachowiak, Susan

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.

  4. Robust scoring functions for protein-ligand interactions with quantum chemical charge models.

    PubMed

    Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J

    2011-10-24

    Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
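
    The robust-regression step can be sketched with Huber-weighted least squares, which downweights outlier complexes instead of letting them dominate the fit as OLS does. The descriptors, coefficients, and data below are invented for illustration; this is not the AutoDock4 scoring function or the paper's charge models.

```python
import numpy as np
import statsmodels.api as sm

# Simulated "complexes": binding free energy as a linear function of four
# hypothetical energy-term descriptors, plus a handful of gross outliers.
rng = np.random.default_rng(5)
n = 200
terms = rng.normal(size=(n, 4))                 # e.g. vdW, electrostatic, H-bond, desolvation
dG = terms @ np.array([-1.2, -0.8, -0.5, 0.6]) + rng.normal(0, 0.8, n)
dG[:5] += 8.0                                   # a few gross outliers

X = sm.add_constant(terms)
ols = sm.OLS(dG, X).fit()
rlm = sm.RLM(dG, X, M=sm.robust.norms.HuberT()).fit()   # Huber-weighted robust fit
print("OLS coefficients:   ", np.round(ols.params[1:], 2))
print("Robust coefficients:", np.round(rlm.params[1:], 2))
```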

  5. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
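
    For reference, the classical (error-free) linear trend test that LTTae,NGS extends can be written in a few lines. The sketch below implements the ordinary Cochran-Armitage statistic on a genotype count table with additive scores 0/1/2; the paper's differential-misclassification correction and multi-variant extension are not reproduced here.

```python
import numpy as np
from scipy import stats

def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Ordinary Cochran-Armitage linear trend test for genotype counts.

    case_counts / control_counts are counts per genotype category (copies of
    the putative causal allele). Returns the z statistic and two-sided p-value.
    """
    cases = np.asarray(case_counts, dtype=float)
    controls = np.asarray(control_counts, dtype=float)
    t = np.asarray(scores, dtype=float)
    n = cases + controls                       # per-genotype totals
    R1, R2 = cases.sum(), controls.sum()
    N = R1 + R2

    T = np.sum(t * (cases * R2 - controls * R1))
    var = (R1 * R2 / N) * (
        np.sum(t**2 * n * (N - n))
        - 2 * sum(t[i] * t[j] * n[i] * n[j]
                  for i in range(len(t)) for j in range(i + 1, len(t)))
    )
    z = T / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))

# toy genotype table: more alternate alleles among cases than controls
print(cochran_armitage_trend([60, 35, 5], [80, 18, 2]))
```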

  6. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  7. A note on the SG(m) test

    NASA Astrophysics Data System (ADS)

    López, Fernando A.; Matilla-García, Mariano; Mur, Jesús; Páez, Antonio; Ruiz, Manuel

    2016-01-01

    López et al. (Reg Sci Urban Econ 40(2-3):106-115, 2010) introduce a nonparametric test of spatial dependence, called SG(m). The test is claimed to be consistent and asymptotically Chi-square distributed. Elsinger (Reg Sci Urban Econ 43(5):838-840, 2013) raises doubts about the two properties. Using a particular counterexample, he shows that the asymptotic distribution of the SG(m) test may be far from the Chi-square family; the property of consistency is also questioned. In this note, the authors want to clarify the properties of the SG(m) test. We argue that the cause of the conflict is in the specification of the symbolization map. The discrepancies can be solved by adjusting some of the definitions made in the original paper. Moreover, we introduce a permutational bootstrapped version of the SG(m) test, which is powerful and robust to the underlying statistical assumptions. This bootstrapped version may be very useful in an applied context.

  8. SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data

    ERIC Educational Resources Information Center

    Auerbach, Charles; Schudrich, Wendy Zeitlin

    2013-01-01

    The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…

  9. Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis

    ERIC Educational Resources Information Center

    Young, Cristobal; Holsteen, Katherine

    2017-01-01

    Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…

  10. Standard and Robust Methods in Regression Imputation

    ERIC Educational Resources Information Center

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction to new imputation algorithms for estimating missing values in larger data sets from official statistics, as part of data pre-processing, or in the presence of outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  11. On the streaming model for redshift-space distortions

    NASA Astrophysics Data System (ADS)

    Kuruvilla, Joseph; Porciani, Cristiano

    2018-06-01

    The streaming model describes the mapping between real and redshift space for 2-point clustering statistics. Its key element is the probability density function (PDF) of line-of-sight pairwise peculiar velocities. Following a kinetic-theory approach, we derive the fundamental equations of the streaming model for ordered and unordered pairs. In the first case, we recover the classic equation while we demonstrate that modifications are necessary for unordered pairs. We then discuss several statistical properties of the pairwise velocities for DM particles and haloes by using a suite of high-resolution N-body simulations. We test the often used Gaussian ansatz for the PDF of pairwise velocities and discuss its limitations. Finally, we introduce a mixture of Gaussians which is known in statistics as the generalised hyperbolic distribution and show that it provides an accurate fit to the PDF. Once inserted in the streaming equation, the fit yields an excellent description of redshift-space correlations at all scales that vastly outperforms the Gaussian and exponential approximations. Using a principal-component analysis, we reduce the complexity of our model for large redshift-space separations. Our results increase the robustness of studies of anisotropic galaxy clustering and are useful for extending them towards smaller scales in order to test theories of gravity and interacting dark-energy models.
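
    In the same spirit as the generalised hyperbolic fit described above (which can be viewed as a continuous mixture of Gaussians), a finite Gaussian mixture can be fitted to a sample of line-of-sight pairwise velocities and then inserted into the streaming integral in place of the Gaussian ansatz. The sketch below fits a three-component mixture to toy, heavy-tailed velocity data; it is illustrative only and not the paper's model or its simulation output.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy heavy-tailed, slightly skewed "pairwise velocity" sample (km/s, made up).
rng = np.random.default_rng(6)
v = np.concatenate([rng.normal(-1.0, 2.0, 6000),
                    rng.normal(0.5, 6.0, 3000),
                    rng.normal(4.0, 12.0, 1000)])

gmm = GaussianMixture(n_components=3, random_state=0).fit(v[:, None])
print("weights:", np.round(gmm.weights_, 3))
print("means:  ", np.round(gmm.means_.ravel(), 2))
print("sigmas: ", np.round(np.sqrt(gmm.covariances_.ravel()), 2))

# The fitted mixture density could then replace the Gaussian ansatz for the
# pairwise-velocity PDF inside the streaming-model integral.
```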

  12. The Role of Design-of-Experiments in Managing Flow in Compact Air Vehicle Inlets

    NASA Technical Reports Server (NTRS)

    Anderson, Bernhard H.; Miller, Daniel N.; Gridley, Marvin C.; Agrell, Johan

    2003-01-01

    It is the purpose of this study to demonstrate the viability and economy of Design-of-Experiments methodologies to arrive at microscale secondary flow control array designs that maintain optimal inlet performance over a wide range of the mission variables and to explore how these statistical methods provide a better understanding of the management of flow in compact air vehicle inlets. These statistical design concepts were used to investigate the robustness properties of low unit strength micro-effector arrays. Low unit strength micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion. The term robustness is used in this paper in the same sense as it is used in the industrial problem solving community. It refers to minimizing the effects of the hard-to-control factors that influence the development of a product or process. In Robustness Engineering, the effects of the hard-to-control factors are often called "noise", and the hard-to-control factors themselves are referred to as the environmental variables or sometimes as the Taguchi noise variables. Hence Robust Optimization refers to minimizing the effects of the environmental or noise variables on the development (design) of a product or process. In the management of flow in compact inlets, the environmental or noise variables can be identified with the mission variables. Therefore this paper formulates a statistical design methodology that minimizes the impact of variations in the mission variables on inlet performance and demonstrates that these statistical design concepts can lead to simpler inlet flow management systems.

  13. On adaptive robustness approach to Anti-Jam signal processing

    NASA Astrophysics Data System (ADS)

    Poberezhskiy, Y. S.; Poberezhskiy, G. Y.

    An effective approach to exploiting statistical differences between desired and jamming signals named adaptive robustness is proposed and analyzed in this paper. It combines conventional Bayesian, adaptive, and robust approaches that are complementary to each other. This combining strengthens the advantages and mitigates the drawbacks of the conventional approaches. Adaptive robustness is equally applicable to both jammers and their victim systems. The capabilities required for realization of adaptive robustness in jammers and victim systems are determined. The employment of a specific nonlinear robust algorithm for anti-jam (AJ) processing is described and analyzed. Its effectiveness in practical situations has been proven analytically and confirmed by simulation. Since adaptive robustness can be used by both sides in electronic warfare, it is more advantageous for the fastest and most intelligent side. Many results obtained and discussed in this paper are also applicable to commercial applications such as communications in unregulated or poorly regulated frequency ranges and systems with cognitive capabilities.

  14. Fundamentals of Research Data and Variables: The Devil Is in the Details.

    PubMed

    Vetter, Thomas R

    2017-10-01

    Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. Descriptive statistics are typically used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Inferential statistics allow researchers to make a valid estimate of the association between an intervention and the treatment effect in a specific population, based upon their randomly collected, representative sample data. Categorical data can be either dichotomous or polytomous. Dichotomous data have only 2 categories, and thus are considered binary. Polytomous data have more than 2 categories. Unlike dichotomous and polytomous data, ordinal data are rank ordered, typically based on a numerical scale that is comprised of a small set of discrete classes or integers. Continuous data are measured on a continuum and can have any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the precision of the measurement instrument. Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured. Ratio data are another form of continuous data, which have the same properties as interval data, plus a true definition of an absolute zero point, and the ratios of the values on the measurement scale make sense. The normal (Gaussian) distribution ("bell-shaped curve") is one of the most common statistical distributions. Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. The histogram and the Q-Q plot are 2 graphical methods to assess if a set of data have a normal distribution (display "normality"). The Shapiro-Wilk test and the Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess for data normality. Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions. If the normality test concludes that the study data deviate significantly from a Gaussian distribution, rather than applying a less robust nonparametric test, the problem can potentially be remedied by judiciously and openly: (1) performing a data transformation of all the data values; or (2) eliminating any obvious data outlier(s).
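
    The normality-checking workflow described above is easy to illustrate: run the Shapiro-Wilk test on the raw data, apply a transformation, and retest. The data below are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

# Right-skewed toy sample, then the same sample after a log transformation.
rng = np.random.default_rng(7)
x = rng.lognormal(mean=1.0, sigma=0.6, size=80)

w_raw, p_raw = stats.shapiro(x)            # Shapiro-Wilk on the raw data
w_log, p_log = stats.shapiro(np.log(x))    # ... after the log transform

print(f"raw data:        W = {w_raw:.3f}, p = {p_raw:.4f}")
print(f"log-transformed: W = {w_log:.3f}, p = {p_log:.4f}")

# If the transformed data are consistent with normality, a parametric test
# (e.g. a t-test) can be applied on the transformed scale; otherwise a
# nonparametric alternative such as the Mann-Whitney U test may be preferred.
```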

  15. Peaks Over Threshold (POT): A methodology for automatic threshold estimation using goodness of fit p-value

    NASA Astrophysics Data System (ADS)

    Solari, Sebastián.; Egüen, Marta; Polo, María. José; Losada, Miguel A.

    2017-04-01

    Threshold estimation in the Peaks Over Threshold (POT) method and the impact of the estimation method on the calculation of high return period quantiles and their uncertainty (or confidence intervals) are issues that are still unresolved. In the past, methods based on goodness of fit tests and EDF-statistics have yielded satisfactory results, but their use has not yet been systematized. This paper proposes a methodology for automatic threshold estimation, based on the Anderson-Darling EDF-statistic and goodness of fit test. When combined with bootstrapping techniques, this methodology can be used to quantify both the uncertainty of threshold estimation and its impact on the uncertainty of high return period quantiles. This methodology was applied to several simulated series and to four precipitation/river flow data series. The results obtained confirmed its robustness. For the measured series, the estimated thresholds corresponded to those obtained by nonautomatic methods. Moreover, even though the uncertainty of the threshold estimation was high, this did not have a significant effect on the width of the confidence intervals of high return period quantiles.
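
    The threshold-selection loop can be sketched as follows: for each candidate threshold, fit a generalized Pareto distribution (GPD) to the exceedances and keep the lowest threshold whose goodness-of-fit p-value is acceptable. The code below substitutes a Kolmogorov-Smirnov p-value for the Anderson-Darling statistic used in the paper (SciPy does not ship an AD test for the GPD), ignores the optimism of a KS p-value computed with estimated parameters, and omits the bootstrap uncertainty quantification, so it is a simplified illustration only.

```python
import numpy as np
from scipy import stats

def select_pot_threshold(x, candidates, p_min=0.25):
    """Automatic POT threshold selection (simplified sketch).

    Returns the lowest candidate threshold whose GPD fit to the exceedances
    is not rejected by a KS goodness-of-fit test, together with the p-value
    and the number of exceedances used.
    """
    for u in sorted(candidates):
        exc = x[x > u] - u
        if len(exc) < 30:                      # too few exceedances to judge
            continue
        c, loc, scale = stats.genpareto.fit(exc, floc=0.0)
        _, p = stats.kstest(exc, 'genpareto', args=(c, 0.0, scale))
        if p >= p_min:
            return u, p, len(exc)
    return None

# toy series: exponential "bulk" plus a heavier GPD tail above 5
rng = np.random.default_rng(8)
x = np.concatenate([rng.exponential(1.0, 5000),
                    5.0 + stats.genpareto.rvs(0.2, scale=2.0, size=300, random_state=8)])
print(select_pot_threshold(x, candidates=np.arange(1.0, 8.0, 0.5)))
```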

  16. Statistical analysis of the magnetization signatures of impact basins

    NASA Astrophysics Data System (ADS)

    Gabasova, L. R.; Wieczorek, M. A.

    2017-09-01

    We quantify the magnetic signatures of the largest lunar impact basins using recent mission data and robust statistical bounds, and obtain an early activity timeline for the lunar core dynamo which appears to peak earlier than indicated by Apollo paleointensity measurements.

  17. A UNIFYING CONCEPT FOR ASSESSING TOXICOLOGICAL INTERACTIONS: CHANGES IN SLOPE

    EPA Science Inventory

    Robust statistical methods are important to the evaluation of interactions among chemicals in a mixture. However, different concepts of interaction as applied to the statistical analysis of chemical mixture toxicology data or as used in environmental risk assessment often can ap...

  18. Robust Fixed-Structure Control

    DTIC Science & Technology

    1994-10-30

    Deterministic Foundation for Statistical Energy Analysis ," J. Sound Vibr., to appear. 1.96 D. S. Bernstein and S. P. Bhat, "Lyapunov Stability, Semistability...S. Bernstein, "Power Flow, Energy Balance, and Statistical Energy Analysis for Large Scale, Interconnected Systems," Proc. Amer. Contr. Conf., pp

  19. Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps

    PubMed Central

    Jacobs, Guy S.; Sluckin, Timothy J.; Kivisild, Toomas

    2016-01-01

    During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly’s ZnS and ωmax) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly’s ZnS offers high power, but is outperformed by a novel statistic that we test, which we call Zα. We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics—ωmax, Kelly’s ZnS, and Zα—are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While ωmax replicates most candidates when recombination map data are not available, the ZnS and Zα statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available. PMID:27516617

  20. A global logrank test for adaptive treatment strategies based on observational studies.

    PubMed

    Li, Zhiguo; Valenstein, Marcia; Pfeiffer, Paul; Ganoczy, Dara

    2014-02-28

    In studying adaptive treatment strategies, a natural question that is of paramount interest is whether there is any significant difference among all possible treatment strategies. When the outcome variable of interest is time-to-event, we propose an inverse probability weighted logrank test for testing the equivalence of a fixed set of pre-specified adaptive treatment strategies based on data from an observational study. The weights take into account both the possible selection bias in an observational study and the fact that the same subject may be consistent with more than one treatment strategy. The asymptotic distribution of the weighted logrank statistic under the null hypothesis is obtained. We show that, in an observational study where the treatment selection probabilities need to be estimated, the estimation of these probabilities does not have an effect on the asymptotic distribution of the weighted logrank statistic, as long as the estimation of the parameters in the models for these probabilities is n-consistent. Finite sample performance of the test is assessed via a simulation study. We also show in the simulation that the test can be pretty robust to misspecification of the models for the probabilities of treatment selection. The method is applied to analyze data on antidepressant adherence time from an observational database maintained at the Department of Veterans Affairs' Serious Mental Illness Treatment Research and Evaluation Center. Copyright © 2013 John Wiley & Sons, Ltd.

  1. Design of a factorial experiment with randomization restrictions to assess medical device performance on vascular tissue

    PubMed Central

    2011-01-01

    Background Energy-based surgical scalpels are designed to efficiently transect and seal blood vessels using thermal energy to promote protein denaturation and coagulation. Assessment and design improvement of ultrasonic scalpel performance relies on both in vivo and ex vivo testing. The objective of this work was to design and implement a robust, experimental test matrix with randomization restrictions and predictive statistical power, which allowed for identification of those experimental variables that may affect the quality of the seal obtained ex vivo. Methods The design of the experiment included three factors: temperature (two levels); the type of solution used to perfuse the artery during transection (three types); and artery type (two types) resulting in a total of twelve possible treatment combinations. Burst pressures of porcine carotid and renal arteries sealed ex vivo were assigned as the response variable. Results The experimental test matrix was designed and carried out as a split-plot experiment in order to assess the contributions of several variables and their interactions while accounting for randomization restrictions present in the experimental setup. The statistical software package SAS was utilized and PROC MIXED was used to account for the randomization restrictions in the split-plot design. The combination of temperature, solution, and vessel type had a statistically significant impact on seal quality. Conclusions The design and implementation of a split-plot experimental test-matrix provided a mechanism for addressing the existing technical randomization restrictions of ex vivo ultrasonic scalpel performance testing, while preserving the ability to examine the potential effects of independent factors or variables. This method for generating the experimental design and the statistical analyses of the resulting data are adaptable to a wide variety of experimental problems involving large-scale tissue-based studies of medical or experimental device efficacy and performance. PMID:21599963
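
    The study fits its split-plot model with SAS PROC MIXED; a rough Python analogue is a linear mixed model with a random intercept for the whole-plot unit. In the sketch below the grouping factor "run", the factor effects, and all data values are hypothetical, invented only to show the model structure (a whole-plot factor applied per run, with the within-plot factors varied inside it).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a toy split-plot data set: temperature is randomized at the
# whole-plot ("run") level; solution and artery type vary within each run.
rng = np.random.default_rng(9)
rows = []
for run in range(12):
    temp = ["low", "high"][run % 2]            # whole-plot factor
    run_effect = rng.normal(0, 30)             # whole-plot random error
    for sol in ["saline", "albumin", "blood"]:
        for artery in ["carotid", "renal"]:
            burst = (400 + (50 if temp == "high" else 0)
                     + {"saline": 0, "albumin": 20, "blood": -15}[sol]
                     + (30 if artery == "renal" else 0)
                     + run_effect + rng.normal(0, 25))
            rows.append(dict(run=run, temp=temp, solution=sol,
                             artery=artery, burst=burst))
df = pd.DataFrame(rows)

# A random intercept per run accounts for the whole-plot randomization restriction.
model = smf.mixedlm("burst ~ temp * solution * artery", df, groups=df["run"]).fit()
print(model.summary())
```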

  2. Robust GPS autonomous signal quality monitoring

    NASA Astrophysics Data System (ADS)

    Ndili, Awele Nnaemeka

    The Global Positioning System (GPS), introduced by the U.S. Department of Defense in 1973, provides unprecedented world-wide navigation capabilities through a constellation of 24 satellites in global orbit, each emitting a low-power radio-frequency signal for ranging. GPS receivers track these transmitted signals, computing position to within 30 meters from range measurements made to four satellites. GPS has a wide range of applications, including aircraft, marine and land vehicle navigation. Each application places demands on GPS for various levels of accuracy, integrity, system availability and continuity of service. Radio frequency interference (RFI), which results from sources such as TV/FM harmonics, radar or Mobile Satellite Systems (MSS), presents a challenge in the use of GPS, by posing a threat to the accuracy, integrity and availability of the GPS navigation solution. In order to use GPS for integrity-sensitive applications, it is therefore necessary to monitor the quality of the received signal, with the objective of promptly detecting the presence of RFI, and thus provide a timely warning of degradation of system accuracy. This presents a challenge, since the myriad kinds of RFI affect the GPS receiver in different ways. What is required, then, is a robust method of detecting GPS accuracy degradation, which is effective regardless of the origin of the threat. This dissertation presents a new method of robust signal quality monitoring for GPS. Algorithms for receiver autonomous interference detection and integrity monitoring are demonstrated. Candidate test statistics are derived from fundamental receiver measurements of in-phase and quadrature correlation outputs, and the gain of the Active Gain Controller (AGC). Performance of selected test statistics is evaluated in the presence of RFI: broadband interference, pulsed and non-pulsed interference, coherent CW at different frequencies; and non-RFI: GPS signal fading due to physical blockage and multipath. Results are presented which verify the effectiveness of these proposed methods. The benefits of pseudolites in reducing service outages due to interference are demonstrated. Pseudolites also enhance the geometry of the GPS constellation, improving overall system accuracy. Designs for pseudolite signals, to reduce the near-far problem associated with pseudolite use, are also presented.

  3. Evaluation of the impact of fetal fibronectin test implementation on hospital admissions for preterm labour in Ontario: a multiple baseline time-series design.

    PubMed

    Fell, D B; Sprague, A E; Grimshaw, J M; Yasseen, A S; Coyle, D; Dunn, S I; Perkins, S L; Peterson, W E; Johnson, M; Bunting, P S; Walker, M C

    2014-03-01

    To determine the impact of a health system-wide fetal fibronectin (fFN) testing programme on the rates of hospital admission for preterm labour (PTL). Multiple baseline time-series design. Canadian province of Ontario. A retrospective population-based cohort of antepartum and delivered obstetrical admissions in all Ontario hospitals between 1 April 2002 and 31 March 2010. International Classification of Diseases codes in a health system-wide hospital administrative database were used to identify the study population and define the outcome measure. An aggregate time series of monthly rates of hospital admissions for PTL was analysed using segmented regression models after aligning the fFN test implementation date for each institution. Rate of obstetrical hospital admission for PTL. Estimated rates of hospital admission for PTL following fFN implementation were lower than predicted had pre-implementation trends prevailed. The reduction in the rate was modest, but statistically significant, when estimated at 12 months following fFN implementation (-0.96 hospital admissions for PTL per 100 preterm births; 95% confidence interval [CI], -1.02 to -0.90, P = 0.04). The statistically significant reduction was sustained at 24 and 36 months following implementation. Using a robust quasi-experimental study design to overcome confounding as a result of underlying secular trends or concurrent interventions, we found evidence of a small but statistically significant reduction in the health system-level rate of hospital admissions for PTL following implementation of fFN testing in a large Canadian province. © 2013 Royal College of Obstetricians and Gynaecologists.
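    The segmented-regression analysis described above amounts to an ordinary regression with a level-change and a slope-change term at the implementation month. A simplified sketch in Python follows (variable names are assumptions; a real interrupted time-series analysis would also address autocorrelation, as noted in the comment).

```python
import numpy as np
import statsmodels.api as sm

def segmented_fit(y, t0):
    """Fit a simple interrupted time-series model to a monthly rate series y,
    with the intervention (e.g. fFN implementation) starting at month t0."""
    y = np.asarray(y, float)
    t = np.arange(len(y), dtype=float)
    step = (t >= t0).astype(float)           # level change after implementation
    ramp = np.where(t >= t0, t - t0, 0.0)    # slope change after implementation
    X = sm.add_constant(np.column_stack([t, step, ramp]))
    # For real data, consider autocorrelation-robust errors, e.g.
    # .fit(cov_type="HAC", cov_kwds={"maxlags": 12}).
    return sm.OLS(y, X).fit()
```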

  4. Can power-law scaling and neuronal avalanches arise from stochastic dynamics?

    PubMed

    Touboul, Jonathan; Destexhe, Alain

    2010-02-11

    The presence of self-organized criticality in biology is often evidenced by a power-law scaling of event size distributions, which can be measured by linear regression on logarithmic axes. We show here that such a procedure does not necessarily mean that the system exhibits self-organized criticality. We first provide an analysis of multisite local field potential (LFP) recordings of brain activity and show that event size distributions defined as negative LFP peaks can be close to power-law distributions. However, this result is not robust to change in detection threshold, or when tested using more rigorous statistical analyses such as the Kolmogorov-Smirnov test. Similar power-law scaling is observed for surrogate signals, suggesting that power-law scaling may be a generic property of thresholded stochastic processes. We next investigate this problem analytically, and show that, indeed, stochastic processes can produce spurious power-law scaling without the presence of underlying self-organized criticality. However, this power-law is only apparent in logarithmic representations, and does not survive more rigorous analysis such as the Kolmogorov-Smirnov test. The same analysis was also performed on an artificial network known to display self-organized criticality. In this case, both the graphical representations and the rigorous statistical analysis reveal with no ambiguity that the avalanche size is distributed as a power-law. We conclude that logarithmic representations can lead to spurious power-law scaling induced by the stochastic nature of the phenomenon. This apparent power-law scaling does not constitute a proof of self-organized criticality, which should be demonstrated by more stringent statistical tests.
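    The contrast drawn above, between an apparent power law on logarithmic axes and a more stringent maximum-likelihood fit checked with a Kolmogorov-Smirnov test, can be sketched as follows (Python/SciPy; the threshold xmin and the binning are assumed choices).

```python
import numpy as np
from scipy import stats

def powerlaw_check(x, xmin):
    """Compare a naive log-log regression slope with a continuous MLE fit
    plus a KS goodness-of-fit test, for event sizes x above xmin."""
    x = np.asarray(x, float)
    x = x[x >= xmin]
    # Naive estimate: straight-line fit to a log-log histogram. As the paper
    # shows, thresholded stochastic processes can look linear here.
    edges = np.logspace(np.log10(xmin), np.log10(x.max()), 30)
    hist, _ = np.histogram(x, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])
    keep = hist > 0
    slope = stats.linregress(np.log10(centers[keep]), np.log10(hist[keep])).slope
    # More rigorous: continuous maximum-likelihood exponent and a KS test of
    # the data against the fitted power-law CDF.
    alpha = 1.0 + len(x) / np.sum(np.log(x / xmin))
    cdf = lambda v: 1.0 - (v / xmin) ** (1.0 - alpha)
    ks = stats.kstest(x, cdf)
    return slope, alpha, ks
```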

  5. Robust variable selection method for nonparametric differential equation models with application to nonlinear dynamic gene regulatory network analysis.

    PubMed

    Lu, Tao

    2016-01-01

    The gene regulation network (GRN) evaluates the interactions between genes and looks for models to describe gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRNs. In this article, we propose a nonparametric ordinary differential equation (ODE) model for the nonlinear dynamic GRN. Specifically, we address the following questions simultaneously: (i) extract information from noisy time-course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation study and a real application example.

  6. Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

    Treesearch

    Raymond L. Czaplewski

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...

  7. Robustness and structure of complex networks

    NASA Astrophysics Data System (ADS)

    Shao, Shuai

    This dissertation covers the two major parts of my PhD research on statistical physics and complex networks: i) modeling a new type of attack -- localized attack, and investigating robustness of complex networks under this type of attack; ii) discovering the clustering structure in complex networks and its influence on the robustness of coupled networks. Complex networks appear in every aspect of our daily life and are widely studied in Physics, Mathematics, Biology, and Computer Science. One important property of complex networks is their robustness under attacks, which depends crucially on the nature of attacks and the structure of the networks themselves. Previous studies have focused on two types of attack: random attack and targeted attack, which, however, are insufficient to describe many real-world damages. Here we propose a new type of attack -- localized attack, and study the robustness of complex networks under this type of attack, both analytically and via simulation. On the other hand, we also study the clustering structure in the network, and its influence on the robustness of a complex network system. In the first part, we propose a theoretical framework to study the robustness of complex networks under localized attack based on percolation theory and generating function method. We investigate the percolation properties, including the critical threshold of the phase transition pc and the size of the giant component Pinfinity. We compare localized attack with random attack and find that while random regular (RR) networks are more robust against localized attack, Erdoḧs-Renyi (ER) networks are equally robust under both types of attacks. As for scale-free (SF) networks, their robustness depends crucially on the degree exponent lambda. The simulation results show perfect agreement with theoretical predictions. We also test our model on two real-world networks: a peer-to-peer computer network and an airline network, and find that the real-world networks are much more vulnerable to localized attack compared with random attack. In the second part, we extend the tree-like generating function method to incorporating clustering structure in complex networks. We study the robustness of a complex network system, especially a network of networks (NON) with clustering structure in each network. We find that the system becomes less robust as we increase the clustering coefficient of each network. For a partially dependent network system, we also find that the influence of the clustering coefficient on network robustness decreases as we decrease the coupling strength, and the critical coupling strength qc, at which the first-order phase transition changes to second-order, increases as we increase the clustering coefficient.
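    A toy simulation of the localized attack described above: remove a "ball" of nodes grown outward from a random seed and measure the surviving giant component. The sketch below uses networkx and is purely illustrative; it does not reproduce the generating-function analysis, and the network parameters are assumptions.

```python
import random
import networkx as nx

def localized_attack(G, fraction, seed=None):
    """Remove a connected ball of nodes grown from a random seed node until
    the given fraction of the network is removed, then return the relative
    size of the largest remaining component."""
    rng = random.Random(seed)
    n_remove = int(fraction * G.number_of_nodes())
    start = rng.choice(list(G.nodes))
    removed, frontier = set(), [start]
    while len(removed) < n_remove and frontier:
        node = frontier.pop(0)              # breadth-first growth of the ball
        if node in removed:
            continue
        removed.add(node)
        frontier.extend(nb for nb in G.neighbors(node) if nb not in removed)
    H = G.copy()
    H.remove_nodes_from(removed)
    if H.number_of_nodes() == 0:
        return 0.0
    giant = max(nx.connected_components(H), key=len)
    return len(giant) / G.number_of_nodes()

# Example: an Erdos-Renyi network with mean degree 4 (the abstract reports
# that ER networks respond similarly to localized and random attack).
G = nx.erdos_renyi_graph(10_000, 4 / 10_000)
print(localized_attack(G, 0.3, seed=1))
```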

  8. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
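    The MRR idea, augmenting a parametric fit with a fraction of a nonparametric fit to its residuals, can be sketched in a few lines of Python/NumPy. The mixing fraction, polynomial degree, and kernel smoother below are illustrative choices, not those of Mays, Birch, and Starnes.

```python
import numpy as np

def model_robust_fit(x, y, degree=2, lam=0.5, bandwidth=0.1):
    """Parametric polynomial fit plus lam times a kernel-smoothed fit to the
    residuals, evaluated at the observed x (a rough sketch of the MRR idea)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    coef = np.polyfit(x, y, degree)
    parametric = np.polyval(coef, x)
    resid = y - parametric
    # Nadaraya-Watson (Gaussian kernel) smoother of the residuals.
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    smooth_resid = (w * resid[None, :]).sum(axis=1) / w.sum(axis=1)
    return parametric + lam * smooth_resid
```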

  9. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  10. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.
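    Two of the estimators compared in this study are straightforward to sketch: a Winsorized correlation and a bootstrap of it (Python/SciPy). The 20% Winsorizing fraction and the bootstrap size are illustrative; the percentage bend and skipped correlations are not shown.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

def winsorized_corr(x, y, gamma=0.2):
    # Winsorize both variables at fraction gamma in each tail, then take the
    # ordinary Pearson correlation of the Winsorized scores.
    xw = np.asarray(winsorize(np.asarray(x, float), limits=(gamma, gamma)))
    yw = np.asarray(winsorize(np.asarray(y, float), limits=(gamma, gamma)))
    return stats.pearsonr(xw, yw)[0]

def bootstrap_corr(x, y, corr=winsorized_corr, n_boot=2000, seed=0):
    # Bootstrap resampling of (x, y) pairs, echoing the study's comparison of
    # correlation estimators across resampled datasets.
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    return np.array([corr(x[i], y[i]) for i in idx])
```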

  11. Does bisphenol A induce superfeminization in Marisa cornuarietis? Part I: intra- and inter-laboratory variability in test endpoints.

    PubMed

    Forbes, Valery E; Selck, Henriette; Palmqvist, Annemette; Aufderheide, John; Warbritton, Ryan; Pounds, Nadine; Thompson, Roy; van der Hoeven, Nelly; Caspers, Norbert

    2007-03-01

    It has been claimed that bisphenol A (BPA) induces superfeminization in the freshwater gastropod, Marisa cornuarietis. To explore the reproducibility of prior work, here we present results from a three-laboratory study, the objectives of which were to determine the mean and variability in test endpoints (i.e., adult fecundity, egg hatchability, and juvenile growth) under baseline conditions and to identify the sources of variability. A major source of variability for all of the measured endpoints was due to differences within and among individuals. With few exceptions, variability among laboratories and among replicate tanks within laboratories contributed little to the observed variability in endpoints. The results highlight the importance of obtaining basic knowledge of husbandry requirements and baseline information on life-history traits of potential test species prior to designing toxicity test protocols. Understanding of the levels and sources of endpoint variability is essential so that statistically robust and ecologically relevant tests of chemicals can be conducted.

  12. MetaGenyo: a web tool for meta-analysis of genetic association studies.

    PubMed

    Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

    2017-12-16

    Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of this type of study has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them are specifically designed for genetic association studies and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering Hardy-Weinberg testing, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analyses. The application is freely available at http://bioinfo.genyo.es/metagenyo/.

  13. Interpreting carnivore scent-station surveys

    USGS Publications Warehouse

    Sargeant, G.A.; Johnson, D.H.; Berg, W.E.

    1998-01-01

    The scent-station survey method has been widely used to estimate trends in carnivore abundance. However, statistical properties of scent-station data are poorly understood, and the relation between scent-station indices and carnivore abundance has not been adequately evaluated. We assessed properties of scent-station indices by analyzing data collected in Minnesota during 1986-93. Visits to stations separated by <2 km were correlated for all species because individual carnivores sometimes visited several stations in succession. Thus, visits to stations had an intractable statistical distribution. Dichotomizing results for lines of 10 stations (0 or ≥1 visits) produced binomially distributed data that were robust to multiple visits by individuals. We abandoned 2-way comparisons among years in favor of tests for population trend, which are less susceptible to bias, and analyzed results separately for biogeographic sections of Minnesota because trends differed among sections. Before drawing inferences about carnivore population trends, we reevaluated published validation experiments. Results implicated low statistical power and confounding as possible explanations for equivocal or conflicting results of validation efforts. Long-term trends in visitation rates probably reflect real changes in populations, but poor spatial and temporal resolution, susceptibility to confounding, and low statistical power limit the usefulness of this survey method.

  14. Median statistics estimates of Hubble and Newton's constants

    NASA Astrophysics Data System (ADS)

    Bethapudi, Suryarao; Desai, Shantanu

    2017-02-01

    The robustness of any statistic depends upon the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness when the number of assumptions we can make about the data is limited. We then apply the median statistics technique to obtain estimates of two constants of nature, the Hubble constant (H0) and Newton's gravitational constant (G), both of which show significant differences between different measurements. For H0, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. We find, after grouping the different results according to their primary type of measurement, that the median estimates are H0 = 72.5^{+2.5}_{-8} km/s/Mpc, with errors corresponding to the 95% confidence level (2σ), and G = 6.674702^{+0.0014}_{-0.0009} × 10^{-11} N m^2 kg^{-2}, with errors corresponding to the 68% confidence level (1σ).
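    The appeal of median statistics is that a confidence range for the median follows from binomial order statistics alone, with no assumption about the error distribution. A minimal sketch of that construction (Python/SciPy) is shown below; the article's grouping of measurements by type is not reproduced.

```python
import numpy as np
from scipy import stats

def median_with_ci(values, cl=0.95):
    """Median-statistics central value with a distribution-free confidence
    range derived from binomial order statistics."""
    v = np.sort(np.asarray(values, float))
    n = len(v)
    # Under independence, the probability that the true median lies below the
    # k-th ordered value follows a binomial distribution with p = 0.5.
    cum = stats.binom.cdf(np.arange(1, n + 1), n, 0.5)
    lo = v[np.searchsorted(cum, (1.0 - cl) / 2.0)]
    hi = v[np.searchsorted(cum, 1.0 - (1.0 - cl) / 2.0)]
    return np.median(v), lo, hi
```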

  15. Are Assumptions of Well-Known Statistical Techniques Checked, and Why (Not)?

    PubMed Central

    Hoekstra, Rink; Kiers, Henk A. L.; Johnson, Addie

    2012-01-01

    A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: Only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment. Participants were asked to analyze the data as they would their own data, for which often used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that if they were, it was regularly by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regards to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic. PMID:22593746

  16. Defect detection of castings in radiography images using a robust statistical feature.

    PubMed

    Zhao, Xinyue; He, Zaixing; Zhang, Shuyou

    2014-01-01

    One of the most commonly used optical methods for defect detection is radiographic inspection. Compared with methods that extract defects directly from the radiography image, model-based methods deal with the case of an object with complex structure well. However, detection of small low-contrast defects in nonuniformly illuminated images is still a major challenge for them. In this paper, we present a new method based on the grayscale arranging pairs (GAP) feature to detect casting defects in radiography images automatically. First, a model is built using pixel pairs with a stable intensity relationship based on the GAP feature from previously acquired images. Second, defects can be extracted by comparing the difference of intensity-difference signs between the input image and the model statistically. The robustness of the proposed method to noise and illumination variations has been verified on casting radioscopic images with defects. The experimental results showed that the average computation time of the proposed method in the testing stage is 28 ms per image on a computer with a Pentium Core 2 Duo 3.00 GHz processor. For the comparison, we also evaluated the performance of the proposed method as well as that of the mixture-of-Gaussian-based and crossing line profile methods. The proposed method achieved 2.7% and 2.0% false negative rates in the noise and illumination variation experiments, respectively.

  17. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  18. A Baseline for the Multivariate Comparison of Resting-State Networks

    PubMed Central

    Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.

    2011-01-01

    As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040

  19. Replication, lies and lesser-known truths regarding experimental design in environmental microbiology.

    PubMed

    Lennon, Jay T

    2011-06-01

    A recent analysis revealed that most environmental microbiologists neglect replication in their science (Prosser, 2010). Of all peer-reviewed papers published during 2009 in the field's leading journals, slightly more than 70% lacked replication when it came to analyzing microbial community data. The paucity of replication is viewed as an 'endemic' and 'embarrassing' problem that amounts to 'bad science', or worse yet, as the title suggests, lying (Prosser, 2010). Although replication is an important component of experimental design, it is possible to do good science without replication. There are various quantitative techniques - some old, some new - that, when used properly, will allow environmental microbiologists to make strong statistical conclusions from experimental and comparative data. Here, I provide examples where unreplicated data can be used to test hypotheses and yield novel information in a statistically robust manner. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.

  20. An On-Demand Optical Quantum Random Number Generator with In-Future Action and Ultra-Fast Response

    PubMed Central

    Stipčević, Mario; Ursin, Rupert

    2015-01-01

    Random numbers are essential for our modern information-based society, e.g. in cryptography. Unlike frequently used pseudo-random generators, physical random number generators do not depend on complex algorithms but rather on a physical process to provide true randomness. Quantum random number generators (QRNG) do rely on a process which can be described only by a probabilistic theory, even in principle. Here we present a conceptually simple implementation, which offers a 100% efficiency of producing a random bit upon request and simultaneously exhibits an ultra-low latency. A careful technical and statistical analysis demonstrates its robustness against imperfections of the actual implemented technology and enables quick estimation of the randomness of very long sequences. Generated random numbers pass standard statistical tests without any post-processing. The setup described, as well as the theory presented here, demonstrates the maturity and overall understanding of the technology. PMID:26057576

  1. Statistical shape (ASM) and appearance (AAM) models for the segmentation of the cerebellum in fetal ultrasound

    NASA Astrophysics Data System (ADS)

    Reyes López, Misael; Arámbula Cosío, Fernando

    2017-11-01

    The cerebellum is an important structure to determine the gestational age of the fetus, moreover most of the abnormalities it presents are related to growth disorders. In this work, we present the results of the segmentation of the fetal cerebellum applying statistical shape and appearance models. Both models were tested on ultrasound images of the fetal brain taken from 23 pregnant women, between 18 and 24 gestational weeks. The accuracy results obtained on 11 ultrasound images show a mean Hausdorff distance of 6.08 mm between the manual segmentation and the segmentation using active shape model, and a mean Hausdorff distance of 7.54 mm between the manual segmentation and the segmentation using active appearance model. The reported results demonstrate that the active shape model is more robust in the segmentation of the fetal cerebellum in ultrasound images.
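    The accuracy metric quoted above, the Hausdorff distance between a manual and an automatic contour, can be computed with SciPy as sketched below. Contours are assumed to be N x 2 arrays of boundary points, and the pixel-to-millimetre scaling is an assumption of the example.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_mm(contour_a, contour_b, mm_per_pixel=1.0):
    """Symmetric Hausdorff distance between two point sets, converted to
    millimetres with an assumed isotropic pixel spacing."""
    a = np.asarray(contour_a, float)
    b = np.asarray(contour_b, float)
    d = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
    return d * mm_per_pixel
```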

  2. RESOLVE: A new algorithm for aperture synthesis imaging of extended emission in radio astronomy

    NASA Astrophysics Data System (ADS)

    Junklewitz, H.; Bell, M. R.; Selig, M.; Enßlin, T. A.

    2016-02-01

    We present resolve, a new algorithm for radio aperture synthesis imaging of extended and diffuse emission in total intensity. The algorithm is derived using Bayesian statistical inference techniques, estimating the surface brightness in the sky assuming a priori log-normal statistics. resolve estimates the measured sky brightness in total intensity, and the spatial correlation structure in the sky, which is used to guide the algorithm to an optimal reconstruction of extended and diffuse sources. During this process, the algorithm succeeds in deconvolving the effects of the radio interferometric point spread function. Additionally, resolve provides a map with an uncertainty estimate of the reconstructed surface brightness. Furthermore, with resolve we introduce a new, optimal visibility weighting scheme that can be viewed as an extension to robust weighting. In tests using simulated observations, the algorithm shows improved performance against two standard imaging approaches for extended sources, Multiscale-CLEAN and the Maximum Entropy Method.

  3. Evaluating the validity of an integrity-based situational judgement test for medical school admissions.

    PubMed

    Husbands, Adrian; Rodgerson, Mark J; Dowell, Jon; Patterson, Fiona

    2015-09-02

    While the construct of integrity has emerged as a front-runner amongst the desirable attributes to select for in medical school admissions, it is less clear how best to assess this characteristic. A potential solution lies in the use of Situational Judgement Tests (SJTs) which have gained popularity due to robust psychometric evidence and potential for large-scale administration. This study aims to explore the psychometric properties of an SJT designed to measure the construct of integrity. Ten SJT scenarios, each with five response stems were developed from critical incident interviews with academic and clinical staff. 200 of 520 (38.5 %) Multiple Mini Interview candidates at Dundee Medical School participated in the study during the 2012-2013 admissions cycle. Participants were asked to rate the appropriateness of each SJT response on a 4-point likert scale as well as complete the HEXACO personality inventory and a face validity questionnaire. Pearson's correlations and descriptive statistics were used to examine the associations between SJT score, HEXACO personality traits, pre-admissions measures namely academic and United Kingdom Clinical Aptitude Test (UKCAT) scores, as well as acceptability. Cronbach's alpha reliability for the SJT was .64. Statistically significant correlations ranging from .16 to .36 (.22 to .53 disattenuated) were observed between SJT score and the honesty-humility (integrity), conscientiousness, extraversion and agreeableness dimensions of the HEXACO inventory. A significant correlation of .32 (.47 disattenuated) was observed between SJT and MMI scores and no significant relationship with the UKCAT. Participant reactions to the SJTs were generally positive. Initial findings are encouraging regarding the psychometric robustness of an integrity-based SJT for medical student selection, with significant associations found between the SJTs, integrity, other desirable personality traits and the MMI. The SJTs showed little or no redundancy with cognitive ability. Results suggest that carefully-designed SJTs may augment more costly MMIs.
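    The "disattenuated" coefficients quoted above come from the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch with illustrative numbers (not taken from the study's full calculation):

```python
def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation: the observed correlation divided
    by the geometric mean of the two reliabilities."""
    return r_observed / (reliability_x * reliability_y) ** 0.5

# Illustrative only: an observed r of 0.16 with an SJT reliability of 0.64 and
# a perfectly reliable criterion would disattenuate to 0.20.
print(disattenuate(0.16, 0.64, 1.0))
```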

  4. Biochemical phenotypes to discriminate microbial subpopulations and improve outbreak detection.

    PubMed

    Galar, Alicia; Kulldorff, Martin; Rudnick, Wallis; O'Brien, Thomas F; Stelling, John

    2013-01-01

    Clinical microbiology laboratories worldwide constitute an invaluable resource for monitoring emerging threats and the spread of antimicrobial resistance. We studied the growing number of biochemical tests routinely performed on clinical isolates to explore their value as epidemiological markers. Microbiology laboratory results from January 2009 through December 2011 from a 793-bed hospital stored in WHONET were examined. Variables included patient location, collection date, organism, and 47 biochemical and 17 antimicrobial susceptibility test results reported by Vitek 2. To identify biochemical tests that were particularly valuable (stable with repeat testing, but good variability across the species) or problematic (inconsistent results with repeat testing), three types of variance analyses were performed on isolates of K. pneumoniae: descriptive analysis of discordant biochemical results in same-day isolates, an average within-patient variance index, and generalized linear mixed model variance component analysis. 4,200 isolates of K. pneumoniae were identified from 2,485 patients, 32% of whom had multiple isolates. The first two variance analyses highlighted SUCT, TyrA, GlyA, and GGT as "nuisance" biochemicals for which discordant within-patient test results impacted a high proportion of patient results, while dTAG had relatively good within-patient stability with good heterogeneity across the species. Variance component analyses confirmed the relative stability of dTAG, and identified additional biochemicals such as PHOS with a large between-patient to within-patient variance ratio. A reduced subset of biochemicals improved the robustness of strain definition for carbapenem-resistant K. pneumoniae. Surveillance analyses suggest that the reduced biochemical profile could improve the timeliness and specificity of outbreak detection algorithms. The statistical approaches explored can improve the robust recognition of microbial subpopulations with routinely available biochemical test results, of value in the timely detection of outbreak clones and evolutionarily important genetic events.

  5. Compact Quantum Random Number Generator with Silicon Nanocrystals Light Emitting Device Coupled to a Silicon Photomultiplier

    NASA Astrophysics Data System (ADS)

    Bisadi, Zahra; Acerbi, Fabio; Fontana, Giorgio; Zorzi, Nicola; Piemonte, Claudio; Pucker, Georg; Pavesi, Lorenzo

    2018-02-01

    A small-sized photonic quantum random number generator, easy to implement in small electronic devices for secure data encryption and other applications, is in high demand nowadays. Here, we propose a compact configuration with a silicon-nanocrystal large-area light-emitting device (LED) coupled to a silicon photomultiplier to generate random numbers. The random number generation methodology is based on the photon arrival time and is robust against the non-idealities of the detector and the source of quantum entropy. The raw data show high quality of randomness and pass all the statistical tests in the National Institute of Standards and Technology (NIST) test suite without a post-processing algorithm. The highest bit rate is 0.5 Mbps with an efficiency of 4 bits per detected photon.

  6. Gene expression markers of age-related inflammation in two human cohorts.

    PubMed

    Pilling, Luke C; Joehanes, Roby; Melzer, David; Harries, Lorna W; Henley, William; Dupuis, Josée; Lin, Honghuang; Mitchell, Marcus; Hernandez, Dena; Ying, Sai-Xia; Lunetta, Kathryn L; Benjamin, Emelia J; Singleton, Andrew; Levy, Daniel; Munson, Peter; Murabito, Joanne M; Ferrucci, Luigi

    2015-10-01

    Chronically elevated circulating inflammatory markers are common in older persons but mechanisms are unclear. Many blood transcripts (>800 genes) are associated with interleukin-6 protein levels (IL6) independent of age. We aimed to identify gene transcripts statistically mediating, as drivers or responders, the increasing levels of IL6 protein in blood at older ages. Blood derived in-vivo RNA from the Framingham Heart Study (FHS, n=2422, ages 40-92 yrs) and InCHIANTI study (n=694, ages 30-104 yrs), with Affymetrix and Illumina expression arrays respectively (>17,000 genes tested), were tested for statistical mediation of the age-IL6 association using resampling techniques, adjusted for confounders and multiple testing. In FHS, IL6 expression was not associated with IL6 protein levels in blood. 102 genes (0.6% of 17,324 expressed) statistically mediated the age-IL6 association of which 25 replicated in InCHIANTI (including 5 of the 10 largest effect genes). The largest effect gene (SLC4A10, coding for NCBE, a sodium bicarbonate transporter) mediated 19% (adjusted CI 8.9 to 34.1%) and replicated by PCR in InCHIANTI (n=194, 35.6% mediated, p=0.01). Other replicated mediators included PRF1 (perforin, a cytolytic protein in cytotoxic T lymphocytes and NK cells) and IL1B (Interleukin 1 beta): few other cytokines were significant mediators. This transcriptome-wide study on human blood identified a small distinct set of genes that statistically mediate the age-IL6 association. Findings are robust across two cohorts and different expression technologies. Raised IL6 levels may not derive from circulating white cells in age related inflammation. Published by Elsevier Inc.

  7. Strong claims and weak evidence: reassessing the predictive validity of the IAT.

    PubMed

    Blanton, Hart; Jaccard, James; Klick, Jonathan; Mellers, Barbara; Mitchell, Gregory; Tetlock, Philip E

    2009-05-01

    The authors reanalyzed data from 2 influential studies-A. R. McConnell and J. M. Leibold and J. C. Ziegert and P. J. Hanges-that explore links between implicit bias and discriminatory behavior and that have been invoked to support strong claims about the predictive validity of the Implicit Association Test. In both of these studies, the inclusion of race Implicit Association Test scores in regression models reduced prediction errors by only tiny amounts, and Implicit Association Test scores did not permit prediction of individual-level behaviors. Furthermore, the results were not robust when the impact of rater reliability, statistical specifications, and/or outliers were taken into account, and reanalysis of A. R. McConnell & J. M. Leibold (2001) revealed a pattern of behavior consistent with a pro-Black behavioral bias, rather than the anti-Black bias suggested in the original study. (c) 2009 APA, all rights reserved.

  8. Advancing institutional anomie theory: a microlevel examination connecting culture, institutions, and deviance.

    PubMed

    Muftić, Lisa R

    2006-12-01

    Institutional anomie theory (IAT) contends that crime can be explained by an examination of American society, particularly the exaggerated emphasis on economic success inherent in American culture, which has created a "cheating orientation" that permeates structural institutions, including academia. Consistent with its macrosocial perspective, previous tests of IAT have examined IAT variables at the structural level only. The current study tests the robustness of IAT by operationalizing IAT variables at the individual level and looking at a minor form of deviance, student cheating. The author also examines the role statistical modeling has in testing the theory at the microlevel. Undergraduates, 122 American born and 48 international, were surveyed about their cheating behaviors and adherence to economic goal orientations. Results related to the hypothesis that American students, relative to foreign-born students, will have an increased adherence to economic goal orientations that increase cheating behaviors are presented, as are suggestions for future studies.

  9. Integrated versus isolated training of the hemiparetic upper extremity in haptically rendered virtual environments.

    PubMed

    Qiu, Qinyin; Fluet, Gerard G; Saleh, Soha; Lafond, Ian; Merians, Alma S; Adamovich, Sergei V

    2010-01-01

    This paper describes the preliminary results of an ongoing study of the effects of two training approaches on motor function and learning in persons with hemiparesis due to cerebrovascular accidents. Eighteen subjects with chronic stroke performed eight three-hour sessions of sensorimotor training in haptically rendered environments. Eleven subjects performed training activities that integrated hand and arm movement while another seven subjects performed activities that trained the hand and arm separately. As a whole, the eighteen subjects made statistically significant improvements in motor function as evidenced by robust improvements in Wolf Motor Function Test times and corresponding improvements in Jebsen Test of Hand Function times. There were no significant between-group effects for these tests. However, the two training approaches elicited different patterns and magnitudes of performance improvement, suggesting that they may elicit different types of change in motor learning and/or control.

  10. Assessing noninferiority in a three-arm trial using the Bayesian approach.

    PubMed

    Ghosh, Pulak; Nathoo, Farouk; Gönen, Mithat; Tiwari, Ram C

    2011-07-10

    Non-inferiority trials, which aim to demonstrate that a test product is not worse than a competitor by more than a pre-specified small amount, are of great importance to the pharmaceutical community. As a result, methodology for designing and analyzing such trials is required, and developing new methods for such analysis is an important area of statistical research. The three-arm trial consists of a placebo, a reference and an experimental treatment, and simultaneously tests the superiority of the reference over the placebo along with comparing this reference to an experimental treatment. In this paper, we consider the analysis of non-inferiority trials using Bayesian methods which incorporate both parametric as well as semi-parametric models. The resulting testing approach is both flexible and robust. The benefit of the proposed Bayesian methods is assessed via simulation, based on a study examining home-based blood pressure interventions. Copyright © 2011 John Wiley & Sons, Ltd.

  11. Exploiting the full power of temporal gene expression profiling through a new statistical test: application to the analysis of muscular dystrophy data.

    PubMed

    Vinciotti, Veronica; Liu, Xiaohui; Turk, Rolf; de Meijer, Emile J; 't Hoen, Peter A C

    2006-04-03

    The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523.
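    The core of the temporal test, a Hotelling T2 comparison of polynomial coefficients fitted per condition, can be sketched as follows. This is the textbook two-sample statistic applied to per-replicate np.polyfit coefficients (Python/SciPy), not the authors' exact implementation.

```python
import numpy as np
from scipy import stats

def hotelling_t2(coefs_a, coefs_b):
    """Two-sample Hotelling T^2 test on (n_replicates, p) arrays of polynomial
    coefficients, e.g. from np.polyfit applied to each replicate's profile."""
    coefs_a = np.asarray(coefs_a, float)
    coefs_b = np.asarray(coefs_b, float)
    na, nb = len(coefs_a), len(coefs_b)
    p = coefs_a.shape[1]
    diff = coefs_a.mean(axis=0) - coefs_b.mean(axis=0)
    pooled = ((na - 1) * np.cov(coefs_a, rowvar=False) +
              (nb - 1) * np.cov(coefs_b, rowvar=False)) / (na + nb - 2)
    t2 = (na * nb) / (na + nb) * diff @ np.linalg.solve(pooled, diff)
    f = t2 * (na + nb - p - 1) / (p * (na + nb - 2))     # F transformation
    pval = stats.f.sf(f, p, na + nb - p - 1)
    return t2, pval
```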

  12. Exploiting the full power of temporal gene expression profiling through a new statistical test: Application to the analysis of muscular dystrophy data

    PubMed Central

    Vinciotti, Veronica; Liu, Xiaohui; Turk, Rolf; de Meijer, Emile J; 't Hoen, Peter AC

    2006-01-01

    Background The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. Results We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. Conclusion The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523. PMID:16584545

  13. Robust Skull-Stripping Segmentation Based on Irrational Mask for Magnetic Resonance Brain Images.

    PubMed

    Moldovanu, Simona; Moraru, Luminița; Biswas, Anjan

    2015-12-01

    This paper proposes a new method for simple, efficient, and robust removal of the non-brain tissues in MR images based on an irrational mask for filtration within a binary morphological operation framework. The proposed skull-stripping segmentation is based on two irrational 3 × 3 and 5 × 5 masks, having the sum of its weights equal to the transcendental number π value provided by the Gregory-Leibniz infinite series. It allows maintaining a lower rate of useful pixel loss. The proposed method has been tested in two ways. First, it has been validated as a binary method by comparing and contrasting with Otsu's, Sauvola's, Niblack's, and Bernsen's binary methods. Secondly, its accuracy has been verified against three state-of-the-art skull-stripping methods: the graph cuts method, the method based on Chan-Vese active contour model, and the simplex mesh and histogram analysis skull stripping. The performance of the proposed method has been assessed using the Dice scores, overlap and extra fractions, and sensitivity and specificity as statistical methods. The gold standard has been provided by two neurologist experts. The proposed method has been tested and validated on 26 image series which contain 216 images from two publicly available databases: the Whole Brain Atlas and the Internet Brain Segmentation Repository that include a highly variable sample population (with reference to age, sex, healthy/diseased). The approach performs accurately on both standardized databases. The main advantage of the proposed method is its robustness and speed.

  14. Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saad, Yousef

    2014-01-16

    The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project began with an investigation of how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners for solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version of our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth. 10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.

  15. Assessing the utility of frequency tagging for tracking memory-based reactivation of word representations.

    PubMed

    Lewis, Ashley Glen; Schriefers, Herbert; Bastiaansen, Marcel; Schoffelen, Jan-Mathijs

    2018-05-21

    Reinstatement of memory-related neural activity measured with high temporal precision potentially provides a useful index for real-time monitoring of the timing of activation of memory content during cognitive processing. The utility of such an index extends to any situation where one is interested in the (relative) timing of activation of different sources of information in memory, a paradigm case of which is tracking lexical activation during language processing. Essential for this approach is that memory reinstatement effects are robust, so that their absence (in the average) definitively indicates that no lexical activation is present. We used electroencephalography to test the robustness of a reported subsequent memory finding involving reinstatement of frequency-specific entrained oscillatory brain activity during subsequent recognition. Participants learned lists of words presented on a background flickering at either 6 or 15 Hz to entrain a steady-state brain response. Target words subsequently presented on a non-flickering background that were correctly identified as previously seen exhibited reinstatement effects at both entrainment frequencies. Reliability of these statistical inferences was however critically dependent on the approach used for multiple comparisons correction. We conclude that effects are not robust enough to be used as a reliable index of lexical activation during language processing.

  16. Robust Magnetotelluric Impedance Estimation

    NASA Astrophysics Data System (ADS)

    Sutarno, D.

    2010-12-01

    Robust magnetotelluric (MT) response function estimators are now in standard use by the induction community. Properly devised and applied, these have the ability to reduce the influence of unusual data (outliers). The estimators always yield impedance estimates that are better than conventional least squares (LS) estimation because 'real' MT data almost never satisfy the statistical assumptions of Gaussianity and stationarity upon which normal spectral analysis is based. This paper discusses the development and application of robust estimation procedures, which can be classified as M-estimators, to MT data. Starting with a description of the estimators, special attention is addressed to the recent development of bounded-influence robust estimation, including utilization of the Hilbert Transform (HT) operation on causal MT impedance functions. The resulting robust performance is illustrated using synthetic as well as real MT data.
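
    A minimal sketch of the M-estimation building block mentioned above, assuming a generic real-valued linear regression rather than the complex-valued, bounded-influence impedance estimator of the paper: Huber weights are applied through iteratively reweighted least squares so that outlying observations receive reduced influence.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimate of y ~ X b via iteratively reweighted least squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]                       # least-squares start
    for _ in range(n_iter):
        r = y - X @ b
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12   # MAD scale estimate
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))              # Huber weights
        sw = np.sqrt(w)
        b = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return b

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
y[:5] += 20.0                            # a few gross outliers
print(huber_irls(X, y))                  # stays close to [1, 2]
```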

  17. Robustness of location estimators under t-distributions: a literature review

    NASA Astrophysics Data System (ADS)

    Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

    2017-03-01

    The assumption of normality is commonly used in the estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution since t-distributions have heavier tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For illustration we use onion yield data, which include outliers, as a case study and show that the t model produces a better fit than the normal model.
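
    A short illustration, with simulated data rather than the onion yields, of why a t model gives a more robust location estimate than the normal model: the maximum likelihood location under a Student-t distribution is barely moved by gross outliers that strongly pull the sample mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, scale=1.0, size=50)
y[:3] = [25.0, 28.0, 30.0]                 # gross outliers

df, loc_t, scale_t = stats.t.fit(y)        # MLE under a t model (df, location, scale)
print("sample mean (normal-model estimate):", round(y.mean(), 2))
print("t-model location estimate          :", round(loc_t, 2))
print("median                             :", round(np.median(y), 2))
```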

  18. Level set method with automatic selective local statistics for brain tumor segmentation in MR images.

    PubMed

    Thapaliya, Kiran; Pyun, Jae-Young; Park, Chun-Su; Kwon, Goo-Rak

    2013-01-01

    The level set approach is a powerful tool for segmenting images. This paper proposes a method for segmenting brain tumor images from MR images. A new signed pressure function (SPF) that can efficiently stop the contours at weak or blurred edges is introduced. The local statistics of the different objects present in the MR images were calculated. Using local statistics, the tumor objects were identified among different objects. In this level set method, the calculation of the parameters is a challenging task. The calculations of different parameters for different types of images were automatic. The basic thresholding value was updated and adjusted automatically for different MR images. This thresholding value was used to calculate the different parameters in the proposed algorithm. The proposed algorithm was tested on the magnetic resonance images of the brain for tumor segmentation and its performance was evaluated visually and quantitatively. Numerical experiments on some brain tumor images highlighted the efficiency and robustness of this method. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  19. Sensitivity Analyses of the Change in FVC in a Phase 3 Trial of Pirfenidone for Idiopathic Pulmonary Fibrosis

    PubMed Central

    Bradford, Williamson Z.; Fagan, Elizabeth A.; Glaspole, Ian; Glassberg, Marilyn K.; Glasscock, Kenneth F.; King, Talmadge E.; Lancaster, Lisa H.; Nathan, Steven D.; Pereira, Carlos A.; Sahn, Steven A.; Swigris, Jeffrey J.; Noble, Paul W.

    2015-01-01

    BACKGROUND: FVC outcomes in clinical trials on idiopathic pulmonary fibrosis (IPF) can be substantially influenced by the analytic methodology and the handling of missing data. We conducted a series of sensitivity analyses to assess the robustness of the statistical finding and the stability of the estimate of the magnitude of treatment effect on the primary end point of FVC change in a phase 3 trial evaluating pirfenidone in adults with IPF. METHODS: Source data included all 555 study participants randomized to treatment with pirfenidone or placebo in the Assessment of Pirfenidone to Confirm Efficacy and Safety in Idiopathic Pulmonary Fibrosis (ASCEND) study. Sensitivity analyses were conducted to assess whether alternative statistical tests and methods for handling missing data influenced the observed magnitude of treatment effect on the primary end point of change from baseline to week 52 in FVC. RESULTS: The distribution of FVC change at week 52 was systematically different between the two treatment groups and favored pirfenidone in each analysis. The method used to impute missing data due to death had a marked effect on the magnitude of change in FVC in both treatment groups; however, the magnitude of treatment benefit was generally consistent on a relative basis, with an approximate 50% reduction in FVC decline observed in the pirfenidone group in each analysis. CONCLUSIONS: Our results confirm the robustness of the statistical finding on the primary end point of change in FVC in the ASCEND trial and corroborate the estimated magnitude of the pirfenidone treatment effect in patients with IPF. TRIAL REGISTRY: ClinicalTrials.gov; No.: NCT01366209; URL: www.clinicaltrials.gov PMID:25856121

  20. Power of treatment success definitions when the Canine Brief Pain Inventory is used to evaluate carprofen treatment for the control of pain and inflammation in dogs with osteoarthritis.

    PubMed

    Brown, Dorothy Cimino; Bell, Margie; Rhodes, Linda

    2013-12-01

    To determine the optimal method for use of the Canine Brief Pain Inventory (CBPI) to quantitate responses of dogs with osteoarthritis to treatment with carprofen or placebo. 150 dogs with osteoarthritis. Data were analyzed from 2 studies with identical protocols in which owner-completed CBPIs were used. Treatment for each dog was classified as a success or failure by comparing the pain severity score (PSS) and pain interference score (PIS) on day 0 (baseline) with those on day 14. Treatment success or failure was defined on the basis of various combinations of reduction in the 2 scores when inclusion criteria were set as a PSS and PIS ≥ 1, 2, or 3 at baseline. Statistical analyses were performed to select the definition of treatment success that had the greatest statistical power to detect differences between carprofen and placebo treatments. Defining treatment success as a reduction of ≥ 1 in PSS and ≥ 2 in PIS in each dog had consistently robust power. Power was 62.8% in the population that included only dogs with baseline scores ≥ 2 and 64.7% in the population that included only dogs with baseline scores ≥ 3. The CBPI had robust statistical power to evaluate the treatment effect of carprofen in dogs with osteoarthritis when protocol success criteria were predefined as a reduction ≥ 1 in PIS and ≥ 2 in PSS. Results indicated the CBPI can be used as an outcome measure in clinical trials to evaluate new pain treatments when it is desirable to evaluate success in individual dogs rather than overall mean or median scores in a test population.

  1. The Graphical Display of Simulation Results, with Applications to the Comparison of Robust IRT Estimators of Ability.

    ERIC Educational Resources Information Center

    Thissen, David; Wainer, Howard

    Simulation studies of the performance of (potentially) robust statistical estimation produce large quantities of numbers in the form of performance indices of the various estimators under various conditions. This report presents a multivariate graphical display used to aid in the digestion of the plentiful results in a current study of Item…

  2. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  3. Wind Energy Facilities and Residential Properties: The Effect of Proximity and View on Sales Prices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    San Diego State University; Bard Center for Environmental Policy at Bard College; Hoen, Ben

    2011-06-23

    With increasing numbers of communities considering wind power developments, empirical investigations regarding related community concerns are needed. One such concern is that proximate property values may be adversely affected, yet relatively little research exists on the subject. The present research investigates roughly 7,500 sales of single-family homes surrounding 24 existing U.S. wind facilities. Across four different hedonic models, and a variety of robustness tests, the results are consistent: neither the view of the wind facilities nor the distance of the home to those facilities is found to have a statistically significant effect on sales prices, yet further research is warranted.

  4. Morphometric and molecular data on two mitochondrial genes of a newly discovered chimaeran fish ( Hydrolagus melanophasma, Chondrichthyes)

    NASA Astrophysics Data System (ADS)

    De La Cruz-Agüero, José; García-Rodríguez, Francisco Javier; Cota-Gómez, Víctor Manuel; Melo-Barrera, Felipe Neri; González-Armas, Rogelio

    2012-06-01

    Fresh and preserved (type material) specimens of the black ghost chimaera Hydrolagus melanophasma were compared for morphometric characteristics. A molecular comparison was also performed on two mitochondrial gene sequences (12S rRNA and 16S rRNA gene sequences). While significant differences in measurements were found, the differences were not attributable to sexual dimorphism or the quality of the specimens, but to the sample size and the type of statistical tests. The result of the genetic characterization showed that 12S rRNA and 16S rRNA genes represented robust molecular markers that characterized the species.

  5. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    USDA-ARS?s Scientific Manuscript database

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  6. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
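
    A brief sketch of the input class described above: a Schroeder-phased multisine places energy only at a chosen set of discrete frequencies, with phases chosen to keep the time-domain crest factor low. The sampling rate, number of lines and base frequency below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

fs, T = 1000.0, 2.0                        # sample rate (Hz) and record length (s)
K, f0 = 20, 5.0                            # number of sinusoids and base frequency (Hz)
t = np.arange(0.0, T, 1.0 / fs)
k = np.arange(1, K + 1)

freqs = k * f0                             # energy only at these discrete frequencies
phases = -np.pi * k * (k - 1) / K          # Schroeder phases (flat amplitude spectrum)
u = np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None]).sum(axis=0)

crest = np.abs(u).max() / np.sqrt(np.mean(u ** 2))
print("crest factor:", round(crest, 2))    # much lower than for random phases
```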

  7. Rigorous force field optimization principles based on statistical distance minimization

    DOE PAGES

    Vlcek, Lukas; Chialvo, Ariel A.

    2015-10-12

    We use the concept of statistical distance to define a measure of distinguishability between a pair of statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the model's static measurable properties to those of the target. Here we exploit this feature to define a rigorous basis for the development of accurate and robust effective molecular force fields that are inherently compatible with coarse-grained experimental data. The new model optimization principles and their efficient implementation are illustrated through selected examples, whose outcome demonstrates the higher robustness and predictive accuracy of the approach compared to other currently used methods, such as force matching and relative entropy minimization. We also discuss relations between the newly developed principles and established thermodynamic concepts, which include the Gibbs-Bogoliubov inequality and the thermodynamic length.

  8. Baseline estimation in flame's spectra by using neural networks and robust statistics

    NASA Astrophysics Data System (ADS)

    Garces, Hugo; Arias, Luis; Rojas, Alejandro

    2014-09-01

    This work presents a baseline estimation method for flame spectra based on an artificial intelligence structure (a neural network), combining robust statistics with multivariate analysis to automatically discriminate the measured wavelengths belonging to the continuous feature for model adaptation, thereby removing the restriction of having to measure the target baseline for training. The main contributions of this paper are: to analyze a flame spectra database by computing Jolliffe statistics from Principal Components Analysis, detecting the wavelengths not correlated with most of the measured data and therefore corresponding to the baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate the baseline over the full wavelength range of the measured spectra; and to train a neural network that generalizes the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing diagnosis of the combustion process state for optimization in early stages.

  9. MEG connectivity analysis in patients with Alzheimer's disease using cross mutual information and spectral coherence.

    PubMed

    Alonso, Joan Francesc; Poza, Jesús; Mañanas, Miguel Angel; Romero, Sergio; Fernández, Alberto; Hornero, Roberto

    2011-01-01

    Alzheimer's disease (AD) is an irreversible brain disorder which represents the most common form of dementia in western countries. An early and accurate diagnosis of AD would enable the development of new strategies for managing the disease; however, there is currently no single test that can accurately predict the development of AD. Only a few studies have focused on magnetoencephalographic (MEG) AD connectivity patterns. This study compares brain connectivity in terms of linear and nonlinear couplings by means of spectral coherence and the cross mutual information function (CMIF), respectively. The variables defined from these functions provide statistically significant differences (p < 0.05) between AD patients and control subjects, especially the variables obtained from the CMIF. The results suggest that AD is characterized by both decreases and increases of functional couplings in different frequency bands as well as by an increase in regularity, that is, more evident deterministic statistical relationships in the MEG connectivity of AD patients. The significant differences obtained indicate that AD could disturb brain interactions, causing abnormal brain connectivity and operation. Furthermore, combining coherence and CMIF features in a diagnostic test based on logistic regression improved on the tests based on individual variables in terms of robustness.
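
    To make the two coupling measures concrete, the sketch below computes magnitude-squared coherence between two toy signals with SciPy, and a crude histogram-based mutual information at zero lag as a stand-in for the CMIF; the MEG data, frequency bands and lag structure of the study are not reproduced.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
fs, n = 250.0, 5000
common = rng.normal(size=n)                 # shared component between two channels
x = common + 0.5 * rng.normal(size=n)
y = common + 0.5 * rng.normal(size=n)

# Linear coupling: magnitude-squared spectral coherence
f, Cxy = signal.coherence(x, y, fs=fs, nperseg=256)
print("peak coherence:", round(Cxy.max(), 2))

# Nonlinear coupling stand-in: histogram-based mutual information (nats)
def mutual_info(a, b, bins=16):
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

print("mutual information estimate:", round(mutual_info(x, y), 3))
```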

  10. [Analysis on 2011 quality control results on aerobic plate count of microbiology laboratories in China].

    PubMed

    Han, Haihong; Li, Ning; Li, Yepeng; Fu, Ping; Yu, Dongmin; Li, Zhigang; Du, Chunming; Guo, Yunchang

    2015-01-01

    To test the aerobic plate count examining capability of microbiology laboratories, to ensure the accuracy and comparability of quantitative bacteria examination results, and to improve the quality of monitoring. Aerobic plate count piece samples at 4 different concentrations were prepared and labeled I, II, III and IV. After homogeneity and stability tests, the samples were delivered to the monitoring institutions. The results for samples I, II and III were logarithmically transformed and evaluated with the Z-score method using the robust average and standard deviation. The results for sample IV were evaluated as "satisfactory" when reported as < 10 CFU/piece or as "not satisfactory" otherwise. The Pearson χ2 test was used to analyze the ratio results. 309 monitoring institutions (99.04% of the total) reported their results. 271 institutions reported a satisfactory result, giving a satisfactory rate of 87.70%. There was no statistical difference in the satisfactory rates for samples I, II and III, which were 81.52%, 88.30% and 91.40%, respectively. The satisfactory rate for sample IV was 93.33%. There was no statistical difference in satisfactory rates between provincial and municipal CDCs. The quality control program provided scientific evidence that the aerobic plate count capability of the laboratories meets the requirements of the monitoring tasks.
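
    A hedged sketch of the scoring step described above, assuming a median/MAD robust centre and spread (the programme's exact robust algorithm is not given in this record): counts are log-transformed and each laboratory's result is converted to a Z-score, with |Z| ≤ 2 usually read as satisfactory.

```python
import numpy as np

# Reported aerobic plate counts (CFU/piece) from several laboratories; toy numbers
counts = np.array([9.5e4, 1.1e5, 1.0e5, 8.7e4, 3.0e5, 9.9e4, 1.05e5])
logx = np.log10(counts)

center = np.median(logx)
spread = 1.4826 * np.median(np.abs(logx - center))   # MAD scaled to approximate an SD

z = (logx - center) / spread
print(np.round(z, 2))
print("satisfactory:", np.abs(z) <= 2.0)
```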

  11. Robust, automatic GPS station velocities and velocity time series

    NASA Astrophysics Data System (ADS)

    Blewitt, G.; Kreemer, C.; Hammond, W. C.

    2014-12-01

    Automation in GPS coordinate time series analysis makes results more objective and reproducible, but not necessarily as robust as the human eye to detect problems. Moreover, it is not a realistic option to manually scan our current load of >20,000 time series per day. This motivates us to find an automatic way to estimate station velocities that is robust to outliers, discontinuities, seasonality, and noise characteristics (e.g., heteroscedasticity). Here we present a non-parametric method based on the Theil-Sen estimator, defined as the median of velocities v_ij = (x_j - x_i)/(t_j - t_i) computed between all pairs (i, j). Theil-Sen estimators produce statistically identical solutions to ordinary least squares for normally distributed data, but they can tolerate up to 29% of data being problematic. To mitigate seasonality, our proposed estimator only uses pairs approximately separated by an integer number of years, (N - δt) < (t_j - t_i) < (N + δt), where δt is chosen to be small enough to capture seasonality, yet large enough to reduce random error. We fix N=1 to maximally protect against discontinuities. In addition to estimating an overall velocity, we also use these pairs to estimate velocity time series. To test our methods, we process real data sets that have already been used with velocities published in the NA12 reference frame. Accuracy can be tested by the scatter of horizontal velocities in the North American plate interior, which is known to be stable to ~0.3 mm/yr. This presents new opportunities for time series interpretation. For example, the pattern of velocity variations at the interannual scale can help separate tectonic from hydrological processes. Without any step detection, velocity estimates prove to be robust for stations affected by the Mw7.2 2010 El Mayor-Cucapah earthquake, and velocity time series show a clear change after the earthquake, without any of the usual parametric constraints, such as relaxation of postseismic velocities to their preseismic values.
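
    The following sketch, using a synthetic daily coordinate series rather than real GPS data, shows the estimator described above: the station velocity is taken as the median of pairwise slopes restricted to pairs separated by roughly one year, which suppresses both outliers and the annual signal.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(0.0, 5.0, 1.0 / 365.25)                  # time in years (daily sampling)
x = 3.0 * t + 2.0 * np.sin(2 * np.pi * t) + rng.normal(0.0, 0.5, t.size)  # position (mm)
x[200:220] += 15.0                                     # a burst of outliers

N, dt = 1.0, 0.05                                      # keep pairs separated by ~1 year
i, j = np.triu_indices(t.size, k=1)
sep = t[j] - t[i]
keep = np.abs(sep - N) < dt
v = np.median((x[j[keep]] - x[i[keep]]) / sep[keep])
print("velocity estimate (mm/yr):", round(v, 2))       # close to 3 despite outliers/seasonality
```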

  12. Field significance of performance measures in the context of regional climate model evaluation. Part 1: temperature

    NASA Astrophysics Data System (ADS)

    Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker

    2018-04-01

    A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as "field" or "global" significance. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is exemplarily applied to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11° grid resolution. Monthly temperature climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. In winter and in most regions in summer, the downscaled distributions are statistically indistinguishable from the observed ones. A systematic cold summer bias occurs in deep river valleys due to overestimated elevations, in coastal areas probably due to enhanced sea breeze circulation, and over large lakes due to the interpolation of water temperatures. Urban areas in concave topography forms have a warm summer bias due to the strong heat islands, not reflected in the observations. WRF-NOAH generates appropriate fine-scale features in the monthly temperature field over regions of complex topography, but over spatially homogeneous areas even small biases can lead to significant deteriorations relative to the driving reanalysis. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the clear additional value of dynamical downscaling over global climate simulations. The evaluation methodology has a broad spectrum of applicability as it is distribution-free, robust to spatial dependence, and accounts for time series structure.
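
    As an illustration of the multiple-comparison step (not the paper's exact procedure), the sketch below runs a local test at every grid cell and then applies a Benjamini-Hochberg false discovery rate correction, one common way of controlling the proportion of falsely rejected local tests when assessing field significance. The fields are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
ncell, nyear = 500, 20
model = rng.normal(0.0, 1.0, (ncell, nyear))
obs = rng.normal(0.0, 1.0, (ncell, nyear))
obs[:50] += 1.0                               # a genuine bias in 10% of the cells

# Local two-sample test at each grid cell
p = stats.ttest_ind(model, obs, axis=1).pvalue

# Benjamini-Hochberg step-up procedure across all local tests
alpha = 0.05
order = np.argsort(p)
thresh = alpha * np.arange(1, ncell + 1) / ncell
passed = p[order] <= thresh
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
significant = np.zeros(ncell, dtype=bool)
significant[order[:k]] = True
print("locally significant cells after FDR control:", int(significant.sum()))
```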

  13. A New Polarimetric Classification Approach Evaluated for Agricultural Crops

    NASA Astrophysics Data System (ADS)

    Hoekman, D.

    2003-04-01

    Statistical properties of the polarimetric backscatter behaviour for a single homogeneous area are described by the Wishart distribution or its marginal distributions. These distributions do not necessarily well describe the statistics for a collection of homogeneous areas of the same class because of variation in, for example, biophysical parameters. Using Kolmogorov-Smirnov (K-S) tests of fit it is shown that, for example, the Beta distribution is a better descriptor for the coherence magnitude, and the log-normal distribution for the backscatter level. An evaluation is given for a number of agricultural crop classes, grasslands and fruit tree plantations at the Flevoland test site, using an AirSAR (C-, L- and P- band polarimetric) image of 3 July 1991. A new reversible transform of the covariance matrix into backscatter intensities will be introduced in order to describe the full polarimetric target properties in a mathematically alternative way, allowing for the development of simple, versatile and robust classifiers. Moreover, it allows for polarimetric image segmentation using conventional approaches. The effect of azimuthally asymmetric backscatter behaviour on the classification results is discussed. Several models are proposed and results are compared with results from literature for the same test site. It can be concluded that the introduced classifiers perform very well, with levels of accuracy for this test site of 90.4% for C-band, 88.7% for L- band and 96.3% for the combination of C- and L-band.

  14. Developing Uncertainty Models for Robust Flutter Analysis Using Ground Vibration Test Data

    NASA Technical Reports Server (NTRS)

    Potter, Starr; Lind, Rick; Kehoe, Michael W. (Technical Monitor)

    2001-01-01

    A ground vibration test can be used to obtain information about structural dynamics that is important for flutter analysis. Traditionally, this information, such as natural frequencies of modes, is used to update analytical models used to predict flutter speeds. The ground vibration test can also be used to obtain uncertainty models, such as natural frequencies and their associated variations, that can update analytical models for the purpose of predicting robust flutter speeds. Analyzing test data using the -norm, rather than the traditional 2-norm, is shown to lead to a minimum-size uncertainty description and, consequently, a least-conservative robust flutter speed. This approach is demonstrated using ground vibration test data for the Aerostructures Test Wing. Different norms are used to formulate uncertainty models and their associated robust flutter speeds to evaluate which norm is least conservative.

  15. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.

  16. A prospective earthquake forecast experiment in the western Pacific

    NASA Astrophysics Data System (ADS)

    Eberhard, David A. J.; Zechar, J. Douglas; Wiemer, Stefan

    2012-09-01

    Since the beginning of 2009, the Collaboratory for the Study of Earthquake Predictability (CSEP) has been conducting an earthquake forecast experiment in the western Pacific. This experiment is an extension of the Kagan-Jackson experiments begun 15 years earlier and is a prototype for future global earthquake predictability experiments. At the beginning of each year, seismicity models make a spatially gridded forecast of the number of Mw≥ 5.8 earthquakes expected in the next year. For the three participating statistical models, we analyse the first two years of this experiment. We use likelihood-based metrics to evaluate the consistency of the forecasts with the observed target earthquakes and we apply measures based on Student's t-test and the Wilcoxon signed-rank test to compare the forecasts. Overall, a simple smoothed seismicity model (TripleS) performs the best, but there are some exceptions that indicate continued experiments are vital to fully understand the stability of these models, the robustness of model selection and, more generally, earthquake predictability in this region. We also estimate uncertainties in our results that are caused by uncertainties in earthquake location and seismic moment. Our uncertainty estimates are relatively small and suggest that the evaluation metrics are relatively robust. Finally, we consider the implications of our results for a global earthquake forecast experiment.

  17. Restoration of MRI data for intensity non-uniformities using local high order intensity statistics

    PubMed Central

    Hadjidemetriou, Stathis; Studholme, Colin; Mueller, Susanne; Weiner, Michael; Schuff, Norbert

    2008-01-01

    MRI at high magnetic fields (>3.0 T) is complicated by strong inhomogeneous radio-frequency fields, sometimes termed the “bias field”. These lead to non-biological intensity non-uniformities across the image. They can complicate further image analysis such as registration and tissue segmentation. Existing methods for intensity uniformity restoration have been optimized for 1.5 T, but they are less effective for 3.0 T MRI, and not at all satisfactory for higher fields. Also, many of the existing restoration algorithms require a brain template or use a prior atlas, which can restrict their practicalities. In this study an effective intensity uniformity restoration algorithm has been developed based on non-parametric statistics of high order local intensity co-occurrences. These statistics are restored with a non-stationary Wiener filter. The algorithm also assumes a smooth non-uniformity and is stable. It does not require a prior atlas and is robust to variations in anatomy. In geriatric brain imaging it is robust to variations such as enlarged ventricles and low contrast to noise ratio. The co-occurrence statistics improve robustness to whole head images with pronounced non-uniformities present in high field acquisitions. Its significantly improved performance and lower time requirements have been demonstrated by comparing it to the very commonly used N3 algorithm on BrainWeb MR simulator images as well as on real 4 T human head images. PMID:18621568

  18. Using eye tracking to test for individual differences in attention to attractive faces

    PubMed Central

    Valuch, Christian; Pflüger, Lena S.; Wallner, Bernard; Laeng, Bruno; Ansorge, Ulrich

    2015-01-01

    We assessed individual differences in visual attention toward faces in relation to their attractiveness via saccadic reaction times. Motivated by the aim to understand individual differences in attention to faces, we tested three hypotheses: (a) Attractive faces hold or capture attention more effectively than less attractive faces; (b) men show a stronger bias toward attractive opposite-sex faces than women; and (c) blue-eyed men show a stronger bias toward blue-eyed than brown-eyed feminine faces. The latter test was included because prior research suggested a high effect size. Our data supported hypotheses (a) and (b) but not (c). By conducting separate tests for disengagement of attention and attention capture, we found that individual differences exist at distinct stages of attentional processing but these differences are of varying robustness and importance. In our conclusion, we also advocate the use of linear mixed effects models as the most appropriate statistical approach for studying inter-individual differences in visual attention with naturalistic stimuli. PMID:25698993

  19. Validation of powder X-ray diffraction following EN ISO/IEC 17025.

    PubMed

    Eckardt, Regina; Krupicka, Erik; Hofmeister, Wolfgang

    2012-05-01

    Powder X-ray diffraction (PXRD) is used widely in forensic science laboratories, with the main focus on qualitative phase identification. Little is found in the literature on the topic of validation of PXRD in the field of forensic sciences. According to EN ISO/IEC 17025, the method has to be tested for several parameters. Trueness, specificity, and selectivity of PXRD were tested using certified reference materials or a combination thereof. All three tested parameters showed the secure performance of the method. Sample preparation errors were simulated to evaluate the robustness of the method. These errors were either easily detected by the operator or nonsignificant for phase identification. In the case of the detection limit, a statistical evaluation of the signal-to-noise ratio showed that a peak criterion of three sigma is inadequate, and recommendations for a more realistic peak criterion are given. Finally, the results of an international proficiency test showed the secure performance of PXRD. © 2012 American Academy of Forensic Sciences.

  20. Using eye tracking to test for individual differences in attention to attractive faces.

    PubMed

    Valuch, Christian; Pflüger, Lena S; Wallner, Bernard; Laeng, Bruno; Ansorge, Ulrich

    2015-01-01

    We assessed individual differences in visual attention toward faces in relation to their attractiveness via saccadic reaction times. Motivated by the aim to understand individual differences in attention to faces, we tested three hypotheses: (a) Attractive faces hold or capture attention more effectively than less attractive faces; (b) men show a stronger bias toward attractive opposite-sex faces than women; and (c) blue-eyed men show a stronger bias toward blue-eyed than brown-eyed feminine faces. The latter test was included because prior research suggested a high effect size. Our data supported hypotheses (a) and (b) but not (c). By conducting separate tests for disengagement of attention and attention capture, we found that individual differences exist at distinct stages of attentional processing but these differences are of varying robustness and importance. In our conclusion, we also advocate the use of linear mixed effects models as the most appropriate statistical approach for studying inter-individual differences in visual attention with naturalistic stimuli.

  1. The Global Error Assessment (GEA) model for the selection of differentially expressed genes in microarray data.

    PubMed

    Mansourian, Robert; Mutch, David M; Antille, Nicolas; Aubert, Jerome; Fogel, Paul; Le Goff, Jean-Marc; Moulin, Julie; Petrov, Anton; Rytz, Andreas; Voegel, Johannes J; Roberts, Matthew-Alan

    2004-11-01

    Microarray technology has become a powerful research tool in many fields of study; however, the cost of microarrays often results in the use of a low number of replicates (k). Under circumstances where k is low, it becomes difficult to perform standard statistical tests to extract the most biologically significant experimental results. Other more advanced statistical tests have been developed; however, their use and interpretation often remain difficult to implement in routine biological research. The present work outlines a method that achieves sufficient statistical power for selecting differentially expressed genes under conditions of low k, while remaining as an intuitive and computationally efficient procedure. The present study describes a Global Error Assessment (GEA) methodology to select differentially expressed genes in microarray datasets, and was developed using an in vitro experiment that compared control and interferon-gamma treated skin cells. In this experiment, up to nine replicates were used to confidently estimate error, thereby enabling methods of different statistical power to be compared. Gene expression results of a similar absolute expression are binned, so as to enable a highly accurate local estimate of the mean squared error within conditions. The model then relates variability of gene expression in each bin to absolute expression levels and uses this in a test derived from the classical ANOVA. The GEA selection method is compared with both the classical and permutational ANOVA tests, and demonstrates an increased stability, robustness and confidence in gene selection. A subset of the selected genes were validated by real-time reverse transcription-polymerase chain reaction (RT-PCR). All these results suggest that GEA methodology is (i) suitable for selection of differentially expressed genes in microarray data, (ii) intuitive and computationally efficient and (iii) especially advantageous under conditions of low k. The GEA code for R software is freely available upon request to authors.
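
    A very rough sketch of the binning idea only (illustrative constants and a simple two-condition design; not the authors' implementation): genes with similar absolute expression are pooled to obtain a local error estimate, which is then used in place of each gene's own poorly estimated variance when k is small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
ngenes, k = 2000, 3
mu = rng.uniform(4.0, 12.0, ngenes)                   # log2 baseline expression
a = rng.normal(mu[:, None], 0.3, (ngenes, k))         # control replicates
b = rng.normal(mu[:, None], 0.3, (ngenes, k))         # treated replicates
b[:100] += 1.0                                        # truly changed genes

mean_all = np.concatenate([a, b], axis=1).mean(axis=1)
diff = b.mean(axis=1) - a.mean(axis=1)

# Pool within-condition variability over bins of genes with similar absolute expression
nbins = 20
edges = np.quantile(mean_all, np.linspace(0.0, 1.0, nbins + 1))
idx = np.clip(np.digitize(mean_all, edges) - 1, 0, nbins - 1)
var_local = np.empty(nbins)
for i in range(nbins):
    ra = a[idx == i] - a[idx == i].mean(axis=1, keepdims=True)
    rb = b[idx == i] - b[idx == i].mean(axis=1, keepdims=True)
    var_local[i] = np.concatenate([ra, rb]).var()

se = np.sqrt(var_local[idx] * 2.0 / k)                # SE of the difference in means
p = 2.0 * stats.norm.sf(np.abs(diff) / se)
print("genes called at p < 0.001:", int((p < 0.001).sum()))
```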

  2. STATISTICS AND INTELLIGENCE IN DEVELOPING COUNTRIES: A NOTE.

    PubMed

    Kodila-Tedika, Oasis; Asongu, Simplice A; Azia-Dimbu, Florentin

    2017-05-01

    The purpose of this study is to assess the relationship between intelligence (or human capital) and the statistical capacity of developing countries. The line of inquiry is motivated essentially by the scarce literature on poor statistics in developing countries and an evolving stream of literature on the knowledge economy. A positive association is established between intelligence quotient (IQ) and statistical capacity. The relationship is robust to alternative specifications with varying conditioning information sets and control for outliers. Policy implications are discussed.

  3. Robust spike classification based on frequency domain neural waveform features.

    PubMed

    Yang, Chenhui; Yuan, Yuan; Si, Jennie

    2013-12-01

    We introduce a new spike classification algorithm based on frequency domain features of the spike snippets. The goal for the algorithm is to provide high classification accuracy, low false misclassification, ease of implementation, robustness to signal degradation, and objectivity in classification outcomes. In this paper, we propose a spike classification algorithm based on frequency domain features (CFDF). It makes use of frequency domain contents of the recorded neural waveforms for spike classification. The self-organizing map (SOM) is used as a tool to determine the cluster number intuitively and directly by viewing the SOM output map. After that, spike classification can be easily performed using clustering algorithms such as the k-Means. In conjunction with our previously developed multiscale correlation of wavelet coefficient (MCWC) spike detection algorithm, we show that the MCWC and CFDF detection and classification system is robust when tested on several sets of artificial and real neural waveforms. The CFDF is comparable to or outperforms some popular automatic spike classification algorithms with artificial and real neural data. The detection and classification of neural action potentials or neural spikes is an important step in single-unit-based neuroscientific studies and applications. After the detection of neural snippets potentially containing neural spikes, a robust classification algorithm is applied for the analysis of the snippets to (1) extract similar waveforms into one class for them to be considered coming from one unit, and to (2) remove noise snippets if they do not contain any features of an action potential. Usually, a snippet is a small 2 or 3 ms segment of the recorded waveform, and differences in neural action potentials can be subtle from one unit to another. Therefore, a robust, high performance classification system like the CFDF is necessary. In addition, the proposed algorithm does not require any assumptions on statistical properties of the noise and proves to be robust under noise contamination.
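
    A simplified sketch of the classification stage on synthetic snippets: FFT magnitude features are extracted from each snippet and clustered with k-means (the paper additionally uses a self-organizing map to choose the number of clusters, which is not reproduced here).

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(6)
nsnip, nsamp = 300, 64
t = np.arange(nsamp)
unit1 = np.exp(-(t - 20) ** 2 / 20.0)                                         # template 1
unit2 = np.exp(-(t - 30) ** 2 / 60.0) - 0.5 * np.exp(-(t - 40) ** 2 / 30.0)   # template 2
true = rng.integers(0, 2, nsnip)
snips = np.where(true[:, None] == 0, unit1, unit2) + rng.normal(0.0, 0.1, (nsnip, nsamp))

# Frequency-domain features: low-frequency FFT magnitudes of each snippet
features = np.abs(np.fft.rfft(snips, axis=1))[:, 1:16]

centroids, labels = kmeans2(features, k=2, minit="++", seed=0)
agreement = max(np.mean(labels == true), np.mean(labels != true))   # up to label permutation
print("cluster/label agreement:", round(float(agreement), 3))
```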

  4. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis", where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
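
    A hedged, hand-rolled sketch of the flipping trick that underlies K-M treatment of left-censored data (the USGS S-language routines are not reproduced): values are subtracted from a constant larger than the maximum so that nondetects become right-censored, and an ordinary Kaplan-Meier estimator is then applied to the flipped values.

```python
import numpy as np

conc = np.array([0.5, 0.5, 1.0, 1.2, 2.0, 3.5, 5.0, 5.0, 8.0])  # reported values
nondetect = np.array([1, 1, 0, 0, 1, 0, 0, 1, 0], dtype=bool)   # True = "<" detection limit

flip = conc.max() + 1.0
y = flip - conc                          # left-censoring becomes right-censoring

order = np.argsort(y)
y, cens = y[order], nondetect[order]
at_risk = len(y) - np.arange(len(y))
surv = np.cumprod(np.where(~cens, (at_risk - 1) / at_risk, 1.0))

# Survival of the flipped data at y equals P(conc < flip - y)
for yi, si, ci in zip(y, surv, cens):
    if not ci:
        print(f"P(conc < {flip - yi:.2f}) ~ {si:.3f}")
```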

  5. Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

    PubMed

    Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

    2018-03-01

    Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it applicable to more general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method has a distribution of the test statistic under the null model similar to that of the original method by Pickrell et al., as well as comparable power under the causal model. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs is often larger than the number of independent putative causal SNPs; including more SNPs, on the other hand, does not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationships between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.

  6. Robust transceiver design for reciprocal M × N interference channel based on statistical linearization approximation

    NASA Astrophysics Data System (ADS)

    Mayvan, Ali D.; Aghaeinia, Hassan; Kazemi, Mohammad

    2017-12-01

    This paper focuses on robust transceiver design for throughput enhancement on the interference channel (IC), under imperfect channel state information (CSI). In this paper, two algorithms are proposed to improve the throughput of the multi-input multi-output (MIMO) IC. Each transmitter and receiver has, respectively, M and N antennas and IC operates in a time division duplex mode. In the first proposed algorithm, each transceiver adjusts its filter to maximize the expected value of signal-to-interference-plus-noise ratio (SINR). On the other hand, the second algorithm tries to minimize the variances of the SINRs to hedge against the variability due to CSI error. Taylor expansion is exploited to approximate the effect of CSI imperfection on mean and variance. The proposed robust algorithms utilize the reciprocity of wireless networks to optimize the estimated statistical properties in two different working modes. Monte Carlo simulations are employed to investigate sum rate performance of the proposed algorithms and the advantage of incorporating variation minimization into the transceiver design.

  7. Segmenting lung fields in serial chest radiographs using both population-based and patient-specific shape statistics.

    PubMed

    Shi, Y; Qi, F; Xue, Z; Chen, L; Ito, K; Matsuo, H; Shen, D

    2008-04-01

    This paper presents a new deformable model using both population-based and patient-specific shape statistics to segment lung fields from serial chest radiographs. There are two novelties in the proposed deformable model. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than the general intensity and gradient features, is used to characterize the image features in the vicinity of each pixel. Second, the deformable contour is constrained by both population-based and patient-specific shape statistics, and it yields more robust and accurate segmentation of lung fields for serial chest radiographs. In particular, for segmenting the initial time-point images, the population-based shape statistics is used to constrain the deformable contour; as more subsequent images of the same patient are acquired, the patient-specific shape statistics online collected from the previous segmentation results gradually takes more roles. Thus, this patient-specific shape statistics is updated each time when a new segmentation result is obtained, and it is further used to refine the segmentation results of all the available time-point images. Experimental results show that the proposed method is more robust and accurate than other active shape models in segmenting the lung fields from serial chest radiographs.

  8. Skeletal Correlates for Body Mass Estimation in Modern and Fossil Flying Birds

    PubMed Central

    Field, Daniel J.; Lynner, Colton; Brown, Christian; Darroch, Simon A. F.

    2013-01-01

    Scaling relationships between skeletal dimensions and body mass in extant birds are often used to estimate body mass in fossil crown-group birds, as well as in stem-group avialans. However, useful statistical measurements for constraining the precision and accuracy of fossil mass estimates are rarely provided, which prevents the quantification of robust upper and lower bound body mass estimates for fossils. Here, we generate thirteen body mass correlations and associated measures of statistical robustness using a sample of 863 extant flying birds. By providing robust body mass regressions with upper- and lower-bound prediction intervals for individual skeletal elements, we address the longstanding problem of body mass estimation for highly fragmentary fossil birds. We demonstrate that the most precise proxy for estimating body mass in the overall dataset, measured both as the coefficient of determination of ordinary least squares regression and as percent prediction error, is the maximum diameter of the coracoid's humeral articulation facet (the glenoid). We further demonstrate that this result is consistent among the majority of investigated avian orders (10 out of 18). As a result, we suggest that, in the majority of cases, this proxy may provide the most accurate estimates of body mass for volant fossil birds. Additionally, by presenting statistical measurements of body mass prediction error for thirteen different body mass regressions, this study provides a much-needed quantitative framework for the accurate estimation of body mass and associated ecological correlates in fossil birds. The application of these regressions will enhance the precision and robustness of many mass-based inferences in future paleornithological studies. PMID:24312392
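
    A small sketch of the type of scaling regression and prediction interval described above, with synthetic numbers in place of the 863-bird dataset and an arbitrary skeletal measurement standing in for the glenoid diameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 100
logd = rng.uniform(0.3, 1.5, n)                      # log10 skeletal dimension
logm = 0.8 + 2.3 * logd + rng.normal(0.0, 0.08, n)   # log10 body mass (synthetic)

X = np.column_stack([np.ones(n), logd])
beta = np.linalg.lstsq(X, logm, rcond=None)[0]
resid = logm - X @ beta
s2 = resid @ resid / (n - 2)                         # residual variance

x_new = np.array([1.0, 0.9])                         # a "fossil" measurement on the log scale
pred = x_new @ beta
se_pred = np.sqrt(s2 * (1.0 + x_new @ np.linalg.inv(X.T @ X) @ x_new))
tcrit = stats.t.ppf(0.975, n - 2)
lo, hi = pred - tcrit * se_pred, pred + tcrit * se_pred
print(f"point estimate {10**pred:.1f}, 95% prediction interval "
      f"[{10**lo:.1f}, {10**hi:.1f}] (mass units)")
```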

  9. Regional-scale analysis of extreme precipitation from short and fragmented records

    NASA Astrophysics Data System (ADS)

    Libertino, Andrea; Allamano, Paola; Laio, Francesco; Claps, Pierluigi

    2018-02-01

    The rain gauge is the oldest and most accurate instrument for rainfall measurement, able to provide long series of reliable data. However, rain gauge records are often plagued by gaps, spatio-temporal discontinuities and inhomogeneities that could affect their suitability for a statistical assessment of the characteristics of extreme rainfall. Furthermore, the need to discard the shorter series in order to obtain robust estimates leads to ignoring a significant amount of information which can be essential, especially when large return period estimates are sought. This work describes a robust statistical framework for dealing with uneven and fragmented rainfall records on a regional spatial domain. The proposed technique, named "patched kriging", allows one to exploit all the information available from the recorded series, independently of their length, to provide extreme rainfall estimates in ungauged areas. The methodology involves the sequential application of the ordinary kriging equations, producing a homogeneous dataset of synthetic series with uniform lengths. In this way, the errors inherent in any regional statistical estimation can be easily represented in the spatial domain and, possibly, corrected. Furthermore, the homogeneity of the obtained series provides robustness toward local artefacts during the parameter-estimation phase. The application to a case study in north-western Italy demonstrates the potential of the methodology and provides a significant basis for discussing its advantages over previous techniques.

  10. Improving Robustness of Hydrologic Ensemble Predictions Through Probabilistic Pre- and Post-Processing in Sequential Data Assimilation

    NASA Astrophysics Data System (ADS)

    Wang, S.; Ancell, B. C.; Huang, G. H.; Baetz, B. W.

    2018-03-01

    Data assimilation using the ensemble Kalman filter (EnKF) has been increasingly recognized as a promising tool for probabilistic hydrologic predictions. However, little effort has been made to conduct the pre- and post-processing of assimilation experiments, posing a significant challenge in achieving the best performance of hydrologic predictions. This paper presents a unified data assimilation framework for improving the robustness of hydrologic ensemble predictions. Statistical pre-processing of assimilation experiments is conducted through the factorial design and analysis to identify the best EnKF settings with maximized performance. After the data assimilation operation, statistical post-processing analysis is also performed through the factorial polynomial chaos expansion to efficiently address uncertainties in hydrologic predictions, as well as to explicitly reveal potential interactions among model parameters and their contributions to the predictive accuracy. In addition, the Gaussian anamorphosis is used to establish a seamless bridge between data assimilation and uncertainty quantification of hydrologic predictions. Both synthetic and real data assimilation experiments are carried out to demonstrate feasibility and applicability of the proposed methodology in the Guadalupe River basin, Texas. Results suggest that statistical pre- and post-processing of data assimilation experiments provide meaningful insights into the dynamic behavior of hydrologic systems and enhance robustness of hydrologic ensemble predictions.

  11. Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling

    PubMed Central

    Sinnott, Jennifer A.; Cai, Tianxi

    2013-01-01

    Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713

  12. Adaptive linear rank tests for eQTL studies

    PubMed Central

    Szymczak, Silke; Scheinhardt, Markus O.; Zeller, Tanja; Wild, Philipp S.; Blankenberg, Stefan; Ziegler, Andreas

    2013-01-01

    Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal–Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies. PMID:22933317

  13. Adaptive linear rank tests for eQTL studies.

    PubMed

    Szymczak, Silke; Scheinhardt, Markus O; Zeller, Tanja; Wild, Philipp S; Blankenberg, Stefan; Ziegler, Andreas

    2013-02-10

    Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal-Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies. Copyright © 2012 John Wiley & Sons, Ltd.

  14. Symbol recognition via statistical integration of pixel-level constraint histograms: a new descriptor.

    PubMed

    Yang, Su

    2005-02-01

    A new descriptor for symbol recognition is proposed. 1) A histogram is constructed for every pixel to figure out the distribution of the constraints among the other pixels. 2) All the histograms are statistically integrated to form a feature vector with fixed dimension. The robustness and invariance were experimentally confirmed.

  15. The Robustness of the Studentized Range Statistic to Violations of the Normality and Homogeneity of Variance Assumptions.

    ERIC Educational Resources Information Center

    Ramseyer, Gary C.; Tcheng, Tse-Kia

    The present study was directed at determining the extent to which the Type I Error rate is affected by violations in the basic assumptions of the q statistic. Monte Carlo methods were employed, and a variety of departures from the assumptions were examined. (Author)

  16. Statistical methods for change-point detection in surface temperature records

    NASA Astrophysics Data System (ADS)

    Pintar, A. L.; Possolo, A.; Zhang, N. F.

    2013-09-01

    We describe several statistical methods to detect possible change-points in a time series of values of surface temperature measured at a meteorological station, and to assess the statistical significance of such changes, taking into account the natural variability of the measured values, and the autocorrelations between them. These methods serve to determine whether the record may suffer from biases unrelated to the climate signal, hence whether there may be a need for adjustments as considered by M. J. Menne and C. N. Williams (2009) "Homogenization of Temperature Series via Pairwise Comparisons", Journal of Climate 22 (7), 1700-1717. We also review methods to characterize patterns of seasonality (seasonal decomposition using monthly medians or robust local regression), and explain the role they play in the imputation of missing values, and in enabling robust decompositions of the measured values into a seasonal component, a possible climate signal, and a station-specific remainder. The methods for change-point detection that we describe include statistical process control, wavelet multi-resolution analysis, adaptive weights smoothing, and a Bayesian procedure, all of which are applicable to single station records.
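
    As an illustration of one ingredient only: seasonal decomposition via monthly medians followed by a CUSUM-type scan calibrated by permutation. The paper's methods additionally account for autocorrelation, which is ignored in this sketch.

      import numpy as np

      rng = np.random.default_rng(2)

      def deseasonalize(temps, months):
          med = {m: np.median(temps[months == m]) for m in np.unique(months)}
          return temps - np.array([med[m] for m in months])

      def cusum_changepoint(x, n_perm=2000):
          def stat(v):
              c = np.cumsum(v - v.mean())
              return np.max(np.abs(c)), int(np.argmax(np.abs(c)))
          obs, k = stat(x)
          null = np.array([stat(rng.permutation(x))[0] for _ in range(n_perm)])
          p = (1 + np.sum(null >= obs)) / (n_perm + 1)
          return k, p

      # Example: 30 years of monthly data with a 0.8-degree shift after year 15.
      months = np.tile(np.arange(12), 30)
      temps = 10 + 8 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, 360)
      temps[180:] += 0.8
      resid = deseasonalize(temps, months)
      print(cusum_changepoint(resid))       # approximate shift index and p-value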

  17. A Robustness Testing Campaign for IMA-SP Partitioning Kernels

    NASA Astrophysics Data System (ADS)

    Grixti, Stephen; Lopez Trecastro, Jorge; Sammut, Nicholas; Zammit-Mangion, David

    2015-09-01

    With time and space partitioned architectures becoming increasingly appealing to the European space sector, the dependability of partitioning kernel technology is a key factor to its applicability in European Space Agency projects. This paper explores the potential of the data type fault model, which injects faults through the Application Program Interface, in partitioning kernel robustness testing. This fault injection methodology has been tailored to investigate its relevance in uncovering vulnerabilities within partitioning kernels and potentially contributing towards fault removal campaigns within this domain. This is demonstrated through a robustness testing case study of the XtratuM partitioning kernel for SPARC LEON3 processors. The robustness campaign exposed a number of vulnerabilities in XtratuM, exhibiting the potential benefits of using such a methodology for the robustness assessment of partitioning kernels.

  18. Phi Index: A New Metric to Test the Flush Early and Avoid the Rush Hypothesis

    PubMed Central

    Samia, Diogo S. M.; Blumstein, Daniel T.

    2014-01-01

    Optimal escape theory states that animals should counterbalance the costs and benefits of flight when escaping from a potential predator. However, in apparent contradiction with this well-established optimality model, birds and mammals generally initiate escape soon after beginning to monitor an approaching threat, a phenomenon codified as the “Flush Early and Avoid the Rush” (FEAR) hypothesis. Typically, the FEAR hypothesis is tested using correlational statistics and is supported when there is a strong relationship between the distance at which an individual first responds behaviorally to an approaching predator (alert distance, AD), and its flight initiation distance (the distance at which it flees the approaching predator, FID). However, such correlational statistics are both inadequate to analyze relationships constrained by an envelope (such as that in the AD-FID relationship) and sensitive to outliers with high leverage, which can lead one to erroneous conclusions. To overcome these statistical concerns we develop the phi index (Φ), a distribution-free metric to evaluate the goodness of fit of a 1∶1 relationship in a constraint envelope (the prediction of the FEAR hypothesis). Using both simulation and empirical data, we conclude that Φ is superior to traditional correlational analyses because it explicitly tests the FEAR prediction, is robust to outliers, and controls for the disproportionate influence of observations from large predictor values (caused by the constrained envelope in the AD-FID relationship). Importantly, by analyzing the empirical data we corroborate the strong effect that alertness has on flight as stated by the FEAR hypothesis. PMID:25405872

  19. Phi index: a new metric to test the flush early and avoid the rush hypothesis.

    PubMed

    Samia, Diogo S M; Blumstein, Daniel T

    2014-01-01

    Optimal escape theory states that animals should counterbalance the costs and benefits of flight when escaping from a potential predator. However, in apparent contradiction with this well-established optimality model, birds and mammals generally initiate escape soon after beginning to monitor an approaching threat, a phenomenon codified as the "Flush Early and Avoid the Rush" (FEAR) hypothesis. Typically, the FEAR hypothesis is tested using correlational statistics and is supported when there is a strong relationship between the distance at which an individual first responds behaviorally to an approaching predator (alert distance, AD), and its flight initiation distance (the distance at which it flees the approaching predator, FID). However, such correlational statistics are both inadequate to analyze relationships constrained by an envelope (such as that in the AD-FID relationship) and sensitive to outliers with high leverage, which can lead one to erroneous conclusions. To overcome these statistical concerns we develop the phi index (Φ), a distribution-free metric to evaluate the goodness of fit of a 1:1 relationship in a constraint envelope (the prediction of the FEAR hypothesis). Using both simulation and empirical data, we conclude that Φ is superior to traditional correlational analyses because it explicitly tests the FEAR prediction, is robust to outliers, and controls for the disproportionate influence of observations from large predictor values (caused by the constrained envelope in the AD-FID relationship). Importantly, by analyzing the empirical data we corroborate the strong effect that alertness has on flight as stated by the FEAR hypothesis.

  20. Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US: A Machine Learning Approach.

    PubMed

    Pan, Yue; Liu, Hongmei; Metsch, Lisa R; Feaster, Daniel J

    2017-02-01

    HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared to logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9%. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between count of condomless sex acts and the receipt of HIV testing. In conclusion, RF seems promising in discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.
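
    A minimal sketch of the comparison described above, using hypothetical simulated data rather than the CTN-0032 variables.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(3)
      n, p = 1281, 20
      X = rng.normal(size=(n, p))
      # Assume a non-linear signal in the first feature plus a linear effect in the second.
      y = (rng.random(n) < 1 / (1 + np.exp(-(1.5 * X[:, 1] - 0.8 * X[:, 0] ** 2 + 0.5)))).astype(int)

      rf = RandomForestClassifier(n_estimators=500, random_state=0)
      lr = LogisticRegression(max_iter=1000)
      print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean())
      print("LR accuracy:", cross_val_score(lr, X, y, cv=5).mean())

      # Variable importance from the fitted forest (mean decrease in impurity).
      rf.fit(X, y)
      order = np.argsort(rf.feature_importances_)[::-1][:5]
      print("Top predictors:", order, rf.feature_importances_[order].round(3))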

  1. Electrophysiological evidence of statistical learning of long-distance dependencies in 8-month-old preterm and full-term infants.

    PubMed

    Kabdebon, C; Pena, M; Buiatti, M; Dehaene-Lambertz, G

    2015-09-01

    Using electroencephalography, we examined 8-month-old infants' ability to discover a systematic dependency between the first and third syllables of successive words, concatenated into a monotonous speech stream, and to subsequently generalize this regularity to new items presented in isolation. Full-term and preterm infants, while exposed to the stream, displayed a significant entrainment (phase-locking) to the syllabic and word frequencies, demonstrating that they were sensitive to the word unit. The acquisition of the systematic dependency defining words was confirmed by the significantly different neural responses to rule-words and part-words subsequently presented during the test phase. Finally, we observed a correlation between syllabic entrainment during learning and the difference in phase coherence between the test conditions (rule-words vs part-words) suggesting that temporal processing of the syllable unit might be crucial in linguistic learning. No group difference was observed suggesting that non-adjacent statistical computations are already robust at 8 months, even in preterm infants, and thus develop during the first year of life, earlier than expected from behavioral studies. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. A brief introduction to computer-intensive methods, with a view towards applications in spatial statistics and stereology.

    PubMed

    Mattfeldt, Torsten

    2011-04-01

    Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods. © 2010 The Authors Journal of Microscopy © 2010 Royal Microscopical Society.
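
    Minimal sketches of two of the resampling ideas mentioned, a bootstrap percentile confidence interval and a two-sample randomization test, using simulated data.

      import numpy as np

      rng = np.random.default_rng(4)

      def bootstrap_ci(x, stat=np.median, n_boot=5000, alpha=0.05):
          reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                           for _ in range(n_boot)])
          return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

      def randomization_test(x, y, n_perm=5000):
          pooled, nx = np.concatenate([x, y]), len(x)
          obs = x.mean() - y.mean()
          null = np.array([(lambda z: z[:nx].mean() - z[nx:].mean())(rng.permutation(pooled))
                           for _ in range(n_perm)])
          return (1 + np.sum(np.abs(null) >= abs(obs))) / (n_perm + 1)

      x = rng.gamma(shape=2.0, size=30)
      y = rng.gamma(shape=2.6, size=30)
      print("95% bootstrap CI for median of x:", bootstrap_ci(x))
      print("randomization-test p-value:", randomization_test(x, y))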

  3. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  4. On the Interplay between the Evolvability and Network Robustness in an Evolutionary Biological Network: A Systems Biology Approach

    PubMed Central

    Chen, Bor-Sen; Lin, Ying-Po

    2011-01-01

    In the evolutionary process, the random transmission and mutation of genes provide biological diversities for natural selection. In order to preserve functional phenotypes between generations, gene networks need to evolve robustly under the influence of random perturbations. Therefore, the robustness of the phenotype, in the evolutionary process, exerts a selection force on gene networks to keep network functions. However, gene networks need to adjust, by variations in genetic content, to generate phenotypes for new challenges in the network’s evolution, ie, the evolvability. Hence, there should be some interplay between the evolvability and network robustness in evolutionary gene networks. In this study, the interplay between the evolvability and network robustness of a gene network and a biochemical network is discussed from a nonlinear stochastic system point of view. It was found that if the genetic robustness plus environmental robustness is less than the network robustness, the phenotype of the biological network is robust in evolution. The tradeoff between the genetic robustness and environmental robustness in evolution is discussed from the stochastic stability robustness and sensitivity of the nonlinear stochastic biological network, which may be relevant to the statistical tradeoff between bias and variance, the so-called bias/variance dilemma. Further, the tradeoff could be considered as an antagonistic pleiotropic action of a gene network and discussed from the systems biology perspective. PMID:22084563

  5. Side Effects of Being Blue: Influence of Sad Mood on Visual Statistical Learning

    PubMed Central

    Bertels, Julie; Demoulin, Catherine; Franco, Ana; Destrebecqz, Arnaud

    2013-01-01

    It is well established that mood influences many cognitive processes, such as learning and executive functions. Although statistical learning, like mood, is assumed to be part of our daily life, the influence of mood on statistical learning has never been investigated before. In the present study, a sad vs. neutral mood was induced in the participants through listening to stories while they were exposed to a stream of visual shapes made up of the repeated presentation of four triplets, namely sequences of three shapes presented in a fixed order. Given that the inter-stimulus interval was held constant within and between triplets, the only cues available for triplet segmentation were the transitional probabilities between shapes. Direct and indirect measures of learning taken either immediately or 20 minutes after the exposure/mood induction phase revealed that participants learned the statistical regularities between shapes. Interestingly, although participants from the sad and neutral groups performed similarly in these tasks, subjective measures (confidence judgments taken after each trial) revealed that participants who experienced the sad mood induction showed increased conscious access to their statistical knowledge. These effects were not modulated by the time delay between the exposure/mood induction and the test phases. These results are discussed within the scope of the robustness principle and the influence of negative affects on processing style. PMID:23555797

  6. Low-Level Stratus Prediction Using Binary Statistical Regression: A Progress Report Using Moffett Field Data.

    DTIC Science & Technology

    1983-12-01

    analysis; such work is not reported here. It seems possible that a robust principal component analysis may be informative (see Gnanadesikan, 1977). [The remainder of this scanned abstract is unreadable; only reference fragments survive: "Statistics in Atmospheric Sciences, American Meteorological Soc., Boston, Mass. (1979) pp. 46-48" and "Gnanadesikan, R., Methods for Statistical Data…".]

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date, no comparable online statistical processing suite for proteomics has existed. The P-Mart software allows statistical programmers to use these algorithms through packages in the R programming language, and it also offers a web-based interface built on the Azure cloud, which in turn allows the software to be released via Docker containers.

  8. Decreased external skeletal robustness in schoolchildren--a global trend? Ten year comparison of Russian and German data.

    PubMed

    Rietsch, Katrin; Godina, Elena; Scheffler, Christiane

    2013-01-01

    Obesity and reduced physical activity are global developments. Physical activity affects external skeletal robustness, which has decreased in German children. It was assumed that this negative trend of decreased external skeletal robustness can be found in other countries. Therefore, anthropometric data of Russian and German children from the years 2000 and 2010 were compared. Russian (2000/2010 n = 1023/268) and German (2000/2010 n = 2103/1750) children aged 6-10 years were investigated. Height, BMI and external skeletal robustness (Frame-Index) were examined and compared for the years and the countries. Statistical analysis was performed by the Mann-Whitney test. Comparison of 2010 and 2000: In Russian children BMI was significantly higher; boys were significantly taller and exhibited a decreased Frame-Index (p = .002) in 2010. German boys showed significantly higher BMI in 2010. In both sexes Frame-Index (p = .001) was reduced in 2010. Comparison of Russian and German children in 2000: BMI, height and Frame-Index differed between Russian and German children. German children were significantly taller but exhibited a lower Frame-Index (p<.001). German girls also showed a significantly higher BMI. Comparison of Russian and German children in 2010: BMI and Frame-Index differed. Russian children displayed a higher Frame-Index (p<.001) compared with Germans. In Russian children BMI has increased in recent years. Frame-Index is still higher in Russian children compared with Germans; however, in Russian boys Frame-Index has decreased. This trend and children's physical activity should be monitored in the future.

  9. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of deep sequencing data is ultimately dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed the algorithm FANSe2 with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms produce false positives and false negatives. FANSe2 showed remarkably better consistency with the microarray than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.

  10. Robust electroencephalogram phase estimation with applications in brain-computer interface systems.

    PubMed

    Seraj, Esmaeil; Sameni, Reza

    2017-03-01

    In this study, a robust method is developed for frequency-specific electroencephalogram (EEG) phase extraction using the analytic representation of the EEG. Based on recent theoretical findings in this area, it is shown that some of the phase variations-previously associated to the brain response-are systematic side-effects of the methods used for EEG phase calculation, especially during low analytical amplitude segments of the EEG. With this insight, the proposed method generates randomized ensembles of the EEG phase using minor perturbations in the zero-pole loci of narrow-band filters, followed by phase estimation using the signal's analytical form and ensemble averaging over the randomized ensembles to obtain a robust EEG phase and frequency. This Monte Carlo estimation method is shown to be very robust to noise and minor changes of the filter parameters and reduces the effect of fake EEG phase jumps, which do not have a cerebral origin. As proof of concept, the proposed method is used for extracting EEG phase features for a brain computer interface (BCI) application. The results show significant improvement in classification rates using rather simple phase-related features and a standard K-nearest neighbors and random forest classifiers, over a standard BCI dataset. The average performance was improved between 4-7% (in absence of additive noise) and 8-12% (in presence of additive noise). The significance of these improvements was statistically confirmed by a paired sample t-test, with 0.01 and 0.03 p-values, respectively. The proposed method for EEG phase calculation is very generic and may be applied to other EEG phase-based studies.
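
    A simplified sketch of the ensemble idea, perturbing the band edges of a Butterworth filter rather than its zero-pole loci directly; the filter type, order and perturbation size are assumptions.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      rng = np.random.default_rng(5)

      def robust_phase(eeg, fs, band=(8.0, 12.0), n_ensemble=50, jitter=0.15):
          phases = []
          for _ in range(n_ensemble):
              lo = band[0] + rng.normal(scale=jitter)      # perturb band edges
              hi = band[1] + rng.normal(scale=jitter)
              b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
              analytic = hilbert(filtfilt(b, a, eeg))      # analytic representation
              phases.append(np.angle(analytic))
          phases = np.array(phases)
          # Circular ensemble average of the instantaneous phase.
          return np.angle(np.exp(1j * phases).mean(axis=0))

      # Example: noisy 10 Hz oscillation sampled at 250 Hz.
      fs = 250
      t = np.arange(0, 4, 1 / fs)
      eeg = np.sin(2 * np.pi * 10 * t) + 0.8 * rng.normal(size=t.size)
      print(robust_phase(eeg, fs)[:5])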

  11. The Conceptualisation and Measurement of DSM-5 Internet Gaming Disorder: The Development of the IGD-20 Test

    PubMed Central

    Pontes, Halley M.; Király, Orsolya; Demetrovics, Zsolt; Griffiths, Mark D.

    2014-01-01

    Background Over the last decade, there has been growing concern about ‘gaming addiction’ and its widely documented detrimental impacts on a minority of individuals that play excessively. The latest (fifth) edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-5) included nine criteria for the potential diagnosis of Internet Gaming Disorder (IGD) and noted that it was a condition that warranted further empirical study. Aim: The main aim of this study was to develop a valid and reliable standardised psychometrically robust tool in addition to providing empirically supported cut-off points. Methods A sample of 1003 gamers (85.2% males; mean age 26 years) from 57 different countries was recruited via online gaming forums. Validity was assessed by confirmatory factor analysis (CFA), criterion-related validity, and concurrent validity. Latent profile analysis was also carried out to distinguish disordered gamers from non-disordered gamers. Sensitivity and specificity analyses were performed to determine an empirical cut-off for the test. Results The CFA confirmed the viability of the IGD-20 Test with a six-factor structure (salience, mood modification, tolerance, withdrawal, conflict and relapse) for the assessment of IGD according to the nine criteria from DSM-5. The IGD-20 Test proved to be valid and reliable. According to the latent profile analysis, 5.3% of the total participants were classed as disordered gamers. Additionally, an optimal empirical cut-off of 71 points (out of 100) appeared adequate according to the sensitivity and specificity analyses carried out. Conclusions The present findings support the viability of the IGD-20 Test as an adequate standardised psychometrically robust tool for assessing internet gaming disorder. Consequently, the new instrument represents the first step towards unification and consensus in the field of gaming studies. PMID:25313515

  12. The conceptualisation and measurement of DSM-5 Internet Gaming Disorder: the development of the IGD-20 Test.

    PubMed

    Pontes, Halley M; Király, Orsolya; Demetrovics, Zsolt; Griffiths, Mark D

    2014-01-01

    Over the last decade, there has been growing concern about 'gaming addiction' and its widely documented detrimental impacts on a minority of individuals that play excessively. The latest (fifth) edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-5) included nine criteria for the potential diagnosis of Internet Gaming Disorder (IGD) and noted that it was a condition that warranted further empirical study. The main aim of this study was to develop a valid and reliable standardised psychometrically robust tool in addition to providing empirically supported cut-off points. A sample of 1003 gamers (85.2% males; mean age 26 years) from 57 different countries was recruited via online gaming forums. Validity was assessed by confirmatory factor analysis (CFA), criterion-related validity, and concurrent validity. Latent profile analysis was also carried out to distinguish disordered gamers from non-disordered gamers. Sensitivity and specificity analyses were performed to determine an empirical cut-off for the test. The CFA confirmed the viability of the IGD-20 Test with a six-factor structure (salience, mood modification, tolerance, withdrawal, conflict and relapse) for the assessment of IGD according to the nine criteria from DSM-5. The IGD-20 Test proved to be valid and reliable. According to the latent profile analysis, 5.3% of the total participants were classed as disordered gamers. Additionally, an optimal empirical cut-off of 71 points (out of 100) appeared adequate according to the sensitivity and specificity analyses carried out. The present findings support the viability of the IGD-20 Test as an adequate standardised psychometrically robust tool for assessing internet gaming disorder. Consequently, the new instrument represents the first step towards unification and consensus in the field of gaming studies.

  13. Maternal Serologic Screening to Prevent Congenital Toxoplasmosis: A Decision-Analytic Economic Model

    PubMed Central

    Stillwaggon, Eileen; Carrier, Christopher S.; Sautter, Mari; McLeod, Rima

    2011-01-01

    Objective To determine a cost-minimizing option for congenital toxoplasmosis in the United States. Methodology/Principal Findings A decision-analytic and cost-minimization model was constructed to compare monthly maternal serological screening, prenatal treatment, and post-natal follow-up and treatment according to the current French (Paris) protocol, versus no systematic screening or perinatal treatment. Costs are based on published estimates of lifetime societal costs of developmental disabilities and current diagnostic and treatment costs. Probabilities are based on published results and clinical practice in the United States and France. One- and two-way sensitivity analyses are used to evaluate robustness of results. Universal monthly maternal screening for congenital toxoplasmosis with follow-up and treatment, following the French protocol, is found to be cost-saving, with savings of $620 per child screened. Results are robust to changes in test costs, value of statistical life, seroprevalence in women of childbearing age, fetal loss due to amniocentesis, and to bivariate analysis of test costs and incidence of primary T. gondii infection in pregnancy. Given the parameters in this model and a maternal screening test cost of $12, screening is cost-saving for rates of congenital infection above 1 per 10,000 live births. If universal testing generates economies of scale in diagnostic tools—lowering test costs to about $2 per test—universal screening is cost-saving at rates of congenital infection well below the lowest reported rates in the United States of 1 per 10,000 live births. Conclusion/Significance Universal screening according to the French protocol is cost saving for the US population within broad parameters for costs and probabilities. PMID:21980546

  14. Novel Kalman Filter Algorithm for Statistical Monitoring of Extensive Landscapes with Synoptic Sensor Data

    PubMed Central

    Czaplewski, Raymond L.

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas. PMID:26393588
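
    For orientation, a generic local-level (scalar) Kalman filter recursion; the paper's algorithm is a numerically robust multivariate extension that this sketch does not reproduce.

      import numpy as np

      def kalman_local_level(y, q=0.01, r=1.0, x0=0.0, p0=1.0):
          """Filter observations y with state x_t = x_{t-1} + w_t and y_t = x_t + v_t."""
          x, p, out = x0, p0, []
          for obs in y:
              p = p + q                    # predict
              k = p / (p + r)              # Kalman gain
              x = x + k * (obs - x)        # update
              p = (1 - k) * p
              out.append(x)
          return np.array(out)

      rng = np.random.default_rng(6)
      truth = np.cumsum(rng.normal(0, 0.1, 200))
      print(kalman_local_level(truth + rng.normal(0, 1.0, 200))[:5])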

  15. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in advance in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
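
    An illustrative sketch of the pipeline shape only (block-wise 3D-DCT, simple coefficient statistics, linear SVR); the block size, feature set and synthetic data are assumptions, not the paper's design.

      import numpy as np
      from scipy.fft import dctn
      from scipy.stats import kurtosis
      from sklearn.svm import SVR

      def video_3d_dct_features(video, block=8):
          t, h, w = video.shape
          feats = []
          for i in range(0, t - block + 1, block):
              for j in range(0, h - block + 1, block):
                  for k in range(0, w - block + 1, block):
                      c = dctn(video[i:i+block, j:j+block, k:k+block], norm="ortho").ravel()
                      ac = c[1:]                               # drop the DC term
                      feats.append([np.mean(np.abs(ac)), np.std(ac), kurtosis(ac)])
          f = np.array(feats)
          return np.concatenate([f.mean(axis=0), f.std(axis=0)])   # pooled per video

      rng = np.random.default_rng(7)
      videos = rng.normal(size=(20, 16, 32, 32))                   # 20 tiny synthetic clips
      X = np.stack([video_3d_dct_features(v) for v in videos])
      mos = rng.uniform(1, 5, size=20)                             # hypothetical quality scores
      print(SVR(kernel="linear").fit(X, mos).predict(X[:3]))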

  16. Estimating temporary emigration and breeding proportions using capture-recapture data with Pollock's robust design

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; Hines, J.E.

    1997-01-01

    Statistical inference for capture-recapture studies of open animal populations typically relies on the assumption that all emigration from the studied population is permanent. However, there are many instances in which this assumption is unlikely to be met. We define two general models for the process of temporary emigration, completely random and Markovian. We then consider effects of these two types of temporary emigration on Jolly-Seber (Seber 1982) estimators and on estimators arising from the full-likelihood approach of Kendall et al. (1995) to robust design data. Capture-recapture data arising from Pollock's (1982) robust design provide the basis for obtaining unbiased estimates of demographic parameters in the presence of temporary emigration and for estimating the probability of temporary emigration. We present a likelihood-based approach to dealing with temporary emigration that permits estimation under different models of temporary emigration and yields tests for completely random and Markovian emigration. In addition, we use the relationship between capture probability estimates based on closed and open models under completely random temporary emigration to derive three ad hoc estimators for the probability of temporary emigration, two of which should be especially useful in situations where capture probabilities are heterogeneous among individual animals. Ad hoc and full-likelihood estimators are illustrated for small mammal capture-recapture data sets. We believe that these models and estimators will be useful for testing hypotheses about the process of temporary emigration, for estimating demographic parameters in the presence of temporary emigration, and for estimating probabilities of temporary emigration. These latter estimates are frequently of ecological interest as indicators of animal movement and, in some sampling situations, as direct estimates of breeding probabilities and proportions.

  17. Investigating the cognitive precursors of emotional response to cancer stress: re-testing Lazarus's transactional model.

    PubMed

    Hulbert-Williams, N J; Morrison, V; Wilkinson, C; Neal, R D

    2013-02-01

    Lazarus's Transactional Model of stress and coping underwent significant theoretical development through the 1990s to better incorporate emotional reactions to stress with their appraisal components. Few studies have robustly explored the full model. This study aimed to do so within the context of a major life event: cancer diagnosis. A repeated measures design was used whereby data were collected using self-report questionnaire at baseline (soon after diagnosis), and 3- and 6-month follow-up. A total of 160 recently diagnosed cancer patients were recruited (mean time since diagnosis = 46 days). Their mean age was 64.2 years. Data on appraisals, core-relational themes, and emotions were collected. Data were analysed using both Spearman's correlation tests and multivariate regression modelling. Longitudinal analysis demonstrated weak correlation between change scores of theoretically associated components and some emotions correlated more strongly with cognitions contradicting theoretical expectations. Cross-sectional multivariate testing of the ability of cognitions to explain variance in emotion was largely theory inconsistent. Although data support the generic structure of the Transactional Model, they question the model specifics. Larger scale research is needed encompassing a wider range of emotions and using more complex statistical testing. WHAT IS ALREADY KNOWN ON THIS SUBJECT?: • Stress processes are transactional and coping outcome is informed by both cognitive appraisal of the stressor and the individual's emotional response (Lazarus & Folkman, 1984). • Lazarus (1999) made specific hypotheses about which particular stress appraisals would determine which emotional response, but only a small number of these relationships have been robustly investigated. • Previous empirical testing of this theory has been limited by design and statistical limitations. WHAT DOES THIS STUDY ADD?: • This study empirically investigates the cognitive precedents of a much larger range of emotional outcomes than has previously been attempted in the literature. • Support for the model at a general level is established: this study demonstrates that both primary and secondary appraisals, and core-relational themes are important variables in explaining variance in emotional outcome. • The specific hypotheses proposed by Lazarus (1999) are not, however, supported: using data-driven approaches we demonstrate that equally high levels of variance can be explained using entirely different cognitive appraisals than those hypothesized. © 2012 The British Psychological Society.

  18. Assessment of Low Cycle Fatigue Behavior of Powder Metallurgy Alloy U720

    NASA Technical Reports Server (NTRS)

    Gabb, Tomothy P.; Bonacuse, Peter J.; Ghosn, Louis J.; Sweeney, Joseph W.; Chatterjee, Amit; Green, Kenneth A.

    2000-01-01

    The fatigue lives of modern powder metallurgy disk alloys are influenced by variabilities in alloy microstructure and mechanical properties. These properties can vary as functions of variables in the different steps of materials/component processing: powder atomization, consolidation, extrusion, forging, heat treating, and machining. It is important to understand the relationship between the statistical variations in life and these variables, as well as the change in life distribution due to changes in fatigue loading conditions. The objective of this study was to investigate these relationships in a nickel-base disk superalloy, U720, produced using powder metallurgy processing. Multiple strain-controlled fatigue tests were performed at 538 C (1000 F) at limited sets of test conditions. Analyses were performed to: (1) assess variations of microstructure, mechanical properties, and LCF failure initiation sites as functions of disk processing and loading conditions; and (2) compare mean and minimum fatigue life predictions using different approaches for modeling the data from assorted test conditions. Significant variations in life were observed as functions of the disk processing variables evaluated. However, the lives of all specimens could still be combined and modeled together. The failure initiation sites for tests performed at a strain ratio R(sub epsilon) = epsilon(sub min)/epsilon(sub max) of 0 were different from those in tests at a strain ratio of -1. An approach could still be applied to account for the differences in mean and maximum stresses and strains. This allowed the data in tests of various conditions to be combined for more robust statistical estimates of mean and minimum lives.

  19. Robust Statistical Approaches for RSS-Based Floor Detection in Indoor Localization.

    PubMed

    Razavi, Alireza; Valkama, Mikko; Lohan, Elena Simona

    2016-05-31

    Floor detection for indoor 3D localization of mobile devices is currently an important challenge in the wireless world. Many approaches currently exist, but usually the robustness of such approaches is not addressed or investigated. The goal of this paper is to show how to robustify the floor estimation when probabilistic approaches with a low number of parameters are employed. Indeed, such an approach would allow a building-independent estimation and a lower computing power at the mobile side. Four robustified algorithms are to be presented: a robust weighted centroid localization method, a robust linear trilateration method, a robust nonlinear trilateration method, and a robust deconvolution method. The proposed approaches use the received signal strengths (RSS) measured by the Mobile Station (MS) from various heard WiFi access points (APs) and provide an estimate of the vertical position of the MS, which can be used for floor detection. We will show that robustification can indeed increase the performance of the RSS-based floor detection algorithms.
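
    A minimal sketch of one of the four ideas, a robust weighted centroid estimate of vertical position, assuming known AP coordinates; the weighting and trimming rule are illustrative, not the paper's exact scheme.

      import numpy as np

      def robust_weighted_centroid_z(ap_xyz, rss_dbm, trim=0.2):
          w = 10 ** (rss_dbm / 20.0)                 # stronger APs weigh more
          # Robustification: drop APs whose RSS is far from the median before averaging.
          dev = np.abs(rss_dbm - np.median(rss_dbm))
          keep = dev <= np.quantile(dev, 1 - trim)
          z = ap_xyz[keep, 2]
          return float(np.sum(w[keep] * z) / np.sum(w[keep]))

      # Example: APs on floors at 0 m, 3 m and 6 m; one outlying measurement.
      ap_xyz = np.array([[0, 0, 0.0], [5, 2, 3.0], [1, 4, 3.0], [3, 3, 6.0]])
      rss = np.array([-78.0, -55.0, -60.0, -85.0])
      z_hat = robust_weighted_centroid_z(ap_xyz, rss)
      print("estimated height:", z_hat, "-> nearest floor:", round(z_hat / 3.0))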

  20. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aiming to prevent delayed diagnosis and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to PD prediction include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at the 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supportive statistical inference.

  1. Development of a universal measure of quadrupedal forelimb-hindlimb coordination using digital motion capture and computerised analysis.

    PubMed

    Hamilton, Lindsay; Franklin, Robin J M; Jeffery, Nick D

    2007-09-18

    Clinical spinal cord injury in domestic dogs provides a model population in which to test the efficacy of putative therapeutic interventions for human spinal cord injury. To achieve this potential a robust method of functional analysis is required so that statistical comparison of numerical data derived from treated and control animals can be achieved. In this study we describe the use of digital motion capture equipment combined with mathematical analysis to derive a simple quantitative parameter - 'the mean diagonal coupling interval' - to describe coordination between forelimb and hindlimb movement. In normal dogs this parameter is independent of size, conformation, speed of walking or gait pattern. We show here that mean diagonal coupling interval is highly sensitive to alterations in forelimb-hindlimb coordination in dogs that have suffered spinal cord injury, and can be accurately quantified, but is unaffected by orthopaedic perturbations of gait. Mean diagonal coupling interval is an easily derived, highly robust measurement that provides an ideal method to compare the functional effect of therapeutic interventions after spinal cord injury in quadrupeds.

  2. Evaluation of Cross-Protocol Stability of a Fully Automated Brain Multi-Atlas Parcellation Tool.

    PubMed

    Liang, Zifei; He, Xiaohai; Ceritoglu, Can; Tang, Xiaoying; Li, Yue; Kutten, Kwame S; Oishi, Kenichi; Miller, Michael I; Mori, Susumu; Faria, Andreia V

    2015-01-01

    Brain parcellation tools based on multiple-atlas algorithms have recently emerged as a promising method with which to accurately define brain structures. When dealing with data from various sources, it is crucial that these tools are robust for many different imaging protocols. In this study, we tested the robustness of a multiple-atlas, likelihood fusion algorithm using Alzheimer's Disease Neuroimaging Initiative (ADNI) data with six different protocols, comprising three manufacturers and two magnetic field strengths. The entire brain was parceled into five different levels of granularity. In each level, which defines a set of brain structures, ranging from eight to 286 regions, we evaluated the variability of brain volumes related to the protocol, age, and diagnosis (healthy or Alzheimer's disease). Our results indicated that, with proper pre-processing steps, the impact of different protocols is minor compared to biological effects, such as age and pathology. A precise knowledge of the sources of data variation enables sufficient statistical power and ensures the reliability of an anatomical analysis when using this automated brain parcellation tool on datasets from various imaging protocols, such as clinical databases.

  3. Flexible Wing Base Micro Aerial Vehicles: Vision-Guided Flight Stability and Autonomy for Micro Air Vehicles

    NASA Technical Reports Server (NTRS)

    Ettinger, Scott M.; Nechyba, Michael C.; Ifju, Peter G.; Wazak, Martin

    2002-01-01

    Substantial progress has been made recently towards designing, building and test-flying remotely piloted Micro Air Vehicles (MAVs). We seek to complement this progress in overcoming the aerodynamic obstacles to flight at very small scales with a vision-based stability and autonomy system. The developed system is based on a robust horizon detection algorithm, which we discuss in greater detail in a companion paper. In this paper, we first motivate the use of computer vision for MAV autonomy, arguing that given current sensor technology, vision may be the only practical approach to the problem. We then briefly review our statistical vision-based horizon detection algorithm, which has been demonstrated at 30 Hz with over 99.9% correct horizon identification. Next we develop robust schemes for the detection of extreme MAV attitudes, where no horizon is visible, and for the detection of horizon estimation errors, due to external factors such as video transmission noise. Finally, we discuss our feedback controller for self-stabilized flight, and report results on vision-based autonomous flights of duration exceeding ten minutes.

  4. DNS of Supersonic Turbulent Flows in a DLR Scramjet Intake

    NASA Astrophysics Data System (ADS)

    Li, Xinliang; Yu, Changping

    2014-11-01

    Direct numerical simulation (DNS) of supersonic/hypersonic flow through the DLR scramjet intake GK01 is performed. The free-stream Mach numbers are 3, 5 and 7, and the angle of attack is zero degrees. The DNS cases are performed using an optimized MP scheme with adaptive dissipation (OMP-AD) developed by the authors, and blowing-and-suction perturbations near the leading edge are used to trigger transition. To stabilize the simulation, local non-linear filtering is used in the high-Mach-number case. The transition, separation, and shock-turbulent boundary layer interaction are studied using both flow visualization and statistical analysis. The OMP-AD method itself is also addressed in this paper. The OMP-AD scheme is developed by combining the MP method with an optimization technique; its coefficients are flexible, giving low dissipation in smooth regions and high robustness (at the cost of higher dissipation) in regions with large gradients. Numerical tests show that OMP-AD is more robust than the original MP schemes and that its numerical dissipation is very low.

  5. Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

    PubMed Central

    Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua

    2010-01-01

    Summary A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640
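
    For background, a textbook augmented inverse-probability-weighted (AIPW) sketch for a mean with a single missing-data step, illustrating the double-robustness property described above; it is not the paper's improved estimator for monotone coarsening.

      import numpy as np
      from sklearn.linear_model import LinearRegression, LogisticRegression

      rng = np.random.default_rng(8)
      n = 2000
      x = rng.normal(size=(n, 2))
      y_full = 1.0 + x @ np.array([1.0, -0.5]) + rng.normal(size=n)
      # Missingness depends on covariates (missing at random).
      p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))
      r = rng.random(n) < p_obs                      # r = 1 if y is observed

      # Model 1: missingness (propensity) model; Model 2: outcome regression.
      pi_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]
      m_hat = LinearRegression().fit(x[r], y_full[r]).predict(x)

      # AIPW estimator: consistent if either model is correctly specified.
      y_obs = np.where(r, y_full, 0.0)
      aipw = np.mean(r * y_obs / pi_hat - (r - pi_hat) / pi_hat * m_hat)
      print("AIPW estimate of E[Y]:", round(aipw, 3),
            " complete-case mean:", round(y_full[r].mean(), 3),
            " true mean:", round(y_full.mean(), 3))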

  6. Analysis of Light Emitting Diode Technology for Aerospace Suitability in Human Space Flight Applications

    NASA Astrophysics Data System (ADS)

    Treichel, Todd H.

    Commercial space designers are required to manage space flight designs in accordance with parts selections made from qualified parts listings approved by Department of Defense and NASA agencies for reliability and safety. The research problem, shared by government and the private aerospace industry, is that LEDs cannot replace existing fluorescent lighting in manned space flight vehicles until the technology meets DOD and NASA requirements for reliability and safety and its effects on astronaut cognition and health are understood. The purpose of this quantitative experimental study was to determine to what extent commercial LEDs can suitably meet NASA requirements for manufacturer reliability, color reliability, robustness to environmental test requirements, and degradation effects from operational power, while providing comfortable ambient light free of eyestrain to astronauts in lieu of current fluorescent lighting. A fractional factorial experiment tested white and blue LEDs under NASA-required space flight environmental stress testing and applied operating current. The second phase of the study used a randomized block design to test human-factor effects of LEDs and a qualified ISS fluorescent light on retinal fatigue and eye strain. Eighteen human subjects were recruited from university student members of the American Institute of Aeronautics and Astronautics. Findings for Phase 1 testing showed that commercial LEDs met all DOD and NASA requirements for manufacturer reliability, color reliability, robustness to environmental requirements, and degradation effects from operational power. Findings showed statistically significant effects for the LED color and operational power variables, but degraded light output did not fall below the industry-recognized threshold of 70%. Findings from Phase 2 human factors testing showed no statistically significant evidence that the NASA-approved ISS fluorescent lights or blue or white LEDs caused fatigue, eye strain, and/or headache when study participants performed detailed tasks of reading and assembling mechanical parts for an extended period of two uninterrupted hours. However, human subjects self-reported that blue LEDs provided the most white light and were the favored light source over the white LED and the ISS fluorescent as a sole artificial light source for space travel. According to NASA standards, findings from this study indicate that LEDs meet criteria for the NASA TRL 7 rating, as commercial LED manufacturers passed the rigorous testing standards of suitability for space flight environments and human-factor effects. Recommendations for future research include further testing for space flight using this study as a basis for replication, while reducing study limitations by 1) testing human subjects' exposure to LEDs in a simulated space capsule environment over several days, and 2) installing and testing LEDs in space modules being tested for human spaceflight.

  7. Understanding Evaluation of Learning Support in Mathematics and Statistics

    ERIC Educational Resources Information Center

    MacGillivray, Helen; Croft, Tony

    2011-01-01

    With rapid and continuing growth of learning support initiatives in mathematics and statistics found in many parts of the world, and with the likelihood that this trend will continue, there is a need to ensure that robust and coherent measures are in place to evaluate the effectiveness of these initiatives. The nature of learning support brings…

  8. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
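
    A minimal sketch of the descriptive Fisher statistics (mean direction, mean resultant length and the usual concentration estimate); Watson's F-test used in the paper is not reproduced here.

      import numpy as np

      def fisher_descriptives(vectors):
          v = np.asarray(vectors, dtype=float)
          v = v / np.linalg.norm(v, axis=1, keepdims=True)   # ensure unit length
          resultant = v.sum(axis=0)
          R = np.linalg.norm(resultant)                      # resultant length
          n = len(v)
          mean_dir = resultant / R
          kappa = (n - 1) / (n - R)                          # concentration estimate
          return mean_dir, R / n, kappa

      rng = np.random.default_rng(9)
      # Example: directions scattered around the z-axis.
      dirs = rng.normal(loc=[0, 0, 1], scale=0.15, size=(50, 3))
      mean_dir, rbar, kappa = fisher_descriptives(dirs)
      print(mean_dir.round(3), round(rbar, 3), round(kappa, 1))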

  9. Variable selection for marginal longitudinal generalized linear models.

    PubMed

    Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

    2005-06-01

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).

  10. Current Methodologies for the Analysis of Contingency Tables: Robustness with Respect to Small Expected Values.

    DTIC Science & Technology

    1982-06-09

    40 @ "o VAi %P t 00493f "f t0 IVI Ř "f %,?I 0N% &p &~9. - 4F.904 S d @ 4P099 4U d 909@ 00F 99@ 5 44%4#f m 999F.490F~pPd.5 WAPd 040Pdp.4Pd44090095...64 គ 67 22 21 63 គ 51 18 51 គ ’-, IS N -- -. - . . . . . . . . . - .. .. - .. - 71 264 MINIMUM N (N) TABLE: 2 x5 .10__ .05 .01 VECTOR K P G K P...the Use and Interpretation of Certain Test Cviteria for the Purpose of Statistical Inference, Part II", Biometrika, 20, 264 -299. Odoroff, C. L. (1970

  11. Completely automated modal analysis procedure based on the combination of different OMA methods

    NASA Astrophysics Data System (ADS)

    Ripamonti, Francesco; Bussini, Alberto; Resta, Ferruccio

    2018-03-01

    In this work a completely automated output-only Modal Analysis procedure is presented and all its benefits are listed. Based on the merging of different Operational Modal Analysis methods and a statistical approach, the identification process has been improved becoming more robust and giving as results only the real natural frequencies, damping ratios and mode shapes of the system. The effect of the temperature can be taken into account as well, leading to the creation of a better tool for automated Structural Health Monitoring. The algorithm has been developed and tested on a numerical model of a scaled three-story steel building present in the laboratories of Politecnico di Milano.

  12. An Automated Method for Navigation Assessment for Earth Survey Sensors Using Island Targets

    NASA Technical Reports Server (NTRS)

    Patt, F. S.; Woodward, R. H.; Gregg, W. W.

    1997-01-01

    An automated method has been developed for performing navigation assessment on satellite-based Earth sensor data. The method utilizes islands as targets which can be readily located in the sensor data and identified with reference locations. The essential elements are an algorithm for classifying the sensor data according to source, a reference catalogue of island locations, and a robust pattern-matching algorithm for island identification. The algorithms were developed and tested for the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), an ocean colour sensor. This method will allow navigation error statistics to be automatically generated for large numbers of points, supporting analysis over large spatial and temporal ranges.

  13. Automated navigation assessment for earth survey sensors using island targets

    NASA Technical Reports Server (NTRS)

    Patt, Frederick S.; Woodward, Robert H.; Gregg, Watson W.

    1997-01-01

    An automated method has been developed for performing navigation assessment on satellite-based Earth sensor data. The method utilizes islands as targets which can be readily located in the sensor data and identified with reference locations. The essential elements are an algorithm for classifying the sensor data according to source, a reference catalog of island locations, and a robust pattern-matching algorithm for island identification. The algorithms were developed and tested for the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), an ocean color sensor. This method will allow navigation error statistics to be automatically generated for large numbers of points, supporting analysis over large spatial and temporal ranges.

  14. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

    PubMed

    Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

    2012-01-01

    Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
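
    A hedged sketch of the core point above: analyzing individual-level net benefits from a CRT with and without cluster-robust standard errors. A single-equation OLS stands in for the SUR model of the paper, and the data, column names, and effect sizes are made up for illustration; the contrast in standard errors is what the example is meant to show.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative CRT-style data: 40 clusters (20 per arm), 25 patients each
rng = np.random.default_rng(1)
n_clusters, per_cluster = 40, 25
cluster = np.repeat(np.arange(n_clusters), per_cluster)
arm = (cluster < n_clusters // 2).astype(int)
cluster_effect = rng.normal(0, 2.0, n_clusters)[cluster]     # induces clustering
net_benefit = 1.5 * arm + cluster_effect + rng.normal(0, 5.0, cluster.size)
df = pd.DataFrame({"nb": net_benefit, "arm": arm, "cluster": cluster})

naive = smf.ols("nb ~ arm", data=df).fit()                   # ignores clustering
robust = smf.ols("nb ~ arm", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]})  # cluster-robust SE

print("naive SE:  ", naive.bse["arm"])
print("cluster SE:", robust.bse["arm"])                      # typically larger
```

    As the abstract notes, the naive standard error understates uncertainty when outcomes are clustered, which here shows up as a smaller "naive SE" than "cluster SE".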

  15. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture-recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture-recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  16. Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

    PubMed Central

    Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

    2012-01-01

    Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450

  17. Characterization of Strong Light-Matter Coupling in Semiconductor Quantum-Dot Microcavities via Photon-Statistics Spectroscopy

    NASA Astrophysics Data System (ADS)

    Schneebeli, L.; Kira, M.; Koch, S. W.

    2008-08-01

    It is shown that spectrally resolved photon-statistics measurements of the resonance fluorescence from realistic semiconductor quantum-dot systems allow for high contrast identification of the two-photon strong-coupling states. Using a microscopic theory, the second-rung resonance of the Jaynes-Cummings ladder is analyzed and optimum excitation conditions are determined. The computed photon-statistics spectrum displays gigantic, experimentally robust resonances at the energetic positions of the second-rung emission.

  18. Surveying Europe's Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA.

    PubMed

    Vörös, Judit; Márton, Orsolya; Schmidt, Benedikt R; Gál, Júlia Tünde; Jelić, Dušan

    2017-01-01

    In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence.

  19. Statistical distributions of extreme dry spell in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Zin, Wan Zawiah Wan; Jemain, Abdul Aziz

    2010-11-01

    Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell events are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of these parameters. The goodness-of-fit (GOF) between empirical data and theoretical distributions is then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.
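
    A small sketch of the L-moments step for the GEV case, assuming the standard probability-weighted-moment estimates and Hosking's closed-form approximation for the shape parameter; the synthetic annual-extreme series and the exact constants are illustrative of the general L-moments literature, not the station-specific fits reported in the paper.

```python
import numpy as np
from scipy.special import gamma

def sample_lmoments(x):
    """First three sample L-moments via probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2                      # l3 / l2 is the L-skewness t3

def gev_lmom_fit(x):
    """GEV parameters (location, scale, shape k) from sample L-moments,
    using Hosking's approximation for the shape parameter."""
    l1, l2, t3 = sample_lmoments(x)
    c = 2.0 / (3.0 + t3) - np.log(2) / np.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    alpha = l2 * k / ((1 - 2 ** (-k)) * gamma(1 + k))
    xi = l1 - alpha * (1 - gamma(1 + k)) / k
    return xi, alpha, k

# Synthetic annual-extreme dry spells (days) for one hypothetical station
rng = np.random.default_rng(2)
ae_series = rng.gumbel(loc=25, scale=8, size=30)    # Gumbel is GEV with k near 0
print(gev_lmom_fit(ae_series))                      # fitted k should be near zero
```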

  20. Application of Bingham statistics to a paleopole data set: Towards a better definition of APWP trends?

    NASA Astrophysics Data System (ADS)

    Cederquist, D. P.; Mac Niocaill, C.; Van der Voo, R.

    1997-01-01

    Bingham statistical analyses were applied to paleomagnetic data from 50 published studies from North America, of Carboniferous through Early Jurassic age, in an attempt to test whether the azimuths of the long axes of the Bingham ellipses lie tangent to the apparent polar wander path. The underlying assumption is that paleomagnetic directions will form a Fisherian (circular) distribution if no apparent polar wander has taken place during magnetization acquisition. However, the distribution should appear elongated (elliptical) if magnetization acquisition occurred over a significant amount of time involving apparent polar wander. The long axes in direction space yield corresponding azimuths in paleopole space, which can be compared to the North American APWP. We find that, generally, these azimuths are indeed sub-parallel to the APWP, validating the methods and the hypothesis. Plotting a pole as an azimuthal chord, representing the long axis of the ellipse, will provide additional robustness or definition to an APWP based upon temporally sparse paleomagnetic studies.

  1. Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling

    NASA Technical Reports Server (NTRS)

    Mog, Robert A.

    1997-01-01

    Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and improved statistical quantification of SAIL productivity interests.

  2. Validation of a two-dimensional liquid chromatography method for quality control testing of pharmaceutical materials.

    PubMed

    Yang, Samuel H; Wang, Jenny; Zhang, Kelly

    2017-04-07

    Despite the advantages of 2D-LC, there is currently little to no work in demonstrating the suitability of these 2D-LC methods for use in a quality control (QC) environment for good manufacturing practice (GMP) tests. This lack of information becomes more critical as the availability of commercial 2D-LC instrumentation has significantly increased, and more testing facilities begin to acquire these 2D-LC capabilities. It is increasingly important that the transferability of developed 2D-LC methods be assessed in terms of reproducibility, robustness and performance across different laboratories worldwide. The work presented here focuses on the evaluation of a heart-cutting 2D-LC method used for the analysis of a pharmaceutical material, where a key, co-eluting impurity in the first dimension (¹D) is resolved from the main peak and analyzed in the second dimension (²D). A design-of-experiments (DOE) approach was taken in the collection of the data, and the results were then modeled in order to evaluate method robustness using statistical modeling software. This quality by design (QbD) approach gives a deeper understanding of the impact of these 2D-LC critical method attributes (CMAs) and how they affect overall method performance. Although there are multiple parameters that may be critical from a method development point of view, a special focus of this work is devoted towards evaluation of unique 2D-LC critical method attributes from a method validation perspective that transcend conventional method development and validation. The 2D-LC method attributes are evaluated for their recovery, peak shape, and resolution of the two co-eluting compounds in question on the ²D. In the method, linearity, accuracy, precision, repeatability, and sensitivity are assessed along with day-to-day, analyst-to-analyst, and lab-to-lab (instrument-to-instrument) assessments. The results of this validation study demonstrate that the 2D-LC method is accurate, sensitive, and robust and is ultimately suitable for QC testing with good method transferability. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Robust Kalman filter design for predictive wind shear detection

    NASA Technical Reports Server (NTRS)

    Stratton, Alexander D.; Stengel, Robert F.

    1991-01-01

    Severe, low-altitude wind shear is a threat to aviation safety. Airborne sensors under development measure the radial component of wind along a line directly in front of an aircraft. In this paper, optimal estimation theory is used to define a detection algorithm to warn of hazardous wind shear from these sensors. To achieve robustness, a wind shear detection algorithm must distinguish threatening wind shear from less hazardous gustiness, despite variations in wind shear structure. This paper presents statistical analysis methods to refine wind shear detection algorithm robustness. Computational methods predict the ability to warn of severe wind shear and avoid false warning. Comparative capability of the detection algorithm as a function of its design parameters is determined, identifying designs that provide robust detection of severe wind shear.
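
    A minimal sketch of the kind of estimator involved, assuming a scalar random-walk Kalman filter that smooths noisy along-track radial-wind measurements before a shear (wind gradient) threshold check. The state model, noise levels, threshold idea, and synthetic microburst profile are illustrative assumptions, not the detection algorithm designed in the paper.

```python
import numpy as np

def kalman_wind_estimate(measurements, q=0.05, r=4.0):
    """Scalar Kalman filter: state = radial wind at the current range gate."""
    x_est, p_est = 0.0, 10.0          # initial state and variance
    estimates = []
    for z in measurements:
        # predict: random-walk model along the flight path
        p_pred = p_est + q
        # update with the new radial-wind measurement (variance r)
        k_gain = p_pred / (p_pred + r)
        x_est = x_est + k_gain * (z - x_est)
        p_est = (1 - k_gain) * p_pred
        estimates.append(x_est)
    return np.array(estimates)

# Illustrative profile: a microburst-like shear buried in measurement noise
rng = np.random.default_rng(3)
gates = np.linspace(0, 5, 100)                       # km ahead of the aircraft
true_wind = 10 * np.tanh(gates - 2.5)                # headwind-to-tailwind change
noisy = true_wind + rng.normal(0, 2.0, gates.size)

smoothed = kalman_wind_estimate(noisy)
shear = np.gradient(smoothed, gates)                 # roughly (m/s) per km
print("max estimated shear:", shear.max(), "-> warn if above a hazard threshold")
```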

  4. Topologically protected modes in non-equilibrium stochastic systems.

    PubMed

    Murugan, Arvind; Vaikuntanathan, Suriyanarayanan

    2017-01-10

    Non-equilibrium driving of biophysical processes is believed to enable their robust functioning despite the presence of thermal fluctuations and other sources of disorder. Such robust functions include sensory adaptation, enhanced enzymatic specificity and maintenance of coherent oscillations. Elucidating the relation between energy consumption and organization remains an important and open question in non-equilibrium statistical mechanics. Here we report that steady states of systems with non-equilibrium fluxes can support topologically protected boundary modes that resemble similar modes in electronic and mechanical systems. Akin to their electronic and mechanical counterparts, topologically protected boundary steady states in non-equilibrium systems are robust and are largely insensitive to local perturbations. We argue that our work provides a framework for how biophysical systems can use non-equilibrium driving to achieve robust function.

  5. Apparent Yield Strength of Hot-Pressed SiCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daloz, William L; Wereszczak, Andrew A; Jadaan, Osama M.

    2008-01-01

    Apparent yield strengths (YApp) of four hot-pressed silicon carbides (SiC-B, SiC-N, SiC-HPN, and SiC-SC-1RN) were estimated using diamond spherical or Hertzian indentation. The von Mises and Tresca criteria were considered. The developed test method was robust, simple and quick to execute, and thus enabled the acquisition of confident sampling statistics. The choice of indenter size, test method, and method of analysis are described. The compressive force necessary to initiate apparent yielding was identified postmortem using differential interference contrast (or Nomarski) imaging with an optical microscope. It was found that the YApp of SiC-HPN (14.0 GPa) was approximately 10% higher than the equivalently valued YApp of SiC-B, SiC-N, and SiC-SC-1RN. This discrimination in YApp shows that the use of this test method could be insightful because there were no differences among the average Knoop hardnesses of the four SiC grades.

  6. Robustness of survival estimates for radio-marked animals

    USGS Publications Warehouse

    Bunck, C.M.; Chen, C.-L.

    1992-01-01

    Telemetry techniques are often used to study the survival of birds and mammals, particularly when mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have been developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and an estimator based also on the assumption of constant hazard, and to evaluate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-location probability of less than one are described and evaluated. Generally, the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests was similar.
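
    For orientation, a small sketch of the plain (unmodified) Kaplan-Meier estimator on right-censored telemetry-style data; it assumes every marked animal with a working transmitter is re-located, which is exactly the assumption the study relaxes. The data and function name are illustrative, not the Monte Carlo setup of the paper.

```python
import numpy as np

def kaplan_meier(times, event):
    """Kaplan-Meier survival curve.

    times -- days until death or censoring for each radio-marked animal
    event -- 1 if a death was observed, 0 if censored (signal lost, study ended)
    """
    times = np.asarray(times, dtype=float)
    event = np.asarray(event, dtype=int)
    survival = 1.0
    curve = []
    for t in np.unique(times[event == 1]):
        at_risk = np.sum(times >= t)            # still monitored just before t
        deaths = np.sum((times == t) & (event == 1))
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Illustrative data: 10 animals, a mix of observed deaths and censoring
t = [12, 30, 30, 45, 60, 60, 75, 90, 90, 120]
d = [1,   1,  0,  1,  0,  1,  1,  0,  1,   0]
for day, s in kaplan_meier(t, d):
    print(f"day {day:>5.0f}: S(t) = {s:.3f}")
```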

  7. Set size manipulations reveal the boundary conditions of perceptual ensemble learning.

    PubMed

    Chetverikov, Andrey; Campana, Gianluca; Kristjánsson, Árni

    2017-11-01

    Recent evidence suggests that observers can grasp patterns of feature variations in the environment with surprising efficiency. During visual search tasks where all distractors are randomly drawn from a certain distribution rather than all being homogeneous, observers are capable of learning highly complex statistical properties of distractor sets. After only a few trials (learning phase), the statistical properties of distributions - mean, variance and crucially, shape - can be learned, and these representations affect search during a subsequent test phase (Chetverikov, Campana, & Kristjánsson, 2016). To assess the limits of such distribution learning, we varied the information available to observers about the underlying distractor distributions by manipulating set size during the learning phase in two experiments. We found that robust distribution learning only occurred for large set sizes. We also used set size to assess whether the learning of distribution properties makes search more efficient. The results reveal how a certain minimum of information is required for learning to occur, thereby delineating the boundary conditions of learning of statistical variation in the environment. However, the benefits of distribution learning for search efficiency remain unclear. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Active Nonlinear Feedback Control for Aerospace Systems. Processor

    DTIC Science & Technology

    1990-12-01

    relating to the role of nonlinearities in feedback control. These areas include Lyapunov function theory, chaotic controllers, statistical energy analysis, phase robustness, and optimal nonlinear control theory.

  9. Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks.

    PubMed

    Zhao, Huiying; Nyholt, Dale R; Yang, Yuanhao; Wang, Jihua; Yang, Yuedong

    2017-06-14

    Genome-wide association studies (GWAS) have successfully identified single variants associated with diseases. To increase the power of GWAS, gene-based and pathway-based tests are commonly employed to detect more risk factors. However, the gene- and pathway-based association tests may be biased towards genes or pathways containing a large number of single-nucleotide polymorphisms (SNPs) with small P-values caused by high linkage disequilibrium (LD) correlations. To address such bias, numerous pathway-based methods have been developed. Here we propose a novel method, DGAT-path, to divide all SNPs assigned to genes in each pathway into LD blocks, and to sum the chi-square statistics of LD blocks for assessing the significance of the pathway by permutation tests. The method proved robust, with a type I error rate more than 1.6 times lower than that of other methods. Meanwhile, the method displays a higher power and is not biased by the pathway size. The applications to the GWAS summary statistics for schizophrenia and breast cancer indicate that the detected top pathways contain more genes close to associated SNPs than other methods. As a result, the method identified 17 and 12 significant pathways containing 20 and 21 novel associated genes, respectively, for the two diseases. The method is available online at http://sparks-lab.org/server/DGAT-path.
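
    A hedged sketch of the general idea of summing per-LD-block statistics and assessing the sum by resampling. It assumes one representative chi-square value per block (e.g., the block maximum) and builds the permutation null by drawing equally many blocks from the genome-wide pool; both the block statistic and the resampling scheme are simplifying assumptions, not the exact DGAT-path algorithm.

```python
import numpy as np

def pathway_block_test(pathway_block_chisq, genome_block_chisq,
                       n_perm=10000, seed=0):
    """Sum of per-LD-block chi-square statistics for a pathway, with an
    empirical p-value from resampling equally many blocks genome-wide."""
    rng = np.random.default_rng(seed)
    pathway_block_chisq = np.asarray(pathway_block_chisq, dtype=float)
    genome_block_chisq = np.asarray(genome_block_chisq, dtype=float)
    k = len(pathway_block_chisq)
    observed = pathway_block_chisq.sum()
    null = np.array([
        rng.choice(genome_block_chisq, size=k, replace=False).sum()
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Illustrative use: block-level statistics (e.g., max SNP chi-square per block)
rng = np.random.default_rng(4)
genome_blocks = rng.chisquare(df=1, size=20000)
pathway_blocks = rng.chisquare(df=1, size=40) + 0.5   # a slightly enriched pathway
print(pathway_block_test(pathway_blocks, genome_blocks))
```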

  10. The predictive power of Japanese candlestick charting in Chinese stock market

    NASA Astrophysics Data System (ADS)

    Chen, Shi; Bao, Si; Zhou, Yu

    2016-09-01

    This paper studies the predictive power of 4 popular pairs of two-day bullish and bearish Japanese candlestick patterns in the Chinese stock market. Based on Morris' study, we give the quantitative details of the definition of a long candlestick, which is important in two-day candlestick pattern recognition but ignored by several previous studies, and we further give the quantitative definitions of these four pairs of two-day candlestick patterns. To test the predictive power of candlestick patterns on short-term price movement, we propose the definition of daily average return to alleviate the impact of correlation among stocks' overlap-time returns in statistical tests. To show the robustness of our result, two methods of trend definition are used for both the medium-market-value and large-market-value sample sets. We use the Step-SPA test to correct for data snooping bias. Statistical results show that the predictive power differs from pattern to pattern: three of the eight patterns provide both short-term and relatively long-term prediction, another pair provides significant forecasting power only within a very short-term period, while the remaining three patterns present contradictory results for different market value groups. For all four pairs, the predictive power drops as the predicting time increases, and forecasting power is stronger for stocks with medium market value than those with large market value.

  11. Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI).

    PubMed

    Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

    2016-01-01

    We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non-expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI's robustness and sensitivity in capturing useful data relating to the students' conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. © 2016 T. Deane et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  12. Theoretical Conversions of Different Hardness and Tensile Strength for Ductile Materials Based on Stress-Strain Curves

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Cai, Li-Xun

    2018-04-01

    Based on the power-law stress-strain relation and equivalent energy principle, theoretical equations for converting between Brinell hardness (HB), Rockwell hardness (HR), and Vickers hardness (HV) were established. Combining the pre-existing relation between the tensile strength (σb) and Hollomon parameters (K, N), theoretical conversions between hardness (HB/HR/HV) and tensile strength (σb) were obtained as well. In addition, to confirm the pre-existing σb-(K, N) relation, a large number of uniaxial tensile tests were conducted in various ductile materials. Finally, to verify the theoretical conversions, plenty of statistical data listed in ASTM and ISO standards were adopted to test the robustness of the converting equations with various hardness and tensile strength. The results show that both hardness conversions and hardness-strength conversions calculated from the theoretical equations accord well with the standard data.
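
    As a rough illustration of the σb-(K, N) link that these conversions build on, the following assumes the Hollomon law σ_true = K·ε^N together with the Considère necking criterion (necking at ε = N), which gives the commonly quoted engineering tensile strength σb ≈ K·N^N·e^(−N). The paper's own hardness-to-strength equations are not reproduced here, and the numeric values are purely illustrative.

```python
import math

def tensile_strength_from_hollomon(K, N):
    """Engineering tensile strength implied by the Hollomon law
    sigma_true = K * eps**N, assuming necking at eps = N (Considere)."""
    sigma_true_at_necking = K * N ** N
    return sigma_true_at_necking * math.exp(-N)   # convert true -> engineering stress

# Illustrative values for a ductile steel (K in MPa, N dimensionless)
print(tensile_strength_from_hollomon(K=900.0, N=0.15))   # roughly 580 MPa
```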

  13. Childhood maltreatment and educational outcomes: evidence from South Africa.

    PubMed

    Pieterse, Duncan

    2015-07-01

    Many South African children experience maltreatment, but we know little about the effects on long-term child development. Using the only representative dataset that includes a module on childhood maltreatment for a metropolitan city in South Africa, we explore the association between different measures of childhood maltreatment and two educational outcomes (numeracy test scores and dropout). Our study provides an estimate of the association between childhood maltreatment and educational outcomes in a developing country where maltreatment is high. We control for potential confounders using a range of statistical techniques and add several robustness checks to evaluate the strength of our findings. Our results indicate that children who are maltreated suffer large adverse consequences in terms of their numeracy test scores and probability of dropout and that the estimated effects of maltreatment are larger and more consistent for the most severe type of maltreatment. Copyright © 2014 John Wiley & Sons, Ltd.

  14. Biochemical Phenotypes to Discriminate Microbial Subpopulations and Improve Outbreak Detection

    PubMed Central

    Galar, Alicia; Kulldorff, Martin; Rudnick, Wallis; O'Brien, Thomas F.; Stelling, John

    2013-01-01

    Background Clinical microbiology laboratories worldwide constitute an invaluable resource for monitoring emerging threats and the spread of antimicrobial resistance. We studied the growing number of biochemical tests routinely performed on clinical isolates to explore their value as epidemiological markers. Methodology/Principal Findings Microbiology laboratory results from January 2009 through December 2011 from a 793-bed hospital stored in WHONET were examined. Variables included patient location, collection date, organism, and 47 biochemical and 17 antimicrobial susceptibility test results reported by Vitek 2. To identify biochemical tests that were particularly valuable (stable with repeat testing, but good variability across the species) or problematic (inconsistent results with repeat testing), three types of variance analyses were performed on isolates of K. pneumoniae: descriptive analysis of discordant biochemical results in same-day isolates, an average within-patient variance index, and generalized linear mixed model variance component analysis. Results: 4,200 isolates of K. pneumoniae were identified from 2,485 patients, 32% of whom had multiple isolates. The first two variance analyses highlighted SUCT, TyrA, GlyA, and GGT as “nuisance” biochemicals for which discordant within-patient test results impacted a high proportion of patient results, while dTAG had relatively good within-patient stability with good heterogeneity across the species. Variance component analyses confirmed the relative stability of dTAG, and identified additional biochemicals such as PHOS with a large between-patient to within-patient variance ratio. A reduced subset of biochemicals improved the robustness of strain definition for carbapenem-resistant K. pneumoniae. Surveillance analyses suggest that the reduced biochemical profile could improve the timeliness and specificity of outbreak detection algorithms. Conclusions The statistical approaches explored can improve the robust recognition of microbial subpopulations with routinely available biochemical test results, of value in the timely detection of outbreak clones and evolutionarily important genetic events. PMID:24391936

  15. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of the distribution functions is given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly, without fitting and fixing parameters, is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data gives a robust confirmation of our simple statistical model.

  16. Reward Capacity Predicts Leptin Dynamics During Laboratory-Controlled Eating in Women as a Function of Body Mass Index

    PubMed Central

    Holsen, Laura M.; Jackson, Benita

    2017-01-01

    Objective The role of leptin in mesolimbic signaling of non-food-related reward has been well established at the pre-clinical level, yet studies in humans are lacking. The present investigation explored the association between hedonic capacity and leptin dynamics, and whether this association differed by BMI class. Methods In this cross-sectional study of 75 women (42 with lean BMIs, 33 with obese BMIs), we measured serum leptin before/after meal consumption. Reward capacity was assessed using the Snaith-Hamilton Pleasure Scale (SHAPS). Multiple regression tested whether reward capacity was associated with leptin AUC, with an interaction term to test differences between lean (LN) and obese (OB) groups. Results The interaction of SHAPS by BMI group was robust (β=−.40, p=.005); among women with obesity, greater SHAPS score was associated with lower leptin AUC (β=−.35, p=.002, adjusted R-squared=.66). Among the lean group, the association was not statistically significant (β=−.16, p=.252, adjusted R-squared=.22). Findings held above and beyond the effects of BMI and age. Conclusions In this sample, a robust, negative association between reward capacity and circulating leptin was stronger in women with obesity compared to lean counterparts. These findings suggest that despite likely leptin resistance, inhibitory leptin functioning related to non-food reward may be spared in women with obesity. PMID:28722317

  17. Cultural adaptation and validation of the Health Literacy Questionnaire (HLQ): robust nine-dimension Danish language confirmatory factor model.

    PubMed

    Maindal, Helle Terkildsen; Kayser, Lars; Norgaard, Ole; Bo, Anne; Elsworth, Gerald R; Osborne, Richard H

    2016-01-01

    Health literacy is an important construct in population health and healthcare requiring rigorous measurement. The Health Literacy Questionnaire (HLQ), with nine scales, measures a broad perception of health literacy. This study aimed to adapt the HLQ to the Danish setting, and to examine the factor structure, homogeneity, reliability and discriminant validity. The HLQ was adapted using forward-backward translation, consensus conference and cognitive interviews (n = 15). Psychometric properties were examined based on data collected by face-to-face interview (n = 481). Tests included difficulty level, composite scale reliability and confirmatory factor analysis (CFA). Cognitive testing revealed that only minor re-wording was required. The easiest scale to respond to positively was 'Social support for health', and the hardest were 'Navigating the healthcare system' and 'Appraisal of health information'. CFA of the individual scales showed acceptably high loadings (range 0.49-0.93). CFA fit statistics after including correlated residuals were good for seven scales, acceptable for one. Composite reliability and Cronbach's α were >0.8 for all but one scale. A nine-factor CFA model was fitted to items with no cross-loadings or correlated residuals allowed. Given this restricted model, the fit was satisfactory. The HLQ appears robust for its intended application of assessing health literacy in a range of settings. Further work is required to demonstrate sensitivity to measure changes.

  18. A New Color Image Encryption Scheme Using CML and a Fractional-Order Chaotic System

    PubMed Central

    Wu, Xiangjun; Li, Yang; Kurths, Jürgen

    2015-01-01

    The chaos-based image cryptosystems have been widely investigated in recent years to provide real-time encryption and transmission. In this paper, a novel color image encryption algorithm by using coupled-map lattices (CML) and a fractional-order chaotic system is proposed to enhance the security and robustness of the encryption algorithms with a permutation-diffusion structure. To make the encryption procedure more confusing and complex, an image division-shuffling process is put forward, where the plain-image is first divided into four sub-images, and then the position of the pixels in the whole image is shuffled. In order to generate initial conditions and parameters of two chaotic systems, a 280-bit long external secret key is employed. The key space analysis, various statistical analysis, information entropy analysis, differential analysis and key sensitivity analysis are introduced to test the security of the new image encryption algorithm. The cryptosystem speed is analyzed and tested as well. Experimental results confirm that, in comparison to other image encryption schemes, the new algorithm has higher security and is fast for practical image encryption. Moreover, an extensive tolerance analysis of some common image processing operations such as noise adding, cropping, JPEG compression, rotation, brightening and darkening, has been performed on the proposed image encryption technique. Corresponding results reveal that the proposed image encryption method has good robustness against some image processing operations and geometric attacks. PMID:25826602

  19. Omnibus risk assessment via accelerated failure time kernel machine modeling.

    PubMed

    Sinnott, Jennifer A; Cai, Tianxi

    2013-12-01

    Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. © 2013, The International Biometric Society.

  20. A Tutorial for Analyzing Human Reaction Times: How to Filter Data, Manage Missing Values, and Choose a Statistical Model

    ERIC Educational Resources Information Center

    Lachaud, Christian Michel; Renaud, Olivier

    2011-01-01

    This tutorial for the statistical processing of reaction times collected through a repeated-measure design is addressed to researchers in psychology. It aims at making explicit some important methodological issues, at orienting researchers to the existing solutions, and at providing them some evaluation tools for choosing the most robust and…

  1. Statistical Techniques for Signal Processing

    DTIC Science & Technology

    1993-01-12

    functions and extended influence functions of the associated underlying estimators. An interesting application of the influence function and its...and related filter structures. While the influence function is best known for its role in characterizing the robustness of estimators, the mathematical...statistics can be designed and analyzed for performance using the influence function as a tool. In particular, we have examined the mean-median

  2. When Mommy Comes to the Rescue of Statistics: Infants Combine Top-Down and Bottom-Up Cues to Segment Speech

    ERIC Educational Resources Information Center

    Mersad, Karima; Nazzi, Thierry

    2012-01-01

    Transitional Probability (TP) computations are regarded as a powerful learning mechanism that is functional early in development and has been proposed as an initial bootstrapping device for speech segmentation. However, a recent study casts doubt on the robustness of early statistical word-learning. Johnson and Tyler (2010) showed that when…

  3. The power and robustness of maximum LOD score statistics.

    PubMed

    Yoo, Y J; Mendell, N R

    2008-07-01

    The maximum LOD score statistic is extremely powerful for gene mapping when calculated using the correct genetic parameter value. When the mode of genetic transmission is unknown, the maximum of the LOD scores obtained using several genetic parameter values is reported. This latter statistic requires a higher critical value than the maximum LOD score statistic calculated from a single genetic parameter value. In this paper, we compare the power of maximum LOD scores based on three fixed sets of genetic parameter values with the power of the LOD score obtained after maximizing over the entire range of genetic parameter values. We simulate family data under nine generating models. For generating models with non-zero phenocopy rates, LOD scores maximized over the entire range of genetic parameters yielded greater power than maximum LOD scores for fixed sets of parameter values with zero phenocopy rates. No maximum LOD score was consistently more powerful than the others for generating models with a zero phenocopy rate. The power loss of the LOD score maximized over the entire range of genetic parameters, relative to the maximum LOD score calculated using the correct genetic parameter value, appeared to be robust to the generating models.

  4. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
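
    A hedged sketch of such a screening battery on a single input-output scatterplot using scipy. The quantile binning scheme is an assumption, and Levene's test stands in for the variance/interquartile-range comparison named in item (4); everything else follows the five procedure types listed above.

```python
import numpy as np
from scipy import stats

def scatterplot_screens(x, y, n_bins=5):
    """Battery of pattern tests for an x-y scatterplot from a sensitivity analysis."""
    results = {}
    # (1) linear relationship
    results["pearson_r"] = stats.pearsonr(x, y)
    # (2) monotonic relationship
    results["spearman_rho"] = stats.spearmanr(x, y)
    # (3) trend in central tendency across quantile bins of x
    bins = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    labels = np.digitize(x, bins[1:-1])
    groups = [y[labels == b] for b in range(n_bins)]
    results["kruskal_wallis"] = stats.kruskal(*groups)
    # (4) trend in variability across the same bins (Levene as a stand-in)
    results["levene_variability"] = stats.levene(*groups)
    # (5) deviation from randomness on an x-y grid (chi-square on cell counts)
    ybins = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    table, _, _ = np.histogram2d(x, y, bins=[bins, ybins])
    chi2, p, _, _ = stats.chi2_contingency(table)
    results["chi2_grid"] = (chi2, p)
    return results

# Illustrative sampled input and a nonlinear, heteroscedastic output
rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 300)
y = np.sin(3 * x) + rng.normal(0, 0.1 + 0.3 * x, 300)
for name, res in scatterplot_screens(x, y).items():
    print(name, res)
```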

  5. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
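
    A small illustrative sketch of the polynomial-regression route discussed above: fit a quadratic reaction norm per genotype and summarize variation in intercepts, slopes, and curvatures. The simulated genotypes and coefficient values are assumptions, and a full analysis would use the mixed-model approach the authors recommend rather than this two-step summary.

```python
import numpy as np

rng = np.random.default_rng(6)
environments = np.linspace(-1, 1, 7)             # standardized environmental gradient
n_genotypes = 30

# Simulate genotype-specific quadratic reaction norms plus measurement noise
true_coefs = rng.normal([0.0, 0.5, -0.2], [0.3, 0.2, 0.1], size=(n_genotypes, 3))
coef_fits = []
for b0, b1, b2 in true_coefs:
    phenotype = (b0 + b1 * environments + b2 * environments ** 2
                 + rng.normal(0, 0.1, environments.size))
    # quadratic polynomial regression of phenotype on environment
    coef_fits.append(np.polynomial.polynomial.polyfit(environments, phenotype, 2))

coef_fits = np.array(coef_fits)                  # columns: intercept, slope, curvature
for i, label in enumerate(["intercept", "slope", "curvature"]):
    print(f"variance in {label}: {coef_fits[:, i].var(ddof=1):.3f}")
```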

  6. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

    PubMed

    van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-08-07

    Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
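
    A compressed sketch of the underlying idea using statsmodels: fit VAR models over a small lag search space, rank them by information criteria, and run a Granger causality test on the selected fit. The diary variables, lag range, and data are illustrative, and AutoVAR itself searches a much larger space of model variants than this.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Illustrative EMA-style diary data: daily mood and activity for 90 days
rng = np.random.default_rng(7)
n = 90
activity = np.zeros(n)
mood = np.zeros(n)
for t in range(1, n):
    activity[t] = 0.5 * activity[t - 1] + rng.normal(0, 1)
    mood[t] = 0.4 * mood[t - 1] + 0.3 * activity[t - 1] + rng.normal(0, 1)
data = pd.DataFrame({"mood": mood, "activity": activity})

# Search a small space of lag orders and keep the fit with the best AIC
fits = [(VAR(data).fit(p), p) for p in range(1, 5)]
best_fit, best_p = min(fits, key=lambda item: item[0].aic)
print("selected lag order:", best_p,
      "AIC:", round(best_fit.aic, 2), "BIC:", round(best_fit.bic, 2))

# Does activity Granger-cause mood in the selected model?
gc = best_fit.test_causality("mood", ["activity"], kind="f")
print("Granger F =", round(gc.test_statistic, 2), "p =", round(gc.pvalue, 4))
```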

  7. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

    PubMed Central

    Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-01-01

    Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160

  8. Statistical Model to Analyze Quantitative Proteomics Data Obtained by 18O/16O Labeling and Linear Ion Trap Mass Spectrometry

    PubMed Central

    Jorge, Inmaculada; Navarro, Pedro; Martínez-Acedo, Pablo; Núñez, Estefanía; Serrano, Horacio; Alfranca, Arántzazu; Redondo, Juan Miguel; Vázquez, Jesús

    2009-01-01

    Statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, particularly for data obtained by 16O/18O labeling. Besides, large-scale test experiments to validate the null hypothesis are lacking. Although the study of mechanisms underlying biological actions promoted by vascular endothelial growth factor (VEGF) on endothelial cells is of considerable interest, quantitative proteomics studies on this subject are scarce and have been performed after exposing cells to the factor for long periods of time. In this work we present the largest quantitative proteomics study to date on the short term effects of VEGF on human umbilical vein endothelial cells by 18O/16O labeling. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in a large scale test experiment performed on these cells, producing false expression changes. A random effects model was developed including four different sources of variance at the spectrum-fitting, scan, peptide, and protein levels. With the new model the number of outliers at scan and peptide levels was negligible in three large scale experiments, and only one false protein expression change was observed in the test experiment among more than 1000 proteins. The new model allowed the detection of significant protein expression changes upon VEGF stimulation for 4 and 8 h. The consistency of the changes observed at 4 h was confirmed by a replica at a smaller scale and further validated by Western blot analysis of some proteins. Most of the observed changes have not been described previously and are consistent with a pattern of protein expression that dynamically changes over time following the evolution of the angiogenic response. With this statistical model the 18O labeling approach emerges as a very promising and robust alternative to perform quantitative proteomics studies at a depth of several thousand proteins. PMID:19181660

  9. Robust Ultraviolet-Visible (UV-Vis) Partial Least-Squares (PLS) Models for Tannin Quantification in Red Wine.

    PubMed

    Aleixandre-Tudo, José Luis; Nieuwoudt, Helené; Aleixandre, José Luis; Du Toit, Wessel J

    2015-02-04

    The validation of ultraviolet-visible (UV-vis) spectroscopy combined with partial least-squares (PLS) regression to quantify red wine tannins is reported. The methylcellulose precipitable (MCP) tannin assay and the bovine serum albumin (BSA) tannin assay were used as reference methods. To take the high variability of wine tannins into account when the calibration models were built, a diverse data set was collected from samples of South African red wines that consisted of 18 different cultivars, from regions spanning the wine grape-growing areas of South Africa with their various sites, climates, and soils, ranging in vintage from 2000 to 2012. A total of 240 wine samples were analyzed, and these were divided into a calibration set (n = 120) and a validation set (n = 120) to evaluate the predictive ability of the models. To test the robustness of the PLS calibration models, the predictive ability of the classifying variables cultivar, vintage year, and experimental versus commercial wines was also tested. In general, the statistics obtained when BSA was used as a reference method were slightly better than those obtained with MCP. Despite this, the MCP tannin assay should also be considered as a valid reference method for developing PLS calibrations. The best calibration statistics for the prediction of new samples were coefficient of correlation (R²val) = 0.89, root mean square error of prediction (RMSEP) = 0.16, and residual predictive deviation (RPD) = 3.49 for MCP and R²val = 0.93, RMSEP = 0.08, and RPD = 4.07 for BSA, when only the UV region (260-310 nm) was selected, which also led to a faster analysis time. In addition, a difference in the results obtained when the predictive ability of the classifying variables vintage, cultivar, or commercial versus experimental wines was studied suggests that tannin composition is highly affected by many factors. This study also discusses the correlations in tannin values between the methylcellulose and protein precipitation methods.
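
    A minimal scikit-learn sketch of this kind of calibration: fit a PLS model on UV absorbance spectra against reference tannin values, then report R², RMSEP, and RPD on a held-out validation set. The synthetic spectra, the split, and the choice of five latent variables are assumptions for illustration only, not the paper's calibration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Synthetic "UV region" spectra (260-310 nm, 1 nm steps) and reference tannin
rng = np.random.default_rng(8)
n_samples, n_wavelengths = 240, 51
tannin = rng.uniform(0.5, 4.0, n_samples)                    # hypothetical g/L values
loadings = rng.normal(0, 1, n_wavelengths)
spectra = np.outer(tannin, loadings) + rng.normal(0, 0.3, (n_samples, n_wavelengths))

X_cal, X_val, y_cal, y_val = train_test_split(
    spectra, tannin, test_size=0.5, random_state=0)          # 120 / 120 split

pls = PLSRegression(n_components=5).fit(X_cal, y_cal)
pred = pls.predict(X_val).ravel()

rmsep = np.sqrt(mean_squared_error(y_val, pred))
rpd = y_val.std(ddof=1) / rmsep                              # residual predictive deviation
print(f"R2 = {r2_score(y_val, pred):.2f}, RMSEP = {rmsep:.2f}, RPD = {rpd:.1f}")
```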

  10. Bayesian Dose-Response Modeling in Sparse Data

    NASA Astrophysics Data System (ADS)

    Kim, Steven B.

    This book discusses Bayesian dose-response modeling in small samples applied to two different settings. The first setting is early phase clinical trials, and the second setting is toxicology studies in cancer risk assessment. In early phase clinical trials, experimental units are humans who are actual patients. Prior to a clinical trial, opinions from multiple subject area experts are generally more informative than the opinion of a single expert, but we may face a dilemma when they have disagreeing prior opinions. In this regard, we consider compromising the disagreement and compare two different approaches for making a decision. In addition to combining multiple opinions, we also address balancing two levels of ethics in early phase clinical trials. The first level is individual-level ethics which reflects the perspective of trial participants. The second level is population-level ethics which reflects the perspective of future patients. We extensively compare two existing statistical methods which focus on each perspective and propose a new method which balances the two conflicting perspectives. In toxicology studies, experimental units are living animals. Here we focus on a potential non-monotonic dose-response relationship which is known as hormesis. Briefly, hormesis is a phenomenon which can be characterized by a beneficial effect at low doses and a harmful effect at high doses. In cancer risk assessments, the estimation of a parameter, which is known as a benchmark dose, can be highly sensitive to a class of assumptions, monotonicity or hormesis. In this regard, we propose a robust approach which considers both monotonicity and hormesis as a possibility. In addition, we discuss statistical hypothesis testing for hormesis and consider various experimental designs for detecting hormesis based on Bayesian decision theory. Past experiments have not been optimally designed for testing for hormesis, and some Bayesian optimal designs may not be optimal under a wrong parametric assumption. In this regard, we consider a robust experimental design which does not require any parametric assumption.

  11. Mutual interference between statistical summary perception and statistical learning.

    PubMed

    Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B

    2011-09-01

    The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.

  12. Robust Feature Selection Technique using Rank Aggregation.

    PubMed

    Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep

    2014-01-01

    Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers in a dilemma as to which feature selection method and which classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a better solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets of different dimensions, including Arrhythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor, and a real-world data set, Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that compared to other techniques our algorithm improves the classification accuracy by approximately 3-4% (in data sets with fewer than 500 features) and by more than 5% (in data sets with more than 500 features), across a wide range of classifiers.
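
    A simple way to picture the rank-aggregation idea (not the authors' exact algorithm or their Robustness Index) is to score every feature with several selectors, convert each score list to ranks, and average the ranks into a consensus ordering, as in the hedged sketch below on synthetic data.

      import numpy as np
      from scipy.stats import rankdata
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import f_classif, mutual_info_classif

      X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                                 random_state=0)

      scores = [
          f_classif(X, y)[0],                          # ANOVA F-scores
          mutual_info_classif(X, y, random_state=0),   # mutual information
          RandomForestClassifier(n_estimators=200,
                                 random_state=0).fit(X, y).feature_importances_,
      ]

      # Higher score = better, so rank descending (rank 1 = best), then average.
      ranks = np.array([rankdata(-s) for s in scores])
      consensus = ranks.mean(axis=0)
      print("consensus top-10 features:", np.argsort(consensus)[:10])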

  13. Estimating biozone hydraulic conductivity in wastewater soil-infiltration systems using inverse numerical modeling.

    PubMed

    Bumgarner, Johnathan R; McCray, John E

    2007-06-01

    During operation of an onsite wastewater treatment system, a low-permeability biozone develops at the infiltrative surface (IS) during application of wastewater to soil. Inverse numerical-model simulations were used to estimate the biozone saturated hydraulic conductivity (K(biozone)) under variably saturated conditions for 29 wastewater infiltration test cells installed in a sandy loam field soil. Test cells employed two loading rates (4 and 8 cm/day) and three IS designs: open chamber, gravel, and synthetic bundles. The ratio of K(biozone) to the saturated hydraulic conductivity of the natural soil (K(s)) was used to quantify the reductions in the IS hydraulic conductivity. A smaller value of K(biozone)/K(s) reflects a greater reduction in hydraulic conductivity. The IS hydraulic conductivity was reduced by 1-3 orders of magnitude. The reduction in IS hydraulic conductivity was primarily influenced by wastewater loading rate and IS type and not by the K(s) of the native soil. The higher loading rate yielded greater reductions in IS hydraulic conductivity than the lower loading rate for bundle and gravel cells, but the difference was not statistically significant for chamber cells. Bundle and gravel cells exhibited a greater reduction in IS hydraulic conductivity than chamber cells at the higher loading rates, while the difference between gravel and bundle systems was not statistically significant. At the lower rate, bundle cells exhibited generally lower K(biozone)/K(s) values, but not at a statistically significant level, while gravel and chamber cells were statistically similar. Gravel cells exhibited the greatest variability in measured values, which may complicate design efforts based on K(biozone) evaluations for these systems. These results suggest that chamber systems may provide for a more robust design, particularly for high or variable wastewater infiltration rates.

  14. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and an increase in their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
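
    The robust-PCR idea can be illustrated roughly as follows: estimate principal directions from a robust covariance matrix and then regress the response on the robust scores with a robust regression. This sketch uses scikit-learn's MCD covariance and Huber regression on synthetic data; it is only an analogue of the RPCR/RSIMPLS procedures of Hubert and Vanden Branden, not their implementation.

      import numpy as np
      from sklearn.covariance import MinCovDet
      from sklearn.linear_model import HuberRegressor

      rng = np.random.default_rng(0)
      n, p, k = 200, 8, 3
      X = rng.normal(size=(n, p))
      y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=n)
      X[:10] += 8.0                      # a few outlying observations in X

      # Robust covariance (MCD) and its leading eigenvectors = robust components.
      mcd = MinCovDet(random_state=0).fit(X)
      eigval, eigvec = np.linalg.eigh(mcd.covariance_)
      components = eigvec[:, np.argsort(eigval)[::-1][:k]]

      # Robust scores, then robust (Huber) regression of y on the scores.
      scores = (X - mcd.location_) @ components
      model = HuberRegressor().fit(scores, y)
      print("R^2 of the robust fit on the scores:", round(model.score(scores, y), 3))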

  15. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was applied to a cadmium residue data set from global food contaminant monitoring; the mean residue was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly performs better than substitution methods, being both robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more work is needed on the details of applying this method.
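
    For a single detection limit, the ROS mechanics can be sketched as below: assign plotting positions to all observations, regress log-concentration of the detected values on the corresponding normal quantiles, and impute the nondetects from the fitted line. The data and detection limit are hypothetical, and multiple-limit cases (as handled in the SAS implementation referred to above) require a more elaborate plotting-position scheme.

      import numpy as np
      from scipy.stats import norm

      detected = np.array([0.8, 1.1, 1.5, 2.3, 3.0, 4.2, 6.5])   # hypothetical residues
      n_censored = 5                       # observations reported as "<0.5" (one limit)
      N = len(detected) + n_censored

      # Blom plotting positions; with one limit the nondetects take the lowest ranks.
      pp = (np.arange(1, N + 1) - 0.375) / (N + 0.25)
      q = norm.ppf(pp)                     # normal quantiles for every rank

      # Regress log(detected concentration) on the quantiles of the detected ranks.
      slope, intercept = np.polyfit(q[n_censored:], np.log(np.sort(detected)), 1)

      # Impute the censored observations from the fitted line and combine.
      imputed = np.exp(intercept + slope * q[:n_censored])
      print("ROS mean estimate:", round(np.concatenate([imputed, detected]).mean(), 3))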

  16. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability

    PubMed Central

    Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA) when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples as well as in related areas of forensic and biological sample analysis. PMID:22292496
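
    The comparison described above can be caricatured as follows: a nonlinear (RBF-kernel) SVM against a simple PLS-DA-style classifier (PLS regression on 0/1 class labels with a 0.5 threshold) on synthetic placeholder "spectra". Real LIBS spectra and the authors' exact chemometric pipelines, including SIMCA, are not reproduced here.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 60))                        # placeholder "spectra"
      y = (np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
           + 0.3 * rng.normal(size=300) > 0.8).astype(int)  # nonlinear class rule

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

      svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)).fit(X_tr, y_tr)

      plsda = PLSRegression(n_components=5).fit(X_tr, y_tr)
      plsda_pred = (plsda.predict(X_te).ravel() > 0.5).astype(int)

      print("SVM accuracy   :", round(svm.score(X_te, y_te), 3))
      print("PLS-DA accuracy:", round((plsda_pred == y_te).mean(), 3))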

  17. Using ventricular modeling to robustly probe significant deep gray matter pathologies: Application to cerebral palsy.

    PubMed

    Pagnozzi, Alex M; Shen, Kaikai; Doecke, James D; Boyd, Roslyn N; Bradley, Andrew P; Rose, Stephen; Dowson, Nicholas

    2016-11-01

    Understanding the relationships between the structure and function of the brain largely relies on the qualitative assessment of Magnetic Resonance Images (MRIs) by expert clinicians. Automated analysis systems can support these assessments by providing quantitative measures of brain injury. However, the assessment of deep gray matter structures, which are critical to motor and executive function, remains difficult as a result of large anatomical injuries commonly observed in children with Cerebral Palsy (CP). Hence, this article proposes a robust surrogate marker of the extent of deep gray matter injury based on impingement due to local ventricular enlargement on surrounding anatomy. Local enlargement was computed using a statistical shape model of the lateral ventricles constructed from 44 healthy subjects. Measures of injury on 95 age-matched CP patients were used to train a regression model to predict six clinical measures of function. The robustness of identifying ventricular enlargement was demonstrated by an area under the curve of 0.91 when tested against a dichotomised expert clinical assessment. The measures also showed strong and significant relationships for multiple clinical scores, including: motor function (r2 = 0.62, P < 0.005), executive function (r2 = 0.55, P < 0.005), and communication (r2 = 0.50, P < 0.005), especially compared to using volumes obtained from standard anatomical segmentation approaches. The lack of reliance on accurate anatomical segmentations and its resulting robustness to large anatomical variations is a key feature of the proposed automated approach. This, coupled with its strong correlation with clinically meaningful scores, signifies its potential utility for clinicians repeatedly assessing MRIs when diagnosing children with CP. Hum Brain Mapp 37:3795-3809, 2016. © 2016 Wiley Periodicals, Inc.

  18. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability.

    PubMed

    Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj

    2012-03-20

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA) when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.

  19. SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, A; Ahmad, M; Chen, Z

    2014-06-01

    Purpose: To introduce an outlier-recognition fitting routine for film dosimetry. It is not only flexible enough to work with any linear or non-linear regression but can also provide information on the minimal number of sampling points, critical sampling distributions, and the evaluation of analytical functions for absolute film-dose calibration. Methods: The technique, leave-one-out (LOO) cross validation, is often used for statistical analyses of model performance. We used LOO analyses with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points (“outliers”) relative to the other cohorts, and a bootstrap fitting process follows to seek any possibility of using perturbations for further improvement. After that, outliers were reconfirmed by a traditional t-test and eliminated, and another LOOP pass produced the final result. An over-sampled film-dose-calibration dataset was collected as a reference (dose range: 0-800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons over the various conditions were made, and the performance of the fitting functions, polynomial and rational, was evaluated. Results: (1) LOOP demonstrates sensitive outlier recognition through the statistical correlation to an exceptionally better goodness-of-fit when outliers are left out. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions where other “robust fits”, e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational-type function performs much better than the polynomial. Even with 5 data points including one outlier, LOOP with a rational function can restore more than 95% of the value back to its reference, while the polynomial fitting completely failed under the same conditions. Conclusion: LOOP can cooperate with any fitting routine functioning as a “robust fit”. In addition, it can serve as a benchmark for film-dose calibration fitting performance.
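
    The core leave-one-out screening step can be sketched as below: fit a rational calibration function while holding out each point in turn, flag points whose held-out residuals are extreme relative to the cohort, and refit without them. The optical-density and dose values are invented, and the bootstrap-perturbation and t-test confirmation stages of the full LOOP routine are omitted.

      import numpy as np
      from scipy.optimize import curve_fit

      def rational(od, a, b, c):
          # Rational calibration function: dose = (a + b*OD) / (1 + c*OD).
          return (a + b * od) / (1.0 + c * od)

      od = np.array([0.05, 0.12, 0.22, 0.33, 0.45, 0.55, 0.63, 0.70])
      dose = np.array([0.0, 50.0, 120.0, 210.0, 330.0, 470.0, 600.0, 790.0])   # cGy
      dose[3] += 120.0                     # inject one outlier

      # Leave-one-out residuals: refit without each point and predict it back.
      loo_resid = np.empty(len(od))
      for i in range(len(od)):
          keep = np.arange(len(od)) != i
          popt, _ = curve_fit(rational, od[keep], dose[keep],
                              p0=(0.0, 500.0, -1.0), maxfev=10000)
          loo_resid[i] = dose[i] - rational(od[i], *popt)

      # Flag points whose LOO residual is extreme relative to the cohort (MAD rule).
      med = np.median(loo_resid)
      mad = np.median(np.abs(loo_resid - med))
      outliers = np.abs(loo_resid - med) > 3.5 * 1.4826 * mad
      print("flagged outlier indices:", np.where(outliers)[0])

      popt, _ = curve_fit(rational, od[~outliers], dose[~outliers],
                          p0=(0.0, 500.0, -1.0), maxfev=10000)
      print("final fit parameters (a, b, c):", popt)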

  20. A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

    PubMed Central

    Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.

    2016-01-01

    We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
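
    The core of the estimator (rank-transform each variable, map the ranks through the inverse normal CDF, then apply the closed-form Gaussian mutual information) can be sketched as below. This is a bare-bones illustration in bits for two univariate variables; the authors' released toolboxes add bias correction, multivariate and conditional variants, and much more.

      import numpy as np
      from scipy.stats import norm, rankdata

      def copnorm(x):
          # Map a 1-D sample to standard-normal values via its empirical ranks.
          return norm.ppf(rankdata(x) / (len(x) + 1.0))

      def gaussian_copula_mi(x, y):
          # Gaussian-copula mutual information between two 1-D variables, in bits.
          r = np.corrcoef(copnorm(x), copnorm(y))[0, 1]
          return -0.5 * np.log2(1.0 - r ** 2)

      rng = np.random.default_rng(0)
      x = rng.normal(size=2000)
      y = np.tanh(x) + 0.5 * rng.normal(size=2000)    # nonlinear, monotonic dependence
      print("estimated MI (bits):", round(gaussian_copula_mi(x, y), 3))
      print("MI for independent data:", round(gaussian_copula_mi(x, rng.normal(size=2000)), 3))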

  1. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.

    PubMed

    Deng, Yangqing; Pan, Wei

    2017-12-01

    There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach. Copyright © 2017 by the Genetics Society of America.

  2. Statistical Properties of Lorenz-like Flows, Recent Developments and Perspectives

    NASA Astrophysics Data System (ADS)

    Araujo, Vitor; Galatolo, Stefano; Pacifico, Maria José

    We comment on the mathematical results about the statistical behavior of the Lorenz equations and their attractor, and more generally on the class of singular hyperbolic systems. The mathematical theory of such systems turned out to be surprisingly difficult. It is remarkable that a rigorous proof of the existence of the Lorenz attractor was presented only around the year 2000 with a computer-assisted proof together with an extension of the hyperbolic theory developed to encompass attractors robustly containing equilibria. We present some of the main results on the statistical behavior of such systems. We show that for attractors of three-dimensional flows, robust chaotic behavior is equivalent to the existence of certain hyperbolic structures, known as singular-hyperbolicity. These structures, in turn, are associated with the existence of physical measures: in low dimensions, robust chaotic behavior for flows ensures the existence of a physical measure. We then give more details on recent results on the dynamics of singular-hyperbolic (Lorenz-like) attractors: (1) there exists an invariant foliation whose leaves are forward contracted by the flow (and further properties which are useful to understand the statistical properties of the dynamics); (2) there exists a positive Lyapunov exponent at every orbit; (3) there is a unique physical measure whose support is the whole attractor and which is the equilibrium state with respect to the center-unstable Jacobian; (4) this measure is exact dimensional; (5) the induced measure on a suitable family of cross-sections has exponential decay of correlations for Lipschitz observables with respect to a suitable Poincaré return time map; (6) the hitting time associated with Lorenz-like attractors satisfies a logarithm law; (7) the geometric Lorenz flow satisfies the Almost Sure Invariance Principle (ASIP) and the Central Limit Theorem (CLT); (8) the rate of decay of large deviations for the volume measure on the ergodic basin of a geometric Lorenz attractor is exponential; (9) a class of geometric Lorenz flows exhibits robust exponential decay of correlations; (10) all geometric Lorenz flows are rapidly mixing and their time-1 map satisfies both ASIP and CLT.

  3. The Weighting Is The Hardest Part: On The Behavior of the Likelihood Ratio Test and the Score Test Under a Data-Driven Weighting Scheme in Sequenced Samples

    PubMed Central

    Minică, Camelia C.; Genovese, Giulio; Hultman, Christina M.; Pool, René; Vink, Jacqueline M.; Neale, Michael C.; Dolan, Conor V.; Neale, Benjamin M.

    2017-01-01

    Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose the use of a set of theoretically defensible weighting schemes, of which, we assume, the one that gives the largest test statistic is likely to capture best the allele frequency-functional effect relationship. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds in sequence data association analyses. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. We found that the two tests have equal power if the set of weights resembled the correct ones. However, if the weights are badly specified, the LRT shows superior power (due to its robustness to misspecification). With this data-driven weighting procedure the LRT detected significant signal in genes located in regions already confirmed as associated with schizophrenia – the PRRC2A (P=1.020E-06) and the VARS2 (P=2.383E-06) – in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, in some circumstances the score test is the most powerful. However, LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches. PMID:28238293

  4. Robust Strategy for Rocket Engine Health Monitoring

    NASA Technical Reports Server (NTRS)

    Santi, L. Michael

    2001-01-01

    Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule based, and model based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule based methods infer the health state of the engine system based on comparison of individual measurements or combinations of measurements with defined health norms or rules. This does not mean that rule based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.

  5. Can Targeted Intervention Mitigate Early Emotional and Behavioral Problems?: Generating Robust Evidence within Randomized Controlled Trials

    PubMed Central

    Doyle, Orla; McGlanaghy, Edel; O’Farrelly, Christine; Tremblay, Richard E.

    2016-01-01

    This study examined the impact of a targeted Irish early intervention program on children’s emotional and behavioral development using multiple methods to test the robustness of the results. Data on 164 Preparing for Life participants, who were randomly assigned to an intervention group involving home visits from pregnancy onwards or to a control group, were used to test the impact of the intervention on Child Behavior Checklist scores at 24 months. Using inverse probability weighting to account for differential attrition, permutation testing to address small sample size, and quantile regression to characterize the distributional impact of the intervention, we found that the few treatment effects were largely concentrated among boys most at risk of developing emotional and behavioral problems. The average treatment effect identified a 13% reduction in the likelihood of falling into the borderline clinical threshold for Total Problems. The interaction and subgroup analysis found that this main effect was driven by boys. The distributional analysis identified a 10-point reduction in the Externalizing Problems score for boys at the 90th percentile. No effects were observed for girls or for the continuous measures of Total, Internalizing, and Externalizing problems. These findings suggest that the impact of this prenatally commencing home visiting program may be limited to boys experiencing the most difficulties. Further adoption of the statistical methods applied here may help to improve the internal validity of randomized controlled trials and contribute to the field of evaluation science more generally. Trial Registration: ISRCTN Registry ISRCTN04631728 PMID:27253184
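
    A permutation test of a mean difference, of the kind used above to guard against small-sample distributional assumptions, can be sketched as follows on simulated placeholder scores (not the Preparing for Life data).

      import numpy as np

      rng = np.random.default_rng(0)
      treated = rng.normal(loc=48.0, scale=10.0, size=80)     # placeholder problem scores
      control = rng.normal(loc=52.0, scale=10.0, size=84)

      observed = treated.mean() - control.mean()
      pooled = np.concatenate([treated, control])
      n_t = len(treated)

      n_perm = 10000
      null = np.empty(n_perm)
      for i in range(n_perm):
          perm = rng.permutation(pooled)                      # re-assign group labels
          null[i] = perm[:n_t].mean() - perm[n_t:].mean()

      # Two-sided p-value: share of label permutations at least as extreme as observed.
      p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
      print(f"observed difference = {observed:.2f}, permutation p = {p:.4f}")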

  6. A probabilistic approach to aircraft design emphasizing stability and control uncertainties

    NASA Astrophysics Data System (ADS)

    Delaurentis, Daniel Andrew

    In order to address identified deficiencies in current approaches to aerospace systems design, a new method has been developed. This new method for design is based on the premise that design is a decision making activity, and that deterministic analysis and synthesis can lead to poor, or misguided decision making. This is due to a lack of disciplinary knowledge of sufficient fidelity about the product, to the presence of uncertainty at multiple levels of the aircraft design hierarchy, and to a failure to focus on overall affordability metrics as measures of goodness. Design solutions are desired which are robust to uncertainty and are based on the maximum knowledge possible. The new method represents advances in the two following general areas. 1. Design models and uncertainty. The research performed completes a transition from a deterministic design representation to a probabilistic one through a modeling of design uncertainty at multiple levels of the aircraft design hierarchy, including: (1) Consistent, traceable uncertainty classification and representation; (2) Concise mathematical statement of the Probabilistic Robust Design problem; (3) Variants of the Cumulative Distribution Functions (CDFs) as decision functions for Robust Design; (4) Probabilistic Sensitivities which identify the most influential sources of variability. 2. Multidisciplinary analysis and design. Imbedded in the probabilistic methodology is a new approach for multidisciplinary design analysis and optimization (MDA/O), employing disciplinary analysis approximations formed through statistical experimentation and regression. These approximation models are a function of design variables common to the system level as well as other disciplines. For aircraft, it is proposed that synthesis/sizing is the proper avenue for integrating multiple disciplines. Research hypotheses are translated into a structured method, which is subsequently tested for validity. Specifically, the implementation involves the study of the relaxed static stability technology for a supersonic commercial transport aircraft. The probabilistic robust design method is exercised resulting in a series of robust design solutions based on different interpretations of "robustness". Insightful results are obtained and the ability of the method to expose trends in the design space are noted as a key advantage.

  7. Using a Standardized Clinical Quantitative Sensory Testing Battery to Judge the Clinical Relevance of Sensory Differences Between Adjacent Body Areas.

    PubMed

    Dimova, Violeta; Oertel, Bruno G; Lötsch, Jörn

    2017-01-01

    Skin sensitivity to sensory stimuli varies among different body areas. A standardized clinical quantitative sensory testing (QST) battery, established for the diagnosis of neuropathic pain, was used to assess whether the magnitude of differences between test sites reaches clinical significance. Ten different sensory QST measures derived from thermal and mechanical stimuli were obtained from 21 healthy volunteers (10 men) and used to create somatosensory profiles bilateral from the dorsum of the hands (the standard area for the assessment of normative values for the upper extremities as proposed by the German Research Network on Neuropathic Pain) and bilateral at volar forearms as a neighboring nonstandard area. The parameters obtained were statistically compared between test sites. Three of the 10 QST parameters differed significantly with respect to the "body area," that is, warmth detection, thermal sensory limen, and mechanical pain thresholds. After z-transformation and interpretation according to the QST battery's standard instructions, 22 abnormal values were obtained at the hand. Applying the same procedure to parameters assessed at the nonstandard site forearm, that is, z-transforming them to the reference values for the hand, 24 measurements values emerged as abnormal, which was not significantly different compared with the hand (P=0.4185). Sensory differences between neighboring body areas are statistically significant, reproducing prior knowledge. This has to be considered in scientific assessments where a small variation of the tested body areas may not be an option. However, the magnitude of these differences was below the difference in sensory parameters that is judged as abnormal, indicating a robustness of the QST instrument against protocol deviations with respect to the test area when using the method of comparison with a 95 % confidence interval of a reference dataset.

  8. Change-point analysis of geophysical time-series: application to landslide displacement rate (Séchilienne rock avalanche, France)

    NASA Astrophysics Data System (ADS)

    Amorese, D.; Grasso, J.-R.; Garambois, S.; Font, M.

    2018-05-01

    The rank-sum multiple change-point method is a robust statistical procedure designed to search for the optimal number and location of change points in an arbitrary continuous or discrete sequence of values. As such, this procedure can be used to analyse time-series data. Twelve years of robust data sets for the Séchilienne (French Alps) rockslide show a continuous increase in average displacement rate from 50 to 280 mm per month in the 2004-2014 period, followed by a strong decrease back to 50 mm per month in the 2014-2015 period. Where possible kinematic phases have been tentatively suggested in previous studies, they rely solely on empirical threshold values. In this paper, we analyse how the use of a statistical algorithm for change-point detection helps to better understand time phases in landslide kinematics. First, we test the efficiency of the statistical algorithm on geophysical benchmark data, these data sets (stream flows and Northern Hemisphere temperatures) having already been analysed with independent statistical tools. Second, we apply the method to 12-yr daily time-series of the Séchilienne landslide, for rainfall and displacement data, from 2003 December to 2015 December, in order to quantitatively extract changes in landslide kinematics. We find two strong significant discontinuities in the weekly cumulated rainfall values: an average rainfall rate increase is resolved in 2012 April and a decrease in 2014 August. Four robust changes are highlighted in the displacement time-series (2008 May, 2009 November-December-2010 January, 2012 September and 2014 March), the 2010 one being preceded by a significant but weak rainfall rate increase (in 2009 November). Accordingly, we are able to quantitatively define five kinematic stages for the Séchilienne rock avalanche during this period. The synchronization between the rainfall and displacement rate, only resolved at the end of 2009 and beginning of 2010, corresponds to a remarkable change (a fourfold increase in mean displacement rate) in the landslide kinematics. This suggests that an increase in rainfall is able to drive an increase in the landslide displacement rate, but that most of the kinematics of the landslide is not directly attributable to rainfall amount. The detailed exploration of the characteristics of the five kinematic stages suggests that the weekly averaged displacement rates are more closely tied to the frequency of rainy days than to the rainfall rate values. These results suggest the pattern of the Séchilienne rock avalanche is consistent with previous findings that landslide kinematics depends not only on rainfall but also on soil moisture conditions (which are known to be more strongly related to precipitation frequency than to precipitation amount). Finally, our analysis of the displacement rate time-series pinpoints a change in the susceptibility of the slope response to rainfall, which was slower before the end of 2009 than after. The kinematic history as depicted by statistical tools opens new routes to understanding the apparent complexity of the Séchilienne landslide kinematics.
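
    A single rank-based change point can be located with a Pettitt-type scan, a simplified relative of the rank-sum multiple change-point procedure used above. The sketch below applies it to a synthetic series with one shift in mean; the real analysis handles multiple change points and daily rainfall/displacement series.

      import numpy as np
      from scipy.stats import rankdata

      rng = np.random.default_rng(0)
      x = np.concatenate([rng.normal(50.0, 10.0, 120),    # e.g., displacement rate, phase 1
                          rng.normal(90.0, 10.0, 80)])    # phase 2 with a higher mean

      n = len(x)
      r = rankdata(x)
      t = np.arange(1, n + 1)
      u = 2.0 * np.cumsum(r) - t * (n + 1)                # Pettitt's U_t statistic

      t_hat = int(np.argmax(np.abs(u[:-1]))) + 1          # most likely change point
      k = np.max(np.abs(u[:-1]))
      p_approx = 2.0 * np.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2))   # approximate p-value
      print(f"change point after observation {t_hat}, approximate p = {p_approx:.3g}")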

  9. From Correlates to Causes: Can Quasi-Experimental Studies and Statistical Innovations Bring Us Closer to Identifying the Causes of Antisocial Behavior?

    PubMed Central

    Jaffee, Sara R.; Strait, Luciana B.; Odgers, Candice L.

    2011-01-01

    Longitudinal, epidemiological studies have identified robust risk factors for youth antisocial behavior, including harsh and coercive discipline, maltreatment, smoking during pregnancy, divorce, teen parenthood, peer deviance, parental psychopathology, and social disadvantage. Nevertheless, because this literature is largely based on observational studies, it remains unclear whether these risk factors have truly causal effects. Identifying causal risk factors for antisocial behavior would be informative for intervention efforts and for studies that test whether individuals are differentially susceptible to risk exposures. In this paper, we identify the challenges to causal inference posed by observational studies and describe quasi-experimental methods and statistical innovations that may move us beyond discussions of risk factors to allow for stronger causal inference. We then review studies that use these methods and we evaluate whether robust risk factors identified from observational studies are likely to play a causal role in the emergence and development of youth antisocial behavior. For most of the risk factors we review, there is evidence that they have causal effects. However, these effects are typically smaller than those reported in observational studies, suggesting that familial confounding, social selection, and misidentification might also explain some of the association between risk exposures and antisocial behavior. For some risk factors (e.g., smoking during pregnancy, parent alcohol problems) the evidence is weak that they have environmentally mediated effects on youth antisocial behavior. We discuss the implications of these findings for intervention efforts to reduce antisocial behavior and for basic research on the etiology and course of antisocial behavior. PMID:22023141

  10. Systematic Correlation Matrix Evaluation (SCoMaE) - a bottom-up, science-led approach to identifying indicators

    NASA Astrophysics Data System (ADS)

    Mengis, Nadine; Keller, David P.; Oschlies, Andreas

    2018-01-01

    This study introduces the Systematic Correlation Matrix Evaluation (SCoMaE) method, a bottom-up approach which combines expert judgment and statistical information to systematically select transparent, nonredundant indicators for a comprehensive assessment of the state of the Earth system. The method consists of two basic steps: (1) the calculation of a correlation matrix among variables relevant for a given research question and (2) the systematic evaluation of the matrix, to identify clusters of variables with similar behavior and respective mutually independent indicators. Optional further analysis steps include (3) the interpretation of the identified clusters, enabling a learning effect from the selection of indicators, (4) testing the robustness of identified clusters with respect to changes in forcing or boundary conditions, (5) enabling a comparative assessment of varying scenarios by constructing and evaluating a common correlation matrix, and (6) the inclusion of expert judgment, for example, to prescribe indicators, to allow for considerations other than statistical consistency. The example application of the SCoMaE method to Earth system model output forced by different CO2 emission scenarios reveals the necessity of reevaluating indicators identified in a historical scenario simulation for an accurate assessment of an intermediate-high, as well as a business-as-usual, climate change scenario simulation. This necessity arises from changes in prevailing correlations in the Earth system under varying climate forcing. For a comparative assessment of the three climate change scenarios, we construct and evaluate a common correlation matrix, in which we identify robust correlations between variables across the three considered scenarios.
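
    The first two SCoMaE steps can be sketched as below: build the correlation matrix among candidate variables, then cluster variables on 1 - |r| and keep one indicator per cluster. The variable names and data are hypothetical stand-ins for Earth system model output, and the clustering rule here is a simple hierarchical cut rather than the paper's evaluation procedure.

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import squareform

      rng = np.random.default_rng(0)
      names = ["SAT", "SST", "OHC", "pH", "CO2flux", "NPP"]    # hypothetical variables
      base = rng.normal(size=(100, 2))
      data = np.column_stack([
          base[:, 0],                                          # SAT
          base[:, 0] + 0.1 * rng.normal(size=100),             # SST, tracks SAT
          base[:, 0] + 0.2 * rng.normal(size=100),             # OHC, tracks SAT
          base[:, 1],                                          # pH
          base[:, 1] + 0.1 * rng.normal(size=100),             # CO2flux, tracks pH
          rng.normal(size=100),                                # NPP, independent
      ])

      corr = np.corrcoef(data, rowvar=False)        # step 1: correlation matrix
      dist = 1.0 - np.abs(corr)                     # step 2: cluster on 1 - |r|
      np.fill_diagonal(dist, 0.0)
      Z = linkage(squareform(dist, checks=False), method="average")
      clusters = fcluster(Z, t=0.3, criterion="distance")

      for c in np.unique(clusters):
          members = [names[i] for i in np.where(clusters == c)[0]]
          print(f"cluster {c}: {members} -> indicator: {members[0]}")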

  11. From correlates to causes: can quasi-experimental studies and statistical innovations bring us closer to identifying the causes of antisocial behavior?

    PubMed

    Jaffee, Sara R; Strait, Luciana B; Odgers, Candice L

    2012-03-01

    Longitudinal, epidemiological studies have identified robust risk factors for youth antisocial behavior, including harsh and coercive discipline, maltreatment, smoking during pregnancy, divorce, teen parenthood, peer deviance, parental psychopathology, and social disadvantage. Nevertheless, because this literature is largely based on observational studies, it remains unclear whether these risk factors have truly causal effects. Identifying causal risk factors for antisocial behavior would be informative for intervention efforts and for studies that test whether individuals are differentially susceptible to risk exposures. In this article, we identify the challenges to causal inference posed by observational studies and describe quasi-experimental methods and statistical innovations that may move researchers beyond discussions of risk factors to allow for stronger causal inference. We then review studies that used these methods, and we evaluate whether robust risk factors identified from observational studies are likely to play a causal role in the emergence and development of youth antisocial behavior. There is evidence of causal effects for most of the risk factors we review. However, these effects are typically smaller than those reported in observational studies, suggesting that familial confounding, social selection, and misidentification might also explain some of the association between risk exposures and antisocial behavior. For some risk factors (e.g., smoking during pregnancy, parent alcohol problems), the evidence is weak that they have environmentally mediated effects on youth antisocial behavior. We discuss the implications of these findings for intervention efforts to reduce antisocial behavior and for basic research on the etiology and course of antisocial behavior.

  12. Bayes factors based on robust TDT-type tests for family trio design.

    PubMed

    Yuan, Min; Pan, Xiaoqing; Yang, Yaning

    2015-06-01

    The adaptive transmission disequilibrium test (aTDT) and the MAX3 test are two robust and efficient association tests for case-parent family trio data. Both tests incorporate information on common genetic models, including recessive, additive and dominant models, and are efficient in power and robust to genetic model specification. The aTDT uses information on departure from Hardy-Weinberg disequilibrium to identify the potential genetic model underlying the data and then applies the corresponding TDT-type test, and the MAX3 test is defined as the maximum of the absolute value of three TDT-type tests under the three common genetic models. In this article, we propose three robust Bayes procedures, the aTDT-based Bayes factor, the MAX3-based Bayes factor and Bayes model averaging (BMA), for association analysis with the case-parent trio design. The asymptotic distributions of aTDT under the null and alternative hypotheses are derived in order to calculate its Bayes factor. Extensive simulations show that the Bayes factors and the p-values of the corresponding tests are generally consistent and these Bayes factors are robust to genetic model specifications, especially so when the priors on the genetic models are equal. When equal priors are used for the underlying genetic models, the Bayes factor method based on aTDT is more powerful than those based on MAX3 and Bayes model averaging. When the prior places a small (large) probability on the true model, the Bayes factor based on aTDT (BMA) is more powerful. Analysis of simulated data on RA from GAW15 is presented to illustrate applications of the proposed methods.

  13. Reliability of simulated robustness testing in fast liquid chromatography, using state-of-the-art column technology, instrumentation and modelling software.

    PubMed

    Kormány, Róbert; Fekete, Jenő; Guillarme, Davy; Fekete, Szabolcs

    2014-02-01

    The goal of this study was to evaluate the accuracy of simulated robustness testing using commercial modelling software (DryLab) and state-of-the-art stationary phases. For this purpose, a mixture of amlodipine and its seven related impurities was analyzed on short narrow-bore columns (50 × 2.1 mm, packed with sub-2 μm particles) providing short analysis times. The performance of commercial modelling software for robustness testing was systematically compared to experimental measurements and DoE-based predictions. We have demonstrated that the reliability of predictions was good, since the predicted retention times and resolutions were in good agreement with the experimental ones at the edges of the design space. On average, the relative errors in retention time were <1.0%, while the errors in predicted critical resolution were between 6.9 and 17.2%. Because the simulated robustness testing requires significantly less experimental work than the DoE-based predictions, we think that robustness could now be investigated in the early stage of method development. Moreover, the column interchangeability, which is also an important part of robustness testing, was investigated considering five different C8 and C18 columns packed with sub-2 μm particles. Again, thanks to modelling software, we proved that the separation was feasible on all columns within the same analysis time (less than 4 min), by proper adjustments of variables. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Limited data tomographic image reconstruction via dual formulation of total variation minimization

    NASA Astrophysics Data System (ADS)

    Jang, Kwang Eun; Sung, Younghun; Lee, Kangeui; Lee, Jongha; Cho, Seungryong

    2011-03-01

    X-ray mammography is the primary imaging modality for breast cancer screening. For dense breasts, however, the mammogram is usually difficult to read due to the tissue overlap problem caused by the superposition of normal tissues. Digital breast tomosynthesis (DBT), which measures several low-dose projections over a limited angle range, may be an alternative modality for breast imaging, since it allows visualization of the cross-sectional information of the breast. DBT, however, may suffer from aliasing artifacts and severe noise corruption. To overcome these problems, a total variation (TV) regularized statistical reconstruction algorithm is presented. Inspired by the dual formulation of TV minimization in denoising and deblurring problems, we derived a gradient-type algorithm based on a statistical model of X-ray tomography. The objective function is comprised of a data fidelity term derived from the statistical model and a TV regularization term. The gradient of the objective function can be easily calculated using simple operations in terms of auxiliary variables. After a descending step, the data fidelity term is renewed in each iteration. Since the proposed algorithm can be implemented without sophisticated operations such as matrix inversion, it provides an efficient way to include the TV regularization in the statistical reconstruction method, which results in fast and robust estimation for low-dose projections over the limited angle range. Initial tests with an experimental DBT system confirmed our findings.
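
    A greatly simplified analogue of such a scheme (gradient steps on a quadratic data-fidelity term plus a smoothed total-variation penalty) is sketched below for 1-D signal denoising rather than tomographic reconstruction; the statistical DBT forward model and the dual/auxiliary-variable machinery of the paper are not reproduced.

      import numpy as np

      rng = np.random.default_rng(0)
      true = np.concatenate([np.zeros(50), np.ones(50), 0.3 * np.ones(50)])
      meas = true + 0.2 * rng.normal(size=true.size)       # noisy measurements

      lam, eps, step = 0.3, 1e-2, 0.05                     # TV weight, smoothing, step size
      x = meas.copy()
      for _ in range(1000):
          diff = np.diff(x)
          w = diff / np.sqrt(diff ** 2 + eps)              # gradient of smoothed TV terms
          tv_grad = np.concatenate([[-w[0]], w[:-1] - w[1:], [w[-1]]])
          x -= step * ((x - meas) + lam * tv_grad)         # data-fidelity + TV gradient step

      print("noisy RMSE   :", round(float(np.sqrt(np.mean((meas - true) ** 2))), 3))
      print("denoised RMSE:", round(float(np.sqrt(np.mean((x - true) ** 2))), 3))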

  15. Enhanced detection and visualization of anomalies in spectral imagery

    NASA Astrophysics Data System (ADS)

    Basener, William F.; Messinger, David W.

    2009-05-01

    Anomaly detection algorithms applied to hyperspectral imagery are able to reliably identify man-made objects from a natural environment based on statistical/geometric likelihood. The process is more robust than target identification, which requires precise prior knowledge of the object of interest, but has an inherently higher false alarm rate. Standard anomaly detection algorithms measure deviation of pixel spectra from a parametric model (either statistical or linear mixing) estimating the image background. The topological anomaly detector (TAD) creates a fully non-parametric, graph theory-based, topological model of the image background and measures deviation from this background using codensity. In this paper we present a large-scale comparative test of TAD against 80+ targets in four full HYDICE images using the entire canonical target set for generation of ROC curves. TAD will be compared against several statistics-based detectors including local RX and subspace RX. Even a perfect anomaly detection algorithm would have a high practical false alarm rate in most scenes simply because the user/analyst is not interested in every anomalous object. To assist the analyst in identifying and sorting objects of interest, we investigate coloring of the anomalies with principal component projections using statistics computed from the anomalies. This gives a very useful colorization of anomalies in which objects of similar material tend to have the same color, enabling an analyst to quickly sort and identify anomalies of highest interest.
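
    As a point of contrast with the graph-based TAD, the classical global RX detector scores each pixel by its Mahalanobis distance from the scene background mean and covariance. The sketch below runs it on a synthetic cube with two implanted anomalous pixels; real HYDICE imagery and the TAD codensity computation are not reproduced.

      import numpy as np

      rng = np.random.default_rng(0)
      rows, cols, bands = 60, 60, 30
      cube = rng.normal(size=(rows, cols, bands))          # synthetic background
      cube[10, 15] += 4.0                                  # implanted anomalous spectra
      cube[40, 42] += 4.0

      pixels = cube.reshape(-1, bands)
      mu = pixels.mean(axis=0)
      cov = np.cov(pixels, rowvar=False)
      cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(bands))  # small ridge for stability

      # RX score = squared Mahalanobis distance of each pixel from the background.
      centered = pixels - mu
      rx = np.einsum("ij,jk,ik->i", centered, cov_inv, centered).reshape(rows, cols)

      flagged = np.argwhere(rx > np.percentile(rx, 99.9))
      print("most anomalous pixels (row, col):", flagged.tolist())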

  16. Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology

    ERIC Educational Resources Information Center

    Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

    2015-01-01

    Purpose: Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable…

  17. Inferring Causalities in Landscape Genetics: An Extension of Wright's Causal Modeling to Distance Matrices.

    PubMed

    Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon

    2018-04-01

    Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods-that is, maximum-likelihood path analysis and the directional separation test-by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.

  18. The Influence Function of Principal Component Analysis by Self-Organizing Rule.

    PubMed

    Higuchi; Eguchi

    1998-07-28

    This article is concerned with a neural network approach to principal component analysis (PCA). An algorithm for PCA by the self-organizing rule has been proposed and its robustness observed through the simulation study by Xu and Yuille (1995). In this article, the robustness of the algorithm against outliers is investigated by using the theory of influence function. The influence function of the principal component vector is given in an explicit form. Through this expression, the method is shown to be robust against any directions orthogonal to the principal component vector. In addition, a statistic generated by the self-organizing rule is proposed to assess the influence of data in PCA.

  19. Stochastic simulation and robust design optimization of integrated photonic filters

    NASA Astrophysics Data System (ADS)

    Weng, Tsui-Wei; Melati, Daniele; Melloni, Andrea; Daniel, Luca

    2017-01-01

    Manufacturing variations are becoming an unavoidable issue in modern fabrication processes; therefore, it is crucial to be able to include stochastic uncertainties in the design phase. In this paper, integrated photonic coupled ring resonator filters are considered as an example of significant interest. The sparsity structure in photonic circuits is exploited to construct a sparse combined generalized polynomial chaos model, which is then used to analyze related statistics and perform robust design optimization. Simulation results show that the optimized circuits are more robust to fabrication process variations and achieve a reduction of 11%-35% in the mean square errors of the 3 dB bandwidth compared to unoptimized nominal designs.

  20. Robust control algorithms for Mars aerobraking

    NASA Technical Reports Server (NTRS)

    Shipley, Buford W., Jr.; Ward, Donald T.

    1992-01-01

    Four atmospheric guidance concepts have been adapted to control an interplanetary vehicle aerobraking in the Martian atmosphere. The first two offer improvements to the Analytic Predictor Corrector (APC) to increase its robustness to density variations. The second two are variations of a new Liapunov tracking exit phase algorithm, developed to guide the vehicle along a reference trajectory. These four new controllers are tested using a six degree of freedom computer simulation to evaluate their robustness. MARSGRAM is used to develop realistic atmospheres for the study. When square wave density pulses perturb the atmosphere all four controllers are successful. The algorithms are tested against atmospheres where the inbound and outbound density functions are different. Square wave density pulses are again used, but only for the outbound leg of the trajectory. Additionally, sine waves are used to perturb the density function. The new algorithms are found to be more robust than any previously tested and a Liapunov controller is selected as the most robust control algorithm overall examined.
